MODULE 0 — BEFORE THE MACHINE

History and foundations of AI

Before building, it helps to understand where AI came from: rules, logic, graphs, search, statistics, neural networks, GPUs, tokens and, finally, LLMs. This module prepares the ground, with no placeholder lessons: each one appears only once it is actually written.

Module principle

LLMs did not fall from the sky. They are where many old ideas meet modern scale.

Lesson complete · 25 min · Historical introduction

0.1

AI before LLMs

Why language models are a recent chapter in a larger story

▸ Lesson objective

Understand that AI did not begin with modern chatbots: before LLMs, there were decades of attempts based on logic, rules, search, statistics and neural networks.

AI was not born conversing

Today many people first encounter AI in conversational form: a chatbot that answers questions, writes texts, summarises documents and helps with code. This gives the impression that artificial intelligence has always meant fluent natural language. It has not.

For much of AI's history, the dream was different: to make machines solve problems, follow rules, prove theorems, plan actions, play chess, recognise patterns, or emulate human experts in specific domains.

First major vision: intelligence as symbolic reasoning

One of the earliest strong bets was symbolic AI. The idea was to represent knowledge using symbols, rules and logic. Instead of learning from millions of examples, the machine would receive explicit rules about the world.

This approach suits arguments like: all humans are mortal; Socrates is human; therefore Socrates is mortal. The machine manipulates symbols according to formal rules. It seems elegant, and it is. But the real world tends to be far messier than a logic exercise.
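To make the syllogism concrete, here is a minimal sketch of rule-based inference in Python. The data layout (facts as predicate and subject pairs) and the function name `forward_chain` are assumptions chosen for illustration, not a standard library or the way any particular historical system worked.

```python
# Facts are (predicate, subject) pairs; the names are illustrative.
facts = {("human", "socrates")}

# Each rule reads: if X satisfies the first predicate, X also satisfies the second.
rules = [("human", "mortal")]


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, subject in list(derived):
                new_fact = (conclusion, subject)
                if predicate == premise and new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived


conclusions = forward_chain(facts, rules)
print(("mortal", "socrates") in conclusions)  # True
```

Nothing here is learned from data: the conclusion follows only because someone wrote the rule by hand, which is both the strength and the limit of the symbolic approach.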

Expert systems: machines with an expert's manual

Then came expert systems: programs that tried to capture the knowledge of doctors, engineers, analysts or other specialists in a large collection of rules.

They could work well in narrow domains when the rules were clear. But they struggled with exceptions, ambiguity, common sense and new situations. It was like trying to fit the whole world into a filing cabinet of rules. Courageous, but slightly insane — in the best academic sense.

Search: intelligence as exploring possibilities

Another important AI tradition is search. Imagine a game, a maze or a planning problem. The machine needs to explore possible paths until it finds a good solution.

Instead of "understanding" the world the way a person does, it can test states: if I do this, I end up there; if I choose another path, I might end up somewhere better. This view shaped much of the work on games, planning and problem solving.
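As a rough illustration of testing states, the sketch below runs a breadth-first search over a tiny maze. The maze layout, the symbols used for walls, start and goal, and the function name `find_path` are all made up for this example; breadth-first search is only one of many search strategies.

```python
from collections import deque

# A tiny hypothetical maze: "#" is a wall, "S" the start, "G" the goal.
MAZE = [
    "S.#",
    ".##",
    "..G",
]


def find_path(maze):
    """Breadth-first search: explore states level by level until the goal is found."""
    rows, cols = len(maze), len(maze[0])
    start = next(
        (r, c) for r in range(rows) for c in range(cols) if maze[r][c] == "S"
    )
    frontier = deque([(start, [start])])  # each entry: (state, path taken so far)
    visited = {start}
    while frontier:
        (r, c), path = frontier.popleft()
        if maze[r][c] == "G":
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and maze[nr][nc] != "#" and (nr, nc) not in visited:
                visited.add((nr, nc))
                frontier.append(((nr, nc), path + [(nr, nc)]))
    return None  # no route from start to goal


path = find_path(MAZE)
print(path)  # [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
```

The program has no notion of what a maze is. It only generates neighbouring states, remembers which ones it has already visited, and stops when one of them is the goal.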

Graphs: knowledge as a network of relations

Graphs appear naturally when we want to represent relations: one thing linked to another. People connected in a social network, cities connected by roads, concepts connected by meaning, pages connected by links.

In AI, graphs help represent knowledge and search. If "coffee" is linked to "drink", "caffeine" and "cup", the machine can navigate these relations. This is powerful for explicit knowledge, but still does not by itself solve the complexity of human language.
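A minimal sketch of the coffee example, assuming the graph is stored as a plain Python dictionary that maps each concept to the concepts it links to. The extra nodes "liquid" and "stimulant" and the function name `reachable` are hypothetical additions so the traversal has somewhere to go.

```python
# A small hypothetical knowledge graph stored as an adjacency dictionary.
graph = {
    "coffee": ["drink", "caffeine", "cup"],
    "drink": ["liquid"],
    "caffeine": ["stimulant"],
    "cup": [],
    "liquid": [],
    "stimulant": [],
}


def reachable(graph, start):
    """Return every concept that can be reached from `start` by following links."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


related = reachable(graph, "coffee")
print(sorted(related))  # ['caffeine', 'cup', 'drink', 'liquid', 'stimulant']
```

Every relation the program can follow had to be entered explicitly, which is why this style works well for curated knowledge and less well for open-ended language.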

The statistical turn: learning patterns from data

Over time, one idea gained force: instead of writing all the rules by hand, we can give examples to the machine and let it learn patterns.

This statistical turn shifts the centre of AI. The question changes from "which rules should we write?" to "which patterns appear in the data?". This opens the way for classification, prediction, speech recognition, machine translation and, later, modern language models.
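One very small way to see "patterns in the data" is to count which word follows which. The sketch below assumes a made-up three-sentence corpus and a hypothetical helper `most_likely_next`; real statistical models use far more data and more careful estimation.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus; real systems learn from vastly more text.
corpus = ["good morning everyone", "good morning team", "good night everyone"]

# For each word, count the words observed immediately after it.
following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1


def most_likely_next(word):
    """Return the word seen most often after `word`, or None if it was never seen."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None


prediction = most_likely_next("good")
print(prediction)  # morning
```

No rule says that "morning" follows "good". The program simply observed it twice and "night" once, and prefers the more frequent continuation.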

Neural networks: adjustable connections

Neural networks enter this story as systems with many adjustable connections. They receive an input, transform it through layers and produce an output. During training, the weights of these connections are adjusted to reduce error.
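The idea of adjusting weights to reduce error can be sketched with the smallest possible case: a single weight and no layers. The toy data (a hypothetical target relation y = 2x), the learning rate and the number of passes are all assumptions picked by hand for illustration; real networks have many weights and use libraries that compute the gradients automatically.

```python
# Toy data for the hypothetical target relation y = 2 * x.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

weight = 0.0          # the single adjustable connection
learning_rate = 0.05  # step size, chosen by hand for this toy case

for _ in range(200):  # repeat the pass over the examples many times
    for x, target in examples:
        prediction = weight * x
        error = prediction - target
        # Gradient of the squared error (error ** 2) with respect to the weight.
        gradient = 2 * error * x
        # Move the weight a small step in the direction that reduces the error.
        weight -= learning_rate * gradient

print(round(weight, 3))  # 2.0
```

The loop never receives the rule "multiply by two". It only sees examples, measures how wrong the current weight is, and nudges it until the error becomes negligible.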

It is tempting to say they "mimic the brain", but this metaphor should be used carefully. For this course, it is better to think of them as flexible mathematical machines that learn transformations from examples.

Why LLMs are a meeting of several traditions

Large language models did not emerge from nowhere. They combine several historical threads: statistics to predict patterns, neural networks to learn representations, tokens to transform text into manipulable units, attention to connect parts of the context, and powerful hardware to do all of this at scale.

When we look at an LLM, we are seeing a recent chapter in a larger story. It does not replace all prior AI; it inherits ideas, solves some old problems and creates new ones. A complete package: brilliant, expensive and occasionally dramatic.

[ practice ]

Initial mental map

Write three different ways to imagine "intelligence" in a machine: following rules, searching for paths and learning patterns. Then give a simple example for each.

See expected answer

Following rules: if it is raining, take an umbrella. Searching for paths: find the shortest route on the map. Learning patterns: see many sentences and notice that after "good" often comes "morning". These three ideas appear at different phases of AI.

✓ What you take away

LLMs are recent but are part of a long history. Before them, AI explored rules, logic, expert systems, search, graphs, statistics and neural networks.

Module map

Upcoming lessons appear as a roadmap — they enter the menu once written.

0.1 (published)

AI before LLMs

The first panorama: from rules and search to modern models.

0.2 (published)

Symbolic AI, logic and Prolog

When intelligence seemed to be writing clear rules about the world.

0.3 (published)

Graphs and networked knowledge

How relations, nodes and paths help machines represent the world.

0.4 (published)

Search and planning

Intelligence as exploring paths, choices and possible states.

0.5 (published)

Statistics and learning from data

The turn from writing rules to learning patterns from examples.

0.6 (published)

Neural networks without mysticism

Inputs, layers, weights, error and adjustment — without too much neuro-poetry.

0.7 (published)

Why GPUs matter

Parallelism, scale and the marriage between modern AI and hardware.

0.8 (published)

From words to tokens

Why text must be split into numeric units before a model can learn from it.

0.9 (published)

Transformers and LLMs

Attention, scale and next-token prediction as the engine of modern language.

0.10 (published)

The full map of AI

Symbolic, statistical and neural: three families that keep coexisting.

Living glossary

Terms we will encounter in this module.

Symbolic AI

Approach that represents knowledge using symbols, rules and explicit logic.

Symbol

A manipulable representation of a thing, idea, category or relation, such as "human", "mortal" or "lives_in".

Fact

A statement the system considers true within a domain, such as "Ana is a doctor".

Rule

A logical relation that allows drawing conclusions, such as "if someone is a doctor, they work in healthcare".

Inference

The process of reaching a conclusion from facts and rules.

Prolog

A logic programming language associated with symbolic AI, based on facts, rules and queries.

Expert system

A program that tries to mimic expert decisions using a rule base.

Search

The process of exploring possibilities until a suitable solution or path is found.

Graph

A structure formed by nodes and connections, useful for representing relations and paths.

Statistics

A set of methods for observing data, estimating patterns and dealing with uncertainty.

Neural network

A mathematical system with adjustable connections that learns transformations from examples.

LLM

Large Language Model: a neural network with a very large number of parameters, trained on large amounts of text to process and generate language.

Node

In a graph, each thing represented: a person, city, word or concept.

Edge

In a graph, the link between two nodes, usually with a named relation ("is a", "lives in").

Heuristic

A rule of thumb or estimate that guides search toward promising paths, without guaranteeing the best solution.

Planning

Laying out in advance a sequence of actions that leads from an initial situation to a goal.

Generalization

The ability to be correct on new cases, not only on the examples seen during training.

Overfitting

When the model memorizes training examples too closely and performs poorly on new cases.

Weight

An adjustable number for a connection in a neural network. Training is the process of adjusting the weights (also called parameters).

Training

The process of showing examples, measuring error and adjusting weights repeatedly until the model errs less.

GPU

Graphics Processing Unit: hardware that performs thousands of simple calculations in parallel, which makes it well suited to training and running neural networks.

Parallelism

Doing many operations at the same time, instead of one after another.

Transformer

A neural network architecture based on attention that processes context in parallel; the foundation of LLMs.

Attention

A mechanism that weights which parts of the context are most relevant to the next decision.

Hallucination

When a model confidently states something incorrect or invented, by generating plausible-sounding text rather than consulting a truth.
