MODULE 3 — MODERN

Evolving toward modern concepts

After the transparent machine, we add sophistication without slipping back into the fog. Visual mini-labs connect the Small Language Machine to modern concepts: embeddings (a map of meaning), attention (focus weights), document search (RAG) and hybrid systems.

Module principle

Modernizing is not mystifying. It is swapping simple pieces for more flexible ones — keeping the mental map.

Embedding map

Select a word and see which ones are closest. Simplified 2D map — real embeddings have hundreds of dimensions.

Selected word: coffee

Didactic group: drinks

Nearest neighbours (distance on the map)

tea: 7.2

milk: 11.2

cup: 12.6

bread: 22.5

cake: 28.8
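On a map like this, "nearest" simply means "smallest distance". Below is a minimal sketch in Python that assumes plain Euclidean distance in 2D. The coordinates and the function name `nearest_neighbours` are hypothetical, made up for illustration. They are not the values the interactive map uses.

```python
import math

# Hypothetical 2D coordinates, invented for illustration only.
# They are NOT the coordinates used by the interactive map above.
WORD_POSITIONS = {
    "coffee": (10.0, 10.0),
    "tea": (15.0, 13.0),
    "milk": (18.0, 4.0),
    "cup": (2.0, 17.0),
    "bread": (30.0, 25.0),
    "cake": (35.0, 28.0),
}


def nearest_neighbours(word, positions, k=5):
    """Return the k words closest to `word`, with their Euclidean distances."""
    origin = positions[word]
    distances = [
        (other, math.dist(origin, point))
        for other, point in positions.items()
        if other != word
    ]
    distances.sort(key=lambda pair: pair[1])  # smallest distance first
    return distances[:k]


for other, d in nearest_neighbours("coffee", WORD_POSITIONS):
    print(f"{other}: {d:.1f}")
```

With these made-up positions the ranking matches the map above (tea, milk, cup, bread, cake), but the distance values are different. Real embedding systems often measure closeness with cosine similarity rather than Euclidean distance.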

Attention simulator

This simulator does not compute real Transformer attention. It illustrates the idea: each token receives a weight according to how much it helps interpret the focus token.

Example sentence: Ana lent the coat to Julia because she was cold.

Focus token: she

Ana: 28%

lent: 18%

the: 4%

coat: 62%

to: 20%

Julia: 90%

because: 16%

she: 100%

was: 42%

cold: 55%
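The simulator's percentages are independent scores, so they add up to much more than 100%. Transformers use scaled dot-product attention, where the weights for one query go through a softmax and sum to 1. Below is a minimal sketch of that weight computation. It assumes hand-made 3-dimensional query and key vectors, which are hypothetical and not learned. It covers only the weights, not the weighted sum of value vectors that follows them.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)


# Hypothetical 3-dimensional vectors, invented for illustration.
# In a real Transformer, queries and keys come from learned projections.
tokens = ["Ana", "coat", "Julia", "cold"]
keys = [
    [1.0, 0.0, 0.2],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.8],
    [0.2, 0.3, 1.0],
]
query_she = [1.0, 0.0, 1.0]  # made-up query vector for the focus token "she"

for token, w in zip(tokens, attention_weights(query_she, keys)):
    print(f"{token}: {w:.0%}")
```

With these invented vectors, "Julia" gets the largest weight, as in the simulator. In a trained model, the outcome depends entirely on the learned parameters.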

Mini-RAG: search before answering

Edit the documents, ask a question and see which passages are retrieved. Scoring by word overlap — simple but instructive.

Document base

Question

Supported answer

The most relevant passage seems to be: "Document 3: Attention helps the model weigh which parts of the context are most relevant."

Document 3 (50% match)

Document 3: Attention helps the model weigh which parts of the context are most relevant.

Matched words: attention, the, model

Document 1 (0% match)

Document 1: A Small Language Machine uses a small corpus to learn continuation patterns.

Document 2 (0% match)

Document 2: Embeddings represent words and texts as positions in a numerical space.

Document 4 (0% match)

Document 4: RAG combines document search with language generation.

Document 5 (0% match)

Document 5: GPUs accelerate many parallel mathematical operations used in neural network training.
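Word-overlap retrieval fits in a few lines. Below is a minimal sketch that assumes the score is the share of question words that also appear in the passage. The page does not state its exact formula. The question is also hypothetical, because the one typed into the widget is not shown. The function names `words` and `retrieve` are illustrative.

```python
import re

# The five passages from the document base above.
DOCUMENTS = [
    "Document 1: A Small Language Machine uses a small corpus to learn continuation patterns.",
    "Document 2: Embeddings represent words and texts as positions in a numerical space.",
    "Document 3: Attention helps the model weigh which parts of the context are most relevant.",
    "Document 4: RAG combines document search with language generation.",
    "Document 5: GPUs accelerate many parallel mathematical operations used in neural network training.",
]


def words(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents):
    """Rank documents by the share of question words they contain."""
    q = words(question)
    ranked = []
    for doc in documents:
        shared = q & words(doc)
        score = len(shared) / len(q) if q else 0.0
        ranked.append((score, doc, sorted(shared)))
    # Python's sort is stable, even with reverse=True: ties keep document order.
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


question = "How does attention help the model?"  # hypothetical question
ranked = retrieve(question, DOCUMENTS)
print(f'The most relevant passage seems to be: "{ranked[0][1]}"')
for score, doc, shared in ranked:
    print(f"{score:.0%} match | {doc[:10]} | matched: {shared}")
```

With this made-up question, Document 3 scores 50% (attention, the, model) and the other documents score 0%, which gives the same ranking as above. Note that "the" counts as a match. Real systems typically filter or down-weight such common words, for example with BM25, or they retrieve with embeddings instead.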

What does each piece do well?

The goal is not to pick one winning technique, but to understand how they complement each other.

N-grams

Strength: Very transparent and easy to explain.

Limit: They depend on exact repetition and generalise poorly.

Use: Teaching, prototypes, prediction demonstrations.

Embeddings

Strength: They capture proximity of meaning.

Limit: They are approximations and can carry biases from the data.

Use: Semantic search, recommendation, text comparison.

Attention

Strength: It helps connect relevant parts of the context.

Limit: It is not human comprehension and does not guarantee truth.

Use: Transformers, LLMs, translation, summarisation, generation.

RAG

Strength: It grounds answers in external documents.

Limit: It depends on the quality of the search and the sources.

Use: Questions about internal knowledge bases, research, support.

30 min · Conceptual bridge

3.1

From counting to neural models

What changes when we leave the n-gram table behind

▸ Lesson objective

Understand the transition between a transparent counting machine and neural models that learn more flexible representations.

What our n-gram machine did well

The interactive machine from Module 2 observed sequences and counted continuations. If "coffee" appeared after "I like", it recorded that in the table.

This approach is excellent for learning because it is transparent. We can open the table, see the counts and explain every choice.
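The counting step can be sketched in a few lines of Python. This minimal sketch assumes lowercase whitespace tokenisation and a two-word context, as in "I like" → "coffee". The corpus and the `count_continuations` function are hypothetical, not the Module 2 machine's actual code.

```python
from collections import Counter, defaultdict


def count_continuations(corpus, n=3):
    """Map each (n-1)-word context to a Counter of the words that followed it."""
    table = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i : i + n - 1])
            table[context][tokens[i + n - 1]] += 1
    return table


# A tiny hypothetical corpus, made up for this example.
corpus = [
    "I like coffee",
    "I like tea",
    "I like coffee with milk",
]
table = count_continuations(corpus)

print(table[("i", "like")])  # continuations counted after "i like"
print(table.get(("i", "love")))  # a context never seen has no entry at all
```

The table is fully inspectable, so every prediction can be traced back to a count. Looking up ("i", "love") returns nothing, though: the table has no entry for a context it never saw.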

The limit of exact repetition

The problem is that the n-gram machine depends heavily on sequences it has already seen. If the corpus never contained "I love coffee", the machine has no way of knowing that this is similar to "I like coffee".

Neural models come in to handle similarity, context and generalisation better. Instead of relying only on exact repetition, they learn internal representations.

Representations are the leap

The word "coffee" stops being just a label or an entry in a table. It starts being represented by numbers that capture relations with other words.

This leap opens the way to embeddings, attention and modern models. The machine begins to work with neighbourhoods of meaning, not just local counts.

[ practice ]

The gap in counting

Think of a corpus that contains "I like coffee" but never "I love coffee". Why can the n-gram machine not use the similarity between "like" and "love"? What would it need?

Expected answer

For the table, "like" and "love" are different symbols with no relation — it only knows exact sequences. It would need a representation that brought words of similar meaning closer together. That is exactly what embeddings provide.
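The expected answer can be made concrete. This minimal sketch assumes tiny hand-made 3-dimensional vectors and uses cosine similarity as the similarity measure. The vectors are hypothetical; real embeddings are learned from data and much larger.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings, invented for illustration.
# Real embeddings are learned from data and have hundreds of dimensions.
embedding = {
    "like": [0.8, 0.6, 0.1],
    "love": [0.9, 0.5, 0.2],
    "bread": [0.1, 0.2, 0.9],
}

# For the n-gram table, "like" and "love" are simply different keys.
print("like" == "love")

# With vectors, their closeness becomes measurable.
print(round(cosine_similarity(embedding["like"], embedding["love"]), 3))
print(round(cosine_similarity(embedding["like"], embedding["bread"]), 3))
```

The string comparison can only say "different". The vectors place "like" much closer to "love" than to "bread". Here that closeness comes from the invented numbers; in a trained model it comes from the data.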

✓ What you take away

Modern models do not abandon statistics; they refine it. We move from explicit counts to learned representations that generalise better.

Living glossary

Terms we will encounter in this module.

Representation

A numerical or structural form the machine uses to work with an idea, word or document.

Embedding

A numerical representation that positions words or texts in a space of meaning.

Similarity

A measure of proximity between two representations.

Attention

A mechanism that assigns different weights to parts of the context.

Attention weight

A value indicating how much importance a token receives at a given step.

RAG

Retrieval-Augmented Generation: generation supported by document retrieval.

Retrieval

The process of searching for documents or passages relevant to a question.

External context

Information brought in from outside the model to support an answer.

Hybrid system

A system that combines different techniques, such as LLMs, search, rules, graphs and databases.
