LAB · MODULE 2

Build the machine, step by step

This is the machine from Lessons 7 and 10, running live. You feed it a corpus and watch every piece work: tokenization, the vocabulary, the continuation table, and token-by-token generation, with the probabilities on display. No magic and no neural network: just counting you can see through.

Tokens: pieces observed in the corpus

Vocabulary: unique known tokens

Contexts: entries in the table

Context: words used to predict

The corpus

The little world the machine knows. One sentence per line.

14 sentences · 59 tokens · 31 in vocabulary

Tokenization

Each sentence broken into pieces (words and punctuation).

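1. [i] [like] [coffee]
2. [i] [like] [tea]
3. [i] [like] [coffee] [with] [milk]
4. [she] [likes] [coffee]
5. [she] [likes] [bread]
6. [the] [coffee] [is] [warm]
7. [the] [tea] [is] [cold]
8. [i] [drink] [coffee] [in] [the] [morning]
9. [she] [drinks] [tea] [at] [night]
10. [coffee] [goes] [well] [with] [bread]
11. [tea] [goes] [well] [with] [cake]
12. [the] [machine] [learns] [patterns]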

+2 more sentences…
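The splitting step above can be sketched in a few lines of Python. This is a minimal illustration, not the lab's own code: the function name `tokenize`, the lowercasing, and the regular expression are all assumptions, chosen so that words and punctuation marks each become one piece.

```python
import re

def tokenize(sentence: str) -> list[str]:
    """Lowercase a sentence and split it into words and punctuation marks.

    Assumption: a token is either a run of word characters or a single
    character that is neither a word character nor whitespace.
    """
    return re.findall(r"\w+|[^\w\s]", sentence.lower())

# Three sentences taken from the list above; the final period is added
# only to show how punctuation becomes its own piece.
corpus = ["I like coffee", "She likes bread", "The coffee is warm."]

for number, line in enumerate(corpus, start=1):
    print(number, tokenize(line))
```

Splitting on whitespace alone would glue "warm." into one piece, which is why the sketch uses a pattern instead of `str.split`.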

Vocabulary

Unique tokens, each with a number (ID) and how many times it appeared.

ID  Token         Count
1   i             4×
2   like          3×
3   coffee        6×
4   tea           4×
5   with          3×
6   milk          1×
7   she           3×
8   likes         2×
9   bread         2×
10  the           5×
11  is            3×
12  warm          1×
13  cold          1×
14  drink         1×
15  in            1×
16  morning       1×
17  drinks        1×
18  at            1×
19  night         1×
20  goes          2×
21  well          2×
22  cake          1×
23  machine       2×
24  learns        1×
25  patterns      1×
26  generates     1×
27  text          1×
28  learning      1×
29  artificial    1×
30  intelligence  1×
31  fun           1×
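A vocabulary like this one can be built in a single pass over the tokenized sentences. The sketch below assumes that IDs are handed out in order of first appearance, starting at 1, which is consistent with the listing (i = 1, like = 2, coffee = 3). The name `build_vocabulary` is hypothetical, and the counts differ from the listing because only the first five sentences are used.

```python
from collections import Counter

def build_vocabulary(tokenized):
    """Return (ids, counts): an ID per unique token and its frequency.

    Assumption: IDs start at 1 and follow the order of first appearance.
    """
    ids = {}
    counts = Counter()
    for tokens in tokenized:
        for token in tokens:
            counts[token] += 1
            if token not in ids:
                ids[token] = len(ids) + 1
    return ids, counts

# The first five sentences of the corpus, already lowercase.
sentences = [
    "i like coffee",
    "i like tea",
    "i like coffee with milk",
    "she likes coffee",
    "she likes bread",
]
ids, counts = build_vocabulary([s.split() for s in sentences])

for token, token_id in ids.items():
    print(token_id, token, f"{counts[token]}×")
```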

Context size

How many previous words the machine looks at to predict the next one.

Continuation table

The machine's memory: after each context of 2 words, what came next and how many times. "• end" means the sentence stopped there.

artificial intelligence → is (1)
at night → • end (1)
coffee goes → well (1)
coffee in → the (1)
coffee is → warm (1)
coffee with → milk (1)
drink coffee → in (1)
drinks tea → at (1)
generates text → • end (1)
goes well → with (2)
i drink → coffee (1)
i like → coffee (2), tea (1)
in the → morning (1)
intelligence is → fun (1)
is cold → • end (1)
is fun → • end (1)
is warm → • end (1)
learning artificial → intelligence (1)
learns patterns → • end (1)
like coffee → • end (1), with (1)
like tea → • end (1)
likes bread → • end (1)
likes coffee → • end (1)
machine generates → text (1)
machine learns → patterns (1)
she drinks → tea (1)
she likes → bread (1), coffee (1)
tea at → night (1)
tea goes → well (1)
tea is → cold (1)
the coffee → is (1)
the machine → generates (1), learns (1)
the morning → • end (1)
the tea → is (1)
well with → bread (1), cake (1)
with bread → • end (1)
with cake → • end (1)
with milk → • end (1)
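Counting continuations comes down to sliding a window over each sentence. The sketch below is one way to do it, under two assumptions: an end marker is appended to every sentence, and the first context is simply the first `context_size` tokens (no start-of-sentence padding), as the rows above suggest. `build_table` and `END` are illustrative names.

```python
from collections import Counter, defaultdict

END = "• end"  # marker for "the sentence stopped here"

def build_table(tokenized, context_size=2):
    """Map each context (a tuple of tokens) to a Counter of what followed."""
    table = defaultdict(Counter)
    for tokens in tokenized:
        padded = tokens + [END]
        # Slide over the sentence: the previous `context_size` tokens
        # form the context, the current token is the continuation.
        for i in range(context_size, len(padded)):
            context = tuple(padded[i - context_size:i])
            table[context][padded[i]] += 1
    return table

sentences = ["i like coffee", "i like tea", "i like coffee with milk"]
table = build_table([s.split() for s in sentences])

for context in sorted(table):
    followers = ", ".join(f"{tok} ({n})" for tok, n in table[context].items())
    print(" ".join(context), "→", followers)
```

With only these three sentences, the "i like" row already matches the table: coffee twice, tea once. Changing `context_size` to 1 or 3 rebuilds the table with shorter or longer contexts.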

Generation: the machine writes

It looks at the context, checks the table, picks the next token and repeats.
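That cycle can be written as a short loop. The sketch below uses a hand-copied slice of the continuation table and always picks the most frequent continuation. That greedy rule is an assumption made for illustration and is only one possible way to choose. The function name `generate` and the `max_tokens` limit are also hypothetical.

```python
END = "• end"

# A hand-copied slice of the continuation table (context size 2).
table = {
    ("i", "drink"): {"coffee": 1},
    ("drink", "coffee"): {"in": 1},
    ("coffee", "in"): {"the": 1},
    ("in", "the"): {"morning": 1},
    ("the", "morning"): {END: 1},
}

def generate(table, start, max_tokens=20):
    """Extend `start` one token at a time until the end marker or a dead end."""
    output = list(start)
    context_size = len(start)
    for _ in range(max_tokens):
        context = tuple(output[-context_size:])   # look at the context
        followers = table.get(context)            # check the table
        if not followers:
            break                                 # unknown context: stop
        next_token = max(followers, key=followers.get)  # pick the next token
        if next_token == END:
            break
        output.append(next_token)                 # ...and repeat
    return output

print(" ".join(generate(table, ("i", "drink"))))
# prints: i drink coffee in the morning
```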

Start with (optional)

How to choose

Temperature · 0.8 (low = more obvious and stable · high = more varied and risky)
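One common way to apply a temperature to raw counts is to raise each count to the power 1/T and renormalize. Whether the lab uses exactly this formula is an assumption, and the function names below are hypothetical. The sketch shows the effect on the "i like" row, where coffee was seen twice and tea once.

```python
import random

def temperature_probabilities(counts, temperature=0.8):
    """Turn counts into probabilities, sharpened or flattened by temperature.

    Assumption: weight = count ** (1 / temperature). At temperature 1 this
    is plain relative frequency.
    """
    weights = {tok: n ** (1.0 / temperature) for tok, n in counts.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}

def sample(counts, temperature=0.8, rng=random):
    """Draw one continuation at random according to those probabilities."""
    probs = temperature_probabilities(counts, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

counts = {"coffee": 2, "tea": 1}  # the "i like" row of the table
for t in (0.2, 0.8, 1.0, 2.0):
    probs = temperature_probabilities(counts, t)
    print(t, {tok: round(p, 3) for tok, p in probs.items()})
print(sample(counts, temperature=0.8))
```

Under this formula, a low temperature pushes almost all of the probability onto "coffee", while a high temperature moves the two options closer to an even split.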

That is it. By changing the corpus, the context size, and the selection method, you can see where every word comes from, and why the same machine can sound predictable or creative. Giant models follow the same cycle, with embeddings and attention in place of raw counting.

