From counting to neural models
What changes when we leave the n-gram table behind
Understand the transition from a transparent counting machine to neural models that learn more flexible representations.
What our n-gram machine did well
The interactive machine from Module 2 observed sequences and counted their continuations. If the word "coffee" appeared after "I like", it recorded that continuation in the table.
This approach is excellent for learning how language models work because it is transparent: we can open the table, see the counts and explain every choice.
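To make the counting concrete, here is a minimal sketch of such a table in Python. It assumes a trigram model (two words of context) and simple whitespace tokenisation; the function name `build_table` and the toy corpus are illustrative, not the actual code behind the Module 2 machine.

```python
from collections import Counter, defaultdict

def build_table(sentences, n=3):
    """Hypothetical helper: count which word follows each (n-1)-word context."""
    table = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()  # naive whitespace tokenisation
        for i in range(len(words) - n + 1):
            context = tuple(words[i:i + n - 1])
            next_word = words[i + n - 1]
            table[context][next_word] += 1
    return table

# Toy corpus, invented for illustration.
corpus = ["I like coffee", "I like tea", "I like coffee"]
table = build_table(corpus)
print(dict(table[("I", "like")]))  # {'coffee': 2, 'tea': 1}
```

Every number in the table can be traced back to a sentence in the corpus, which is exactly the transparency described above.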
The limit of exact repetition
The problem is that the n-gram machine depends entirely on sequences it has already seen. If the corpus never contained "I love coffee", it has no way of knowing that this sentence is similar to "I like coffee".
Neural models address this by handling similarity, context and generalisation better. Instead of relying only on exact repetition, they learn internal representations.
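The limit shows up as soon as we query a context the table has never seen. In this sketch (a hypothetical trigram table over a toy corpus, with an illustrative `predict` helper), the machine answers for "I like" but has nothing at all for "I love":

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: "love" never appears.
corpus = ["I like coffee", "I like tea"]

# Trigram table: (two-word context) -> counts of the next word.
table = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i in range(len(words) - 2):
        table[(words[i], words[i + 1])][words[i + 2]] += 1

def predict(context):
    """Return the most frequent continuation, or None for an unseen context."""
    counts = table.get(context)  # .get does not create a new entry
    if not counts:
        return None
    return counts.most_common(1)[0][0]

print(predict(("I", "like")))  # coffee
print(predict(("I", "love")))  # None
```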
Representations are the leap
The word "coffee" stops being just a label or an entry in a table. It is represented instead by a vector of numbers that captures its relations with other words.
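A toy illustration of "numbers that capture relations", assuming hand-picked three-dimensional vectors (real embeddings are learned from data and usually have many more dimensions). Cosine similarity, a common way to compare such vectors, scores "like" and "love" as close and "like" and "coffee" as far apart:

```python
import math

# Hypothetical hand-picked vectors; real embeddings are learned, not chosen by hand.
embeddings = {
    "like":   [0.9, 0.8, 0.1],
    "love":   [0.85, 0.9, 0.15],
    "coffee": [0.1, 0.2, 0.95],
}

def cosine(u, v):
    """Cosine of the angle between two vectors: near 1 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(round(cosine(embeddings["like"], embeddings["love"]), 3))
print(round(cosine(embeddings["like"], embeddings["coffee"]), 3))
```

A table key, by contrast, only supports one comparison: the two labels are either identical or not.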
This leap opens the way to embeddings, attention and modern models. The machine begins to work with neighbourhoods of meaning, not just local counts.
The gap in counting
Think of a corpus that contains "I like coffee" but never "I love coffee". Why can't the n-gram machine use the similarity between "like" and "love"? What would it need?
Expected answer
For the table, "like" and "love" are unrelated symbols: it only knows exact sequences. It would need a representation that places words with similar meanings close together. That is exactly what embeddings provide.
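One way to see what such a representation would buy the machine is a hand-rolled "similarity backoff": when a word is missing from the table, borrow the continuation of its nearest neighbour in embedding space. This is an illustrative sketch, not how neural language models actually work; the vectors are hypothetical and hand-picked, the helper `predict_next` is invented for the example, and it uses a bigram table for brevity.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: "love" is never seen.
corpus = ["I like coffee", "I like coffee", "I like tea", "I hate tea"]

# Hypothetical hand-picked vectors standing in for learned embeddings.
embeddings = {
    "like": [0.9, 0.8, 0.1],
    "love": [0.85, 0.9, 0.15],
    "hate": [-0.8, -0.7, 0.2],
}

# Bigram table: previous word -> counts of the next word.
table = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        table[prev][nxt] += 1

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def predict_next(word):
    """Most frequent next word; for an unseen word, borrow from its nearest embedded neighbour."""
    if word in table:
        return table[word].most_common(1)[0][0]
    if word not in embeddings:
        return None  # no representation, so nothing to borrow from
    seen = [w for w in table if w in embeddings]
    if not seen:
        return None
    nearest = max(seen, key=lambda w: cosine(embeddings[word], embeddings[w]))
    return table[nearest].most_common(1)[0][0]

print(predict_next("love"))   # coffee (borrowed from "like")
print(predict_next("adore"))  # None: no counts and no vector
```

The borrowing only works because "love" has a vector placing it near "like"; a word with neither counts nor a vector still gets no answer.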
Modern models do not abandon statistics; they refine it. We move from explicit counts to learned representations that generalise better.
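The contrast can be sketched in a few lines. Both halves below produce a probability distribution over the next word: the count-based half normalises counts from a table, while the neural-style half scores each candidate with a dot product between vectors and normalises the scores with a softmax. The vectors and the `softmax_next` helper are hypothetical; in a real model the vectors would be learned by training, and the context vector would be computed from the whole context rather than looked up for a single word.

```python
import math
from collections import Counter

# Count-based: relative frequencies of what followed "I like" in a toy corpus.
counts = Counter({"coffee": 2, "tea": 1})
total = sum(counts.values())
count_probs = {word: c / total for word, c in counts.items()}

# Neural-style: hypothetical hand-picked vectors (a real model learns these).
context_vectors = {
    "like": [0.9, 0.8, 0.1],
    "love": [0.85, 0.9, 0.15],  # never counted, but it still has a vector
}
output_vectors = {
    "coffee": [1.0, 0.5, 0.0],
    "tea": [0.4, 0.6, 0.1],
}

def softmax_next(context_word):
    """Score each candidate by a dot product, then normalise with a softmax."""
    ctx = context_vectors[context_word]
    scores = {w: sum(a * b for a, b in zip(ctx, v)) for w, v in output_vectors.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}

print(count_probs)           # counts turned into probabilities
print(softmax_next("like"))
print(softmax_next("love"))  # a distribution even though "love" was never counted
```

Both outputs are still probabilities estimated from data; what changes is that the neural-style version can score any context that has a representation, not only contexts that appear in the table.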