An LLM treats def calculate_total(items): as a sequence of tokens, just like it treats “The quick brown fox” as a sequence of tokens. The magic is that it has seen so much code that its predictions respect syntax anyway.

Tokenization is uneven, though. Common keywords like function or return become single tokens, while rare identifiers get split into subwords: function is one token, but getElementById is four, and calculateMonthlyRevenue becomes calculate + Monthly + Revenue. Operator clusters like {, =>, and :: consume tokens too. As a result, code fills up context windows faster than English prose: a 128K-token window holds less code than you’d expect.

Each token is then mapped to an embedding vector. function and def end up near each other in vector space because they appear in similar contexts across the training data.

Embeddings alone ignore order, but a = b is different from b = a. Positional information is therefore folded into each token’s representation (RoPE, used in most modern models, rotates the attention queries and keys by a position-dependent angle), so the model knows where each token sits in the sequence, not just what it is.

This geometry also explains cross-language transfer: for in Python, for in JavaScript, and for in Go occupy similar regions in vector space because they appear in structurally similar patterns. The model learns language-agnostic coding concepts.

Attention ties the sequence together. Given items.reduce((sum, item) => sum +, the model attends heavily to item, reduce, and the surrounding variable names, having learned that this pattern typically ends with a property access like item.price.

The output is a probability distribution over the entire vocabulary, something like item.price (72%), item.cost (15%), item.amount (8%), with thousands of other tokens sharing the remaining 5%.

So when the model completes import pandas as pd, it’s not checking whether pandas is installed; it’s predicting what usually follows the patterns it has seen.
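The subword splitting described above can be sketched with a toy greedy longest-match tokenizer. This is not a real BPE tokenizer, and the vocabulary below is invented for the example; real tokenizers learn their merges from data, but the effect on identifiers is the same.

```python
# Invented toy vocabulary; a real tokenizer learns this from data.
VOCAB = {"function", "return", "get", "Element", "By", "Id",
         "calculate", "Monthly", "Revenue", "{", "=>", "::"}

def toy_tokenize(text):
    """Greedy longest-match split against the toy vocabulary,
    falling back to single characters for unknown spans."""
    tokens, i = [], 0
    while i < len(text):
        match = next((text[i:j] for j in range(len(text), i, -1)
                      if text[i:j] in VOCAB), text[i])
        tokens.append(match)
        i += len(match)
    return tokens

print(toy_tokenize("function"))                # ['function'] - one token
print(toy_tokenize("getElementById"))          # ['get', 'Element', 'By', 'Id'] - four tokens
print(toy_tokenize("calculateMonthlyRevenue")) # ['calculate', 'Monthly', 'Revenue']
```

Note how a common keyword survives intact while camelCase identifiers shatter into pieces, which is why identifier-heavy code burns through a context window quickly.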
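To see why position matters for a = b versus b = a, here is a minimal sketch using the classic sinusoidal positional encoding (modern models typically use RoPE instead, but the principle is identical: mix position into the representation). The token embeddings here are made up for the example.

```python
import math

def sinusoidal_pe(pos, dim=8):
    """Sinusoidal positional encoding: alternating sin/cos at
    geometrically spaced frequencies, indexed by position."""
    return [math.sin(pos / 10000 ** (2 * (i // 2) / dim)) if i % 2 == 0
            else math.cos(pos / 10000 ** (2 * (i // 2) / dim))
            for i in range(dim)]

# Pretend token embeddings, invented for the example.
EMB = {"a": [1.0] * 8, "=": [0.5] * 8, "b": [-1.0] * 8}

def encode(tokens):
    """Add each token's embedding to the encoding of its position."""
    return [[e + p for e, p in zip(EMB[t], sinusoidal_pe(i))]
            for i, t in enumerate(tokens)]

# Same bag of tokens, different order -> different representations,
# so the model can tell a = b apart from b = a.
print(encode(["a", "=", "b"]) != encode(["b", "=", "a"]))  # True
```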
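The probability distribution at the output can be sketched with a softmax over raw scores (logits). The logit values below are invented to roughly mirror the article’s illustrative percentages; a real model produces one logit per vocabulary entry, tens of thousands of them.

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution that sums to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

# Hypothetical logits for the token after "sum + item.";
# values are invented for the example.
logits = {"price": 5.0, "cost": 3.4, "amount": 2.8, "<other>": 1.0}
probs = softmax(logits)
print({t: round(p, 2) for t, p in probs.items()})
```

Sampling or greedy decoding then picks from this distribution, which is why the model confidently completes item.price without ever verifying that an items list with a price field exists.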