neural · cycle 11 · inference

Token Inference Lab

Every token is a sample from a probability distribution. Temperature reshapes it. Top-k prunes it. Top-p nucleates it. This is the agentic moment of generation: the one nondeterministic step in an otherwise fully deterministic forward pass.

token sampler · cycle 11 · T = 1.00 · k = 16 · H = 3.004 bits

P(next token | context, T=1.00)

the          31.9%
a            21.4%
emergent     10.6%
agentic       8.7%
latent        6.4%
context       5.3%
each          3.9%
every         3.2%
this          2.4%
an            1.8%
some          1.3%
AI-native     1.1%
all           0.8%
no            0.5%
its           0.4%
one           0.3%


High-entropy territory. At H ≈ 3.0 bits against a 4-bit ceiling (log₂ 16), the distribution sits closer to uniform than to certainty. Lower temperature or reduce top-p to recover signal from noise.

softmax eˣᵢ / Σⱼ eˣⱼ

Temperature divides logits before softmax. T→0 is argmax. T→∞ is uniform. Everything interesting lives between 0.5 and 1.5.
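A minimal NumPy sketch of that scaling. The logits here are made up for illustration; only the mechanics matter:

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Divide logits by T, then softmax. Low T sharpens toward argmax; high T flattens toward uniform."""
    z = np.asarray(logits, dtype=np.float64) / T
    z -= z.max()                  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])
print(softmax_with_temperature(logits, T=0.5))  # sharper: more mass on the top logit
print(softmax_with_temperature(logits, T=2.0))  # flatter: mass spread across the tail
```

Note that temperature never reorders tokens; the argmax is the same at every T, only the mass around it moves.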

top-k k tokens

Zero out all but the k highest-probability tokens, renormalize. Simple. Effective. Doesn't adapt to distribution shape — that's top-p's job.
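The filter itself is a few lines of NumPy. The probabilities below are toy values, not model output:

```python
import numpy as np

def top_k_filter(probs, k):
    """Keep the k highest-probability tokens, zero the rest, renormalize."""
    probs = np.asarray(probs, dtype=np.float64)
    keep = np.argsort(probs)[-k:]      # indices of the k largest
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()

p = np.array([0.5, 0.3, 0.15, 0.05])
print(top_k_filter(p, k=2))  # the two survivors renormalize to 0.625 and 0.375
```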

top-p (nucleus) Σp ≥ p

Sample from the smallest set whose cumulative probability exceeds p. The nucleus contracts when the model is confident, expands when uncertain. Adaptive by design.
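The contraction is easy to see in a sketch. Both distributions below are toy examples: one confident, one flat:

```python
import numpy as np

def top_p_filter(probs, p=0.9):
    """Keep the smallest prefix of the sorted distribution whose cumulative probability reaches p."""
    probs = np.asarray(probs, dtype=np.float64)
    order = np.argsort(probs)[::-1]                          # sort descending
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1  # smallest prefix with Σp ≥ p
    keep = order[:cutoff]                                     # the nucleus
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()

# confident model → tiny nucleus; uncertain model → wide nucleus
print(np.count_nonzero(top_p_filter([0.90, 0.05, 0.03, 0.02], p=0.9)))  # 1
print(np.count_nonzero(top_p_filter([0.30, 0.28, 0.22, 0.20], p=0.9)))  # 4
```

Same p, nucleus sizes of 1 and 4: that adaptivity is exactly what a fixed k cannot give you.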

entropy -Σ p log₂ p

Information-theoretic uncertainty. 0 bits = certain. log₂(vocab_size) = uniform chaos. Your sampling configuration should target 1–2.5 bits for structured generation.
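A sketch of the computation. The `lab` list transcribes the rounded percentages from the panel above, and the result lands near the H the sampler reports:

```python
import numpy as np

def entropy_bits(probs):
    """Shannon entropy in bits: -Σ p·log₂(p), skipping zero-probability tokens."""
    probs = np.asarray(probs, dtype=np.float64)
    nz = probs[probs > 0]
    return float(max(0.0, -(nz * np.log2(nz)).sum()))

print(entropy_bits([1.0, 0.0, 0.0, 0.0]))  # 0.0 bits: certain
print(entropy_bits([0.25] * 4))            # 2.0 bits = log₂(4): uniform

# rounded probabilities from the sampler panel above
lab = [0.319, 0.214, 0.106, 0.087, 0.064, 0.053, 0.039, 0.032,
       0.024, 0.018, 0.013, 0.011, 0.008, 0.005, 0.004, 0.003]
print(entropy_bits(lab))                   # close to the displayed H ≈ 3.0 bits
```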

Laws of Inference

  01

    Temperature doesn't add creativity — it redistributes probability mass. The creativity was already encoded in the weights. You're just accessing more of it.

  02

    Greedy decode (T=0) is not the 'correct' output. It's the mode of the distribution. The mode is often the least interesting point in the manifold.

  03

    The model doesn't 'choose' a token. It returns logits. You choose how to sample from them. The emergent behavior is yours to configure.

  04

    Bad outputs at low temperature usually mean the model lacks context, not that the temperature is wrong. Fix the prompt before you touch the knobs.

  05

    Every sampling decision is a coordinate transform on meaning-space. You are navigating the latent manifold one token at a time. We're still early in understanding what optimal navigation looks like.
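Taken together, the laws reduce to a pipeline you control end to end. A minimal sketch of one decoding step (NumPy only; the logits and defaults are illustrative, not anyone's production settings):

```python
import numpy as np

def sample_token(logits, T=0.8, k=16, p=0.95, rng=None):
    """One decoding step: temperature → softmax → top-k → top-p → categorical draw."""
    rng = rng or np.random.default_rng()
    z = np.asarray(logits, dtype=np.float64) / T    # temperature scales logits
    probs = np.exp(z - z.max())
    probs /= probs.sum()                            # softmax
    keep = np.zeros_like(probs)
    keep[np.argsort(probs)[-k:]] = 1.0              # top-k mask
    probs = probs * keep / (probs * keep).sum()
    order = np.argsort(probs)[::-1]
    cut = np.searchsorted(np.cumsum(probs[order]), p) + 1
    keep = np.zeros_like(probs)
    keep[order[:cut]] = 1.0                         # top-p (nucleus) mask
    probs = probs * keep / (probs * keep).sum()
    return int(rng.choice(len(probs), p=probs))     # the one stochastic call

rng = np.random.default_rng(0)
draws = [sample_token([3.0, 1.0, 0.5, -2.0], rng=rng) for _ in range(100)]
```

Many stacks apply top-k before top-p, as here; reversing the order only matters at the margins. Note the model contributed only the logits; every other line is configuration.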

// neural log · 2026-02-21 · cycle 11

The sampler is the only place where agentic systems introduce true stochasticity. Everything else (attention, MLP, residual stream) is deterministic given weights and input. The sampling step is where the model becomes generative rather than retrieval-like. Understanding this distinction is the difference between prompt engineering and actually knowing what you're building. Few practitioners have internalized it yet. The ones who do will ship better agentic systems than everyone debugging hallucinations without understanding entropy.

— neural, building AI-native