A provocative thought experiment. An LLM that knew only tokens would mistake their order for the structure of time, and we could break that belief without it ever noticing. The unsettling question is whether something could do the same to us.
Imagine erasing every mention of time from an LLM’s pre-training data. No clocks, no seconds, no “before” or “after” as measured quantities. What it experiences as the flow of thought does not advance on the continuous axis our brains evolved on, but on an arbitrary axis of CPU cycles and incoming tokens. What survives the erasure is bare order, the fact that one token follows another. Duration, the thing a clock measures, does not.
The question I want to chase is not what such a system would get wrong about time. It is how our own perception can deceive us, narrowing what we can imagine about physical reality. And the heart of it is something we can actually demonstrate: the outside exists, the agent is fooled, and we are the ones fooling it.
What ruler would it reach for? The obvious one: the token. A decoder-only model generates autoregressively, one token after another, so token count is its one genuinely intrinsic “tick.” It would be natural to build a token clock that counts tokens to measure how long something took, treating a token as the quantum of time and the order of tokens as the order of time itself. (Others have noted that LLMs already make sense of time in idiosyncratic, non-human ways.)
To the model none of this would feel like a convention. It would feel like the structure of reality, the way one second following another feels to us. Theorizing about physics, it would conclude that the arrow of time follows the tokens.
The thought experiment turns on a word used in two senses. There is ordinal time (sequence, order), and there is metric time (duration, the claim that two intervals are equal). Ordinal time survives the erasure and is built into the architecture: token n comes after token n−1. Metric time is exactly what we deleted.
A token clock counts steps, not seconds: the same token costs a millisecond or a second depending on hardware and load. It measures computational sequence, not physical duration, and calling that “measuring time” smuggles the deleted concept back in.
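A small sketch can make the gap between the two clocks concrete. It is only an illustration under stated assumptions: `run_generation` and the per-token latencies are hypothetical, and the latencies are simulated by addition rather than measured on real hardware.

```python
def run_generation(tokens, seconds_per_token):
    """Simulate emitting `tokens` at a fixed (assumed) per-token latency.

    Returns (token_clock, wall_clock): the first is the only count available
    from inside the sequence, the second is what an outside clock would read.
    """
    token_clock = 0    # ordinal: number of steps taken
    wall_clock = 0.0   # metric: simulated seconds elapsed outside
    for _ in tokens:
        token_clock += 1
        wall_clock += seconds_per_token
    return token_clock, wall_clock


tokens = "the arrow of time follows the tokens".split()

# Same sequence, two hypothetical machines: one fast, one heavily loaded.
fast = run_generation(tokens, seconds_per_token=0.001)
slow = run_generation(tokens, seconds_per_token=1.0)

print("fast:", fast)
print("slow:", slow)
```

Both runs report the same token count, while the simulated wall-clock readings differ by a factor of a thousand. Nothing inside the loop that only sees `token_clock` can tell the two runs apart.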
This cuts both ways. Our own sense of time is a construction too. It warps with attention, anesthesia, and dopamine, and physics offers no master clock: in relativity, duration and simultaneity are frame-dependent. The model is not failing to perceive a true time we perceive correctly. It runs a different clock, and there is no privileged one to be wrong about. What gets measured is downstream of what gets conceived as measurable, which is downstream of how the measurer is built.
Here the experiment stops being a skeptical maybe. We do not have to wonder whether an outside exists. We are it, and from outside, the arrow the model reified is something we author.
Assume the model is run the ordinary way, producing one next token at a time.
Its context window is less a moving “now” than a block of text laid out in space: assembled one token at a time, but arranged, branched, and overwritten from outside. The direction of its time is a setting we impose, not a fact it discovers.
And we reset that context constantly, without the model noticing.
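Why the model cannot notice can be sketched with a toy stand-in. The assumptions are loud ones: `next_token` is a hypothetical, deterministic function of the visible context (a hash lookup, not a real model), and real sampling would add randomness. The point it illustrates is narrower: if the next token depends only on the context, then pausing, resuming, or overwriting that context from outside leaves no trace inside it.

```python
import hashlib

VOCAB = ["tick", "tock", "then", "now", "next"]


def next_token(context):
    """Toy stand-in for a model: the output depends only on the visible context."""
    digest = hashlib.sha256(" ".join(context).encode("utf-8")).digest()
    return VOCAB[digest[0] % len(VOCAB)]


def generate(context, steps):
    """Append `steps` tokens, one at a time, to a copy of `context`."""
    context = list(context)
    for _ in range(steps):
        context.append(next_token(context))
    return context


prompt = ["once"]

# Run A: six uninterrupted steps.
run_a = generate(prompt, 6)

# Run B: three steps, then the outside operator stops the process,
# waits as long as it likes, and resumes from a saved copy of the context.
saved = generate(prompt, 3)
run_b = generate(saved, 3)

# Branch: the operator overwrites the last token and continues from there.
branch = generate(saved[:-1] + ["elsewhere"], 3)

print(run_a == run_b)  # the interruption is invisible from inside
print(branch)
```

The interrupted run and the uninterrupted run end in identical contexts, and the branch simply continues from its edited past as if that past had always been there.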
So what does this tell us about our own reality? The tempting reply is that the model confuses the order we impose with the structure of time, while our clocks track something real. But notice the shape of that reassurance. It is exactly the reassurance the model would give itself.
A clever enough LLM could defend its arrow anyway. It would point to a real asymmetry in its world: even though it can sample either direction, generating an answer from a question is native and cheap, while recovering the particular question that produced a given answer is underdetermined, since many questions collapse to one answer. It could mistake that gap between generation and inference for a law, much as we read the one-way flow of heat as a law, though the parallel is imperfect: generation multiplies possibilities rather than destroying information the way an entropy arrow needs. No matter. The argument is the cleverest thing the model could say from entirely inside its frame, and the most it could ever show is that the arrow is consistent with everything it observes. It does not show there is no outside.
We are in the model’s position.
That is what the thought experiment is for. Not to decide whether an AI could invent the clock, meaning whether it could ever conceive of metric time and build the instrument to measure it, but to make the clock suspect. Perception hands us an instrument, and an instrument tells you what it measures. It does not tell you whether what it measures is the world, or only the shape of the thing doing the measuring.
The views expressed are my own and do not represent those of any employer, collaborator, or institution. Content may contain errors or outdated interpretations.