The Long Contract

Vrothjelka's weather forecasts were running smoothly. The stone-equipped network predicted monsoons, droughts, and storms with enough lead time for the agricultural ministry to prepare. Trviksha moved on to the next customer.

The Trader

Phlontjek was a senior contract arbitrator for the Sonhlagot Trading Federation. He dealt in commodity contracts — agreements between parties for the delivery of goods under specified conditions. The contracts were written in formal legal Sonhlagoti, a dialect that was verbose, precise, and full of cross-references.

Phlontjek: My problem is understanding contracts. A typical grain futures contract runs to a hundred and fifty clauses. Clause 112 might say: "Subject to the conditions established in Clause 3 and modified by Clause 47, the delivery timeline shall be..." I need a system that can read a contract and answer questions about it.

Trviksha: I can encode the contract as a sequence and process it with a recurrent network.

Phlontjek: Try it.

She tokenised the contracts, breaking the formal Sonhlagoti text into sub-word units, and fed them into a stone-equipped recurrent network. The sequences were long: a contract of a hundred and fifty clauses averaged about two thousand tokens.

The network processed the contract token by token, from start to finish, building up its hidden state and stone memory. By the end of the contract, the final hidden state was supposed to summarise the entire document.

The Failure

On short contracts — twenty to thirty clauses — the system answered questions reasonably well. On long contracts, it failed systematically.

Phlontjek: I asked: "What is the delivery deadline under the standard terms?" The answer is in Clause 3. Your system gave a wrong date — it confused Clause 3's deadline with Clause 89's amendment. It seems to remember the recent clauses but not the early ones.

Trviksha: The stones help, but two thousand tokens is too many. By the time the network reaches token 2,000, the information from token 50 has been compressed through nearly two thousand steps of gating. Even with the memory highway, the forget gate occasionally erases information, and over such a long sequence, early clauses fade.
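
A back-of-the-envelope sketch of that fading, under a strong simplifying assumption: every step multiplies the stored value by the same forget-gate value, and nothing is written back. Real gates vary from token to token, so the function name `retained_fraction` and the gate values below are illustrative, not measurements from Trviksha's network.

```python
def retained_fraction(forget_gate: float, steps: int) -> float:
    """Fraction of an early memory value left after `steps` multiplications
    by a constant forget gate (assumption: the gate never changes)."""
    return forget_gate ** steps


# Distance from token 50 to token 2,000, as in Trviksha's example.
steps = 2000 - 50

for gate in (0.9, 0.99, 0.999):
    fraction = retained_fraction(gate, steps)
    print(f"gate={gate}: {fraction:.3g} of token 50's signal remains")
```

Under this assumption, even a gate that keeps 99.9 percent of the memory at each step leaves only about 14 percent of token 50's signal by token 2,000, and a gate of 0.99 leaves practically nothing.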

She ran diagnostics. For the hundred-and-fifty-clause contract, she measured how well the network could recall information from each clause position. The results followed a predictable curve: clauses within the last twenty were recalled accurately. Clauses between twenty and fifty positions back were recalled with decreasing accuracy. Clauses more than fifty positions back were essentially lost.

A long scroll of contract clauses stretching from left to right. A velociraptor stands at the right end (Clause 150), looking backward. The recent clauses (130-150) are bright and clear. The middle clauses (50-100) are faded. The early clauses (1-20) are barely visible, with Clause 3 nearly illegible. An arrow from Clause 112 points all the way back to Clause 3, crossing the faded zone.

The Sequential Bottleneck

Blortz: The fundamental issue is not the memory mechanism. It is the sequential processing itself.

Trviksha: What do you mean?

Blortz: The network reads the contract one token at a time, from left to right. Token 1, then token 2, then token 3. Information from token 1 must survive through tokens 2, 3, 4, all the way to token 2,000. Even with the stones, the information passes through two thousand sequential steps. Each step is a chance for the information to be compressed, overwritten, or forgotten.

Trviksha: The stones help — they provide a direct path for information to flow across steps. But yes, the information still has to pass through each intermediate step. There is no shortcut from token 50 to token 2,000.

Blortz: Compare this to how Phlontjek himself reads a contract. When he reaches Clause 112 and it says "subject to Clause 3," he does not try to remember what Clause 3 said. He flips back to Clause 3 and reads it again.

Phlontjek: Exactly. I go directly to the relevant clause. I do not read the entire contract again to reach it. I look up what I need.

Trviksha: The network cannot do that. It processes tokens in order. It cannot skip backward. It cannot "look up" an earlier position. Everything it knows about earlier tokens is compressed into the current hidden state and stone — and over two thousand tokens, that compression is lossy.
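
The lossy compression Trviksha describes can be seen in a toy recurrence. This is a minimal sketch, not her stone-equipped network: it has no gates and no stones, the weights (0.5 on the old state, 0.1 on the input) are hypothetical, and the token ids are made up. The small state weight exaggerates the fade on purpose. The sketch changes one token early in a 2,000-token sequence, then one token at the very end, and compares how far each change moves the final state.

```python
import math


def final_state(tokens, state_size=4):
    """Run a toy recurrence over integer token ids and return the last state."""
    state = [0.0] * state_size
    for token in tokens:
        # Hypothetical fixed weights: 0.5 on the old state, 0.1 on the input.
        state = [
            math.tanh(0.5 * s + 0.1 * ((token + i) % 7 - 3))
            for i, s in enumerate(state)
        ]
    return state


def max_difference(a, b):
    """Largest absolute difference between two states of equal size."""
    return max(abs(x - y) for x, y in zip(a, b))


base = [(13 * k) % 101 for k in range(2000)]  # made-up token ids

early_edit = list(base)
early_edit[2] += 1   # change the third token

late_edit = list(base)
late_edit[-1] += 1   # change the final token

reference = final_state(base)
print("early edit moves final state by", max_difference(reference, final_state(early_edit)))
print("late edit moves final state by ", max_difference(reference, final_state(late_edit)))
```

The final state has the same four numbers whether the sequence holds twenty tokens or two thousand, so every earlier token has to share that space. In this toy the early edit leaves essentially no trace in the final state, while the late edit is plainly visible. Gated memory slows that fade, but the chain of intermediate steps remains.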

The Question

Trviksha: What if the network could look directly at any previous position, without passing through all the intermediate ones?

Glagalbagal: You mean skip the sequence?

Trviksha: I mean: when processing token 2,000, instead of relying on a compressed summary of the entire past, the network could reach back and examine any earlier token directly. Token 50. Token 3. Token 1,847. Whatever is relevant.

Blortz: That would require the network to somehow know which earlier tokens are relevant before looking at them.

Trviksha: That is the hard part. Phlontjek knows to look at Clause 3 because Clause 112 explicitly says "subject to Clause 3." But in general, the network would need to figure out which earlier positions are relevant to the current position — without being told.

Phlontjek: A lookup. Each position should be able to query every other position and receive the relevant information. Like a clerk in my office who can check any previous document, not just the one on top of the pile.

Trviksha: A lookup across the entire sequence. That is what I need to build.
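
The easy case, where a clause names the clauses it depends on, can be sketched as a direct lookup. This covers only Phlontjek's own method of following explicit cross-references; it does not address the hard part Trviksha raises, which is deciding relevance when nothing is stated. The wording of Clauses 3 and 47 below is invented for illustration, and `referenced_clauses` is a hypothetical helper name.

```python
import re

# Hypothetical excerpt: Clause 112 quotes the example from the story,
# the other two clause texts are invented for illustration.
contract = {
    3: "Delivery shall occur within ninety days of signing.",
    47: "The deadline in Clause 3 is extended by ten days during monsoon season.",
    112: "Subject to the conditions established in Clause 3 and modified by "
         "Clause 47, the delivery timeline shall be...",
}

CLAUSE_REF = re.compile(r"Clause (\d+)")


def referenced_clauses(clause_number, clauses):
    """Return {number: text} for every clause explicitly cited by the given clause."""
    cited = [int(n) for n in CLAUSE_REF.findall(clauses[clause_number])]
    return {n: clauses[n] for n in cited if n in clauses}


for number, text in referenced_clauses(112, contract).items():
    print(f"Clause {number}: {text}")
```

Each lookup is a single step regardless of how far apart the clauses are: reaching Clause 3 from Clause 112 costs the same as reaching Clause 111.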

Trviksha has identified the fundamental limitation of sequential processing: information must travel through every intermediate step to get from one position to another. Even with gated memory (LSTMs), very long sequences cause early information to degrade, not because the memory mechanism is bad, but because the sequential architecture forces information to travel through many steps. The human solution to this problem, flipping back to the relevant page, is a direct lookup that skips all intermediate positions. Building a system that can perform such direct lookups across arbitrary positions in a sequence is the challenge that motivated the next major breakthrough in neural network architecture.

When you search for a specific fact in a textbook, you do not re-read the entire book from page one. You use the index, or you flip to the chapter you remember. What makes this possible is that each page exists independently: you can access any page without going through the ones before it. Can the network do the same?