The Problem
Canonical Decay
How a generation of generative models is, with each iteration, quietly rewriting the storyworlds it was asked to remember.
Every storyworld worth protecting eventually meets the same fate. It is adapted across formats. It is filtered through dozens of creative teams. It is ingested as training data alongside its own derivatives — the reviews, the wikis, the fan continuations, the algorithmic slop. The original signal is buried beneath its own echoes.
We call this canonical decay: the progressive divergence between a large language model's (LLM's) representation of a storyworld and the author's original text. It is not a bug to be patched in the next release. It is a structural consequence of how language models work, and it is accelerating with every new model generation trained on the synthetic output of the last.
The Conflux Crisis
LLMs do not read a book in the human sense. They do not build a mental model of characters, rules, and timelines. They learn the statistical properties of language, i.e., which words tend to appear near which other words, and from that statistical cloud they generate fluent, confident, and frequently false answers.
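To make "statistical properties of language" concrete, here is a deliberately tiny sketch of distributional semantics: meaning reduced to co-occurrence counts inside a context window. The corpus and window size are invented for illustration and bear no relation to any real model's training data.

```python
# Toy co-occurrence counting: the core signal distributional models learn from.
from collections import Counter

corpus = "the guard opened the gate . the guard wore grey".split()
window = 2  # count pairs of tokens that appear within 2 positions of each other

cooc = Counter()
for i, w in enumerate(corpus):
    for j in range(i + 1, min(i + 1 + window, len(corpus))):
        cooc[frozenset((w, corpus[j]))] += 1

# "guard" is now statistically tied to whatever happened to appear near it,
# with no record of WHICH guard, or whether the pairing is canonical.
print(cooc[frozenset(("the", "guard"))])
```

The point of the sketch: nothing in this representation distinguishes a canonical pairing from an incidental one. Frequency stands in for truth.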
The result is entity conflation: distinct things that share a name collapse into a single fuzzy concept inside the model. A character named “The Guard” and the generic role of “the guard” become statistically indistinguishable. The model fuses their attributes into plausible falsehoods — dressing the character in a uniform he never wore, attributing deeds to him he never did. The output is not a hallucination in the loose sense. It is knowledge fusion: confident synthesis from a smeared, entangled representation.
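The conflation above can be shown mechanically. In this hypothetical sketch (all entity names and attributes are invented), facts keyed by surface name collapse two distinct entities into one fused record, while facts keyed by a unique entity ID stay apart:

```python
# Entity conflation in miniature: name-keyed vs ID-keyed fact stores.
from collections import defaultdict

facts = [
    # (entity_id, surface_name, relation, value) -- all illustrative
    ("char:the_guard", "The Guard", "wears", "a grey travelling cloak"),
    ("role:guard",     "the guard", "wears", "a city-watch uniform"),
]

# Name-keyed store: roughly how a statistical model "sees" the text.
by_name = defaultdict(set)
for _id, name, rel, val in facts:
    by_name[name.lower()].add(val)

# ID-keyed store: disambiguate before describing.
by_id = defaultdict(set)
for _id, name, rel, val in facts:
    by_id[_id].add(val)

print(by_name["the guard"])       # both outfits fused onto one "the guard"
print(by_id["char:the_guard"])    # only the canonical cloak survives
```

Under the name-keyed view, asking what "the guard" wears returns both attributes at once, which is exactly the smeared representation from which knowledge fusion draws its confident falsehoods.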
The compounding loop
Each new generation of LLM is trained on a corpus that increasingly contains the synthetic output of the last. The derivative becomes the data. A canonical error introduced by Generation 1 enters Generation 2 not as noise to be ignored, but as a statistically valid pattern to be reinforced. With each cycle, the model's connection to authorial ground truth weakens. The distinction between the original, human-authored signal and the algorithmic noise is lost.
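The compounding loop can be caricatured with a one-line recurrence. This is a toy model, not a measurement: the parameter names (`synthetic_share`, `error_rate`) and their values are assumptions chosen purely to illustrate the direction of the effect.

```python
# Toy recurrence for fidelity to the human-authored corpus across generations.
def ground_truth_fidelity(generations: int,
                          synthetic_share: float = 0.5,
                          error_rate: float = 0.1) -> float:
    fidelity = 1.0  # generation 0 trains on the human-authored corpus only
    for _ in range(generations):
        # The human-authored fraction stays faithful; the synthetic fraction
        # inherits the previous model's fidelity, minus fresh errors.
        fidelity = ((1 - synthetic_share) * 1.0
                    + synthetic_share * fidelity * (1 - error_rate))
    return fidelity

for g in range(5):
    print(g, round(ground_truth_fidelity(g), 4))
```

Even in this crude model, fidelity never recovers: it declines monotonically toward a floor strictly below 1.0, because every cycle launders some of the previous generation's errors back in as "valid" training signal.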
For an IP holder, the implication is stark. To rely on a general-purpose model for canon management is not merely to use a flawed tool. It is to tie an heirloom to an engine that is designed to forget.
The architectural answer
The failures of generalist models are not temporary. They are the inescapable legacy of a design philosophy — distributional semantics — that learns meaning from co-occurrence and rewards plausibility over verifiability. No prompt-engineering, fine-tuning, or retrieval scheme repairs this at the root. The model was never built to know. It was built to approximate.
A different architecture is required: one that disambiguates before describing, structures before generating, and verifies before presenting. One in which every canonical fact carries a citation to its source. One in which the human author, IP holder, and/or designated Core Lore Team holds final authority, and in which the canon is versioned the way software engineers version source code.
That architecture is the Canon Crystal.
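As a purely illustrative sketch of the principles above (this is not the actual Canon Crystal design; every name, field, and citation here is invented), a fact-with-citation record and a verify-before-presenting gate might look like:

```python
# Hypothetical sketch: every canonical fact carries a citation and a version,
# and nothing without a source is ever emitted.
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass(frozen=True)
class CanonFact:
    entity_id: str   # disambiguated entity ID, never a bare surface name
    predicate: str
    value: str
    citation: str    # pointer into the author's original text
    version: str     # canon revision, versioned like source code

def verified(facts: Iterable[CanonFact]) -> Iterator[CanonFact]:
    """Yield only facts that can be traced back to a source citation."""
    for f in facts:
        if f.citation:
            yield f

facts = [
    CanonFact("char:the_guard", "wears", "a grey travelling cloak",
              "ch. 3, p. 41", "v1.2.0"),
    CanonFact("char:the_guard", "wears", "a city-watch uniform",
              "", "v1.2.0"),  # unsourced: a fused falsehood, dropped at the gate
]

print([f.value for f in verified(facts)])
```

The design choice the sketch encodes is the inversion of the generalist model's default: plausibility alone grants a statement nothing, and only traceability to the authored text lets it through.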