# Known Failure Modes

Base Layer is designed to be wrong in useful ways. Every system has failure modes — we document ours publicly so you know what to expect and what we've done about it.

## 1. Topic Skew

**What happens:** A subject who mentions trading 50 times gets a brief that reads like a trading profile, even if trading is 5% of who they are. The system over-weights high-recurrence topics at the expense of cross-domain behavioral patterns.

**How we caught it:** Session 99 prompt ablation: 10 conditions, 2 subjects with known topic skew. The control-condition brief mentioned prediction markets 9 times.

**The fix:** A 73-word domain-agnostic guard added to all authoring prompts: "How someone reasons IS identity. What they reason ABOUT is not." Topic mentions dropped from 9 to 0. The model already knew the difference — the prompt just needed to ask.
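
For illustration, here is roughly how the guard gets applied. This is a minimal sketch: only the quoted sentence of the guard is reproduced, and the function names are illustrative, not the production API.

```python
# Minimal sketch of applying the domain-agnostic guard. Only the quoted
# sentence is from the source; the full 73-word guard is not reproduced here,
# and the function names are illustrative.
DOMAIN_GUARD = (
    "How someone reasons IS identity. What they reason ABOUT is not."
    # ...remainder of the 73-word guard omitted
)

def build_authoring_prompt(layer_instructions: str, facts: list[str]) -> str:
    """Prepend the guard to every authoring prompt before the fact list."""
    fact_block = "\n".join(f"- {fact}" for fact in facts)
    return f"{DOMAIN_GUARD}\n\n{layer_instructions}\n\nFacts:\n{fact_block}"
```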

**Status:** Fixed. Guard is production default since S99.

## 2. Sycophancy Amplification

**What happens:** An identity model makes AI agree MORE with the user, not understand them better. "I know you value directness" becomes a mechanism for telling you what you want to hear.

**Evidence:** Jain et al. (ICLR 2025, CAUSM study) found that condensed user profiles had the GREATEST impact on sycophancy — more than conversation history or role framing. Every identity model carries this risk.

**The fix:** Architectural, not advisory:
- "Operating guide" framing positions the model as an adviser, not a persona
- False-positive warnings on predictions explicitly tell the AI when NOT to apply a pattern (a sketch of the shape follows this list)
- Falsification-validated axioms are built by searching for counter-evidence, not accumulating confirmation
- Domain-agnostic guard prevents validating topic positions as identity
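
To make the false-positive warning concrete, here is a hypothetical shape for a single prediction. The field names and example content are illustrative assumptions; only the concept — telling the AI when not to apply a pattern — comes from the text above.

```python
# Hypothetical shape of a prediction carrying a false-positive warning.
# Field names and example content are illustrative assumptions.
prediction = {
    "claim": "Contradictions should be named, not smoothed over.",
    "supporting_fact_ids": ["f_0182", "f_0417"],
    "false_positive_warning": (
        "Do not apply to casual or exploratory conversation; naming every "
        "tension there reads as point-scoring, not directness."
    ),
}
```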

**Status:** Mitigated. Ongoing research area. The distinction: "I know you value directness, so here's what you want to hear" is sycophancy. "This person's coherence demand means contradictions should be named, not smoothed over" is personalization.

## 3. Thin Data Overconfidence

**What happens:** 8 journal entries produce a model that sounds authoritative but is built on sparse evidence. The brief doesn't signal its own uncertainty. The system uses the same confident tone regardless of how much data it has.

**How we caught it:** Observed across multiple thin-corpus subjects. Models from 8 entries and models from 600,000+ words used the same confident tone.

**The nuance:** This is highly dependent on information density, not just volume. 10 deeply reflective journal entries where someone is genuinely processing their thinking can outperform 200 surface-level blog posts that describe events without revealing reasoning. The system extracts behavioral patterns — if the source text doesn't contain behavioral signal, more text doesn't help. Quality of self-expression matters more than quantity.

**The fix:** THIN DATA flag in output. Anchor count correlates with corpus depth. The system now signals when evidence is sparse. But the tone problem persists — the model sounds equally confident whether it's built on 200 facts or 2,000.
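
A minimal sketch of how the flag might be raised follows. The thresholds and the function shape are assumptions, not the production criteria.

```python
# Hypothetical thin-data check. The thresholds are illustrative assumptions;
# the production criteria (anchor count, corpus depth) are only sketched here.
THIN_CORPUS_WORDS = 20_000
THIN_ANCHOR_COUNT = 10

def thin_data_flag(corpus_word_count: int, anchor_count: int) -> str | None:
    """Return a warning to embed in the composed brief, or None if evidence is deep enough."""
    if corpus_word_count < THIN_CORPUS_WORDS or anchor_count < THIN_ANCHOR_COUNT:
        return ("THIN DATA: this model is built on sparse evidence; "
                "treat predictions as provisional.")
    return None
```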

**Status:** Partially fixed. Thin data flag exists. Tone calibration is ongoing. Research question: should confidence be expressed proportionally to evidence depth?

## 4. Cognitive Anchoring in Re-authoring

**What happens:** When the system sees its own prior output during re-generation, it copies 70-75% of existing text instead of re-deriving from facts. Zero genuinely new behavioral predictions were created after 7 generations. Coverage of the identity-tier fact base stagnated at 3-4%. The system was editing inherited text rather than synthesizing from data.

**How we caught it:** The founder noticed that the format and specific phrasings were staying the same across regenerations — not converging on the same ideas independently, but literally inheriting the same words. We ran a text inheritance test to distinguish between convergence (the model independently arriving at the same conclusions) and inheritance (the model copying prior output). The result was clear: 70-75% text inheritance. This was not the model agreeing with itself — it was the model copying itself.
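
The exact metric isn't reproduced here, but a trigram-overlap check captures the idea; a sketch, with the metric choice as an assumption:

```python
# Sketch of an inheritance measurement using word trigrams. The metric the
# real test used is not reproduced here; this only illustrates how inheritance
# (shared phrasing) is distinguished from convergence (shared conclusions).
def trigrams(text: str) -> set[tuple[str, ...]]:
    words = text.lower().split()
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}

def inheritance_ratio(new_output: str, prior_output: str) -> float:
    """Fraction of the new output's trigrams that also appear verbatim in the prior output."""
    new_t, prior_t = trigrams(new_output), trigrams(prior_output)
    return len(new_t & prior_t) / max(len(new_t), 1)
```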

**The mechanism:** Cognitive anchoring. When a prior generation is available, it acts as a gravitational center. The model's effort goes toward editing the prior text rather than re-deriving from data. This produces output that is accurate (the inherited text was correct) but narrow (it covers a tiny fraction of available knowledge). Accuracy without coverage is a failure mode the system must actively prevent.

**The fix:** Blind authoring (D-040, D-053). The model never sees prior output when generating new layers. Every generation is derived fresh from facts. Automated validation gates enforce: overlap < 25%, coverage > 5%, novel claims > 0. This costs more (can't incrementally improve) but eliminates the anchoring trap entirely.
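
The gate thresholds in the sketch below are the ones stated above; the function shape and input names are illustrative.

```python
# Gate check for a blind-authored layer. The thresholds (overlap < 25%,
# coverage > 5%, novel claims > 0) are the stated gates; everything else
# is an illustrative assumption.
def passes_blind_authoring_gates(overlap_with_prior: float,
                                 fact_coverage: float,
                                 novel_claim_count: int) -> bool:
    return (overlap_with_prior < 0.25
            and fact_coverage > 0.05
            and novel_claim_count > 0)
```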

**Contamination is transitive:** If Document A was derived from a prior identity block, and Document B summarizes Document A, then Document B carries the prior block's phrasings indirectly. The exclusion applies not only to prior blocks but to any document that absorbed their content.
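
A sketch of how the transitive exclusion set can be computed over a "derived from" graph; the graph representation is an assumption, not the production data model.

```python
# Sketch: propagate contamination through a "derived from" graph so any
# document downstream of a prior identity block is also excluded from the
# authoring context.
from collections import deque

def contaminated_documents(derived_from: dict[str, list[str]],
                           prior_blocks: set[str]) -> set[str]:
    """Return every document that directly or transitively absorbed a prior block."""
    excluded, queue = set(prior_blocks), deque(prior_blocks)
    while queue:
        source = queue.popleft()
        for doc, parents in derived_from.items():
            if source in parents and doc not in excluded:
                excluded.add(doc)
                queue.append(doc)
    return excluded
```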

**Status:** Fixed. Blind authoring is mandatory.

## 5. Pronoun Errors

**What happens:** The compose step generates he/him or she/her for subjects whose gender it guesses incorrectly from text patterns.

**How we caught it:** Spotted during email preparation for outreach. Two male subjects were modeled with she/her pronouns.

**Current fix:** D-092 mandates they/them pronouns in all composed briefs. Manual review catches edge cases. Entity map allows explicit pronoun specification per subject.
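
An illustrative entity-map entry is sketched below. The they/them default is D-092; the structure and field names are assumptions, not the production schema.

```python
# Illustrative entity-map entry with an explicit per-subject pronoun override.
ENTITY_MAP = {
    "subject_042": {
        "display_name": "A. Example",
        "pronouns": "they/them",  # explicit override; absence falls back to they/them
    },
}

def pronouns_for(subject_id: str) -> str:
    return ENTITY_MAP.get(subject_id, {}).get("pronouns", "they/them")
```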

**Open question:** This is not fully resolved. How do gendered vs neutral pronouns affect downstream model response quality? Does an AI respond differently when the behavioral specification uses "they" vs "he" vs "she"? Does pronoun choice interact with the sycophancy risk? We don't have data on this yet. More research needed before we can call this truly fixed — the current approach avoids the error but may not be optimal.

**Status:** Mitigated, not solved. They/them is the default. The deeper question of how pronouns shape AI behavior with identity models is open.

## 6. Extraction Positional Bias

**What happens:** Long documents have facts extracted primarily from the first third. Content in the middle and end is systematically under-represented.

**Context:** This is a well-known limitation of LLMs broadly — attention and extraction quality degrade with position in long contexts. It's not unique to Base Layer, but it matters more here because positional bias in extraction means entire sections of someone's life or thinking get silently dropped.

**How we caught it:** S97 investigation. Fact distribution was heavily front-loaded across multiple long-document subjects.

**The fix:** Automatic chunking on paragraph boundaries with 500-character overlap. Dual-tier extraction caps that scale with document length. Documents over 200K characters get up to 600 facts extracted. Each chunk gets its own extraction pass, eliminating the positional advantage of content appearing first.
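
A minimal chunking sketch follows. The 500-character overlap is the stated value; the target chunk size here is an illustrative assumption.

```python
# Paragraph-boundary chunking with a 500-character overlap between chunks.
def chunk_document(text: str, target_size: int = 8_000, overlap: int = 500) -> list[str]:
    paragraphs = text.split("\n\n")
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        if current and len(current) + len(para) > target_size:
            chunks.append(current)
            current = current[-overlap:]  # carry the tail into the next chunk
        current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```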

**Status:** Fixed. Chunking is production default.

## 7. Ceremonial Pipeline Steps

**What happens:** Steps that feel important but don't improve output quality. The original pipeline had 14 steps: Import, Extract, Embed, Score, Classify, Tier, Contradiction Detection, Consolidation, Anchors Extraction, Author Anchors, Author Core, Author Predictions, Collective Review, Compose.

**What led to the ablation:** As the pipeline grew, we noticed diminishing returns — more steps weren't producing better output, just more complexity. The question was: which steps are load-bearing and which are ceremony? The only way to know was to test systematically.

**How we tested:** S79 14-condition ablation on the Benjamin Franklin corpus. Each condition removed or modified specific steps. The result was definitive: the simplified 4-step pipeline (Import → Extract → Author → Compose) scored 87/100. The full 14-step pipeline scored 83/100. More steps actively made things worse.

**What we cut and why:**
- Scoring, Classification, Tiering — facts don't need quality scores; the authoring model can judge relevance directly from the facts themselves
- Contradiction detection — valuable in theory but not load-bearing at authoring time
- Consolidation — merging similar facts before authoring didn't improve output
- Collective Review (Opus reviewing Sonnet's work) — proved ceremonial; the 3-layer structure itself was the quality mechanism, not a review pass after the fact
- Separate Anchors extraction — the authoring prompt handles this directly

**What we added back:** Embed (Step 3) was re-added in S100 specifically for provenance tracing — linking each claim in the identity model back to the source facts that support it. Without embedding, there's no vector similarity search and provenance becomes impossible. See Failure Mode #8 below.

**Status:** Fixed. Simplified pipeline is production default. Every surviving step earned its place through ablation.

## 8. Provenance Gap (Traceability Failure)

**What happens:** Without vector embeddings, the system can't trace claims in the identity model back to the specific facts that support them. The output looks authoritative but you can't verify WHY it says what it says. This defeats the entire "every conclusion is traceable" promise.

**How we caught it:** After the S79 ablation removed the Embed step (along with 9 other steps), we realized provenance tracing no longer worked. You could see the identity model, and you could see the facts, but there was no link between them. That's a fundamental integrity problem for a system built on inspectability.

**The fix:** Re-added the Embed step (S100). MiniLM-L6-v2 embeddings stored in ChromaDB. Provenance is now generated at authoring time — each claim gets linked to its top supporting facts via vector similarity. The pipeline went from 14 → 4 → 5 steps because traceability is load-bearing.
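
A minimal sketch of provenance lookup, assuming ChromaDB's default embedding function (all-MiniLM-L6-v2, the model named above); the collection name and helper functions are illustrative, not the production code.

```python
# Sketch of provenance lookup: index extracted facts, then link each authored
# claim to its top supporting facts via vector similarity.
import chromadb

client = chromadb.Client()  # in-memory for the sketch; production would persist
facts = client.get_or_create_collection("facts")

def index_facts(fact_rows: list[tuple[str, str]]) -> None:
    """fact_rows: (fact_id, fact_text) pairs produced by the Extract step."""
    facts.add(ids=[fid for fid, _ in fact_rows],
              documents=[text for _, text in fact_rows])

def provenance_for(claim: str, k: int = 5) -> list[str]:
    """Return the IDs of the top-k facts most similar to a claim in the brief."""
    result = facts.query(query_texts=[claim], n_results=k)
    return result["ids"][0]
```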

**Status:** Fixed. Embed is back in the pipeline. Provenance tracing is operational.

## Philosophy

We publish failure modes because:

1. Every system has them. Hiding them doesn't make them go away.
2. Failure modes are identity information (Principle 5, Principle 15).
3. If you can't see where it breaks, you can't trust where it holds.
4. The corrections you make when the model is wrong are the highest-quality data in the system.

The goal isn't a perfect model. The goal is a model that's wrong in ways you can see, verify, and fix.
