# Base Layer: Behavioral Compression Research

# Pipeline Ablation Study

**Date:** March 2026
**Authors:** Aarik Gulaya
**Cost:** ~$16 total

---

## Abstract

We conducted a systematic ablation study to determine which steps in a 14-step behavioral compression pipeline are load-bearing. Using Benjamin Franklin's autobiography as the test corpus, we evaluated 14 conditions by selectively removing pipeline stages and measuring output quality. The result was decisive: a 4-step pipeline (Import, Extract, Author, Compose) scored 87/100, outperforming the full 14-step pipeline at 83/100. Ten of fourteen steps, including scoring, classification, tiering, contradiction detection, consolidation, anchor extraction, embedding, and collective review, are ceremonial.

## Methodology

The full pipeline consists of 14 sequential steps: Import, Extract, Embed, Score, Classify, Tier, Contradictions, Consolidate, Anchors, Author Layers, Collective Review, Compose, Assemble, and Serve. We designed 14 experimental conditions: the full-pipeline baseline (C0) and 13 variants (C1–C13), each of which removes one or more steps or changes the authoring structure while holding input data constant. All conditions used the same Franklin autobiography corpus. Output briefs were evaluated on a 100-point scale assessing behavioral fidelity, predictive utility, and structural coherence.

Key conditions:
- **C0:** Full 14-step pipeline (baseline)
- **C1–C7:** Drop individual intermediate steps (scoring, classification, tiering, contradictions, consolidation, anchors, embedding)
- **C8–C10:** Skip authoring stages
- **C11:** Author 3 layers + Compose, no collective review
- **C12:** Direct fact injection (raw facts into compose, no authoring)
- **C13:** Single-layer authoring (no 3-layer architecture)
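To make the step-removal design concrete, the sketch below encodes the baseline and the single-drop conditions as ordered step lists. It is a minimal illustration, not the study's harness: `FULL_PIPELINE`, `SINGLE_DROPS`, `drop`, and `build_conditions` are hypothetical names, and the mapping of C1–C7 to steps assumes the conditions are numbered in the order the steps are listed above. C8–C10, C12, and C13 are left out because their exact step lists are not specified.

```python
# Hypothetical encoding of the ablation conditions as ordered step lists.
FULL_PIPELINE = [
    "Import", "Extract", "Embed", "Score", "Classify", "Tier",
    "Contradictions", "Consolidate", "Anchors", "Author Layers",
    "Collective Review", "Compose", "Assemble", "Serve",
]

# The seven intermediate steps dropped one at a time in C1-C7.
# Assumption: C1..C7 follow the order in which the steps are listed in the text.
SINGLE_DROPS = ["Score", "Classify", "Tier", "Contradictions",
                "Consolidate", "Anchors", "Embed"]


def drop(steps, removed):
    """Return a copy of `steps` without any step in `removed`, preserving order."""
    removed = set(removed)
    return [s for s in steps if s not in removed]


def build_conditions():
    """Build the condition -> step-list mapping for C0, C1-C7, and C11."""
    conditions = {"C0": list(FULL_PIPELINE)}
    for i, step in enumerate(SINGLE_DROPS, start=1):
        conditions[f"C{i}"] = drop(FULL_PIPELINE, [step])
    # C11: the reduced 4-step pipeline reported in the Abstract.
    conditions["C11"] = ["Import", "Extract", "Author Layers", "Compose"]
    return conditions


if __name__ == "__main__":
    for name, steps in build_conditions().items():
        print(f"{name}: {len(steps)} steps")
```

Representing each condition as an explicit step list keeps every input identical across runs, so a score difference can only come from the steps that were removed.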

## Results

| Condition | Description | Score (out of 100) |
|-----------|-------------|-------|
| C0 | Full 14-step pipeline | 83 |
| C1–C7 | Drop individual steps | 81–83 |
| C8–C10 | Skip authoring | 77–80 |
| C11 | Author + Compose, no review | **87** |
| C12 | Direct fact injection | 77 |
| C13 | Single-layer authoring | 83 |
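The findings below are differences against the C0 baseline. A minimal sketch of that arithmetic follows, with scores copied from the table and the C1–C7 and C8–C10 ranges kept as (low, high) pairs. `SCORES`, `delta_vs_baseline`, and `lowest` are illustrative names, not part of the study's tooling.

```python
# Scores copied from the Results table; ranged rows are stored as (low, high).
SCORES = {
    "C0": (83, 83),
    "C1-C7": (81, 83),
    "C8-C10": (77, 80),
    "C11": (87, 87),
    "C12": (77, 77),
    "C13": (83, 83),
}


def delta_vs_baseline(scores, baseline="C0"):
    """Return each condition's (low, high) score minus the baseline score."""
    base = scores[baseline][0]
    return {name: (lo - base, hi - base)
            for name, (lo, hi) in scores.items() if name != baseline}


def lowest(scores):
    """Return every condition whose low bound equals the minimum observed score."""
    floor = min(lo for lo, _ in scores.values())
    return [name for name, (lo, _) in scores.items() if lo == floor]


if __name__ == "__main__":
    for name, (lo, hi) in delta_vs_baseline(SCORES).items():
        print(f"{name}: {lo:+d}" if lo == hi else f"{name}: {lo:+d} to {hi:+d}")
    print("lowest:", lowest(SCORES))
```

Run as-is, this gives C11 at +4, C13 at 0, C1–C7 between -2 and 0, and C8–C10 and C12 both reaching -6.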

## Key Findings

1. **The 3-layer architecture is load-bearing.** C11 (3 layers, 87) vs. C13 (single layer, 83) indicates that the Anchors/Core/Predictions decomposition adds real signal, a 4-point gain on this rubric. This is the structural contribution of the pipeline.

2. **Collective review is ceremonial.** C11, which omits the multi-model review process along with the intermediate enrichment steps, scored 87 against the full pipeline's 83. Our interpretation is that the review step introduced conservatism that dampened useful signal.

3. **Intermediate enrichment steps are ceremonial.** Scoring, classification, tiering, contradiction detection, consolidation, anchor extraction, and embedding, seven distinct processing stages, produced no meaningful benefit: removing any one of them individually moved the score by at most 2 points (C1–C7: 81–83 vs. baseline 83).

4. **Raw facts without synthesis lose signal.** C12 (direct fact injection, 77) tied for the lowest score, matching the bottom of the C8–C10 range. The authoring step, where facts are synthesized into behavioral patterns, is where compression happens. Facts alone are not a brief.

5. **The pipeline is really 4 steps, not 14.** Import, Extract, Author (3 layers), Compose. Everything else was scaffolding needed to discover this. *(Update, March 31 2026: the pipeline was expanded to 5 steps with the addition of an Embed step for traceability. The Embed step does not improve quality; it exists purely so that every claim can be traced back to its source facts. The need for it surfaced during S100, when provenance traces were found to be completely broken without vector embeddings.)*
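The update to finding 5 says the Embed step exists only so that claims can be traced back to source facts. The sketch below shows one way such tracing could work, assuming each authored claim is matched to its nearest facts by cosine similarity. It is an illustration under stated assumptions, not the pipeline's implementation: `embed` is a toy bag-of-words counter standing in for a real embedding model, `Brief`, `trace`, and `trace_brief` are hypothetical names, the layer names follow the Anchors/Core/Predictions decomposition, and the fact strings are illustrative rather than taken from the study's Extract output.

```python
import math
from collections import Counter
from dataclasses import dataclass, field

# Illustrative fact strings (assumption): not taken from the study's Extract output.
EXAMPLE_FACTS = [
    "kept a daily chart tracking thirteen virtues",
    "apprenticed as a printer to his brother james",
    "founded the junto club for mutual improvement",
]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model (assumption)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Brief:
    """Hypothetical container for the three authored layers."""
    anchors: list[str] = field(default_factory=list)
    core: list[str] = field(default_factory=list)
    predictions: list[str] = field(default_factory=list)


def trace(claim: str, facts: list[str], fact_vectors: list[Counter], k: int = 1) -> list[str]:
    """Return the k source facts most similar to `claim`."""
    claim_vec = embed(claim)
    ranked = sorted(range(len(facts)),
                    key=lambda i: cosine(claim_vec, fact_vectors[i]), reverse=True)
    return [facts[i] for i in ranked[:k]]


def trace_brief(brief: Brief, facts: list[str], k: int = 1) -> dict[str, list[str]]:
    """Map every claim in every layer to its nearest source facts."""
    fact_vectors = [embed(f) for f in facts]  # embed facts once, reuse for every claim
    provenance = {}
    for layer in ("anchors", "core", "predictions"):
        for claim in getattr(brief, layer):
            provenance[claim] = trace(claim, facts, fact_vectors, k)
    return provenance


if __name__ == "__main__":
    brief = Brief(core=["measures daily conduct against a chart of virtues"])
    for claim, sources in trace_brief(brief, EXAMPLE_FACTS).items():
        print(claim, "->", sources)
```

The design point this illustrates is that embeddings serve lookup, not generation. Without fact vectors, there is nothing to rank a claim against, which is consistent with traces breaking when the Embed step is absent, even though the brief itself is unchanged.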

## Limitations

- Single subject (Franklin). All results above come from the Franklin corpus. Testing of the simplified pipeline on Marks, Douglass, and Buffett is underway, but results are pending.
- Evaluation used a single scoring rubric. A different rubric might weight the pipeline's contributions differently and change the ranking of conditions.
- The 14-step pipeline was developed iteratively over 79 sessions. Some steps may have been load-bearing at earlier stages of development and became redundant as other steps improved.
- False-positive guards in the predictions layer were never ablated in isolation, which leaves a gap in the ablation design.
