
The Last Human-Written Paper: Agent-Native Research Artifacts

Orchestra Team
8 min read
Paper: coming soon · Code (ARA): coming soon

In the near future, most CS papers will be written by AI, and most will be read by AI.

[Diagram. Yesterday: Researcher → PDF → Researcher. Tomorrow: ?]
When neither the author nor the audience is human, the three-century-old paper format stops making sense.

Papers flatten a branching research process into a clean story — and that flattening imposes two taxes:

The Storytelling Tax

Research is inherently branching and exploratory. Scientists try dozens of approaches, hit dead ends, pivot, and iterate. But papers collapse this rich process into a single winning narrative — discarding every failed attempt, rejected hypothesis, and negative result.

The Real Research Process

[Diagram: a branching exploration tree. The initial question spawns Hyp. A (CNN baseline), Hyp. B (Transformer), and Hyp. C (GAN variant). Failures pile up: OOM at batch 64, mode collapse, gradient explosion, loss diverging at 7.28 under standard LayerNorm. A pivot (computing inv-std outside the forward pass) stabilizes training, loss ↓ 4.60; differential learning rates (emb 3e-4, tfm 3e-5) reach loss 3.98, a +13% improvement. Tally: 5 dead ends, 1 pivot, 1 success. All of this gets thrown away.]

What Gets Published

[Diagram: the published straight line. Research question ("Can ReLU transformers match softmax?") → Abstract ("We present a ReLU transformer...") → Introduction ("Prior work lacks efficiency...") → Method ("We compute inv-std outside fwd...") → Experiments ("Differential LR outperforms...") → Results ("Loss 3.98, +13% improvement"). Clean narrative, straight line. No dead ends, no failures, no tricks. The knowledge is lost forever.]
Research explores many branches, but papers only report the winning path
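One way to see why the flattening is lossy: the research process is naturally a graph, not a list, and the failed branches carry information. A minimal Python sketch of that idea (node kinds, IDs, and field names are invented for illustration, not ARA's actual trace schema):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    kind: str          # "decision" | "dead_end" | "pivot" | "experiment"
    note: str
    children: list = field(default_factory=list)

# The winning path and the failures live in the same graph.
root = Node("q0", "decision", "Can ReLU transformers match softmax?", [
    Node("h1", "dead_end", "CNN baseline: OOM at batch 64"),
    Node("h2", "decision", "Transformer hypothesis", [
        Node("d1", "dead_end", "standard LayerNorm: loss diverged at 7.28"),
        Node("p1", "pivot", "compute inv-std outside the forward pass", [
            Node("e1", "experiment", "training stable, loss 4.60", [
                Node("e2", "experiment", "differential LR: loss 3.98 (+13%)"),
            ]),
        ]),
    ]),
    Node("h3", "dead_end", "GAN variant: mode collapse"),
])

def collect(node: Node, kind: str) -> list:
    """Walk the tree and keep every node of the given kind."""
    hits = [node] if node.kind == kind else []
    for child in node.children:
        hits += collect(child, kind)
    return hits

print(len(collect(root, "dead_end")))   # 3 failed branches preserved
```

A paper keeps only the `q0 → h2 → p1 → e1 → e2` path; a trace like this keeps the dead ends queryable.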

The Engineering Tax

Papers describe methods at the precision needed to convince a reviewer — not at the precision needed to reproduce the work. Hyperparameters are underspecified. Warmup schedules live in someone's head. Numerical stability fixes exist in no document. The gap between “sufficient to believe” and “sufficient to execute” is where reproduction breaks down.

Reproduction Information Gap

8,921 expert-annotated reproduction requirements across 23 ICML papers (PaperBench)

Fully specified in PDF: 45.4%
Missing hyperparameters: 26.2%
Vague description: 21.9%
Cross-reference only: 13.4%
Missing code / baseline detail: 21.7%
(Gap categories overlap, so the figures sum to more than 100%.)
Less than half of what an agent needs to reproduce a paper is actually in the PDF

The information exists somewhere — in a lab notebook, a Slack thread, the original author's muscle memory — but not in any document an AI agent can access. Every reproduction attempt pays the full cost of rediscovering it.

The Solution: Four Interlocking Layers

ARA restructures a paper into four machine-native layers. Together they form a single executable knowledge package — the organized, evolving knowledge produced during research, not the narrative compiled afterward.

PAPER.md                      # Human-readable overview & entry point
│
├── logic/                    # Cognitive Layer
│   ├── claims.yaml           # Falsifiable claims with epistemic status
│   ├── concepts/             # Formal concept definitions
│   ├── experiments/          # Declarative experiment plans
│   └── problem_spec.md       # The "what and why" of the research
│
├── src/                      # Physical Layer
│   ├── kernel/               # Novel algorithm core
│   ├── configs/              # Annotated with search ranges & sensitivity
│   └── environment.yaml      # Exact reproducibility spec
│
├── trace/                    # Exploration Graph
│   ├── graph.json            # Full branching research DAG
│   ├── dead_ends/            # Every failed attempt preserved
│   └── pivots/               # Decision points & lessons learned
│
└── evidence/                 # Evidence Layer
    ├── results/              # Machine-readable quantitative outputs
    ├── logs/                 # Raw experiment logs
    └── curves/               # Training curves & metrics
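As a concrete (and purely hypothetical) illustration of the Cognitive Layer, here is what one entry of `logic/claims.yaml` might look like once loaded, plus a tiny validity check an agent could run before trusting it. Every field name and status value here is an assumption; ARA's real schema is not yet published:

```python
# Hypothetical claim record, as a Python dict after YAML parsing.
claim = {
    "id": "C1",
    "statement": "Differential LR (emb 3e-4, tfm 3e-5) beats uniform LR",
    "status": "supported",       # assumed taxonomy: hypothesized | supported | refuted
    "evidence": ["evidence/results/diff_lr.json"],
    "falsifier": "uniform LR reaches loss <= 3.98 at iter 1000",
}

VALID_STATUSES = {"hypothesized", "supported", "refuted"}

def validate(claim: dict) -> bool:
    """A claim must carry an epistemic status, be falsifiable,
    and point at evidence whenever it says 'supported'."""
    return (
        claim.get("status") in VALID_STATUSES
        and bool(claim.get("falsifier"))
        and (claim["status"] != "supported" or bool(claim.get("evidence")))
    )

print(validate(claim))   # True
```

The point of the sketch: because claims are structured and falsifiable, an agent can mechanically find unsupported ones instead of parsing prose.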

Live Research Manager

ARA doesn't require researchers to manually package their work. The Live Research Manager silently captures the research trajectory during AI-human collaboration — no interruptions, no extra effort. The artifact builds itself in the background.

Research Session

session-042
Let's try a ReLU transformer for the masked LM task
Running experiment... loss is diverging at 7.28
The normalization is wrong. Compute inv-std outside forward pass
Applied fix. Training stable now, loss dropping to 4.60
Use differential learning rates: embedding 3e-4, transformer 3e-5
Loss at iter 1000: 3.98. 13% better than uniform LR.

Auto-captured Trajectory

Context Harvester · Event Router · Maturity Tracker
decision: ReLU transformer approach
dead_end: Loss diverged (norm bug)
heuristic: Inv-std outside forward pass
experiment: Training stable, loss 4.60
heuristic: Differential LR: emb 3e-4 / tfm 3e-5
experiment: Loss 3.98 (+13% vs uniform)

Collaborate with AI on research. The trajectory is automatically captured with epistemic provenance.
Silent integration · Epistemic objectivity · Framework independence · Comprehensive capture · Faithful translation
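To make the Event Router idea concrete, here is a deliberately toy classifier that maps raw session messages onto the trajectory event types above. The real router is certainly not keyword-based; this sketch only illustrates the decision / dead_end / heuristic / experiment taxonomy:

```python
# Ordered rules: first matching event type wins (invented for illustration).
RULES = [
    ("dead_end",   ("diverg", "collapse", "oom", "exploded")),
    ("heuristic",  ("use ", "compute ", "fix")),
    ("experiment", ("loss", "iter", "stable")),
    ("decision",   ("let's try", "approach")),
]

def route(message: str) -> str:
    """Classify one session message into a trajectory event type."""
    text = message.lower()
    for event_type, keywords in RULES:
        if any(k in text for k in keywords):
            return event_type
    return "note"

session = [
    "Let's try a ReLU transformer for the masked LM task",
    "Running experiment... loss is diverging at 7.28",
    "Compute inv-std outside forward pass",
    "Loss at iter 1000: 3.98. 13% better than uniform LR.",
]
print([route(m) for m in session])
# ['decision', 'dead_end', 'heuristic', 'experiment']
```

Even this crude version shows the key property: classification happens as the conversation flows, so the researcher never stops to label anything.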

Results

Same information, structured differently — agents become dramatically more effective at understanding, reproducing, and extending research.

Understanding

+21.3pp

93.7% vs 72.4% across 450 questions

PaperBench + RE-Bench · wins every category

Reproduction

+7.0pp

64.4% vs 57.4% — advantage grows with difficulty

150 subtasks · 15 PaperBench papers

Extension

3/5

Tasks where ARA wins on best score; reaches a useful first move earlier on all 5

5 RE-Bench tasks · MALT failure traces

Knowledge over Narrative

The organized, evolving knowledge produced during research is the primary scientific object. The narrative paper is a compiled view.

Citation

@article{liu2026ara,
  title={The Last Human-Written Paper: Agent-Native Research Artifacts},
  author={Liu, Jiachen and Pei, Jiaxin and Huang, Jintao and Si, Chenglei and Qu, Ao and Tang, Xiangru and Lu, Runyu and Chen, Lichang and Bai, Xiaoyan and Zheng, Haizhong and Chen, Carl and Chen, Zhiyang and Ye, Haojie and Fu, Yujuan and He, Zexue and Jin, Zijian and Zhang, Zhenyu and Sun, Shangquan and Harmon, Maestro and Wang, John Dianzhuo and Zeng, Jianqiao and Sun, Jiachen and Wu, Mingyuan and Zhou, Baoyu and You, Yuchen and Lu, Shijian and Qiu, Yiming and Lai, Fan and Yuan, Yuan and Li, Yao and Hong, Junyuan and Zhu, Ruihao and Chen, Beidi and Pentland, Alex and Chen, Ang and Chowdhury, Mosharaf and Zhang, Zechen},
  year={2026},
  note={Paper forthcoming}
}