Self-Scaffolding LLMs: How Ornith-1.0 Rewrites Its Own Harness Mid-Training
A technical breakdown of DeepReinforce’s self-improving agentic coding models

There’s a quiet assumption baked into most RL post-training pipelines for coding agents: a human designs the harness, and the model just gets better at using it. The scaffold (memory management, error handling, tool orchestration, retry logic) stays fixed. The policy is the…