LITEPAPER
Reasoning engines will push artificial agents over the threshold to become useful in the real world.
Problem__
Agents are autoregressive: each step builds on the output of the previous one, so errors compound. If every step must succeed, an agent that is 90% accurate at each step has a near-zero probability of completing a 100-step task (0.9^100 ≈ 0.003%). If reasoning can raise the step accuracy to 99%, the success rate becomes 0.99^100, i.e., roughly 37%. The moral of the story is that small gains in per-step accuracy compound into large gains over long tasks.
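The compounding arithmetic above can be checked with a minimal sketch. It assumes that steps succeed independently and that a single failed step fails the whole task, which is a simplification; the function name `task_success_probability` is illustrative and not part of RELIGN.

```python
def task_success_probability(step_accuracy: float, num_steps: int) -> float:
    """Chance that all steps succeed, assuming independent steps and no recovery."""
    return step_accuracy ** num_steps


# Compare a few per-step accuracies on a 100-step task.
for accuracy in (0.90, 0.99, 0.999):
    p = task_success_probability(accuracy, 100)
    print(f"step accuracy {accuracy:.1%} -> task success {p:.4%}")
```

Under these assumptions, each additional "nine" of step accuracy moves a long task from almost certain failure toward likely success.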
To achieve this increase in step accuracy, agents must learn to reason.
OpenAI has shown an example with its o1 reasoning model: reinforcement learning can be used to post-train foundation models into reasoning engines that show clear signs of general artificial intelligence. If we explicitly train foundation models to think in steps (instead of answering immediately), these systems show self-correcting capabilities, which translates into better agentic performance.
More importantly, this idea scales beyond training time. We can scale up computation at inference time to gain accuracy and correctness: allowing an agent to think harder about a problem results in better agent performance.
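One simple way to see how extra inference-time compute can buy accuracy is majority voting over several sampled answers. The sketch below is only an illustration of that one strategy, not a description of how o1 or RELIGN works. It assumes a question with two possible answers, samples that are independent, and a hypothetical per-sample accuracy `p`.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that more than half of n independent samples are correct."""
    if n % 2 == 0:
        raise ValueError("use an odd n so the vote cannot tie")
    # Sum the binomial probabilities of k correct samples for every winning k.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


# More samples per question (more compute) gives a more reliable final answer.
for n in (1, 3, 5, 11, 21):
    print(f"{n:>2} samples -> accuracy {majority_vote_accuracy(0.7, n):.3f}")
```

The gain only appears when a single sample is better than chance (`p > 0.5`); real samples are also correlated, so actual improvements would likely be smaller than this idealized calculation suggests.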
Solution__
As closed (and open) source labs build out their post-training infrastructure with large, talent-dense teams, there is a pressing need for the open-source community to join forces and develop its own tooling. RELIGN will lay the foundation for this tooling: a framework that helps machine learning engineers research, train, and evaluate their models.
RELIGN builds ready-to-use implementations of reinforcement learning algorithms, inference strategies, reward/judge models, and datasets/environments. This will help builders streamline large-scale experimentation and model development as post-training schemes increase in complexity, model sizes continue to grow, and the cost of compute continues to decrease.
While building this framework, and ensuring that it works correctly, we will also post-train and release models on domains like math, coding, health/medicine, and other community-built benchmarks. Sharing models and training recipes, which are often costly to produce, is crucial for the open-source community.
In essence, RELIGN will offer:
- A framework for post-training large language models, allowing developers to train their models according to their own reward/evaluation specifications, with a choice of algorithms and strategies
- Open-source datasets for public benefit and research
- A benchmarking tool for performance indicators, and an ecosystem where researchers can obtain such benchmarks for their projects
RELIGN is the path to autonomous, self-owned agents, and a step closer to achieving open-source AGI.
Roadmap__
Short-term
Geared towards making RELIGN a modular, research-friendly framework.
- Create domain-specific (math, coding) reasoning fine-tunes of a 7B-parameter DeepSeek model, using RELIGN.
- Developer bounties: contributors who solve RELIGN bounties (implementing specific algorithms, environments, training strategies, or adapters) will be rewarded in RELIGN tokens, with the amount depending on the complexity of the task
- Build/research and evaluate Monte-Carlo tree search inference strategies
- Open-source math/code reasoning post-trained fine-tunes, showcasing the capabilities of the framework (e.g., open-source model X post-trained with RELIGN yields a Y percent increase in reasoning capabilities on task Z)
- Test evaluation benchmarks on various tasks (general Q&A, trading, coding, math, research)
Long-term
- Train an o1-caliber open-source reasoning model with RELIGN
- Let developers specify and train their own reasoning models on specific tasks and environments
- Fine-tune and run inference on your own hardware, or in the cloud by burning RELIGN tokens (CLOUD=1)
What's needed for success
Talent + Research - The complexity of this build is daunting. However, open source is the vehicle to achieve this, through a community of AI researchers, developers, and evangelists of the RELIGN framework.
Compute - Funded through the RELIGN ecosystem treasury, compute is crucial in fine-tuning open-source models for benchmarking and public use.
Team__
Machine learning engineer and researcher. B.Sc in Computer Science, M.Sc in Artificial Intelligence from TU Delft. Thesis on quantum reinforcement learning. Previous ventures: quant, generative AI and 3D configurable e-commerce.
Multidisciplinary builder and creative. Experienced in design, strategy, and communications. Previous ventures: generative AI, 3D configurable e-commerce, programming education, CPG, experience design.
Web3 marketing maniac. Experienced community builder and operator. Currently helping creators @pizza_ninjas | spaces host @bonkradio @GetFrego crypto class of 2021
Summary__
Reasoning will be the key to agent success. We need great frameworks and great reasoning models to make agents useful.
Closed-source companies are not looking out for the interests of independent AI developers. Don’t let them control our alignment.