bounties
deepseek r1 grpo
250k relign
implement DeepSeek-R1's GRPO in relign and evaluate it on the GSM8K math benchmark.
medical reasoning
250k relign
create a complex medical reasoning task specification and verifier based on HuatuoGPT-o1.
multistep learning
250k relign
implement a multi-step learning inference strategy.
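for the deepseek r1 grpo bounty, a minimal sketch of the two pieces the description implies: a correctness reward for GSM8K-style answers and the group-relative advantage at the core of GRPO. this is not relign's API; the function names (`extract_final_number`, `correctness_reward`, `group_relative_advantages`) are hypothetical, and the choice of population standard deviation and the `eps` value are assumptions. it relies only on the fact that GSM8K reference solutions end with `#### <answer>`.

```python
import re
from statistics import mean, pstdev


def extract_final_number(text):
    """Return the last number found in text as a float, or None if there is none."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    # Drop thousands separators such as "1,234" before converting.
    return float(matches[-1].replace(",", ""))


def correctness_reward(completion, reference_solution):
    """Reward 1.0 if the completion's final number matches the GSM8K answer, else 0.0."""
    # GSM8K reference solutions put the final answer after "####".
    target = extract_final_number(reference_solution.split("####")[-1])
    predicted = extract_final_number(completion)
    return 1.0 if predicted is not None and predicted == target else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions sampled for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    Assumption: population std; some implementations use the sample std instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    reference = "She sells 9 eggs at $2 each. 9 * 2 = 18\n#### 18"
    completions = [
        "9 * 2 = 18, so the answer is 18",
        "the answer is 20",
        "I think it is 16",
        "She makes $18",
    ]
    rewards = [correctness_reward(c, reference) for c in completions]
    print(rewards)
    print(group_relative_advantages(rewards))
```

completions that beat their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed. if every completion in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal.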
submit a bounty