bounties

deepseek r1 grpo

250k relign

implement deepseek r1's GRPO in relign and evaluate it on gsm8k math.
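
for orientation, the core of GRPO is a group-relative advantage: several answers are sampled for the same prompt, and each reward is normalized against the mean and standard deviation of its own group. the sketch below shows only that step. it is a minimal illustration, not relign's API: the function name, the `eps` term, the use of the population standard deviation, and the 0/1 correctness rewards are all assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """normalize each reward against its own group's mean and std.

    hypothetical helper: `eps` and the population std are assumptions,
    added to avoid division by zero when every reward in the group is equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# hypothetical rewards for 4 sampled answers to one gsm8k question:
# 1.0 if the final answer matches the reference, 0.0 otherwise
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

correct answers end up with a positive advantage and incorrect ones with a negative advantage. a group where every answer gets the same reward yields all zeros, so it contributes no learning signal.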

medical reasoning

250k relign

create a complex medical reasoning task specification + verifier based on HuatuoGPT-o1.

multistep learning

250k relign

implement a multistep learning inference strategy.

submit a bounty
