newsence

@MiniMax_AI: We’ve put a lot of effort into RL, and much of our follow-up work continues to rely on CISPO from ou...

Twitter

We’ve put a lot of effort into RL, and much of our follow-up work continues to rely on CISPO from our M1 paper, including its importance-sampling truncation design and a fix for FP32 precision issues we identified back then. You can also expect M2.2 to carry these improvements forward👀😆
