[veRL] FSDP SFT trainer, SFT vs. RL, cross-entropy loss | loss mask | learning rate scheduler
Code for this episode: https://github.com/chunhuizhang/llm_rl/blob/main/tutorials/infra/verl/verl_sft.ipynb
Cross-entropy loss for autoregressive models: av1102325267
training vs. inference: av1102325267
FSDP distributed training: BV1Kx4y187Te