Pinned
-
gradientregularization_trl (Public)
Implementation for our paper "Gradient Regularization prevents Reward Hacking in RLHF and RLVR". Implemented for TRL and Hugging Face Transformers.
Python 12
-
OffPolicyCorrectedRewardModeling (Public)
Implementation for our COLM paper "Off-Policy Corrected Reward Modeling for RLHF".
Python 8
-
tf2multiagentrl (Public)
Clean implementation of Multi-Agent Reinforcement Learning methods (MADDPG, MATD3, MASAC, MAD4PG) in TensorFlow 2.x.
-
OfflineRLStructuredNonstationarity (Public)
Implementation for our RLC paper "Offline Reinforcement Learning from Datasets with Structured Non-Stationarity".
Python 7



