    Repositories list

    • 【ACL 2026】LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety
      0200 Updated Apr 23, 2026
    • AISafetyLab: A comprehensive framework covering safety attack, defense, evaluation and paper list.
      Python
      MIT License
      1524100 Updated Apr 21, 2026
    • MATLAB
      1000 Updated Apr 4, 2026
    • CROPI
      [ACL'26] Official Repository for the paper "Data-Efficient RLVR via Off-Policy Influence Guidance"
      Python
      02000 Updated Mar 29, 2026
    • IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation (ACL 2026)
      Python
      0410 Updated Mar 9, 2026
    • Survive at All Costs: Exploring LLM's Risky Behaviors under Survival Pressure
      Python
      MIT License
      1610 Updated Mar 6, 2026
    • IF-CRITIC
      IF-CRITIC: Towards a Fine-Grained LLM Critic for Instruction-Following Evaluation (ACL 2026)
      Shell
      0620 Updated Feb 6, 2026
    • MIR-SafetyBench: Evaluating Multi-image Reasoning Safety of Multimodal Large Language Models
      Python
      MIT License
      0500 Updated Jan 22, 2026
    • JPS
      [MM'25] JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering
      Python
      42020 Updated Dec 23, 2025
    • Glyph
      Official Repository for "Glyph: Scaling Context Windows via Visual-Text Compression"
      Python
      5058790 Updated Nov 4, 2025
    • [ACL'25] SocialEval: Evaluating Social Intelligence of Large Language Models
      Python
      MIT License
      01200 Updated Oct 20, 2025
    • CogFlow
      Think Socially via Cognitive Reasoning
      Python
      1600 Updated Oct 2, 2025
    • [EMNLP'24] CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models
      Python
      Apache License 2.0
      3650140 Updated Oct 2, 2025
    • Crisp
      [EMNLP'25] Crisp: Cognitive Restructuring of Negative Thoughts through Multi-turn Supportive Dialogues
      Python
      01110 Updated Sep 2, 2025
    • Python
      MIT License
      0600 Updated Aug 15, 2025
    • Python
      MIT License
      913710 Updated Aug 11, 2025
    • [AAAI'25] CharacterBench: Benchmarking Character Customization of Large Language Models
      Python
      12200 Updated Aug 1, 2025
    • ShieldVLM
      Python
      0800 Updated Jul 31, 2025
    • Official github repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]
      Python
      MIT License
      1428700 Updated Jul 28, 2025
    • VPO
      Python
      Apache License 2.0
      12520 Updated Jul 20, 2025
    • [ACL 2025] LongSafety: Evaluating Long-Context Safety of Large Language Models
      Python
      MIT License
      01600 Updated Jun 18, 2025
    • SPaR
      Python
      Apache License 2.0
      34710 Updated Jun 11, 2025
    • [ACL 2025] Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints
      Python
      11900 Updated May 23, 2025
    • HPSS
      HPSS: Heuristic Prompting Strategy Search for LLM Evaluators (ACL 2025 Findings)
      Python
      1300 Updated May 23, 2025
    • Python
      MIT License
      63210 Updated May 22, 2025
    • BARREL
      [ICLR 2026] BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs
      Python
      MIT License
      11800 Updated May 21, 2025
    • MAPS
      Official Implementation of ICLR25 paper "MAPS: Advancing Multi-modal Reasoning in Expert-level Physical Science"
      Python
      11000 Updated Mar 12, 2025
    • Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)
      Python
      MIT License
      1210250 Updated Feb 20, 2025
    • MiniPLM
      [ICLR 2025] MiniPLM: Knowledge Distillation for Pre-Training Language Models
      Python
      MIT License
      97750 Updated Nov 23, 2024
    • Python
      11700 Updated Nov 7, 2024