All

110 repositories

lasa-multilingual-safety
Public
【ACL 2026】LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety
0•2•0•0•Updated Apr 23, 2026
AISafetyLab
Public
AISafetyLab: A comprehensive framework covering safety attack, defense, evaluation and paper list.
Python
•
MIT License
•15•241•0•0•Updated Apr 21, 2026
EmbodiedAct
Public
MATLAB
•1•0•0•0•Updated Apr 4, 2026
CROPI
Public
[ACL'26] Official Repository for the paper "Data-Efficient RLVR via Off-Policy Influence Guidance"
Python
•0•20•0•0•Updated Mar 29, 2026
IF-RewardBench
Public
IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation (ACL 2026)
Python
•0•4•1•0•Updated Mar 9, 2026
Survive-at-All-Costs
Public
Survive at All Costs: Exploring LLM's Risky Behaviors under Survival Pressure
Python
•
MIT License
•1•6•1•0•Updated Mar 6, 2026
IF-CRITIC
Public
IF-CRITIC: Towards a Fine-Grained LLM Critic for Instruction-Following Evaluation (ACL 2026)
Shell
•0•6•2•0•Updated Feb 6, 2026
MIR-SafetyBench
Public
MIR-SafetyBench: Evaluating Multi-image Reasoning Safety of Multimodal Large Language Models
Python
•
MIT License
•0•5•0•0•Updated Jan 22, 2026
JPS
Public
[MM'25] JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering
Python
•4•20•2•0•Updated Dec 23, 2025
Glyph
Public
Official Repository for "Glyph: Scaling Context Windows via Visual-Text Compression"
Python
•50•587•9•0•Updated Nov 4, 2025
SocialEval
Public
[ACL'25] SocialEval: Evaluating Social Intelligence of Large Language Models
Python
•
MIT License
•0•12•0•0•Updated Oct 20, 2025
CogFlow
Public
Think Socially via Cognitive Reasoning
Python
•1•6•0•0•Updated Oct 2, 2025
CharacterGLM-6B
Public
[EMNLP'24] CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models
Python
•
Apache License 2.0
•36•501•4•0•Updated Oct 2, 2025
Crisp
Public
[EMNLP'25] Crisp: Cognitive Restructuring of Negative Thoughts through Multi-turn Supportive Dialogues
Python
•0•11•1•0•Updated Sep 2, 2025
LRM-Safety-Study
Public
Python
•
MIT License
•0•6•0•0•Updated Aug 15, 2025
Agent-SafetyBench
Public
Python
•
MIT License
•9•137•1•0•Updated Aug 11, 2025
CharacterBench
Public
[AAAI'25] CharacterBench: Benchmarking Character Customization of Large Language Models
Python
•1•22•0•0•Updated Aug 1, 2025
ShieldVLM
Public
Python
•0•8•0•0•Updated Jul 31, 2025
SafetyBench
Public
Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]
Python
•
MIT License
•14•287•0•0•Updated Jul 28, 2025
VPO
Public
Python
•
Apache License 2.0
•1•25•2•0•Updated Jul 20, 2025
LongSafety
Public
[ACL 2025] LongSafety: Evaluating Long-Context Safety of Large Language Models
Python
•
MIT License
•0•16•0•0•Updated Jun 18, 2025
SPaR
Public
Python
•
Apache License 2.0
•3•47•1•0•Updated Jun 11, 2025
TransferAttack
Public
[ACL 2025] Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints
Python
•1•19•0•0•Updated May 23, 2025
HPSS
Public
HPSS: Heuristic Prompting Strategy Search for LLM Evaluators (ACL 2025 Findings)
Python
•1•3•0•0•Updated May 23, 2025
Backdoor-Data-Extraction
Public
Python
•
MIT License
•6•32•1•0•Updated May 22, 2025
BARREL
Public
[ICLR 2026] BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs
Python
•
MIT License
•1•18•0•0•Updated May 21, 2025
MAPS
Public
Official Implementation of ICLR25 paper "MAPS: Advancing Multi-modal Reasoning in Expert-level Physical Science"
Python
•1•10•0•0•Updated Mar 12, 2025
ComplexBench
Public
Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)
Python
•
MIT License
•12•102•5•0•Updated Feb 20, 2025
MiniPLM
Public
[ICLR 2025] MiniPLM: Knowledge Distillation for Pre-Training Language Models
Python
•
MIT License
•9•77•5•0•Updated Nov 23, 2024
MoralStory
Public
Python
•1•17•0•0•Updated Nov 7, 2024


thu-coai
