Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
🐢 Open-Source Evaluation & Testing library for LLM Agents
[EMNLP 2024 Demo] MarkLLM: An Open-Source Toolkit for LLM Watermarking
An open-source Python toolbox for backdoor attacks and defenses.
[ICML 2024] TrustLLM: Trustworthiness in Large Language Models
Deliver safe & effective language models
🔥🔥🔥[AAAI 2026 Oral] Official Implementation of Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding
Proof of Thought: LLM-based reasoning using Z3 theorem proving with multiple backend support (SMT2 and JSON DSL)
Moonshot - A simple and modular tool to evaluate and red-team any LLM application.
The testing platform for AI teams. Bring engineers, PMs, and domain experts together to generate tests, simulate (adversarial) conversations, and trace every failure to its root cause.
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
🚀 A fast safe reinforcement learning library in PyTorch
[NeurIPS-2023] Annual Conference on Neural Information Processing Systems
[NeurIPS'24] "Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration"
A comprehensive toolbox for model inversion attacks and defenses that is easy to get started with.
A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks)
[ICLR'26, NAACL'25 Demo] Toolkit & Benchmark for evaluating the trustworthiness of generative foundation models.
A toolkit of tools and techniques related to the privacy and compliance of AI models.
[ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation"
The official implementation for ICLR23 paper "GNNSafe: Energy-based Out-of-Distribution Detection for Graph Neural Networks"