Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
🐢 Open-Source Evaluation & Testing library for LLM Agents
[EMNLP 2024 Demo] MarkLLM: An Open-Source Toolkit for LLM Watermarking
An open-source Python toolbox for backdoor attacks and defenses.
[ICML 2024] TrustLLM: Trustworthiness in Large Language Models
Deliver safe & effective language models
🔥🔥🔥[AAAI 2026 Oral] Official Implementation of Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding
Proof of Thought: LLM-based reasoning using Z3 theorem proving with multiple backend support (SMT2 and JSON DSL)
Moonshot - A simple and modular tool to evaluate and red-team any LLM application.
The testing platform for AI teams. Bring engineers, PMs, and domain experts together to generate tests, simulate (adversarial) conversations, and trace every failure to its root cause.
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
🚀 A fast safe reinforcement learning library in PyTorch
[NeurIPS-2023] Annual Conference on Neural Information Processing Systems
[NeurIPS'24] "Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration"
A comprehensive toolbox for model inversion attacks and defenses that is easy to get started with.
A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks)
[ICLR'26, NAACL'25 Demo] Toolkit & Benchmark for evaluating the trustworthiness of generative foundation models.
A toolkit of tools and techniques related to the privacy and compliance of AI models.
[ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation"
The official implementation for ICLR23 paper "GNNSafe: Energy-based Out-of-Distribution Detection for Graph Neural Networks"