A high-throughput and memory-efficient inference and serving engine for LLMs
Updated Apr 9, 2026 - Python
Unsloth Studio is a web UI for training and running open models like Qwen3.5, Gemma 4, DeepSeek, gpt-oss locally.
[MLSys 2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.
Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!
A Next-Generation Training Engine Built for Ultra-Large MoE Models
Open-source healthcare AI
🚀 PyTorch-native distributed training library for LLMs/VLMs with out-of-the-box Hugging Face support
Deploy open-source LLMs on AWS in minutes — with OpenAI-compatible APIs and a powerful CLI/SDK toolkit.
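An "OpenAI-compatible API" means the deployed model can be called with the same request shape as OpenAI's `/v1/chat/completions` route. A minimal sketch using only the standard library is below; the base URL, model id, and API key are placeholders, not values from any particular deployment.

```python
import json
import urllib.request

# Hypothetical endpoint and model id -- substitute your own deployment's values.
BASE_URL = "http://localhost:8000/v1"
MODEL = "gpt-oss-20b"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /chat/completions route."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer sk-placeholder",  # placeholder key
        },
        method="POST",
    )

# Sending the request requires a running server, e.g.:
# with urllib.request.urlopen(build_chat_request("Hello")) as resp:
#     reply = json.loads(resp.read())["choices"][0]["message"]["content"]
```

Because the wire format matches OpenAI's, existing client SDKs can usually be pointed at such a server by changing only the base URL.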
MCore-Bridge: Providing Megatron-Core model definitions for state-of-the-art large models and making Megatron training as simple as Transformers.
What does gpt-oss tell us about OpenAI's training data?
GGUF Loader with Agentic Mode and a floating button for AI models | open source & offline. Supports Mistral, DeepSeek, Llama, Gemma, Qwen.
agentsculptor is an experimental AI-powered development agent designed to analyze, refactor, and extend Python projects automatically. It uses an OpenAI-like planner–executor loop on top of a vLLM backend, combining project context analysis, structured tool calls, and iterative refinement. It has only been tested with gpt-oss-120b via vLLM.
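A planner-executor loop alternates between an LLM proposing the next structured tool call and the runtime executing it, feeding results back as context. The sketch below uses a scripted stand-in for the planner; in a real agent the `plan` callable would query an LLM backend (e.g. gpt-oss-120b via vLLM), and the tool names here are invented for illustration.

```python
from typing import Callable

def run_agent(plan: Callable[[list], dict],
              tools: dict[str, Callable[[str], str]],
              max_steps: int = 5) -> list:
    """Alternate planning and tool execution until the planner signals 'finish'."""
    history = []
    for _ in range(max_steps):
        step = plan(history)          # planner proposes the next action
        if step["tool"] == "finish":  # planner decides the task is done
            break
        result = tools[step["tool"]](step["arg"])  # execute the structured tool call
        history.append((step["tool"], step["arg"], result))  # feed result back
    return history

# Toy usage: a scripted "planner" that reads one file, then finishes.
def scripted_planner(history):
    if not history:
        return {"tool": "read", "arg": "main.py"}
    return {"tool": "finish", "arg": ""}

tools = {"read": lambda path: f"<contents of {path}>"}
trace = run_agent(scripted_planner, tools)
```

The `max_steps` cap is the usual guard against a planner that never emits `finish`.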
Batch processing for overnight tasks with gpt-oss 20b
A local RAG + web search pipeline with gpt-oss and other similar scale models powered by llama.cpp
[AICI-26] Difficulty-Aware Adaptive Reasoning for Vietnamese VQA with GPT-OSS
This project implements a text classification system powered by Large Language Models (LLMs) running locally. It leverages modern LLMs to automatically categorize and label text data without external APIs or manual human labeling, ensuring privacy, autonomy, and efficiency.
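The core of LLM-based classification is a constrained prompt plus a parser that maps the model's free-form reply back onto a fixed label set. A minimal sketch is below; the label set and the `generate` callable are assumptions standing in for any local inference backend, not this project's actual API.

```python
# Hypothetical label set for illustration.
LABELS = ["positive", "negative", "neutral"]

def build_prompt(text: str) -> str:
    """Constrain the model to answer with exactly one allowed label."""
    options = ", ".join(LABELS)
    return (f"Classify the following text as one of: {options}.\n"
            f"Text: {text}\n"
            f"Answer with a single label.")

def parse_label(reply: str) -> str:
    """Return the first allowed label mentioned in the reply, else a fallback."""
    lowered = reply.lower()
    for label in LABELS:
        if label in lowered:
            return label
    return "neutral"  # fallback when the model goes off-script

def classify(text: str, generate) -> str:
    # generate() stands in for a call to a locally running LLM.
    return parse_label(generate(build_prompt(text)))

# Usage with a stub backend in place of a local model:
label = classify("Great service!", lambda prompt: "Positive")
```

Parsing defensively matters because local models sometimes wrap the label in extra prose; matching against the known label set keeps the output machine-usable.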
No Hopper architecture required (runs on RTX 5090 and other consumer GPUs)! <16 GB VRAM, Windows.