Learning LLM Systems by Building

Introduction

This organization collects the projects I have built while learning about Large Language Models (LLMs), retrieval systems, and AI infrastructure.

Instead of only studying theory, I try to learn by building real systems — from inference servers to RAG pipelines and evaluation frameworks.


What I'm Exploring

  • How to serve models efficiently (GPU / TensorRT / batching)
  • How to route and manage multiple LLMs
  • How retrieval works (dense / sparse / hybrid / multi-vector)
  • How to evaluate LLM outputs and reduce hallucination
  • How to design practical LLM applications

Project Overview

These projects are not perfect or production-ready — they reflect my learning process and experiments.

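Inference & Serving

  • TensorrtServer
    A TensorRT-based inference server for Embedding, Reranker, and NLI models.

Routing

  • LLM-Router-Server
    A routing service that manages and orchestrates multiple local LLM, embedding, and reranking services.
  • LLM-Router-Server-Dashboard
    A one-stop platform for managing and monitoring LLM models.
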
Retrieval & RAG

  • Tiny-RAGFlow
    A lightweight RAG framework to understand hybrid retrieval and reranking.
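
Hybrid retrieval typically runs a dense (embedding) search and a sparse (keyword, e.g. BM25) search, then merges the two ranked lists before reranking. One common merging rule is Reciprocal Rank Fusion (RRF). The sketch below illustrates that technique under this assumption; it is not Tiny-RAGFlow's actual code, and the function name `reciprocal_rank_fusion` and the document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with rank starting at 1. k = 60 is the value used in the original RRF paper.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc_id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense_hits = ["doc3", "doc1", "doc7"]   # hypothetical results from a vector index
sparse_hits = ["doc1", "doc5", "doc3"]  # hypothetical results from BM25
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print([doc_id for doc_id, _ in fused])  # ['doc1', 'doc3', 'doc5', 'doc7']
```

RRF only looks at ranks, not raw scores, so it avoids having to normalize cosine similarities and BM25 scores onto a common scale. The fused top-k list would then be passed to a reranker.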

Tools

  • LLM Tools
    A unified interface for interacting with LLMs, embeddings, and rerankers.

Data Processing

  • file2md
    Converting different file formats into Markdown for downstream LLM usage.
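
As a small illustration of what converting one format to Markdown involves, here is a sketch that turns CSV text into a GitHub-flavored Markdown table using only Python's standard `csv` module. `csv_to_markdown` is a hypothetical helper, not file2md's API, and it assumes the first CSV row is the header.

```python
import csv
import io


def csv_to_markdown(text: str) -> str:
    """Render CSV text as a GitHub-flavored Markdown table (first row = header)."""
    # Skip blank lines, which csv.reader yields as empty lists.
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return ""
    header, body = rows[0], rows[1:]

    def fmt(cells: list[str]) -> str:
        # Escape pipes so cell content cannot break the table layout.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines += [fmt(row) for row in body]
    return "\n".join(lines)


print(csv_to_markdown("name,size\nreport.pdf,2 MB\nnotes.docx,15 KB\n"))
```

Using `csv.reader` rather than splitting on commas means quoted fields such as `"Smith, J."` stay in one cell.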

Evaluation

  • llm-evals
    Experimenting with LLM evaluation and LLM-as-a-judge approaches.

Research Exploration

ML + Database

  • ML2SQL
    Exploring how ML models can run directly inside databases using SQL.
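
One way to run a tree-based model inside a database is to compile each decision tree into a nested SQL `CASE` expression, so the prediction becomes an ordinary column in a `SELECT`. The sketch below assumes a hypothetical dict-based tree format and a hypothetical `tree_to_sql` helper; it is not ML2SQL's actual interface. It follows scikit-learn's split convention (go left when `feature <= threshold`).

```python
from typing import Union

# Hypothetical tree format (not ML2SQL's real representation):
# an internal node is a dict with "feature", "threshold", "left", "right";
# a leaf is just a number (the predicted value).
Node = Union[float, dict]


def tree_to_sql(node: Node) -> str:
    """Recursively compile a binary decision tree into a nested SQL CASE expression."""
    if not isinstance(node, dict):
        return repr(float(node))  # leaf: emit the prediction as a numeric literal
    left = tree_to_sql(node["left"])
    right = tree_to_sql(node["right"])
    # Feature names are interpolated as column identifiers, so they must be trusted.
    return (
        f"CASE WHEN {node['feature']} <= {node['threshold']} "
        f"THEN {left} ELSE {right} END"
    )


tree = {
    "feature": "age",
    "threshold": 30,
    "left": 0.2,
    "right": {"feature": "income", "threshold": 50000, "left": 0.5, "right": 0.9},
}
print(f"SELECT id, {tree_to_sql(tree)} AS prediction FROM customers")
```

For an ensemble, the same idea extends by summing (gradient boosting) or averaging (random forest) one `CASE` expression per tree. Note that a `NULL` feature value makes the comparison unknown, so the `ELSE` (right) branch is taken.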

Why This Exists

I believe the best way to understand LLM systems is to:

Build them piece by piece.

Each repository focuses on a different part of the stack, and together they form a rough picture of how modern LLM systems work.


Still Learning

This is an ongoing journey.
Many things are incomplete, naive, or experimental — and that’s intentional.

If you’re also learning, feel free to explore, use, or build on top of these projects.

Contact

If you have any questions, ideas, or just want to chat about LLMs, feel free to:

  • Open an issue
  • Reach out via email: milk333445@gmail.com

Pinned

  1. file2md

    file2md is a versatile tool for converting multiple file formats to Markdown.

    Python

  2. TensorrtServer

    A high-performance deep learning model inference server based on TensorRT, supporting fast inference for Embedding, Reranker, and NLI models.

    Python

  3. ML2SQL

    Compile tree-based machine learning models into SQL inference queries, enabling model predictions to run directly inside the database.

    Python

  4. LLM-Router-Server

    LLM Router Server is a high-performance routing service designed for multi-model deployment scenarios, used to uniformly manage and orchestrate multiple local Large Language Model (LLM) services, E…

    Python

Repositories

Showing 10 of 10 repositories

  • LLM-Router-Server-Dashboard

    One-Stop LLM Model Management and Monitoring Platform

    Python · MIT · Updated Apr 9, 2026

  • BehaviorRL-Hallucination

    Learning When to Answer: Behavior-Oriented Reinforcement Learning for Hallucination Mitigation

    Python · MIT · Updated Apr 8, 2026

  • file2md

    file2md is a versatile tool for converting multiple file formats to Markdown.

    Python · MIT · Updated Apr 8, 2026

  • .github

    Updated Mar 29, 2026

  • LLM-Router-Server

    LLM Router Server is a high-performance routing service designed for multi-model deployment scenarios, used to uniformly manage and orchestrate multiple local Large Language Model (LLM) services, Embedding models, Re-ranking models, and other inference services.

    Python · MIT · Updated Mar 28, 2026

  • llm_tools

    A comprehensive Python toolkit for LLM integration with chat, embeddings, and reranking. Supports Azure OpenAI, local models, async operations, memory management, and response caching.

    Python · MIT · Updated Mar 20, 2026

  • Tiny-RAGFlow

    Tiny-RAGFlow is a lightweight Retrieval-Augmented Generation (RAG) framework designed for quickly building efficient vector retrieval systems.

    Python · MIT · Updated Mar 19, 2026

  • llm-evals

    A framework for evaluating large language models (LLMs) across a variety of tasks.

    Python · MIT · Updated Mar 18, 2026

  • TensorrtServer

    A high-performance deep learning model inference server based on TensorRT, supporting fast inference for Embedding, Reranker, and NLI models.

    Python · MIT · Updated Mar 18, 2026

  • ML2SQL

    Compile tree-based machine learning models into SQL inference queries, enabling model predictions to run directly inside the database.

    Python · MIT · Updated Mar 18, 2026


Top languages

Python
