Vectorless is an ultra-performant reasoning-native document intelligence engine for AI, with the core written in Rust. It transforms documents into rich semantic trees and uses LLMs to intelligently traverse the hierarchy — retrieving the most relevant content through structural reasoning and deep contextual understanding.
⭐ Drop a star to help us grow!
```
Technical Manual (root)
├── Chapter 1: Introduction
├── Chapter 2: Architecture
│   ├── 2.1 System Design
│   └── 2.2 Implementation
└── Chapter 3: API Reference
```
Each node gets an AI-generated summary, enabling fast navigation.
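The tree above can be modeled in a few lines. This is an illustrative sketch only, not the actual vectorless data structure — the `Node` class and its fields are assumptions for the sake of the example:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical semantic-tree node; every node carries an LLM summary."""
    title: str
    summary: str = ""    # AI-generated summary used for navigation
    content: str = ""    # raw section text (leaves)
    children: list["Node"] = field(default_factory=list)

root = Node("Technical Manual", children=[
    Node("Chapter 1: Introduction"),
    Node("Chapter 2: Architecture", children=[
        Node("2.1 System Design"),
        Node("2.2 Implementation"),
    ]),
    Node("Chapter 3: API Reference"),
])
```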
When you ask "How do I reset the device?":
- Analyze — Understand query intent and complexity
- Navigate — LLM guides tree traversal
- Retrieve — Return the exact section with context
- Verify — Check if more information is needed
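The navigate/retrieve steps above amount to an LLM-guided descent of the tree. Here is a minimal sketch of that loop, using plain dict nodes and a hypothetical `pick_child` callback standing in for the LLM call — none of these names come from the vectorless API:

```python
def retrieve(node, query, pick_child):
    """Descend the tree until a leaf, then return its content and path."""
    path = [node["title"]]
    while node["children"]:
        # Navigate: the LLM picks the child whose summary best fits the query
        node = pick_child(node["children"], query)
        path.append(node["title"])
    # Retrieve: the leaf section plus the navigation path taken
    return node["content"], " > ".join(path)

tree = {
    "title": "Manual", "content": "", "children": [
        {"title": "Ch 4", "content": "", "children": [
            {"title": "4.2 Reset Procedure",
             "content": "Hold the power button for 10 seconds...",
             "children": []},
        ]},
    ],
}

# Stand-in for the LLM: each level here has one child, so pick the first
answer, source = retrieve(tree, "How do I reset the device?",
                          lambda kids, q: kids[0])
# source == "Manual > Ch 4 > 4.2 Reset Procedure"
```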
| Aspect | Traditional RAG | Vectorless |
|---|---|---|
| Infrastructure | Vector DB + Embedding Model | Just LLM API |
| Document Structure | Lost in chunking | Preserved |
| Context | Fragment only | Section + surrounding context |
| Setup Time | Hours to Days | Minutes |
| Best For | Unstructured text | Structured documents |
```
Input:
  Document: 100-page technical manual (PDF)
  Query: "How do I reset the device?"

Output:
  Answer: "To reset the device, hold the power button for 10 seconds
           until the LED flashes blue, then release..."
  Source: Chapter 4 > Section 4.2 > Reset Procedure
```
✅ Good fit:
- Technical documentation
- Manuals and guides
- Structured reports
- Policy documents
- Any document with clear hierarchy
❌ Not ideal:
- Unstructured text (tweets, chat logs)
- Very short documents (< 1 page)
- Pure Q&A datasets without structure
Python

```bash
pip install vectorless
```

```python
from vectorless import Engine, IndexContext

# Create engine (uses OPENAI_API_KEY env var)
engine = Engine(workspace="./data")

# Index a document
ctx = IndexContext.from_file("./report.pdf")
doc_id = engine.index(ctx)

# Query
result = engine.query(doc_id, "What is the total revenue?")
print(f"Answer: {result.content}")
```

Rust
```toml
[dependencies]
vectorless = "0.1"
```

```bash
cp vectorless.example.toml ./vectorless.toml
```

```rust
use vectorless::Engine;

#[tokio::main]
async fn main() -> vectorless::Result<()> {
    let client = Engine::builder()
        .with_workspace("./workspace")
        .build()?;

    let doc_id = client.index("./document.pdf").await?;
    let result = client.query(&doc_id,
        "What are the system requirements?").await?;

    println!("Answer: {}", result.content);
    println!("Source: {}", result.path);
    Ok(())
}
```

| Feature | Description |
|---|---|
| Zero Infrastructure | No vector DB, no embedding model — just an LLM API |
| Multi-format Support | PDF, Markdown, DOCX, HTML out of the box |
| Incremental Updates | Add/remove documents without full re-index |
| Traceable Results | See the exact navigation path taken |
| Feedback Learning | Improves from user feedback over time |
| Multi-turn Queries | Handles complex questions with decomposition |
Just set `OPENAI_API_KEY` and you're ready to go:

```bash
export OPENAI_API_KEY="sk-..."
```

Python

```python
from vectorless import Engine

# Uses OPENAI_API_KEY from environment
engine = Engine(workspace="./data")
```

Rust

```rust
use vectorless::Engine;

let client = Engine::builder()
    .with_workspace("./workspace")
    .build().await?;
```

| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | LLM API key |
| `VECTORLESS_MODEL` | Default model (e.g., `gpt-4o-mini`) |
| `VECTORLESS_ENDPOINT` | API endpoint URL |
| `VECTORLESS_WORKSPACE` | Workspace directory |
For fine-grained control, use a config file:

```bash
cp config.toml ./vectorless.toml
```

Python

```python
from vectorless import Engine

# Use full configuration file
engine = Engine(config_path="./vectorless.toml")

# Or override specific settings
engine = Engine(
    config_path="./vectorless.toml",
    model="gpt-4o",  # Override model from config
)
```

Rust
```rust
use vectorless::Engine;

// Use full configuration file
let client = Engine::builder()
    .with_config_path("./vectorless.toml")
    .build().await?;

// Or override specific settings
let client = Engine::builder()
    .with_config_path("./vectorless.toml")
    .with_model("gpt-4o", None) // Override model
    .build().await?;
```

Later overrides earlier:
1. Default configuration
2. Auto-detected config file (`vectorless.toml`, `config.toml`, `.vectorless.toml`)
3. Explicit config file (`config_path` / `with_config_path`)
4. Environment variables
5. Constructor/builder parameters (highest priority)
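The "later overrides earlier" layering can be sketched as a simple left-to-right merge. This is illustrative only — the values and the `resolve` helper are made up, not part of the vectorless codebase:

```python
def resolve(*layers):
    """Merge config layers; later layers override earlier ones."""
    cfg = {}
    for layer in layers:
        # Skip unset (None) values so they don't clobber earlier layers
        cfg.update({k: v for k, v in layer.items() if v is not None})
    return cfg

defaults = {"model": "gpt-4o-mini", "workspace": "./data"}
env_vars = {"model": "gpt-4o"}            # e.g. from VECTORLESS_MODEL
explicit = {"workspace": "./workspace"}   # e.g. builder parameter

merged = resolve(defaults, env_vars, explicit)
# {'model': 'gpt-4o', 'workspace': './workspace'}
```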
- Index Pipeline — Parses documents, builds tree, generates summaries
- Retrieval Pipeline — Analyzes query, navigates tree, returns results
- Pilot — LLM-powered navigator that guides retrieval decisions
- Metrics Hub — Unified observability for LLM calls, retrieval, and feedback
See the `examples/` directory for more usage patterns.
Contributions welcome! If you find this useful, please ⭐ the repo — it helps others discover it.
Apache License 2.0