A highly memory-efficient vector search engine for codebase indexing and semantic search. Built on the Google DeepMind/NYU research paper Online Vector Quantization with Near-optimal Distortion Rate (ICLR 2026), it compresses vector embeddings (up to 14.2x RAM reduction) while maintaining dot-product accuracy via a 1-bit Quantized Johnson-Lindenstrauss (QJL) residual correction.
This system allows developers and AI coding agents to semantically search massive, multi-file codebases in real time with a negligible memory footprint, avoiding the limitations of LLM context windows and Out-Of-Memory (OOM) errors.
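Here is a minimal pure-Python sketch of the residual-correction idea. It is not TurboQuantex's implementation. A plain uniform scalar quantizer stands in for PolarQuant, and the Gaussian projection matrix S and its seed are illustrative assumptions. The QJL step stores only sign(S·r) and ‖r‖ for the residual r = x − x̂. It then estimates ⟨q, r⟩ ≈ √(π/2)·‖r‖/m · Σᵢ (sᵢ·q)·sign(sᵢ·r), which is unbiased when the rows sᵢ are Gaussian.

```python
import math
import random

def quantize_uniform(x, bits):
    """Stand-in for PolarQuant: snap each coordinate to one of 2**bits evenly spaced levels.
    Returns the dequantized values; a real index would store integer codes plus the scale."""
    scale = max(abs(v) for v in x) or 1.0
    step = 2 * scale / (2 ** bits - 1)
    return [round((v + scale) / step) * step - scale for v in x]

def qjl_encode(r, S):
    """1-bit QJL sketch of the residual r: the signs of S @ r, plus ||r||."""
    bits = [1 if sum(s_j * r_j for s_j, r_j in zip(row, r)) >= 0 else -1 for row in S]
    return bits, math.sqrt(sum(v * v for v in r))

def qjl_inner(q, bits, r_norm, S):
    """Unbiased estimate of <q, r> = sqrt(pi/2) * ||r|| / m * sum_i <s_i, q> * sign(<s_i, r>)."""
    m = len(S)
    acc = sum(b * sum(s_j * q_j for s_j, q_j in zip(row, q)) for row, b in zip(S, bits))
    return math.sqrt(math.pi / 2) * r_norm / m * acc

def estimate_dot(q, x, bits=2, m=128, seed=0):
    """Coarse dot product on the quantized vector plus the QJL correction for the residual."""
    rng = random.Random(seed)
    S = [[rng.gauss(0.0, 1.0) for _ in range(len(x))] for _ in range(m)]
    x_hat = quantize_uniform(x, bits)
    r = [a - b for a, b in zip(x, x_hat)]
    sign_bits, r_norm = qjl_encode(r, S)
    coarse = sum(a * b for a, b in zip(q, x_hat))
    return coarse + qjl_inner(q, sign_bits, r_norm, S)
```

The coarse codes carry most of the dot product. The 1-bit sketch adds back an unbiased estimate of what quantization discarded, at a cost of m bits plus one norm per vector.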
Live Landing Page & RAM Simulator: https://blackmoon87.github.io/TurboQuantex/
All project resources are isolated inside the .TurboQuantex/ folder. The ONNX model files are the exception: they are stored in a shared global directory (~/.turboquantex/model) to prevent duplicate downloads across multiple projects.
```text
[Your Project Root]
├── .TurboQuantex/
│   ├── app.py                  # Flask Daemon (API & File Watcher)
│   ├── turboquantex.py         # Core Compression Math (PolarQuant & QJL)
│   ├── tq.py                   # CLI: Index, Search, Update, Stats & Batch Search
│   ├── turboquantex_skill.py   # Reusable AI Agent Programmatic Skill API
│   ├── example_usage.py        # Demo script showing full indexing/updating lifecycle
│   ├── setup.bat               # Windows Environment & Dependency Setup Script
│   ├── setup.sh                # macOS/Linux Environment & Dependency Setup Script
│   ├── index.tq                # Compressed codebase vector index (auto-generated)
│   ├── example_project/        # Sample codebase directory for demonstration
│   │   ├── app/Http/Controllers/UserController.php
│   │   └── scripts/data_processor.py
│   └── README.md
```
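The split between per-project files and shared files can be expressed as a few path helpers. This is a sketch under assumptions. The helper names and the model file name model.onnx are hypothetical. Only the two directory locations come from the description above.

```python
from pathlib import Path

def project_index_path(project_root="."):
    """Per-project artifacts live under <project>/.TurboQuantex/."""
    return Path(project_root) / ".TurboQuantex" / "index.tq"

def shared_model_dir(home=None):
    """ONNX model files live once per user under ~/.turboquantex/model."""
    base = Path(home) if home is not None else Path.home()
    return base / ".turboquantex" / "model"

def needs_download(model_file="model.onnx", home=None):
    """Download only if the (hypothetical) model file is missing from the shared cache."""
    return not (shared_model_dir(home) / model_file).is_file()
```

Every project resolves the same shared_model_dir(). A second project therefore finds the model already present and skips the download.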
If you are working with an AI coding assistant (such as Cursor, VS Code AI, or Antigravity) inside a project, simply type this single prompt:
active & use skills : https://github.com/blackmoon87/TurboQuantex
The AI agent will autonomously:
- Set up the .TurboQuantex folder in your project root.
- Check for the ONNX model in the shared global directory (~/.turboquantex/model). If it is not there, the agent downloads it once globally.

If you prefer to configure the environment manually via the terminal, use the automated setup scripts provided inside the folder:
On Windows, run:
```bat
.\.TurboQuantex\setup.bat
```
On macOS/Linux, run:
```bash
chmod +x .TurboQuantex/setup.sh
./.TurboQuantex/setup.sh
```
Start the Flask daemon to enable three capabilities simultaneously:
```bash
python .TurboQuantex/app.py
```
What happens when you start the daemon:
| Capability | Description |
|---|---|
| Embedding Cache | The ONNX model stays loaded in RAM, so searches drop from ~3s (cold) to <9ms (warm) |
| File Watcher | A background thread polls the project directory every 10 seconds. When files change, it waits 5 seconds (debounce) and then auto-updates the vector index, so no manual commands are needed |
| API Server | REST API on http://127.0.0.1:59402 for health checks and programmatic queries |
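The File Watcher row describes polling with a debounce. Below is a minimal sketch of that loop. It assumes modification-time snapshots and a hypothetical extension filter, and the daemon's real change detection and filtering may differ.

```python
import os
import time

def snapshot(root, extensions=(".py", ".php", ".js")):
    """Map each matching file under root to its modification time."""
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories such as .git and .TurboQuantex.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                state[path] = os.path.getmtime(path)
    return state

def changed(old, new):
    """Paths that were added, modified, or removed between two snapshots."""
    return {p for p in old.keys() | new.keys() if old.get(p) != new.get(p)}

def watch(root, on_change, poll=10.0, debounce=5.0, max_polls=None):
    """Poll every `poll` seconds. Once something changes, wait `debounce`
    seconds so that a burst of saves produces a single update."""
    prev = snapshot(root)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(poll)
        polls += 1
        if changed(prev, snapshot(root)):
            time.sleep(debounce)      # let the burst of edits settle
            cur = snapshot(root)      # includes edits made during the wait
            diff = changed(prev, cur)
            if diff:
                on_change(diff)       # e.g. trigger an incremental index update
            prev = cur
```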
To verify the daemon and file watcher status:
```bash
curl http://127.0.0.1:59402/api/health
```
Response:
```json
{
  "status": "ok",
  "uptime_seconds": 142.3,
  "model_loaded": true,
  "index_detected": true,
  "file_watcher": {
    "active": true,
    "changes_detected": 5,
    "status": "Successfully updated. Indexed 2 files, removed 0 files."
  }
}
```
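A script or agent can poll the same endpoint before relying on warm searches. Below is a small sketch that uses only the standard library. The readiness rule (status ok, model loaded, index detected) is an assumption drawn from the fields in the sample response, not a documented contract.

```python
import json
import urllib.request

HEALTH_URL = "http://127.0.0.1:59402/api/health"

def fetch_health(url=HEALTH_URL, timeout=2.0):
    """GET the daemon's health document and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

def is_ready(health):
    """True when the daemon is up, the model is warm, and an index exists."""
    return (
        health.get("status") == "ok"
        and health.get("model_loaded") is True
        and health.get("index_detected") is True
    )

def watcher_summary(health):
    """One-line description of the file watcher block."""
    fw = health.get("file_watcher") or {}
    state = "active" if fw.get("active") else "inactive"
    return f"watcher {state}, {fw.get('changes_detected', 0)} change(s): {fw.get('status', 'n/a')}"
```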
The CLI (tq.py) is a terminal utility to scan, chunk, index, search, and update large directories recursively.
To scan a directory, generate embeddings, compress them, and save the binary database:
```bash
python .TurboQuantex/tq.py index --dir . --index .TurboQuantex/index.tq
```
| Flag | Description |
|---|---|
| --dir | Directory containing source code to scan |
| --index | Path for the output compressed index file (.tq) |
| --bits | Quantization bits: 2, 3, 4, or auto (adaptive) |
| --use-qjl | Enable 1-bit QJL residual correction (default: True) |
| --qjl-dim | QJL sketch dimension: 64, 128, or 256 (default: 128) |
| --chunk-size | Maximum characters per code chunk (default: 1200) |
| --overlap | Character overlap between chunks (default: 200) |
| --extensions | Comma-separated extensions filter (e.g., .py,.php,.js) |
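The --chunk-size and --overlap flags describe sliding character windows. The sketch below shows one way to compute such windows along with their start_line/end_line metadata. It is a simplified assumption, not the tool's actual chunker, since TurboQuantex may snap chunks to line or scope boundaries.

```python
def chunk_text(text, chunk_size=1200, overlap=200):
    """Split text into overlapping character windows with 1-based line ranges."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    chunks = []
    step = chunk_size - overlap
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append({
            # Line of the first character: newlines before it, plus one.
            "start_line": text.count("\n", 0, start) + 1,
            # Line of the last character (index end - 1).
            "end_line": text.count("\n", 0, end - 1) + 1,
            "text": text[start:end],
        })
        if end == len(text):
            break
        start += step
    return chunks
```

Each window advances by chunk_size − overlap characters (1000 with the defaults). Text near a boundary therefore appears in two chunks, so code cut by a window edge can still be retrieved.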
Text output (human readable):
```bash
python .TurboQuantex/tq.py search --index .TurboQuantex/index.tq --query "database insert user record" --top-k 5
```
JSON output (for AI agents and scripts):
```bash
python .TurboQuantex/tq.py search --index .TurboQuantex/index.tq --query "database insert user record" --top-k 5 --format json
```
Response:
```json
{
  "status": "success",
  "results": [
    {
      "file_path": "app/Models/User.php",
      "start_line": 45,
      "end_line": 78,
      "score": 0.8432,
      "language": "php",
      "scope": "function insertUser",
      "text": "..."
    }
  ]
}
```
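Agents that drive the CLI from Python can use a sketch like the one below. It runs the JSON search and turns the results into file:line locations. It assumes tq.py writes only the JSON document to stdout, and the helper names are hypothetical.

```python
import json
import subprocess
import sys

def run_search(query, index=".TurboQuantex/index.tq", top_k=5):
    """Invoke the CLI with --format json and return the decoded payload."""
    cmd = [sys.executable, ".TurboQuantex/tq.py", "search", "--index", index,
           "--query", query, "--top-k", str(top_k), "--format", "json"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)

def locations(payload, min_score=0.0):
    """Turn a search payload into 'path:start-end (score)' strings, best match first."""
    if payload.get("status") != "success":
        return []
    hits = sorted(payload.get("results", []), key=lambda r: r["score"], reverse=True)
    return [f"{r['file_path']}:{r['start_line']}-{r['end_line']} ({r['score']:.2f})"
            for r in hits if r["score"] >= min_score]
```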
Filter by programming language:
```bash
python .TurboQuantex/tq.py search --index .TurboQuantex/index.tq --query "auth logic" --language python --format json
```
Run multiple queries with a single index load, which is ideal for AI agents exploring a codebase:
```bash
# Comma-separated queries
python .TurboQuantex/tq.py search-batch --index .TurboQuantex/index.tq --queries "auth logic,database connection,file upload" --top-k 3 --format json
# Or from a file (one query per line)
python .TurboQuantex/tq.py search-batch --index .TurboQuantex/index.tq --queries queries.txt --top-k 3 --format json
```
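The two --queries forms suggest a simple rule. An existing file path is read line by line, and anything else is split on commas. The sketch below implements that assumed rule; the real tq.py may decide differently.

```python
import os

def parse_queries(arg):
    """Read one query per line if `arg` is an existing file, else split it on commas."""
    if os.path.isfile(arg):
        with open(arg, encoding="utf-8") as fh:
            items = [line.strip() for line in fh]
    else:
        items = [part.strip() for part in arg.split(",")]
    return [q for q in items if q]  # drop blank lines and empty comma slots
```

Under this rule, a query that itself contains a comma has to go in a file.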
When code files change, only modified or new files are re-indexed. Unmodified files are loaded instantly from the cache:
```bash
python .TurboQuantex/tq.py update --dir . --index .TurboQuantex/index.tq --format json
```
Note: If the daemon with the file watcher is running, updates happen automatically, so you don't need this command.
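The incremental update re-embeds only files whose content changed. The sketch below shows that bookkeeping. It assumes a {relative path: SHA-256} manifest is saved with the index and uses a hypothetical extension filter. TurboQuantex may track changes differently, for example by modification time.

```python
import hashlib
from pathlib import Path

def file_digest(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def plan_update(root, manifest, extensions=(".py", ".php", ".js")):
    """Compare current files with the manifest from the last index run.
    Returns (files to re-index, files to drop, new manifest)."""
    root = Path(root)
    current = {}
    for p in root.rglob("*"):
        rel = p.relative_to(root)
        # Skip hidden paths such as .git/ and .TurboQuantex/.
        if p.is_file() and p.suffix in extensions and not any(s.startswith(".") for s in rel.parts):
            current[rel.as_posix()] = file_digest(p)
    to_index = sorted(p for p, h in current.items() if manifest.get(p) != h)
    to_remove = sorted(p for p in manifest if p not in current)
    return to_index, to_remove, current
```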
View index statistics and compression metrics:
```bash
# Human-readable
python .TurboQuantex/tq.py stats --index .TurboQuantex/index.tq
# JSON output
python .TurboQuantex/tq.py stats --index .TurboQuantex/index.tq --format json
```
JSON response:
```json
{
  "file_path": "index.tq",
  "version": 2,
  "model_id": "all-MiniLM-L6-v2",
  "total_chunks": 86,
  "dimensions": 384,
  "bits": 4,
  "compression_ratio": 7.11,
  "savings_percent": 85.94,
  "original_bytes": 132096,
  "compressed_bytes": 18576,
  "disk_bytes": 156287
}
```
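The reported metrics can be re-derived from the raw byte counts. Here is a small helper, plus a per-chunk breakdown:

```python
def compression_metrics(stats):
    """Recompute ratio and savings from the byte counts in `tq.py stats --format json`."""
    ratio = stats["original_bytes"] / stats["compressed_bytes"]
    savings = (1 - stats["compressed_bytes"] / stats["original_bytes"]) * 100
    return round(ratio, 2), round(savings, 2)

def per_chunk_bytes(stats):
    """Bytes per chunk before compression (float32) and after."""
    n = stats["total_chunks"]
    return stats["original_bytes"] / n, stats["compressed_bytes"] / n
```

With the sample numbers, each chunk shrinks from 1536 bytes (384 float32 values) to 216 bytes. Of those, 192 bytes are the 4-bit codes (384 × 4 / 8) and 16 bytes are a 128-bit QJL sketch at the default --qjl-dim. The remaining 8 bytes are presumably per-vector scalars such as norms, which is an assumption. disk_bytes exceeds compressed_bytes, presumably because the file also stores chunk text and metadata.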
Register an auto-update git hook so the index updates on every commit:
```bash
python .TurboQuantex/tq.py install-hook
```
Through the skill API (turboquantex_skill.py), other scripts or AI agents can import the codebase search skill programmatically:
```python
import sys
sys.path.append('./.TurboQuantex')

from turboquantex_skill import index_codebase, query_codebase, update_codebase, query_codebase_batch

# 1. Full codebase indexing (defaults to adaptive bits)
stats = index_codebase(dir_path=".", index_file=".TurboQuantex/index.tq", bits="auto")
print(f"Index size on disk: {stats['disk_size_kb']} KB")

# 2. Semantic query: returns file_path, start_line, end_line, score, language, scope
matches = query_codebase(index_file=".TurboQuantex/index.tq", query="password hashing logic", top_k=3)
for m in matches:
    print(f"{m['file_path']}:{m['start_line']} [{m['language']}] score={m['score']:.4f}")

# 3. Batch query: multiple queries, single index load
batch = query_codebase_batch(
    index_file=".TurboQuantex/index.tq",
    queries=["auth middleware", "database connection", "file upload handler"],
    top_k=3,
)
for query, results in batch.items():
    print(f"\n--- {query} ---")
    for r in results:
        print(f"  {r['file_path']}:{r['start_line']} ({r['score']:.4f})")

# 4. Incremental update after modifying files
update_stats = update_codebase(dir_path=".", index_file=".TurboQuantex/index.tq")
print(update_stats['status'])
```
TurboQuantex uses a versioned index format to prevent silent corruption:
| Field | Description |
|---|---|
| version | Index format version (current: 2). Incremented when the data schema changes |
| model_id | Embedding model identifier (all-MiniLM-L6-v2). Ensures search results are consistent |
If you update TurboQuantex and the index format has changed, you'll get a clear error message:
```text
Error: Index was created with version 1, current version is 2.
Please re-index with: python .TurboQuantex/tq.py index --dir . --index .TurboQuantex/index.tq
```
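Below is a sketch of the kind of header check that produces this message. The header dict, the exception class, and the model_id mismatch message are hypothetical. Only the version error text above comes from TurboQuantex.

```python
CURRENT_VERSION = 2
MODEL_ID = "all-MiniLM-L6-v2"
REINDEX_HINT = "python .TurboQuantex/tq.py index --dir . --index .TurboQuantex/index.tq"

class IndexVersionError(RuntimeError):
    """Raised instead of silently searching an incompatible index."""

def check_header(header):
    """Refuse an index built with a different schema version or embedding model."""
    version = header.get("version")
    if version != CURRENT_VERSION:
        raise IndexVersionError(
            f"Index was created with version {version}, current version is {CURRENT_VERSION}.\n"
            f"Please re-index with: {REINDEX_HINT}"
        )
    if header.get("model_id") != MODEL_ID:
        # Hypothetical message: vectors from another model are not comparable.
        raise IndexVersionError(
            f"Index was built with model {header.get('model_id')!r}, expected {MODEL_ID!r}.\n"
            f"Please re-index with: {REINDEX_HINT}"
        )
```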
Every indexed code chunk carries rich metadata:
| Field | Description | Example |
|---|---|---|
| file_path | Relative path to the source file | app/Models/User.php |
| start_line | First line number of the chunk | 45 |
| end_line | Last line number of the chunk | 78 |
| language | Auto-detected programming language from file extension | python, php, javascript |
| scope | Active function/class context at the chunk boundary | def process_payment |
| score | Cosine similarity score (0 to 1) | 0.8432 |
Language detection covers 18+ extensions natively. Unknown extensions fall back to "unknown".
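Both the language and scope fields can be approximated with small heuristics. In this sketch the extension table is an illustrative subset: only python, php, and javascript appear in the table above, and the other names are assumptions. scope_at simply returns the nearest def/class/function header at or above a chunk's first line. That is a guess at how the scope field is derived.

```python
import re
from pathlib import Path

# Illustrative subset only; names other than python/php/javascript are assumptions.
EXT_TO_LANG = {".py": "python", ".php": "php", ".js": "javascript", ".ts": "typescript"}

# Optional PHP-style modifiers are matched but kept out of the captured scope.
SCOPE_RE = re.compile(
    r"^\s*(?:(?:public|private|protected|static)\s+)*"
    r"((?:async\s+)?def\s+\w+|class\s+\w+|function\s+\w+)"
)

def detect_language(path):
    """Map a file path to a language by extension, falling back to 'unknown'."""
    return EXT_TO_LANG.get(Path(path).suffix.lower(), "unknown")

def scope_at(lines, start_line):
    """Nearest def/class/function header at or above the 1-based start_line."""
    for line in reversed(lines[:start_line]):
        m = SCOPE_RE.match(line)
        if m:
            return m.group(1)
    return None
```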
We have packaged a demo script, example_usage.py, that runs through the complete lifecycle. Run the script from the command line:
```bash
python .TurboQuantex/example_usage.py
```
This will walk through indexing and updating the bundled example_project directory.

The simplest setup is to start the daemon once and forget about it:
```bash
python .TurboQuantex/tq.py index --dir . --index .TurboQuantex/index.tq
python .TurboQuantex/app.py
```
When working with an AI coding assistant (Cursor, VS Code AI, or Antigravity), the agent automatically discovers and uses TurboQuantex:
1. It reads .cursorrules or important_instruction_4coder_agent.md at the project root.
2. It imports turboquantex_skill.py and runs queries:
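```python
import sys
sys.path.append('./.TurboQuantex')  # make turboquantex_skill importable from the project root

from turboquantex_skill import query_codebase
results = query_codebase(index_file=".TurboQuantex/index.tq", query="database connection settings")
```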
For developers who prefer commit-triggered updates:
```bash
python .TurboQuantex/tq.py install-hook
git commit -m "feat: add user login endpoint"
```
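Below is a sketch of what installing a commit hook involves. It assumes a post-commit hook that runs the update command shown earlier. The hook contents are hypothetical, and install-hook may write something different. A careful installer would also append to an existing hook instead of overwriting it, as this sketch does.

```python
import stat
from pathlib import Path

# Hypothetical hook body: re-index changed files after every commit, never blocking it.
HOOK_BODY = """#!/bin/sh
python .TurboQuantex/tq.py update --dir . --index .TurboQuantex/index.tq --format json >/dev/null 2>&1 || true
"""

def install_post_commit_hook(repo_root="."):
    """Write .git/hooks/post-commit (overwriting any existing hook) and mark it executable."""
    hook = Path(repo_root) / ".git" / "hooks" / "post-commit"
    hook.parent.mkdir(parents=True, exist_ok=True)
    hook.write_text(HOOK_BODY)
    hook.chmod(hook.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook
```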
Use JSON output for automated analysis:
```bash
# Check index health in CI
python .TurboQuantex/tq.py stats --index .TurboQuantex/index.tq --format json | jq '.compression_ratio'
# Batch-search for code patterns
python .TurboQuantex/tq.py search-batch --index .TurboQuantex/index.tq --queries "hardcoded password,SQL injection,eval(" --format json
```
Measured on a standard development machine (no GPU required):
| Metric | Value |
|---|---|
| Compression throughput | 11,489 vectors/sec |
| Search throughput | 19,505 vectors/sec |
| Per-chunk search latency | ~53 µs |
| Warm query (86 chunks) | 8.9 ms |
| Cold query (model load + search) | 342 ms |
| Batch 5 queries (warm) | 38.4 ms (7.7 ms/query) |
| 1-file incremental update (warm) | ~50 ms |
| Compression ratio (4-bit + QJL) | 7.11x |
| RAM savings | 85.94% |
| Pearson correlation (fidelity) | 0.912 |
When the daemon is running on http://127.0.0.1:59402:
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/health | Daemon health, file watcher status, uptime |
| GET | /api/status | In-memory document count and compression stats |
| POST | /api/local_query | Search a .tq index file via API |
| POST | /api/index | Index a text document into in-memory store |
| POST | /api/search | Search in-memory documents |
| GET | /api/config | Current engine configuration |
| POST | /api/embed | Generate embedding for text |
| POST | /api/reset | Clear in-memory document store |
```bash
curl -X POST http://127.0.0.1:59402/api/local_query \
  -H "Content-Type: application/json" \
  -d '{"query": "user authentication", "index_file": ".TurboQuantex/index.tq", "top_k": 3}'
```
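Here is the same request from Python, using only the standard library. The request body mirrors the curl example. The response format of /api/local_query is not documented here, so the sketch returns the decoded JSON as-is.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:59402"

def build_local_query(query, index_file=".TurboQuantex/index.tq", top_k=3, base_url=BASE_URL):
    """Build the POST request that the curl example sends."""
    body = json.dumps({"query": query, "index_file": index_file, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/local_query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def local_query(query, **kwargs):
    """Send the request to a running daemon and decode its JSON reply."""
    with urllib.request.urlopen(build_local_query(query, **kwargs), timeout=5) as resp:
        return json.load(resp)
```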