RAGFuse — Modular Python RAG Toolkit (OSS)
A vendor‑agnostic, pluggable toolkit that lets teams swap embeddings, vector stores, and LLMs like LEGO — no rewrites, no lock‑in, production‑ready ergonomics.
I’m documenting the journey on Medium — @amanSinghRajput.media
How it started
I dove deep into RAG while building real apps. Every repo felt the same: tightly coupled to one embedding provider, one vector database, and one LLM. Swapping any piece meant rewriting the pipeline. That’s fine for demos, but it collapses under real‑world constraints like cost changes, outages, or policy shifts.
What it is
RAGFuse abstracts RAG complexity through a layered architecture with clear separation of concerns. Use high‑level APIs for quick wins, or compose lower‑level components for customization.
- Document Processing Pipeline: Convert docs to searchable embeddings
- Vector Storage Layer: Unified interface across vector DB backends
- Retrieval–Generation Chain: Semantic search + LLM generation
Today
- Providers: Embeddings (Hugging Face default all‑MiniLM‑L6‑v2), Vector Stores (Chroma with persist dir + filters; Pinecone v3 serverless with namespace/metadata; Weaviate Cloud with BM25 + with_hybrid), LLMs (OpenAI chat).
- Docs: TXT, PDF (pypdf), DOCX (python‑docx) with chunking.
- Interfaces: Python API + CLI + FastAPI with parity (/ingest, /query, /delete, /purge).
- Tests: Unit tests stub network (OpenAI/Chroma) — fast, deterministic CI.
What makes it different
- Interfaces, not vendors: EmbeddingProvider, VectorStore, LLMProvider, DocumentProcessor are tiny contracts. Swap implementations in minutes (see the sketch after this list).
- Separation of concerns: Ingestion → storage → retrieval → generation are independent layers for maintainability.
- Production‑first: Error handling, persistence, filters, and testability are built in from day one.
- Contribution‑ready: New providers need a small adapter + stubbed tests — simple to extend.
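To give a sense of how small these contracts are, here is an illustrative sketch of an embedding contract written as a typing.Protocol. The method names and signatures are assumptions for illustration, not the exact RAGFuse interface:

from typing import List, Protocol

class EmbeddingProvider(Protocol):
    """Illustrative contract only; method names are assumptions, not the published API."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed a batch of document chunks into vectors."""
        ...

    def embed_query(self, text: str) -> List[float]:
        """Embed a single query string."""
        ...

Any class that satisfies this shape (Hugging Face, OpenAI, or an in-house model) can back the same pipeline without touching ingestion or retrieval code.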
Where we are now (v0)
- HF sentence‑transformers (MiniLM‑L6‑v2), ChromaDB with persistence
- PDF/TXT/DOCX processors with chunking
- End‑to‑end retrieval + generation via RAGPipeline
from ragfuse.pipeline import RAGPipeline
rag = RAGPipeline()
rag.add_documents(["mydoc.pdf", "Some inline text"])
resp = rag.ask("What's in the document?")
print(resp.answer)
Why it exists
- Reality > demos: Real apps evolve. Teams try new vector stores, models, and providers.
- No lock‑in: Cost/policy/outage risks — one‑vendor stacks become incidents.
- Developer joy: Small clean interfaces → faster iteration, fewer rewrites.
Components
- RAGPipeline: add_documents(), ask(), configure(), get_similar_documents(); usage sketched after this list
- EmbeddingProvider: HuggingFace, OpenAI, Cohere
- VectorStore: ChromaStore, PineconeStore, WeaviateStore
- DocumentProcessor: PDF, Text, Markdown, Docx
- LLMProvider: OpenAI, HF, Anthropic; Chains: QA, Summarization, Compliance, Custom
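A rough end-to-end sketch of those RAGPipeline methods follows. The keyword arguments are assumptions to illustrate intent, not the exact signatures:

from ragfuse import RAGPipeline

pipe = RAGPipeline()

# Keyword arguments below are illustrative assumptions, not exact signatures.
pipe.configure(chunk_size=512, top_k=5)              # tune chunking/retrieval
pipe.add_documents(["handbook.pdf", "notes.txt"])    # ingest and index

docs = pipe.get_similar_documents("vacation carryover")   # retrieval only
resp = pipe.ask("How many vacation days carry over?")     # retrieval + generation
print(resp.answer)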
Quick start (CLI)
# Ingest a file into Chroma (local persist)
export RAGFUSE_VECTOR_STORE_PROVIDER=chroma
poetry run ragfuse ingest docs/sample.txt --collection demo --persist-dir ./chroma_db
# Ask a question
poetry run ragfuse ask "What is RAG?" --k 5 --collection demo --persist-dir ./chroma_db
# Weaviate (hybrid)
export RAGFUSE_VECTOR_STORE_PROVIDER=weaviate
export RAGFUSE_VECTOR_STORE_WEAVIATE_URL="https://<cluster>.<region>.weaviate.cloud"
export RAGFUSE_VECTOR_STORE_WEAVIATE_API_KEY="<wcs_api_key>"
export RAGFUSE_VECTOR_STORE_WEAVIATE_CLASS="RAGFuseDocuments"
poetry run ragfuse ingest docs/sample.txt
poetry run ragfuse ask "What is RAG?" --hybrid
# Pinecone v3 (serverless)
export RAGFUSE_VECTOR_STORE_PROVIDER=pinecone
export RAGFUSE_VECTOR_STORE_PINECONE_API_KEY="pcsk_***"
export RAGFUSE_VECTOR_STORE_PINECONE_INDEX="ragfuse-dev"
export RAGFUSE_VECTOR_STORE_PINECONE_NAMESPACE="default"
export RAGFUSE_VECTOR_STORE_PINECONE_ENVIRONMENT="us-east-1"
poetry run ragfuse ingest docs/sample.txt
poetry run ragfuse ask "What is RAG?" --k 5
Python API
from ragfuse import RAGPipeline
pipe = RAGPipeline(embedding_provider="huggingface", vector_store="chroma", llm_provider="openai")
pipe.add_documents(["/docs/policies.pdf", "/docs/guide.md"])
print(pipe.ask("What is the leave policy?"))
A CLI and a REST server are also available for the ingestion and query flows, with Swagger UI at /docs.
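For the REST flow, here is a minimal client sketch against the endpoints listed earlier (/ingest, /query). The JSON field names are assumptions, so check the Swagger UI at /docs for the actual request schemas:

import requests

BASE = "http://localhost:8000"

# Field names in these payloads are assumptions; see /docs for the real schemas.
requests.post(f"{BASE}/ingest", json={"paths": ["docs/sample.txt"], "collection": "demo"})

resp = requests.post(f"{BASE}/query", json={"question": "What is RAG?", "k": 5})
print(resp.json())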
Resilience & Security
- Typed exceptions: DocumentProcessingError, EmbeddingError, VectorStoreError, LLMError, ConfigurationError
- Retry with jitter, circuit breaker, structured logging (see the retry sketch after this list)
- Secrets hygiene, TLS everywhere, PII controls, audit logs, RBAC
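As a minimal sketch of the retry-with-jitter idea (independent of RAGFuse's internal helpers, whose names may differ):

import random
import time

def retry_with_jitter(fn, *, attempts=3, base_delay=0.5, retry_on=(Exception,)):
    """Call fn(), retrying on the given exceptions with exponential backoff plus
    random jitter so concurrent workers don't retry in lockstep."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            delay = base_delay * (2 ** (attempt - 1))      # 0.5s, 1s, 2s, ...
            time.sleep(delay + random.uniform(0, delay))   # add up to 100% jitter

# Example usage (the import path here is an assumption):
# from ragfuse.exceptions import EmbeddingError
# vectors = retry_with_jitter(lambda: provider.embed(texts), retry_on=(EmbeddingError,))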
Performance
- Batch embeddings with GPU acceleration, model caching, quantization (batching example after this list)
- Approximate search (HNSW), filter‑first retrieval, hot‑query caching
- Streaming/chunked processing; connection pooling; memory monitoring
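The batching point largely comes from the underlying sentence-transformers behavior. A small example with the default MiniLM model (parameter values here are illustrative):

from sentence_transformers import SentenceTransformer

# Load once and reuse; model loading dominates cold-start latency.
model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = ["first chunk of text", "second chunk of text"] * 256
embeddings = model.encode(
    chunks,
    batch_size=64,                # larger batches help most on GPU
    normalize_embeddings=True,    # cosine similarity reduces to a dot product
    show_progress_bar=False,
)
print(embeddings.shape)           # (512, 384) for all-MiniLM-L6-v2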
Deployment
# docker-compose.yml
services:
  ragfuse:
    build: .
    ports:
      - "8000:8000"
    environment:
      - VECTOR_STORE=chroma
      - EMBEDDING_MODEL=all-MiniLM-L6-v2
    volumes:
      - ./data:/app/data
# Kubernetes (excerpt)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ragfuse-api
spec:
  replicas: 3
  selector:            # selector/labels are required by apps/v1 Deployments
    matchLabels:
      app: ragfuse
  template:
    metadata:
      labels:
        app: ragfuse
    spec:
      containers:
        - name: ragfuse
          image: ragfuse:latest
          env:
            - name: VECTOR_STORE
              value: "pinecone"
Testing
Unit + integration tests with pytest; coverage target ≥80%.
# Run all tests
pytest
# Coverage
pytest --cov=ragfuse
# Specific module
pytest tests/test_embeddings.py
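The stubbing pattern the unit tests rely on looks roughly like this; the attribute patched on the pipeline is an assumption about RAGFuse internals, shown only to illustrate the idea of keeping tests offline and deterministic:

# test_pipeline_stubbed.py -- illustrative pattern; internal attribute names are assumptions.
from ragfuse import RAGPipeline

class FakeEmbedder:
    """Deterministic stand-in: no model download, no network calls."""
    def embed_documents(self, texts):
        return [[0.0, 1.0, 0.0] for _ in texts]

    def embed_query(self, text):
        return [0.0, 1.0, 0.0]

def test_retrieval_with_stubbed_embeddings(monkeypatch):
    pipe = RAGPipeline()
    # Swap the real provider for the fake; the attribute name is an assumption.
    monkeypatch.setattr(pipe, "embedding_provider", FakeEmbedder(), raising=False)

    pipe.add_documents(["inline text about retrieval-augmented generation"])
    docs = pipe.get_similar_documents("retrieval-augmented generation")
    assert docs  # retrieval works without hitting any external service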
Why open‑source?
- Flexibility: standardize interfaces, not vendors.
- Trust: transparent code paths for security & privacy reviews.
- Community: adapters and patterns evolve with real use‑cases.
Publishing soon. Follow @amanSinghRajput.media on Medium for updates.
Status
Building in public. First release: Chroma/FAISS, HF/OpenAI embeddings, QA chain, CLI, docs, examples. Pinecone and Weaviate adapters next.