RAGFuse — Modular Python RAG Toolkit (OSS)
A vendor‑agnostic, pluggable toolkit that lets teams swap embeddings, vector stores, and LLMs like LEGO — no rewrites, no lock‑in, production‑ready ergonomics.
I’m documenting the journey on Medium — @amanSinghRajput.media
How it started
I dove deep into RAG while building real apps. Every repo felt the same: tightly coupled to one embedding provider, one vector database, and one LLM. Swapping any piece meant rewriting the pipeline. That’s fine for demos, but it collapses under real‑world constraints like cost changes, outages, or policy shifts.
What it is
RAGFuse abstracts RAG complexity through a layered architecture with clear separation of concerns. Use high‑level APIs for quick wins, or compose lower‑level components for customization.
- Document Processing Pipeline: Convert docs to searchable embeddings
- Vector Storage Layer: Unified interface across vector DB backends
- Retrieval–Generation Chain: Semantic search + LLM generation
Today
- Providers: Embeddings (Hugging Face default all‑MiniLM‑L6‑v2), Vector Stores (Chroma with persist dir + filters; Pinecone v3 serverless with namespace/metadata; Weaviate Cloud with BM25 + with_hybrid), LLMs (OpenAI chat).
- Docs: TXT, PDF (pypdf), DOCX (python‑docx) with chunking.
- Interfaces: Python API + CLI + FastAPI with parity (/ingest, /query, /delete, /purge).
- Tests: Unit tests stub network (OpenAI/Chroma) — fast, deterministic CI.
What makes it different
- Interfaces, not vendors: EmbeddingProvider, VectorStore, LLMProvider, DocumentProcessor are tiny contracts. Swap implementations in minutes (see the sketch after this list).
- Separation of concerns: Ingestion → storage → retrieval → generation are independent layers for maintainability.
- Production‑first: Error handling, persistence, filters, and testability are built in from day one.
- Contribution‑ready: New providers need a small adapter + stubbed tests — simple to extend.
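To give a sense of how small these contracts are, here is an illustrative sketch of an embedding contract written as a typing.Protocol. The method names and signatures are assumptions for illustration, not the exact RAGFuse interface:

from typing import List, Protocol

class EmbeddingProvider(Protocol):
    """Illustrative contract only; method names are assumptions, not the published API."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed a batch of document chunks into vectors."""
        ...

    def embed_query(self, text: str) -> List[float]:
        """Embed a single query string."""
        ...

Any class that satisfies this shape (Hugging Face, OpenAI, or an in-house model) can back the same pipeline without touching ingestion or retrieval code.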
Where we are now (v0)
- HF sentence‑transformers (MiniLM‑L6‑v2), ChromaDB with persistence
- PDF/TXT/DOCX processors with chunking
- End‑to‑end retrieval + generation via RAGPipeline
from ragfuse.pipeline import RAGPipeline
rag = RAGPipeline()
rag.add_documents(["mydoc.pdf", "Some inline text"])
resp = rag.ask("What's in the document?")
print(resp.answer)
Why it exists
- Reality > demos: Real apps evolve. Teams try new vector stores, models, and providers.
- No lock‑in: Cost/policy/outage risks — one‑vendor stacks become incidents.
- Developer joy: Small clean interfaces → faster iteration, fewer rewrites.
Components
- RAGPipeline: add_documents(), ask(), configure(), get_similar_documents(); usage sketched after this list
- EmbeddingProvider: HuggingFace, OpenAI, Cohere
- VectorStore: ChromaStore, PineconeStore, WeaviateStore
- DocumentProcessor: PDF, Text, Markdown, Docx
- LLMProvider: OpenAI, HF, Anthropic; Chains: QA, Summarization, Compliance, Custom
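A rough end-to-end sketch of those RAGPipeline methods follows. The keyword arguments are assumptions to illustrate intent, not the exact signatures:

from ragfuse import RAGPipeline

pipe = RAGPipeline()

# Keyword arguments below are illustrative assumptions, not exact signatures.
pipe.configure(chunk_size=512, top_k=5)              # tune chunking/retrieval
pipe.add_documents(["handbook.pdf", "notes.txt"])    # ingest and index

docs = pipe.get_similar_documents("vacation carryover")   # retrieval only
resp = pipe.ask("How many vacation days carry over?")     # retrieval + generation
print(resp.answer)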
Quick start (CLI)
# Ingest a file into Chroma (local persist)
export RAGFUSE_VECTOR_STORE_PROVIDER=chroma
poetry run ragfuse ingest docs/sample.txt --collection demo --persist-dir ./chroma_db
# Ask a question
poetry run ragfuse ask "What is RAG?" --k 5 --collection demo --persist-dir ./chroma_db
# Weaviate (hybrid)
export RAGFUSE_VECTOR_STORE_PROVIDER=weaviate
export RAGFUSE_VECTOR_STORE_WEAVIATE_URL="https://<cluster>.<region>.weaviate.cloud"
export RAGFUSE_VECTOR_STORE_WEAVIATE_API_KEY="<wcs_api_key>"
export RAGFUSE_VECTOR_STORE_WEAVIATE_CLASS="RAGFuseDocuments"
poetry run ragfuse ingest docs/sample.txt
poetry run ragfuse ask "What is RAG?" --hybrid
# Pinecone v3 (serverless)
export RAGFUSE_VECTOR_STORE_PROVIDER=pinecone
export RAGFUSE_VECTOR_STORE_PINECONE_API_KEY="pcsk_***"
export RAGFUSE_VECTOR_STORE_PINECONE_INDEX="ragfuse-dev"
export RAGFUSE_VECTOR_STORE_PINECONE_NAMESPACE="default"
export RAGFUSE_VECTOR_STORE_PINECONE_ENVIRONMENT="us-east-1"
poetry run ragfuse ingest docs/sample.txt
poetry run ragfuse ask "What is RAG?" --k 5
Python API
from ragfuse import RAGPipeline
pipe = RAGPipeline(embedding_provider="huggingface", vector_store="chroma", llm_provider="openai")
pipe.add_documents(["/docs/policies.pdf", "/docs/guide.md"])
print(pipe.ask("What is the leave policy?"))
A CLI and a REST server are also available for the ingestion and query flows, with Swagger UI at /docs.
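For the REST flow, here is a minimal client sketch against the endpoints listed earlier (/ingest, /query). The JSON field names are assumptions, so check the Swagger UI at /docs for the actual request schemas:

import requests

BASE = "http://localhost:8000"

# Field names in these payloads are assumptions; see /docs for the real schemas.
requests.post(f"{BASE}/ingest", json={"paths": ["docs/sample.txt"], "collection": "demo"})

resp = requests.post(f"{BASE}/query", json={"question": "What is RAG?", "k": 5})
print(resp.json())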
Resilience & Security
- Typed exceptions: DocumentProcessingError, EmbeddingError, VectorStoreError, LLMError, ConfigurationError
- Retry with jitter, circuit breaker, structured logging (see the retry sketch after this list)
- Secrets hygiene, TLS everywhere, PII controls, audit logs, RBAC
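As a minimal sketch of the retry-with-jitter idea (independent of RAGFuse's internal helpers, whose names may differ):

import random
import time

def retry_with_jitter(fn, *, attempts=3, base_delay=0.5, retry_on=(Exception,)):
    """Call fn(), retrying on the given exceptions with exponential backoff plus
    random jitter so concurrent workers don't retry in lockstep."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            delay = base_delay * (2 ** (attempt - 1))      # 0.5s, 1s, 2s, ...
            time.sleep(delay + random.uniform(0, delay))   # add up to 100% jitter

# Example usage (the import path here is an assumption):
# from ragfuse.exceptions import EmbeddingError
# vectors = retry_with_jitter(lambda: provider.embed(texts), retry_on=(EmbeddingError,))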
Performance
- Batch embeddings with GPU acceleration, model caching, quantization (batching example after this list)
- Approximate search (HNSW), filter‑first retrieval, hot‑query caching
- Streaming/chunked processing; connection pooling; memory monitoring
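The batching point largely comes from the underlying sentence-transformers behavior. A small example with the default MiniLM model (parameter values here are illustrative):

from sentence_transformers import SentenceTransformer

# Load once and reuse; model loading dominates cold-start latency.
model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = ["first chunk of text", "second chunk of text"] * 256
embeddings = model.encode(
    chunks,
    batch_size=64,                # larger batches help most on GPU
    normalize_embeddings=True,    # cosine similarity reduces to a dot product
    show_progress_bar=False,
)
print(embeddings.shape)           # (512, 384) for all-MiniLM-L6-v2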
Deployment
# docker-compose.yml
services:
  ragfuse:
    build: .
    ports:
      - "8000:8000"
    environment:
      - VECTOR_STORE=chroma
      - EMBEDDING_MODEL=all-MiniLM-L6-v2
    volumes:
      - ./data:/app/data
# Kubernetes (excerpt)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ragfuse-api
spec:
  replicas: 3
  selector:            # selector/labels are required by apps/v1 Deployments
    matchLabels:
      app: ragfuse
  template:
    metadata:
      labels:
        app: ragfuse
    spec:
      containers:
        - name: ragfuse
          image: ragfuse:latest
          env:
            - name: VECTOR_STORE
              value: "pinecone"
Testing
Unit + integration tests with pytest; coverage target ≥80%.
# Run all tests
pytest
# Coverage
pytest --cov=ragfuse
# Specific module
pytest tests/test_embeddings.py
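The stubbing pattern the unit tests rely on looks roughly like this; the attribute patched on the pipeline is an assumption about RAGFuse internals, shown only to illustrate the idea of keeping tests offline and deterministic:

# test_pipeline_stubbed.py -- illustrative pattern; internal attribute names are assumptions.
from ragfuse import RAGPipeline

class FakeEmbedder:
    """Deterministic stand-in: no model download, no network calls."""
    def embed_documents(self, texts):
        return [[0.0, 1.0, 0.0] for _ in texts]

    def embed_query(self, text):
        return [0.0, 1.0, 0.0]

def test_retrieval_with_stubbed_embeddings(monkeypatch):
    pipe = RAGPipeline()
    # Swap the real provider for the fake; the attribute name is an assumption.
    monkeypatch.setattr(pipe, "embedding_provider", FakeEmbedder(), raising=False)

    pipe.add_documents(["inline text about retrieval-augmented generation"])
    docs = pipe.get_similar_documents("retrieval-augmented generation")
    assert docs  # retrieval works without hitting any external service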
Why open‑source?
- Flexibility: standardize interfaces, not vendors.
- Trust: transparent code paths for security & privacy reviews.
- Community: adapters and patterns evolve with real use‑cases.
Publishing soon. Follow @amanSinghRajput.media on Medium for updates.
Status
Building in public. First release: Chroma/FAISS, HF/OpenAI embeddings, QA chain, CLI, docs, examples. Pinecone and Weaviate adapters next.