RAG Tech Stack for Long-form Text

As of June 2024:

LLM: Gemini 1.5 Flash
Database: Vespa
Embeddings: text-embedding-3-small (Open AI)
Reranker: Cohere rerank 3

Relevance Scoring: Hybrid (normalized cosine similarity and bm25), over several fields
Chunking: 1024 tokens, one document with an array of chunks
Overlap: None
Retrieval: Either entire document, or best chunk n with chunk n-1 and n+1

VPS: Hetzner, cloud in Virginia, US