RAG and embeddings: how project search finds the right moment

Background vectors, hybrid search, and why you don't send whole transcripts every time

Project chat in Scriba can span dozens of linked meetings. Sending every transcript to the model on every message would be slow, expensive, and noisy. Instead we use retrieval-augmented generation: a background pipeline turns your content into vectors, and at query time we fetch only the passages that matter.

The embedding pipeline

When transcript segments, summaries, knowledge items, or notes change, Scriba enqueues work in embedding_queue. A background worker drains that queue, calls the embedding API, and stores 1536-dimensional float vectors in the local embeddings table. Nothing blocks the UI — indexing happens continuously while you work.
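As a rough illustration of that queue-and-worker flow, here is a minimal sketch in Python. The table names embedding_queue and embeddings come from the description above; the column names, the batch size, the float32 BLOB layout, and the fake_embed stand-in for the real embedding call are all assumptions made for the example, not Scriba's actual schema or code.

```python
import sqlite3
import struct

DIM = 1536  # vector width mentioned in the article


def fake_embed(text: str) -> list[float]:
    """Hypothetical stand-in for the real embedding API call."""
    # Deterministic placeholder values; not a real embedding.
    return [float(len(text) % 7)] * DIM


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with an assumed, simplified schema."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE embedding_queue "
        "(id INTEGER PRIMARY KEY, source_id TEXT, content TEXT)"
    )
    db.execute("CREATE TABLE embeddings (source_id TEXT PRIMARY KEY, vector BLOB)")
    return db


def drain_queue(db: sqlite3.Connection, embed=fake_embed, batch_size: int = 16) -> int:
    """Embed up to batch_size queued items; return how many were processed."""
    rows = db.execute(
        "SELECT id, source_id, content FROM embedding_queue ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for queue_id, source_id, content in rows:
        vector = embed(content)
        # Pack the floats as a float32 BLOB (an assumed storage format).
        blob = struct.pack(f"{len(vector)}f", *vector)
        db.execute(
            "INSERT OR REPLACE INTO embeddings (source_id, vector) VALUES (?, ?)",
            (source_id, blob),
        )
        db.execute("DELETE FROM embedding_queue WHERE id = ?", (queue_id,))
    db.commit()
    return len(rows)
```

A worker would call drain_queue in a loop on a background thread, sleeping briefly whenever it returns 0, so the UI thread never waits on the embedding API.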

  • Managed mode routes embed requests through Supabase's embed edge function (Voyage-backed).
  • BYOK mode calls OpenAI text-embedding-3-small directly with your key.
  • Sliding windows — five segments with a step of three — capture multi-utterance context, not isolated sentences.
  • If no embedding key is available, search falls back to FTS5 keyword retrieval only.
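The sliding-window rule in the list above (five segments, step of three) can be sketched as follows. The function name and the handling of the final, shorter window are assumptions for illustration; the article only specifies the window size and step.

```python
def sliding_windows(segments: list[str], size: int = 5, step: int = 3) -> list[list[str]]:
    """Group consecutive segments into overlapping windows.

    With size=5 and step=3, neighbouring windows share two segments.
    The last window may be shorter than `size` (an assumed choice).
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    windows: list[list[str]] = []
    for start in range(0, len(segments), step):
        windows.append(segments[start:start + size])
        # Stop once a window has reached the end of the transcript.
        if start + size >= len(segments):
            break
    return windows


segments = [f"s{i}" for i in range(10)]
for window in sliding_windows(segments):
    print(window)
```

Each window would then be joined into a single string and embedded as one chunk, so a question and the answer two utterances later end up in the same vector.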

When you ask a project question, Scriba does not grep the whole corpus blindly. semantic_search combines vector similarity (cosine distance on stored embeddings) with SQLite FTS5 full-text search (BM25 ranking). Vector search finds paraphrases and conceptual matches: "budget freeze" still surfaces a segment that said "we're pausing spend." Keyword search catches the exact names, ticket IDs, and acronyms that embeddings sometimes smear.
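To make the idea of combining two rankings concrete, here is a small sketch. The article does not say how semantic_search merges the vector and keyword results, so the reciprocal rank fusion used here is an assumption, chosen because it needs only rank positions rather than comparable scores. The keyword ranking is passed in as a plain list standing in for FTS5/BM25 output, and all function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_vector(query_vec: list[float], chunk_vecs: dict[str, list[float]]) -> list[str]:
    """Return chunk ids ordered from most to least similar to the query."""
    return sorted(
        chunk_vecs,
        key=lambda cid: cosine_similarity(query_vec, chunk_vecs[cid]),
        reverse=True,
    )


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several rankings; each list contributes 1 / (k + position)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for position, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + position)
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)


vector_ranking = rank_by_vector(
    [1.0, 0.0],
    {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]},
)
keyword_ranking = ["c", "a"]  # stand-in for BM25-ordered FTS5 hits
print(reciprocal_rank_fusion([vector_ranking, keyword_ranking]))
```

In this toy case chunk "c" is the weakest vector match but the top keyword hit, so fusion lifts it above "b", which only the vector side found. That is the behaviour you want for exact names and ticket IDs.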

RAG is not magic memory. It is a budget: pull the best N chunks, inject them into the system prompt, and let the model reason over focused evidence instead of a hundred thousand tokens of transcript.
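One way to picture that budget is a greedy packer that takes the highest-scoring chunks until it hits a chunk limit or a token limit. This is a sketch under stated assumptions: the limits, the four-characters-per-token estimate, the separator, and the names Chunk and build_context are all hypothetical and not taken from Scriba.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # relevance score from retrieval; higher is better


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumed): about four characters per token."""
    return max(1, len(text) // 4)


def build_context(chunks: list[Chunk], max_chunks: int = 8, token_budget: int = 2000) -> str:
    """Pick the best chunks that fit the budget and join them for the prompt."""
    picked: list[str] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if len(picked) == max_chunks:
            break
        cost = estimate_tokens(chunk.text)
        if used + cost > token_budget:
            continue  # too large for the remaining budget; try a smaller chunk
        picked.append(chunk.text)
        used += cost
    return "\n---\n".join(picked)
```

The returned string would be injected into the system prompt as evidence. Skipping an oversized chunk rather than stopping lets smaller, lower-ranked chunks still use the remaining budget.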

How it shows up in chat

Project chat system prompts are tiered. Persistent project Brain memory provides the long arc. RAG retrieval adds meeting-specific evidence for the current question. The saved project summary sits outside the retrieval budget so the model can update it in place. Meeting chat uses Brain memory per meeting; project chat is where hybrid search earns its keep.

Reindex and status

You can inspect pipeline health from the app — queue depth, indexed chunk counts, last error. trigger_reindex forces a full pass when you bulk-import old recordings or fix a bad transcript. Managed embeddings share the same quota envelope as other AI features; BYOK users pay OpenAI directly. Either way, vectors stay in your local SQLite — not in a shared multi-tenant search index on our servers.

The goal is simple: ask "what did we decide about the API versioning?" across six sprints of standups and get an answer grounded in the actual standup, not a hallucinated consensus. Embeddings are the index; hybrid search is the librarian.
