RAG · Enterprise · 2024

Enterprise RAG Copilot for a 12k-doc Knowledge Base

A copilot for 900 internal analysts that grounds answers in 12,400 policy, legal, and commercial documents, with citations they can defend to clients.

3.1× · Analyst throughput
780ms · p95 answer latency
0 · Hallucinated citations
12,400 · Docs indexed
The problem

Analysts spent 40% of the week searching proprietary PDFs. Compliance refused to let any content leave the VPC.

A previous vendor shipped a chatbot that hallucinated clause numbers; after one lawsuit scare, it was torn out.

The ask was strict: grounded answers, citable page spans, zero data egress, and p95 latency under one second.

The approach
  1. Ran a 10-day readiness sprint: document taxonomy, chunking strategy per doc type, eval set of 320 expert questions.

  2. Deployed Pinecone in the client VPC with hybrid search (BM25 + dense), page-span citations and strict answer-grounding rules.

  3. Built a LangGraph supervisor: router → retriever → verifier → answerer, with refusal when grounding confidence is low.

  4. Shipped a Next.js workspace with analyst-side tools: quoting, redlining, export to their brief templates.
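Step 1's per-doc-type chunking can be sketched as a simple dispatch table. The rules below (split contracts on numbered clause headings so citations can point at exact clauses, split policy memos on paragraphs) are illustrative assumptions; the actual taxonomy isn't detailed here.

```python
import re

def chunk_by_clause(text):
    """Contracts: split on numbered clause headings like '3.1' so a
    citation can later point at an exact clause. Illustrative only."""
    parts = re.split(r"(?m)^(?=\d+(?:\.\d+)*\s)", text)
    return [p.strip() for p in parts if p.strip()]

def chunk_by_paragraph(text):
    """Policy memos: plain blank-line paragraph splits."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

# Hypothetical doc-type names; the real taxonomy came out of the
# 10-day readiness sprint and is not reproduced here.
CHUNKERS = {"contract": chunk_by_clause, "policy": chunk_by_paragraph}

def chunk(doc_type, text):
    return CHUNKERS.get(doc_type, chunk_by_paragraph)(text)
```

The point of the dispatch is that retrieval quality and citation granularity are set at ingest time, per document type, not by one global chunk size.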
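Step 2's hybrid search needs a way to merge the BM25 and dense rankings. The case study doesn't say which fusion method was used; reciprocal rank fusion (RRF) is one common choice and makes a minimal sketch:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked lists of doc IDs.

    Each doc scores sum(1 / (k + rank)) over the lists it appears in;
    k=60 is the conventional constant. Docs ranked well by both the
    lexical and the dense retriever float to the top.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# BM25 and dense retrieval disagree on order; fusion rewards overlap.
bm25 = ["doc_a", "doc_b", "doc_c"]
dense = ["doc_b", "doc_d", "doc_a"]
print(rrf_fuse([bm25, dense]))  # → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

RRF is attractive here because it needs no score normalization between BM25 and cosine similarity, which live on incompatible scales.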
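Step 3's verifier-then-refuse behavior can be illustrated with a toy grounding gate. The 0.8 threshold and the verbatim-match heuristic below are assumptions for the sketch, not the production verifier:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    page: int
    text: str

# Hypothetical cutoff; in practice it would be tuned against the
# 320-question expert eval set.
GROUNDING_THRESHOLD = 0.8

def grounding_confidence(sentences, chunks):
    """Fraction of answer sentences found verbatim in some retrieved
    chunk -- a crude stand-in for the real verifier."""
    if not sentences:
        return 0.0
    supported = sum(
        1 for s in sentences
        if any(s.lower() in c.text.lower() for c in chunks)
    )
    return supported / len(sentences)

def answer_or_refuse(sentences, chunks):
    """Refuse rather than answer when grounding is weak; otherwise
    attach (doc_id, page) citations for the supporting chunks."""
    if grounding_confidence(sentences, chunks) < GROUNDING_THRESHOLD:
        return None, []  # refusal: no grounded answer available
    citations = sorted({(c.doc_id, c.page) for c in chunks})
    return " ".join(sentences), citations
```

The design choice worth noting is that refusal is a first-class output of the pipeline, which is what makes a "zero hallucinated citations" target achievable at all.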

Architecture
[Architecture diagram: router → retriever → verifier → answerer pipeline, deployed inside the client VPC]
Outcome
  • 3.1× analyst throughput on briefing tasks, measured by tracked time-to-first-draft.

  • p95 answer latency under 800ms even on complex multi-doc questions.

  • 0 documented hallucinated citations in the first 8 weeks of rollout.

  • Passed internal security and legal review on first submission.

What I owned end-to-end
  • Retrieval architecture and eval harness
  • LangGraph agent design and refusal logic
  • Front-end analyst workflow with shadcn/ui
  • Vendor and cost model for OpenAI + Pinecone