RAG Systems — Learning Track

Stage 1: What is RAG?

You’ll know this when… you can explain why RAG exists, when it’s better than fine-tuning or long context windows, and sketch out how a basic RAG pipeline works.

Key Concepts

The problem RAG solves: giving LLMs access to your specific data without retraining
RAG vs. fine-tuning vs. large context windows — when to use each
The basic pipeline: chunk → embed → store → retrieve → generate
Why “garbage in, garbage out” applies to retrieval even more than to generation: the model can only ground its answer in the chunks it is given
Real-world use cases: documentation search, customer support, knowledge bases
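
The pipeline above can be sketched end to end in a few lines. This is a toy illustration, not a production design: the embed function here is a word-count stand-in for a real embedding model, the "store" is a Python list, and the generate step is represented only by the prompt that would be sent to an LLM. All function names are made up for the example.

```python
import math
import re
from collections import Counter


def chunk(text: str) -> list[str]:
    """Split on blank lines: one paragraph per chunk."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, store: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Rank stored chunks by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    """The generate step would send this prompt to an LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {query}"


document = (
    "pgvector adds vector search to Postgres.\n\n"
    "Chunk overlap keeps sentences intact.\n\n"
    "Re-ranking rescores retrieved results."
)
question = "How does vector search work in Postgres?"

store = [(c, embed(c)) for c in chunk(document)]  # chunk -> embed -> store
top = retrieve(question, store, k=1)              # retrieve
prompt = build_prompt(question, top)              # input to generate
```

Swapping the toy embed function for a real embedding API and the list for a vector database changes the quality of retrieval, but not the shape of the pipeline.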

Recommended Resources

Practice Project

Analyze your own data for RAG potential. Take your Intelligence Hub briefings from src/content/intelligence/ and answer: How would you chunk them? By article? By section? By sentence? Write out a chunking strategy document with examples from 3 real briefings, explaining your rationale.
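
To make the three granularities concrete, here is a minimal sketch that chunks one briefing by article, by section, and by sentence. The briefing text is invented, and the assumption that sections start with "## " headings is hypothetical. Adjust the split rules to whatever format your files actually use.

```python
import re

# Hypothetical briefing; real files may be structured differently.
briefing = """# Weekly Briefing (hypothetical)

## Model releases
A new model shipped this week. It supports longer context.

## Tooling
An MCP server update landed. Docs were refreshed.
"""


def by_article(text: str) -> list[str]:
    """One chunk for the whole briefing."""
    return [text.strip()]


def by_section(text: str) -> list[str]:
    """Split before each '## ' heading, keeping the heading with its body."""
    parts = re.split(r"(?m)^(?=## )", text)
    return [p.strip() for p in parts if p.strip().startswith("## ")]


def by_sentence(text: str) -> list[str]:
    """Drop heading lines, then split after sentence-ending punctuation."""
    body = " ".join(
        line for line in text.splitlines() if line and not line.startswith("#")
    )
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", body) if s.strip()]


for name, fn in [("article", by_article), ("section", by_section), ("sentence", by_sentence)]:
    print(name, len(fn(briefing)))
```

Larger chunks carry more context per hit but dilute the embedding. Smaller chunks match precisely but can lose the surrounding meaning, which is the trade-off the strategy document should argue through.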

Stage 2: Embeddings & Vector Databases

You’ll know this when… you understand what embeddings are, can generate them via API, and can store/query them in a vector database.

Key Concepts

Embeddings: turning text into numbers that capture meaning
Similarity search: cosine similarity, nearest neighbors
Vector database options: pgvector on Neon Postgres, Pinecone, Chroma, Weaviate
Embedding models: Voyage AI, OpenAI, Cohere
Chunking strategies: fixed-size, semantic, recursive splitting
Metadata filtering — combining vector search with traditional filters

Recommended Resources

Practice Project

Embed your briefings into Neon. Use Voyage AI (or another embedding API) to generate embeddings for each briefing item across all your Intelligence Hub data. Store them in a Neon Postgres database with pgvector. Write a script that takes a natural language query and returns the 5 most relevant briefing items. Test with: “What happened with Claude recently?” and “MCP server updates.”
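
One possible shape for the storage and query side, assuming pgvector and a psycopg-style driver that uses %s placeholders. The table name, column names, and the 1024 dimension are assumptions for illustration. The dimension must match whatever embedding model you choose. The code only builds SQL and parameters and does not connect to a database.

```python
# Schema sketch. Table and column names are hypothetical.
CREATE_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS briefing_items (
    id BIGSERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    body TEXT NOT NULL,
    embedding vector(1024)  -- assumed dimension; match your embedding model
);
"""

# <=> is pgvector's cosine distance operator (smaller means more similar).
SEARCH_SQL = """
SELECT id, title
FROM briefing_items
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_vector_literal(values) -> str:
    """Format numbers as pgvector's text form, e.g. '[0.5,1.0]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def search_params(query_embedding, k: int = 5) -> tuple[str, int]:
    """Parameters to pass alongside SEARCH_SQL."""
    return (to_vector_literal(query_embedding), k)
```

With a database cursor in hand, the query would then run as something like cur.execute(SEARCH_SQL, search_params(embedding)), where embedding comes from the same model used to embed the stored items.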

Stage 3: Building a RAG Pipeline

You’ll know this when… you can build a complete RAG system: user asks a question, you retrieve relevant context, and Claude generates an answer grounded in your data.

Key Concepts

The retrieval step: query → embed → search → rank → select top-k
Context window management: how much retrieved context to include
Prompt design for RAG: instructing Claude to use only the provided context
Handling “I don’t know” — when retrieved context doesn’t answer the question
Citation and source attribution in generated answers
Hybrid search: combining vector similarity with keyword matching
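
One common way to combine vector and keyword results is reciprocal rank fusion (RRF), which merges ranked lists using only each item's position in each list. This is a sketch of that one technique, not the only way to do hybrid search. The document IDs are made up, and k=60 is a conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each list contributes 1 / (k + rank) per item."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Hypothetical results from the two searches, best match first.
vector_hits = ["b1", "b2", "b3"]
keyword_hits = ["b3", "b1", "b4"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Items that appear in both lists rise to the top. Because RRF uses ranks only, the raw vector and keyword scores never need to share a scale.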

Recommended Resources

Practice Project

Build “Ask the Intelligence Hub.” Create an endpoint or CLI tool where you can ask questions about AI trends and get answers grounded in your briefing history. Pipeline: embed the question → search your Neon vector DB → take the top 5 results → send them to Claude with the prompt “Answer based only on these briefing items and cite your sources.” Test with 10 questions and evaluate answer quality.
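
The prompt-assembly step might look like the sketch below. It numbers each retrieved item so the model has something to cite, and adds an explicit instruction for the case where the items do not contain the answer. The item fields (title, date, text) are assumptions about how your briefing items are stored. The call to the model itself is left out.

```python
def build_rag_prompt(question: str, items: list[dict]) -> str:
    """Assemble a grounded prompt from retrieved briefing items."""
    if not items:
        sources = "(no briefing items were retrieved)"
    else:
        # Number each item so answers can cite it as [1], [2], ...
        sources = "\n\n".join(
            f"[{i}] {item['title']} ({item['date']})\n{item['text']}"
            for i, item in enumerate(items, start=1)
        )
    return (
        "Answer based only on these briefing items and cite your sources "
        "using their bracketed numbers, e.g. [1]. If the items do not "
        "contain the answer, say you don't know.\n\n"
        f"Briefing items:\n{sources}\n\n"
        f"Question: {question}"
    )


# Hypothetical retrieved item.
example_items = [
    {"title": "Example briefing", "date": "2025-01-01", "text": "An MCP server update landed."},
]
prompt = build_rag_prompt("What changed with MCP servers?", example_items)
```

Keeping the numbering stable between the prompt and your own records lets you map each citation in the answer back to a specific briefing item.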

Stage 4: Evaluation & Optimization

You’ll know this when… you can measure RAG system quality, identify failure modes, and systematically improve retrieval and generation.

Key Concepts

RAG evaluation metrics: retrieval precision/recall, answer faithfulness, relevance
Common failure modes: wrong chunks retrieved, hallucination despite context, “lost in the middle” (relevant passages placed mid-prompt get less attention than those at the start or end)
Chunking optimization: testing different sizes and overlap strategies
Re-ranking: using a second model to re-score retrieved results
Contextual retrieval: prepending chunk-level context before embedding
Evaluation frameworks: RAGAS, manual scoring rubrics
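
Retrieval precision and recall are simple enough to compute by hand before reaching for a framework. A minimal sketch, assuming you have the IDs your retriever returned and a manually labeled set of relevant IDs for each question (the IDs below are invented):

```python
def precision_recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> tuple[float, float]:
    """Precision@k: share of the top k that is relevant.
    Recall@k: share of all relevant items that appear in the top k."""
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical retriever output (best first) and hand-labeled relevant set.
retrieved = ["b7", "b2", "b9", "b4", "b1"]
relevant = {"b2", "b4", "b5"}

precision, recall = precision_recall_at_k(retrieved, relevant, k=5)
print(f"precision@5={precision:.2f} recall@5={recall:.2f}")
```

Low recall points at retrieval or chunking problems. Faithfulness and answer relevance have to be judged separately, on the generated answer itself.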

Recommended Resources

Practice Project

Evaluate and improve your “Ask the Hub” system. Create a test set of 20 questions with known answers (manually verified from your briefings). Run your RAG pipeline on all 20 and score each answer for correctness and faithfulness. Identify the worst 5 answers, diagnose why they failed (bad retrieval? bad chunking? bad prompt?), fix the root cause, and re-test. Track your score improvement.
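
A small harness helps keep the scoring consistent across runs. The sketch below assumes each answer has been manually scored from 0 to 1 on correctness and faithfulness, and it averages the two with equal weight. Both the scale and the weighting are assumptions you may want to change. The sample results are invented.

```python
from statistics import mean


def summarize(results: list[dict], worst_n: int = 5) -> dict:
    """Average the two manual scores per question and list the weakest answers."""
    scored = [
        {**r, "score": (r["correctness"] + r["faithfulness"]) / 2}
        for r in results
    ]
    worst = sorted(scored, key=lambda r: r["score"])[:worst_n]
    return {
        "mean_score": mean(r["score"] for r in scored),
        "worst": [r["question"] for r in worst],
    }


# Hypothetical manual scores for three test questions.
run_1 = [
    {"question": "Q1", "correctness": 1.0, "faithfulness": 1.0},
    {"question": "Q2", "correctness": 0.0, "faithfulness": 0.5},
    {"question": "Q3", "correctness": 1.0, "faithfulness": 0.5},
]

summary = summarize(run_1, worst_n=2)
print(summary)
```

Saving one summary per run, after each fix, gives you the score history the project asks you to track.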