Retrieval Systems AI

I was building a docs assistant and kept seeing two-tower retrieval and RAG used as if they were the same thing.
They are related, but they sit at different layers of the stack, so this is the quick mental model I use.

1. Two-tower embedding model (dual encoder)

# Query tower and document tower produce vectors in the same space.
q_vec = query_tower.encode("how to reset password")
doc_vecs = document_tower.encode_batch(documents)

# Fast retrieval via vector similarity.
scores = cosine_similarity(q_vec, doc_vecs)
top_k_docs = top_k(documents, scores, k=5)

A two-tower model is about efficient retrieval at scale.
I can precompute document embeddings once, then only encode the query at runtime and run nearest-neighbor search.
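As a concrete sketch of that precompute-then-search split (a minimal example assuming sentence-transformers and numpy; the model name and sample documents are illustrative, and a single bi-encoder stands in for both towers):

# Minimal sketch: precompute document vectors offline, encode only the query online.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

documents = [
    "Reset your password from the account settings page.",
    "Rotate storage keys in the portal under the security settings.",
]

# Offline: compute and store document embeddings once.
doc_vecs = model.encode(documents, normalize_embeddings=True)

# Online: encode the query, then run nearest-neighbor search.
q_vec = model.encode("how to reset password", normalize_embeddings=True)
scores = doc_vecs @ q_vec            # cosine similarity (vectors are normalized)
top_idx = np.argsort(-scores)[:5]    # indices of the best-matching documents
top_docs = [documents[i] for i in top_idx]

At larger scale, the brute-force dot product would be replaced by an approximate nearest-neighbor index, but the shape of the computation stays the same.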

Typical use cases:

  1. Search candidate retrieval
  2. Recommendations (products, ads, videos)
  3. First-stage ranking over very large corpora

Docs: Vector search overview (Azure AI Search)

2. RAG (Retrieval-Augmented Generation)

# 1) Retrieve relevant chunks
chunks = retriever.search("how do I rotate storage keys?", k=5)

# 2) Ground the prompt with retrieved context
prompt = build_prompt(question="how do I rotate storage keys?", context=chunks)

# 3) Generate answer with an LLM
answer = llm.generate(prompt)

RAG is a system pattern: retrieve context first, then generate an answer grounded in that context.
It improves factuality and makes answers reflect your own documents instead of relying only on what the model learned during pretraining.
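The grounding step is just prompt assembly. A minimal sketch of a build_prompt like the one in the snippet above could look like this (the wording and chunk formatting are my own illustration, not a fixed recipe):

# Sketch of prompt assembly: number the retrieved chunks and ask the model
# to answer only from them.
def build_prompt(question: str, context: list[str]) -> str:
    context_block = "\n\n".join(
        f"[{i + 1}] {chunk}" for i, chunk in enumerate(context)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is not sufficient, say so.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )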

Typical use cases:

  1. Chatbots that answer questions over unstructured data:
     • Internal knowledge chat
     • Question answering over private docs
  2. Support assistants grounded in current company data

Docs: Retrieval Augmented Generation (RAG) in Azure AI Search

3. How they fit together

User query
-> Two-tower retrieval (fast candidate search)
-> Optional reranker
-> RAG prompt assembly
-> LLM answer

A practical setup is to use a two-tower model for fast retrieval and then pass the best chunks into a RAG pipeline.
So: two-tower is mostly about finding relevant content quickly, and RAG is about using that content to produce better answers.
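Here is a rough end-to-end sketch of that combined flow, reusing the illustrative pieces from the earlier snippets (model, doc_vecs, documents, and build_prompt are the hypothetical objects defined above; llm.generate stands in for whatever LLM client you actually use):

# Sketch: two-tower retrieval feeding a RAG pipeline.
import numpy as np

def answer(question: str, k: int = 5) -> str:
    # 1) Two-tower retrieval: encode the query, score against precomputed vectors.
    q_vec = model.encode(question, normalize_embeddings=True)
    scores = doc_vecs @ q_vec
    chunks = [documents[i] for i in np.argsort(-scores)[:k]]

    # 2) Optional reranker would go here (e.g. a cross-encoder over the top chunks).

    # 3) RAG: ground the prompt in the retrieved chunks and generate.
    prompt = build_prompt(question=question, context=chunks)
    return llm.generate(prompt)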

Topic            Two-tower model                RAG
Main goal        Fast relevance retrieval       Grounded answer generation
Core output      Ranked docs/items              Natural language answer
Runtime pattern  Encode query + vector search   Retrieve + prompt + generate
Common pairing   Used as retriever              Uses retriever output

Docs: What is Azure AI Search? and Use embeddings and vector search