
Vector Search vs Semantic Search: They're Not the Same Thing

Vector search, semantic search, keyword search, hybrid search — these terms get used interchangeably but they mean different things. This post breaks down what each actually does, when each matters, and why hybrid search wins for RAG.

Alexandre Agius


AWS Solutions Architect


“We need semantic search” has become the default request for any project involving GenAI. But when you dig into what people actually mean, you find four different concepts being used interchangeably: keyword search, vector search, semantic search, and hybrid search. They’re related, but they’re not the same thing — and picking the wrong one for your RAG system means either missing relevant results or burning budget on unnecessary complexity.

The Problem

The terminology is a mess. Vendor marketing doesn’t help — every database with a vector column now claims to offer “semantic search.” The confusion leads to two common mistakes:

  1. Teams implement pure vector search and call it semantic search. They embed documents, run k-NN similarity, and wonder why searching for “error code E-4012” returns generic error handling docs instead of the specific error definition.

  2. Teams skip keyword search entirely because it feels old. BM25 is a 30-year-old algorithm, so it must be obsolete now that we have embeddings. Except it consistently outperforms vector search for exact-match queries — and in most enterprise datasets, users search for specific codes, IDs, and terms more often than vague concepts.

The result: RAG systems that give impressive demos but fail on real queries.

The Solution

There are four distinct search approaches, each building on the previous one. Understanding what each does — and what it misses — is the key to picking the right one.

[Figure: Search taxonomy showing how keyword, vector, semantic, and hybrid search each process the same query, what they find, and what they miss]

The punchline: for most RAG workloads, you want hybrid search — keyword and vector search running in parallel with score fusion. It’s the only approach that catches both exact matches and meaning-based matches. Here’s why.

How It Works

Keyword Search (BM25)

BM25 is the algorithm behind keyword search in OpenSearch, Elasticsearch, and most search engines built in the last three decades. It scores documents based on three things:

  • Term Frequency (TF) — How often does the search term appear in the chunk? More occurrences score higher, but with diminishing returns — 10 mentions isn’t 10x better than 1.
  • Inverse Document Frequency (IDF) — Is the term rare across all chunks? Rare terms score higher. “E-4012” scores much higher than “the.”
  • Document Length — Short chunks with the term score higher than long chunks with the same term. The term is more concentrated.
Query: "error code E-4012"

Chunk A (200 words): "...error code E-4012 occurs when the connection pool..."
  -> High score: exact terms present, short chunk, "E-4012" is rare (high IDF)

Chunk B (2000 words): "...various error codes include E-1001, E-2003, E-4012..."
  -> Lower score: term present but chunk is long, appears once among many

Chunk C (200 words): "...the application crashes due to timeout issues..."
  -> Zero score: none of the query terms appear

BM25 uses an inverted index — a pre-built lookup table mapping every term to the documents containing it. This makes keyword search extremely fast. No ML model, no GPU, no embedding. Just a dictionary lookup with scoring.
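The scoring described above can be sketched in a few lines. This is a minimal, illustrative implementation of the BM25 formula with the standard parameters k1 = 1.2 and b = 0.75; a real engine adds tokenization, an inverted index, and sums the score over every query term:

```python
import math

def bm25_score(term_freq, doc_len, avg_doc_len, n_docs, docs_with_term,
               k1=1.2, b=0.75):
    """Score one query term against one chunk using the BM25 formula."""
    # IDF: terms that are rare across the corpus score higher
    idf = math.log((n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5) + 1)
    # TF with saturation: more occurrences help, with diminishing returns;
    # the b term penalizes chunks longer than the corpus average
    tf = (term_freq * (k1 + 1)) / (
        term_freq + k1 * (1 - b + b * doc_len / avg_doc_len)
    )
    return idf * tf

# "E-4012" appears once in a 200-word chunk, and in only 1 of 1000 chunks
rare = bm25_score(term_freq=1, doc_len=200, avg_doc_len=500,
                  n_docs=1000, docs_with_term=1)
# "error" appears once in the same chunk, but in 800 of 1000 chunks
common = bm25_score(term_freq=1, doc_len=200, avg_doc_len=500,
                    n_docs=1000, docs_with_term=800)
print(rare > common)  # the rare identifier dominates the score
```

The same function reproduces the Chunk A vs Chunk B behavior: the identical term in a 2000-word chunk scores lower than in a 200-word chunk.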

Catches: Exact terms, codes, IDs, product names, error codes, specific phrases.

Misses: Synonyms, paraphrases, conceptual similarity. “App keeps crashing” won’t find “system instability due to resource exhaustion” because they share no words.

Vector Search (k-NN)

Vector search converts text into a mathematical representation (a vector of floats) using an embedding model. Texts with similar meaning end up as nearby points in a high-dimensional space. At query time, you convert the question into a vector and find the k nearest neighbors.

Embedding model converts:
  "application crashes intermittently"  -> [0.023, -0.841, 0.112, ...]
  "system experiences sporadic failures" -> [0.019, -0.830, 0.098, ...]
  "error code E-4012"                   -> [0.445, 0.221, -0.667, ...]

The first two are close together (similar meaning).
The third is far away (unrelated meaning).
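"Close together" is typically measured with cosine similarity, the cosine of the angle between two vectors. Here is a minimal sketch using the truncated toy vectors above; real embeddings have hundreds to thousands of dimensions, but the math is identical:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

crashes  = [0.023, -0.841, 0.112]   # "application crashes intermittently"
failures = [0.019, -0.830, 0.098]   # "system experiences sporadic failures"
code     = [0.445,  0.221, -0.667]  # "error code E-4012"

print(cosine_similarity(crashes, failures))  # close to 1.0
print(cosine_similarity(crashes, code))      # much lower (negative here)
```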

The k-NN search finds the closest vectors using distance metrics — cosine similarity, L2 (Euclidean), or inner product. On OpenSearch, you can choose which library performs this search:

| Engine | How It Works | Trade-off |
|--------|--------------|-----------|
| FAISS | In-memory graph (HNSW) or inverted file (IVF) | Fastest, but needs RAM for vectors |
| Lucene | Disk-based HNSW with segment caching | Slower, but much cheaper (vectors on disk) |
| NMSLIB | In-memory HNSW | Best recall, but no filtering during search |

All three are free, open-source libraries bundled into OpenSearch. The engine choice affects cost through infrastructure sizing, not licensing. For a deeper dive on engine selection, see the RAG chunking and testing guide.

Catches: Meaning, intent, conceptual similarity. “App crashes” finds “system instability.”

Misses: Specific identifiers. “E-4012” is just a string to the embedding model — it has no semantic meaning. The vector for “E-4012” might be near “E-4013” or “error code” generically, but not specifically near the chunk that explains what E-4012 is.

Semantic Search (Vector + Reranking)

Semantic search is vector search plus additional intelligence layers. The term is often used loosely, but a proper semantic search system adds:

  • Query understanding — Expanding, reformulating, or enriching the query before embedding. “Lambda cold start” might be expanded to include “initialization latency” and “function startup time.”
  • Reranking — A cross-encoder model that takes each (query, result) pair and scores them together. Unlike embeddings which encode query and document independently, rerankers see both at once and produce much better relevance scores.
  • Context awareness — Using conversation history, user profile, or domain context to adjust results.
Pure vector search:
  Query -> Embed -> k-NN -> Top 5 results

Semantic search:
  Query -> Expand/Reformulate -> Embed -> k-NN -> Top 20 candidates
       -> Rerank (cross-encoder scores each pair) -> Top 5 results
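The two-stage pipeline above can be sketched as follows. This is a toy illustration: the word-overlap scorer is a deliberately crude stand-in for a real cross-encoder, and the synonym table is a hypothetical example of query expansion, not a real API:

```python
def expand_query(query, synonyms):
    """Query understanding: enrich the query with known synonyms.
    (A real system might use an LLM or a curated dictionary.)"""
    extra = [s for term, subs in synonyms.items() if term in query for s in subs]
    if not extra:
        return query
    return query + " " + " ".join(extra)

def rerank(query, candidates, score_fn, top_k=5):
    """Rerank stage: score each (query, candidate) pair together,
    keep the best top_k. score_fn stands in for a cross-encoder."""
    return sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)[:top_k]

# Crude cross-encoder stand-in: word overlap between query and candidate
overlap = lambda q, c: len(set(q.lower().split()) & set(c.lower().split()))

synonyms = {"cold start": ["initialization latency", "startup time"]}
query = expand_query("Lambda cold start", synonyms)
candidates = [
    "Reducing function startup time in serverless apps",
    "Billing overview for Lambda",
    "Initialization latency deep dive",
]
print(rerank(query, candidates, overlap, top_k=2))
```

Even with this crude scorer, the expanded query lets the reranker surface the two conceptually relevant documents over the one that merely mentions "Lambda".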

Reranking alone typically improves retrieval quality by 5-15% over pure vector search. On AWS, two rerankers are available:

| Reranker | Pricing | Notes |
|----------|---------|-------|
| Amazon Rerank 1.0 | Included | Not available in us-east-1 |
| Cohere Rerank 3.5 | $2.00/1K queries | Available in more regions |

Catches: Everything vector search catches, but with better ranking. Fewer irrelevant results in the top positions.

Misses: Still misses exact codes and identifiers — it’s still fundamentally based on meaning, not terms.

Hybrid Search (BM25 + k-NN)

Hybrid search runs keyword (BM25) and vector (k-NN) in parallel on the same query, then combines the scores. This is the only approach that catches both exact matches and semantic matches.

Query: "Why does error E-4012 cause the app to crash?"
          |                              |
          v                              v
    BM25 (keyword)                 k-NN (vector)
          |                              |
    Finds: "E-4012 is a              Finds: "application crashes
    DB connection pool               due to connection pool
    timeout error"                   exhaustion and retry
                                     failures"
          |                              |
          v                              v
         Score Fusion (combine & rank)
                      |
                      v
              Both chunks go to LLM

The LLM now has what E-4012 is (from keyword) and how to fix the crash (from vector). Pure vector search would have missed the E-4012 definition. Pure keyword search would have missed the crash remediation.

On AWS, OpenSearch is the only native service with built-in hybrid search — BM25 and k-NN run in a single query. If you’re using another vector store (Aurora pgvector, S3 Vectors, MemoryDB), you’d need to run keyword and vector searches separately and merge results yourself.

# OpenSearch hybrid query — single request, both engines
# Note: hybrid queries require a search pipeline with a normalization
# processor attached, so BM25 and k-NN scores are on a comparable scale
hybrid_query = {
    "size": 5,
    "query": {
        "hybrid": {
            "queries": [
                {
                    "match": {
                        "content": "error E-4012 application crash"
                    }
                },
                {
                    "knn": {
                        "embedding": {
                            "vector": query_embedding,
                            "k": 5
                        }
                    }
                }
            ]
        }
    }
}
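If your vector store has no native hybrid query (Aurora pgvector, S3 Vectors), merging the two ranked lists yourself is usually done with reciprocal rank fusion (RRF), which needs only ranks, not comparable scores. A minimal sketch with hypothetical document IDs:

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked result lists: each document earns 1/(k + rank)
    per list it appears in. k=60 is the commonly used constant."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ranked results for "Why does error E-4012 cause the app to crash?"
bm25_hits = ["e4012-definition", "error-codes-index", "pool-tuning"]
knn_hits  = ["pool-tuning", "e4012-definition", "crash-remediation"]

print(reciprocal_rank_fusion([bm25_hits, knn_hits]))
```

Documents found by both engines rise to the top, which is exactly the behavior you want: the E-4012 definition (strong keyword hit, decent vector hit) outranks results that only one engine surfaced.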

Catches: Both exact terms and semantic meaning. The most complete retrieval approach.

Misses: Very little. The main trade-off is cost and complexity — you need a search engine that supports both BM25 and k-NN (OpenSearch), and your index stores both text fields and vector fields.

The Comparison Table

| | Keyword (BM25) | Vector (k-NN) | Semantic | Hybrid |
|---|----------------|---------------|----------|--------|
| Matches on | Exact words | Meaning | Meaning + ranking | Words + meaning |
| "error E-4012" | Finds it | Likely misses | Likely misses | Finds it |
| "app keeps crashing" | Misses synonyms | Finds them | Finds + ranks them | Finds them |
| Needs ML model | No | Embedding model | Embedding + reranker | Embedding model |
| Speed | Fastest | Engine-dependent | Slower (reranking) | Both run in parallel |
| Index storage | Text (inverted index) | Vectors (RAM or disk) | Vectors + reranker | Text + vectors |
| AWS service | Any OpenSearch | Any with k-NN | Bedrock KB + reranker | OpenSearch only |

Where Each Approach Shines

Use keyword search alone when:

  • Users search for specific identifiers, codes, or exact phrases
  • Your data is structured (logs, tickets, records with known fields)
  • You need maximum speed with zero ML infrastructure

Use vector search alone when:

  • Queries are conversational (“how do I fix this?”)
  • Documents are conceptual (whitepapers, guides, Q&A)
  • Budget is constrained and you’re using S3 Vectors or Aurora pgvector
  • Users never search for specific codes or IDs

Use semantic search when:

  • You’re already doing vector search and want better ranking
  • The top-5 results matter more than the top-20 (reranking improves precision at the top)
  • Budget allows for a reranking step

Use hybrid search when:

  • Your data contains both specific identifiers and conceptual content (most enterprise data)
  • Retrieval quality directly impacts business outcomes
  • You’re building a RAG system for IT support, legal, manufacturing, healthcare, or finance — any domain with codes, IDs, and natural language mixed together

Domain Impact

| Domain | Users search for… | Keyword catches | Vector catches | Need hybrid? |
|--------|-------------------|-----------------|----------------|--------------|
| IT support | Error codes, ticket IDs, service names + symptoms | Codes, IDs | Symptoms, troubleshooting | Yes |
| Legal | Article numbers, case references + legal concepts | Statute IDs | Interpretations | Yes |
| Manufacturing | Part numbers, machine IDs + failure descriptions | Part codes | Failure modes | Yes |
| Healthcare | Drug codes, ICD codes + symptom descriptions | Medical codes | Symptoms, treatments | Yes |
| General Q&A | Mostly "how do I…" questions | Limited value | High value | Optional |

If your domain has specific identifiers that users search for, hybrid search isn’t optional — it’s required.

Cost Reality

Hybrid search means OpenSearch, and OpenSearch means either managed clusters or Serverless:

| Option | Minimum Cost | Hybrid Search |
|--------|--------------|---------------|
| OpenSearch Managed | ~$470/month (small cluster) | Yes |
| OpenSearch Serverless | ~$700/month (4 OCUs min) | Yes |
| S3 Vectors | $0 minimum (pay-per-query) | No |
| Aurora pgvector | ~$60/month (small instance) | No (vector only) |

If your workload is low-volume and purely semantic (no codes or IDs), S3 Vectors or Aurora pgvector save significant cost. If you need hybrid search, OpenSearch Managed gives you the lowest entry point. For a detailed comparison of all vector store options on AWS, see the vector store guide.

What I Learned

  • Vector search is a mechanism, semantic search is a capability — Vector search (k-NN) is the algorithm that finds nearest neighbors. Semantic search is the broader system that uses vector search plus query understanding, reranking, and context. Calling k-NN “semantic search” is like calling a database query “business intelligence.”
  • BM25 is 30 years old and still essential — Every benchmark shows that hybrid search (BM25 + vector) outperforms pure vector search by 10-20%. Old doesn’t mean obsolete. Exact-match retrieval solves problems that embeddings fundamentally cannot.
  • Hybrid search is the right default for enterprise RAG — If your documents contain any codes, IDs, product names, or specific terms, hybrid search is not a nice-to-have. It’s the difference between finding “E-4012 is a timeout error” and returning generic error handling documentation.
  • The cost of hybrid search is the cost of OpenSearch — There’s no free hybrid search option on AWS today. This is the real trade-off: ~$470+/month for better retrieval quality, or $0-60/month with pure vector search. For production RAG systems, the retrieval quality usually justifies the cost.

What’s Next

  • Benchmark hybrid search vs pure vector search on a real enterprise dataset — quantify the 10-20% improvement claim with actual recall@5 and faithfulness scores
  • Test Bedrock Knowledge Bases with OpenSearch Serverless backend in hybrid mode vs S3 Vectors backend — same queries, same documents, measure quality delta
  • Explore OpenSearch neural search plugin for query expansion — automatic synonym and concept injection before retrieval
  • Build a cost model: at what query volume does hybrid search on OpenSearch become cheaper per-query than vector-only on S3 Vectors + reranking?
