Retrieval infra

Vector storage and semantic search systems for production retrieval.

Use this page when your application needs a reliable place to store embeddings, run similarity search, filter results, and support retrieval at scale.

Search layer

These projects sit behind RAG systems, semantic search products, recommendation features, and other embedding-heavy workloads.

Not the chat app

A vector database stores embeddings and retrieves them by similarity. It is not the application itself: it usually works alongside an app, framework, model runtime, and ingestion pipeline.
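To make the store-and-retrieve role concrete, here is a minimal sketch of what any of these systems does at its core: keep embeddings with metadata, filter, and rank by similarity. Everything in it is an assumption for illustration: `Record`, `TinyVectorStore`, and the `where` parameter are hypothetical names, and the search is exact brute force rather than the approximate indexes real vector databases use.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Record:
    """One stored item: an id, its embedding, and filterable metadata."""
    id: str
    vector: list
    metadata: dict = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store: exact (brute-force) search, no index."""

    def __init__(self):
        self._records = {}

    def upsert(self, record):
        # Insert a new record or overwrite an existing one with the same id.
        self._records[record.id] = record

    def search(self, query, k=3, where=None):
        """Return the top-k (id, score) pairs among records matching `where`."""
        where = where or {}
        # Filter first on exact metadata matches, then score what remains.
        candidates = [
            r for r in self._records.values()
            if all(r.metadata.get(key) == value for key, value in where.items())
        ]
        scored = [(r.id, cosine(query, r.vector)) for r in candidates]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = TinyVectorStore()
store.upsert(Record("a", [1.0, 0.0], {"lang": "en"}))
store.upsert(Record("b", [0.9, 0.1], {"lang": "de"}))
store.upsert(Record("c", [0.0, 1.0], {"lang": "en"}))

# Only English records are considered, so "b" is excluded despite being close.
hits = store.search([1.0, 0.0], k=2, where={"lang": "en"})
print(hits)
```

The projects on this page replace the brute-force loop with approximate indexes, persistence, and richer filter expressions, but the shape of the API is broadly similar.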

Why it works

  • Operational vector stores

    Qdrant, Chroma, Weaviate, Milvus, LanceDB, Marqo, and Vespa are common choices when you need metadata filtering, predictable APIs, and retrieval-specific operations.

  • Search engines and hybrid retrieval

    Typesense, Meilisearch, Elasticsearch, and OpenSearch fit teams that need keyword search, filters, facets, and vector or hybrid search together.

  • Embedded and database-native retrieval

    pgvector, pgvectorscale, and sqlite-vec keep vector search close to application data when adding another service is not worth the operational cost.

Curated repositories

Vector databases and retrieval storage

14 projects
qdrant/qdrant

Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

30.8k stars | 2.2k forks | Rust | License: Apache-2.0
Topics: neural-network, search-engine, knn-algorithm
chroma-core/chroma

Data infrastructure for AI

27.6k stars | 2.2k forks | Rust | License: Apache-2.0
Topics: database, rust, rust-lang
weaviate/weaviate

Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.

16.1k stars | 1.3k forks | Go | License: BSD-3-Clause
Topics: search-engine, semantic-search, semantic-search-engine
milvus-io/milvus

Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search

44k stars | 4k forks | Go | License: Apache-2.0
Topics: anns, nearest-neighbor-search, faiss
lancedb/lancedb

Developer-friendly OSS embedded retrieval library for multimodal AI. Search More; Manage Less.

10.1k stars | 863 forks | HTML | License: Apache-2.0
Topics: approximate-nearest-neighbor-search, image-search, nearest-neighbor-search
pgvector/pgvector

Open-source vector similarity search for Postgres

21k stars | 1.2k forks | C | License: NOASSERTION
Topics: nearest-neighbor-search, approximate-nearest-neighbor-search
typesense/typesense

Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences

25.7k stars | 879 forks | C++ | License: GPL-3.0
Topics: search-engine, search, typo-tolerance
meilisearch/meilisearch

A lightning-fast search engine API bringing AI-powered hybrid search to your sites and applications.

57.3k stars | 2.5k forks | Rust | License: NOASSERTION
Topics: search-engine, typo-tolerance, site-search
elastic/elasticsearch

Free and Open Source, Distributed, RESTful Search Engine

76.6k stars | 25.9k forks | Java | License: NOASSERTION
Topics: elasticsearch, java, search-engine
opensearch-project/OpenSearch

🔎 Open source distributed and RESTful search engine.

12.8k stars | 2.5k forks | Java | License: Apache-2.0
Topics: search, search-engine, analytics
vespa-engine/vespa

AI + Data, online. https://vespa.ai

6.9k stars | 711 forks | Java | License: Apache-2.0
Topics: vespa, search-engine, big-data
marqo-ai/marqo

Ecommerce Search and Discovery - marqo.ai

5k stars | 233 forks | Python | License: Apache-2.0
Topics: multi-modal, search-engine, machine-learning
asg017/sqlite-vec

A vector search SQLite extension that runs anywhere!

7.5k stars | 306 forks | C | License: Apache-2.0
Topics: sqlite, sqlite-extension
timescale/pgvectorscale

Postgres extension for vector search (DiskANN), complements pgvector for performance and scale. Postgres OSS licensed.

3k stars | 135 forks | Rust | License: PostgreSQL
Retrieval-Augmented Generation (RAG) & Knowledge

Selection guide

How to choose a self-hosted vector database

Choose based on deployment model, filtering, hybrid search, scale, and whether retrieval should live in a standalone service or inside an existing database.

  • Standalone vector databases

    Qdrant, Weaviate, Milvus, Marqo, and Vespa are stronger fits when retrieval needs a dedicated service with scaling and operational controls.

  • Developer-friendly retrieval layers

    Chroma, LanceDB, and sqlite-vec can be easier to start with for local apps, notebooks, embedded retrieval, or multimodal workflows.

  • Postgres-first stacks

    pgvector and pgvectorscale fit when embeddings should stay with relational data, SQL, joins, and existing Postgres operations.

  • Search engines with vector support

    Typesense, Meilisearch, Elasticsearch, and OpenSearch fit teams that need full-text, filters, facets, and vector or hybrid search in the same retrieval layer.

Retrieval quality

Hybrid search and metadata filtering matter

For RAG and semantic search, ranking quality often depends on filtering, full-text search, sparse vectors, reranking, and metadata design, not only nearest-neighbor speed.

  • RAG backends

    Look for reliable indexing, predictable APIs, language SDKs, backup options, and observability around retrieval behavior. Dedicated vector stores and search engines solve different parts of the same retrieval problem.
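Hybrid search needs some way to combine a keyword ranking with a vector ranking. The page does not name a fusion method, so the sketch below uses reciprocal rank fusion (RRF), one widely used option, purely as an illustration. The function name, the document ids, and the constant `k=60` (a conventional default) are assumptions, not any listed project's API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists: each list adds 1 / (k + rank) to a document's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top hit of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical result lists from a full-text query and a vector query.
keyword_hits = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc5", "doc3"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

RRF only looks at rank positions, so it avoids having to normalize keyword scores against similarity scores. Engines that implement hybrid search may use this or a different weighting scheme, so check each project's documentation before relying on a particular behavior.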

Operations

Embedded, standalone, or distributed

A small private assistant may not need a distributed vector database. Larger retrieval products may need replication, sharding, RBAC, multi-tenancy, Kubernetes support, and backup plans.

  • Lower ops burden

    Chroma, LanceDB, sqlite-vec, or pgvector can be simpler when scale is modest or the team already runs Postgres or SQLite.

  • Higher scale

    Milvus, Weaviate, Qdrant, Vespa, Elasticsearch, and OpenSearch are better fits when retrieval becomes a dedicated production service.
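As a rough picture of what sharding means for retrieval, a distributed query is typically scattered to every shard and the per-shard results are merged into one global top-k. The sketch below assumes hypothetical in-memory shards (plain dicts) and exact dot-product scoring as a stand-in for a real index; it does not reflect how any listed project implements distribution.

```python
import heapq


def search_shard(shard, query, k):
    """Exact top-k on one shard using dot-product scores (stand-in for an ANN index)."""
    scored = (
        (sum(q * v for q, v in zip(query, vector)), doc_id)
        for doc_id, vector in shard.items()
    )
    return heapq.nlargest(k, scored)


def search_all(shards, query, k):
    """Scatter the query to every shard, then merge the per-shard top-k lists."""
    partial = [hit for shard in shards for hit in search_shard(shard, query, k)]
    # Each shard returned its own best k, so the global best k is among them.
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, partial)]


# Two hypothetical shards, each mapping document id to embedding.
shards = [
    {"a": [1.0, 0.0], "b": [0.2, 0.8]},
    {"c": [0.9, 0.1], "d": [0.0, 1.0]},
]

top = search_all(shards, [1.0, 0.0], k=2)
print(top)
```

In a real deployment the per-shard calls run in parallel over the network, and replication, failure handling, and tenant isolation add the operational weight described above.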

Suggested additions

Strong candidates not yet in the registry

Apache Solr

apache/solr

9.4/10

A mature search engine with dense vector search, filtering, and hybrid retrieval patterns. Strong candidate when search infrastructure matters as much as vector similarity.

Infinity

infiniflow/infinity

9/10

An AI-native database and search engine with dense, sparse, tensor, full-text, and hybrid search. Promising fit, but newer than the established search and vector database options.

RediSearch

RediSearch/RediSearch

8.4/10

Adds full-text, vector similarity, and filtering capabilities to Redis. Useful for Redis-heavy stacks, with licensing and Redis version details to review before adoption.

ParadeDB

paradedb/paradedb

7.9/10

A Postgres-native search and analytics project with a strong retrieval story. It stays on the suggested list until its vector and hybrid-search positioning clearly fits this page.

Vald

vdaas/vald

7.1/10

A Kubernetes-native distributed vector search engine. Relevant for ANN service deployments, but heavier and less broad than the main curated options.

Orama

oramasearch/orama

7/10

A lightweight search engine with full-text, vector, and hybrid search for browser, server, and edge use. Better as an embedded search layer than a classic vector database.

NucliaDB

nuclia/nucliadb

7/10

An AI search database for RAG and unstructured data. Relevant to retrieval storage, but lower-adoption and more specialized than the main list.

HelixDB

HelixDB/helix-db

6.8/10

A newer graph-vector database project. Interesting for graph-plus-vector retrieval, but still early compared with established vector and search infrastructure.

FAQ

Questions answered

What is a vector database used for?

It stores embeddings and supports similarity search, filtering, and retrieval for RAG, semantic search, and recommendation features.

Do I always need a dedicated vector database?

No. Some stacks get by with Postgres plus pgvector, or with a search engine that supports hybrid retrieval. A dedicated vector store helps when retrieval scale, filtering, or operational separation matters.

Is pgvector enough for production RAG?

It can be, especially for Postgres-centric products that need SQL, joins, transactions, and simpler operations. Dedicated vector databases can be better when retrieval needs separate scaling, multi-tenancy, or specialized vector operations.

Which self-hosted vector database has the lowest ops burden?

For many small or Postgres-heavy deployments, pgvector has the lowest extra operational burden. sqlite-vec is useful for embedded SQLite workflows. Chroma and LanceDB can also be simpler for local or embedded retrieval. Qdrant, Weaviate, Milvus, Vespa, Elasticsearch, and OpenSearch fit better as dedicated services.

Should Elasticsearch, OpenSearch, Typesense, and Meilisearch count as vector databases?

They are better described as search engines with vector or hybrid retrieval support. They belong here when the job is semantic retrieval with keyword search, filtering, facets, and ranking in the same layer.