
Clément Renault

January 20, 2025

Meilisearch vs Qdrant: Tradeoffs, Strengths and Weaknesses

Warning

I made a mistake using Qdrant; it's not slow when set up correctly. I noted that Qdrant allows searching and filtering without proper parameters, which can lead to misuse. Slowness can arise from a misconfigured index or if filterable attributes aren't listed. You can find the actual results in this comment. It's still interesting to notice that Meilisearch is more relevant when constrained than when searching the whole dataset.

Meilisearch is a highly performant, ultra-relevant search engine, and this article will help you better understand the latter adjective. The engine uses an internal library called arroy (made by Meilisearch) to organize and perform approximate nearest neighbor searches (ANNs). Meilisearch also supports a hybrid search feature, absolutely not based on fusion ranking (thank God!), which brings search results that are always relevant whether the query is keyword-based or requires semantic understanding.

To measure the performance of our random projections-based vector store, we compared it to the well-known Qdrant engine. We measured the indexing time, recall, cumulated search time, and disk usage on different datasets. You can review and run everything on our benchmarks GitHub repository.

Recall is the proportion of a query's true nearest neighbors that the engine actually returns. For Recall@100, we compute the exact top 100 documents for each query before running the search on Qdrant and Meilisearch, then count how many documents from this perfect set each engine returns. A higher recall score is better, with 1.00 representing perfect recall.
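To make the definition concrete, here is a minimal sketch of how Recall@k could be computed. The function names and the id-list inputs are assumptions made for illustration; they are not taken from the benchmarks repository.

```python
def recall_at_k(ground_truth, returned, k):
    """Fraction of the exact top-k ids that appear in the engine's top-k results."""
    expected = set(ground_truth[:k])
    if not expected:
        raise ValueError("ground truth must contain at least one id")
    found = expected.intersection(returned[:k])
    return len(found) / len(expected)


def mean_recall_at_k(pairs, k):
    """Average Recall@k over many (ground_truth, returned) query pairs."""
    scores = [recall_at_k(truth, got, k) for truth, got in pairs]
    return sum(scores) / len(scores)
```

The order of the returned ids does not matter here: only membership in the exact top-k set counts.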

Raw Benchmark Data and More Details

In this benchmark, we measure a cumulated search time by running a hundred random searches against each subset of the one-million-document dataset. That's why the raw search times look large: each one is the total for a hundred searches, not a single search. In the tables below, I divided the cumulated search time by the number of searches performed (100) to get the average search time.

The following screenshot compares an angular distance with a cosine one. Those two names refer to the same distance function. We recently changed "angular" to "cosine" in Meilisearch to align with our competitors and reduce confusion. You probably also noticed the arroy name: it's the internal vector store engine we built for Meilisearch.

Finally, Qdrant exposes an "exact" search parameter that returns perfect results, but it is far too slow for normal end-user search. That's why we decided not to show it in our benchmarks. This parameter seems to be intended for debugging rather than production use.

Screenshot: raw Qdrant vs Meilisearch data

Searching in 100% of the Dataset

As the following table shows, Qdrant is more efficient than Meilisearch at indexing and searching the one million OpenAI embeddings (1536 dimensions). Enabling binary quantization on both contenders greatly reduces disk usage on the Meilisearch side (from 13 GiB to 2 GiB), but it also significantly lowers its relevancy (Recall@100 drops from 0.89 to 0.66).

| Version | Recall@1 | Recall@20 | Recall@100 | Indexing time | Average search time | On-disk size |
| --- | --- | --- | --- | --- | --- | --- |
| Meilisearch | 1.00 | 0.85 | 0.89 | 1205s | 84ms | 13 GiB |
| Meilisearch Binary Quantized | 1.00 | 0.63 | 0.66 | 237s | 40ms | 2 GiB |
| Qdrant | 1.00 | 0.97 | 0.96 | 104s | 16ms | 9 GiB |
| Qdrant Binary Quantized | 1.00 | 0.97 | 0.96 | 99s | 13ms | 8 GiB |
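The article does not describe how either engine implements binary quantization, so the following is only a generic sketch of the idea: keep the sign of each dimension as a single bit and compare vectors with a Hamming distance. The function names, the sign-based rule, and the assumption that full-precision vectors are stored as 32-bit floats are all illustrative.

```python
import math


def quantize(vector):
    """Pack the sign of each dimension into one bit of a Python int."""
    bits = 0
    for position, value in enumerate(vector):
        if value > 0:
            bits |= 1 << position
    return bits


def hamming_distance(a, b):
    """Number of dimensions whose sign bit differs between two quantized vectors."""
    return bin(a ^ b).count("1")


def storage_bytes(dimensions):
    """Raw per-vector size: 32-bit floats versus one bit per dimension."""
    return dimensions * 4, math.ceil(dimensions / 8)
```

Under these assumptions, a 1536-dimension embedding shrinks from 6144 bytes to 192 bytes. The on-disk sizes in the table also include the index structures, so they do not shrink by the same ratio. Discarding everything but the sign is also a plausible reason for the lower recall.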

Searching in 12% of the Dataset

This part of the benchmarks is where the results become interesting. We decided to filter the one million vectors we were searching on to only search in a subset of 12%, equivalent to searching 120,000 documents. We can see that Meilisearch is more relevant and faster than its competitor. Even when comparing the binary quantized versions, Qdrant shows weaknesses in this use case.

Qdrant still uses a little more than half the disk space of non-quantized Meilisearch (8 GiB versus 13 GiB). However, searching this document subset is 3x to 6x slower with Qdrant than with Meilisearch (190ms versus 60ms and 30ms). Interestingly, the more you constrain Meilisearch, the better its relevancy and search time get.

| Version | Recall@1 | Recall@20 | Recall@100 | Indexing time | Average search time | On-disk size |
| --- | --- | --- | --- | --- | --- | --- |
| Meilisearch | 0.77 | 0.93 | 0.95 | 1206s | 60ms | 13 GiB |
| Meilisearch Binary Quantized | 0.57 | 0.67 | 0.68 | 237s | 30ms | 2 GiB |
| Qdrant | 0.55 | 0.54 | 0.50 | 101s | 190ms | 8 GiB |
| Qdrant Binary Quantized | 0.58 | 0.56 | 0.51 | 99s | 190ms | 8 GiB |

Searching in 6% of the Dataset

We tried to reduce the subset further to see if Qdrant uses a different algorithm when the subset of documents is even smaller. We benchmarked by searching in a 60,000 vector subset this time.

As you can see, Qdrant's relevancy is even worse than when searching in 12% of the dataset. This time, even the binary quantized version of Meilisearch is between 20 and 50 points more relevant than the non-quantized Qdrant version (0.49 versus 0.29 at Recall@1, 0.67 versus 0.17 at Recall@100). It indeed seems that the engine doesn't switch algorithms when the subset becomes very small. On the other hand, Meilisearch became even more relevant. As spotted before, the engine is more relevant and up to 17x faster when the use case is to search in a very constrained subset of documents.

| Version | Recall@1 | Recall@20 | Recall@100 | Indexing time | Average search time | On-disk size |
| --- | --- | --- | --- | --- | --- | --- |
| Meilisearch | 0.82 | 0.94 | 0.97 | 1206s | 60ms | 13 GiB |
| Meilisearch Binary Quantized | 0.49 | 0.68 | 0.67 | 238s | 30ms | 2 GiB |
| Qdrant | 0.29 | 0.20 | 0.17 | 101s | 450ms | 8 GiB |
| Qdrant Binary Quantized | 0.27 | 0.30 | 0.27 | 102s | 510ms | 8 GiB |

Searching in 1% of the Dataset

For the sake of it, we further reduced the subset to see how well Meilisearch was performing, and we indeed noticed nearly perfect relevancy results when using the non-quantized version. Qdrant, however, struggles to reach 22% recall while taking about 10x longer than Meilisearch.

As you can see, the binary quantized version of Meilisearch is good enough and clearly better than any Qdrant version. So, if you have disk size constraints and want to cut the indexing time by a factor of about five (237s versus 1208s) while still being more than three times more relevant than Qdrant, you can still use this solution.

| Version | Recall@1 | Recall@20 | Recall@100 | Indexing time | Average search time | On-disk size |
| --- | --- | --- | --- | --- | --- | --- |
| Meilisearch | 0.85 | 0.98 | 0.99 | 1208s | 100ms | 13 GiB |
| Meilisearch Binary Quantized | 0.64 | 0.68 | 0.68 | 237s | 60ms | 2 GiB |
| Qdrant | 0.20 | 0.22 | 0.20 | 100s | 670ms | 8 GiB |
| Qdrant Binary Quantized | 0.14 | 0.10 | 0.09 | 102s | 900ms | 8 GiB |

What We Learned

Meilisearch doesn't use HNSW or DiskANN for its performant vector store. Instead, it uses a random projections approach to split the dataset into multiple random trees and stores everything in LMDB. Once our filtering system has extracted the subset of documents to search in, we dive through the tree nodes and filter the documents along the way. Computing so many nodes makes indexing slow but unlocks a snappy search service with high relevancy and recall. Everything is a matter of tradeoffs in computer science.
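The following is a rough sketch of the general technique described above, not arroy's actual code. It builds a few random-projection trees and, at search time, descends them best-first while keeping only the document ids allowed by the filter. Every name here (`build_forest`, `search`, `search_k`, the leaf size, the split heuristic, the Euclidean ranking) is an assumption made for illustration.

```python
import heapq
import math
import random


def margin(node, vector):
    """Signed distance from `vector` to the node's splitting hyperplane."""
    return sum(n * v for n, v in zip(node["normal"], vector)) + node["offset"]


def build_tree(ids, vectors, rng, leaf_size=8):
    """Recursively split `ids` with random hyperplanes until leaves are small."""
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    # Assumed heuristic: the hyperplane halfway between two random points.
    a, b = (vectors[i] for i in rng.sample(ids, 2))
    normal = [x - y for x, y in zip(a, b)]
    length = math.sqrt(sum(n * n for n in normal))
    if length == 0.0:  # identical points, no usable split
        return {"leaf": ids}
    normal = [n / length for n in normal]
    offset = -sum(n * (x + y) / 2 for n, x, y in zip(normal, a, b))
    node = {"normal": normal, "offset": offset}
    left = [i for i in ids if margin(node, vectors[i]) <= 0]
    right = [i for i in ids if margin(node, vectors[i]) > 0]
    if not left or not right:  # floating-point edge case
        return {"leaf": ids}
    node["left"] = build_tree(left, vectors, rng, leaf_size)
    node["right"] = build_tree(right, vectors, rng, leaf_size)
    return node


def build_forest(vectors, n_trees=5, seed=42):
    """Build several independent random trees over the same vectors."""
    rng = random.Random(seed)
    ids = list(range(len(vectors)))
    return [build_tree(ids, vectors, rng) for _ in range(n_trees)]


def search(trees, vectors, query, allowed, k, search_k=200):
    """Best-first descent over all trees, keeping only ids in `allowed`."""
    # Heap entries are (priority, tie_breaker, node); lowest priority pops first.
    heap = [(0.0, n, tree) for n, tree in enumerate(trees)]
    heapq.heapify(heap)
    counter = len(trees)
    candidates = set()
    while heap and len(candidates) < search_k:
        priority, _, node = heapq.heappop(heap)
        if "leaf" in node:
            # Filtering along the way: ids outside the subset are dropped here.
            candidates.update(i for i in node["leaf"] if i in allowed)
            continue
        m = margin(node, query)
        near, far = (node["right"], node["left"]) if m > 0 else (node["left"], node["right"])
        heapq.heappush(heap, (priority, counter, near))
        heapq.heappush(heap, (max(priority, abs(m)), counter + 1, far))
        counter += 2
    # Rank the surviving candidates by exact distance to the query.
    return sorted(candidates, key=lambda i: math.dist(query, vectors[i]))[:k]
```

In this sketch, the descent only stops once enough allowed candidates have been collected, so a narrow filter pushes the search to visit more leaves. That is one plausible intuition for why recall improves as the subset shrinks, but it is a property of this toy version and not a claim about arroy's internals.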

Still, indexing is too slow, so we plan to investigate and improve it. We will try tweaks such as reducing the number of trees and tree nodes, and of course review how they affect relevancy.

On the other hand, Qdrant uses a RocksDB-backed HNSW with an in-house filtering system. It takes 10x less time to index the documents but still uses considerable space. Unfortunately, this algorithm is inefficient when the amount of data is large and the subset is small. We also benchmarked with 100k documents, and Qdrant performed well. But at Meilisearch, it is more common to see customers with more than a million documents than with fewer.

What are the Use Cases?

Meilisearch is ideal for multi-tenancy systems where multiple users' data is merged into a single index. For instance, in a Bluesky-like app, all posts can be stored together, allowing users to search for relevant content from the last month or specific hashtags.

Similarly, in an eCommerce or marketplace platform, products are stored in one index with details like price, category, description, and shipping region. Users typically filter by price or category first and then perform a semantic search. This significantly reduces the set of documents the approximate nearest neighbor (ANN) search runs on, which is the scenario where Meilisearch performs better.

Furthermore, Meilisearch is designed for direct use by end-users from the front end of any application or eCommerce website. The search must be as snappy and quick as possible, without latency. However, when doing a semantic search, the embedding must be generated by a third-party API, e.g., OpenAI or Anthropic, which already brings a lot of latency. As you have seen, Meilisearch keeps its own latency as low as possible, whereas Qdrant's local ANN search alone can take close to a second.

Now that you understand Meilisearch's capabilities and its internal vector store, arroy, I hope you will create something amazing with this wonderful open-source software. 🚀
