Binary vector format for flat and hnsw vectors #14078

Open · wants to merge 3 commits into main

Conversation

@benwtrent (Member) commented Dec 17, 2024:

This provides a binary quantized vector format for flat and HNSW vectors. The key ideas are:

  • Centroid centered vectors
  • Asymmetric quantization
  • Individually optimized scalar quantization

This gives Lucene a single scalar quantization format that delivers high-quality vector retrieval, even down to a single bit per dimension.
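
A rough conceptual sketch of these three ideas is below. This is illustrative only: the class, method, and field names are invented, and the crude grid search merely stands in for the PR's actual per-vector optimization.

```java
// Conceptual sketch (not the PR's code): centroid-center a vector, pick a per-vector
// ("individually optimized") quantization interval, and quantize against it.
// Assumes bits <= 8 so each code fits in a byte.
final class PerVectorQuantizationSketch {

  /** A quantized vector plus the per-vector corrective terms kept alongside it. */
  record Quantized(byte[] codes, float lowerQuantile, float upperQuantile, int sumOfCodes) {}

  static Quantized quantize(float[] vector, float[] centroid, int bits) {
    // 1. Centroid-center the vector.
    float[] centered = new float[vector.length];
    float min = Float.POSITIVE_INFINITY, max = Float.NEGATIVE_INFINITY;
    for (int i = 0; i < vector.length; i++) {
      centered[i] = vector[i] - centroid[i];
      min = Math.min(min, centered[i]);
      max = Math.max(max, centered[i]);
    }
    // 2. "Individually optimized" interval: a crude per-vector grid search that picks the
    //    [lower, upper] pair minimizing squared reconstruction error.
    int levels = (1 << bits) - 1;
    float bestLower = min, bestUpper = max, bestError = Float.POSITIVE_INFINITY;
    for (int l = 0; l < 8; l++) {
      for (int u = 0; u < 8; u++) {
        float lower = min + (max - min) * l / 16f;
        float upper = max - (max - min) * u / 16f;
        if (upper <= lower) continue;
        float error = reconstructionError(centered, lower, upper, levels);
        if (error < bestError) {
          bestError = error;
          bestLower = lower;
          bestUpper = upper;
        }
      }
    }
    // 3. Quantize each component against the chosen per-vector interval.
    byte[] codes = new byte[centered.length];
    int sum = 0;
    float delta = (bestUpper - bestLower) / levels;
    for (int i = 0; i < centered.length; i++) {
      float clamped = Math.min(Math.max(centered[i], bestLower), bestUpper);
      int code = Math.round((clamped - bestLower) / delta);
      codes[i] = (byte) code;
      sum += code;
    }
    return new Quantized(codes, bestLower, bestUpper, sum);
  }

  private static float reconstructionError(float[] v, float lower, float upper, int levels) {
    float delta = (upper - lower) / levels;
    float error = 0;
    for (float x : v) {
      float clamped = Math.min(Math.max(x, lower), upper);
      float dequantized = lower + Math.round((clamped - lower) / delta) * delta;
      error += (x - dequantized) * (x - dequantized);
    }
    return error;
  }
}
```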

For all similarity types, the per-vector on-disk layout looks like this:

| quantized vector | lower quantile | upper quantile | additional correction | sum of quantized components |
|---|---|---|---|---|
| (vector_dimension/8) bytes | float | float | float | short |
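
For concreteness, here is a minimal sketch of writing one such per-vector entry, assuming the 1-bit case where codes are packed eight per byte. The class and parameter names are illustrative, not the PR's.

```java
import java.io.IOException;
import org.apache.lucene.store.IndexOutput;

final class BinaryEntryWriterSketch {
  // Writes one per-vector entry in the layout described above:
  // packed codes, lower quantile, upper quantile, additional correction, sum of codes.
  static void writeEntry(
      IndexOutput out,
      byte[] packedCodes, // vector_dimension / 8 bytes of packed 1-bit codes
      float lowerQuantile,
      float upperQuantile,
      float additionalCorrection,
      short sumOfQuantizedComponents)
      throws IOException {
    out.writeBytes(packedCodes, packedCodes.length);
    out.writeInt(Float.floatToIntBits(lowerQuantile));
    out.writeInt(Float.floatToIntBits(upperQuantile));
    out.writeInt(Float.floatToIntBits(additionalCorrection));
    out.writeShort(sumOfQuantizedComponents);
  }
}
```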

During segment merge & HNSW building, another temporary file is written containing the query-quantized vectors over the configured centroids. One downside is that this temporary file will actually be larger than the regular vector index, because we use asymmetric quantization to keep more information on the query side. Once the merge is complete, this file is deleted. I think this temporary file can eventually be removed entirely.
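
Under my own assumptions about how this spill works, the flow might look roughly like the sketch below; the names are invented and the real writer is per-field and more involved.

```java
import java.io.IOException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexOutput;

final class MergeTimeQueryVectorsSketch {
  /**
   * Spills the asymmetric (higher-bit) query-side quantized vectors to a temporary file that the
   * HNSW graph build can read, and returns its name so it can be deleted once the merge completes.
   */
  static String writeTemporaryQueryVectors(Directory dir, byte[][] queryQuantizedVectors)
      throws IOException {
    IndexOutput tmp = dir.createTempOutput("binarized_query_vectors", "tmp", IOContext.DEFAULT);
    try (tmp) {
      for (byte[] quantized : queryQuantizedVectors) {
        // Query codes carry more bits than the 1-bit index codes, which is why this
        // temporary file can end up larger than the final vector index.
        tmp.writeBytes(quantized, quantized.length);
      }
    }
    return tmp.getName(); // caller deletes via dir.deleteFile(name) after the HNSW graph is built
  }
}
```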

Here are the results for Recall@10|50

| Dataset | Old PR | This PR | Improvement |
|---|---|---|---|
| Cohere 768 | 0.933 | 0.938 | 0.5% |
| Cohere 1024 | 0.932 | 0.945 | 1.3% |
| E5-Small-v2 | 0.972 | 0.975 | 0.3% |
| GIST-1M | 0.740 | 0.989 | 24.9% |

Even with the optimization step, indexing time with HNSW is only marginally increased.

| Dataset | Old PR | This PR | Difference |
|---|---|---|---|
| Cohere 768 | 368.62s | 372.95s | +1% |
| Cohere 1024 | 307.09s | 314.08s | +2% |
| E5-Small-v2 | 227.37s | 229.83s | < +1% |

The consistent improvement in recall, together with the flexibility to quantize to various bit counts, makes this format and quantization technique strongly preferable.

Eventually, we should consider moving scalar quantization over to this new optimized quantizer. However, the on-disk format and scoring would change, so I didn't do that in this PR.

supersedes: #13651

Co-Authors: @tveasey @john-wagster @mayya-sharipova @ChrisHegarty

@gaoj0017 commented:

Hi @benwtrent, I am the first author of the RaBitQ paper and its extended version. As your team knows, our RaBitQ method brings breakthrough performance to binary quantization and scalar quantization.

We notice that in this pull request, you mention a method which individually optimizes the lower and upper bounds of scalar quantization. This idea is highly similar to our idea of individually searching for the optimal rescaling factor of scalar quantization, as described in our extended RaBitQ paper, which we shared with your team in Oct 2024. An intuitive explanation can be found in our recent blog. The mathematical equivalence between these two ideas is given in Remark 2 below.

In addition, the contribution of our RaBitQ has not been properly acknowledged in several other places. For example, in a previous post from Elastic - Better Binary Quantization (BBQ) in Lucene and Elasticsearch - the major features of BBQ are introduced, yet it is not made clear that all these features originate from our RaBitQ paper. In a press release, Elastic claims that "Elasticsearch’s new BBQ algorithm redefines vector quantization"; however, BBQ is not a brand-new method but a variant of RaBitQ with some minor adaptations.

We note that when a breakthrough is made, it is always easy to derive variants of it or to restate the method in different terms. One should not claim a variant to be a new method with a new name while ignoring the contribution of the original method. We hope that you will understand our concern and properly acknowledge the contributions of our RaBitQ and its extension in your pull requests and/or blogs.

  • Remark 1. The BBQ feature fails on the GIST dataset because it removes the randomization operation of the RaBitQ method. With the randomization operation, RaBitQ is theoretically guaranteed to perform stably on all datasets.
  • Remark 2. Let $B$ be the number of bits for scalar quantization. The scalar quantization can be represented in two equivalent ways.
    1. Scalar quantization can be determined by the lower bound $v_l$ and the upper bound $v_r$. The algorithm first computes $\Delta =(v_r-v_l) / (2^{B}-1)$ and then maps each real value $x$ to the nearest integer of $(x-v_l) / \Delta$.
    2. Based on the process above, scalar quantization can be equivalently determined by a rescaling factor $\Delta$ and a shifting factor $v_l$.
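
As a concrete, purely illustrative instance of this equivalence (numbers chosen by the editor, not from the paper): with $B = 2$, $v_l = 0.1$, and $v_r = 0.9$, the first formulation gives $\Delta = (0.9 - 0.1)/(2^2 - 1) \approx 0.267$ and maps $x = 0.55$ to $\mathrm{round}((0.55 - 0.1)/\Delta) = 2$; the second formulation starts directly from $\Delta \approx 0.267$ and $v_l = 0.1$ and yields the same code, since $v_r$ is recoverable as $v_l + (2^B - 1)\Delta$.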

* </tr>
* <tr>
* <td>{@link org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsFormat Vector values}</td>
* <td>.vec, .vem, .veq, vex</td>
Contributor: should we also add veb and vemb files to the list?

import org.apache.lucene.index.SegmentWriteState;

/**
* Copied from Lucene, replace with Lucene's implementation sometime after Lucene 10 Codec for
Contributor: Also should we remove this line?

return "Lucene102BinaryQuantizedVectorsFormat(name="
+ NAME
+ ", flatVectorScorer="
+ scorer
Contributor: nit: should we also add `+ ", rawVectorFormat=" + rawVectorFormat`?

@benwtrent (Member, Author) commented:

@gaoj0017 Thank you for your feedback!

Truly, y'all inspired us to improve scalar quantization. RaBitQ showed that it is possible to achieve a 32x reduction while retaining high recall, without product quantization. And, to my knowledge, we have attributed inspiration wherever the particulars of the algorithm were used.

As for this change, it is not mathematically the same as, nor derived from, y'all's new or old paper. Indeed, your new paper is interesting, provides the same flexibility across bit sizes, and shows that this is possible. However, we haven't tested it, nor implemented it.

Here are some details about this implementation.

https://www.elastic.co/search-labs/blog/scalar-quantization-optimization

@gaoj0017 commented:

@benwtrent Thanks for your reply.

First, in the blog - Better Binary Quantization at Elastic and Lucene - the BBQ method is a variant of our RaBitQ with no major differences. The claimed major features of BBQ all originate from our RaBitQ paper (as we have explained in our last reply). There is only one attribution to our method, where it is mentioned (in one sentence) that BBQ is based on some inspirations from RaBitQ. We think this attribution is not sufficient - it should be made clear that the mentioned features of BBQ all originate from RaBitQ.

Second, for the new method described in this pull request, there is no attribution to our extended RaBitQ method at all - we note that we shared the extended RaBitQ paper with your team more than 2 months ago. To our understanding, the method is highly similar to our extended RaBitQ at its core (which also supports quantizing a vector to 1 bit, 2 bits, ... per dimension). They share the major idea of optimizing the scalar quantization method by trying different parameters. In your new blog, it is mentioned that “Although the RaBitQ approach is conceptually rather different to scalar quantization, we were inspired to re-evaluate whether similar performance could be unlocked for scalar quantization.” This is not true, since our extended RaBitQ corresponds to an optimized scalar quantization method.

Given that our extended RaBitQ method is prior art for the method introduced in the blog, and that our method was known to your team more than 2 months ago, it should not have been ignored. Any differences between the two methods should be clearly explained, and experiments comparing the two methods should be provided as well.

@msokolov (Contributor) commented:

@gaoj0017 it sounds to me as if your concern is about lack of attribution in the blog post you mentioned, and doesn't really relate to this pull request (code change) - is that accurate?

@mikemccand (Member) commented:

+1 for proper attribution.

We should give credit where credit is due. The evolution of this PR clearly began with the RaBitQ paper, as seen in the opening comment on the original PR as well as the original issue.

Specifically for the open source changes proposed here (this pull request suggesting changes to Lucene's ASL2 licensed source code):

Linking to the papers that inspired important changes in Lucene is not only for proper attribution but also so users have a deep resource they can fall back on to understand the algorithm, understand how tunable parameters are expected to behave, etc. It's an important part of the documentation too! Also, future developers can re-read the paper and study Lucene's implementation and maybe find bugs / improvement ideas.

For the Elastic-specific artifacts (blog posts, press releases, tweets, etc.): I would agree that Elastic should also attribute properly, probably with an edit/update/sorry-about-the-oversight sort of addition? But I no longer work at Elastic, so this is merely my (external) opinion! Perhaps a future blog post, either from Elastic or someone else, could correct the mistake (missed attribution).

Finally, thank you to @gaoj0017 and team for creating RaBitQ and publishing these papers -- this is an impactful vector quantization algorithm that can help the many Lucene/OpenSearch/Solr/Elasticsearch users building semantic / LLM engines these days.

@benwtrent (Member, Author) commented:

To head this off: this implementation is not an evolution of RaBitQ in any way. It's intellectually dishonest to say it's an evolution of RaBitQ. I know that's pedantic, but it's a fact.

This is the next step of the global vector quantization optimization done already in Lucene. Instead of global, it's local and utilizes anisotropic quantization. I am still curious as to what in particular is considered built on RaBitQ here. Just because things reach the same ends (various bit level quantization) doesn't mean they are the same.

We can say "so this idea is unique from RaBitQ in these ways" to keep attribution, but it seems weird to call out another algorithm to simply say this one is different.

I agree, Elastic stuff should be discussed and fixed in a different forum.

@gaoj0017 commented Jan 6, 2025:

Hi @msokolov, the discussion here is not only about the blog posts but also relates to this pull request. In this pull request (and its related blogs), a new method is claimed without properly acknowledging the contributions/inspirations from our extended RaBitQ method, as we explained in our last reply. Besides, we believe this discussion is relevant to the Lucene community because Lucene is a collaborative project, used and contributed to by many teams beyond Elastic.

Thanks @mikemccand for your kind words - we truly appreciate them! It is also an honor to us that RaBitQ and the extended RaBitQ are seen as impactful in improving industry productivity.

Our responses to @benwtrent are as follows.
Point 1 in our last reply: This has been ignored in Ben’s reply. We would like to emphasize once again that the so-called “BBQ” method from Elastic is largely based on our RaBitQ method, with only minor modifications - this is reflected in the previous PRs. Yet Elastic has repeatedly referred to BBQ as their new algorithm without acknowledging RaBitQ. For example, in a press release, Elastic states that "Elasticsearch’s new BBQ algorithm redefines vector quantization," omitting any reference to RaBitQ. This is unacceptable and particularly unfair to other teams who have openly acknowledged their use of RaBitQ. We request that our RaBitQ method be properly credited in all existing and future blogs, press releases, pull requests, and other communications regarding the “BBQ” method.

Point 2 in our last reply: In the related blog that describes the method in this pull request, it states “Although the RaBitQ approach is conceptually rather different to scalar quantization, we were inspired to re-evaluate whether similar performance could be unlocked for scalar quantization” - this is not true since this target has already been achieved by our extended RaBitQ method. Your team should not have made this mis-claim by ignoring our extended RaBitQ method since we circulated our extended RaBitQ paper to your team more than three months ago. In addition, our extended RaBitQ method proposed the idea of searching for the optimal parameters of scalar quantization for each vector (for details, please refer to our blog). The method in this pull request has adopted a highly similar idea. For this reason, we request that in any existing and potentially future channels of introducing the method in this PR, proper acknowledgement of our extended RaBitQ method should be made.

@ChrisHegarty (Contributor) commented:

In my capacity as the Lucene PMC Chair (and with explicit acknowledgment of my current employment with Elastic, as of the date of this writing), I want to emphasize that proper attribution and acknowledgment should be provided for all contributions, as applicable, in accordance with best practices.

While the inclusion of links to external blogs and prior works serves to provide helpful context regarding the broader landscape, it would be of greater value to explicitly delineate which specific elements within this pull request are directly related to the RaBitQ method or its extension.

@tveasey commented Jan 6, 2025:

Just sticking purely to the issues raised regarding this PR and the blog Ben linked explaining the methodology...

> “Although the RaBitQ approach is conceptually rather different to scalar quantization, we were inspired to re-evaluate whether similar performance could be unlocked for scalar quantization” - this is not true since this target has already been achieved by our extended RaBitQ method. Your team should not have made this mis-claim by ignoring our extended RaBitQ method since we circulated our extended RaBitQ paper to your team more than three months ago.

This comment relates to the fact that RaBitQ, as you yourself describe it in both your papers, is motivated by seeking a form of product quantization (PQ) for which one can compute the dot product directly rather than via lookup. Your papers make minimal reference to scalar quantization (SQ) other than to say the method is a drop-in replacement. If you strongly take issue with the statement based on this clarification, we can further clarify it in the blog. I still feel this is separate from this PR, and it seems better to discuss it in a separate forum.

I would also reiterate that conceptually, our approach is much closer to our prior work on int4 SQ we blogged about last April, which is what inspired it more directly.

> In addition, our extended RaBitQ method proposed the idea of searching for the optimal parameters of scalar quantization for each vector (for details, please refer to our blog).

I would argue that finding the nearest point on the sphere is exactly equivalent to the standard process in SQ of finding the nearest grid point to a vector. Perhaps it would be more accurate to say you've ported SQ to work with spherical geometry, although, as before, the more natural motivation, and the one you yourselves adopt, is in terms of PQ. This isn't related to optimising hyperparameters of SQ, IMO.

You could perhaps argue that arranging for both the codebook centres and the corpus vectors to be uniformly distributed on the sphere constitutes this sort of optimization, although that would not be standard usage. At best you could say it indirectly arranges for raw vectors to wind up close, in some average sense, to the quantized vectors. However, I'd take issue with this statement because a single sample of a random rotation does not ensure that the corpus vectors are uniformly distributed on the sphere: applying a single random rotation to, for example, a set of points which are concentrated somewhere on the sphere does not remove that concentration. You would have to use different samples for different vectors, but this eliminates the performance advantages.

Incidentally, I think this is the reason it performs significantly worse on GIST, and indeed part of the reason why we found small improvements across the board for binary. (Tangentially, it feels like a whitening pre-conditioner might actually be of more benefit to the performance of RaBitQ. I also can't help but feel some combination of hyperparameter optimization and normalization will yield even further improvements, but I haven't been able to get this to work out yet.)
