Hi,
Thank you!
Fri, Jan 3, 2025 at 14:15, Uwe Schindler wrote:
> Hi,
>
> The expressions query should not be slower. Of course, if you also include
> the compilation in the query time measurement, it may be a little slower
> due to compilation and optimization. In general, queries should be warmed
> before
Hi,
The expressions query should not be slower. Of course, if you also include
the compilation in the query time measurement, it may be a little slower
due to compilation and optimization. In general, queries should be warmed
before measuring them, and expressions should only be compiled once and
reuse
Hi,
Thanks for the answers!
Yes, my task is to store only non-zero values from a sparse vector of large
dimension, where most of the elements are zero.
Tue, Dec 3, 2024 at 19:17, Mikhail Khludnev wrote:
> Thanks for the clarification, Michael!
>
> On Tue, Dec 3, 2024 at 1:56 PM Michael Sokolov wrote:
>
Thanks for the clarification, Michael!
On Tue, Dec 3, 2024 at 1:56 PM Michael Sokolov wrote:
> Sparse means two different things here. In the case you found, Mikhail,
> it means not every document has a value for some vector field. I think the
> question here is about very high dimensional vector
Sparse means two different things here. In the case you found, Mikhail,
it means not every document has a value for some vector field. I think the
question here is about very high dimensional vectors where most documents
have zeroes in most dimensions of the vector.
On Tue, Dec 3, 2024, 2:01 A
Morning.
I noticed a condition that chooses between the sparse and dense formats here:
https://github.com/apache/lucene/blob/6053e1e31378378f6d310a05ea6d7dcdfc45f48b/lucene/core/src/java/org/apache/lucene/codecs/lucene95/OffHeapByteVectorValues.java#L108
Perhaps it can meet your performance requirements.
Hi,
Thanks for the answer!
I think this is similar to my initial implementation, where I built the
query as follows (PyLucene):
def build_query(query):
    builder = BooleanQuery.Builder()
    for term in torch.nonzero(query):
        field_name = to_field_name(term.item())
        value = query[
Another way is using postings - you can represent each dimension as a
term (`dim0`, `dim1`, etc) and index those that occur in a document.
To encode a value for a dimension you can either provide a custom term
frequency, or index the term multiple times. Then when searching you
can form a BooleanQu
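The postings idea described above can be modelled in plain Python. This is only a sketch, not Lucene or PyLucene code: the names `index_docs` and `search` are hypothetical, weights are assumed to be non-negative integers (standing in for a custom term frequency), and the score is assumed to be a plain dot product over the dimensions shared by query and document.

```python
from collections import defaultdict


def index_docs(docs):
    """Build postings: term 'dim<i>' -> {doc_id: frequency}.

    docs maps doc_id -> {dimension: non-negative integer weight}.
    """
    postings = defaultdict(dict)
    for doc_id, vector in docs.items():
        for dim, weight in vector.items():
            if weight:  # zero dimensions are simply not indexed
                postings[f"dim{dim}"][doc_id] = weight
    return postings


def search(postings, query):
    """Score documents by the dot product over shared dimensions."""
    scores = defaultdict(float)
    for dim, q_weight in query.items():
        # One dictionary lookup per query term, then a walk over its postings.
        for doc_id, freq in postings.get(f"dim{dim}", {}).items():
            scores[doc_id] += q_weight * freq
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Three toy documents with only their non-zero dimensions listed.
docs = {1: {0: 3, 7: 1}, 2: {7: 2, 9: 5}, 3: {2: 4}}
postings = index_docs(docs)
results = search(postings, {7: 2.0, 9: 1.0})
```

Only documents that share at least one non-zero dimension with the query are touched, which is the property that makes postings attractive for very sparse, high-dimensional vectors.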
Hi,
Thanks for the reply.
I haven't tried to do that.
However, I do not fully understand how, in this case, an inverted index would
be constructed for efficient search by term (an O(1) lookup for each term as a
key)?
Mon, Dec 2, 2024 at 21:55, Patrick Zhai wrote:
> Hi, have you tried to encode the sparse v
Hi, have you tried to encode the sparse vector yourself using a
BinaryDocValuesField? One way I can think of is to encode it as (size,
index_array, value_array) per doc.
Intuitively, I feel this should be more efficient than one dimension
per field if your dimensionality is high enough.
Patrick
On
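As a rough illustration of the (size, index_array, value_array) layout suggested above, here is a minimal sketch using only Python's standard `struct` module. The byte layout (little-endian 32-bit ints followed by 32-bit floats) and the function names are assumptions made for this example; the thread does not specify them, and the resulting bytes would still have to be stored in and read back from a binary doc-values field separately.

```python
import struct


def encode_sparse(indices, values):
    """Pack a sparse vector as: count, then all indices, then all values."""
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    n = len(indices)
    # '<' means little-endian with no padding; 'i' is int32, 'f' is float32.
    return struct.pack(f"<i{n}i{n}f", n, *indices, *values)


def decode_sparse(blob):
    """Inverse of encode_sparse: returns (indices, values) as lists."""
    (n,) = struct.unpack_from("<i", blob, 0)
    indices = struct.unpack_from(f"<{n}i", blob, 4)
    values = struct.unpack_from(f"<{n}f", blob, 4 + 4 * n)
    return list(indices), list(values)


def dot(blob, query):
    """Dot product of an encoded document vector with a {dim: weight} query."""
    indices, values = decode_sparse(blob)
    return sum(v * query.get(i, 0.0) for i, v in zip(indices, values))


# A vector with three non-zero entries out of a very large dimension.
blob = encode_sparse([3, 17, 40000], [0.5, 2.0, 1.5])
score = dot(blob, {17: 2.0, 40000: 4.0})
```

With this layout the per-document cost is 4 + 8n bytes for n non-zero entries, independent of the nominal dimension of the vector.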
Hi!
I need to index sparse vectors, whereas, as I understand it,
KnnFloatVectorField is designed for dense vectors.
So it seems that this approach will not work.
Sun, Dec 1, 2024 at 18:36, Mikhail Khludnev wrote:
> Hi,
> Could this be done with KnnFloatVectorField(... DOT_PRODUCT)
> and KnnFloatVect
Hi,
Could this be done with KnnFloatVectorField(... DOT_PRODUCT)
and KnnFloatVectorQuery?
Hi!
Thank you for your reply!
I tried the recommendations, and below I give example code for
implementing the queries. The query with the expression runs a little slower;
I think this is due to the need for compilation.
I have one more question: please tell me which type of field is best suited
f
Hi,
Couldn't this be done better with FunctionQuery and suitable ValueSources?
Please also check Lucene Expressions.
On Sat, Nov 30, 2024 at 9:00 PM Viacheslav Dobrynin wrote:
> Hello!
>
> I have implemented a custom scoring mechanism. It looks like a dot product.
> I would like to ask you how accurate
Hello!
I have implemented a custom scoring mechanism. It looks like a dot product.
I would like to ask how accurate and efficient my implementation is.
Could you give me recommendations on how to improve it?
Here are a couple of examples that I want to use this mechanism with.
Example 1:
A do