I have run the luceneutil benchmark with more iterations and a higher
repeat count, but the results are still very noisy, which I attribute
to running those benchmarks on a laptop.
The results consistently show speedups for some of the facets tasks
and small slowdowns for others. One run that clearly shows a
slowdown…
FWIW I have also seen some users store sparse vectors or bloom filters in
binary doc values. In both cases, the serialized size may be non-negligible
while not all bytes are needed. This change would likely help.
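To make the partial-read pattern concrete, here is a rough sketch of a
bloom-filter membership test that only ever touches a handful of bytes of a
potentially large serialized value. All names here are hypothetical, not an
existing or proposed API:

// Hypothetical positional accessor over a stored binary value; it could be
// backed by an on-heap byte[] today or by off-heap / mmapped memory later.
interface RandomBytes {
  long length();
  byte readByte(long position);
}

// Membership test over a serialized bloom filter: only numHashes single-byte
// reads are needed, regardless of how large the filter is.
final class BloomFilterView {
  private final RandomBytes bits;
  private final int numHashes;

  BloomFilterView(RandomBytes bits, int numHashes) {
    this.bits = bits;
    this.numHashes = numHashes;
  }

  boolean mightContain(long hash) {
    long numBits = bits.length() * 8;
    for (int i = 0; i < numHashes; i++) {
      long h = Long.rotateLeft(hash, i * 7) * 0x9E3779B97F4A7C15L;
      long bit = Long.remainderUnsigned(h, numBits);
      byte b = bits.readByte(bit >>> 3);            // read one byte at an arbitrary offset
      if ((b & (1 << (int) (bit & 7))) == 0) {
        return false;                               // definitely not present
      }
    }
    return true;                                    // possibly present
  }
}

Today the whole value has to be read into an on-heap BytesRef first, even
though a check like the one above only ever looks at numHashes bytes of it.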
Having the binary sort and faceting tasks not show a big slowdown would be
good as th…
@Chris: Agreeing on an off-heap BytesRef thingy would be a great step forward.
@Mike: Yes, there are other use cases. One that is close to my heart
is the geo use case, where in many cases you don't need to read all the
bytes, and geometries can be big. In Lucene there are some interesting
usages in…
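As an illustration of that access pattern, a query could reject a big
serialized geometry by reading only a small bounding-box header instead of
decoding every vertex. The layout below is made up for the example, not how
Lucene actually stores geometries, and it reuses the same kind of
hypothetical accessor as the bloom-filter sketch earlier in the thread:

// Hypothetical positional accessor, as before.
interface RandomBytes {
  long length();
  int readInt(long position);
}

// Assume (purely for illustration) a value laid out as a 16-byte bounding-box
// header (minLat, maxLat, minLon, maxLon as ints) followed by all the vertices.
final class GeometryView {
  private final RandomBytes data;

  GeometryView(RandomBytes data) {
    this.data = data;
  }

  // Bounding-box rejection: 16 bytes read, the (possibly huge) vertex list untouched.
  boolean boxOverlaps(int minLat, int maxLat, int minLon, int maxLon) {
    int gMinLat = data.readInt(0);
    int gMaxLat = data.readInt(4);
    int gMinLon = data.readInt(8);
    int gMaxLon = data.readInt(12);
    return gMaxLat >= minLat && gMinLat <= maxLat
        && gMaxLon >= minLon && gMinLon <= maxLon;
  }
}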
That makes sense to me too in the abstract. At Amazon we also have
interesting BDV fields we have to decode on the fly, so this looks
attractive for that reason (not just faceting).
I would say though that it would be easier to evaluate the fitness for
purpose (faceting) if we had some examples of…
Hi Ignacio,
I completely agree with the idea of having a BytesRef-like thing that can be
off-heap. For a while now I’ve been thinking about how we could evolve BytesRef
so as to not expose its on-heap representation. Having a separate primitive is
probably a better way to go.
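A minimal sketch of what such a primitive could look like, purely
illustrative and not an existing or agreed-upon API:

import java.io.IOException;

// A bytes primitive that exposes only positional reads, so implementations
// are free to keep the underlying data on-heap or off-heap.
public interface ByteSlice {
  /** Number of bytes in this slice. */
  long length();

  /** Read a single byte at an absolute position within the slice. */
  byte readByte(long pos) throws IOException;

  /** Bulk-copy a sub-range on-heap for callers that really need a byte[]. */
  void readBytes(long pos, byte[] dst, int dstOff, int len) throws IOException;
}

An on-heap implementation could simply wrap a byte[] the way BytesRef does
today, while an off-heap one could delegate to memory-mapped storage, and
existing BytesRef consumers could migrate incrementally by copying only the
ranges they need.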
-Chris.
> On 5 D…
Hello,
I have been working with the idea of reading binary doc values
off-heap for a while. The idea behind it is that binary doc values are
often used for faceting, where structured data is encoded at write time
and decoded at read time. It feels wasteful to have to read the data
on-heap before decoding it…
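For reference, this is roughly the read-side pattern being described. The
delta-vInt encoding in the loop is just a stand-in for whatever structure a
faceting implementation actually writes:

import java.io.IOException;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.util.BytesRef;

final class FacetDecodeSketch {
  // The full serialized value is materialized as an on-heap BytesRef before
  // any of it can be decoded, which is the cost discussed in this thread.
  static void countOrdinals(BinaryDocValues dv, int doc, int[] counts) throws IOException {
    if (dv.advanceExact(doc) == false) {
      return;
    }
    BytesRef value = dv.binaryValue();              // whole value read into an on-heap buffer
    int pos = value.offset;
    int end = value.offset + value.length;
    int ord = 0;
    while (pos < end) {
      int delta = 0;
      int shift = 0;
      byte b;
      do {                                          // stand-in encoding: delta-encoded vInts
        b = value.bytes[pos++];
        delta |= (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      ord += delta;
      counts[ord]++;
    }
  }
}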