Hi,

Right now the only way to use the encoders without Strings is to pass in a
whole byte array. Wouldn't it be helpful to also accept an offset and a
length, for use cases where there's a reusable byte array at hand? There's a
part of MIA devoted to speeding up the encoding, and I think this would be a
natural fit. I stumbled upon it because I have Avro Utf8 objects that let me
get the byte[], offset and len.
The MurmurHash class that is used already allows passing in those values...
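
A minimal sketch of the idea, not the actual encoder code. The class and method names here (SliceEncoderSketch, hashSlice, addToVector) are hypothetical, the vector is a plain double[] rather than a real vector class, and a simple FNV-1a hash stands in for MurmurHash so the example stays self-contained. It only shows that hashing a slice of a reusable buffer lands in the same slot as hashing a fresh copy of those bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SliceEncoderSketch {

    // FNV-1a over data[offset .. offset + len). This is a stand-in for
    // MurmurHash, used here only to keep the sketch dependency-free.
    public static int hashSlice(byte[] data, int offset, int len, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (int i = offset; i < offset + len; i++) {
            h ^= (data[i] & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // Hypothetical encoder entry point: hashes only the given slice and adds
    // the weight to the slot it maps to, without copying the bytes.
    public static void addToVector(byte[] data, int offset, int len,
                                   double weight, double[] vector) {
        if (offset < 0 || len < 0 || offset > data.length - len) {
            throw new IndexOutOfBoundsException("slice outside of array");
        }
        int index = Math.floorMod(hashSlice(data, offset, len, 0), vector.length);
        vector[index] += weight;
    }

    public static void main(String[] args) {
        // Reusable buffer with unrelated bytes before and after the value.
        byte[] buffer = "xxhelloyy".getBytes(StandardCharsets.US_ASCII);
        double[] viaSlice = new double[16];
        addToVector(buffer, 2, 5, 1.0, viaSlice);

        // Same value as a freshly allocated array.
        byte[] copy = "hello".getBytes(StandardCharsets.US_ASCII);
        double[] viaCopy = new double[16];
        addToVector(copy, 0, copy.length, 1.0, viaCopy);

        System.out.println(Arrays.equals(viaSlice, viaCopy)); // prints "true"
    }
}
```

The existing byte[]-only call could then simply delegate with offset 0 and the full array length.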

Another feature I was thinking of is the possibility of restricting the
range to encode into, meaning a limit on which elements of the vector the
hash values may be put in. This would allow keeping parts of the vector
"clean" when we know we always have a fixed set of variables, e.g. the
intercept.
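
Again only a sketch under assumptions: the names (RangeRestrictedEncoderSketch, indexInRange, addToVector) and the half-open range [from, to) are made up for illustration, and the vector is a plain double[]. Hashed values are mapped into the allowed range only, so a reserved slot such as an intercept at index 0 is never touched.

```java
final class RangeRestrictedEncoderSketch {

    // Maps an arbitrary hash value to an index in [from, to).
    public static int indexInRange(int hash, int from, int to) {
        if (from < 0 || to <= from) {
            throw new IllegalArgumentException("need 0 <= from < to");
        }
        return from + Math.floorMod(hash, to - from);
    }

    // Hypothetical encoder step: adds the weight only inside [from, to).
    public static void addToVector(int hash, double weight, double[] vector,
                                   int from, int to) {
        if (to > vector.length) {
            throw new IllegalArgumentException("range exceeds vector size");
        }
        vector[indexInRange(hash, from, to)] += weight;
    }

    public static void main(String[] args) {
        double[] vector = new double[8];
        vector[0] = 1.0; // intercept, kept outside the hashed range

        // Encode a batch of hash values into slots 1..7 only.
        for (int hash = -50; hash < 50; hash++) {
            addToVector(hash, 1.0, vector, 1, vector.length);
        }

        System.out.println(vector[0]); // prints "1.0", the intercept is untouched
    }
}
```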

Is this something to add? Should I add it?

Cheers,
Johannes
