Hi, right now the only way to use the encoders without Strings is with a byte array. Wouldn't it be helpful to allow passing in an offset and length, for use cases where there's a reusable byte array at hand? There's a part of MIA devoted to speeding up the encoding, and I think this would be a natural fit. I stumbled upon it because I have Avro Utf8 objects that let me get the byte[], offset and length. The MurmurHash class that is used allows passing in those values...
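To illustrate the idea, here is a minimal sketch of hashing a slice of a reusable buffer straight into a vector. It is not the existing encoder API: `hashSlice` and `addToVector` are hypothetical names, the plain `double[]` stands in for a real vector type, and FNV-1a is used in place of MurmurHash only to keep the example self-contained.

```java
import java.nio.charset.StandardCharsets;

class SliceEncoderSketch {
    // FNV-1a over bytes[offset, offset + len). This is a stand-in for
    // MurmurHash, chosen only so the sketch has no dependencies.
    static int hashSlice(byte[] bytes, int offset, int len, int seed) {
        int h = 0x811c9dc5 ^ seed;
        for (int i = offset; i < offset + len; i++) {
            h ^= (bytes[i] & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // Hypothetical encoder method: hashes the slice to an index in the
    // vector, adds the weight there, and returns the index that was used.
    static int addToVector(byte[] bytes, int offset, int len,
                           double weight, double[] data) {
        int index = Math.floorMod(hashSlice(bytes, offset, len, 0), data.length);
        data[index] += weight;
        return index;
    }

    public static void main(String[] args) {
        // A reusable buffer that holds the value somewhere in the middle.
        byte[] buffer = "xxhelloyy".getBytes(StandardCharsets.UTF_8);
        // The same value copied into an exactly sized array.
        byte[] exact = "hello".getBytes(StandardCharsets.UTF_8);

        double[] vector = new double[16];
        int fromSlice = addToVector(buffer, 2, 5, 1.0, vector);
        int fromCopy = addToVector(exact, 0, exact.length, 1.0, vector);

        // Both calls see the same bytes, so they land on the same index.
        System.out.println(fromSlice == fromCopy);
    }
}
```

The point is that the caller never has to copy the bytes into a fresh array just to encode them; the slice and the exact copy hash to the same position.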
Another feature I was thinking of is the possibility of restricting the range to encode into, meaning a limit on which elements of the vector the hash values can be put in. This would allow keeping parts of the vector "clean" when we know we always have a fixed set of variables, e.g. the intercept. Is this something to add? Should I add it? Cheers, Johannes