Re: Patch to change murmurhash implementation slightly

2023-08-25 Thread Yonik Seeley
On Fri, Aug 25, 2023 at 6:34 PM Thomas Dullien wrote:
> apologies if the chart is incorrect.

The chart isn't necessarily incorrect, but it probably isn't the most relevant statistic here. "Lies, damn lies, and statistics" ;-) The average length of unique English words is not the same as the

Re: Patch to change murmurhash implementation slightly

2023-08-25 Thread Marcus Eagan
Thomas,

Also, is it possible to open this patch as a pull request in GitHub? I guess it does not matter for a lot of the people here. It would make it easier for more people to collaborate in that medium given the shift to GitHub recently.

- Marcus

On Fri, Aug 25, 2023 at 7:03 PM Marcus

Re: Patch to change murmurhash implementation slightly

2023-08-25 Thread Marcus Eagan
Hi Thomas,

Thank you for the hard work thus far. I'm excited to see if the community can benefit from the work. The best way to use the lucene bench is to run the baseline and candidate branches as described here. I

Re: Patch to change murmurhash implementation slightly

2023-08-25 Thread Thomas Dullien
Hey all,

apologies if the chart is incorrect. Anyhow, I think the more important questions are:

1) Which benchmarks does the Lucene community have that y'all would like to see an improvement on before accepting this (or any other future) performance patch? I'm guessing the reason why the

Re: Patch to change murmurhash implementation slightly

2023-08-25 Thread Robert Muir
chart is wrong, average word length for English is like 5.

On Fri, Aug 25, 2023 at 9:35 AM Thomas Dullien wrote:
>
> Hey all,
>
> another data point: There's a diagram with the relevant distributions of word
> lengths in various languages here:
>
>

Re: Patch to change murmurhash implementation slightly

2023-08-25 Thread Thomas Dullien
Hey all,

another data point: There's a diagram with the relevant distributions of word lengths in various languages here:

https://www.reddit.com/r/languagelearning/comments/h9eao2/average_word_length_of_languages_in_europe_except/

While English is close to the 8-byte limit, average word length