[ https://issues.apache.org/jira/browse/CODEC-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018374#comment-17018374 ]
Andy Seaborne edited comment on CODEC-264 at 1/18/20 9:27 AM: -------------------------------------------------------------- The v1.14 version of {{hash128(byte[], , int seed)}} does now apply the seed mask, contrary to the comments. Line 805 {noformat} @Deprecated public static long[] hash128(final byte[] data, final int offset, final int length, final int seed) { // ************ // Note: This fails to apply masking using 0xffffffffL to the seed. // ************ return hash128x64(data, offset, length, seed); } {noformat} It calls {{hash128x86(byte[],, int seed)}} (exact signature match), not {{hash128x86(byte[],, long seed)}} (type conversion). {{hash128x86(byte[],, int seed)}} applies the mask (checked by debugger walk through in EclipseIDE). {{hash128(byte[],, int seed)}} should be a call of {{hash128x86(byte[],, long)}} directly. I think that casting at the call site will do that: {noformat} return hash128x64(data, offset, length, (long)seed); {noformat} or for clarity explicitly: {noformat} long seedLong = seed; /* unmasked 32->64 bit extension */ return hash128x64(data, offset, length, seedLong); {noformat} If the private static work function had a different name, then automatic, unmasked conversion would have applied. was (Author: andy.seaborne): The v1.14 version of {{hash128(byte[], , int seed)}} does now apply the seed mask, contrary to the comments. Line 805 {noformat} @Deprecated public static long[] hash128(final byte[] data, final int offset, final int length, final int seed) { // ************ // Note: This fails to apply masking using 0xffffffffL to the seed. // ************ return hash128x64(data, offset, length, seed); } {noformat} It calls {{hash128x86(byte[],, int seed)}} (exact signature match), not {hash128x86(byte[],, long seed)}} (type conversion). {{hash128x86(byte[],, int seed)}} applies the mask (checked by debugger walk through in EclipseIDE). {{hash128(byte[],, int seed)}} should be a call of {{hash128x86(byte[],, long)}} directly. I think that casting at the call site will do that: {noformat} return hash128x64(data, offset, length, (long)seed); {noformat} or for clarity explicitly: {noformat} long seedLong = seed; /* unmasked 32->64 bit extension */ return hash128x64(data, offset, length, seedLong); {noformat} If the private static work function had a different name, then automatic, unmasked conversion would have applied. > murmur3.hash128() does not account for unsigned seed argument > ------------------------------------------------------------- > > Key: CODEC-264 > URL: https://issues.apache.org/jira/browse/CODEC-264 > Project: Commons Codec > Issue Type: Bug > Affects Versions: 1.13 > Reporter: Claude Warren > Assignee: Alex Herbert > Priority: Major > Fix For: 1.14 > > Attachments: YonikMurmur3Tests.java > > > The original murmur3_x64_128 code used unsigned int for seed arguments. > Using the equivalent bit patterns in the commons codec version does not yield > the same results. > I believe this is because the commons version does not account for sign > extension etc. > Yonic Seeley [~yonik] has explains the issue in his implementation > https://github.com/yonik/java_util/blob/master/src/util/hash/MurmurHash3.java > He provides a test case to show that his code returns the same answers as the > original C/C++ code. I modified that test to call the codec version to show > the error. > I have attached that test case. > Given that the original code is in the wild I am uncertain how to fix this > issue. -- This message was sent by Atlassian Jira (v8.3.4#803005)