danny0405 commented on PR #8657:
URL: https://github.com/apache/hudi/pull/8657#issuecomment-1547119881
> I am not sure it is a good design to introduce spark concepts within hudi-client-common
Obviously that is a bad design we should avoid; can we just impl the whole spark
danny0405 commented on PR #8657:
URL: https://github.com/apache/hudi/pull/8657#issuecomment-1543269796
> Hardcoding Murmur is likely a good idea
Not hardcoding; I mean making it configurable, so that users can choose the algorithm they want to use.
> it would allow to support both spa
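A minimal sketch of "configurable rather than hardcoded" hashing, assuming the algorithm is picked from a user-supplied string. `BucketHashAlgorithm`, its constants, and `fromConfig` are hypothetical names for illustration, not an existing Hudi option or API. The Murmur3 variant is the standard x86_32 algorithm with seed 0; it is not claimed to be bit-compatible with Spark's or Hive's hashing.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical enum: illustrates a pluggable bucket hash, not a real Hudi config.
enum BucketHashAlgorithm {
  // Plain Java String.hashCode(), the behavior described later in this thread.
  JAVA_HASHCODE {
    @Override
    int hash(String key) {
      return key.hashCode();
    }
  },
  // Standard MurmurHash3 x86_32 over UTF-8 bytes, seed 0 (assumed choice).
  MURMUR3_32 {
    @Override
    int hash(String key) {
      return Murmur3.hash32(key.getBytes(StandardCharsets.UTF_8), 0);
    }
  };

  abstract int hash(String key);

  // Resolve the algorithm from a (hypothetical) config value; unknown names throw.
  static BucketHashAlgorithm fromConfig(String value) {
    return valueOf(value.trim().toUpperCase(Locale.ROOT));
  }

  // Mask the sign bit so the remainder is never negative.
  int bucketId(String key, int numBuckets) {
    return (hash(key) & Integer.MAX_VALUE) % numBuckets;
  }
}

final class Murmur3 {
  private Murmur3() {}

  static int hash32(byte[] data, int seed) {
    final int c1 = 0xcc9e2d51;
    final int c2 = 0x1b873593;
    int h = seed;
    int nblocks = data.length / 4;
    // Body: mix each little-endian 4-byte block into the state.
    for (int i = 0; i < nblocks; i++) {
      int off = i * 4;
      int k1 = (data[off] & 0xff)
          | (data[off + 1] & 0xff) << 8
          | (data[off + 2] & 0xff) << 16
          | (data[off + 3] & 0xff) << 24;
      k1 *= c1;
      k1 = Integer.rotateLeft(k1, 15);
      k1 *= c2;
      h ^= k1;
      h = Integer.rotateLeft(h, 13);
      h = h * 5 + 0xe6546b64;
    }
    // Tail: the remaining 0..3 bytes (intentional switch fall-through).
    int tail = nblocks * 4;
    int k = 0;
    switch (data.length & 3) {
      case 3:
        k ^= (data[tail + 2] & 0xff) << 16;
        // fall through
      case 2:
        k ^= (data[tail + 1] & 0xff) << 8;
        // fall through
      case 1:
        k ^= (data[tail] & 0xff);
        k *= c1;
        k = Integer.rotateLeft(k, 15);
        k *= c2;
        h ^= k;
        break;
      default:
        break;
    }
    // Finalization: fold in the length, then the fmix32 avalanche.
    h ^= data.length;
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
  }
}
```

For example, `BucketHashAlgorithm.fromConfig("java_hashcode").bucketId("abc", 8)` yields 2. The point of the design is that writer and reader resolve the same setting, so switching algorithms is an explicit, table-level choice rather than a code change.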
danny0405 commented on PR #8657:
URL: https://github.com/apache/hudi/pull/8657#issuecomment-1541617605
> > , I'm afraid the algorithm needs to be consistent too in order to apply the bucket pruning optimization
>
> not sure to understand. Do you mean the hashing algorithm must be t
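Why bucket pruning needs the same hash on both sides can be shown with a toy sketch: the writer places each key in a bucket, and the reader, for a predicate `key = value`, scans only the bucket its own hash points at. The class and method names are hypothetical, and FNV-1a is used merely as a stand-in for "a different engine's hash", not as anything Hudi, Hive, or Spark uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical toy model of bucketed writes and pruned point lookups.
final class BucketPruningSketch {
  private BucketPruningSketch() {}

  // Writer-side hash: plain Java String.hashCode().
  static int javaHash(String key) {
    return key.hashCode();
  }

  // A different 32-bit hash (FNV-1a over UTF-8 bytes), standing in for another engine.
  static int fnv1a32(String key) {
    int h = 0x811c9dc5;
    for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
      h ^= (b & 0xff);
      h *= 0x01000193;
    }
    return h;
  }

  static int bucketOf(ToIntFunction<String> hash, String key, int numBuckets) {
    return (hash.applyAsInt(key) & Integer.MAX_VALUE) % numBuckets;
  }

  // Writer: put each key into the bucket chosen by writerHash.
  static List<List<String>> write(List<String> keys, ToIntFunction<String> writerHash, int numBuckets) {
    List<List<String>> buckets = new ArrayList<>();
    for (int i = 0; i < numBuckets; i++) {
      buckets.add(new ArrayList<>());
    }
    for (String k : keys) {
      buckets.get(bucketOf(writerHash, k, numBuckets)).add(k);
    }
    return buckets;
  }

  // Reader: for `key = value`, scan only the single bucket readerHash selects.
  static boolean prunedLookup(List<List<String>> buckets, ToIntFunction<String> readerHash, String value) {
    int b = bucketOf(readerHash, value, buckets.size());
    return buckets.get(b).contains(value);
  }
}
```

Writing keys `a`, `b`, `c` into 4 buckets with `javaHash` puts `a` in bucket 1. A reader using `javaHash` probes bucket 1 and finds it; a reader using `fnv1a32` probes bucket 0 (FNV-1a of `a` is `0xe40c292c`) and silently misses the row, which is why a mismatched hash makes pruning return wrong results rather than just slower ones.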
danny0405 commented on PR #8657:
URL: https://github.com/apache/hudi/pull/8657#issuecomment-1541253334
> ${bucketId}_$
So it seems the naming convention used by Hudi is generally compatible with Hive (but not with Spark or Trino); the only concern is the hashing algorithm. I'm afraid the algor
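A related detail when comparing engines is how a 32-bit hash is reduced to a bucket number. As far as we understand, Hive masks the sign bit (`(hash & Integer.MAX_VALUE) % n`) while Spark applies a positive modulo (`pmod`) to its Murmur3 hash; treat both attributions as assumptions to check against the respective sources. The hypothetical helpers below show that the two reductions can disagree even for the same hash value.

```java
// Hypothetical helpers comparing two ways of mapping a hash to a bucket.
final class BucketMapping {
  private BucketMapping() {}

  // Clear the sign bit, then take the remainder.
  static int maskMod(int hash, int numBuckets) {
    return (hash & Integer.MAX_VALUE) % numBuckets;
  }

  // Mathematical (always non-negative) modulo.
  static int pmod(int hash, int numBuckets) {
    return Math.floorMod(hash, numBuckets);
  }
}
```

For non-negative hashes, and whenever the bucket count is a power of two, both give the same bucket. For a negative hash and another bucket count they can differ: with hash `-1` and 3 buckets, `maskMod` returns 1 and `pmod` returns 2.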
danny0405 commented on PR #8657:
URL: https://github.com/apache/hudi/pull/8657#issuecomment-1539309061
> - hashing
> - file naming
> - file numbering
> - file sorting
Can you elaborate a little more about these items?
--
This is an automated message from the Apache Git Service.
danny0405 commented on PR #8657:
URL: https://github.com/apache/hudi/pull/8657#issuecomment-1538176946
> * the rfc statement about support of hive bucketing
https://cwiki.apache.org/confluence/display/HUDI/RFC+-+29%3A+Hash+Index
Thanks for the detailed analysis. So what the actions th
danny0405 commented on PR #8657:
URL: https://github.com/apache/hudi/pull/8657#issuecomment-1537691855
> but so far I am not sure what the current status of hudi hashing
It only uses the plain Java hashCode():
https://github.com/apache/hudi/blob/20938c30b168d63cf4e520c6b4e1d7b930bed1ab/
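For reference, bucket assignment based on plain Java `hashCode()` can be sketched as follows, assuming the sign bit is masked and the result is reduced modulo the bucket count. The class and method names are hypothetical; the linked Hudi source (truncated above) is the authority on what Hudi actually does.

```java
import java.util.List;

// Hypothetical sketch of hashCode-based bucketing, not Hudi's actual class.
final class JavaHashBucketing {
  private JavaHashBucketing() {}

  // Single key: String.hashCode() with the sign bit cleared, modulo bucket count.
  static int bucketId(String recordKey, int numBuckets) {
    return (recordKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
  }

  // Several hash-key fields: List.hashCode() starts at 1 and folds each element as 31 * h + e.hashCode().
  static int bucketId(List<String> hashKeyFields, int numBuckets) {
    return (hashKeyFields.hashCode() & Integer.MAX_VALUE) % numBuckets;
  }
}
```

With 8 buckets, `"abc"` hashes to 96354 and lands in bucket 2, while the one-element list `["abc"]` hashes to 96385 and lands in bucket 1. Since `String.hashCode()` is specified over UTF-16 `char` values, any non-JVM reader that wants to prune buckets would have to reproduce that exact formula.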