Hi Bhavani Sudha,
>> Are you using spark sql or Hive query?
This happens on all of Hive, Hive on Spark, and Spark SQL.
>> the table type ,
This happens for both copy on write and merge on read.
>> configs,
hoodie.upsert.shuffle.parallelism=2
hoodie.insert.shuffle.parallelism=2
hoodie.bulkinsert
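Since the thread only lists the raw keys, here is a hedged sketch of how such shuffle-parallelism settings are typically passed as Hudi write options in PySpark. The DataFrame `df`, the table name, and the target path are hypothetical placeholders, not from the original thread.

```python
# Collect the write options; the two parallelism keys mirror the
# configs quoted above, the table name is a hypothetical example.
hudi_write_options = {
    "hoodie.table.name": "my_table",           # hypothetical
    "hoodie.upsert.shuffle.parallelism": "2",  # from the thread
    "hoodie.insert.shuffle.parallelism": "2",  # from the thread
}

# With a live SparkSession and DataFrame `df`, these would be applied
# roughly like this (commented out since no session exists here):
# df.write.format("hudi").options(**hudi_write_options) \
#     .mode("append").save("/tmp/hudi/my_table")
```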
> With templates, we can collect good information while people file the
> issues..Not sure about permissions we have on JIRA to enable bots, but may
> have more luck on github workflows doing these already?
> Can we do templates/required fields with JIRAs as well?
Yes, it is very much possible to
Hi Gurudatt and Vinoth,
Thanks for sharing your valuable opinion.
Considering that Hudi is still a growing project, I agree that it's better to
keep GitHub's Issues tab as a way to discuss problems for now.
+1 to introduce issue template and management bot.
Best,
Vino
Vinoth Chandar wrote on November 19, 2019:
+1 on all three.
Would there be an overhaul of existing code to add comments to all classes?
We are pretty reasonable already, but good to get this in shape.
17:54:37 [incubator-hudi]$ grep -R -B 1 "public class" hudi-*/src/main/java
| grep "public class" | wc -l
274
17:54:50 [incubator-hudi]
+1, it’s hard work but meaningful.
lamberken
On 11/19/2019 07:27, leesf wrote:
Hi vino,
Thanks for bringing this discussion up.
+1 on all. the third one seems a bit too strict and usually requires manual
processing of the import order, but I also agree and think it makes our
project more professional. And I learned that the calcite community is also
applying this rule.
Best,
Hi Pratyaksh,
Let me try to answer this. I believe spark does not natively invoke
HoodieParquetInputFormat.getSplits() like Hive and Presto does. So when
queried, spark just loads all the data files in that partition without
applying Hoodie's filtering logic. That's why we need to instruct Spark to
r
Hi Gurudatt,
Can you share more context on the table and the query. Are you using spark
sql or Hive query? the table type , etc? Also, if you can provide a small
snippet to reproduce with the configs that you used, it would be useful to
debug.
Thanks,
Sudha
On Sun, Nov 17, 2019 at 11:09 PM Gurud
If we decide to keep GitHub Issues, both great suggestions. We should still
debate if we keep GH issues. I just shared my opinion. :)
With templates, we can collect good information while people file the
issues..Not sure about permissions we have on JIRA to enable bots, but may
have more luck on g
https://jira.apache.org/jira/browse/HUDI-343 tracks this.
On Sat, Nov 16, 2019 at 1:46 PM Thomas Weise wrote:
> Sorry for the late reply.
>
> The reporter is applicable to top level projects.
>
> But please create a DOAP file for Hudi, where you can also list the
> release: https://projects.apac
Hi Vinoth / Vino,
Just adding my 2 cents to the discussion. Yes, I agree that GitHub issues
are low friction and can be the first line of support. It will help in
keeping the JIRA clean.
Potential solutions that I have come across in the community,
1. Introduce an issue template.
2. Add a bot th
@vinoyang. All valid points. I just have 1 argument (all others you are
right and I have always known this tradeoff) for keeping Github issues,
when we are still growing the community and that is : it lets anyone with a
github id raise an issue without forcing to sign up for JIRA account. For
large
Figured out.
Below command worked for me in PySpark.
spark._jsc.hadoopConfiguration().set('mapreduce.input.pathFilter.class','org.apache.hudi.hadoop.HoodieROTablePathFilter')
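For readability, the one-liner above can be wrapped in a small helper. This is a hedged sketch: the configuration key and filter class come from the thread, while the helper name is my own, and `spark` is assumed to be a live SparkSession.

```python
# Hadoop input-path-filter key and the Hudi read-optimized filter class,
# as quoted in the thread above.
PATH_FILTER_KEY = "mapreduce.input.pathFilter.class"
PATH_FILTER_CLASS = "org.apache.hudi.hadoop.HoodieROTablePathFilter"

def register_hoodie_path_filter(spark):
    """Set the path filter on the session's JVM Hadoop configuration,
    so Spark skips obsolete file versions when reading Hudi tables."""
    spark._jsc.hadoopConfiguration().set(PATH_FILTER_KEY, PATH_FILTER_CLASS)
```

This must run before the `spark.read` call that queries the Hudi table, since the filter is consulted when input paths are listed.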
Regards,
Purushotham Pushpavanth
On Mon, 18 Nov 2019 at 16:47, Purushotham Pushpavanthar <
pushpavant...@gmail.com> w
Having proper class-level and method-level comments always makes life
easier for any new user.
+1 for points 1,2 and 4.
On Mon, Nov 18, 2019 at 5:59 PM vino yang wrote:
> Hi guys,
>
> Currently, Hudi's comment and code styles do not have a uniform
> specification on certain rules. I will li
Hi guys,
Currently, Hudi's comment and code styles do not have a uniform
specification on certain rules. I will list them below. With the rapid
development of the community, the inconsistent comment specification will
bring a lot of problems. I am here to assume that everyone is aware of its
impor
Kabeer, can you please share the *PySpark* command to register the
pathfilter class?
Regards,
Purushotham Pushpavanth
On Mon, 18 Nov 2019 at 13:46, Pratyaksh Sharma
wrote:
> Hi Vinoth/Kabeer,
>
> I have one small doubt regarding what you proposed to fix the issue. Why is
> HoodieParquetInputFormat c
Hi Vinoth/Kabeer,
I have one small doubt regarding what you proposed to fix the issue. Why is
HoodieParquetInputFormat class not able to handle deduplication of records
in case of spark while it is able to do so in case of presto and hive?
On Sun, Nov 17, 2019 at 4:08 AM Vinoth Chandar wrote:
>