Would like to request write access to the Hive Wiki

2018-08-08 Thread Lu
Hi Hive team, I would like to request write access to the Hive Wiki because I am planning to write a wiki page for the Teradata Binary File SerDe (JIRA ticket: https://issues.apache.org/jira/browse/HIVE-20225). My Confluence username is luli (https://cwiki.apache.org/confluence/display/~luli).

RE: Hive output file 000000_0

2018-08-08 Thread Ryan Harris
Also, I believe that the output format matters. If your output is TEXTFILE, I think that all of the reducers can append to the same file concurrently. However, for block-based output formats, that isn’t possible. From: Furcy Pin [mailto:pin.fu...@gmail.com] Sent: Wednesday, August 08, 2018 9:58

Re: Hive output file 000000_0

2018-08-08 Thread Furcy Pin
Indeed, if your table is big, then you DO want to parallelise, and setting the number of reducers to 1 is clearly not a good idea. From what you explained, I still don't understand what your exact problem is. Can't you just leave your query as is and be okay with many part_X files? On Wed,

RE: Hive output file 000000_0

2018-08-08 Thread Sujeet Pardeshi
Thanks Furcy. Will the below setting not hit performance? Just one reducer? I am processing huge data and cannot compromise on performance. SET mapred.reduce.tasks = 1 Regards, Sujeet Singh Pardeshi Software Specialist SAS Research and Development (India) Pvt. Ltd. Level 2A and Level 3, Cyb

Re: Hive output file 000000_0

2018-08-08 Thread Furcy Pin
It might sound silly, but isn't that what Hive is supposed to do, being a distributed computation framework and all? Hive will write one file per reducer, called 0_0, 1_0, etc. where the number corresponds to the number of your reducer. Sometimes the _0 will be a _1 or _2 or more depending
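
For anyone post-processing such a directory, the naming scheme described above can be handled with a small script. This is only a minimal sketch: it assumes file names of the form <reducer number>_<suffix> (for example 000000_0, as in the thread subject), and the helper names parse_part_file and group_by_reducer are hypothetical, not part of Hive.

```python
import re

# Assumed pattern: <reducer number>_<suffix>, e.g. "000000_0" or "000003_1".
_PART_FILE = re.compile(r"^(\d+)_(\d+)$")


def parse_part_file(name):
    """Return (reducer_number, suffix) for a matching file name, else None."""
    match = _PART_FILE.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def group_by_reducer(names):
    """Map each reducer number to the sorted list of suffixes seen for it."""
    grouped = {}
    for name in names:
        parsed = parse_part_file(name)
        if parsed is None:
            continue  # skip names that do not follow the assumed pattern
        reducer, suffix = parsed
        grouped.setdefault(reducer, []).append(suffix)
    return {reducer: sorted(suffixes) for reducer, suffixes in grouped.items()}
```

Given a directory listing, group_by_reducer shows how many reducers wrote output and which suffixes appear for each, without assuming anything about what a non-zero suffix means.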

RE: Hive output file 000000_0

2018-08-08 Thread Sujeet Pardeshi
Hi Deepak, Thanks for your response. The table is not bucketed or clustered. It can be seen below. DROP TABLE IF EXISTS ${SCHEMA_NM}.daily_summary; CREATE EXTERNAL TABLE ${SCHEMA_NM}.daily_summary ( bouncer VARCHAR(12), device_type VARCHAR(52), visitor_type VARCHAR(10), visit_origination