Hi Hive team,
I would like to request write access to the Hive wiki because I am
planning to write a wiki page for the Teradata Binary File SerDe (the JIRA
ticket is here: https://issues.apache.org/jira/browse/HIVE-20225).
My Confluence username is: luli (https://cwiki.apache.org/confluence/display/~luli)
Also, I believe the output format matters. If your output is TEXTFILE, I
think all of the reducers can append to the same file concurrently.
However, for block-based output formats that isn't possible.
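If the goal is fewer output files rather than literally one reducer, Hive can also merge small files at the end of the job. A hedged sketch using Hive's file-merge settings (the setting names are real Hive options, but the values shown are illustrative and defaults vary by version):

```sql
-- Ask Hive to merge small output files in an extra step after the job
-- (check your Hive version's documentation for the exact defaults)
SET hive.merge.mapfiles=true;                -- merge outputs of map-only jobs
SET hive.merge.mapredfiles=true;             -- merge outputs of map-reduce jobs
SET hive.merge.size.per.task=256000000;      -- target size of each merged file, in bytes
SET hive.merge.smallfiles.avgsize=16000000;  -- merge only when avg output file is below this
```

This keeps the main query fully parallel and pays only for a short merge stage at the end.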
From: Furcy Pin [mailto:pin.fu...@gmail.com]
Sent: Wednesday, August 08, 2018 9:58
Indeed, if your table is big, then you DO want to parallelise and setting
the number of reducers to 1 is clearly not a good idea.
From what you explained, I still don't understand what your exact problem
is.
Can't you just leave your query as is and be okay with many part_X
files?
On Wed,
Thanks Furcy. Won't the setting below hurt performance, with just one
reducer? I am processing huge data and cannot compromise on performance.
SET mapred.reduce.tasks = 1
Regards,
Sujeet Singh Pardeshi
Software Specialist
SAS Research and Development (India) Pvt. Ltd.
Level 2A and Level 3, Cyb
It might sound silly, but isn't that what Hive is supposed to do, being a
distributed computation framework and all?
Hive will write one file per reducer, named 000000_0, 000001_0, etc., where
the first number corresponds to the reducer that wrote it.
Sometimes the _0 will be a _1 or _2 or more depending
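A common way to keep the query parallel and still end up with a single file is to concatenate the per-reducer part files after the job finishes, e.g. with `hadoop fs -getmerge` on HDFS. A minimal local sketch of the idea (directory and file names below are hypothetical, chosen only for illustration):

```shell
# Simulate per-reducer output files locally, then concatenate them into one.
# On HDFS the equivalent is:  hadoop fs -getmerge <table-dir> <local-file>
mkdir -p /tmp/part_demo
printf 'row from reducer 0\n' > /tmp/part_demo/000000_0
printf 'row from reducer 1\n' > /tmp/part_demo/000001_0

# Merge all part files into a single output file
cat /tmp/part_demo/0000*_* > /tmp/part_demo/merged.txt
```

The merge is a cheap sequential step done once, instead of forcing the whole query through one reducer.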
Hi Deepak,
Thanks for your response. The table is not bucketed or clustered, as can be
seen below.
DROP TABLE IF EXISTS ${SCHEMA_NM}.daily_summary;
CREATE EXTERNAL TABLE ${SCHEMA_NM}.daily_summary
(
bouncer VARCHAR(12),
device_type VARCHAR(52),
visitor_type VARCHAR(10),
visit_origination