Check the threads in hive user group under “Impact of partitioning on certain
queries”
HTH
Dr Mich Talebzadeh
LinkedIn
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
Sybase ASE 15 Gold Medal Award 2008
A Winning Strategy: Running the most
with the “same”
data) even if
the query explicitly specifies a single partition.
(I mean I _could_ actually do the experiments myself…)
Regards,
Z
From: Owen O'Malley [mailto:omal...@apache.org]
Sent: 02 July 2013 15:52
To: user@hive.apache.org
Subject: Re: Partition performance
On 2 Jul 2013, at 16:51, Owen O'Malley wrote:
On Tue, Jul 2, 2013 at 2:34 AM, Peter Marron
peter.mar...@trilliumsoftware.com wrote:
Hi Owen,
I’m curious about this advice about partitioning. Is there some
fundamental reason why Hive
is slow when the number of partitions
How big were the files in each case in your experiment? Having lots of
small files will add Hadoop overhead.
Also, it would be useful to know the execution times of the map and reduce
tasks. The rule of thumb is that at under 20 seconds each, or so, you're
paying a significant part of the execution time
1) Each partition object is a row in the metastore (usually MySQL). Querying
large tables with many partitions has a longer startup time, as the Hive query
planner has to fetch and process all of this meta-information. This is not
a distributed process. It is usually fast, within a few seconds, but for
On Wed, Jul 3, 2013 at 5:19 AM, David Morel dmore...@gmail.com wrote:
That is still not really answering the question, which is: why is it slower
to run a query on a heavily partitioned table than it is on the same number
of files in a less heavily partitioned table?
According to Gopal's
?
(It’s not currently a problem for me but I can see that I am going to need to
be able to explain the situation.)
Warm regards,
Z
From: Owen O'Malley [mailto:omal...@apache.org]
Sent: 05 April 2013 00:26
To: user@hive.apache.org
Subject: Re: Partition performance
See slide #9 from my Optimizing
On Tue, Jul 2, 2013 at 2:34 AM, Peter Marron
peter.mar...@trilliumsoftware.com wrote:
Hi Owen,
I’m curious about this advice about partitioning. Is there some
fundamental reason why Hive
is slow when the number of partitions is 10,000 rather than 1,000?
The precise
5, 2013 1:12 PM
Subject: Re: Partition performance
Can you tell how many map tasks there are in each scenario?
If my assumption is correct, you should have 336 in the first case and 14 in
the second case.
It looks like it is combining all the small files in a folder and running them
as one map task
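To make the 336 versus 14 arithmetic concrete, here is a toy model of that assumption. It is only a sketch: `map_tasks_if_combined_per_folder` is a hypothetical helper, not a Hive or Hadoop API, and the two layouts (336 folders of 1 file, 14 folders of 24 files) are assumed for illustration because they give the same total number of files. The thread itself only states the two task counts.

```python
def map_tasks_if_combined_per_folder(files_per_folder):
    """Rough map-task count if all small files in a folder become one task.

    files_per_folder: list with the number of small files in each folder.
    Empty folders contribute no task.
    """
    return sum(1 for n in files_per_folder if n > 0)


# Assumed layouts (not stated in the thread): same 336 files either way.
heavily_partitioned = [1] * 336   # 336 folders, 1 small file each
lightly_partitioned = [24] * 14   # 14 folders, 24 small files each

# Both layouts hold the same number of files.
assert sum(heavily_partitioned) == sum(lightly_partitioned) == 336

print(map_tasks_if_combined_per_folder(heavily_partitioned))  # 336
print(map_tasks_if_combined_per_folder(lightly_partitioned))  # 14
```

Under this model the number of map tasks follows the number of folders, not the number of files, which would match the counts guessed above.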
wondering what's the reason behind it? If I run this on a real cluster,
maybe it won't perform so differently?
Thanks.
From: Dean Wampler dean.wamp...@thinkbiganalytics.com
To: user@hive.apache.org
Sent: Thursday, April 4, 2013 4:28 PM
Subject: Re: Partition
...@thinkbiganalytics.com
To: user@hive.apache.org
Sent: Thursday, April 4, 2013 4:28 PM
Subject: Re: Partition performance
Also, how big are the files in each directory? Are they roughly the size
of one HDFS block or a multiple? Lots of small files will mean lots of
mapper tasks with little to do.
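One quick way to answer the file-size question is to compare a listing of file sizes against the block size. The following is a minimal sketch, not a Hadoop tool: `summarize_files` is a hypothetical helper, the 128 MB block size is an assumption (use whatever your cluster is configured with), and the split count is a rough estimate of one split per block that ignores any split combining.

```python
import math

# Assumed block size for illustration; substitute your cluster's value.
BLOCK_SIZE = 128 * 1024 * 1024


def summarize_files(sizes_bytes, block_size=BLOCK_SIZE):
    """Summarize how file sizes compare with the HDFS block size.

    Returns the file count, how many files are smaller than one block,
    and a rough split count (one split per started block, none for
    empty files).
    """
    small = sum(1 for s in sizes_bytes if s < block_size)
    approx_splits = sum(math.ceil(s / block_size) for s in sizes_bytes)
    return {
        "files": len(sizes_bytes),
        "small_files": small,
        "approx_splits": approx_splits,
    }


# Example: many 1 MB files versus one file of exactly two blocks.
one_mb = 1024 * 1024
print(summarize_files([one_mb] * 336))
print(summarize_files([2 * BLOCK_SIZE]))
```

If nearly every file shows up as a small file, each mapper is likely spending more time starting up than reading data.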
The slowdown is most likely due to the large number of partitions.
I believe the Hive book authors tell us to be cautious with a large number of
partitions :-) and I abide by that.
Users
Please add your points of view and experiences
Thanks
sanjay
From: Ian
Is it possible for you to send the explain plan of these two queries?
Regards,
Ramki.
On Thu, Apr 4, 2013 at 4:06 PM, Sanjay Subramanian
sanjay.subraman...@wizecommerce.com wrote:
The slowdown is most likely due to the large number of partitions.
I believe the Hive book authors tell us to
See slide #9 from my Optimizing Hive Queries talk
http://www.slideshare.net/oom65/optimize-hivequeriespptx . Certainly, we
will improve it, but for now you are much better off with 1,000 partitions
than 10,000.
-- Owen
On Thu, Apr 4, 2013 at 4:21 PM, Ramki Palle ramki.pa...@gmail.com wrote:
Also, how big are the files in each directory? Are they roughly the size of
one HDFS block or a multiple? Lots of small files will mean lots of mapper
tasks with little to do.
You can also compare the job tracker console output for each job. I bet the
slow one has a lot of very short map and