RE: Performance Issues in Hive with S3 and Partitions

2012-07-27 Thread Connell, Chuck
: Friday, July 27, 2012 2:42 PM To: user@hive.apache.org Subject: RE: Performance Issues in Hive with S3 and Partitions Igor, I did not see any major improvement in performance even after setting "hive.optimize.s3.query=true", although this was suggested by the AWS team. My problem is

RE: Performance Issues in Hive with S3 and Partitions

2012-07-27 Thread richin.jain
he.org<mailto:user@hive.apache.org> Sent: Saturday, July 28, 2012 12:32 AM Subject: Re: Performance Issues in Hive with S3 and Partitions Use a different partitioning scheme or consider using clustered / bucketed tables. On 7/27/12, richin.j...@nokia.com

Re: Performance Issues in Hive with S3 and Partitions

2012-07-27 Thread Bejoy Ks
AM Subject: Re: Performance Issues in Hive with S3 and Partitions Use a different partitioning scheme or consider using clustered / bucketed tables. On 7/27/12, richin.j...@nokia.com wrote: > Igor, > > I did not see any major improvement in the performance even after setting > "

Re: Performance Issues in Hive with S3 and Partitions

2012-07-27 Thread Edward Capriolo
> Richin > > From: Jain Richin (Nokia-LC/Boston) > Sent: Tuesday, July 24, 2012 11:49 AM > To: user@hive.apache.org > Subject: RE: Performance Issues in Hive with S3 and Partitions > > Hi Igor, > > Thanks for the response. Yes I am using EMR. > I will make changes and

RE: Performance Issues in Hive with S3 and Partitions

2012-07-27 Thread richin.jain
and HDFS are not meant to deal with a lot of small files, but if that is the way to go, is there any workaround? Thanks, Richin From: Jain Richin (Nokia-LC/Boston) Sent: Tuesday, July 24, 2012 11:49 AM To: user@hive.apache.org Subject: RE: Performance Issues in Hive with S3 and Partitions Hi Igor

Re: Performance Issues in Hive with S3 and Partitions

2012-07-24 Thread Edward Capriolo
Hi Igor, > > > > Thanks for the response. Yes I am using EMR. > > I will make changes and let you know if that helps. > > > > Richin > > > > From: ext Igor Tatarinov [mailto:i...@decide.com] > Sent: Tuesday, July 24, 2012 12:38 AM > To: user@hive.apache

RE: Performance Issues in Hive with S3 and Partitions

2012-07-24 Thread richin.jain
Hi Igor, Thanks for the response. Yes I am using EMR. I will make changes and let you know if that helps. Richin From: ext Igor Tatarinov [mailto:i...@decide.com] Sent: Tuesday, July 24, 2012 12:38 AM To: user@hive.apache.org Subject: Re: Performance Issues in Hive with S3 and Partitions Are

Re: Performance Issues in Hive with S3 and Partitions

2012-07-23 Thread Igor Tatarinov
Are you using EMR? Have you tried setting hive.optimize.s3.query=true as mentioned in http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-version-details.html I haven't tried using that option myself. I am curious if it helps in your scenario. The above page also me

Performance Issues in Hive with S3 and Partitions

2012-07-23 Thread richin.jain
Hi, Sorry, this is an AWS Hive-specific question. I have two external Hive tables for my custom logs: 1. A flat directory structure on AWS S3, with no partitions and files in bz2-compressed format (a few big files). 2. Three levels of partitions on AWS S3 (a lot of small uncompressed files). I noticed that