Re: CombineHiveInputFormat not working

2015-09-30 Thread Pradeep Gollakota
...distro and version, be aware of https://issues.apache.org/jira/browse/MAPREDUCE-1597 and https://issues.apache.org/jira/browse/MAPREDUCE-5537; test it and see...

Re: CombineHiveInputFormat not working

2015-09-30 Thread Pradeep Gollakota
What are your values for mapred.min.split.size, mapred.max.split.size, and hive.hadoop.supports.splittable.combineinputformat?
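
For reference, a minimal sketch of how these knobs are typically set in a Hive session before running the query. The CombineHiveInputFormat class name is Hive's standard one, but the per-node and per-rack minimums are related knobs not named in this thread, and the byte sizes are illustrative assumptions rather than values from the thread:

```
-- Hypothetical session settings for combining small files; the byte sizes are
-- illustrative assumptions (256 MB max per split, 64 MB minimum per node/rack).
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
SET hive.hadoop.supports.splittable.combineinputformat=true;
SET mapred.max.split.size=268435456;
SET mapred.min.split.size.per.node=67108864;
SET mapred.min.split.size.per.rack=67108864;
```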

CombineHiveInputFormat not working

2015-09-30 Thread Pradeep Gollakota
Hi all, I have an external table with the following DDL. ``` DROP TABLE IF EXISTS raw_events; CREATE EXTERNAL TABLE IF NOT EXISTS raw_events ( raw_event_string string) PARTITIONED BY (dc string, community string, dt string) STORED AS TEXTFILE LOCATION '/lithium/events/{dc}/{community}/even
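
One quick way to answer the question asked in the reply above is to echo the current values from the Hive CLI (a bare SET with no value prints the property's current setting); a minimal sketch:

```
-- Print the current values of the settings the reply above asks about.
SET hive.input.format;
SET mapred.min.split.size;
SET mapred.max.split.size;
SET hive.hadoop.supports.splittable.combineinputformat;
```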

Re: Very slow dynamic partition load

2015-06-11 Thread Pradeep Gollakota
I actually decided to remove one of my 2 partition columns and make it a bucketing column instead... same query completed fully in under 10 minutes with 92 partitions added. This will suffice for me for now. On Thu, Jun 11, 2015 at 2:25 PM, Pradeep Gollakota wrote: > Hmm... did your performa
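
A sketch of what that kind of schema change can look like; the table name, column names, and bucket count are illustrative assumptions, not the actual DDL from this thread:

```
-- Hypothetical: partition only by date and bucket by customer instead of
-- partitioning by (customer, date). Names and the bucket count are placeholders.
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

CREATE TABLE events_bucketed (
  customer string,
  raw_event_string string
)
PARTITIONED BY (dt string)
CLUSTERED BY (customer) INTO 64 BUCKETS;

INSERT OVERWRITE TABLE events_bucketed PARTITION (dt)
SELECT customer, raw_event_string, dt
FROM events_staging;
```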

Re: Very slow dynamic partition load

2015-06-11 Thread Pradeep Gollakota
...al partitions in the table but simply partitioned data in HDFS, give it a shot. It may be worthwhile looking into optimizations for this use case. -Slava. On Thu, Jun 11, 2015 at 11:56 AM, Pradeep Gollakota wrote: Hi All, I have a...

Very slow dynamic partition load

2015-06-11 Thread Pradeep Gollakota
Hi All, I have a table which is partitioned on two columns (customer, date). I'm loading some data into the table using a Hive query. The MapReduce job completed within a few minutes, and then the data needed to be "committed" to the appropriate partitions. There were about 32000 partitions generated. The comm
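
For context, a dynamic-partition load that creates tens of thousands of partitions usually also needs Hive's dynamic-partition limits raised; a minimal sketch of the relevant session settings, where the limit values and table/column names are illustrative assumptions rather than details from this thread:

```
-- Hypothetical settings for a dynamic-partition load creating ~32000 partitions;
-- the limits and the table/column names are placeholders.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=50000;
SET hive.exec.max.dynamic.partitions.pernode=10000;
SET hive.exec.max.created.files=500000;

INSERT OVERWRITE TABLE events PARTITION (customer, `date`)
SELECT raw_event_string, customer, `date`
FROM events_staging;
```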

Re: HCatInputFormat combine splits

2015-05-14 Thread Pradeep Gollakota
On Thursday, May 14, 2015 11:04 AM, Pradeep Gollakota <pradeep...@gmail.com> wrote: Setting the following property has had no effect: mapreduce.input.fileinputformat.split.maxsize = 67108864. I'm still getting 1 Mapper p

Re: HCatInputFormat combine splits

2015-05-14 Thread Pradeep Gollakota
Setting the following property has had no effect: mapreduce.input.fileinputformat.split.maxsize = 67108864. I'm still getting 1 Mapper per file. On Thu, May 14, 2015 at 10:27 AM, Ankit Bhatnagar wrote: you can explicitly set the split size. On Wednesday, May 13, 20

HCatInputFormat combine splits

2015-05-13 Thread Pradeep Gollakota
Hi All, I'm writing an MR job to read data using HCatInputFormat... however, the job is generating too many splits. I don't have this problem when running queries in Hive since it combines splits by default. Is there an equivalent in MR so that I'm not generating thousands of mappers? Thanks, Pr

Re: Create custom UDF

2015-03-03 Thread Pradeep Gollakota
This is what I use: groupId org.apache.hive, artifactId hive-exec, version 0.12.0, scope provided. I don't believe anything else is needed. On Tue, Mar 3, 2015 at 2:43 PM, Buntu Dev wrote: I couldn't find any official documentation on how to create a UDF and mvn dependencies for building the project except f
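
Once the UDF is packaged into a jar, the usual way to try it from a Hive session is to register it as a temporary function; the jar path, class name, and function name below are placeholders, not details from this thread:

```
-- Hypothetical registration and use of a custom UDF built against hive-exec;
-- the jar path, class, and function names are placeholders.
ADD JAR /tmp/my-udfs-1.0.jar;
CREATE TEMPORARY FUNCTION my_udf AS 'com.example.hive.MyUdf';
SELECT my_udf(some_column) FROM some_table LIMIT 10;
```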

Re: Hive - regexp_replace function for multiple strings

2015-02-03 Thread Pradeep Gollakota
I don't think this is doable using the out-of-the-box regexp_replace() UDF. The way I would do it is to use a file that maps each regexp to its replacement, and write a custom UDF that loads this file and applies all of the regular expressions to the input. Hope this helps. On Tue, Feb
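
A minimal sketch of how such a mapping-file UDF could be wired up from the Hive side, assuming the UDF takes the mapping file's name as an argument; the file, jar, class, function, and column names are all placeholders, not details from this thread:

```
-- Hypothetical usage: ship the regexp-to-replacement mapping file to the tasks,
-- register the custom UDF, and apply it. Every name here is a placeholder.
ADD FILE /tmp/regexp_mappings.tsv;
ADD JAR /tmp/multi-regexp-udf-1.0.jar;
CREATE TEMPORARY FUNCTION multi_regexp_replace AS 'com.example.hive.MultiRegexpReplaceUDF';
SELECT multi_regexp_replace(message, 'regexp_mappings.tsv') FROM logs;
```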