CombineHiveInputFormat not working

2015-09-30 Thread Pradeep Gollakota
Hi all, I have an external table with the following DDL. ``` DROP TABLE IF EXISTS raw_events; CREATE EXTERNAL TABLE IF NOT EXISTS raw_events ( raw_event_string string) PARTITIONED BY (dc string, community string, dt string) STORED AS TEXTFILE LOCATION

Re: CombineHiveInputFormat not working

2015-09-30 Thread Pradeep Gollakota
iveInputFormat not working > > > > what are your values for: > > mapred.min.split.size > > mapred.max.split.size > > hive.hadoop.supports.splittable.combineinputformat > > > > > > *From:* Pradeep Gollakota [mailto:pradeep...@gmail.com] > *Sent:* Wed

Re: CombineHiveInputFormat not working

2015-09-30 Thread Pradeep Gollakota
n your hadoop distro and version, be potentially aware of > > https://issues.apache.org/jira/browse/MAPREDUCE-1597 > > and > > https://issues.apache.org/jira/browse/MAPREDUCE-5537 > > > > test it and see... > > > > *From:* Pradeep Gollakota [mailto:pradeep.

Re: Very slow dynamic partition load

2015-06-11 Thread Pradeep Gollakota
actual partitions in the table but simply partitioned data in hdfs give it a shot. It may be worthwhile looking into optimizations for this use case. -Slava On Thu, Jun 11, 2015 at 11:56 AM, Pradeep Gollakota pradeep...@gmail.com wrote: Hi All, I have a table which is partitioned on two

Re: Very slow dynamic partition load

2015-06-11 Thread Pradeep Gollakota
I actually decided to remove one of my 2 partition columns and make it a bucketing column instead... same query completed fully in under 10 minutes with 92 partitions added. This will suffice for me for now. On Thu, Jun 11, 2015 at 2:25 PM, Pradeep Gollakota pradeep...@gmail.com wrote: Hmm

Very slow dynamic partition load

2015-06-11 Thread Pradeep Gollakota
Hi All, I have a table which is partitioned on two columns (customer, date). I'm loading some data into the table using a Hive query. The MapReduce job completed within a few minutes and needs to commit the data to the appropriate partitions. There were about 32000 partitions generated. The

Re: HCatInputFormat combine splits

2015-05-14 Thread Pradeep Gollakota
:37 PM, Pradeep Gollakota pradeep...@gmail.com wrote: Hi All, I'm writing an MR job to read data using HCatInputFormat... however, the job is generating too many splits. I don't have this problem when running queries in Hive since it combines splits by default. Is there an equivalent in MR

HCatInputFormat combine splits

2015-05-14 Thread Pradeep Gollakota
Hi All, I'm writing an MR job to read data using HCatInputFormat... however, the job is generating too many splits. I don't have this problem when running queries in Hive since it combines splits by default. Is there an equivalent in MR so that I'm not generating thousands of mappers? Thanks,

Re: Hive - regexp_replace function for multiple strings

2015-02-03 Thread Pradeep Gollakota
I don't think this is doable using the out-of-the-box regexp_replace() UDF. The way I would do it is to use a file to create a mapping between each regexp and its replacement, and write a custom UDF that loads this file and applies all the regular expressions to the input. Hope this helps. On Tue, Feb
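
The core of the mapping-file idea from the reply above can be sketched as follows. This is only an illustration of the load-then-apply logic in Python, not a Hive UDF: a real custom UDF would be written against Hive's own UDF API, which is not shown here. The tab-separated file format, the `#` comment convention, and the names `load_rules` and `apply_rules` are assumptions made for the example and do not come from the thread.

```python
import re
from typing import Iterable, List, Pattern, Tuple

Rule = Tuple[Pattern[str], str]


def load_rules(lines: Iterable[str]) -> List[Rule]:
    """Parse 'pattern<TAB>replacement' lines into compiled rules.

    The tab-separated layout is an assumption for this sketch.
    Blank lines and lines starting with '#' are skipped.
    """
    rules: List[Rule] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        pattern, replacement = line.split("\t", 1)
        rules.append((re.compile(pattern), replacement))
    return rules


def apply_rules(text: str, rules: List[Rule]) -> str:
    """Apply every rule in file order; later rules see earlier results."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


# Hypothetical mapping content; in practice this would be read from a file,
# e.g. with open(path) as fh: rules = load_rules(fh)
RULES = load_rules([
    "# normalise spelling, then collapse whitespace",
    "colou?r\tcolor",
    "\\s+\t ",
])

if __name__ == "__main__":
    print(apply_rules("colour  and   color", RULES))
```

Because the rules run in file order, the order of lines in the mapping file matters whenever one replacement can produce text that a later pattern matches.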