Hi all,
I have an external table with the following DDL.
```
DROP TABLE IF EXISTS raw_events;
CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
raw_event_string string)
PARTITIONED BY (dc string, community string, dt string)
STORED AS TEXTFILE
LOCATION
```
iveInputFormat not working
>
> what are your values for:
>
> mapred.min.split.size
> mapred.max.split.size
> hive.hadoop.supports.splittable.combineinputformat
>
> *From:* Pradeep Gollakota [mailto:pradeep...@gmail.com]
> *Sent:* Wed
n your hadoop distro and version, be potentially aware of
>
> https://issues.apache.org/jira/browse/MAPREDUCE-1597
>
> and
>
> https://issues.apache.org/jira/browse/MAPREDUCE-5537
>
> test it and see...
>
> *From:* Pradeep Gollakota [mailto:pradeep.
actual partitions
in the table but simply partitioned data in HDFS, give it a shot. It may be
worthwhile looking into optimizations for this use case.
-Slava
On Thu, Jun 11, 2015 at 11:56 AM, Pradeep Gollakota pradeep...@gmail.com
wrote:
Hi All,
I have a table which is partitioned on two
I actually decided to remove one of my 2 partition columns and make it a
bucketing column instead... same query completed fully in under 10 minutes
with 92 partitions added. This will suffice for me for now.
On Thu, Jun 11, 2015 at 2:25 PM, Pradeep Gollakota pradeep...@gmail.com
wrote:
Hmm
Hi All,
I have a table which is partitioned on two columns (customer, date). I'm
loading some data into the table using a Hive query. The MapReduce job
completed within a few minutes and needs to commit the data to the
appropriate partitions. There were about 32000 partitions generated. The
:37 PM, Pradeep Gollakota
pradeep...@gmail.com wrote:
Hi All,
I'm writing an MR job to read data using HCatInputFormat... however, the
job is generating too many splits. I don't have this problem when running
queries in Hive since it combines splits by default.
Is there an equivalent in MR so that I'm not generating thousands of
mappers?
Thanks,
I don't think this is doable using the out-of-the-box regexp_replace() UDF.
The way I would do it is to use a file that maps each regexp to its
replacement, and to write a custom UDF that loads this file and applies all
of the regular expressions to the input.
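To make the idea concrete, here is a rough sketch of the logic in Python. A real Hive UDF would normally be written in Java, so treat this only as an illustration of the approach. The file layout (one tab-separated pattern and replacement per line, `#` for comments) and the names `load_rules` and `apply_rules` are my own assumptions, not anything Hive defines.

```python
import re
from typing import Iterable, List, Pattern, Tuple

# One rule = (compiled pattern, replacement string).
Rule = Tuple[Pattern[str], str]


def load_rules(lines: Iterable[str]) -> List[Rule]:
    """Parse 'pattern<TAB>replacement' lines into compiled rules.

    Assumed format (hypothetical): blank lines and lines starting with '#'
    are skipped, so a pattern itself cannot begin with '#'.
    """
    rules: List[Rule] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        # Split on the first tab only, so the replacement may contain tabs.
        pattern, replacement = line.split("\t", 1)
        rules.append((re.compile(pattern), replacement))
    return rules


def apply_rules(value: str, rules: List[Rule]) -> str:
    """Apply every rule in file order; later rules see earlier results."""
    for pattern, replacement in rules:
        value = pattern.sub(replacement, value)
    return value


# Example with in-memory lines; a file object works the same way, e.g.
# load_rules(open("mappings.tsv")) where mappings.tsv is a made-up name.
rules = load_rules(["\\d+\t#", "# a comment line", "", "foo\tbar"])
cleaned = apply_rules("foo 123", rules)
```

Compiling the patterns once when the file is loaded, rather than once per row, is the main point. In a UDF the equivalent would be to load and compile in the initialization step and only run the substitutions per input value.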
Hope this helps.
On Tue, Feb