istro and version, be potentially aware of
>
> https://issues.apache.org/jira/browse/MAPREDUCE-1597
>
> and
>
> https://issues.apache.org/jira/browse/MAPREDUCE-5537
>
>
>
> test it and see...
>
>
>
> *From:* Pradeep Gollakota [mailto:pradeep...@gmail.com
CombineHiveInputFormat not working
>
>
>
> what are your values for:
>
> mapred.min.split.size
>
> mapred.max.split.size
>
> hive.hadoop.supports.splittable.combineinputformat
>
>
>
>
>
> *From:* Pradeep Gollakota [mailto:pradeep...@gmail.com]
> *Sent:* Wednesday,
Hi all,
I have an external table with the following DDL.
```
DROP TABLE IF EXISTS raw_events;
CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
raw_event_string string)
PARTITIONED BY (dc string, community string, dt string)
STORED AS TEXTFILE
LOCATION '/lithium/events/{dc}/{community}/even
```
I actually decided to remove one of my 2 partition columns and make it a
bucketing column instead... same query completed fully in under 10 minutes
with 92 partitions added. This will suffice for me for now.
On Thu, Jun 11, 2015 at 2:25 PM, Pradeep Gollakota
wrote:
> Hmm... did your performa
al partitions
> in the table but simply partitioned data in hdfs give it a shot. It may be
> worthwhile looking into optimizations for this use case.
>
> -Slava
>
> On Thu, Jun 11, 2015 at 11:56 AM, Pradeep Gollakota
> wrote:
>
>> Hi All,
>>
>> I have a
Hi All,
I have a table which is partitioned on two columns (customer, date). I'm
loading some data into the table using a Hive query. The MapReduce job
completed within a few minutes and needs to "commit" the data to the
appropriate partitions. There were about 32000 partitions generated. The
comm
=
>
>
>
>
>
> On Thursday, May 14, 2015 11:04 AM, Pradeep Gollakota <
> pradeep...@gmail.com> wrote:
>
>
> Setting the following property has had no effect.
>
> mapreduce.input.fileinputformat.split.maxsize = 67108864
>
> I'm still getting 1 Mapper p
Setting the following property has had no effect.
mapreduce.input.fileinputformat.split.maxsize = 67108864
I'm still getting 1 Mapper per file.
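To make the expected effect of that setting concrete, here is a minimal sketch of what combining small files into splits would do under a 67108864-byte (64 MiB) cap. It is an illustration only and not Hadoop's actual combining logic, which also weighs data locality; the class name `SplitPacker` and the file sizes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: greedy packing of file sizes into "splits".
// This is NOT Hadoop's algorithm; it ignores node and rack locality.
final class SplitPacker {

    /** Packs file sizes (in bytes) in order into groups of at most maxSplitBytes. */
    static List<List<Long>> pack(List<Long> fileSizes, long maxSplitBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            // Close the current group when the next file would exceed the cap.
            if (!current.isEmpty() && currentBytes + size > maxSplitBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            // A single file larger than the cap ends up alone in its own group.
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        long maxSplit = 67108864L; // 64 MiB, the value from the message above
        List<Long> files = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            files.add(1048576L); // hypothetical: 1000 files of 1 MiB each
        }
        System.out.println(pack(files, maxSplit).size() + " splits instead of " + files.size());
    }
}
```

With these made-up sizes, 64 files fit in each group, so 1000 files collapse into 16 groups. Seeing one mapper per file instead suggests that no combining is happening at all, rather than that the cap is set too low.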
On Thu, May 14, 2015 at 10:27 AM, Ankit Bhatnagar
wrote:
> you can explicitly set the split size
>
>
>
> On Wednesday, May 13, 20
Hi All,
I'm writing an MR job to read data using HCatInputFormat... however, the
job is generating too many splits. I don't have this problem when running
queries in Hive since it combines splits by default.
Is there an equivalent in MR so that I'm not generating thousands of
mappers?
Thanks,
Pr
This is what I use:
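<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>0.12.0</version>
  <scope>provided</scope>
</dependency>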
I don't believe anything else is needed.
On Tue, Mar 3, 2015 at 2:43 PM, Buntu Dev wrote:
> I couldn't find any official documentation on how to create a UDF and mvn
> dependencies for building the project except f
I don't think this is doable using the out-of-the-box regexp_replace() UDF.
The way I would do it is to use a file that maps each regexp to its
replacement, and write a custom UDF that loads this file and applies all of
the regular expressions to the input.
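A rough sketch of the core of that idea in plain Java, leaving out the Hive UDF wrapper itself. The class name `RegexRuleSet` and the one-rule-per-line, tab-separated file format are assumptions made for illustration; in a real UDF the lines would be read from the mapping file once, when the UDF is initialized.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: holds (regexp, replacement) rules and applies them in order.
final class RegexRuleSet {
    private final List<Pattern> patterns = new ArrayList<>();
    private final List<String> replacements = new ArrayList<>();

    /** Each line is assumed to be: regexp, a TAB, then the replacement. Blank lines are skipped. */
    RegexRuleSet(List<String> lines) {
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] parts = line.split("\t", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected regexp<TAB>replacement: " + line);
            }
            // Compile once up front so each row only pays for matching.
            patterns.add(Pattern.compile(parts[0]));
            replacements.add(parts[1]);
        }
    }

    /** Applies every rule in file order; later rules see the output of earlier ones. */
    String apply(String input) {
        String result = input;
        for (int i = 0; i < patterns.size(); i++) {
            result = patterns.get(i).matcher(result).replaceAll(replacements.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Two made-up rules: digits become "#", runs of whitespace become "_".
        RegexRuleSet rules = new RegexRuleSet(Arrays.asList("\\d+\t#", "\\s+\t_"));
        System.out.println(rules.apply("order 42  shipped")); // prints order_#_shipped
    }
}
```

Because the rules run in sequence, their order in the file matters. Note also that `$` and `\` in a replacement are interpreted by `replaceAll` as group references and escapes.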
Hope this helps.
On Tue, Feb