Hi Team,
I am seeing the following error from Hive in the reduce phase. Can you guide me on
its cause and a possible solution?
java.lang.RuntimeException: Hive Runtime Error while closing operators: Unable
to rename output from:
Hi,
Write a custom partitioner on the timestamp and, as you mentioned, set
#reducers to X.
Thanks, but how do I set the number of reducers to X? X depends on the input
(run time), which is unknown at job configuration (compile) time.
2014-03-01 17:44 GMT+08:00 AnilKumar B akumarb2...@gmail.com:
Hi,
Write a custom partitioner on the timestamp and, as you mentioned, set
#reducers to X.
You can use MultipleOutputs and construct the custom file name based on
the timestamp.
http://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html
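For what it's worth, here is a minimal sketch of that approach using the new MapReduce API. The class names and the key layout (a Text key that begins with a yyyyMMddHH hour string) are assumptions for illustration only, not code from this thread:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Send every record of the same hour to the same reducer.
public class HourPartitioner extends Partitioner<Text, Text> {
  @Override
  public int getPartition(Text key, Text value, int numPartitions) {
    String hour = key.toString().substring(0, 10);   // e.g. "2014030117"
    return (hour.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}

// Write each hour's records to a file named after that hour.
class HourReducer extends Reducer<Text, Text, Text, Text> {
  private MultipleOutputs<Text, Text> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<Text, Text>(context);
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    String hour = key.toString().substring(0, 10);
    for (Text v : values) {
      mos.write(key, v, hour);   // produces files like 2014030117-r-00000
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    mos.close();
  }
}

Note that hashing hours onto reducers does not guarantee exactly one hour per reducer; if two hours land on the same reducer, MultipleOutputs still splits them into separate per-hour files.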
On Fri, Feb 28, 2014 at 11:44 PM, Fengyun RAO raofeng...@gmail.com wrote:
It's a common web log analysis
I've tried setting all of this at once:
conf.set("yarn.resourcemanager.connect.max-wait.ms", "500");
conf.set("yarn.resourcemanager.connect.retry-interval.ms", "500");
I would like to explore Call Data Record (CDR aka Call Detail Record) analysis,
and to that end I'm looking for a large (GB+) CDR file or a program to
synthesize a somewhat-realistic sample file. Does anyone know where to find
such a thing?
Thanks
John
Have you looked at
http://www.gedis-studio.com/online-call-detail-records-cdr-generator.html ?
On Sat, Mar 1, 2014 at 7:39 AM, John Lilley john.lil...@redpoint.net wrote:
I would like to explore Call Data Record (CDR aka Call Detail Record)
analysis, and to that end I'm looking for a large
Hello,
I am trying to execute a CUDA benchmark in a Hadoop framework, using Hadoop
Pipes to invoke the CUDA code, which is written against a C++ interface, from
Hadoop. I am interested in knowing what the drawbacks of using Hadoop Pipes
for this might be and whether
Yes I have, and I'm talking to them now about getting a sample file. They may
be nice and give me a large file. I was also hoping to find real data if
possible.
Thanks,
john
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Saturday, March 01, 2014 8:43 AM
To: common-u...@hadoop.apache.org
We have a node that died and had to be rebuilt. However, its status is still
showing in the dfsadmin report
hdfs dfsadmin -report
[...]
Dead datanodes:
Name: 192.168.57.104:50010 (192.168.57.104)
Hostname: tarantula.office.datalever.com
Decommission Status : Decommissioned
Configured Capacity:
Thanks Devin. We don't want just one file; it's more complicated than that.
If the input folder contains data spanning X hours, we want X files;
if Y hours, we want Y files.
Obviously, X or Y is unknown at compile time.
2014-03-01 20:48 GMT+08:00 Devin Suiter RDX dsui...@rdx.com:
If you only want one file, then
Fengyun,
Is there any particular reason you have to have exactly one file per hour? As
you probably know already, each reducer will output one file, or, if you use
MultipleOutputs as I suggested, a set of files. If you have to fit the
number of reducers to the number of hours you have in the input, and
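Just to illustrate the run-time side of that: the driver can inspect the input before submitting the job, so X does not need to be known at compile time. A minimal sketch, assuming one sub-directory per hour under the input folder (the layout and class name are illustrative only):

import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class HourlyJobDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path(args[0]);

    // Count the distinct hours actually present in the input folder.
    FileSystem fs = input.getFileSystem(conf);
    Set<String> hours = new HashSet<String>();
    for (FileStatus status : fs.listStatus(input)) {
      hours.add(status.getPath().getName());   // assumed: one directory per hour
    }

    Job job = Job.getInstance(conf, "hourly-logs");
    job.setNumReduceTasks(hours.size());        // X is known here, at run time
    // ... set mapper, reducer, partitioner, input/output paths as usual ...
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}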
Thanks, Simon. That's very clear.
2014-03-02 14:53 GMT+08:00 Simon Dong simond...@gmail.com:
Reading the data for each hour shouldn't be a problem, since in Hadoop or the shell
you can do pretty much everything with mmddhh* that you can do with mmddhh.
But if you need the data for the hour all sorted
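A tiny illustration of the glob point, in case it helps (the /logs layout and the helper are hypothetical):

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class GlobInput {
  // FileInputFormat expands globs, so an hour prefix selects that hour's files.
  static void addHour(Job job, String mmddhh) throws IOException {
    FileInputFormat.addInputPath(job, new Path("/logs/" + mmddhh + "*"));
  }
}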
Don't you think using Flume would be easier? Use the HDFS sink and a
property to roll the log file every hour.
This way you use a single Flume agent to receive logs as they are
generated, and it dumps them directly to HDFS.
If you want to remove unwanted logs you can write
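In case it helps, a minimal sketch of such an HDFS sink, assuming an agent named "agent" with a source and a channel "ch" already defined; the paths and names are illustrative only:

agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = ch
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/logs/%Y%m%d%H
agent.sinks.hdfsSink.hdfs.filePrefix = weblog
agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
# Roll on time only: one new file per hour, never on size or event count.
agent.sinks.hdfsSink.hdfs.rollInterval = 3600
agent.sinks.hdfsSink.hdfs.rollSize = 0
agent.sinks.hdfsSink.hdfs.rollCount = 0

With rollInterval alone driving the roll, each HDFS file covers roughly one hour of incoming events.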