Seeing strange error in Hive

2014-03-01 Thread Siddharth Tiwari
Hi Team, I am seeing the following error in Hive in the reduce phase. Can you guide me on its cause and a possible solution? java.lang.RuntimeException: Hive Runtime Error while closing operators: Unable to rename output from:

Re: Map-Reduce: How to make MR output one file an hour?

2014-03-01 Thread AnilKumar B
Hi, Write the custom partitioner on timestamp and as you mentioned set #reducers to X.

Re: Map-Reduce: How to make MR output one file an hour?

2014-03-01 Thread Fengyun RAO
Thanks, but how do we set the reducer number to X? X depends on the input (run time), which is unknown at job configuration (compile time). 2014-03-01 17:44 GMT+08:00 AnilKumar B akumarb2...@gmail.com: Hi, Write the custom partitioner on timestamp and as you mentioned set #reducers to X.
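Anil's suggestion could be sketched as below. This is a minimal sketch, assuming the map output key is an epoch-millisecond timestamp in a LongWritable; the class name and key layout are assumptions, not anything from the thread:

```java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes each record to a reducer bucket by the hour of its timestamp key.
// With X reducers covering X consecutive hours of input, each reducer
// receives (at most) one hour's worth of records.
public class HourPartitioner extends Partitioner<LongWritable, Text> {
    private static final long MS_PER_HOUR = 3_600_000L;

    @Override
    public int getPartition(LongWritable key, Text value, int numPartitions) {
        long hour = key.get() / MS_PER_HOUR;
        // floorMod keeps the partition index non-negative.
        return (int) Math.floorMod(hour, numPartitions);
    }
}
```

Note this only balances records across reducers by hour; it does not by itself answer Fengyun's question of choosing X at runtime.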

Re: Map-Reduce: How to make MR output one file an hour?

2014-03-01 Thread Simon Dong
You can use MultipleOutputs and construct the custom file name based on timestamp. http://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html On Fri, Feb 28, 2014 at 11:44 PM, Fengyun RAO raofeng...@gmail.com wrote: It's a common web log analysis
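Simon's MultipleOutputs approach sidesteps the reducer-count problem entirely, since file names come from the data rather than from the number of reducers. A sketch, assuming the reduce key already holds the hour as a yyyyMMddHH number (that key layout is an assumption):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Writes each record to an output file named after its hour,
// e.g. 2014030114-r-00000, regardless of how many reducers run.
public class HourlyReducer
        extends Reducer<LongWritable, Text, NullWritable, Text> {
    private MultipleOutputs<NullWritable, Text> out;

    @Override
    protected void setup(Context ctx) {
        out = new MultipleOutputs<>(ctx);
    }

    @Override
    protected void reduce(LongWritable hourKey, Iterable<Text> values, Context ctx)
            throws IOException, InterruptedException {
        // Assumed: the key is the hour bucket, e.g. 2014030114.
        String baseName = String.valueOf(hourKey.get());
        for (Text v : values) {
            out.write(NullWritable.get(), v, baseName);
        }
    }

    @Override
    protected void cleanup(Context ctx) throws IOException, InterruptedException {
        out.close();  // flush and close all per-hour files
    }
}
```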

RE: very long timeout on failed RM connect

2014-03-01 Thread John Lilley
I've tried setting all of this at once: conf.set("yarn.resourcemanager.connect.max-wait.ms", "500"); conf.set("yarn.resourcemanager.connect.retry-interval.ms", "500");
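For these overrides to take effect, the Configuration object they are set on must be the one the client is actually built from. A sketch of that, using John's 500 ms values (which are his test values, not recommendations):

```java
import org.apache.hadoop.conf.Configuration;

public final class RmTimeoutConf {
    // Returns a Configuration with the client-side ResourceManager connect
    // retry window capped. Configuration.set takes String values. This
    // Configuration must be the one the YarnClient / Job is constructed
    // from, or the RM proxy never sees the overrides.
    public static Configuration withShortRmTimeout() {
        Configuration conf = new Configuration();
        conf.set("yarn.resourcemanager.connect.max-wait.ms", "500");
        conf.set("yarn.resourcemanager.connect.retry-interval.ms", "500");
        return conf;
    }
}
```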

large CDR data samples

2014-03-01 Thread John Lilley
I would like to explore Call Data Record (CDR aka Call Detail Record) analysis, and to that end I'm looking for a large (GB+) CDR file or a program to synthesize a somewhat-realistic sample file. Does anyone know where to find such a thing? Thanks John

Re: large CDR data samples

2014-03-01 Thread Ted Yu
Have you looked at http://www.gedis-studio.com/online-call-detail-records-cdr-generator.html ? On Sat, Mar 1, 2014 at 7:39 AM, John Lilley john.lil...@redpoint.netwrote: I would like to explore Call Data Record (CDR aka Call Detail Record) analysis, and to that end I'm looking for a large

Drawbacks of Hadoop Pipes

2014-03-01 Thread Basu,Indrashish
Hello, I am trying to execute a CUDA benchmark in a Hadoop framework, using Hadoop Pipes to invoke the CUDA code, which is written against a C++ interface, from the Hadoop framework. I am interested in knowing what the drawbacks of using Hadoop Pipes for this might be, and whether

RE: large CDR data samples

2014-03-01 Thread John Lilley
Yes I have, and I'm talking to them now about getting a sample file. They may be nice and give me a large file. I was also hoping to find real data if possible. Thanks, john From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Saturday, March 01, 2014 8:43 AM To: common-u...@hadoop.apache.org

how to remove a dead node?

2014-03-01 Thread John Lilley
We have a node that died and had to be rebuilt. However, its status still shows in the dfsadmin report: hdfs dfsadmin -report [...] Dead datanodes: Name: 192.168.57.104:50010 (192.168.57.104) Hostname: tarantula.office.datalever.com Decommission Status : Decommissioned Configured Capacity:
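One common cleanup for a dead, already-decommissioned node is to drop it from the NameNode's host lists and refresh. This is a sketch; the list file paths are set by dfs.hosts / dfs.hosts.exclude in hdfs-site.xml, and the paths shown here are assumptions for a typical install:

```shell
# Remove the dead host from the include file and, if present, the
# exclude file (paths are assumptions; check hdfs-site.xml).
sed -i '/tarantula.office.datalever.com/d' /etc/hadoop/conf/dfs.include
sed -i '/tarantula.office.datalever.com/d' /etc/hadoop/conf/dfs.exclude

# Tell the NameNode to re-read its host lists.
hdfs dfsadmin -refreshNodes

# The node should no longer appear in the report.
hdfs dfsadmin -report
```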

Re: Map-Reduce: How to make MR output one file an hour?

2014-03-01 Thread Fengyun RAO
Thanks Devin. We don't just want one file; it's more complicated. If the input folder contains data spanning X hours, we want X files; if Y hours, we want Y files. Obviously, X or Y is unknown at compile time. 2014-03-01 20:48 GMT+08:00 Devin Suiter RDX dsui...@rdx.com: If you only want one file, then

Re: Map-Reduce: How to make MR output one file an hour?

2014-03-01 Thread Simon Dong
Fengyun, Is there any particular reason you have to have exactly 1 file per hour? As you probably know already, each reducer will output 1 file, or if you use MultipleOutputs as I suggested, a set of files. If you have to fit the number of reducers to the number of hours you have from the input, and

Re: Map-Reduce: How to make MR output one file an hour?

2014-03-01 Thread Fengyun RAO
Thanks, Simon. That's very clear. 2014-03-02 14:53 GMT+08:00 Simon Dong simond...@gmail.com: Reading data for each hour shouldn't be a problem, as for Hadoop or shell you can pretty much do everything with mmddhh* as you can do with mmddhh. But if you need the data for the hour all sorted

Re: Map-Reduce: How to make MR output one file an hour?

2014-03-01 Thread Shekhar Sharma
Don't you think using Flume would be easier? Use the HDFS sink and a property to roll the log file every hour. This way you use a single Flume agent to receive logs as they are generated, and you will be dumping them directly to HDFS. If you want to remove unwanted logs you can write
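Shekhar's setup could look like the Flume agent configuration below. This is a sketch under stated assumptions: the agent/channel/sink names, the tail source, and the HDFS path are all hypothetical; only the roll properties are the standard HDFS-sink knobs for time-based rolling:

```properties
# Hypothetical agent 'a1': receive log lines, land them in hourly HDFS files.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/access.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode/logs/%Y%m%d%H
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Roll purely on time: one new file per hour, never on size or event count.
a1.sinks.k1.hdfs.rollInterval = 3600
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
```

The time-escaped path (%Y%m%d%H) plus rollInterval=3600 gives one directory and file per hour without any MapReduce job at all.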