Seeing strange error in Hive

2014-03-01 Thread Siddharth Tiwari
Hi Team, I am seeing the following error in Hive in the reduce phase. Can you guide me on its cause and a possible solution? java.lang.RuntimeException: Hive Runtime Error while closing operators: Unable to rename output from:

Re: Map-Reduce: How to make MR output one file an hour?

2014-03-01 Thread AnilKumar B
Hi, Write the custom partitioner on timestamp and as you mentioned set #reducers to X.

Re: Map-Reduce: How to make MR output one file an hour?

2014-03-01 Thread Fengyun RAO
Thanks, but how do we set the reducer number to X? X depends on the input (run time), which is unknown at job configuration (compile time). 2014-03-01 17:44 GMT+08:00 AnilKumar B akumarb2...@gmail.com: Hi, Write the custom partitioner on timestamp and as you mentioned set #reducers to X.
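Anil's suggestion could be sketched as below. This is a minimal sketch, assuming the map output key is an epoch-millisecond timestamp in a LongWritable; the class name and key layout are assumptions, not anything from the thread:

```java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes each record to a reducer bucket by the hour of its timestamp key.
// With X reducers covering X consecutive hours of input, each reducer
// receives (at most) one hour's worth of records.
public class HourPartitioner extends Partitioner<LongWritable, Text> {
    private static final long MS_PER_HOUR = 3_600_000L;

    @Override
    public int getPartition(LongWritable key, Text value, int numPartitions) {
        long hour = key.get() / MS_PER_HOUR;
        // floorMod keeps the partition index non-negative.
        return (int) Math.floorMod(hour, numPartitions);
    }
}
```

Note this only balances records across reducers by hour; it does not by itself answer Fengyun's question of choosing X at runtime.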

Re: Map-Reduce: How to make MR output one file an hour?

2014-03-01 Thread Simon Dong
You can use MultipleOutputs and construct the custom file name based on timestamp. http://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html On Fri, Feb 28, 2014 at 11:44 PM, Fengyun RAO raofeng...@gmail.com wrote: It's a common web log analysis
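Simon's MultipleOutputs approach sidesteps the reducer-count problem entirely, since file names come from the data rather than from the number of reducers. A sketch, assuming the reduce key already holds the hour as a yyyyMMddHH number (that key layout is an assumption):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Writes each record to an output file named after its hour,
// e.g. 2014030114-r-00000, regardless of how many reducers run.
public class HourlyReducer
        extends Reducer<LongWritable, Text, NullWritable, Text> {
    private MultipleOutputs<NullWritable, Text> out;

    @Override
    protected void setup(Context ctx) {
        out = new MultipleOutputs<>(ctx);
    }

    @Override
    protected void reduce(LongWritable hourKey, Iterable<Text> values, Context ctx)
            throws IOException, InterruptedException {
        // Assumed: the key is the hour bucket, e.g. 2014030114.
        String baseName = String.valueOf(hourKey.get());
        for (Text v : values) {
            out.write(NullWritable.get(), v, baseName);
        }
    }

    @Override
    protected void cleanup(Context ctx) throws IOException, InterruptedException {
        out.close();  // flush and close all per-hour files
    }
}
```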

RE: very long timeout on failed RM connect

2014-03-01 Thread John Lilley
I've tried setting all of this at once: conf.set("yarn.resourcemanager.connect.max-wait.ms", "500"); conf.set("yarn.resourcemanager.connect.retry-interval.ms", "500");
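For these overrides to take effect, the Configuration object they are set on must be the one the client is actually built from. A sketch of that, using John's 500 ms values (which are his test values, not recommendations):

```java
import org.apache.hadoop.conf.Configuration;

public final class RmTimeoutConf {
    // Returns a Configuration with the client-side ResourceManager connect
    // retry window capped. Configuration.set takes String values. This
    // Configuration must be the one the YarnClient / Job is constructed
    // from, or the RM proxy never sees the overrides.
    public static Configuration withShortRmTimeout() {
        Configuration conf = new Configuration();
        conf.set("yarn.resourcemanager.connect.max-wait.ms", "500");
        conf.set("yarn.resourcemanager.connect.retry-interval.ms", "500");
        return conf;
    }
}
```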

large CDR data samples

2014-03-01 Thread John Lilley
I would like to explore Call Data Record (CDR aka Call Detail Record) analysis, and to that end I'm looking for a large (GB+) CDR file or a program to synthesize a somewhat-realistic sample file. Does anyone know where to find such a thing? Thanks John

Re: large CDR data samples

2014-03-01 Thread Ted Yu
Have you looked at http://www.gedis-studio.com/online-call-detail-records-cdr-generator.html ? On Sat, Mar 1, 2014 at 7:39 AM, John Lilley john.lil...@redpoint.netwrote: I would like to explore Call Data Record (CDR aka Call Detail Record) analysis, and to that end I'm looking for a large

Drawbacks of Hadoop Pipes

2014-03-01 Thread Basu,Indrashish
Hello, I am trying to execute a CUDA benchmark in a Hadoop framework, using Hadoop Pipes to invoke the CUDA code, which is written against a C++ interface, from the Hadoop framework. I am interested in knowing what the drawbacks of using Hadoop Pipes for this might be, and whether

RE: large CDR data samples

2014-03-01 Thread John Lilley
Yes I have, and I'm talking to them now about getting a sample file. They may be nice and give me a large file. I was also hoping to find real data if possible. Thanks, john From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Saturday, March 01, 2014 8:43 AM To: common-u...@hadoop.apache.org

how to remove a dead node?

2014-03-01 Thread John Lilley
We have a node that died and had to be rebuilt. However, its status still shows in the dfsadmin report: hdfs dfsadmin -report [...] Dead datanodes: Name: 192.168.57.104:50010 (192.168.57.104) Hostname: tarantula.office.datalever.com Decommission Status : Decommissioned Configured Capacity:
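One common cleanup for a dead, already-decommissioned node is to drop it from the NameNode's host lists and refresh. This is a sketch; the list file paths are set by dfs.hosts / dfs.hosts.exclude in hdfs-site.xml, and the paths shown here are assumptions for a typical install:

```shell
# Remove the dead host from the include file and, if present, the
# exclude file (paths are assumptions; check hdfs-site.xml).
sed -i '/tarantula.office.datalever.com/d' /etc/hadoop/conf/dfs.include
sed -i '/tarantula.office.datalever.com/d' /etc/hadoop/conf/dfs.exclude

# Tell the NameNode to re-read its host lists.
hdfs dfsadmin -refreshNodes

# The node should no longer appear in the report.
hdfs dfsadmin -report
```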

Re: Map-Reduce: How to make MR output one file an hour?

2014-03-01 Thread Fengyun RAO
Thanks Devin. We don't just want one file; it's more complicated. If the input folder contains data spanning X hours, we want X files; if Y hours, we want Y files. Obviously, X or Y is unknown at compile time. 2014-03-01 20:48 GMT+08:00 Devin Suiter RDX dsui...@rdx.com: If you only want one file, then

Re: Map-Reduce: How to make MR output one file an hour?

2014-03-01 Thread Simon Dong
Fengyun, Is there any particular reason you have to have exactly 1 file per hour? As you probably know already, each reducer will output 1 file, or if you use MultipleOutputs as I suggested, a set of files. If you have to fit the number of reducers to the number of hours you have from the input, and

Re: Map-Reduce: How to make MR output one file an hour?

2014-03-01 Thread Fengyun RAO
Thanks, Simon. That's very clear. 2014-03-02 14:53 GMT+08:00 Simon Dong simond...@gmail.com: Reading data for each hour shouldn't be a problem, as for Hadoop or shell you can pretty much do everything with mmddhh* as you can do with mmddhh. But if you need the data for the hour all sorted

Re: Map-Reduce: How to make MR output one file an hour?

2014-03-01 Thread Shekhar Sharma
Don't you think using Flume would be easier? Use the HDFS sink and a property to roll the log file every hour. This way you use a single Flume agent to receive logs as they are generated, and you will be dumping them directly to HDFS. If you want to remove unwanted logs you can write
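Shekhar's setup could look like the Flume agent configuration below. This is a sketch under stated assumptions: the agent/channel/sink names, the tail source, and the HDFS path are all hypothetical; only the roll properties are the standard HDFS-sink knobs for time-based rolling:

```properties
# Hypothetical agent 'a1': receive log lines, land them in hourly HDFS files.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/access.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode/logs/%Y%m%d%H
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Roll purely on time: one new file per hour, never on size or event count.
a1.sinks.k1.hdfs.rollInterval = 3600
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
```

The time-escaped path (%Y%m%d%H) plus rollInterval=3600 gives one directory and file per hour without any MapReduce job at all.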