Re: tutorial on Hadoop/Hbase utility classes

2011-09-01 Thread Arun C Murthy
Thanks for putting this up, it's very useful. I'd encourage you to contribute this as a documentation patch so that you help everyone who comes to hadoop.apache.org, plus you can be a part of the project and a contributor. I can help with the mechanics - here is a link to help you get started:

Re: Binary content

2011-09-01 Thread Dieter Plaetinck
On Wed, 31 Aug 2011 08:44:42 -0700 Mohit Anchlia mohitanch...@gmail.com wrote: Does map-reduce work well with binary contents in the file? This binary content is basically some CAD files, and the map-reduce program needs to read these files using some proprietary tool, extract values and do some

Timer jobs

2011-09-01 Thread Per Steffensen
Hi, I use Hadoop for a MapReduce job in my system. I would like to have the job run every 5th minute. Is there any distributed timer job functionality in Hadoop? Of course I could set up a timer in an external timer framework (CRON or something like that) that invokes the MapReduce job. But CRON is

Re: Timer jobs

2011-09-01 Thread Ronen Itkin
Hi, Try to use Oozie for job coordination and workflows. On Thu, Sep 1, 2011 at 12:30 PM, Per Steffensen st...@designware.dk wrote: Hi, I use Hadoop for a MapReduce job in my system. I would like to have the job run every 5th minute. Is there any distributed timer job functionality in Hadoop? Of

Re: Hadoop with Netapp

2011-09-01 Thread Steve Loughran
On 25/08/11 08:20, Sagar Shukla wrote: Hi Hakan, Please find my comments inline in blue: -Original Message- From: Hakan İlter [mailto:hakanil...@gmail.com] Sent: Thursday, August 25, 2011 12:28 PM To: common-user@hadoop.apache.org Subject: Hadoop with Netapp Hi

Re: Turn off all Hadoop logs?

2011-09-01 Thread Steve Loughran
On 29/08/11 20:31, Frank Astier wrote: Is it possible to turn off all the Hadoop logs simultaneously? In my unit tests, I don’t want to see the myriad “INFO” logs spewed out by various Hadoop components. I’m using: ((Log4JLogger) DataNode.LOG).getLogger().setLevel(Level.OFF);

Re: Timer jobs

2011-09-01 Thread Per Steffensen
Hi, Thanks a lot for pointing me to Oozie. I have looked a little bit into Oozie, and it seems like the component triggering jobs is called Coordinator Application. But I see nowhere that this Coordinator Application doesn't just run on a single machine, and that it will therefore not

Re: Timer jobs

2011-09-01 Thread Ronen Itkin
If I get you right, you are asking about installing Oozie as a distributed and/or HA cluster? In that case I am not familiar with an out-of-the-box solution from Oozie. But I think you can make up a solution of your own, for example: installing Oozie on two servers on the same partition which will be

I got the problem from Map output lost

2011-09-01 Thread Tu Tu
Since this week, my Hadoop has hit this problem, with the following information: Lost task tracker: tracker_rsync.host01:localhost/127.0.0.1:40759 Map output lost, rescheduling: getMapOutput(attempt_201108021855_6734_m_97_1,2002) failed : org.apache.hadoop.util.DiskChecker$DiskErrorException: Could

Problem with Python + Hadoop: how to link .so outside Python?

2011-09-01 Thread Xiong Deng
Hi, I have successfully installed scipy for Python 2.7 on my local Linux machine, and I want to pack my Python 2.7 (with scipy) onto Hadoop and run my Python MapReduce scripts, like this:
    ${HADOOP_HOME}/bin/hadoop streaming \
        -input ${input} \
        -output ${output} \

Re: Timer jobs

2011-09-01 Thread Alejandro Abdelnur
[moving common-user@ to BCC] Oozie is not HA yet. But it would be relatively easy to make it so. It was designed with that in mind; we even did a prototype. Oozie consists of 2 services: a SQL database to store the Oozie jobs' state and a servlet container where the Oozie app proper runs. The solution

Re: Timer jobs

2011-09-01 Thread Per Steffensen
Thanks for your response. See comments below. Regards, Per Steffensen Alejandro Abdelnur wrote: [moving common-user@ to BCC] Oozie is not HA yet. But it would be relatively easy to make it. It was designed with that in mind, we even did a prototype. Ok, so if it isn't HA out-of-the-box I

Re: Creating a hive table for a custom log

2011-09-01 Thread Brock Noland
Hi, On Thu, Sep 1, 2011 at 9:08 AM, Raimon Bosch raimon.bo...@gmail.com wrote: Hi, I'm trying to create a table similar to apache_log, but I'm trying to avoid writing my own map-reduce task because I don't want to have my HDFS files twice. So if you're working with log lines like this:

Re: Timer jobs

2011-09-01 Thread Tharindu Mathew
On Thu, Sep 1, 2011 at 7:58 PM, Per Steffensen st...@designware.dk wrote: Thanks for your response. See comments below. Regards, Per Steffensen Alejandro Abdelnur wrote: [moving common-user@ to BCC] Oozie is not HA yet. But it would be relatively easy to make it. It was designed with

Re: Timer jobs

2011-09-01 Thread Per Steffensen
Well, I am not sure I get you right, but anyway, basically I want a timer framework that triggers my jobs. And the triggering of the jobs needs to work even if one or two particular machines go down. So the timer triggering mechanism has to live in the cluster, so to speak. What I don't

Re: Binary content

2011-09-01 Thread Mohit Anchlia
On Thu, Sep 1, 2011 at 1:25 AM, Dieter Plaetinck dieter.plaeti...@intec.ugent.be wrote: On Wed, 31 Aug 2011 08:44:42 -0700 Mohit Anchlia mohitanch...@gmail.com wrote: Does map-reduce work well with binary contents in the file? This binary content is basically some CAD files and map reduce

Re: Timer jobs

2011-09-01 Thread Tharindu Mathew
In Hadoop, if the client that triggers the job fails, is there a way to recover and have another client submit the job? On Thu, Sep 1, 2011 at 8:44 PM, Per Steffensen st...@designware.dk wrote: Well I am not sure I get you right, but anyway, basically I want a timer framework that triggers my

Re: Binary content

2011-09-01 Thread Owen O'Malley
On Thu, Sep 1, 2011 at 8:37 AM, Mohit Anchlia mohitanch...@gmail.com wrote: Thanks! Is there a specific tutorial I can focus on to see how it could be done? Take the word count example and change its output format to be SequenceFileOutputFormat.

Re: Timer jobs

2011-09-01 Thread Vitalii Tymchyshyn
01.09.11 18:14, Per Steffensen wrote: Well, I am not sure I get you right, but anyway, basically I want a timer framework that triggers my jobs. And the triggering of the jobs needs to work even if one or two particular machines go down. So the timer triggering mechanism has to live

cross product of 2 data sets

2011-09-01 Thread Marc Sturlese
Hey there, I would like to do the cross product of two data sets, neither of which fits in memory. I've seen Pig has the cross operation. Can someone please explain to me how it implements it?

Re: cross product of 2 data sets

2011-09-01 Thread Alan Gates
http://ofps.oreilly.com/titles/9781449302641/advanced_pig_latin.html search on cross matches Alan. On Sep 1, 2011, at 11:44 AM, Marc Sturlese wrote: Hey there, I would like to do the cross product of two data sets, neither of which fits in memory. I've seen Pig has the cross operation. Can

Re: Timer jobs

2011-09-01 Thread Per Steffensen
Vitalii Tymchyshyn wrote: 01.09.11 18:14, Per Steffensen wrote: Well, I am not sure I get you right, but anyway, basically I want a timer framework that triggers my jobs. And the triggering of the jobs needs to work even if one or two particular machines go down. So the timer

MultipleOutputs - Create multiple files during output

2011-09-01 Thread modemide
Hi all, I was wondering if anyone was familiar with this class. I want to create multiple output files during my reduce. My input files will consist of
    name1 action1 date1
    name1 action2 date2
    name1 action3 date3
    name2 action1 date1
    name2 action2 date2
    name2 action3 date3
My goal is to create files with

Namenode not starting

2011-09-01 Thread abhishek sharma
Hi all, I am trying to install Hadoop (release 0.20.203) on a machine with CentOS. When I try to start HDFS, I get the following error.
    machine-name: Unrecognized option: -jvm
    machine-name: Could not create the Java virtual machine.
Any idea what might be the problem? Thanks, Abhishek

Re: Namenode not starting

2011-09-01 Thread abhishek sharma
Hi Hailong, I have installed JDK and set JAVA_HOME correctly (as far as I know). Output of java -version is:
    java version 1.6.0_04
    Java(TM) SE Runtime Environment (build 1.6.0_04-b12)
    Java HotSpot(TM) Server VM (build 10.0-b19, mixed mode)
I also have another version installed 1.6.0_27 but get

Re: Namenode not starting

2011-09-01 Thread abhishek sharma
Actually, I found the reason. I am running HDFS as root and there is a bug that has recently been fixed. https://issues.apache.org/jira/browse/HDFS-1943 Thanks, Abhishek On Thu, Sep 1, 2011 at 6:25 PM, Ravi Prakash ravihad...@gmail.com wrote: Hi Abhishek, Try reading through the shell

Re: TestDFSIO failure

2011-09-01 Thread Ken Krugler
Hi Matt, On Jun 20, 2011, at 1:46pm, GOEKE, MATTHEW (AG/1000) wrote: Has anyone else run into issues using output compression (in our case lzo) on TestDFSIO, with it failing to read the metrics file? I just assumed that it would use the correct decompression codec after it finishes

Re: MultipleOutputs - Create multiple files during output

2011-09-01 Thread Stan Rosenberg
Hi Tim, You could create a custom HashPartitioner so that all key/value pairs denoting the actions of the same user end up in the same reducer; then you need only one output file per reducer. Btw, how large are the output files? Make sure you don't end up creating a lot of small files, i.e.,
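
The partitioning idea in the reply above can be sketched in plain Java, without any Hadoop dependency. This is only an illustration under stated assumptions: the class name UserPartitionSketch and its methods are hypothetical, and the record layout (user name, action, date separated by tabs) is a guess at the input format described in the original question. The arithmetic is the usual mask-and-modulo form used for hash partitioning; in a real job the same logic would live in a Partitioner subclass.

```java
public class UserPartitionSketch {

    /**
     * Returns the user-name field of a record.
     * Assumption: fields are tab-separated and the user name comes first.
     */
    public static String userOf(String record) {
        int tab = record.indexOf('\t');
        return tab < 0 ? record : record.substring(0, tab);
    }

    /**
     * Chooses a reducer index from the user name only, so every record
     * of one user lands on the same reducer. Masking with
     * Integer.MAX_VALUE clears the sign bit so the result is never negative.
     */
    public static int partitionFor(String record, int numReducers) {
        return (userOf(record).hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        // Hypothetical sample records mirroring the question's example.
        String[] records = {
            "name1\taction1\tdate1",
            "name1\taction2\tdate2",
            "name2\taction1\tdate1",
            "name2\taction2\tdate2"
        };
        for (String record : records) {
            System.out.println(partitionFor(record, 4) + "\t" + record);
        }
    }
}
```

Because the reducer index depends only on the user name, the action and date fields never influence where a record is sent. Several users may still share one reducer, which is why the reply suggests one output file per reducer and not one per user.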