Re: only one map or reduce job per time on one node

2013-11-06 Thread DSuiter RDX
I suspect that the reason no one is responding with good answers is that, fundamentally, what you are trying to do runs against the way Hadoop is designed. A parallel-processing framework is defeated if you force it not to work concurrently... Maybe you should look into
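That said, if throttling a node really is required, the usual lever in MRv1 (the JobTracker/TaskTracker era these threads describe) is the per-TaskTracker slot count. A minimal mapred-site.xml sketch; the property names are the standard Hadoop 1.x ones, the values are illustrative, and note this caps the whole node, not a single job:

```xml
<!-- mapred-site.xml (MRv1): allow at most one concurrent map task
     and one concurrent reduce task per TaskTracker. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>
```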

Re: Documentation on installing Hadoop 2.2.0 on Microsoft Windows

2013-10-24 Thread DSuiter RDX
It was my understanding that Hortonworks depended on Cygwin (a UNIX emulation layer for Windows) for most of the Bigtop family of tools - Hadoop core, MapReduce, etc. - so you will probably make all your configuration files in Windows, since XML is agnostic, and can develop in Windows, since JARs and

Re: Improving MR job disk IO

2013-10-11 Thread DSuiter RDX
So, perhaps this has been thought of, but perhaps not. It is my understanding that grep usually processes input one line at a time. As I am currently experimenting with Avro, I am finding that local grep does not handle it well at all, because an Avro file is essentially one long line, so
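One hedged workaround, assuming the avro-tools CLI jar that ships with Apache Avro is available (the file names and the filter string below are hypothetical): dump the container file to one JSON record per line first, which restores grep's line-at-a-time semantics:

```shell
# Hypothetical jar version and file names; 'tojson' is a real avro-tools subcommand.
# It emits one JSON record per line, so grep can then match record-by-record.
java -jar avro-tools-1.7.5.jar tojson events.avro | grep '"status": "ERROR"'
```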

Re: Job initialization failed: java.lang.NullPointerException at resolveAndAddToTopology

2013-10-11 Thread DSuiter RDX
temporarily had set the permissions to 777 to see if something changes, but it didn't ... I checked only the jobtracker; are the other nodes important for this as well? Thanks already in advance, especially for the quick response! Wolli 2013/10/11 DSuiter RDX dsui...@rdx.com The user running

Re: State of Art in Hadoop Log aggregation

2013-10-11 Thread DSuiter RDX
Sagar, it sounds like you want a management console. We are using Cloudera Manager, but for 200 nodes you would need to license it; it is only free up to 50 nodes. The FOSS version of this is Ambari, IIRC. http://incubator.apache.org/ambari/ Flume will provide a Hadoop-integrated pipeline for

Read Avro schema automatically?

2013-10-10 Thread DSuiter RDX
Hi, we are working on building a MapReduce program that takes Avro input from HDFS, gets the timestamp, and counts the number of events written on any given day. We would like a program that does not need the Avro schema declared in advance; rather, it would be best if it could read
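On the schema question: an Avro container file embeds the writer's schema in its header, so Avro's GenericDatumReader/DataFileReader can recover it at read time with no schema declared up front. The per-day counting step itself can be sketched with the JDK alone; the class and method names below are made up for illustration, and this stands in for what the reducer would do, not the actual MR job:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.TreeMap;

public class DailyCounts {
    // Format an epoch-millis timestamp as a UTC calendar day, e.g. "1970-01-01".
    static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    // Bucket event timestamps (epoch millis, as pulled from each Avro record) by day.
    static Map<String, Long> countPerDay(long[] timestamps) {
        Map<String, Long> counts = new TreeMap<>();
        for (long ts : timestamps) {
            counts.merge(DAY.format(Instant.ofEpochMilli(ts)), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two events in the first UTC day of 1970, one in the second.
        long[] ts = {0L, 86_399_999L, 86_400_000L};
        System.out.println(countPerDay(ts)); // {1970-01-01=2, 1970-01-02=1}
    }
}
```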