Questions related to memory request in the YARN MapReduce

2014-07-22 Thread java8964
Hi, our current production environment is Hadoop 1.1 using MRv1, which stores different kinds of data sets in Avro format. By default we set map.java.opts=-Xmx1024M and reduce.java.opts=-Xmx2048M, and for some data sets the end user can change map.java.opts=-Xmx2048M and
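For MRv1 those heap sizes live in the child JVM options of mapred-site.xml; a minimal sketch using the standard Hadoop 1.x property names (the keys quoted in the thread, map.java.opts/reduce.java.opts, look like site-specific aliases):

```xml
<!-- Sketch only: standard Hadoop 1.x names; values from the thread -->
<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>
```

End users can override these per job on the command line with `-D mapred.map.child.java.opts=-Xmx2048m`.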

issue about run MR job use system user in CDH5

2014-07-22 Thread ch huang
hi, maillist: I set up a CDH5 YARN cluster and set the following option in my mapred-site.xml file: `<property><name>yarn.app.mapreduce.am.staging-dir</name><value>/data</value></property>`. The MapReduce history server will set its history dir under the directory /data, but if

Re: issue about run MR job use system user in CDH5

2014-07-22 Thread Alexander Alten-Lorenz
Please post vendor-specific questions to the mailing lists of the vendor: https://groups.google.com/a/cloudera.org/forum/#!forum/cdh-user Look closer at: security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException:

Re: Subscribe user hadoop user

2014-07-22 Thread Ted Yu
See http://hadoop.apache.org/mailing_lists.html#User On Jul 21, 2014, at 9:23 PM, Liu, Yi A yi.a@intel.com wrote: Regards, Yi Liu

RE: Subscribe user hadoop user

2014-07-22 Thread Liu, Yi A
Right, Thanks. Already subscribed, just sent wrong email. Regards, Yi Liu From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Tuesday, July 22, 2014 5:44 PM To: user@hadoop.apache.org Cc: user@hadoop.apache.org Subject: Re: Subscribe user hadoop user See

RE: planning a cluster

2014-07-22 Thread YIMEN YIMGA Gael
Hello, I can share the approach I used for this: if you can estimate the number of nodes you'll need after a year, then build the cluster with that number of nodes from the start. ☺ Warm regards From: Devaraj K [mailto:deva...@apache.org] Sent: Tuesday 22 July 2014 16:46 To:

Re: planning a cluster

2014-07-22 Thread Devaraj K
You may need to consider these things when choosing the number of nodes for your cluster: 1. Data storage: how much data you are going to store in the cluster. 2. Data processing: what processing you are going to do in the cluster. 3. Each node's hardware configuration. On Mon, Jul 21, 2014 at
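Point 1 above reduces to simple arithmetic; a hedged sketch (the replication factor and overhead figures are illustrative inputs, not recommendations, and real planning must also budget for points 2 and 3):

```java
public class NodeEstimate {
    // Estimate node count: raw data size times replication, plus a fraction of
    // extra space for intermediate/temp output, divided by usable space per node.
    static int nodesNeeded(double dataTb, int replication, double tempOverhead, double perNodeTb) {
        double requiredTb = dataTb * replication * (1.0 + tempOverhead);
        return (int) Math.ceil(requiredTb / perNodeTb);
    }

    public static void main(String[] args) {
        // 100 TB of data, 3x replication, 25% temp overhead, 24 TB usable per node
        System.out.println(nodesNeeded(100, 3, 0.25, 24)); // prints 16
    }
}
```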

Re: planning a cluster

2014-07-22 Thread Adaryl Bob Wakefield, MBA
Someone contacted me directly and suggested the book Hadoop Operations by Eric Sammer. Adaryl Bob Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba From: YIMEN YIMGA Gael Sent: Tuesday, July 22, 2014 9:48 AM To: user@hadoop.apache.org Subject: RE:

Bench-marking Hadoop Performance

2014-07-22 Thread Charley Newtonne
This is a new cluster I'm putting up and I need to get an idea of what to expect from a performance standpoint. Older docs point to GridMix and TestDFSIO. However, most of this documentation is obsolete and no longer applies to 2.4. Where can I find benchmarking docs for 2.4? What are my options? Also, I

Re: planning a cluster

2014-07-22 Thread Chris Mawata
If you plan to use it to learn how to program for Hadoop, then pseudo-distributed (a cluster of 1) will do. If you plan to use it to learn how to administer a cluster, then 4 or 5 nodes will allow experiments with commissioning and decommissioning nodes, HA, journaling, etc. If it is a proof of

Hadoop 2.4 test jar files.

2014-07-22 Thread Charley Newtonne
I have spent hours trying to find out how to run these jar files. The older versions are documented on the web and in some of the books. These, however, are not. How do I know ... - The purpose of each one of these jar files. - The class to call and what it does. - The arguments to pass.

Re: Bench-marking Hadoop Performance

2014-07-22 Thread jay vyas
There are a lot of tests out there and it can be tough to determine what is standard. - TeraGen/TeraSort and TestDFSIO are starting points. - Various other non-Apache projects (such as YCSB or HiBench) have good benchmarks for certain types of cases. - If looking for a more comprehensive
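The starting points above translate into jar invocations along these lines (a sketch for Hadoop 2.x; jar locations, version suffixes, and HDFS paths depend on your distribution and must be checked locally):

```shell
# Generate 10M 100-byte rows and sort them (examples jar ships with Hadoop 2.x)
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen 10000000 /bench/tera-in
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar terasort /bench/tera-in /bench/tera-out

# HDFS I/O throughput: write then read 10 files of 1000 MB each, then clean up
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -clean
```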

Re: Hadoop 2.4 test jar files.

2014-07-22 Thread Ted Yu
These jar files contain source code for the respective hadoop modules. You can expand the one(s) you're interested in and run tests contained in them. Cheers On Tue, Jul 22, 2014 at 9:47 AM, Charley Newtonne cnewto...@gmail.com wrote: I have spent hours trying to find out how to run these jar

Re: Hadoop 2.4 test jar files.

2014-07-22 Thread Charley Newtonne
..You can expand the one(s) you're interested in and run tests contained in them... How is that done? How do I know what these classes do and what arguments they take? On Tue, Jul 22, 2014 at 1:42 PM, Ted Yu yuzhih...@gmail.com wrote: These jar files contain source code for the respective
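One way to answer both questions is to interrogate the jars themselves (hedged: the program-driver behavior is as observed on Hadoop 2.x; exact jar names vary by install):

```shell
# List the test classes packed into a tests jar
jar tf hadoop-mapreduce-client-jobclient-2.4.0-tests.jar | grep 'Test.*\.class$'

# The jobclient tests jar has a program driver: run it with no arguments
# and it prints the valid test program names (TestDFSIO, mrbench, nnbench, ...)
hadoop jar hadoop-mapreduce-client-jobclient-2.4.0-tests.jar

# Then invoke a specific program with no options to see its usage message
hadoop jar hadoop-mapreduce-client-jobclient-2.4.0-tests.jar TestDFSIO
```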

Configuring the Container Logs

2014-07-22 Thread Jogeshwar Karthik Akundi
Hi, I have been trying to configure the Log output of my actual job container files (appid/containerid/sysout, syslog). 1) I want to disable DEBUG in the container logs. 2) I want to redirect a specific package into another file. I tried the steps described in:
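For (1), the task log threshold can usually be raised without touching the container log4j file, via the Hadoop 2.x job/site properties (a sketch; per-package redirection as in (2) would still need a custom log4j configuration shipped with the job):

```xml
<!-- Raise task and AM log thresholds from DEBUG to INFO (Hadoop 2.x names) -->
<property>
  <name>mapreduce.map.log.level</name>
  <value>INFO</value>
</property>
<property>
  <name>mapreduce.reduce.log.level</name>
  <value>INFO</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.log.level</name>
  <value>INFO</value>
</property>
```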

Re: Hadoop 2.4 test jar files.

2014-07-22 Thread jay vyas
FYI, the FS tests have just been overhauled and I'm not sure if those jars have the latest FS tests (HADOOP-9361). For those tests it's easy to add them by building Hadoop and just adding the hadoop-common and hadoop-common-test jars as Maven dependencies locally. On Tue, Jul 22, 2014 at 2:00 PM,

Skippin those gost darn 0 byte diles

2014-07-22 Thread Edward Capriolo
I have two processes: one that writes sequence files directly to HDFS, the other a Hive table that reads those files. All works well, with the exception that I am only flushing the files periodically. SequenceFile input format gets angry when it encounters 0-byte seq files. I was
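One workaround is to filter zero-length files out of the input before the reader sees them; a minimal local-filesystem sketch of that idea (names hypothetical; a real job would do the same length check with FileSystem.listStatus against HDFS):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

public class SkipEmptyFiles {
    // Return only the files under dir with non-zero length, i.e. the ones
    // safe to hand to a SequenceFile reader.
    static List<File> nonEmptyFiles(File dir) {
        List<File> out = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries == null) return out;
        for (File f : entries) {
            if (f.isFile() && f.length() > 0) out.add(f);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("seq").toFile();
        Files.write(new File(dir, "full.seq").toPath(), new byte[]{1, 2, 3});
        new File(dir, "empty.seq").createNewFile();
        System.out.println(nonEmptyFiles(dir).size()); // prints 1
    }
}
```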

Re: Skippin those gost darn 0 byte diles

2014-07-22 Thread Bertrand Dechoux
The header is expected to have the full name of the key class and value class, so if those are only detected with the first record (?) then indeed the file cannot respect its own format. I haven't tried it, but LazyOutputFormat should solve your problem.

Re: Skippin those gost darn 0 byte diles

2014-07-22 Thread Bertrand Dechoux
I looked at the source out of curiosity; for the latest version (2.4), the header is flushed during writer creation. Of course, key/value classes are provided. By 0 bytes, do you really mean even without the header? Or 0 bytes of payload? On Tue, Jul 22, 2014 at 11:05 PM, Bertrand Dechoux

Re: Skippin those gost darn 0 byte diles

2014-07-22 Thread Edward Capriolo
Currently using: `<dependency><groupId>org.apache.hadoop</groupId><artifactId>hadoop-hdfs</artifactId><version>2.3.0</version></dependency>` I have this piece of code that does: writer = SequenceFile.createWriter(fs, conf, p, Text.class, Text.class,

Re: Skippin those gost darn 0 byte diles

2014-07-22 Thread Edward Capriolo
Here is the stack trace...
Caused by: java.io.EOFException
    at java.io.DataInputStream.readByte(DataInputStream.java:267)
    at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
    at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
    at