Hi,
Our current production environment is Hadoop 1.1 using MRv1, which stores
different kinds of data sets in Avro format.
By default we set map.java.opts=-Xmx1024M and reduce.java.opts=-Xmx2048M,
and for some data sets the end user can change map.java.opts=-Xmx2048M and
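For context, on MRv1 clusters the per-task JVM heap is commonly set in mapred-site.xml via mapred.child.java.opts or its map/reduce-specific variants; a minimal sketch (property names should be checked against your 1.1 documentation):

```
<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>
```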
Hi, mailing list:
I set up a CDH5 YARN cluster and set the following option in my
mapred-site.xml file:
<property>
  <name>yarn.app.mapreduce.am.staging-dir</name>
  <value>/data</value>
</property>
The MapReduce history server will then put the history dir under /data, but
if
Please post vendor-specific questions to the mailing lists of the vendor:
https://groups.google.com/a/cloudera.org/forum/#!forum/cdh-user
Look closer at:
security.UserGroupInformation: PriviledgedActionException as:hbase
(auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException:
See http://hadoop.apache.org/mailing_lists.html#User
On Jul 21, 2014, at 9:23 PM, Liu, Yi A yi.a@intel.com wrote:
Regards,
Yi Liu
Right, Thanks. Already subscribed, just sent wrong email.
Regards,
Yi Liu
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Tuesday, July 22, 2014 5:44 PM
To: user@hadoop.apache.org
Cc: user@hadoop.apache.org
Subject: Re: Subscribe user hadoop user
See
Hello,
I can share a tip that I used to fix this.
If you can estimate the number of nodes that you'll need after a year, then
you should build, at startup, a cluster with that number of nodes. ☺
Warm regards
From: Devaraj K [mailto:deva...@apache.org]
Sent: Tuesday 22 July 2014 16:46
To:
You may need to consider these things while choosing the number of nodes for
your cluster:
1. Data storage: how much data you are going to store in the cluster
2. Data processing: what processing you are going to do in the cluster
3. Hardware configuration of each node
On Mon, Jul 21, 2014 at
Someone contacted me directly and suggested the book Hadoop Operations by Eric
Sammer.
Adaryl Bob Wakefield, MBA
Principal
Mass Street Analytics
913.938.6685
www.linkedin.com/in/bobwakefieldmba
From: YIMEN YIMGA Gael
Sent: Tuesday, July 22, 2014 9:48 AM
To: user@hadoop.apache.org
Subject: RE:
This is a new cluster I'm putting up and I need to get an idea on what to
expect from a performance standpoint.
Older docs point to GridMix and TestDFSIO. However, most of that
documentation is obsolete and no longer applies to 2.4.
Where can I find benchmarking docs for 2.4? What are my options?
Also, I
If you plan to use it to learn how to program for Hadoop then pseudo
distributed (cluster of 1) will do. If you plan to use it to learn how to
administer a cluster then 4 or 5 nodes will allow experiments with
commissioning and decommissioning nodes, HA, Journaling, etc. If it is a
proof of
I have spent hours trying to find out how to run these jar files. The older
versions are documented on the web and in some of the books. These, however,
are not.
How do I know ...
- The purpose of each one of these jar files.
- The class to call and what it does.
- The arguments to pass.
There are a lot of tests out there and it can be tough to determine what is
a standard.
- TeraGen/TeraSort and TestDFSIO are starting points.
- Various other non-Apache projects (such as YCSB or HiBench) will have
good benchmarks for certain types of cases.
- If looking for a more comprehensive
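As a concrete starting point on 2.4, the standard benchmarks ship in the examples and jobclient-tests jars; a sketch of typical invocations (jar paths are assumptions and vary by distribution):

```
# Generate ~10 GB of TeraSort input (100,000,000 rows of 100 bytes each)
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar teragen 100000000 /bench/tera-in

# Sort it, then validate the output
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar terasort /bench/tera-in /bench/tera-out
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar teravalidate /bench/tera-out /bench/tera-report

# HDFS throughput: write, then read, 16 files of 1000 MB each
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.4.0-tests.jar TestDFSIO -write -nrFiles 16 -fileSize 1000
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.4.0-tests.jar TestDFSIO -read -nrFiles 16 -fileSize 1000
```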
These jar files contain source code for the respective hadoop modules.
You can expand the one(s) you're interested in and run tests contained in
them.
Cheers
On Tue, Jul 22, 2014 at 9:47 AM, Charley Newtonne cnewto...@gmail.com
wrote:
I have spent hours trying to find out how to run these jar
..You can expand the one(s) you're interested in and run tests contained
in them...
How is that done? How do I know what these classes do and what arguments
they take?
On Tue, Jul 22, 2014 at 1:42 PM, Ted Yu yuzhih...@gmail.com wrote:
These jar files contain source code for the respective
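One way to discover what a given examples or tests jar can run (a sketch; jar names vary by version) is to invoke it with no arguments, which makes the driver print the available program names with one-line descriptions; invoking a named program with no arguments then prints that program's usage:

```
hadoop jar hadoop-mapreduce-examples-2.4.0.jar
hadoop jar hadoop-mapreduce-client-jobclient-2.4.0-tests.jar
hadoop jar hadoop-mapreduce-client-jobclient-2.4.0-tests.jar TestDFSIO
```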
Hi,
I have been trying to configure the log output of my actual job container
files (appid/containerid/sysout, syslog).
1) I want to disable DEBUG in the container logs.
2) I want to redirect a specific package into another file.
I tried the steps described in:
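For (1), the per-task log threshold can usually be lowered without touching log4j files; a minimal mapred-site.xml sketch, assuming MRv2 property names (worth verifying against your version's mapred-default.xml):

```
<property>
  <name>mapreduce.map.log.level</name>
  <value>INFO</value>
</property>
<property>
  <name>mapreduce.reduce.log.level</name>
  <value>INFO</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.log.level</name>
  <value>INFO</value>
</property>
```

For (2), redirecting a specific package to a separate file generally requires shipping a custom container log4j configuration; the level knobs above only control the overall threshold.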
FYI, the FS tests have just been overhauled and I'm not sure if those jars
have the latest FS tests (HADOOP-9361). For those tests it's easy to add
them by building Hadoop and just adding the hadoop-common and
hadoop-common-test jars as Maven dependencies locally.
On Tue, Jul 22, 2014 at 2:00 PM,
I have two processes: one that writes sequence files directly to HDFS, the
other a Hive table that reads these files.
All works well, with the exception that I am only flushing the files
periodically. The SequenceFile input format gets angry when it encounters
0-byte seq files.
I was
The header is expected to have the full names of the key class and value
class, so if it is only written with the first record (?) the file indeed
cannot respect its own format.
I haven't tried it but LazyOutputFormat should solve your problem.
I looked at the source out of curiosity; for the latest version (2.4), the
header is flushed during the writer's creation. Of course, key/value classes
are provided. By 0 bytes, do you really mean even without the header, or 0
bytes of payload?
On Tue, Jul 22, 2014 at 11:05 PM, Bertrand Dechoux
Currently using:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>2.3.0</version>
</dependency>
I have this piece of code:
writer = SequenceFile.createWriter(fs, conf, p, Text.class, Text.class,
Here is the stack trace...
Caused by: java.io.EOFException
at java.io.DataInputStream.readByte(DataInputStream.java:267)
at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
at
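For the 0-byte/EOF symptom discussed above, one approach (a sketch, untested; assumes Hadoop 2.x, where SequenceFile.Writer implements Syncable) is to hsync after each batch of appends, so concurrent readers never observe an empty or header-less file:

```
// Sketch only: create the writer (header is written at creation time),
// then force the header and appended records out to the datanodes.
SequenceFile.Writer writer = SequenceFile.createWriter(conf,
    SequenceFile.Writer.file(p),
    SequenceFile.Writer.keyClass(Text.class),
    SequenceFile.Writer.valueClass(Text.class));
writer.append(new Text("key"), new Text("value"));
writer.hsync();  // persist header + records before the next periodic flush
```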