Could not find or load main class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer

2017-07-31 Thread liang
…oop-2.4.1/lib/native, org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer, via, application_1501214005846_0009, container_1501214005846_0009_02_01, ip-10-0-10-234.ec2.internal, 8040, /tmp/hadoop-vitria/nm-local-dir] Please help. Regards, -Liang

Re: FSImage from uncompress to compress change

2015-06-16 Thread Yanbo Liang
As far as I know, HDFS reads the compression information from the image file itself when loading the fsimage, so you can load an fsimage correctly even if you have set a different compression codec. I strongly recommend doing these operations with the same version and running "hdfs dfsadmin -saveNamespace" to save the new co
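
A minimal sketch of the NameNode-side settings involved, assuming the standard dfs.image.compress / dfs.image.compression.codec keys (worth verifying against your Hadoop version):

    import org.apache.hadoop.conf.Configuration;

    // Sketch: the two hdfs-site.xml keys that control fsimage compression,
    // shown through the Java Configuration API.
    public class FsImageCompressionExample {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.setBoolean("dfs.image.compress", true);
            conf.set("dfs.image.compression.codec",
                     "org.apache.hadoop.io.compress.DefaultCodec");
            // After restarting the NameNode with these settings, enter safe
            // mode and run "hdfs dfsadmin -saveNamespace" so the next
            // checkpoint is written with the new codec.
            System.out.println("compress = "
                    + conf.getBoolean("dfs.image.compress", false));
        }
    }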

Re: Hive update functionality for External tables

2015-06-16 Thread Yanbo Liang
I have also tried to use this functionality, but it did not work well for external tables. There are many restrictions on the underlying files of a table that will be updated/deleted, such as supporting AcidOutputFormat and being bucketed. Only ORC is supported as the file format so far, and the table should also

Re: Set Replica Issue

2015-06-15 Thread Yanbo Liang
1. It means that you cannot use the native library for your platform, which is written in C/C++ and gives a performance benefit; it is replaced by built-in Java classes instead. This is a warning, not an error, so it doesn't matter. 2. You can check the replica count of this file in other ways.
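
For example, "hdfs fsck <path> -files -blocks" reports replica counts from the command line, and the client API exposes the same information. A minimal sketch (the NameNode URI and path are placeholders):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Sketch: read the replication factor of an HDFS file.
    public class CheckReplication {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
            FileStatus status = fs.getFileStatus(new Path("/user/test/data.txt"));
            System.out.println("replication = " + status.getReplication());
            fs.close();
        }
    }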

Re: Problems with the Fedarated name node configuration

2014-08-16 Thread Yanbo Liang
- Do you see anything wrong in the above configuration? It looks all right. - Where am I supposed to run this (on name nodes, data nodes, or on every node)? Run it on all DataNodes; refresh all DataNodes to pick up the newly added NameNode. - I suppose the default data n

Re: Test read caching

2014-08-15 Thread Yanbo Liang
You can check the response of your command. For example, execute "hdfs dfsadmin -report"; you will get a reply like the following and can verify that the cache space used and remaining are reasonable. Configured Cache Capacity: 64000 (62.50 KB) Cache Used: 4096 (4 KB) Cache Remaining: 59904 (58

Re: OIV TOOL ERROR

2014-07-16 Thread Yanbo Liang
Make sure you have the same *hadoop-core*.jar* and all the libraries from the Hadoop lib directory on the classpath. It looks like it cannot find the class org.apache.hadoop.log.metrics.EventCounter, which is configured in log4j.properties. You should check the following line in log4j.properties:
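
For reference, the stock Hadoop log4j.properties declares that appender as follows (worth verifying against the copy shipped with your distribution):

    log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter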

Re: Not able to place enough replicas

2014-07-14 Thread Yanbo Liang
Maybe the user 'test' has no write privilege. You can look for an ERROR log like: org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:test (auth:SIMPLE) 2014-07-15 2:07 GMT+08:00 Bogdan Raducanu : > I'm getting this error while writing many files. > org.apac

Re: building hadoop 2.x from source

2013-12-27 Thread Yanbo Liang
You can use Maven to compile and package Hadoop (for example, "mvn package -Pdist -DskipTests -Dtar" builds a distribution tarball), deploy it to a cluster, and then run it with the scripts supplied by Hadoop. See this tutorial for reference: http://svn.apache.org/repos/asf/hadoop/common/trunk/BUILDING.txt 2013/12/25 Karim Awara > Hi, > > I managed to build hadoop 2.2 from sou

Re: Request for a pointer to a MapReduce Program tutorial

2013-12-27 Thread Yanbo Liang
Maybe you can reference <> 2013/12/27 Sitaraman Vilayannur > Hi, > I would much appreciate a pointer to a MapReduce tutorial which explains > how I can run a simulated cluster of MapReduce nodes on a single PC and > write a Java program with the MapReduce paradigm. > Thanks very much. > Sitara

Re: Split the File using mapreduce

2013-12-27 Thread Yanbo Liang
Have you installed Hive on your Hadoop cluster? If yes, using Hive SQL may be simpler and more efficient. Otherwise, you can write a MapReduce program with org.apache.hadoop.mapred.lib.MultipleOutputFormat, and the output from the Reducer can be written to more than one file (see the sketch below). 2013/12/27 Nitin Pawar > 1)if
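
A small sketch of the same idea with the newer org.apache.hadoop.mapreduce.lib.output.MultipleOutputs API (class and type names are placeholders):

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    // Reducer that routes records to a separate output file per key.
    public class SplitReducer extends Reducer<Text, Text, Text, Text> {
        private MultipleOutputs<Text, Text> mos;

        @Override
        protected void setup(Context context) {
            mos = new MultipleOutputs<>(context);
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values) {
                // The third argument is the base file name, so each key
                // lands in its own part file, e.g. "keyname-r-00000".
                mos.write(key, value, key.toString());
            }
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            mos.close();  // Flush and close all side outputs.
        }
    }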

Re: How to execute wordcount with compression?

2013-10-18 Thread Yanbo Liang
Compression is unrelated to YARN. If you want to store files compressed, you should compress them when they are loaded into HDFS. Files on HDFS are compressed according to the codecs registered through the "io.compression.codecs" parameter in core-site.xml. If you want to specify a novel compressio
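
A minimal sketch of compressing a file while writing it into HDFS (the path is a placeholder; the codec is inferred from the .gz extension):

    import java.io.OutputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    // Sketch: write gzip-compressed bytes to an HDFS file.
    public class CompressedUpload {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path target = new Path("/data/input/sample.txt.gz");
            CompressionCodec codec =
                new CompressionCodecFactory(conf).getCodec(target);
            try (OutputStream out = codec.createOutputStream(fs.create(target))) {
                out.write("hello compressed world\n".getBytes("UTF-8"));
            }
        }
    }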

Re: Parallel Load Data into Two partitions of a Hive Table

2013-05-03 Thread Yanbo Liang
Loading data into different partitions in parallel is OK, because it is equivalent to writing to different files on HDFS. 2013/5/3 selva > Hi All, > > I need to load a month's worth of processed data into a hive table. The table > has 10 partitions. Each day has many files to load and each file is > taking two se

Re: block over-replicated

2013-04-15 Thread Yanbo Liang
You can reference this function; it removes excess replicas from the map: public void removeStoredBlock(Block block, DatanodeDescriptor node) 2013/4/12 lei liu > > I use hadoop-2.0.3. I find that when one block is over-replicated, the replicas > to be added to the excessReplicateMap attribute of Blockma

Re: Finding mean and median python streaming

2013-04-06 Thread Yanbo Liang
…On Tue, Apr 2, 2013 at 2:14 AM, Yanbo Liang wrote: >> How many Reducers did you start for this job? >> If you start many Reducers for this job, it will produce multiple output >> files named part-*. >> And each part is only the local mean an

Re: hadoop datanode kernel build and HDFS multiplier factor

2013-04-03 Thread Yanbo Liang
I have done similar experiments for tuning Hadoop performance. Many factors influence performance, such as the Hadoop configuration, the JVM, and the OS. For Linux kernel related factors, we found two main points of attention: 1. Every file system read operation triggers a disk write operation (most likely the access-time, atime, update; mounting data disks with noatime avoids it)

Re: are we able to decommission multi nodes at one time?

2013-04-02 Thread Yanbo Liang
…to check my understanding, just shutting down 2 of them, and then 2 >>> more, and then 2 more, without decommissioning. >>> >> is this correct? >>> >> 2013. 4. 2., 4:54 PM, Harsh J wrote: >>

Re: MultipleInputs.addInputPath compile error in eclipse(indigo)

2013-04-02 Thread Yanbo Liang
You passed the wrong parameter: NodeReducer.class should be a subclass of Mapper rather than Reducer (see the sketch below). 2013/4/2 YouPeng Yang > HI GUYS > I want to use the > org.apache.hadoop.mapreduce.lib.input.MultipleInputs; > > > However it throws a compile error in my eclipse(indigo): > > public sta
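
A minimal sketch of MultipleInputs wired up correctly (the paths and mapper classes are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class MultiInputJob {
        // Placeholder identity mappers; each input path gets its own one.
        public static class MapperA
                extends Mapper<LongWritable, Text, LongWritable, Text> {}
        public static class MapperB
                extends Mapper<LongWritable, Text, LongWritable, Text> {}

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "multi-input");
            job.setJarByClass(MultiInputJob.class);
            // The fourth argument must be a Mapper subclass, not a Reducer.
            MultipleInputs.addInputPath(job, new Path("/input/a"),
                    TextInputFormat.class, MapperA.class);
            MultipleInputs.addInputPath(job, new Path("/input/b"),
                    TextInputFormat.class, MapperB.class);
        }
    }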

Re: Finding mean and median python streaming

2013-04-02 Thread Yanbo Liang
How many Reducers did you start for this job? If you start many Reducers for this job, it will produce multiple output files named part-*, and each part is only the local mean and median of that specific Reducer's partition. Two kinds of solutions: 1. Call the method setNumReduceT
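
A sketch of the first option (for a streaming job, the same effect comes from passing "-D mapred.reduce.tasks=1"): with a single reducer, every key reaches one process, so it can compute the global mean and median.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SingleReducerJob {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "global-mean-median");
            // One reducer sees every key, so its output is the global
            // statistic rather than one partial result per partition.
            job.setNumReduceTasks(1);
        }
    }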

Re: Provide context to map function

2013-04-02 Thread Yanbo Liang
protected void map(KEYIN key, VALUEIN value, Context context)
    throws IOException, InterruptedException {
  context.write((KEYOUT) key, (VALUEOUT) value);
}

Context is a parameter that the execution environment passes to the map() function. You can just use it in the map()

Re: are we able to decommission multi nodes at one time?

2013-04-02 Thread Yanbo Liang
…d that I don't need to decommission node by node. > For this case, are there any problems if I decommission 7 nodes at the > same time? > > 2013. 4. 2., 12:14 PM, Azuryy Yu wrote: > > I can translate it to native English: how many nodes do you want to > decommission? > >

Re: are we able to decommission multi nodes at one time?

2013-04-01 Thread Yanbo Liang
How many nodes do you want to decommission? 2013/4/2 Henry JunYoung KIM > 15 datanodes and a replication factor of 3. > > 2013. 4. 1., 3:23 PM, varun kumar wrote: > > How many nodes do you have, and what is the replication factor? > >

Re: are we able to decommission multi nodes at one time?

2013-04-01 Thread Yanbo Liang
It's allowable to decommission multiple nodes at the same time. Just write all the hostnames to be decommissioned to the exclude file and run "bin/hadoop dfsadmin -refreshNodes". However, you need to ensure the decommissioned DataNodes are a minority of all the DataNodes in the cluster and t

Re: DFSOutputStream.sync() method latency time

2013-03-28 Thread Yanbo Liang
…nt can execute the write method until the sync > method returns success, so I think the sync method latency should be > equal to the superposition of each datanode operation. > > 2013/3/28 Yanbo Liang > >> 1st, when a client wants to write data to HDFS, it shoul

Re: Inspect a context object and see whats in it

2013-03-28 Thread Yanbo Liang
You can try adding some probes to the source code and recompiling it. If you want to know the keys and values you add at each step, you can add print statements to the map() function of your Mapper class and the reduce() function of your Reducer class (see the sketch below). The shortcoming is that you will produce a lot of log output, which may fill the
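
A minimal sketch of such a probe (types are placeholders; the output lands in the per-task log files):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Mapper that logs every key/value pair it sees to stderr.
    public class ProbeMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            System.err.println("map input: key=" + key + " value=" + value);
            context.write(new Text(key.toString()), value);
        }
    }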

Re: DFSOutputStream.sync() method latency time

2013-03-28 Thread Yanbo Liang
1st, when a client wants to write data to HDFS, it creates a DFSOutputStream. The client then writes data to this output stream, and the stream transfers data to all DataNodes in the constructed pipeline by means of packets, each 64KB in size. These two operations are concurrent, so the
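
A minimal client-side sketch of that write path, assuming Hadoop 2.x (where the sync primitive is exposed as hflush(); the path is a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PipelineWrite {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            try (FSDataOutputStream out = fs.create(new Path("/tmp/pipeline-demo"))) {
                // Bytes are buffered into packets and streamed down the
                // DataNode pipeline while the client keeps writing.
                out.write("some bytes\n".getBytes("UTF-8"));
                // Block until the DataNodes acknowledge the buffered packets.
                out.hflush();
            }
        }
    }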

Re:

2013-03-28 Thread Yanbo Liang
You can get detailed information from the Greenplum website: http://www.greenplum.com/products/pivotal-hd 2013/3/28 oualid ait wafli > Hi > > Does someone know something about the EMC distribution for Big Data which > integrates Hadoop and other tools? > > Thanks >

Re: Any answer ? Candidate application for map reduce

2013-03-25 Thread Yanbo Liang
From your description, "split the data into chunks, feed the chunks to the application, and merge the processed chunks to get A back" is exactly suited to the MapReduce paradigm: you can feed the split chunks to the Mapper and merge the processed chunks in the Reducer. Why did you not use the MapReduce para

Re: using test.org.apache.hadoop.fs.s3native.InMemoryNativeFileSystemStore class in hadoop

2013-03-18 Thread Yanbo Liang
It is just a unit test, so you don't need to set any parameters in configuration files. 2013/3/18 Agarwal, Nikhil > Hi, > > Thanks for the quick reply. In order to test the class > TestInMemoryNativeS3FileSystemContract and its functions, what should be the > value of parameter sin m

Re: On a small cluster can we double up namenode/master with tasktrackers?

2013-03-18 Thread Yanbo Liang
I think it is inadvisable to place the NameNode and the Master (JobTracker) on the same machine, because both are resource-intensive applications. 2013/3/18 David Parks > I want 20 servers, I got 7, so I want to make the most of the 7 I have. > Each of the 7 servers has: 24GB of ram, 4TB, an

Re: using test.org.apache.hadoop.fs.s3native.InMemoryNativeFileSystemStore class in hadoop

2013-03-18 Thread Yanbo Liang
These test classes are used for unit testing; you can run them to test a particular function of a class. But when we run these test cases, we need some additional classes and functions to simulate the underlying functionality they call. InMemoryNativeFileSystemStore is

Re: Understand dfs.datanode.max.xcievers

2013-03-18 Thread Yanbo Liang
The dfs.datanode.max.xcievers value should be set across the cluster rather than on a particular DataNode. It is the upper bound on the number of files that a DataNode will serve at any one time. 2013/3/17 Dhanasekaran Anbalagan > Hi Guys, > > We have a few data nodes in an inconsistent state. fr

Re: How to Create file in HDFS using java Client with Permission

2013-03-15 Thread Yanbo Liang
You must switch to the user dasmohap to execute this client program; otherwise you cannot create files under the directory "/user/dasmohap". If you do not have a user called dasmohap on the client machine, create one, or work around it as in these steps: http://stackoverflow.com/questions/11371134/how-to-specify-username-wh
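
A sketch of that workaround with UserGroupInformation, assuming simple authentication (no Kerberos; the path is a placeholder):

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class CreateAsUser {
        public static void main(String[] args) throws Exception {
            // Perform the file creation as "dasmohap", regardless of the
            // local OS user.
            UserGroupInformation ugi =
                UserGroupInformation.createRemoteUser("dasmohap");
            ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
                FileSystem fs = FileSystem.get(new Configuration());
                fs.create(new Path("/user/dasmohap/demo.txt")).close();
                return null;
            });
        }
    }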

Re: Why hadoop is spawing two map over file size 1.5 KB ?

2013-03-14 Thread Yanbo Liang
I guess one of them may be speculative execution. You can check the parameter "mapred.map.tasks.speculative.execution" to see whether speculative execution is allowed. You can find out precisely whether it is a speculative map task from the tasktracker log. 2013/3/12 samir

Re: HDFS Cluster Summary DataNode usages

2013-03-14 Thread Yanbo Liang
It means: the minimum ratio of used storage capacity to total storage capacity across the datanodes; the median such ratio; the maximum such ratio; and the standard deviation of all thes

Re: “hadoop namenode -format” formats wrong directory

2013-02-06 Thread Yanbo Liang
You can try the newer parameter "dfs.namenode.name.dir" to specify the directory. 2013/2/6, Andrey V. Romanchev : > Hello! > > I'm trying to install Hadoop 1.1.2.21 on CentOS 6.3. > > I've configured dfs.name.dir in the /etc/hadoop/conf/hdfs-site.xml file > > dfs.name.dir > /mnt/ext/hadoop/hdfs/n

Re: Apache Hadoop and GPGPU integration

2013-02-04 Thread Yanbo Liang
http://wiki.apache.org/hadoop/CUDA%20On%20Hadoop I hope it will be helpful! 2013/2/4 Mohammad Tariq > Oh..Apologies for the unnecessary response. > > Warm Regards, > Tariq > https://mtariq.jux.com/ > cloudfront.blogspot.com > > > On Mon, Feb 4, 2013 at 3:04 AM, anil kumar wrote: > >> Hi, >> >> I am

Re: FSEditLog won't log the change if one file size is changed?

2013-01-18 Thread Yanbo Liang
The metadata does not include the file size, so the client asks the DataNode that stores the last block for it. 2013/1/17 Zheng Yong > If not, when the node is down, how is this information restored in the > namenode?

Re: distributed cache

2012-11-16 Thread Yanbo Liang
As far as I know, the local.cache.size parameter controls the size of the DistributedCache; by default, it's set to 10 GB. The parameter io.sort.mb is not used here; it sizes the circular memory buffer that each map task writes its output to. 2012/11/16 yingnan.ma > > when I use

Re: Hadoop and Hbase site xml

2012-11-13 Thread Yanbo Liang
…ml files. I was trying to see what > parameters I need to pass to the conf object. Should I take > all the parameters in the xml file and use them in the conf file? > > > On Mon, Nov 12, 2012 at 7:17 PM, Yanbo Liang wrote: > >> There are two candidates: >>

Re: Hadoop and Hbase site xml

2012-11-12 Thread Yanbo Liang
There are two candidates: 1) You can copy your Hadoop/HBase configuration files, such as core-site.xml, hdfs-site.xml, or hbase-site.xml, from the "etc" or "conf" subdirectory of the Hadoop/HBase installation directory into the Java project directory. Then the configuration of Hadoop/HBase will be auto
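
Presumably the second candidate is setting the parameters on the Configuration object directly, which is what the follow-up message asks about. A sketch using the HBase client API (the quorum host is a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ExplicitConf {
        public static void main(String[] args) {
            // Loads hbase-default.xml plus any hbase-site.xml found on
            // the classpath.
            Configuration conf = HBaseConfiguration.create();
            // Or set the essentials explicitly when no site file is present.
            conf.set("hbase.zookeeper.quorum", "zk1.example.com");
            conf.set("hbase.zookeeper.property.clientPort", "2181");
        }
    }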

Re: problem using s3 instead of hdfs

2012-10-16 Thread Yanbo Liang
Because you did not set the default FS in the conf, you need to explicitly give the absolute path (including the scheme) of the file in S3 when you run an MR job. 2012/10/16 Rahul Patodi > I think these blog posts will answer your question: > > > http://www.technology-mania.com/2012/05/s3-instead-of-hdfs-w
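
A sketch of such a fully qualified path from the Java API, assuming the s3n:// connector that was current at the time (bucket, key, and credentials are placeholders):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class S3AbsolutePath {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // s3n needs credentials, e.g. fs.s3n.awsAccessKeyId and
            // fs.s3n.awsSecretAccessKey, set here or in core-site.xml.
            Path input = new Path("s3n://my-bucket/input/data.txt");
            FileSystem fs = FileSystem.get(URI.create("s3n://my-bucket/"), conf);
            System.out.println("exists = " + fs.exists(input));
        }
    }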