I have also tried to use this functionality, but it did not work well for
external tables.
There are many restrictions on the underlying files of a table that will be
updated/deleted, such as supporting AcidOutputFormat, being bucketed, etc. It
supports only ORC as the file format so far, and the table show
As far as I know, HDFS reads the compression information from the image file
itself when loading the fsimage.
So the fsimage file can be loaded correctly even if you have set a different
compression codec.
I strongly recommend doing these operations with the same version and running
hdfs dfsadmin -saveNamespace to save the new
1. It means that you cannot use the native library for your platform, which is
written in C/C++ and would bring a performance benefit. However, it is replaced
by the built-in Java classes. This is a warning log, not an error, so it
doesn't matter.
2. You can check the replica count of this file in other ways.
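For example, a minimal Java sketch (the path is hypothetical) that reads the
replication factor of a file through the FileSystem API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckReplication {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Hypothetical path; point it at the file you want to inspect.
    FileStatus status = fs.getFileStatus(new Path("/user/test/data.txt"));
    System.out.println("Replication factor: " + status.getReplication());
  }
}

Running hdfs fsck on the same path with -files -blocks also shows the block
replicas that actually exist.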
- Do you see anything wrong in the above configuration?
It looks all right.
- Where am I supposed to run this (on name nodes, data nodes, or
on every node)?
Run it on all DataNodes; refreshing all DataNodes lets them pick up the newly
added NameNode.
- I suppose the default data
You can check the response of your command.
For example, you can execute hdfs dfsadmin -report
and you will get a reply like the following, from which you can confirm that
the cache space used and remaining is reasonable.
Configured Cache Capacity: 64000 (62.50 KB)
Cache Used: 4096 (4 KB)
Cache Remaining: 59904
Maybe the user 'test' has no privilege for the write operation.
You can refer to the ERROR log, like:
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:test (auth:SIMPLE)
2014-07-15 2:07 GMT+08:00 Bogdan Raducanu lrd...@gmail.com:
I'm getting this error while writing many
Hi Chao,
As far as I know, if client B opens a file which is under construction, the
DFSInputStream will get the LocatedBlocks object, and it contains a member
variable called underConstruction that marks the file as under construction.
If the file is reopened, the client will get a different
Did you install Hive on your Hadoop cluster?
If yes, using Hive SQL may be simpler and more efficient.
Otherwise, you can write a MapReduce program with
org.apache.hadoop.mapred.lib.MultipleOutputFormat, so that the output from the
Reducer can be written to more than one file.
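If the MapReduce route is chosen, here is a hedged sketch of a Reducer that
writes to more than one named output; it uses the newer
org.apache.hadoop.mapreduce.lib.output.MultipleOutputs class rather than the
old MultipleOutputFormat, and the key/value types and output names are only
illustrative:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class SplitReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
  private MultipleOutputs<Text, LongWritable> out;

  @Override
  protected void setup(Context context) {
    out = new MultipleOutputs<Text, LongWritable>(context);
  }

  @Override
  protected void reduce(Text key, Iterable<LongWritable> values, Context context)
      throws IOException, InterruptedException {
    long sum = 0;
    for (LongWritable v : values) {
      sum += v.get();
    }
    // Route each record to one of the named outputs based on its key.
    String name = key.toString().startsWith("a") ? "alpha" : "other";
    out.write(name, key, new LongWritable(sum));
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    out.close();
  }
}

In the driver, each name has to be registered, e.g.
MultipleOutputs.addNamedOutput(job, "alpha", TextOutputFormat.class, Text.class,
LongWritable.class), and likewise for "other".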
2013/12/27 Nitin Pawar
Maybe you can refer to Hadoop in Action.
2013/12/27 Sitaraman Vilayannur vrsitaramanietfli...@gmail.com
Hi,
Would much appreciate a pointer to a MapReduce tutorial which explains
how I can run a simulated cluster of MapReduce nodes on a single PC and
write a Java program with the
You can use Maven to compile and package Hadoop and deploy it to a
cluster, then run it with the scripts supplied by Hadoop.
This tutorial is also for your reference:
http://svn.apache.org/repos/asf/hadoop/common/trunk/BUILDING.txt
2013/12/25 Karim Awara karim.aw...@kaust.edu.sa
Hi,
I managed to
Compression is irrelevant to YARN.
If you want to store files with compression, you should compress the files when
they are loaded into HDFS.
The files on HDFS are compressed according to the codecs registered in the
parameter io.compression.codecs, which is set in core-site.xml.
If you want to specify a new compression
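As a sketch of compressing data while loading it into HDFS (GzipCodec and the
paths here are only assumptions for illustration):

import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class CompressedLoad {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem local = FileSystem.getLocal(conf);
    FileSystem hdfs = FileSystem.get(conf);

    CompressionCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);

    // Hypothetical paths: read a local file and write it gzipped into HDFS.
    InputStream in = local.open(new Path("/tmp/input.log"));
    OutputStream out = codec.createOutputStream(hdfs.create(new Path("/data/input.log.gz")));

    IOUtils.copyBytes(in, out, conf, true); // closes both streams when finished
  }
}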
Loading data into different partitions in parallel is OK, because it is
equivalent to writing to different files on HDFS.
2013/5/3 selva selvai...@gmail.com
Hi All,
I need to load a month's worth of processed data into a Hive table. The table
has 10 partitions. Each day has many files to load and each file is
You can refer to this function; it removes excess replicas from the map:
public void removeStoredBlock(Block block, DatanodeDescriptor node)
2013/4/12 lei liu liulei...@gmail.com
I use hadoop-2.0.3. I find that when one block is over-replicated, the
replicas to be added to excessReplicateMap?
On Tue, Apr 2, 2013 at 2:14 AM, Yanbo Liang yanboha...@gmail.com wrote:
How many Reducers did you start for this job?
If you start many Reducers for this job, it will produce multiple output
files, named part-*.
And each part is only the local mean and median value
at a time on a replication
average of 3 or 3+, and put it back in later without too much data
movement impact.
On Tue, Apr 2, 2013 at 1:06 PM, Yanbo Liang yanboha...@gmail.com
wrote:
It's reasonable to decommission 7 nodes at the same time.
But it may also take a long time to finish
I have done similar experiments for tuning Hadoop performance.
Many factors influence the performance, such as the Hadoop configuration, the
JVM, and the OS.
For Linux kernel related factors, we have found two main points of attention:
1. Every read operation of the file system will trigger one disk write
protected void map(KEYIN key, VALUEIN value, Context context)
    throws IOException, InterruptedException {
  context.write((KEYOUT) key, (VALUEOUT) value);
}
Context is a parameter that the execution environment passes to the map()
function.
You can just use it in the
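For instance, a hypothetical Mapper that uses the Context both to read a job
parameter and to emit output (the property name example.lowercase is made up
for illustration):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordLengthMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private boolean toLowerCase;

  @Override
  protected void setup(Context context) {
    // Read a job parameter through the Context.
    toLowerCase = context.getConfiguration().getBoolean("example.lowercase", false);
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = toLowerCase ? value.toString().toLowerCase() : value.toString();
    for (String word : line.split("\\s+")) {
      if (!word.isEmpty()) {
        // Emit through the Context; the framework collects these pairs.
        context.write(new Text(word), new IntWritable(word.length()));
      }
    }
  }
}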
How many Reducers did you start for this job?
If you start many Reducers for this job, it will produce multiple output
files, named part-*.
And each part is only the local mean and median value of the specific
Reducer partition.
There are two kinds of solutions:
1. Call the method of
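One common way to get a single global result, sketched below as an assumption
rather than a prescription, is to run the job with a single Reducer; the
StatsMapper/StatsReducer classes are hypothetical and left commented out:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SingleReducerDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "global mean/median");
    job.setJarByClass(SingleReducerDriver.class);

    // Hypothetical classes that compute the statistics would be set here:
    // job.setMapperClass(StatsMapper.class);
    // job.setReducerClass(StatsReducer.class);
    // job.setOutputKeyClass(Text.class);
    // job.setOutputValueClass(DoubleWritable.class);

    // A single reduce task sees every key, so it can compute a global
    // mean/median and writes exactly one output file (part-r-00000).
    job.setNumReduceTasks(1);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}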
You passed the wrong parameter: NodeReducer.class. That argument should be a
subclass of Mapper rather than of Reducer.
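As a hedged sketch of what the call expects (all class names and paths below
are hypothetical): each input path is bound to an InputFormat and a Mapper
subclass in MultipleInputs.addInputPath(), while the Reducer is set separately
on the Job.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultipleInputsDriver {

  // Hypothetical mappers, one per input path; both must extend Mapper.
  public static class NodeMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(new Text("node"), value);
    }
  }

  public static class EdgeMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(new Text("edge"), value);
    }
  }

  // The Reducer is configured on the Job, not in MultipleInputs.addInputPath().
  public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      for (Text v : values) {
        context.write(key, v);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "multiple inputs example");
    job.setJarByClass(MultipleInputsDriver.class);

    // Each path gets its own InputFormat and Mapper subclass (hypothetical paths).
    MultipleInputs.addInputPath(job, new Path("/input/nodes"), TextInputFormat.class, NodeMapper.class);
    MultipleInputs.addInputPath(job, new Path("/input/edges"), TextInputFormat.class, EdgeMapper.class);

    job.setReducerClass(JoinReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(job, new Path("/output/joined"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}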
2013/4/2 YouPeng Yang yypvsxf19870...@gmail.com
HI GUYS
I want to use
org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
however, a compile error comes up in my
It is allowed to decommission multiple nodes at the same time.
Just write all the hostnames which will be decommissioned to the
exclude file and run bin/hadoop dfsadmin -refreshNodes.
However, you need to ensure the decommissioned DataNodes are a minority of all
the DataNodes in the cluster and
How many nodes do you want to decommission?
2013/4/2 Henry JunYoung KIM henry.jy...@gmail.com
15 for datanodes and 3 for replication factor.
2013. 4. 1., 3:23 PM, varun kumar varun@gmail.com wrote:
How many nodes do you have, and what is the replication factor?
You can get detailed information from the Greenplum website:
http://www.greenplum.com/products/pivotal-hd
2013/3/28 oualid ait wafli oualid.aitwa...@gmail.com
Hi
Does someone know something about the EMC distribution for Big Data which
integrates Hadoop and other tools?
Thanks
First, when the client wants to write data to HDFS, it creates a
DFSOutputStream.
Then the client writes data to this output stream, and the stream transfers the
data to all DataNodes through the constructed pipeline, in units of Packets
whose size is 64KB.
These two operations are concurrent, so the
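From the client's point of view this pipeline is hidden behind
FSDataOutputStream; a minimal sketch of the write path described above, with a
hypothetical path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // create() returns an FSDataOutputStream that wraps DFSOutputStream;
    // data written here is buffered into packets and streamed down the
    // DataNode pipeline while the client keeps writing.
    FSDataOutputStream out = fs.create(new Path("/tmp/pipeline-demo.txt"));
    for (int i = 0; i < 1000; i++) {
      out.writeBytes("line " + i + "\n");
    }
    out.close(); // flushes the remaining packets and completes the file
  }
}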
You can try to add some probes to the source code and recompile it.
If you want to know the keys and values you add at each step, you can add
print statements to the map() function of the Mapper class and the reduce()
function of the Reducer class, as sketched below.
The shortcoming is that you will produce a lot of log output, which may fill the
of each datanode operation.
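As a sketch of that kind of probe (the Mapper below is hypothetical), a log
statement or a counter in map() shows what each step sees without recompiling
Hadoop itself:

import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TracingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
  private static final Log LOG = LogFactory.getLog(TracingMapper.class);

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Print the key/value seen at this step; it ends up in the per-task logs.
    LOG.info("map input: key=" + key + " value=" + value);
    // A counter is a cheaper alternative to flooding the logs.
    context.getCounter("debug", "map.records").increment(1);
    context.write(new Text(value), new LongWritable(1));
  }
}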
2013/3/28 Yanbo Liang yanboha...@gmail.com
First, when the client wants to write data to HDFS, it creates a
DFSOutputStream.
Then the client writes data to this output stream, and the stream transfers the
data to all DataNodes through the constructed pipeline
From your description, "split the data into chunks, feed the chunks to the
application, and merge the processed chunks to get A back" is exactly what the
MapReduce paradigm suits. First you can feed the split chunks to the Mapper and
merge the processed chunks at the Reducer. Why did you not use MapReduce
The dfs.datanode.max.xcievers value should be set across the cluster rather
than on a particular DataNode.
It is the upper bound on the number of files that the DataNode will
serve at any one time.
2013/3/17 Dhanasekaran Anbalagan bugcy...@gmail.com
Hi Guys,
We are having few data nodes in an
These test classes are used for unit testing.
You can run these cases to test a particular function of a class.
But when we run these test cases, we need some additional classes and
functions to simulate the underlying functionality called by the
test cases.
InMemoryNativeFileSystemStore is
It is just a unit test, so you don't need to set any parameters in the
configuration files.
2013/3/18 Agarwal, Nikhil nikhil.agar...@netapp.com
Hi,
Thanks for the quick reply. In order to test the class
TestInMemoryNativeS3FileSystemContract and its functions, what should be the
value
You must switch to the user dasmohap to execute this client program; otherwise
you cannot create files under the directory /user/dasmohap.
If you do not have a user called dasmohap on the client machine, create it or
work around it with these steps
It means:
the minimum value of used storage capacity / total storage capacity among the
datanodes;
the median value of used storage capacity / total storage capacity among the
datanodes;
the maximum value of used storage capacity / total storage capacity among the
datanodes;
and the standard deviation of all
I guess maybe one of them is speculative execution.
You can check the parameter mapred.map.tasks.speculative.execution to
see whether speculative execution is allowed.
You can find out precisely whether it is a speculative map task
from the tasktracker log.
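To rule it out, a small sketch of checking the flag and disabling speculative
map tasks for one job, using the property name mentioned above (newer releases
also know it as mapreduce.map.speculative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SpeculationCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Old-style property name; the default is true.
    boolean allowed = conf.getBoolean("mapred.map.tasks.speculative.execution", true);
    System.out.println("speculative map tasks allowed: " + allowed);

    // Disable speculative map tasks for a single job.
    Job job = Job.getInstance(conf, "no-speculation-job");
    job.getConfiguration().setBoolean("mapred.map.tasks.speculative.execution", false);
  }
}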
2013/3/12 samir
You can try to use the new parameter dfs.namenode.name.dir to
specify the directory.
2013/2/6, Andrey V. Romanchev andrey.romanc...@gmail.com:
Hello!
I'm trying to install Hadoop 1.1.2.21 on CentOS 6.3.
I've configured dfs.name.dir in /etc/hadoop/conf/hdfs-site.xml file
As far as I know, the local.cache.size parameter controls the size of the
DistributedCache. By default, it's set to 10 GB.
The parameter io.sort.mb is not used here; it sets the size of the circular
memory buffer that each map task writes its output to.
2012/11/16 yingnan.ma
There are two candidates:
1) You need to copy your Hadoop/HBase configuration files such as
core-site.xml, hdfs-site.xml, or hbase-site.xml from the etc or
conf subdirectory of the Hadoop/HBase installation directory into the Java
project directory. Then the configuration of Hadoop/HBase will be auto
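If copying the files onto the project classpath is inconvenient, a hedged
alternative sketch is to add them to the Configuration explicitly (the paths
below are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class LoadConfig {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Hypothetical locations of the cluster's configuration files.
    conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
    conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
    // For HBase clients, hbase-site.xml would be added the same way.

    System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
  }
}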
Because you did not set defaultFS in the conf, you need to explicitly indicate
the absolute path (including the scheme) of the file in S3 when you run an MR job.
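For example, a driver sketch with a made-up bucket and key, where the scheme is
spelled out because fs.defaultFS does not point at S3:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class S3InputJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // S3 credentials are normally supplied via fs.s3n.awsAccessKeyId /
    // fs.s3n.awsSecretAccessKey in core-site.xml or on this conf object.
    Job job = Job.getInstance(conf, "s3 input example");
    job.setJarByClass(S3InputJob.class);

    // Hypothetical bucket and key: the scheme must be spelled out here.
    FileInputFormat.addInputPath(job, new Path("s3n://my-bucket/logs/2012-10-15/"));
    FileOutputFormat.setOutputPath(job, new Path("hdfs:///tmp/s3-job-output"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}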
2012/10/16 Rahul Patodi patodirahul.had...@gmail.com
I think these blog posts will answer your question:
You can use Scribe or Flume to collect log data and integrate it with Hadoop.
2012/8/4 Nguyen Manh Tien tien.nguyenm...@gmail.com
Hi,
I plan to stream log data to HDFS using many writers; each writer writes a
stream of data to an HDFS file (which may rotate).
I wonder how many concurrent writers I
namenode -format says it successfully formatted the namenode
dir S3://bucket/hadoop/namenode, when it does not even exist there!
Any suggestion?
Thanks again.
On Tue, Jul 24, 2012 at 4:11 PM, Yanbo Liang yanboha...@gmail.com wrote:
I think you have gotten confused about the integration of Hadoop and S3.
1) If you set
It's available in Hadoop 2.0.
HDFS Federation supplies multiple namespaces for the whole storage pool.
High Availability is specific to each namespace/NameNode, so you can
configure HA for each NameNode in the federation.
You can get some documentation from
I wonder why this imbalance is produced?
2012/3/17 Zizon Qiu zzd...@gmail.com
if there are only dfs files under /data and /data2, it will be ok when
filled up.
unless some other files, like a mapreduce temp folder or even a namenode
image, are there; then it may break the cluster when the disk is filled up (as namenode
There is a member variable called dfs in the DistributedFileSystem
class;
its type is DFSClient.
Every file system operation in the DistributedFileSystem class
is delegated to the corresponding operation of dfs.
And dfs communicates with the NameNode server by means