Hello everyone,
I am doing some evaluation on my 6-node mini cluster. Each node has a 4-core
Intel(R) Xeon(R) CPU 5130 @ 2.00GHz, 8GB memory, and a 500 GB disk, running Linux
kernel 2.6.18-164.11.1.el5 (Red Hat 4.1.2-46).
I was trying to use different packet sizes (dfs.write.packet.size) and
bytePerChun
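(For reference, a minimal hdfs-site.xml sketch of how these two knobs can be
overridden on the client side. dfs.write.packet.size is the name used above;
io.bytes.per.checksum is an assumed name for the bytesPerChecksum setting and
may differ in other versions.)

  <property>
    <name>dfs.write.packet.size</name>
    <value>65536</value>   <!-- default 64KB; varied per run -->
  </property>
  <property>
    <!-- assumed property name for bytesPerChecksum in this era -->
    <name>io.bytes.per.checksum</name>
    <value>512</value>
  </property>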
Hello folks,
The HDFS design doc says the client will buffer a block's worth of data before
contacting the namenode for datanode info, as this is optimal for network
throughput.
However, I could not find this buffering procedure in the source code. In
DFSClient.DataStreamer, it
network-bound. Is this the reason?
On Wed, Aug 11, 2010 at 2:55 AM, Hairong Kuang wrote:
> The DFS client only buffers a packet before it contacts the NameNode for
> allocating DataNodes to place the block. The doc you read might be too old.
>
> Hairong
>
>
> On 8/9/10 7:14 PM, "el
I heard some gossip about this. Is this true?
Hello,
I was benchmarking HDFS write/read performance.
I changed the chunk size, i.e. bytesPerChecksum or bpc, and created a 1GB file
with a 128MB block size. The bpc values I used: 512B, 32KB, 64KB, 256KB, 512KB,
2MB, 8MB.
The results surprised me. The performance for 512B, 32KB, and 64KB is quite
similar, and then, as
to datanodes in packets. The default packet size is 64K. If the
> chunksize is bigger than 64K, the packet size automatically adjusts to
> include at least one chunk.
>
> Please set the packet size to be 8MB by configuring
> dfs.client-write-packet-size (in trunk) and rerun your experiment
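In hdfs-site.xml on the client side, that would look something like the sketch
below (8MB = 8388608 bytes, using the trunk-era property name quoted above):

  <property>
    <name>dfs.client-write-packet-size</name>
    <value>8388608</value>   <!-- 8MB packets, so an 8MB chunk fits in one packet -->
  </property>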
I don't think the datanode needs much memory, as Stu suggested. Memory is
needed when running a MapReduce job; in that case, more memory is better for
in-memory sorting (on the map side) and in-memory copying (on the reduce side).
The namenode needs memory to hold its metadata, though.
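(To make that concrete, the knobs I have in mind are something like the
following in mapred-site.xml; this is only a sketch with assumed 0.20-era
property names, not something quoted in this thread:)

  <property>
    <name>io.sort.mb</name>   <!-- map-side in-memory sort buffer, in MB -->
    <value>200</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>   <!-- heap for each map/reduce task JVM -->
    <value>-Xmx1024m</value>
  </property>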
On Thu, Mar 10, 201
> Caused by: java.io.IOException: Could not obtain block:
> blk_-3695352030358969086_130839 file=/user/emeij/icwsm-data-test/01-26-?>SOCIAL_MEDIA.tar.gz
> at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1977)
a question for you:
Does the exception always compl
I have an HDFS cluster with 10 nodes. Each node has 4 disks attached, so I
assigned 4 directories for HDFS in the configuration:
dfs.data.dir
/data1/hdfs-data,/data2/hdfs-data,/data3/hdfs-data,/data4/hdfs-data
Now I want to remove 1 disk from each node, say /data4/hdfs-data. What should I
do to keep the data intact?
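(For illustration, the end state I want is simply dropping the fourth directory
from each datanode's configuration, something like this sketch:)

  <property>
    <name>dfs.data.dir</name>
    <value>/data1/hdfs-data,/data2/hdfs-data,/data3/hdfs-data</value>
  </property>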
Why doesn't Hadoop have this functionality?
On Mon, Apr 4, 2011 at 5:05 PM, Harsh Chouraria wrote:
> Hello Elton,
>
> On Mon, Apr 4, 2011 at 11:44 AM, elton sky wrote:
> > Now I want to remove 1 disk from each node, say /data4/hdfs-data. What I
> > should do to keep data int
back (although the simpler version still stands).
>
> On Mon, Apr 4, 2011 at 4:51 PM, elton sky wrote:
> > Thanks Harsh,
> > I will give it a go as you suggested.
> > But I feel it's not convenient in my case. Decommission is for taking
> down a
> > node. What
Hassen,
Reads in HDFS are sequential, i.e. one block after another. Each time, the
client connects to one datanode to read a block, then connects to another (or
the same) datanode to read the next block.
The reason for this sequential design, I guess, is to avoid a network traffic
explosion in a heav
This is a tradition from native file systems, for avoiding wasted disk space.
In Linux, each data block is 4KB. A file is sliced into data blocks and stored
on disk. If the tail block holds less than 4KB of data, the rest of that block's
space is wasted. So if all your files are a multiple of 4KB in Linux, you