Is FileSystem thread-safe?

2013-03-31 Thread John Lilley
In other words, given an instance of org.apache.hadoop.fs.FileSystem, can multiple threads access it simultaneously? I am assuming no since it is not synchronized. Thanks, John Lilley

Re: Who splits the file into blocks

2013-03-31 Thread Sai Sai
Here is my understanding of putting a file into HDFS: a client contacts the name node and gets the locations of the blocks it needs to put on the data nodes. But before this, how does the name node know how many blocks it needs to split a file into? Who splits the file? Is it the client itself

Re: Using Hadoop for codec functionality

2013-03-31 Thread Bertrand Dechoux
Your question could be interpreted another way: should I use Hadoop in order to perform massive compression/decompression using my own (possibly proprietary) utility? So yes, Hadoop can be used to parallelize the work. But the real answer will depend on your context, as always. How many f

Re: Is FileSystem thread-safe?

2013-03-31 Thread Ted Yu
FileSystem is an abstract class; what concrete class are you using (DistributedFileSystem, etc.)? For FileSystem, I find the following for the create() method: * but the implementation is thread-safe. The other option is to change the * value of umask in configuration to be 0, but it is not thr

Re: Who splits the file into blocks

2013-03-31 Thread Jens Scheidtmann
Dear Sai Sai, "Hadoop, the definitive guide" says the following regarding default replica placement: - the first replica is placed on the same node as the client (lowest bandwidth penalty). - the second replica is placed off-rack, at a random node of another rack (avoiding busy racks). - the third replica is placed on

Re: Who splits the file into blocks

2013-03-31 Thread Rahul Bhattacharjee
I think what Sai was asking is: when the client asks the namenode for a list of data nodes, how does the namenode know how many blocks would be required to store the entire file? I think the way it works is that the client requests the NN for a list of blocks, and then the client writes the first block

Why big block size for HDFS.

2013-03-31 Thread Rahul Bhattacharjee
Hi, In many places it has been written that, to avoid a huge number of disk seeks, we store big blocks in HDFS, so that once we seek to the location, the data transfer rate is what predominates; there are no more seeks. I am not sure if I have understood this correctly. My question is , no m

Problem with field separator in FieldSelectionHelper

2013-03-31 Thread 韦锴 Wei Kai
I found that org.apache.hadoop.mapreduce.lib.fieldsel.FieldSelectionHelper and the corresponding old API org.apache.hadoop.mapred.lib.FieldSelectionMapReduce take the user-specified separator string as a regular expression in String.split(), but also use it as a literal string in StringBuffer.append().
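The mismatch is easy to reproduce in plain Java, outside of Hadoop. A sketch with "|" as the separator (a common choice that happens to be a regex metacharacter); Pattern.quote() shows what a consistent literal interpretation would look like:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SeparatorDemo {
    public static void main(String[] args) {
        String line = "a|b|c";
        // Treated as a regex, "|" is an empty alternation, so split()
        // breaks the string between every character.
        System.out.println(Arrays.toString(line.split("|")));
        // Quoting the separator makes split() treat it literally,
        // consistent with how StringBuffer.append() uses it.
        System.out.println(Arrays.toString(line.split(Pattern.quote("|"))));
        // prints [a, b, c]
    }
}
```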

Re: Who splits the file into blocks

2013-03-31 Thread Harsh J
The client splits the file. The block size attribute is a per-file one, sent along with the file creation request. Blocks are requested as the write goes along, not pre-allocated. On Sun, Mar 31, 2013 at 2:15 PM, Sai Sai wrote: > Here is my understanding about putting a file into hdfs: > A client contacts nam
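Since the block size is fixed per file, the eventual block count is simple arithmetic. A sketch assuming a hypothetical 300 MB file and a 128 MB block size (the blocks are still requested one at a time as the client writes, not computed up front by the NN):

```java
public class BlockCount {
    // ceil(fileSize / blockSize): the client asks for a new block
    // each time the current one fills up, so this is the final count.
    static long numBlocks(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        long fileSize = 300L << 20;   // 300 MB, illustrative
        long blockSize = 128L << 20;  // 128 MB, a common default
        System.out.println(numBlocks(fileSize, blockSize)); // prints 3
    }
}
```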

Re: Streaming value of (200MB) from a SequenceFile

2013-03-31 Thread Sandy Ryza
Hi Jerry, I assume you're providing your own Writable implementation? The Writable readFields method is given a stream. Are you able to perform your processing while reading it there? -Sandy On Sat, Mar 30, 2013 at 10:52 AM, Jerry Lam wrote: > Hi everyone, > > I'm havi

RE: Is FileSystem thread-safe?

2013-03-31 Thread John Lilley
From: Ted Yu [mailto:yuzhih...@gmail.com] Subject: Re: Is FileSystem thread-safe? >>FileSystem is an abstract class, what concrete class are you using >>(DistributedFileSystem, etc) ? Good point. I am calling FileSystem.get(URI uri, Configuration conf) with a URI like "hdfs://server:port/..." o

Re: How to configure mapreduce archive size?

2013-03-31 Thread Ted Yu
This question is more related to mapreduce. I put user@hbase in Bcc. Cheers On Sun, Mar 31, 2013 at 11:15 AM, tojaneyang wrote: > Hi Ted, > > Do you have any suggestions for this? > > I am using hadoop which is packaged within hbase -0.94.1. It is hadoop > 1.0.3. > > Thanks, > > Xia > > > > -

Re: Streaming value of (200MB) from a SequenceFile

2013-03-31 Thread Jerry Lam
Hi Sandy: Thank you for the advice. It sounds like a logical way to resolve this issue. I will look into the Writable interface and see how I can stream the value from HDFS in a MapFileInputFormat. I'm a bit concerned that no one has discussed this issue, because it might mean that I'm not using hdf

Re: Is FileSystem thread-safe?

2013-03-31 Thread Ted Yu
If you look at the DistributedFileSystem source code, you will see that it delegates most actions to its DFSClient field member. Requests to the Namenode are then made through ClientProtocol. An hdfs committer would be able to give you an affirmative answer. On Sun, Mar 31, 2013 at 11:27 AM, John Lill
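If the thread-safety of the concrete class remains in doubt, one defensive pattern is to give each thread its own instance rather than sharing one. A minimal sketch of that pattern, with a placeholder Client class standing in for the handle (in Hadoop, FileSystem.newInstance(uri, conf) would play this role; I'm not claiming DistributedFileSystem actually requires this):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Placeholder for a possibly non-thread-safe client handle.
class Client { }

public class PerThreadClient {
    static final AtomicInteger created = new AtomicInteger();

    // Each thread lazily gets its own Client on first get().
    static final ThreadLocal<Client> CLIENT = ThreadLocal.withInitial(() -> {
        created.incrementAndGet();
        return new Client();
    });

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> CLIENT.get(); // triggers per-thread creation
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(created.get()); // prints 2: one instance per thread
    }
}
```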

RE: Why big block size for HDFS.

2013-03-31 Thread John Lilley
From: Rahul Bhattacharjee [mailto:rahul.rec@gmail.com] Subject: Why big block size for HDFS. >Many places it has been written that to avoid huge no of disk seeks , we store >big blocks in HDFS , so that once we seek to the location , then there is only >data transfer rate which would be pre
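The amortization argument can be made concrete with rough numbers. A sketch reading a 1 GB file, assuming 10 ms per seek and 100 MB/s transfer (illustrative figures, not measurements of any real disk):

```java
public class SeekOverhead {
    public static void main(String[] args) {
        long seekMs = 10;        // assumed time per seek
        long rateMBperS = 100;   // assumed sequential transfer rate
        long fileMB = 1024;      // 1 GB file

        // Compare tiny blocks against HDFS-sized blocks.
        long[] blockKB = {64, 128 * 1024};
        for (long b : blockKB) {
            long blocks = (fileMB * 1024 + b - 1) / b;
            long totalMs = blocks * seekMs + fileMB * 1000 / rateMBperS;
            System.out.println(b + "KB blocks -> " + totalMs + " ms");
        }
    }
}
```

With 64 KB blocks the seeks dominate; with 128 MB blocks the transfer time does, which is the point being made above.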

Re: Why big block size for HDFS.

2013-03-31 Thread Azuryy Yu
When you seek to a position within an HDFS file, you do not seek from the start of the first block and then move one by one. Actually, DFSClient can skip blocks until it finds the one whose offset and length include your seek position. On Mon, Apr 1, 2013 at 12:55 AM, Rahul Bhattacharjee
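With fixed-size blocks, locating the target block is simple arithmetic (a sketch; the real DFSClient walks the located-blocks list returned by the namenode, since the last block may be shorter):

```java
public class SeekMath {
    public static void main(String[] args) {
        long blockSize = 128L << 20;  // 128 MB blocks, illustrative
        long pos = 300L << 20;        // seek target: byte 300 MB into the file

        long blockIndex = pos / blockSize;     // which block holds the byte
        long offsetInBlock = pos % blockSize;  // where inside that block

        System.out.println(blockIndex + " " + offsetInBlock);
        // prints 2 46137344: third block, 44 MB into it
    }
}
```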

ping

2013-03-31 Thread 韦锴 Wei Kai

Re: Why big block size for HDFS.

2013-03-31 Thread Rahul Bhattacharjee
Thanks a lot John, Azuryy. I guessed it was about HDD optimization. Then it might be good to defrag the underlying disk during general maintenance downtime. Thanks, Rahul On Mon, Apr 1, 2013 at 12:28 AM, John Lilley wrote: > ** ** > > *From:* Rahul Bhattacharjee [mailto:rahul.rec@gmail

Re: Streaming value of (200MB) from a SequenceFile

2013-03-31 Thread Rahul Bhattacharjee
Hi Sandy, I am also new to Hadoop and have a question here. The Writable does have a DataInput stream so that objects can be constructed from the byte stream. Are you suggesting saving the stream for later use? But later we cannot ascertain the state of the stream. For a large value , I think

Re: Streaming value of (200MB) from a SequenceFile

2013-03-31 Thread Rahul Bhattacharjee
Sorry for the multiple replies. There is one more thing that can be done (I guess) for streaming the values rather than constructing the whole object itself. We can store the value in HDFS as a file and have its location as the value for the mapper. The mapper can then open a stream using the specified location. N

are we able to decommission multi nodes at one time?

2013-03-31 Thread Henry JunYoung KIM
hi, hadoop users. To decommission a single node, it is necessary to wait a while for the node to be removed from the cluster (in my case, 20 ~ 30 minutes). For safety, I am decommissioning one node at a time. For performance, am I able to remove multiple nodes at the same time?

Re: Streaming value of (200MB) from a SequenceFile

2013-03-31 Thread Sandy Ryza
Hi Rahul, I don't think saving the stream for later use would work - I was just suggesting that if only some aggregate statistics needed to be calculated, they could be calculated at read time instead of in the mapper. Nothing requires a Writable to contain all the data that it reads. That's a g
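Sandy's point, that a Writable need not keep everything it reads, can be sketched in plain Java. The readFields(DataInput) signature below mirrors org.apache.hadoop.io.Writable, but the class is self-contained here; the length-prefixed encoding is an assumption for the example:

```java
import java.io.*;

public class AggregatingValue {
    long length;    // aggregates computed at read time ...
    long checksum;  // ... instead of buffering the 200 MB payload

    // Mirrors Writable#readFields(DataInput): consume the stream in
    // chunks and keep only the statistics we need.
    public void readFields(DataInput in) throws IOException {
        int size = in.readInt();  // assumed length-prefixed encoding
        byte[] buf = new byte[8192];
        long remaining = size;
        while (remaining > 0) {
            int n = (int) Math.min(buf.length, remaining);
            in.readFully(buf, 0, n);
            for (int i = 0; i < n; i++) checksum += buf[i] & 0xff;
            remaining -= n;
        }
        length = size;
    }

    public static void main(String[] args) throws IOException {
        // Build a fake serialized value: length prefix + 1000 bytes of 0x01.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        byte[] payload = new byte[1000];
        java.util.Arrays.fill(payload, (byte) 1);
        out.writeInt(payload.length);
        out.write(payload);

        AggregatingValue v = new AggregatingValue();
        v.readFields(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
        System.out.println(v.length + " " + v.checksum); // prints 1000 1000
    }
}
```

Only the aggregates survive the read, so memory stays bounded no matter how large the value is.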

Secondary Name Node Issue CDH4.1.2

2013-03-31 Thread samir.helpdoc
Hi All, Could you please share how to start/configure the SSN on a different physical machine? Currently it starts on the name node machine; I want to start the SSN on a different machine. Regards, samir.

Re: Secondary Name Node Issue CDH4.1.2

2013-03-31 Thread Kai Wei
You can add the SSN host name to conf/masters. On Mon, Apr 1, 2013 at 2:10 PM, samir.helpdoc wrote: > Hi All, > Could you please share me , how to start/configure SSN in Different > Physical Machine. Currently it is start on name node machine I want to > start SSN in Different machine. > > Reg
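In a Hadoop 1.x-style deployment (hadoop 1.0.3, as mentioned elsewhere in this digest), start-dfs.sh launches the secondary namenode on every host listed in conf/masters, so the minimal change is a one-line file (the hostname below is an example):

```
# conf/masters on the node where you run start-dfs.sh
snn-host.example.com
```

The secondary namenode host also needs the same Hadoop configuration files and passwordless SSH from the node running start-dfs.sh. CDH's own service scripts may manage this differently, so treat this as the stock-Apache recipe.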

Re: are we able to decommission multi nodes at one time?

2013-03-31 Thread varun kumar
How many nodes do you have, and what is the replication factor?
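To decommission several nodes in one pass, you can list them all in the excludes file referenced by dfs.hosts.exclude and refresh once; the namenode then re-replicates the blocks of all listed nodes in parallel (the file path below is an example):

```xml
<!-- hdfs-site.xml on the namenode -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/excludes</value>
</property>
```

Add one hostname per line to the excludes file and run `hadoop dfsadmin -refreshNodes`. Be sure the remaining nodes have enough capacity to hold the replication factor, or the decommission will never complete.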

Word count on cluster configuration

2013-03-31 Thread Varsha Raveendran
Hello! I did the setup for a cluster configuration of Hadoop. After running the word count example, the output in the part-r-0 file is as shown: hduser@MT2012158:/usr/local/hadoop$ head /tmp/gutenberg-output/gutenberg-output 40 2 4 ��� � � � �@��2 ��� � � � �@�@��1 �

Re: Streaming value of (200MB) from a SequenceFile

2013-03-31 Thread Rahul Bhattacharjee
Thanks Sandy for the excellent explanation. Didn't think about the lose of data-locality. Regards, Rahul On Mon, Apr 1, 2013 at 11:29 AM, Sandy Ryza wrote: > Hi Rahul, > > I don't think saving the stream for later use would work - I was just > suggesting that if only some aggregate statistics

Re: Word count on cluster configuration

2013-03-31 Thread Wenming Ye
Because many of the “words” are Unicode. Check this blog: http://blogs.msdn.com/b/hpctrekker/archive/2013/04/01/make-another-small-step-with-the-javascript-console-pig-in-hdinsight.aspx From: Varsha Raveendran Sent: Sunday, March 31, 2013 11:43 PM To: user@hadoop.apache.org Subject: Word co