In other words, given an instance of org.apache.hadoop.fs.FileSystem, can
multiple threads access it simultaneously? I am assuming not, since it is not
synchronized.
Thanks,
John Lilley
Here is my understanding about putting a file into hdfs:
A client contacts the name node and gets the locations of the data nodes
where it needs to put the blocks.
But before this, how does the name node know how many blocks it needs to
split a file into?
Who splits the file? Is it the client itself?
Your question could be interpreted in another way: should I use Hadoop in
order to perform massive compression/decompression using my own
(possibly proprietary) utility?
So yes, Hadoop can be used to parallelize the work. But the real answer
will depend on your context, as always.
How many f
FileSystem is an abstract class; what concrete class are you using
(DistributedFileSystem, etc.)? For FileSystem, I find the following for the
create() method:
* but the implementation is thread-safe. The other option is to change the
* value of umask in configuration to be 0, but it is not thread-safe.
Dear Sai Sai,
"Hadoop, the definitive guide" says regarding default replica placement:
- first replica is placed on the same node as the client (lowest bandwidth
penalty).
- second replica is placed off-rack, on a random node of another rack
(avoiding busy racks).
- third replica is placed on the same rack as the second, on a different
node chosen at random.
I think what Sai was asking is: when the client asks the namenode to give it
a list of data nodes, how does the namenode know how many blocks would be
required to store the entire file?
I think the way it works is that the client requests the NN for a list of
blocks and then the client writes the first block
Hi,
In many places it has been written that to avoid a huge number of disk seeks,
we store big blocks in HDFS, so that once we seek to the location, the data
transfer rate is what dominates; there are no more seeks. I am not sure if I
have understood this correctly.
My question is, no m
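For a concrete sense of the trade-off (illustrative numbers, not
measurements): with a 10 ms seek time and a 100 MB/s transfer rate, reading
one 128 MB block costs about 1.28 s of transfer against 10 ms of seeking, so
seek overhead is under 1%. Reading the same 128 MB in 4 KB chunks scattered
across the disk could require up to 32,768 seeks, roughly 5.5 minutes of seek
time alone.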
I found that org.apache.hadoop.mapreduce.lib.fieldsel.FieldSelectionHelper
and the corresponding old
api org.apache.hadoop.mapred.lib.FieldSelectionMapReduce take the
user-specified separator string as a regular expression in String.split(),
but also use it as a literal string in StringBuffer.append().
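For anyone hitting this, here is a minimal standalone sketch (my own code,
not the Hadoop source) of why the inconsistency bites once the separator
contains a regex metacharacter:

    // The separator is split as a regex but appended as a literal string.
    public class SeparatorMismatch {
        public static void main(String[] args) {
            String separator = ".";   // '.' is a regex metacharacter
            String record = "a.b.c";

            // As a regex, "." matches every character, so every field is
            // empty and, after trailing empties are dropped, the array is
            // empty -- not the three fields the user intended.
            String[] fields = record.split(separator);
            System.out.println(fields.length);  // prints 0

            // Joining treats the same separator as a plain string, so the
            // two halves of the API disagree about what "." means.
            System.out.println(new StringBuffer()
                    .append("a").append(separator).append("b"));  // a.b
        }
    }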
The client splits the file. The block size attribute is a per-file
one, sent along with the file creation request. Blocks are requested as the
write progresses, not pre-allocated.
On Sun, Mar 31, 2013 at 2:15 PM, Sai Sai wrote:
> Here is my understanding about putting a file into hdfs:
> A client contacts nam
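To make "per-file" concrete, here is a hedged sketch (the URI and path are
placeholders) that passes an explicit block size at creation time; check the
FileSystem javadoc of your version for the exact create() overloads:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PerFileBlockSize {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs =
                    FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);

            // The block size is a per-file attribute sent with the create
            // request; this file gets 128 MB blocks regardless of the
            // cluster-wide default.
            FSDataOutputStream out = fs.create(
                    new Path("/tmp/example.dat"),
                    true,                 // overwrite
                    4096,                 // io buffer size
                    (short) 3,            // replication
                    128L * 1024 * 1024);  // block size in bytes
            out.writeBytes("hello");  // blocks allocated as the write proceeds
            out.close();
        }
    }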
Hi Jerry,
I assume you're providing your own Writable implementation? The Writable
readFields method is given a stream. Are you able to perform your processing
while reading it there?
-Sandy
On Sat, Mar 30, 2013 at 10:52 AM, Jerry Lam wrote:
> Hi everyone,
>
> I'm havi
From: Ted Yu [mailto:yuzhih...@gmail.com]
Subject: Re: Is FileSystem thread-safe?
>>FileSystem is an abstract class; what concrete class are you using
>>(DistributedFileSystem, etc.)?
Good point. I am calling FileSystem.get(URI uri, Configuration conf) with a
URI like "hdfs://server:port/..." o
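In case it helps anyone checking the same thing, a quick way to confirm which
concrete FileSystem comes back (the URI is a placeholder):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class WhichFileSystem {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // For an hdfs:// URI this typically returns a DistributedFileSystem.
            FileSystem fs =
                    FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
            System.out.println(fs.getClass().getName());
        }
    }

Note also that FileSystem.get() caches instances per scheme/authority/user,
so threads calling it with the same URI will usually share one object, which
is what makes the thread-safety question matter here.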
This question is more related to mapreduce.
I put user@hbase in Bcc.
Cheers
On Sun, Mar 31, 2013 at 11:15 AM, tojaneyang wrote:
> Hi Ted,
>
> Do you have any suggestions for this?
>
> I am using the hadoop which is packaged within hbase-0.94.1. It is hadoop
> 1.0.3.
>
> Thanks,
>
> Xia
Hi Sandy:
Thank you for the advice. It sounds like a logical way to resolve this issue.
I will look into the Writable interface and see how I can stream the value
from HDFS in a MapFileInputFormat.
I'm a bit concerned that no one has discussed this issue, because it might
mean that I'm not using hdf
If you look at the DistributedFileSystem source code, you will see that it
delegates most actions to its DFSClient field member.
Requests to the Namenode are then made through ClientProtocol.
An HDFS committer would be able to give you a definitive answer.
On Sun, Mar 31, 2013 at 11:27 AM, John Lill
From: Rahul Bhattacharjee [mailto:rahul.rec@gmail.com]
Subject: Why big block size for HDFS.
>Many places it has been written that to avoid huge no of disk seeks , we store
>big blocks in HDFS , so that once we seek to the location , then there is only
>data transfer rate which would be pre
When you seek to a position within an HDFS file, you do not read from the
start of the first block and walk through the blocks one by one.
The DFSClient can skip blocks until it finds the one whose offset and length
include your seek position.
On Mon, Apr 1, 2013 at 12:55 AM, Rahul Bhattacharjee
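To make that concrete, a small hedged sketch (the path and offset are
placeholders); the seek is resolved against block offsets, so the earlier
blocks are never read:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SeekExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FSDataInputStream in = fs.open(new Path("/tmp/big-file.dat"));

            // DFSClient maps this offset to the block whose range contains
            // it and talks to a datanode holding that block; the blocks
            // before it are not transferred at all.
            in.seek(1024L * 1024 * 1024);  // jump 1 GB into the file
            byte[] buf = new byte[4096];
            int n = in.read(buf);
            System.out.println("read " + n + " bytes");
            in.close();
        }
    }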
Thanks a lot John, Azurya.
I guessed it was about optimizing for HDDs. Then it might be good to defrag
the underlying disk during general maintenance downtime.
Thanks,
Rahul
On Mon, Apr 1, 2013 at 12:28 AM, John Lilley wrote:
>
> *From:* Rahul Bhattacharjee [mailto:rahul.rec@gmail
Hi Sandy,
I am also new to Hadoop and have a question here.
The Writable does have a DataInput stream so that objects can be constructed
from the byte stream.
Are you suggesting saving the stream for later use? But later we cannot
ascertain the state of the stream.
For a large value, I think
Sorry for the multiple replies.
There is one more thing that can be done (I guess) for streaming the values
rather than constructing the whole object itself. We can store the value in
HDFS as a file and have its location as the value of the mapper. The mapper
can open a stream using the specified location, as in the sketch below.
N
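A hypothetical sketch of that idea (class and field names are made up): the
map input value carries an HDFS path, and the mapper streams the real payload
on demand. Note the data-locality caveat raised later in the thread.

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Input value = path of the large payload, not the payload itself.
    public class PathValueMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            Path payload = new Path(value.toString());
            FileSystem fs = payload.getFileSystem(context.getConfiguration());
            long bytes = 0;
            byte[] buf = new byte[64 * 1024];
            FSDataInputStream in = fs.open(payload);
            try {
                int n;
                while ((n = in.read(buf)) > 0) {
                    bytes += n;  // aggregate while streaming; the whole
                }                // value never sits in memory
            } finally {
                in.close();
            }
            context.write(new Text(payload.getName()), new LongWritable(bytes));
        }
    }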
hi, hadoop users.
to decommission a single node, it is necessary to wait a while for the node
to be removed from the cluster (in my case, 20 ~ 30 minutes).
for safety, I am decommissioning one node at a time.
for performance, am I able to remove multiple nodes at the same time?
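For reference, the usual Hadoop 1.x mechanics (hostnames and paths below are
placeholders): every host to be decommissioned goes into the excludes file,
so nothing in the mechanism itself limits you to one node per refresh:

    # hdfs-site.xml must already point at an excludes file:
    #   <name>dfs.hosts.exclude</name>
    #   <value>/etc/hadoop/conf/excludes</value>

    # List every node to decommission, one hostname per line:
    echo "datanode-07" >> /etc/hadoop/conf/excludes
    echo "datanode-08" >> /etc/hadoop/conf/excludes

    # Ask the namenode to re-read the file; it re-replicates the blocks of
    # all listed nodes and marks each one Decommissioned when done.
    hadoop dfsadmin -refreshNodes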
Hi Rahul,
I don't think saving the stream for later use would work - I was just
suggesting that if only some aggregate statistics needed to be calculated,
they could be calculated at read time instead of in the mapper. Nothing
requires a Writable to contain all the data that it reads.
That's a g
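A minimal hedged sketch of that point (the wire format here is invented: an
int count followed by that many longs): readFields consumes the whole record
from the stream but keeps only an aggregate, so the raw values are never
materialized.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    // Aggregates at deserialization time instead of storing the data.
    public class SumOnReadWritable implements Writable {
        private long sum;
        private int count;

        @Override
        public void readFields(DataInput in) throws IOException {
            count = in.readInt();
            sum = 0;
            for (int i = 0; i < count; i++) {
                sum += in.readLong();  // consume each value, keep only the sum
            }
        }

        @Override
        public void write(DataOutput out) throws IOException {
            // The raw values were dropped at read time, so this sketch is
            // read-side only.
            throw new UnsupportedOperationException("read-side only");
        }

        public long getSum() { return sum; }
        public int getCount() { return count; }
    }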
Hi All,
Could you please share how to start/configure the SSN on a different
physical machine? Currently it starts on the name node machine; I want to
start the SSN on a different machine.
Regards,
samir.
You can add the SSN host name to conf/masters.
On Mon, Apr 1, 2013 at 2:10 PM, samir.helpdoc wrote:
> Hi All,
> Could you please share how to start/configure the SSN on a different
> physical machine? Currently it starts on the name node machine; I want to
> start the SSN on a different machine.
>
> Reg
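For completeness, the Hadoop 1.x steps usually look like this (the hostname
is a placeholder, and your checkpoint-related properties may need attention
too):

    # On the namenode machine: conf/masters lists secondary namenode hosts
    # (despite the name, it does not list the namenode itself).
    echo "snn-host.example.com" > conf/masters

    # The SSN host needs the cluster config as well, in particular
    # fs.default.name pointing at the namenode.

    # Restart HDFS so start-dfs.sh launches the SSN on that host:
    bin/stop-dfs.sh
    bin/start-dfs.sh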
How many nodes do you have, and what is the replication factor?
Hello!
I did the setup for a cluster configuration of Hadoop. After running the
word count example, the output shown in the part-r-0 file is as follows:
hduser@MT2012158:/usr/local/hadoop$ head
/tmp/gutenberg-output/gutenberg-output
40
2
4
��� � � � �@��2
��� � � � �@�@��1
�
Thanks Sandy for the excellent explanation. I didn't think about the loss of
data-locality.
Regards,
Rahul
On Mon, Apr 1, 2013 at 11:29 AM, Sandy Ryza wrote:
> Hi Rahul,
>
> I don't think saving the stream for later use would work - I was just
> suggesting that if only some aggregate statistics
Because many of the “words” are Unicode; check the next blog post:
http://blogs.msdn.com/b/hpctrekker/archive/2013/04/01/make-another-small-step-with-the-javascript-console-pig-in-hdinsight.aspx
From: Varsha Raveendran
Sent: Sunday, March 31, 2013 11:43 PM
To: user@hadoop.apache.org
Subject: Word co