Since I did not get any response, I am reposting this to get some attention...
On Fri, May 27, 2011 at 7:57 PM, sudhanshu arora
wrote:
> I am writing multiple files using multiple FSOutputStreams through
> different threads in HDFS. All the files are getting written properly and I
> see that namenod
Keeping the alias in the loop
-Original Message-
From: Stuti Awasthi
Sent: Monday, May 30, 2011 10:56 AM
To: 'Jain, Prem'
Subject: RE: Can't start datanode?
Hi Prem,
The datanode pid file is named "hadoop-[USERNAME]-datanode.pid" and by default
it is located in the /tmp directory.
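The default path above can be sketched as follows; this is a minimal illustration, assuming the pid directory has not been overridden via HADOOP_PID_DIR:

```python
import getpass
import os

# Default pid directory is /tmp; HADOOP_PID_DIR (if set) overrides it.
pid_dir = os.environ.get("HADOOP_PID_DIR", "/tmp")
pid_file = os.path.join(pid_dir, f"hadoop-{getpass.getuser()}-datanode.pid")
print(pid_file)

# If the datanode is running, the file contains its process id:
if os.path.exists(pid_file):
    with open(pid_file) as f:
        print("datanode pid:", f.read().strip())
```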
Here
Your best bet would be to take a look at the synthetic load generator.
10^8 files would be a problem in most cases because you'd need to have a
really beefy NN for that (~48GB of JVM heap and all that). The biggest I've
heard about holds something on the order of 1.15*10^8 objects (files & dirs)
and i
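A back-of-the-envelope check on that heap figure, using the commonly cited rule of thumb of roughly 150 bytes of NameNode heap per namespace object (the 150-byte figure is an assumption here, not from this thread):

```python
# Rough NameNode heap estimate for 10^8 namespace objects.
objects = 10**8                 # files + directories
bytes_per_object = 150          # commonly cited rule of thumb (assumption)
metadata_gb = objects * bytes_per_object / 2**30
print(f"~{metadata_gb:.1f} GB just for inode metadata")
```

Blocks, replicas, and JVM overhead push the real requirement several times higher, which is consistent with the ~48GB heap mentioned above.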
First, it is virtually impossible to create 100 million files in HDFS
because the name node can't hold that many.
Secondly, file creation is bottlenecked by the name node, so files can't be
created at more than about 1000 per second (and achieving more than half
that rate i
Hi all,
I'm doing a test and need to create lots of files (100 million) in HDFS. I
used a shell script to do this, but it's very, very slow. How can I create a
lot of files in HDFS quickly?
Thanks
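One reason a serial shell script is slow is that each `hadoop fs` invocation starts a fresh JVM just to create one file; issuing creates concurrently from a long-lived process helps. A minimal sketch using Python threads against the local filesystem (the local `open` call is a stand-in, an assumption for illustration, for whatever HDFS client API you use):

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def create_empty_file(path: str) -> None:
    # Stand-in for an HDFS create; with a real cluster you would open
    # an HDFS output stream here instead of a local file.
    with open(path, "w"):
        pass

base = tempfile.mkdtemp()
paths = [os.path.join(base, f"file_{i:06d}") for i in range(1000)]

# Many concurrent creators amortize per-request latency; the NameNode
# is still the ultimate bottleneck, as noted earlier in the thread.
with ThreadPoolExecutor(max_workers=16) as pool:
    list(pool.map(create_empty_file, paths))

print("created:", len(os.listdir(base)))
```

Against a real cluster, throughput will still cap out at the NameNode-bound rate discussed above, so the synthetic load generator remains the better tool for this kind of test.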