I ran into this problem, hard, and I can vouch that it is not a Windows-only
problem. ReiserFS, ext3, and OS X's HFS+ become cripplingly slow with more
than a few hundred thousand files in the same directory. (The operation to
correct this mistake took a week to run.) That is one of several hard
les
Mark Kerzner wrote:
But it would seem then that making a balanced directory tree would not help
either, because there would be another binary search, correct? I assume that,
either way, it would be as fast as it can be :)
But the cost of memory copies would be much less with a tree (when you
add and d
But it would seem then that making a balanced directory tree would not help
either, because there would be another binary search, correct? I assume that,
either way, it would be as fast as it can be :)
On Fri, Jan 23, 2009 at 5:08 PM, Raghu Angadi wrote:
>
> If you are adding and deleting files in the
Hi everyone,
I have a question; maybe you can help me.
- How can we get the metadata of a file on HDFS?
For example: if I have a file of, say, 2 GB on HDFS, it is split
into many chunks and these chunks are distributed across many nodes. Is there
any trick to know which chunks belong to
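For the block-location question, a minimal sketch using the standard FileSystem API; the path comes from the command line and is hypothetical, and error handling is omitted:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Prints the offset, length, and hosts of every block of an HDFS file.
public class BlockInfo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path(args[0]);                      // e.g. the 2 GB file
    FileStatus status = fs.getFileStatus(file);
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());
    for (int i = 0; i < blocks.length; i++) {
      System.out.println("block " + i
          + " offset=" + blocks[i].getOffset()
          + " length=" + blocks[i].getLength()
          + " hosts=" + java.util.Arrays.toString(blocks[i].getHosts()));
    }
  }
}

Running it against a 2 GB file prints one line per block with the datanodes that hold a replica of that block.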
%Remaining fluctuates much more than %dfs used. This is because DFS shares
the disks with mapred, and mapred tasks may use a lot of disk space temporarily.
So trying to keep the same %free is impossible most of the time.
Hairong
On 1/19/09 10:28 PM, "Billy Pearson" wrote:
> Why do we not use the Re
Raghu Angadi wrote:
If you are adding and deleting files in the directory, you might notice
CPU penalty (for many loads, higher CPU on NN is not an issue). This is
mainly because HDFS does a binary search on files in a directory each
time it inserts a new file.
I should add that equal or ev
On Sat, Jan 24, 2009 at 10:03 AM, Mark Kerzner wrote:
> Hi,
>
> there is a performance penalty in Windows (pardon the expression) if you put
> too many files in the same directory. The OS becomes very slow, stops seeing
> them, and lies about their status to my Java requests. I do not know if this
If you are adding and deleting files in the directory, you might notice
CPU penalty (for many loads, higher CPU on NN is not an issue). This is
mainly because HDFS does a binary search on files in a directory each
time it inserts a new file.
If the directory is relatively idle, then there is
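To make the cost concrete, here is a toy comparison (not NameNode code): inserting names into a sorted array-backed list pays a binary search plus an arraycopy per insert, while a balanced tree avoids the bulk copies, which is the "cost of memory copies" point made earlier in the thread. The counts and names are arbitrary.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

public class InsertCost {
  public static void main(String[] args) {
    int n = 200000;
    List<String> sortedList = new ArrayList<String>();
    TreeSet<String> tree = new TreeSet<String>();

    long t0 = System.nanoTime();
    for (int i = 0; i < n; i++) {
      String name = "file-" + Integer.toHexString(i);
      int pos = Collections.binarySearch(sortedList, name);
      // A negative result encodes the insertion point; add() shifts the tail.
      sortedList.add(pos < 0 ? -pos - 1 : pos, name);
    }
    long t1 = System.nanoTime();
    for (int i = 0; i < n; i++) {
      tree.add("file-" + Integer.toHexString(i));  // O(log n), no shifting
    }
    long t2 = System.nanoTime();
    System.out.println("sorted list: " + (t1 - t0) / 1000000 + " ms, "
        + "tree: " + (t2 - t1) / 1000000 + " ms");
  }
}

Both structures find an entry in O(log n); the difference per insert is the memory copy needed to keep the array sorted.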
Hi,
there is a performance penalty in Windows (pardon the expression) if you put
too many files in the same directory. The OS becomes very slow, stops seeing
them, and lies about their status to my Java requests. I do not know if this
is also a problem in Linux, but in HDFS - do I need to balance
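On a local filesystem (Windows, ext3, HFS+), the usual workaround is to fan files out into hashed subdirectories so that no single directory accumulates hundreds of thousands of entries. A hedged sketch; the helper and paths are hypothetical, not anything in Hadoop:

import java.io.File;

public class BucketedStore {
  // Map a file name to a two-level bucket directory under root,
  // giving 256 x 256 = 65536 buckets.
  public static File bucketFor(File root, String fileName) {
    int h = fileName.hashCode();
    String level1 = String.format("%02x", (h >>> 8) & 0xff);
    String level2 = String.format("%02x", h & 0xff);
    File dir = new File(new File(root, level1), level2);
    dir.mkdirs();
    return new File(dir, fileName);
  }

  public static void main(String[] args) {
    File target = bucketFor(new File("/data/store"), "invoice-123456.pdf");
    System.out.println(target);  // e.g. /data/store/5e/a7/invoice-123456.pdf
  }
}

Whether HDFS needs the same treatment is addressed elsewhere in the thread: the penalty there is CPU on the namenode at insert time, not a lookup slowdown.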
Thanks Mark. I'll be getting in touch early next week.
Others, I see replies default straight to the list. Please feel free to
email just me (christo...@cloudera.com), unless, well, you're in the
mood to share your bio with everyone :-)
Cheers,
Christophe
On Fri, Jan 23, 2009 at 2:31 PM, Mark Kerzn
Tim,
I looked there, but it is a setup manual. I read the MapReduce, Sawzall, and
MS papers on these topics, but I need "best practices."
Thank you,
Mark
On Fri, Jan 23, 2009 at 3:22 PM, tim robertson wrote:
> Hi,
>
> Sounds like you might want to look at the Nutch project architecture
> and then s
Christophe,
I am writing my first Hadoop project now, I have 20 years of consulting
experience, and I am in Houston. Here is my resume: http://markkerzner.googlepages.com.
I have used EC2.
Sincerely,
Mark
On Fri, Jan 23, 2009 at 4:04 PM, Christophe Bisciglia <
christo...@cloudera.com> wrote:
> Hey al
Hey all, I wanted to reach out to the user / development community to
start identifying those of you who are interested in consulting /
contract work for new Hadoop deployments.
A number of our larger customers are asking for more extensive on-site
help than would normally happen under a support c
Thanks a lot for your help. I solved that problem by removing LDFLAGS
(containing libjvm.so) from the hdfs_test compilation. I had added that flag to
compile correctly using the Makefile, but it was the real problem. Only after
removing it was I able to run with ant.
Thanks,
Arifa
-Original Message-
Hi,
Sounds like you might want to look at the Nutch project architecture
and then see the Nutch on Hadoop tutorial -
http://wiki.apache.org/nutch/NutchHadoopTutorial It does web
crawling and indexing using Lucene. It would be a good place to
start anyway for ideas, even if it doesn't end up mee
Hi, esteemed group,
how would I form Maps in MapReduce to recursively look at every file in a
directory and do something with each file, such as produce a PDF or compute
its hash?
For that matter, Google builds its index using MapReduce, or so the papers
say. First the crawlers store all the files.
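One hedged way to set this up: list the file paths (for example with FileSystem.listStatus, recursing into directories), write them one per line into an input file, and let each map call open and process the named file. Below is a sketch of such a mapper using the old mapred API; the class name and the choice of MD5 are illustrative, not anything prescribed by Hadoop.

import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Input: lines containing HDFS paths. Output: (path, md5-hex) pairs.
public class FileHashMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private JobConf conf;

  public void configure(JobConf job) {
    this.conf = job;
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    Path path = new Path(value.toString().trim());
    FileSystem fs = path.getFileSystem(conf);
    MessageDigest md5;
    try {
      md5 = MessageDigest.getInstance("MD5");
    } catch (Exception e) {
      throw new IOException("MD5 unavailable: " + e);
    }
    byte[] buf = new byte[64 * 1024];
    InputStream in = fs.open(path);
    try {
      int n;
      while ((n = in.read(buf)) > 0) {
        md5.update(buf, 0, n);
        reporter.progress();                 // keep long-running reads alive
      }
    } finally {
      in.close();
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md5.digest()) {
      hex.append(String.format("%02x", b & 0xff));
    }
    output.collect(new Text(path.toString()), new Text(hex.toString()));
  }
}

With many small files, NLineInputFormat (if your Hadoop version includes it) or simply a small split size helps spread the path list across map tasks, so each mapper handles a handful of files rather than the whole list.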
I am looking to create some RA scripts and experiment with starting
Hadoop via the Linux-HA cluster manager. Linux-HA would handle restarting
downed nodes and eliminate the ssh key dependency.
Can you please attach your latest version of this to
https://issues.apache.org/jira/browse/HADOOP-496?
Thanks,
Doug
Boris Musykantski wrote:
we have fixed some patches in JIRA for support of a WebDAV server on
top of HDFS, updated them to work with a newer version (0.18.0 IIRC), and
added support for
> It seems HDFS isn't as robust or reliable as the website says and/or I
> have a configuration issue.
Quite possible. How robust does the website say it is?
I agree that debugging failures like the following is pretty hard for casual
users. You need to look at the logs for the block, or run 'bin/hadoop
Hi,
Since I upgraded to 0.19.0, I've been getting the following exceptions
when restarting jobs, or even when a failed reducer is being restarted by
the job tracker. It appears that stale file locks in the namenode don't get
properly released sometimes:
org.apache.hadoop.ipc.RemoteException:
o
Yes guys, we observed such problems.
They will be common for 0.18.2 and 0.19.0, exactly as you
described, when data-nodes become unstable.
There were several issues; please take a look:
HADOOP-4997 workaround for tmp file handling on DataNodes
HADOOP-4663 - links to other related
HADOOP-4810 Data
Yes, you may overload your machines that way because of the small number of
nodes. One thing to do would be to look in the logs for any signs of
IOExceptions and report them back here. Another thing you can do is change
some configs.
Increase *dfs.datanode.max.xcievers* to 512 and set the
*dfs.datanode.so
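For reference, a hedged example of how that first setting typically goes into hadoop-site.xml; 512 is just the value suggested above, and the second (truncated) property is omitted here:

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>512</value>
</property>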
It happens right after the MR job (though once or twice it's happened
during). I am not using EBS, just HDFS between the machines. As for tasks,
there are 4 mappers and 0 reducers.
Richard J. Zak
-Original Message-
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of
Jean-D
xlarge is good. Is it normally happening during an MR job? If so, how many
tasks do you have running at the same moment overall? Also, is your data
stored on EBS?
Thx,
J-D
On Fri, Jan 23, 2009 at 12:55 PM, Zak, Richard [USA] wrote:
> 4 slaves, 1 master, all are the m1.xlarge instance type.
>
>
>
4 slaves, 1 master, all are the m1.xlarge instance type.
Richard J. Zak
-Original Message-
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of
Jean-Daniel Cryans
Sent: Friday, January 23, 2009 12:34
To: core-user@hadoop.apache.org
Subject: Re: HDFS loosing blocks or connec
Richard,
This happens when the datanodes are too slow and eventually all replicas for
a single block are tagged as "bad". What kind of instances are you using?
How many of them?
J-D
On Fri, Jan 23, 2009 at 12:13 PM, Zak, Richard [USA] wrote:
> Might there be a reason for why this seems to rou
Might there be a reason why this seems to routinely happen to me when
using Hadoop 0.19.0 on Amazon EC2?
09/01/23 11:45:52 INFO hdfs.DFSClient: Could not obtain block
blk_-1757733438820764312_6736 from any node: java.io.IOException: No live
nodes contain current block
09/01/23 11:45:55 INFO
Hi Tom,
Thanks for your reply. That's what I wanted to know. And it's good to know that
it would not be a showstopper if our ops department wants to use their
own system to control the daemons.
Regards
Matthias
> -Original Message-
> From: Tom White [mailto:t...@cloudera.com]
Hi Arifa,
I had to add the "LD_LIBRARY_PATH" env var to run my example correctly.
But I have no idea if it helps, because my error wasn't a segmentation
fault. I would try it anyway.
LD_LIBRARY_PATH:/usr/JRE/jre1.6.0_11/jre1.6.0_11/lib:/usr/JRE/jre1.6.0_11/jre1.6.0_11/lib/amd64/server
(server dire
It would be nice to make this more uniform. There's an outstanding
Jira on this if anyone is interested in looking at it:
https://issues.apache.org/jira/browse/HADOOP-2914
Tom
On Fri, Jan 23, 2009 at 12:14 AM, Aaron Kimball wrote:
> Hi Bhupesh,
>
> I've noticed the same problem -- LocalJobRunner
I saw some puzzling behavior tonight when running a MapReduce program I
wrote.
It would perform the mapping just fine and would begin to shuffle. It got
to 33% complete on the reduce (end of shuffle), and then the task failed, claiming
that /_temporary was deleted.
I didn't touch HDFS while this was goin