Hi,
For a few weeks now, we have been experiencing a rather annoying problem with a
Nutch/Hadoop installation.
It's a very simple setup: the Hadoop configuration is the default from
Nutch. The version of Hadoop is the hadoop-0.17.1 jar provided by
Nutch.
During the injection operation, we now have the
Do you mean, without scanning all the files line by line?
I know little about the implementation of Hadoop, but as a programmer, I can
presume that it's not possible without a complete scan.
But I can suggest a work-around (a rough sketch follows below):
- compute the number of records manually before putting a file to HDFS.
- Append
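
For illustration only (the class and argument names below are hypothetical, not from this thread): a rough sketch of counting newline-delimited records locally and then copying the file into HDFS with the standard FileSystem API.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CountThenPut {
        public static void main(String[] args) throws IOException {
            String localFile = args[0];  // a local input file
            String hdfsDir = args[1];    // target directory in HDFS

            // Count newline-delimited records locally, before the upload.
            long records = 0;
            BufferedReader reader = new BufferedReader(new FileReader(localFile));
            try {
                while (reader.readLine() != null) {
                    records++;
                }
            } finally {
                reader.close();
            }

            // Copy the file into HDFS; the count could then be recorded
            // alongside it (e.g. in the file name or a small side file).
            FileSystem fs = FileSystem.get(new Configuration());
            fs.copyFromLocalFile(new Path(localFile), new Path(hdfsDir));
            System.out.println(localFile + ": " + records + " records");
        }
    }
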
Both the DFS viewer and job submission work on Eclipse v. 3.3.2.
I've given up using Ganymede, unfortunately...
2009/1/26 Aaron Kimball aa...@cloudera.com
The Eclipse plugin (which, btw, is now part of Hadoop core in src/contrib/)
is currently inoperable. The DFS viewer works, but the job
Thanks for the responses,
Sorry, I made a mistake; it's actually not a DB that I wanted. We need
simple storage for files. Get and put commands alone are enough (no queries
needed). We don't even need append, chmod, etc.
Probably from a thread on this list, I came across a link to a KFS-HDFS
Hi,
We have a system based on Hadoop 0.18 / Cascading 0.8.1 and now I'm
trying to port it to Hadoop 0.19 / Cascading 1.0. The first serious
problem I've run into is that we're extensively using MultipleOutputs in
our jobs dealing with sequence files that store Cascading's Tuples.
Since Cascading
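
For context, a minimal sketch of the old-API (org.apache.hadoop.mapred, as in 0.19) MultipleOutputs pattern being discussed. The class name and the "side" output name are made up, and Text stands in for Cascading's Tuple just to keep the example self-contained.

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.lib.MultipleOutputs;

    // In the driver, a named output backed by sequence files is declared with:
    //   MultipleOutputs.addNamedOutput(conf, "side",
    //       org.apache.hadoop.mapred.SequenceFileOutputFormat.class,
    //       Text.class, Text.class);

    public class SideOutputReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

      private MultipleOutputs mos;

      public void configure(JobConf job) {
        mos = new MultipleOutputs(job);
      }

      @SuppressWarnings("unchecked")
      public void reduce(Text key, Iterator<Text> values,
          OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        while (values.hasNext()) {
          Text value = values.next();
          output.collect(key, value);                             // main output
          mos.getCollector("side", reporter).collect(key, value); // named output
        }
      }

      public void close() throws IOException {
        mos.close();
      }
    }
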
Please check which nodes have these failures.
I guess the new tasktrackers/machines are not configured correctly.
As a result, the map task will die and the remaining map tasks will be
sucked onto these machines.
-Sagar
David J. O'Dell wrote:
We've been running 0.18.2 for over a month on an 8
I am experimenting with Hadoop backed by the Amazon S3 filesystem as one
of our backup storage solutions. Just Hadoop and S3 (block-based,
since it overcomes the 5GB limit) so far seems to be fine.
My problem is that I want to mount this filesystem using fuse-dfs
(since I don't have to worry
Hi Roopa,
I can't comment on the S3 specifics. However, fuse-dfs is based on a C
interface called libhdfs, which allows C programs (such as fuse-dfs) to
connect to the Hadoop file system Java API. This being the case,
fuse-dfs should (theoretically) be able to connect to any file system
that
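
To make the layering concrete: libhdfs is a thin JNI wrapper around the Java org.apache.hadoop.fs.FileSystem API, so a libhdfs connect call boils down to roughly the following Java (a sketch; the host and port are placeholders):

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class ConnectSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // hdfsConnect(host, port) in libhdfs ends up doing roughly this:
            FileSystem fs = FileSystem.get(
                URI.create("hdfs://namenode.example.com:9000/"), conf);
            System.out.println("Connected to " + fs.getUri());
        }
    }
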
Hi David,
If your tasks are failing on only the new nodes, it's likely that you're
missing a library or something on those machines. See this Hadoop tutorial
http://public.yahoo.com/gogate/hadoop-tutorial/html/module5.html about
distributing debug scripts. These will allow you to capture
Thanks for the response, Craig.
I looked at the fuse-dfs C code, and it looks like it does not accept anything
other than dfs://. So, given that Hadoop can connect to the S3
file system, allowing the s3 scheme should solve my problem?
Roopa
On Jan 28, 2009, at 1:03 PM, Craig Macdonald wrote:
Hi
In theory, yes.
On inspection of libhdfs, which underlies fuse-dfs, I note that:
* libhdfs takes a host and port number as input when connecting, but
not a scheme (hdfs, etc.). The easiest option would be to set S3 as
your default file system in your hadoop-site.xml, then use the host of
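
For what it's worth, a sketch of that suggestion in code; the bucket name and credentials are placeholders, and in practice these properties would simply be set in hadoop-site.xml rather than programmatically:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class S3DefaultFs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Make the block-based S3 filesystem the default file system.
            conf.set("fs.default.name", "s3://my-bucket");
            conf.set("fs.s3.awsAccessKeyId", "YOUR_ACCESS_KEY");      // placeholder
            conf.set("fs.s3.awsSecretAccessKey", "YOUR_SECRET_KEY");  // placeholder

            FileSystem fs = FileSystem.get(conf);
            System.out.println("Default filesystem is now " + fs.getUri());
        }
    }
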
It was failing on all the nodes, both new and old.
The problem was there were too many subdirectories under
$HADOOP_HOME/logs/userlogs
The fix was just to delete the subdirs and change this setting from 24
hours (the default) to 2 hours:
mapred.userlog.retain.hours
Would have been nice if there was
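
For reference, a sketch of lowering the retention in code; normally this property is just set in hadoop-site.xml on the tasktrackers, so the JobConf usage here is only for illustration:

    import org.apache.hadoop.mapred.JobConf;

    public class UserlogRetention {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // Keep task userlogs for 2 hours instead of the 24-hour default,
            // so $HADOOP_HOME/logs/userlogs doesn't accumulate subdirectories.
            conf.setInt("mapred.userlog.retain.hours", 2);
            System.out.println(conf.getInt("mapred.userlog.retain.hours", 24));
        }
    }
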
Hi,
Is there a tool that one could run on a datanode to scrub all the
blocks on that node?
Sriram
Hey Craig,
I tried the way you suggested, but I get this 'transport endpoint not
connected' error. Can I see the logs anywhere? I don't see anything in
/var/log/messages either.
It looks like it tries to create the file system in hdfs.c, but I'm not sure
where it fails.
I have the Hadoop home set so I
Hi Roopa,
Firstly, can you get fuse-dfs working for an HDFS instance?
There is also a debug mode for fuse: enable this by adding -d on the
command line.
C
Roopa Sudheendra wrote:
Hey Craig,
I tried the way you suggested, but I get this 'transport endpoint not
connected' error. Can I see the
Thanks. Yes, a setup with fuse-dfs and HDFS works fine. I think the
mount point was bad for whatever reason and was failing with that
error. I created another mount point, which resolved the
transport endpoint error.
Also, I had the -d option on my command. :)
Roopa
On Jan 28,
Hi Roopa,
Glad it worked :-)
Please file JIRA issues against the fuse-dfs / libhdfs components for
anything that would have made it easier to mount the S3 filesystem.
Craig
Roopa Sudheendra wrote:
Thanks. Yes, a setup with fuse-dfs and HDFS works fine. I think the
mount point was bad for whatever reason
Wow. How many subdirectories were there? How many jobs do you run a day?
- Aaron
On Wed, Jan 28, 2009 at 12:13 PM, David J. O'Dell dod...@videoegg.com wrote:
It was failing on all the nodes, both new and old.
The problem was there were too many subdirectories under
$HADOOP_HOME/logs/userlogs
By scrub do you mean delete the blocks from the node?
Read your conf/hadoop-site.xml file to determine where dfs.data.dir points,
then for each directory in that list, just rm the directory. If you want to
ensure that your data is preserved with appropriate replication levels on
the rest of your
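
For illustration, one way to see where dfs.data.dir points programmatically (a sketch; in practice you would just read conf/hadoop-site.xml directly):

    import org.apache.hadoop.conf.Configuration;

    public class DataDirs {
        public static void main(String[] args) {
            // new Configuration() loads hadoop-default.xml and hadoop-site.xml
            // from the classpath.
            Configuration conf = new Configuration();
            String[] dirs = conf.getStrings("dfs.data.dir");
            if (dirs == null) {
                System.out.println("dfs.data.dir is not set");
                return;
            }
            for (String dir : dirs) {
                System.out.println(dir);
            }
        }
    }
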
By scrub I mean, have a tool that reads every block on a given data
node. That way, I'd be able to find corrupted blocks proactively
rather than having an app read the file and find it.
Sriram
On Wed, Jan 28, 2009 at 5:57 PM, Aaron Kimball aa...@cloudera.com wrote:
By scrub do you mean delete
Check out fsck:
bin/hadoop fsck <path> -files -blocks -locations
Sriram Rao wrote:
By scrub I mean, have a tool that reads every block on a given data
node. That way, I'd be able to find corrupted blocks proactively
rather than having an app read the file and find it.
Sriram
On Wed, Jan 28,
I wanted to provide two additional notes about my talk on this list.
First, you're really coming to see Aaron Kimball and Tom White - I'm
working on getting that fixed on the conference pages.
Second, my talk is actually a full day of intermediate/advanced
Hadoop training on Monday. It will be
Does this read every block of every file from all replicas and verify
that the checksums are good?
Sriram
On Wed, Jan 28, 2009 at 6:20 PM, Sagar Naik sn...@attributor.com wrote:
Check out fsck
bin/hadoop fsck <path> -files -blocks -locations
Sriram Rao wrote:
By scrub I mean, have a tool
In addition to the datanode itself finding corrupted blocks (as Owen
mentioned), if the client finds a corrupted block, it will go to another replica.
What's your replication factor?
-Sagar
Sriram Rao wrote:
Does this read every block of every file from all replicas and verify
that the checksums are
I'm running Hadoop 0.19.0 on Solaris (SunOS 5.10 on x86) and many jobs are
failing with this exception:
Error initializing attempt_200901281655_0004_m_25_0:
java.io.IOException: Cannot run program chmod: error=12, Not enough space
at
The failover is fine; we are more interested in finding corrupt blocks
sooner rather than later. Since there is the thread in the datanode,
that is good.
The replication factor is 3.
Sriram
On Wed, Jan 28, 2009 at 6:45 PM, Sagar Naik sn...@attributor.com wrote:
In addition to datanode itself
Hi Hadoop Users,
I am trying to build a storage system for an office of about 20-30 users
which will store everything,
from normal everyday documents to computer configuration files to big files
(600MB) which are generated every hour.
Is Hadoop suitable for this kind of environment?
Definitely not.
You should be looking at expandable Ethernet storage that can be extended by
connecting additional SAS arrays (like Dell PowerVault and similar offerings
from other companies).
600MB is just 6 seconds over a gigabit network...
---
Dmitry Pushkarev
-----Original Message-----
From:
But we are looking for an open source solution.
If I do decide to implement this for the office storage, what problems will
I run into?
-----Original Message-----
From: Dmitry Pushkarev [mailto:u...@stanford.edu]
Sent: Thursday, 29 January 2009 5:15 PM
To: core-user@hadoop.apache.org
Cc:
Owen O'Malley wrote:
On Jan 28, 2009, at 6:16 PM, Sriram Rao wrote:
By scrub I mean, have a tool that reads every block on a given data
node. That way, I'd be able to find corrupted blocks proactively
rather than having an app read the file and find it.
The datanode already has a thread
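
For reference (an addition, not from the original reply): the interval of that background verification thread is configurable. A sketch of reading it, assuming the dfs.datanode.scan.period.hours property used by the datanode's block scanner in this era of Hadoop:

    import org.apache.hadoop.conf.Configuration;

    public class BlockScanPeriod {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // -1 here only means "not set in the loaded configuration";
            // the datanode falls back to its own built-in default.
            int hours = conf.getInt("dfs.datanode.scan.period.hours", -1);
            System.out.println("Datanode block scan period (hours): " + hours);
        }
    }
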