tools for scrubbing HDFS data nodes?

2009-01-28 Thread Sriram Rao
Hi,

Is there a tool that one could run on a datanode to scrub all the blocks on that node?

Sriram

Re: tools for scrubbing HDFS data nodes?

2009-01-28 Thread Aaron Kimball
By "scrub" do you mean delete the blocks from the node? Read your conf/hadoop-site.xml file to determine where dfs.data.dir points; then, for each directory in that list, just rm the directory. If you want to ensure that your data is preserved with appropriate replication levels on the rest of your
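The lookup step Aaron describes can be sketched as follows; a minimal illustration of pulling dfs.data.dir out of a hadoop-site.xml-style file (the sample directories here are made up, not from the thread):

```python
import xml.etree.ElementTree as ET
from io import StringIO

# A made-up hadoop-site.xml fragment; the real file lives under conf/.
SAMPLE_CONF = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/disk1/dfs/data,/disk2/dfs/data</value>
  </property>
</configuration>"""

def data_dirs(conf_xml):
    """Return the comma-separated directory list that dfs.data.dir points to."""
    root = ET.parse(StringIO(conf_xml)).getroot()
    for prop in root.findall("property"):
        if prop.findtext("name") == "dfs.data.dir":
            return [d.strip() for d in prop.findtext("value").split(",")]
    return []

# Each of these directories is what you would rm to wipe the node's blocks.
print(data_dirs(SAMPLE_CONF))
```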

Re: tools for scrubbing HDFS data nodes?

2009-01-28 Thread Sriram Rao
By "scrub" I mean: have a tool that reads every block on a given data node. That way, I'd be able to find corrupted blocks proactively, rather than having an app read the file and find them.

Sriram

Re: tools for scrubbing HDFS data nodes?

2009-01-28 Thread Sagar Naik
Check out fsck:

bin/hadoop fsck / -files -blocks -locations
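fsck prints a summary you can check mechanically for trouble. A sketch of doing that, assuming output along the lines of the usual report (the sample text and counts below are invented; real output comes from running bin/hadoop fsck against your cluster):

```python
import re

# Invented sample of an fsck summary, for illustration only.
SAMPLE_FSCK = """Total blocks (validated): 1042 (avg. block size 6710886 B)
Minimally replicated blocks: 1042 (100.0 %)
Corrupt blocks: 2
Missing replicas: 0 (0.0 %)
The filesystem under path '/' is CORRUPT"""

def corrupt_block_count(report):
    """Pull the 'Corrupt blocks' count out of an fsck summary, 0 if absent."""
    m = re.search(r"Corrupt blocks:\s+(\d+)", report)
    return int(m.group(1)) if m else 0

print(corrupt_block_count(SAMPLE_FSCK))  # 2
```

Note that fsck checks block placement and replication from the namenode's point of view; as the rest of the thread explains, it is the datanode-side scanner that actually re-reads block data.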

Re: tools for scrubbing HDFS data nodes?

2009-01-28 Thread Sriram Rao
Does this read every block of every file from all replicas and verify that the checksums are good?

Sriram

Re: tools for scrubbing HDFS data nodes?

2009-01-28 Thread Owen O'Malley
On Jan 28, 2009, at 6:16 PM, Sriram Rao wrote:
> By "scrub" I mean, have a tool that reads every block on a given data
> node. That way, I'd be able to find corrupted blocks proactively rather
> than having an app read the file and find it.

The datanode already has a thread that checks the blocks
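The check such a scanner thread performs is conceptually simple: recompute a checksum for each fixed-size chunk of a block and compare it with the stored value. A toy sketch of that idea (HDFS keeps CRC32 checksums in per-block .meta files, and 512 bytes matches the io.bytes.per.checksum default, but the layout below is a simplification, not the datanode's actual code):

```python
import zlib

CHUNK = 512  # bytes checksummed per chunk (HDFS default io.bytes.per.checksum)

def checksums(block):
    """Compute one CRC32 per CHUNK-sized slice of a block."""
    return [zlib.crc32(block[i:i + CHUNK]) for i in range(0, len(block), CHUNK)]

def scrub(block, stored):
    """Return indices of chunks whose recomputed CRC32 disagrees with 'stored'."""
    return [i for i, (a, b) in enumerate(zip(checksums(block), stored)) if a != b]

block = bytes(range(256)) * 8                      # a 2 KB pretend block
good = checksums(block)                            # what a .meta file would hold
corrupted = block[:700] + b"\x00" + block[701:]    # flip one byte inside chunk 1

print(scrub(block, good))       # []  -- clean block
print(scrub(corrupted, good))   # [1] -- chunk 1 is bad
```

A real scrub would run this over every block file under dfs.data.dir and report (or quarantine) the bad chunks.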

Re: tools for scrubbing HDFS data nodes?

2009-01-28 Thread Sagar Naik
In addition to the datanode itself finding corrupted blocks (as Owen mentioned), if a client finds a corrupted block, it will go to another replica.

What's your replication factor?

-Sagar

Re: tools for scrubbing HDFS data nodes?

2009-01-28 Thread Sriram Rao
The failover is fine; we are more interested in finding corrupt blocks sooner rather than later. Since there is already such a thread in the datanode, that is good. The replication factor is 3.

Sriram

Re: tools for scrubbing HDFS data nodes?

2009-01-28 Thread Raghu Angadi
Owen O'Malley wrote:
> On Jan 28, 2009, at 6:16 PM, Sriram Rao wrote:
>> By "scrub" I mean, have a tool that reads every block on a given data
>> node. That way, I'd be able to find corrupted blocks proactively rather
>> than having an app read the file and find it.
>
> The datanode already has a thread t

Re: tools for scrubbing HDFS data nodes?

2009-01-29 Thread Tom White
Each datanode has a web page at http://datanode:50075/blockScannerReport where you can see details about the scans.

Tom
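The report is plain "key : value" text, so it is easy to scrape across a fleet of datanodes. A sketch of parsing it (the field names and numbers in the sample below are assumptions for illustration, not taken from a real datanode; fetching would use the URL Tom gives):

```python
def parse_report(text):
    """Parse 'key : value' lines from a block scanner report into a dict."""
    out = {}
    for line in text.splitlines():
        if ":" in line:
            k, v = line.split(":", 1)
            out[k.strip()] = v.strip()
    return out

# Invented sample; a real report would be fetched from
# http://datanode:50075/blockScannerReport on each node.
SAMPLE = """Total Blocks : 21131
Verified in last hour : 70
Verified in last day : 1767
Scans since restart : 8907"""

report = parse_report(SAMPLE)
print(report["Total Blocks"])
```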

Re: tools for scrubbing HDFS data nodes?

2009-01-29 Thread Steve Loughran
Sriram Rao wrote:
> Does this read every block of every file from all replicas and verify
> that the checksums are good?

The DataBlockScanner thread on every datanode does this for you automatically. You can tune the rate at which it reads, but it reads in all local blocks and compares the MD5
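Tuning that read rate is a back-of-the-envelope bandwidth calculation: the throttle determines how long a full pass over the node's blocks takes. A sketch (the 4 TB and 4 MB/s figures are illustrative assumptions, and the property controlling the scan period differs across Hadoop versions, so check your release's defaults):

```python
def scan_period_days(total_bytes, bytes_per_sec):
    """Days the scanner needs to touch every local block once at the given rate."""
    SECONDS_PER_DAY = 86400
    return total_bytes / bytes_per_sec / SECONDS_PER_DAY

# e.g. a node holding 4 TB of block data, scanning throttled to 4 MB/s
days = scan_period_days(4 * 1024**4, 4 * 1024**2)
print(round(days, 1))  # 12.1
```

If that figure comes out longer than the scan period you want, either raise the throttle or accept that some blocks go unverified between passes.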