dfs.block.size vs avg block size

2008-05-16 Thread Otis Gospodnetic
Hello, I checked the ML archives and the Wiki, as well as the HDFS user guide, but could not find information on how to change the block size of an existing HDFS. After running fsck I can see that my avg. block size is 12706144 B (circa 12 MB), which is a lot smaller than what I have configured:
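
For later readers: dfs.block.size takes effect only for files written after the change; existing files keep the block size they were created with, and fsck's average also counts short final blocks, so an average well below the configured value usually just means the files themselves are small. A minimal sketch of rewriting an existing file with an explicit block size through the FileSystem API (the paths here are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class RewriteWithBlockSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();          // picks up hadoop-site.xml
    FileSystem fs = FileSystem.get(conf);
    Path src = new Path("/data/old-file");             // hypothetical path
    Path tmp = new Path("/data/old-file.rewritten");   // hypothetical path
    FSDataInputStream in = fs.open(src);
    // create(path, overwrite, bufferSize, replication, blockSize):
    // write the copy with a 128 MB block size regardless of the old one
    FSDataOutputStream out = fs.create(tmp, true, 4096,
        fs.getDefaultReplication(), 128L * 1024 * 1024);
    IOUtils.copyBytes(in, out, conf, true);            // copies and closes both streams
    fs.delete(src);                                    // then move the copy into place
    fs.rename(tmp, src);
  }
}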

Re: java.io.IOException: Could not obtain block / java.io.IOException: Could not get block locations

2008-05-16 Thread André Martin
Hi dhruba, we are running the latest Sun Java 6u10-beta, and the namenode runs with 25 threads on a quad-core machine. Cu on the 'net, Bye - bye, < André èrbnA > Dhruba Borthakur wrote: What version of java are you using?

Re: Making the case for Hadoop

2008-05-16 Thread Robert Krüger
Thanks, but I read the list before I posted. I was hoping for examples a bit closer to what we're planning to use it for, i.e. as the storage for media assets. Most people seem to use it for large amounts of not-so-critical data or for processing temporary data. lohit wrote: You could also

Re: Mirroring data to a non-Hadoop FS

2008-05-16 Thread Robert Krüger
The reasoning was that in the event of system-inherent failures (i.e. bugs in HDFS which corrupt the files), a system set up with a completely different technology would not share that type of failure and would prevent it from becoming catastrophic. Sounds (and probably in our case is) a bit paranoid

Re: About HDFS`s Certification and Authorization?

2008-05-16 Thread Hairong Kuang
Release 0.15 does not have any permission/security control. Release 0.16 supports permission control. An initial design for user authentication is coming soon; a jira issue regarding it will be opened in the next couple of weeks. Please contribute if you have any ideas. Hairong On 5/16/08 1:32 AM, "
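
For anyone looking for the concrete 0.16 mechanics: enforcement is toggled by the dfs.permissions property, and modes and ownership can be set through the FileSystem interface. A small sketch, with a made-up path and made-up user/group names:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class PermissionSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path dir = new Path("/user/alice/private");            // hypothetical
    fs.mkdirs(dir);
    fs.setPermission(dir, new FsPermission((short) 0700)); // rwx------
    fs.setOwner(dir, "alice", "staff");                    // superuser only
  }
}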

Re: Mirroring data to a non-Hadoop FS

2008-05-16 Thread Jim R. Wilson
There was some chatter on the HBase list about a dual HDFS/S3 driver class which would write to both but read only from HDFS. Of course, having this functionality at the Hadoop level would be better than in a subsidiary project. Maybe the ability to specify a secondary filesystem in the hadoop-si
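
A rough sketch of what such a class might look like (hypothetical — not an existing Hadoop or HBase class): a thin wrapper that duplicates every write onto a secondary FileSystem while serving all reads from the primary.

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MirroredWriter {
  private final FileSystem primary;   // e.g. an hdfs:// filesystem
  private final FileSystem secondary; // e.g. an s3:// filesystem

  public MirroredWriter(FileSystem primary, FileSystem secondary) {
    this.primary = primary;
    this.secondary = secondary;
  }

  /** Write the same bytes to both filesystems. */
  public void write(Path path, byte[] data) throws IOException {
    for (FileSystem fs : new FileSystem[] { primary, secondary }) {
      FSDataOutputStream out = fs.create(path, true);
      out.write(data);
      out.close();
    }
  }

  /** Reads never touch the secondary. */
  public FSDataInputStream read(Path path) throws IOException {
    return primary.open(path);
  }
}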

Re: Making the case for Hadoop

2008-05-16 Thread lohit
You could also find some info about companies/projects using Hadoop at PoweredBy page http://wiki.apache.org/hadoop/PoweredBy Thanks, Lohit - Original Message From: Ted Dunning <[EMAIL PROTECTED]> To: core-user@hadoop.apache.org Sent: Friday, May 16, 2008 10:02:25 AM Subject: Re: Making

Re: java.io.IOException: Could not obtain block / java.io.IOException: Could not get block locations

2008-05-16 Thread Dhruba Borthakur
What version of java are you using? How many threads are you running on the namenode? How many cores do your machines have? thanks, dhruba On Fri, May 16, 2008 at 6:02 AM, André Martin <[EMAIL PROTECTED]> wrote: > Hi Hadoopers, > we are experiencing a lot of "Could not obtain block / Could not get block locations
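
The thread-count question presumably refers to the namenode's RPC handler pool, which in this era is sized by the dfs.namenode.handler.count property (default 10) and raised in hadoop-site.xml. A trivial check, assuming that property name:

import org.apache.hadoop.conf.Configuration;

public class HandlerCountCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();  // loads hadoop-site.xml
    // 10 was the shipped default; busy namenodes often need more handlers
    System.out.println("dfs.namenode.handler.count = "
        + conf.getInt("dfs.namenode.handler.count", 10));
  }
}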

Re: Mirroring data to a non-Hadoop FS

2008-05-16 Thread Ted Dunning
Why not go to the next step and use a second cluster as the backup? On 5/16/08 6:33 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote: > > Hi, > > what are the options to keep a copy of data from an HDFS instance in > sync with a backup file system which is not HDFS? Are there Rsync-like > tools
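
For the record, the usual tool for a cluster-to-cluster copy is distcp ("hadoop distcp hdfs://primary:9000/data hdfs://backup:9000/data"), which runs the copy as a parallel MapReduce job. A single-threaded sketch of the same idea through the FileSystem API, with hypothetical cluster URIs:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class ClusterBackup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path srcPath = new Path("hdfs://primary:9000/data");  // hypothetical URI
    Path dstPath = new Path("hdfs://backup:9000/data");   // hypothetical URI
    FileSystem src = srcPath.getFileSystem(conf);
    FileSystem dst = dstPath.getFileSystem(conf);
    // copy(srcFS, src, dstFS, dst, deleteSource, conf)
    FileUtil.copy(src, srcPath, dst, dstPath, false, conf);
  }
}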

Re: Making the case for Hadoop

2008-05-16 Thread Ted Dunning
Nothing is the best case! On 5/16/08 7:00 AM, "Edward Capriolo" <[EMAIL PROTECTED]> wrote: > So hadoop is a fact. My advice for convincing IT executives: ask them > to present their alternative. (usually it's nothing)

Re: How do people keep their client configurations in sync with the remote cluster(s)

2008-05-16 Thread Ted Dunning
That is all that almost all of my arms-length clients need. With 0.18, all clients should be able to ask for the default configuration if they have a root URL, which will make the amount of information needed for any and all clients very small. On 5/16/08 2:03 AM, "Steve Loughran" <[EMAIL PROTECTED]
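
A sketch of what that could look like on the client side, assuming the cluster publishes its hadoop-site.xml at a well-known (hypothetical) URL; Configuration can load a resource from a URL, so nothing else needs syncing:

import java.net.URL;
import org.apache.hadoop.conf.Configuration;

public class RemoteConfClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // hypothetical URL where the cluster publishes its site config
    conf.addResource(new URL("http://namenode.example.com/conf/hadoop-site.xml"));
    System.out.println("fs.default.name = " + conf.get("fs.default.name"));
  }
}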

Re: How do people keep their client configurations in sync with the remote cluster(s)

2008-05-16 Thread Ted Dunning
I would think that this would cover about 80% of the newbie problem reports. It would be especially good if it included ssh'ed commands run on the slaves that look back at the namenode and job tracker. Forward and reverse name lookups are also important. This is worth a Jira. On 5/16/08 2:03
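
The name-lookup check in particular is easy to script. A tiny sketch that does the forward and reverse lookups for a (hypothetical) list of hosts so the results can be eyeballed side by side:

import java.net.InetAddress;

public class DnsCheck {
  public static void main(String[] args) throws Exception {
    String[] hosts = { "namenode", "jobtracker", "slave1" };  // hypothetical names
    for (String host : hosts) {
      InetAddress addr = InetAddress.getByName(host);  // forward lookup
      String reverse = addr.getCanonicalHostName();    // reverse lookup
      // on a healthy cluster the two names should agree (modulo FQDN)
      System.out.println(host + " -> " + addr.getHostAddress() + " -> " + reverse);
    }
  }
}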

Re: Making the case for Hadoop

2008-05-16 Thread Ted Dunning
Here at Veoh, we have committed to this style of file system in a very big way. We currently have around a billion files that we manage using replicated file storage. We didn't go with HDFS for this, but the reasons probably do not apply in your case. In our case, we have lots (as in LOTS) of f

Re: HDFS corrupt...how to proceed?

2008-05-16 Thread Michael Di Domenico
I second that request... I use DRBD for another project where I work and definitely see its benefits, but I haven't tried it with Hadoop yet. Thanks On Tue, May 13, 2008 at 11:17 AM, Otis Gospodnetic < [EMAIL PROTECTED]> wrote: > Hi, > > I'd love to see the DRBD+Hadoop write up! Not only woul

Re: Making the case for Hadoop

2008-05-16 Thread Edward Capriolo
Conservative IT executive... Sounds like you're working at my last job. :) Yahoo uses Hadoop for a very large cluster: http://developer.yahoo.com/blogs/hadoop/ And after all, Hadoop is a work-alike of the Google File System; Google uses that for all types of satellite data. The New York Times is usi

Mirroring data to a non-Hadoop FS

2008-05-16 Thread Robert Krüger
Hi, what are the options for keeping a copy of data from an HDFS instance in sync with a backup file system which is not HDFS? Are there rsync-like tools that allow transferring only deltas, or would one have to implement that oneself (e.g. by writing a Java program that accesses both filesystems)?
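
HDFS itself offers no sub-file delta transfer, so an rsync-like tool would have to work at file granularity. A minimal sketch of the roll-your-own option, comparing file sizes and copying only what differs (the directory paths are hypothetical, and modification times could be compared the same way):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class MirrorChanged {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem hdfs = FileSystem.get(conf);
    FileSystem local = FileSystem.getLocal(conf);  // any non-HDFS target FS
    Path srcDir = new Path("/assets");             // hypothetical
    Path dstDir = new Path("/backup/assets");      // hypothetical
    for (FileStatus src : hdfs.listStatus(srcDir)) {
      Path dst = new Path(dstDir, src.getPath().getName());
      // copy only when the target is missing or its size differs
      if (!local.exists(dst) || local.getFileStatus(dst).getLen() != src.getLen()) {
        FileUtil.copy(hdfs, src.getPath(), local, dst, false, conf);
      }
    }
  }
}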

java.io.IOException: Could not obtain block / java.io.IOException: Could not get block locations

2008-05-16 Thread André Martin
Hi Hadoopers, we are experiencing a lot of "Could not obtain block / Could not get block locations" IOExceptions when processing a 400 GB Map/Red job using our 6-node DFS & MapRed (v. 0.16.4) cluster. Each node is equipped with a 400 GB SATA HDD and runs SUSE Linux Enterprise Edition.

Re: How do people keep their client configurations in sync with the remote cluster(s)

2008-05-16 Thread Steve Loughran
Ted Dunning wrote: I use several strategies: A) avoid dependency on Hadoop's configuration by using HTTP access to files. I use this, for example, where we have a PHP or Grails or Oracle app that needs to read a data file or three from HDFS. B) rsync early and often and lock down the config dir
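
As a sketch of strategy A, assuming the namenode's web interface serves file content over plain HTTP, as the hftp interface of this era does (the exact servlet path here is an assumption, and host, port, and file path are hypothetical), the client needs nothing but a URL and the JDK:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class HttpRead {
  public static void main(String[] args) throws Exception {
    // hypothetical: namenode web port plus the data servlet path
    URL url = new URL("http://namenode.example.com:50070/data/reports/today.txt");
    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
    for (String line; (line = in.readLine()) != null; ) {
      System.out.println(line);  // no hadoop-site.xml needed on this client
    }
    in.close();
  }
}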

About HDFS`s Certification and Authorization?

2008-05-16 Thread wangxiaowei
Hi all, I now use hadoop-0.15.3. Does its HDFS have the functionality of authentication and authorization, so that one user can access only one part of HDFS and cannot access other parts without permission? If so, how can I implement it? Thanks a lot.

Making the case for Hadoop

2008-05-16 Thread Robert Krüger
Hi, I'm currently trying to make the case for using Hadoop (or, more precisely, HDFS) as part of a storage architecture for a large media asset repository. HDFS will be used for storing up to a total of 1 PB of high-resolution video (average file size will be > 1 GB). We think HDFS is a perfect m
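
One back-of-envelope point in Hadoop's favor here, under the commonly cited assumption of very roughly 150 bytes of namenode heap per file or block object: 1 PB at an average of more than 1 GB per file means at most about a million files; with 128 MB blocks that is roughly 8 blocks per file, so on the order of 9 million namespace objects and only about 1.5 GB of namenode heap. Large files like these are the easy case for HDFS; it is huge numbers of small files that hurt.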