Re: HDFS - millions of files in one directory?

2009-01-27 Thread Sagar Naik
A system with 1 billion small files: the namenode will need to maintain data structures for all of those files. The system will have at least 1 block per file, and if you have the replication factor set to 3, the system will have 3 billion blocks. Now, if you try to read all these files in a job, you will be
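
A back-of-the-envelope sketch of that arithmetic. The ~150 bytes of namenode heap per namespace object is an often-quoted rule of thumb, not a figure from this thread, so treat the constant as an assumption:

    // Rough namenode memory estimate for 1 billion small files.
    public class NamenodeHeapEstimate {
        public static void main(String[] args) {
            long files = 1000L * 1000 * 1000;   // 1 billion small files
            long blocksPerFile = 1;             // at least one block per file
            long replication = 3;               // replication factor 3
            long bytesPerObject = 150;          // ASSUMED heap cost per namenode object

            long blockReplicas = files * blocksPerFile * replication;  // 3 billion replicas
            long namespaceObjects = files + files * blocksPerFile;     // inodes + blocks
            System.out.println("Block replicas cluster-wide: " + blockReplicas);
            System.out.println("Approx. namenode heap (GB): "
                    + (namespaceObjects * bytesPerObject) / (1024L * 1024 * 1024));
        }
    }

Even under these rough assumptions the namespace alone runs to hundreds of gigabytes of heap, which is why packing small files into larger containers comes up repeatedly in this thread.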

Re: decommissioned node showing up as dead node in web-based interface to namenode (dfshealth.jsp)

2009-01-27 Thread paul
Once the nodes are listed as dead, if you still have the host names in your conf/exclude file, remove the entries and then run hadoop dfsadmin -refreshNodes. This works for us on our cluster. -paul On Tue, Jan 27, 2009 at 5:08 PM, Bill Au wrote: > I was able to decommission a datanode succ
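
For reference, the same refresh can also be driven from Java through the DFSAdmin tool. This is a sketch only; the package name of DFSAdmin has moved between Hadoop versions, so the import below is an assumption:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hdfs.tools.DFSAdmin;   // package varies by version (assumption)
    import org.apache.hadoop.util.ToolRunner;

    // Equivalent of "hadoop dfsadmin -refreshNodes",
    // run after editing the exclude file.
    public class RefreshNodes {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            int exitCode = ToolRunner.run(new DFSAdmin(conf),
                                          new String[] {"-refreshNodes"});
            System.exit(exitCode);
        }
    }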

decommissioned node showing up as dead node in web-based interface to namenode (dfshealth.jsp)

2009-01-27 Thread Bill Au
I was able to decommission a datanode successfully without having to stop my cluster. But I noticed that after a node has been decommissioned, it shows up as a dead node in the web-based interface to the namenode (i.e., dfshealth.jsp). My cluster is relatively small and losing a datanode will have pe

Re: DBOutputFormat and auto-generated keys

2009-01-27 Thread Kevin Peterson
On Mon, Jan 26, 2009 at 5:40 PM, Vadim Zaliva wrote: > Is it possible to obtain auto-generated IDs when writing data using > DBOutputFormat? > > For example, is it possible to write Mapper which stores records in DB > and returns auto-generated > IDs of these records? ... > which I would like t
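
The preview cuts off before the answer. As background: DBOutputFormat does not hand generated keys back to the job, but plain JDBC can return them when the task writes to the database directly. A sketch under that assumption — the JDBC URL, table, and column names are hypothetical:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Insert a record and read back its auto-generated key via JDBC.
    public class GeneratedKeyExample {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/testdb", "user", "pass");  // hypothetical
            PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO records (payload) VALUES (?)",        // hypothetical table
                Statement.RETURN_GENERATED_KEYS);
            ps.setString(1, "some record");
            ps.executeUpdate();
            ResultSet keys = ps.getGeneratedKeys();
            if (keys.next()) {
                System.out.println("generated id = " + keys.getLong(1));
            }
            conn.close();
        }
    }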

Re: Using HDFS for common purpose

2009-01-27 Thread Jim Twensky
You may also want to have a look at this to reach a decision based on your needs: http://www.swaroopch.com/notes/Distributed_Storage_Systems Jim On Tue, Jan 27, 2009 at 1:22 PM, Jim Twensky wrote: > Rasit, > > What kind of data will you be storing on HBase or directly on HDFS? Do you > aim to

Re: Using HDFS for common purpose

2009-01-27 Thread Jim Twensky
Rasit, What kind of data will you be storing on HBase or directly on HDFS? Do you aim to use it as a data source to do some key/value lookups for small strings/numbers, or do you want to store larger files labeled with some sort of a key and retrieve them during a map reduce run? Jim On Tue, Jan

Re: files are inaccessible after HDFS upgrade from 0.18.1 to 0.19.0

2009-01-27 Thread Yuanyuan Tian
Yes, I did run fsck after the upgrade. No error message. Everything is "OK". yy

Re: files are inaccessible after HDFS upgrade from 0.18.1 to 0.19.0

2009-01-27 Thread Yuanyuan Tian
Yes, I did that. But there was an error message asking me to roll back first. So, I ended up doing a -rollback first and then an -upgrade. yy

RE: Using HDFS for common purpose

2009-01-27 Thread Jonathan Gray
Perhaps what you are looking for is HBase? http://hbase.org HBase is a column-oriented, distributed store that sits on top of HDFS and provides random access. JG > -Original Message- > From: Rasit OZDAS [mailto:rasitoz...@gmail.com] > Sent: Tuesday, January 27, 2009 1:20 AM > To: core-
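
A minimal random-access sketch against HBase, assuming the 0.19-era client API (HTable/BatchUpdate); the table name, column family, and row key are hypothetical:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.BatchUpdate;
    import org.apache.hadoop.hbase.io.Cell;

    // Random-access put/get against an existing table "mytable"
    // with a column family "data:" (names are hypothetical).
    public class HBaseRandomAccess {
        public static void main(String[] args) throws Exception {
            HBaseConfiguration conf = new HBaseConfiguration();
            HTable table = new HTable(conf, "mytable");

            BatchUpdate update = new BatchUpdate("row1");
            update.put("data:value", "hello".getBytes());
            table.commit(update);

            Cell cell = table.get("row1", "data:value");
            System.out.println(new String(cell.getValue()));
        }
    }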

Re: HDFS - millions of files in one directory?

2009-01-27 Thread Philip (flip) Kromer
Tossing one more on this king of all threads: Stuart Sierra of AltLaw wrote a nice little tool to serialize tar.bz2 files into SequenceFile, with filename as key and its contents a BLOCK-compressed blob. http://stuartsierra.com/2008/04/24/a-million-little-files flip On Mon, Jan 26, 2009 at 3:2
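
The core of that approach is small enough to sketch: write each file's name as the key and its raw bytes as the value into a BLOCK-compressed SequenceFile. This mirrors the technique, not Stuart's exact tool; the paths and filenames are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    // Pack many small files into one BLOCK-compressed SequenceFile:
    // key = original filename, value = raw file contents.
    public class SmallFilePacker {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, new Path("/packed/files.seq"),      // hypothetical output
                Text.class, BytesWritable.class,
                SequenceFile.CompressionType.BLOCK);
            byte[] contents = "file body".getBytes();         // read the small file here in practice
            writer.append(new Text("dir/file-00001.txt"),     // hypothetical filename key
                          new BytesWritable(contents));
            writer.close();
        }
    }

BLOCK compression batches many key/value pairs per compression unit, which is what makes it a good fit for lots of small records.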

Re: files are inaccessible after HDFS upgrade from 0.18.1 to 0.19.0

2009-01-27 Thread Brian Bockelman
Hey YY, At a more basic level -- have you run fsck on that file? What were the results? Brian On Jan 27, 2009, at 10:54 AM, Bill Au wrote: Did you start your namenode with the -upgrade after upgrading from 0.18.1 to 0.19.0? Bill On Mon, Jan 26, 2009 at 8:18 PM, Yuanyuan Tian wrote:
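
If it helps, fsck can also be invoked from Java via the DFSck tool rather than the shell. A sketch only; the DFSck package and constructor vary across Hadoop versions, so both are assumptions here:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hdfs.tools.DFSck;   // location varies by version (assumption)
    import org.apache.hadoop.util.ToolRunner;

    // Equivalent of "hadoop fsck /path/to/file -files -blocks -locations".
    public class RunFsck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            int exitCode = ToolRunner.run(new DFSck(conf),
                new String[] {"/path/to/file", "-files", "-blocks", "-locations"});
            System.exit(exitCode);
        }
    }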

Re: files are inaccessible after HDFS upgrade from 0.18.1 to 0.19.0

2009-01-27 Thread Bill Au
Did you start your namenode with the -upgrade after upgrading from 0.18.1 to 0.19.0? Bill On Mon, Jan 26, 2009 at 8:18 PM, Yuanyuan Tian wrote: > > > Hi, > > I just upgraded hadoop from 0.18.1 to 0.19.0 following the instructions on > http://wiki.apache.org/hadoop/Hadoop_Upgrade. After upgrade,

Funding opportunities for Academic Research on/near Hadoop

2009-01-27 Thread Steve Loughran
This is a little note to advise universities working on Hadoop-related projects that they may be able to get some money and cluster time for some fun things: http://www.hpl.hp.com/open_innovation/irp/ "The HP Labs Innovation Research Program is designed to create opportunities -- at college

[ANNOUNCE] Registration for ApacheCon Europe 2009 is now open!

2009-01-27 Thread Owen O'Malley
All, I'm broadcasting this to all of the Hadoop dev and user lists; however, in the future I'll only send cross-subproject announcements to gene...@hadoop.apache.org. Please subscribe over there too! It is very low traffic. Anyways, ApacheCon Europe is coming up in March. There are a r

Number of records in a MapFile

2009-01-27 Thread Andy Liu
Is there a way to programmatically get the number of records in a MapFile without doing a complete scan?

Re: Where are the meta data on HDFS ?

2009-01-27 Thread Rasit OZDAS
Hi Tien,

    Configuration config = new Configuration(true);
    config.addResource(new Path("/etc/hadoop-0.19.0/conf/hadoop-site.xml"));
    FileSystem fileSys = FileSystem.get(config);
    // the call was cut off in the archive; completed from the 0.19 API
    // signature getFileBlockLocations(FileStatus, long, long):
    FileStatus status = fileSys.getFileStatus(filePath);
    BlockLocation[] locations = fileSys.getFileBlockLocations(status, 0, status.getLen());

I copied some lines of my code; it can also help if you p

Re: Zeroconf for hadoop

2009-01-27 Thread Steve Loughran
Edward Capriolo wrote: Zeroconf is more focused on simplicity than security. One of the original problems, which may have since been fixed, is that any program can announce any service, e.g. my laptop can announce that it is the DNS for google.com, etc. -1 to zeroconf as it is way too chatty. Every DNS lo

Re: Interrupting JobClient.runJob

2009-01-27 Thread Amareshwari Sriramadasu
Edwin wrote: Hi, I am looking for a way to interrupt a thread that entered JobClient.runJob(). The runJob() method keeps polling the JobTracker until the job is completed. After reading the source code, I know that the InterruptedException is caught in runJob(). Thus, I can't interrupt it using Thre

Interrupting JobClient.runJob

2009-01-27 Thread Edwin
Hi, I am looking for a way to interrupt a thread that entered JobClient.runJob(). The runJob() method keeps polling the JobTracker until the job is completed. After reading the source code, I know that the InterruptedException is caught in runJob(). Thus, I can't interrupt it using Thread.interrupt()
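
Since runJob() swallows InterruptedException internally, one common workaround (a sketch, not necessarily the thread's eventual answer) is to submit the job asynchronously and drive the polling loop yourself, so your own thread stays interruptible and can kill the job on its way out:

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;

    // Submit asynchronously and poll; on interrupt, kill the job and re-throw.
    public class InterruptibleJobRunner {
        public static void runInterruptibly(JobConf conf) throws Exception {
            JobClient client = new JobClient(conf);
            RunningJob job = client.submitJob(conf);
            try {
                while (!job.isComplete()) {
                    Thread.sleep(1000);   // sleep is interruptible, unlike runJob()
                }
            } catch (InterruptedException e) {
                job.killJob();            // stop the job before propagating the interrupt
                throw e;
            }
        }
    }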

Using HDFS for common purpose

2009-01-27 Thread Rasit OZDAS
Hi, I wanted to ask if HDFS is a good solution just as a distributed DB (no running jobs, only get and put commands). A review says that "HDFS is not designed for low latency", and besides, it's implemented in Java. Do these disadvantages prevent us from using it? Or could somebody suggest a better (fast