Re: Debugging in Hadoop

2009-01-26 Thread Amareshwari Sriramadasu
patektek wrote: Hello list, I am trying to add some functionality to Hadoop-core and I am having serious issues debugging it. I have searched the list archive and still have not been able to resolve the issues. Simple question: If I want to insert "LOG.info()" statements in Hadoop code is not
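For reference, a minimal sketch of the Commons Logging pattern that 0.19-era Hadoop-core classes typically use; the class and method names below are hypothetical, and the messages only appear if conf/log4j.properties enables INFO for this logger:

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    public class MyTaskComponent {
      // Hadoop convention: one static Log per class, keyed by class name.
      private static final Log LOG = LogFactory.getLog(MyTaskComponent.class);

      void doWork() {
        LOG.info("doWork() entered");   // visible at INFO level and below
        if (LOG.isDebugEnabled()) {
          LOG.debug("detail that is cheap to skip when DEBUG is off");
        }
      }
    }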

Re: Netbeans/Eclipse plugin

2009-01-26 Thread Amit k. Saha
On Tue, Jan 27, 2009 at 2:52 AM, Aaron Kimball wrote: > The Eclipse plugin (which, btw, is now part of Hadoop core in src/contrib/) > currently is inoperable. The DFS viewer works, but the job submission code > is broken. I have started conversation with 3 other community members to work on the N

Re: Zeroconf for hadoop

2009-01-26 Thread Vadim Zaliva
On Mon, Jan 26, 2009 at 11:22, Edward Capriolo wrote: > Zeroconf is more focused on simplicity than security. One of the > original problems that may have been fixed is that any program can > announce any service, e.g. my laptop can announce that it is the DNS for > google.com etc. I see two distin

DBOutputFormat and auto-generated keys

2009-01-26 Thread Vadim Zaliva
Is it possible to obtain auto-generated IDs when writing data using DBOutputFormat? For example, is it possible to write a Mapper which stores records in a DB and returns the auto-generated IDs of those records? Let me explain what I am trying to achieve: I have data like this which I would like to st
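DBOutputFormat (as of 0.19) writes through batched PreparedStatements and exposes no hook for reading generated keys back, so one workaround is to talk JDBC directly from the map task. A sketch under that assumption; the table, column, and class names here are hypothetical:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class InsertAndGetId {
      // Inserts one row and returns the database-assigned key.
      public static long insertRecord(Connection conn, String name)
          throws SQLException {
        PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO records (name) VALUES (?)",
            Statement.RETURN_GENERATED_KEYS);  // ask the driver for the new key
        try {
          ps.setString(1, name);
          ps.executeUpdate();
          ResultSet keys = ps.getGeneratedKeys();
          keys.next();
          return keys.getLong(1);              // the auto-generated ID
        } finally {
          ps.close();
        }
      }
    }

The mapper could then emit that returned ID as its output value.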

files are inaccessible after HDFS upgrade from 0.18.1 to 0.19.0

2009-01-26 Thread Yuanyuan Tian
Hi, I just upgraded Hadoop from 0.18.1 to 0.19.0 following the instructions on http://wiki.apache.org/hadoop/Hadoop_Upgrade. After the upgrade, I ran fsck and everything seems fine. All the files can be listed in HDFS and the sizes are also correct. But when a MapReduce job tries to read the files as i

Re: Mapred job parallelism

2009-01-26 Thread Aaron Kimball
Indeed, you will need to enable the Fair Scheduler or Capacity Scheduler (which are both in 0.19) to do this. mapred.map.tasks is more a hint than anything else -- if you have more files to map than you set this value to, it will use more tasks than you configured the job to use. The newer schedulers w

Re: Mapred job parallelism

2009-01-26 Thread jason hadoop
I believe that the scheduler code in 0.19.0 has a framework for this, but I haven't dug into it in detail yet. http://hadoop.apache.org/core/docs/r0.19.0/capacity_scheduler.html From what I gather, you would set up 2 queues, each with guaranteed access to 1/2 of the cluster. Then you submit your jo
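A sketch of the submission side under that setup; the queue names "q1"/"q2" are hypothetical and must match whatever queues the cluster's capacity-scheduler configuration declares:

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;

    public class SubmitToQueue {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SubmitToQueue.class);
        conf.setJobName("job-for-q1");
        conf.set("mapred.job.queue.name", "q1"); // the second job would say "q2"
        // ... set mapper, reducer, and input/output paths here ...
        JobClient jc = new JobClient(conf);
        RunningJob job = jc.submitJob(conf);     // non-blocking, so both jobs
        System.out.println("Submitted " + job.getID()); // can run side by side
      }
    }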

Mapred job parallelism

2009-01-26 Thread Sagar Naik
Hi Guys, I was trying to set up a cluster so that two jobs can run simultaneously. The conf: number of nodes: 4 (say); mapred.tasktracker.map.tasks.maximum=2; and in the JobClient, mapred.map.tasks=4 (# of nodes). I also have a condition that each job should have only one map-task per node

Re: Netbeans/Eclipse plugin

2009-01-26 Thread Aaron Kimball
The Eclipse plugin (which, btw, is now part of Hadoop core in src/contrib/) currently is inoperable. The DFS viewer works, but the job submission code is broken. - Aaron On Sun, Jan 25, 2009 at 9:07 PM, Amit k. Saha wrote: > On Sun, Jan 25, 2009 at 9:32 PM, Edward Capriolo > wrote: > > On Sun,

Re: HDFS - millions of files in one directory?

2009-01-26 Thread Mark Kerzner
Jason, this is awesome, thank you. By the way, is there a book or manual with "best practices"? On Mon, Jan 26, 2009 at 3:13 PM, jason hadoop wrote: > Sequence files rock, and you can use the > bin/hadoop dfs -text FILENAME command-line tool to get a toString-level > unpacking of the sequenc

Re: What happens in HDFS DataNode recovery?

2009-01-26 Thread Aaron Kimball
Also, see the balancer tool that comes with Hadoop. This background process should be run periodically (every week or so?) to make sure that data's evenly distributed. http://hadoop.apache.org/core/docs/r0.19.0/hdfs_user_guide.html#Rebalancer - Aaron On Sat, Jan 24, 2009 at 7:40 PM, jason hadoop

Re: HDFS - millions of files in one directory?

2009-01-26 Thread jason hadoop
Sequence files rock, and you can use the bin/hadoop dfs -text FILENAME command-line tool to get a toString-level unpacking of the sequence file key/value pairs. If you provide your own key or value classes, you will need to implement a toString method to get some use out of this. Also, your cla
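For instance, a custom value class along these lines (the fields are hypothetical) is what makes the -text output readable:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    public class DocRecord implements Writable {
      private long id;
      private String body;

      public void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeUTF(body);
      }

      public void readFields(DataInput in) throws IOException {
        id = in.readLong();
        body = in.readUTF();
      }

      // This is the string that bin/hadoop dfs -text prints for each value.
      public String toString() {
        return id + "\t" + body;
      }
    }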

Re: HDFS - millions of files in one directory?

2009-01-26 Thread Mark Kerzner
Thank you, Doug, then all is clear in my head. Mark On Mon, Jan 26, 2009 at 3:05 PM, Doug Cutting wrote: > Mark Kerzner wrote: > >> Okay, I am convinced. I only noticed that Doug, the originator, was not >> happy about it - but in open source one has to give up control sometimes. >> > > I think

Re: HDFS - millions of files in one directory?

2009-01-26 Thread Doug Cutting
Mark Kerzner wrote: Okay, I am convinced. I only noticed that Doug, the originator, was not happy about it - but in open source one has to give up control sometimes. I think perhaps you misunderstood my remarks. My point was that, if you looked to Nutch's Content class for an example, it is,

Re: HDFS - millions of files in one directory?

2009-01-26 Thread Mark Kerzner
Okay, I am convinced. I only noticed that Doug, the originator, was not happy about it - but in open source one has to give up control sometimes. Thank you, Mark On Mon, Jan 26, 2009 at 2:36 PM, Andy Liu wrote: > SequenceFile supports transparent block-level compression out of the box, > so > yo

Re: HDFS - millions of files in one directory?

2009-01-26 Thread Andy Liu
SequenceFile supports transparent block-level compression out of the box, so you don't have to compress data in your code. Most of the time, compression not only saves disk space but improves performance, because there's less data to write. Andy On Mon, Jan 26, 2009 at 12:35 PM, Mark Kerzner wrote:
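A sketch of the writing side using the 0.19-era createWriter overload; the path and key/value types are illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class WriteCompressed {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, new Path("/data/files.seq"),
            Text.class, BytesWritable.class,
            SequenceFile.CompressionType.BLOCK); // transparent block compression
        try {
          byte[] contents = "file contents here".getBytes();
          writer.append(new Text("doc-0001"), new BytesWritable(contents));
        } finally {
          writer.close();
        }
      }
    }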

Re: Hadoop 0.19 over OS X : dfs error

2009-01-26 Thread nitesh bhatia
Well, it's strange: although I changed the default Java environment to 64-bit Java 6, my /Library/Java/Home was still pointing to Java 5. So in conf/hadoop-env.sh I changed JAVA_HOME to the actual path, i.e. /System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home. It's working now. O

Re: Hadoop 0.19 over OS X : dfs error

2009-01-26 Thread Raghu Angadi
nitesh bhatia wrote: Thanks. It worked. :) In hadoop-env.sh it's required to write the exact path to the Java framework. I changed it to export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home and it started. In Hadoop 0.18.2, export JAVA_HOME=/Library/Java/Home is working fine.

Re: Zeroconf for hadoop

2009-01-26 Thread Raghu Angadi
nitesh bhatia wrote: Hi, Apple provides an open-source discovery service called Bonjour (Zeroconf). Is it possible to integrate Zeroconf with Hadoop so that discovery of nodes becomes automatic? Presently, for setting up a multi-node cluster we need to add IPs manually. Integrating it with Bonjour can ma

Re: Zeroconf for hadoop

2009-01-26 Thread Edward Capriolo
Zeroconf is more focused on simplicity than security. One of the original problems that may have been fixed is that any program can announce any service, e.g. my laptop can announce that it is the DNS for google.com etc. I want to mention a related topic to the list. People are approaching the auto-

Re: Zeroconf for hadoop

2009-01-26 Thread nitesh bhatia
For a closed, uniform system (Yahoo, Google), this can work best. It can provide a plug-and-play type of system; through this we can change clusters into dynamic grids. But I am not sure of the outcome so far; I am reading the documentation. --nitesh On Mon, Jan 26, 2009 at 1:59 PM, Allen Wittenauer wro

Re: Zeroconf for hadoop

2009-01-26 Thread Raghu Angadi
Nitay wrote: Why not use the distributed coordination service ZooKeeper? When nodes come up they write some ephemeral file in a known ZooKeeper directory, and anyone who's interested, e.g. the NameNode, can put a watch on the directory and get notified when new children come up. NameNode does not do

Re: Zeroconf for hadoop

2009-01-26 Thread Nitay
Why not use the distributed coordination service ZooKeeper? When nodes come up they write some ephemeral file in a known ZooKeeper directory, and anyone who's interested, e.g. the NameNode, can put a watch on the directory and get notified when new children come up. On Mon, Jan 26, 2009 at 10:59 AM, Al
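A sketch of that pattern with the ZooKeeper Java client; the connect string and the path /hadoop/nodes are hypothetical, and the parent znode must already exist as a persistent node:

    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class NodeDiscovery implements Watcher {
      private final ZooKeeper zk;

      public NodeDiscovery(String connectString) throws Exception {
        zk = new ZooKeeper(connectString, 30000, this);
      }

      // A starting node announces itself; the znode vanishes automatically
      // when the node's ZooKeeper session dies.
      public void announce(String hostname) throws Exception {
        zk.create("/hadoop/nodes/" + hostname, new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
      }

      // An interested party lists the live nodes and leaves a watch behind.
      public List<String> liveNodes() throws Exception {
        return zk.getChildren("/hadoop/nodes", true);
      }

      // Watch callback; fires (among other events) when the children of
      // /hadoop/nodes change, at which point we would re-read and re-watch.
      public void process(WatchedEvent event) {
        System.out.println("Membership changed: " + event.getPath());
      }
    }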

Re: HDFS - millions of files in one directory?

2009-01-26 Thread Raghu Angadi
Mark Kerzner wrote: Raghu, if I write all files only once, is the cost the same in one directory, or do I need to find the optimal directory size and, when full, start another "bucket"? If you write only once, then writing won't be much of an issue. You can write them in lexical order to help wit

Re: Zeroconf for hadoop

2009-01-26 Thread Allen Wittenauer
On 1/25/09 8:45 AM, "nitesh bhatia" wrote: > Apple provides an open-source discovery service called Bonjour (Zeroconf). Is it > possible to integrate Zeroconf with Hadoop so that discovery of nodes becomes > automatic? Presently for setting up a multi-node cluster we need to add IPs > manually. Integ

Re: HDFS - millions of files in one directory?

2009-01-26 Thread jason hadoop
We like compression if the data is readily compressible and large, as it saves on I/O time. On Mon, Jan 26, 2009 at 9:35 AM, Mark Kerzner wrote: > Doug, > SequenceFile looks like a perfect candidate to use in my project, but are > you saying that I'd better use uncompressed data if I am not interes

setNumTasksToExecutePerJvm and Configure

2009-01-26 Thread Saptarshi Guha
Hello, Suppose I set setNumTasksToExecutePerJvm to -1. Then, the same JVM may run several tasks consecutively. 1) Will the configure method (if present) be run for every task, or only for the first task that the JVM runs? 2) Similarly, will the close method (if present) be run for the las
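One way to answer this empirically is a probe mapper; a sketch against the old 0.19 API, with made-up class and field names. Static state survives JVM reuse, so if configure() runs once per task the static counter will climb past 1 inside a reused JVM:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class ReuseProbe extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {

      private static int configureCalls = 0; // static: survives JVM reuse

      public void configure(JobConf job) {
        configureCalls++; // counts configure() calls within this one JVM
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, LongWritable> out, Reporter reporter)
          throws IOException {
        // Emitting the counter makes the per-JVM call count visible in output.
        out.collect(value, new LongWritable(configureCalls));
      }

      public void close() throws IOException {
        // A log line here would likewise reveal how often close() runs.
      }
    }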

Re: HDFS - millions of files in one directory?

2009-01-26 Thread Mark Kerzner
Doug, SequenceFile looks like a perfect candidate to use in my project, but are you saying that I'd better use uncompressed data if I am not interested in saving disk space? Thank you, Mark On Mon, Jan 26, 2009 at 11:30 AM, Doug Cutting wrote: > Philip (flip) Kromer wrote: > >> Heritrix

Re: HDFS - millions of files in one directory?

2009-01-26 Thread Doug Cutting
Philip (flip) Kromer wrote: Heritrix, Nutch, and others use the ARC file format http://www.archive.org/web/researcher/ArcFileFormat.php http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml Nutch does not use A

Re: HDFS - millions of files in one directory?

2009-01-26 Thread Steve Loughran
Philip (flip) Kromer wrote: I ran into this problem, hard, and I can vouch that it is not a Windows-only problem. ReiserFS, ext3, and OS X's HFS+ become cripplingly slow with more than a few hundred thousand files in the same directory. (The operation to correct this mistake took a week to run.) T