Re: Hadoop0.20 - Class Not Found exception

2009-06-29 Thread Steve Loughran
Amandeep Khurana wrote: I'm getting the following error while starting a MR job: Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver at org.apache.hadoop.mapred.lib.db.DBInputFormat.configure(DBInputFormat.java:297) ... 21 more Caused
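
The ClassNotFoundException above almost always means the Oracle JDBC jar is not on the task classpath (it is not bundled with Hadoop, so it must ship with the job, e.g. in the job jar's lib/ directory). A minimal client-side sketch, hypothetical code using the class name from the stack trace, for checking whether the driver would resolve before submitting:

```java
// Hypothetical pre-flight check: verify the JDBC driver class named in the
// job configuration is actually on the classpath before submitting, so the
// failure surfaces in the client rather than in DBInputFormat.configure().
public class DriverCheck {
    public static boolean driverAvailable(String driverClass) {
        try {
            Class.forName(driverClass);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

If `driverAvailable("oracle.jdbc.driver.OracleDriver")` returns false on the client, the same lookup will fail inside the tasks too.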

Re: FYI, Large-scale graph computing at Google

2009-06-29 Thread Steve Loughran
Edward J. Yoon wrote: I just made a wiki page -- http://wiki.apache.org/hadoop/Hambrug -- Let's discuss the graph computing framework named Hambrug. ok, first Q, why the Hambrug. To me that's just Hamburg typed wrong, which is going to cause lots of confusion. What about something mor

Re: FYI, Large-scale graph computing at Google

2009-06-29 Thread Steve Loughran
Patterson, Josh wrote: Steve, I'm a little lost here; Is this a replacement for M/R or is it some new code that sits on top of M/R that runs an iteration over some sort of graph's vertices? My quick scan of Google's article didn't seem to yield a distinction. Either way, I'd say for our data that

Re: FYI, Large-scale graph computing at Google

2009-06-25 Thread Steve Loughran
30015 == Exiting project "citerank" == BUILD SUCCESSFUL - at 25/06/09 17:09 Total time: 9 minutes 1 second -- Steve Loughran http://www.10

Re: FYI, Large-scale graph computing at Google

2009-06-25 Thread Steve Loughran
Edward J. Yoon wrote: What do you think about another new computation framework on HDFS? On Mon, Jun 22, 2009 at 3:50 PM, Edward J. Yoon wrote: http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html -- It sounds like Pregel seems, a computing framework based on d

Re: Hadoop 0.20.0, xml parsing related error

2009-06-25 Thread Steve Loughran
Ram Kulbak wrote: Hi, The exception is a result of having xerces in the classpath. To resolve, make sure you are using Java 6 and set the following system property: -Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl This can also be re

Re: "Too many open files" error, which gets resolved after some time

2009-06-22 Thread Steve Loughran
Stas Oskin wrote: Hi. So what would be the recommended approach to pre-0.20.x series? To ensure each file is used only by one thread, and then it is safe to close the handle in that thread? Regards. good question - I'm not sure. For anything you get with FileSystem.get(), it's now dangerous to

Re: "Too many open files" error, which gets resolved after some time

2009-06-22 Thread Steve Loughran
Raghu Angadi wrote: Is this before 0.20.0? Assuming you have closed these streams, it is mostly https://issues.apache.org/jira/browse/HADOOP-4346 It is the JDK internal implementation that depends on GC to free up its cache of selectors. HADOOP-4346 avoids this by using hadoop's own cache.

Re: "Too many open files" error, which gets resolved after some time

2009-06-22 Thread Steve Loughran
Scott Carey wrote: Furthermore, if for some reason it is required to dispose of any objects after others are GC'd, weak references and a weak reference queue will perform significantly better in throughput and latency - orders of magnitude better - than finalizers. Good point. I would mak
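
A minimal sketch of the weak-reference pattern Scott describes, with illustrative names only (the resource id and the drain loop are placeholders, not Hadoop code): register each reference with a ReferenceQueue, then poll the queue to learn when a referent has been collected and release its resource there, instead of relying on a finalizer.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Instead of a finalizer, register a WeakReference with a ReferenceQueue and
// poll the queue to find out when the referent has been collected, then
// release the associated resource deterministically.
public class WeakCleanup {
    static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();

    // A reference that also carries the cleanup state (e.g. a descriptor id).
    static class TrackedRef extends WeakReference<Object> {
        final int resourceId;
        TrackedRef(Object referent, int resourceId) {
            super(referent, QUEUE);
            this.resourceId = resourceId;
        }
    }

    // Drain the queue, "closing" whatever the dead referents owned.
    static int drain() {
        int closed = 0;
        TrackedRef ref;
        while ((ref = (TrackedRef) QUEUE.poll()) != null) {
            // close(ref.resourceId) would go here
            closed++;
        }
        return closed;
    }
}
```

drain() can be called periodically or opportunistically (e.g. whenever a new resource is allocated), which is what gives this pattern its throughput edge over finalization.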

Re: "Too many open files" error, which gets resolved after some time

2009-06-22 Thread Steve Loughran
jason hadoop wrote: Yes. Otherwise the file descriptors will flow away like water. I also strongly suggest having at least 64k file descriptors as the open file limit. On Sun, Jun 21, 2009 at 12:43 PM, Stas Oskin wrote: Hi. Thanks for the advice. So you advise explicitly closing each and eve
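
The "close each stream explicitly" advice above can be sketched as follows (names are illustrative; in the 2009-era Java of this thread the idiom is a finally block, since try-with-resources did not exist yet):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Write one byte to a temp file and read it back, closing each stream in a
// finally block so the descriptor is returned to the OS immediately rather
// than whenever the GC happens to finalize the stream object.
public class ExplicitClose {
    public static int roundTrip(int b) {
        try {
            File f = File.createTempFile("fd-demo", ".bin");
            f.deleteOnExit();
            FileOutputStream out = new FileOutputStream(f);
            try {
                out.write(b);
            } finally {
                out.close();   // runs even if write() throws
            }
            FileInputStream in = new FileInputStream(f);
            try {
                return in.read();
            } finally {
                in.close();
            }
        } catch (IOException e) {
            return -1;         // sentinel for I/O failure in this sketch
        }
    }
}
```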

Re: Name Node HA (HADOOP-4539)

2009-06-22 Thread Steve Loughran
Andrew Wharton wrote: https://issues.apache.org/jira/browse/HADOOP-4539 I am curious about the state of this fix. It is listed as "Incompatible", but is resolved and committed (according to the comments). Is the backup name node going to make it into 0.21? Will it remove the SPOF for HDFS? And i

Re: Hadoop Eclipse Plugin

2009-06-18 Thread Steve Loughran
s are interfering If everything works, the problem is in the eclipse plugin (which I don't use, and cannot assist with) -- Steve Loughran http://www.1060.org/blogxter/publish/5 Author: Ant in Action http://antbook.org/

Re: Running Hadoop/Hbase in a OSGi container

2009-06-12 Thread Steve Loughran
Ninad Raut wrote: OSGi provides navigability to your components and creates a life cycle for each of those components, viz. install, start, stop, undeploy, etc. This is the reason why we are thinking of creating components using OSGi. The problem we are facing is our components using mapreduce and

Re: Multiple NIC Cards

2009-06-10 Thread Steve Loughran
John Martyniak wrote: Does hadoop "cache" the server names anywhere? Because I changed to using DNS for name resolution, but when I go to the nodes view, it is trying to view with the old name. And I changed the hadoop-site.xml file so that it no longer has any of those values. in SVN hea

Re: Multiple NIC Cards

2009-06-09 Thread Steve Loughran
John Martyniak wrote: When I run either of those on either of the two machines, it is trying to resolve against the DNS servers configured for the external addresses for the box. Here is the result Server:xxx.xxx.xxx.69 Address:xxx.xxx.xxx.69#53 OK. in an ideal world, each NIC ha

Re: Multiple NIC Cards

2009-06-09 Thread Steve Loughran
John Martyniak wrote: I am running Mac OS X. So en0 points to the external address and en1 points to the internal address on both machines. Here is the internal results from duey: en1: flags=8963 mtu 1500 inet6 fe80::21e:52ff:fef4:65%en1 prefixlen 64 scopeid 0x5 inet 192.168.1.102 n

Re: Multiple NIC Cards

2009-06-09 Thread Steve Loughran
John Martyniak wrote: My original names were huey-direct and duey-direct, both names in the /etc/hosts file on both machines. Are nn.internal and jt.internal special names? no, just examples on a multihost network when your external names could be something completely different. What does

Re: Multiple NIC Cards

2009-06-09 Thread Steve Loughran
John Martyniak wrote: David, For the Option #1. I just changed the names to the IP Addresses, and it still comes up as the external name and ip address in the log files, and on the job tracker screen. So option 1 is a no go. When I change the "dfs.datanode.dns.interface" values it doesn't

Re: Every time the mapping phase finishes I see this

2009-06-08 Thread Steve Loughran
Mayuran Yogarajah wrote: There are always a few 'Failed/Killed Task Attempts' and when I view the logs for these I see: - some that are empty, ie stdout/stderr/syslog logs are all blank - several that say: 2009-06-06 20:47:15,309 WARN org.apache.hadoop.mapred.TaskTracker: Error running child

Re: Hadoop scheduling question

2009-06-08 Thread Steve Loughran
Aaron Kimball wrote: Finally, there's a third scheduler called the Capacity scheduler. It's similar to the fair scheduler, in that it allows guarantees of minimum availability for different pools. I don't know how it apportions additional extra resources though -- this is the one I'm least famil

Re: Monitoring hadoop?

2009-06-07 Thread Steve Loughran
Matt Massie wrote: Anthony- The ganglia web site is at http://ganglia.info/ with documentation in a wiki at http://ganglia.wiki.sourceforge.net/. There is also a good wiki page at IBM as well http://www.ibm.com/developerworks/wikis/display/WikiPtype/ganglia . Ganglia packages are available

Re: Hadoop ReInitialization.

2009-06-03 Thread Steve Loughran
b wrote: But after formatting and starting DFS I need to wait some time (sleep 60) before putting data into HDFS. Else I will receive "NotReplicatedYetException". that means the namenode is up but there aren't enough workers yet.

Re: question about when shuffle/sort start working

2009-06-01 Thread Steve Loughran
Todd Lipcon wrote: Hi Jianmin, This is not (currently) supported by Hadoop (or Google's MapReduce either afaik). What you're looking for sounds like something more like Microsoft's Dryad. One thing that is supported in versions of Hadoop after 0.19 is JVM reuse. If you enable this feature, task

Re: org.apache.hadoop.ipc.client : trying connect to server failed

2009-05-29 Thread Steve Loughran
ashish pareek wrote: Yes I am able to ping and ssh between the two virtual machines and I have even set the IP addresses of both virtual machines in their respective /etc/hosts files ... thanks for the reply .. if you can suggest some other thing which I could have missed or any remed

Re: hadoop hardware configuration

2009-05-28 Thread Steve Loughran
Patrick Angeles wrote: Sorry for cross-posting, I realized I sent the following to the hbase list when it's really more a Hadoop question. This is an interesting question. Obviously as an HP employee you must assume that I'm biased when I say HP DL160 servers are good value for the workers,

Re: ssh issues

2009-05-26 Thread Steve Loughran
hmar...@umbc.edu wrote: Steve, Security through obscurity is always a good practice from a development standpoint and one of the reasons why tricking you out is an easy task. :) My most recent presentation on HDFS clusters is now online, notice how it doesn't gloss over the security: http://

Re: ssh issues

2009-05-22 Thread Steve Loughran
Pankil Doshi wrote: Well, I set up ssh with passphrases, as the system into which I need to log in requires ssh with passphrases, and those systems have to be part of my cluster. And so I need a way where I can specify -i path/to key/ and the passphrase to hadoop beforehand. Pankil Well, are trying

Re: Username in Hadoop cluster

2009-05-21 Thread Steve Loughran
Pankil Doshi wrote: Hello everyone, Till now I was using same username on all my hadoop cluster machines. But now I am building my new cluster and face a situation in which I have different usernames for different machines. So what changes will have to make in configuring hadoop. using same use

Re: Optimal Filesystem (and Settings) for HDFS

2009-05-20 Thread Steve Loughran
Bryan Duxbury wrote: We use XFS for our data drives, and we've had somewhat mixed results. Thanks for that. I've just created a wiki page to put some of these notes up -extensions and some hard data would be welcome http://wiki.apache.org/hadoop/DiskSetup One problem we have for hard data

Re: Suspend or scale back hadoop instance

2009-05-19 Thread Steve Loughran
John Clarke wrote: Hi, I am working on a project that is suited to Hadoop and so want to create a small cluster (only 5 machines!) on our servers. The servers are however used during the day and (mostly) idle at night. So, I want Hadoop to run at full throttle at night and either scale back or

Re: Is there any performance issue with Jrockit JVM for Hadoop

2009-05-18 Thread Steve Loughran
Tom White wrote: On Mon, May 18, 2009 at 11:44 AM, Steve Loughran wrote: Grace wrote: To follow up this question, I have also asked help on Jrockit forum. They kindly offered some useful and detailed suggestions according to the JRA results. After updating the option list, the performance did

Re: Is there any performance issue with Jrockit JVM for Hadoop

2009-05-18 Thread Steve Loughran
Grace wrote: To follow up this question, I have also asked for help on the JRockit forum. They kindly offered some useful and detailed suggestions according to the JRA results. After updating the option list, the performance did become better to some extent. But it is still not comparable with the Sun JV

Re: Beware sun's jvm version 1.6.0_05-b13 on linux

2009-05-18 Thread Steve Loughran
Allen Wittenauer wrote: On 5/15/09 11:38 AM, "Owen O'Malley" wrote: We have observed that the default jvm on RedHat 5 I'm sure some people are scratching their heads at this. The default JVM on at least RHEL5u0/1 is a GCJ-based 1.4, clearly incapable of running Hadoop. We [and, r

Re: public IP for datanode on EC2

2009-05-15 Thread Steve Loughran
Tom White wrote: Hi Joydeep, The problem you are hitting may be because port 50001 isn't open, whereas from within the cluster any node may talk to any other node (because the security groups are set up to do this). However I'm not sure this is a good approach. Configuring Hadoop to use public

Re: How to do load control of MapReduce

2009-05-12 Thread Steve Loughran
Stefan Will wrote: Yes, I think the JVM uses way more memory than just its heap. Now some of it might be just reserved memory, but not actually used (not sure how to tell the difference). There are also things like thread stacks, jit compiler cache, direct nio byte buffers etc. that take up proce

Re: How to do load control of MapReduce

2009-05-12 Thread Steve Loughran
zsongbo wrote: Hi Stefan, Yes, the 'nice' cannot resolve this problem. Now, in my cluster, there are 8GB of RAM. My java heap configuration is: HDFS DataNode : 1GB HBase-RegionServer: 1.5GB MR-TaskTracker: 1GB MR-child: 512MB (max child task is 6, 4 map task + 2 reduce task) But the memory u

Re: Huge DataNode Virtual Memory Usage

2009-05-12 Thread Steve Loughran
Stefan Will wrote: Raghu, I don't actually have exact numbers from jmap, although I do remember that jmap -histo reported something less than 256MB for this process (before I restarted it). I just looked at another DFS process that is currently running and has a VM size of 1.5GB (~600 resident)

Re: Winning a sixty second dash with a yellow elephant

2009-05-12 Thread Steve Loughran
Arun C Murthy wrote: ... oh, and getting it to run a marathon too! http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html Owen & Arun Lovely. I will now stick up the pic of you getting the first results in on your laptop at apachecon

Re: Re-Addressing a cluster

2009-05-11 Thread Steve Loughran
jason hadoop wrote: Now that I think about it, the reverse lookups in my clusters work. and you have made sure that IPv6 is turned off, right?

Re: datanode replication

2009-05-11 Thread Steve Loughran
Jeff Hammerbacher wrote: Hey Vishal, Check out the chooseTarget() method(s) of ReplicationTargetChooser.java in the org.apache.hadoop.hdfs.server.namenode package: http://svn.apache.org/viewvc/hadoop/core/trunk/src/hdfs/org/apache/hadoop/hdfs/server/namenode/ReplicationTargetChooser.java?view=ma

Re: Re-Addressing a cluster

2009-05-11 Thread Steve Loughran
jason hadoop wrote: You should be able to relocate the cluster's IP space by stopping the cluster, modifying the configuration files, resetting the dns and starting the cluster. Be best to verify connectivity with the new IP addresses before starting the cluster. to the best of my knowledge the

Re: Is there any performance issue with Jrockit JVM for Hadoop

2009-05-08 Thread Steve Loughran
Grace wrote: Thanks all for your replying. I have run several times with different Java options for Map/Reduce tasks. However there is no much difference. Following is the example of my test setting: Test A: -Xmx1024m -server -XXlazyUnlocking -XlargePages -XgcPrio:deterministic -XXallocPrefetch

Re: Is there any performance issue with Jrockit JVM for Hadoop

2009-05-07 Thread Steve Loughran
Chris Collins wrote: a couple of years back we did a lot of experimentation between Sun's VM and JRockit. We had initially assumed that JRockit was going to scream since that's what the press was saying. In short, what we discovered was that certain JDK library usage was a little bit faster w

Re: move tasks to another machine on the fly

2009-05-06 Thread Steve Loughran
Tom White wrote: Hi David, The MapReduce framework will attempt to rerun failed tasks automatically. However, if a task is running out of memory on one machine, it's likely to run out of memory on another, isn't it? Have a look at the mapred.child.java.opts configuration property for the amount

Re: What do we call Hadoop+HBase+Lucene+Zookeeper+etc....

2009-05-06 Thread Steve Loughran
Edward Capriolo wrote: 'cloud computing' is a hot term. According to the definition provided by wikipedia http://en.wikipedia.org/wiki/Cloud_computing, Hadoop+HBase+Lucene+Zookeeper, fits some of the criteria but not well. Hadoop is scalable, with HOD it is dynamically scalable. I do not think

Re: What do we call Hadoop+HBase+Lucene+Zookeeper+etc....

2009-05-05 Thread Steve Loughran
Bradford Stephens wrote: Hey all, I'm going to be speaking at OSCON about my company's experiences with Hadoop and Friends, but I'm having a hard time coming up with a name for the entire software ecosystem. I'm thinking of calling it the "Apache CloudStack". Does this sound legit to you all? :)

Re: I need help

2009-04-30 Thread Steve Loughran
that? log4j.appender.console.target=System.err -- Steve Loughran http://www.1060.org/blogxter/publish/5 Author: Ant in Action http://antbook.org/

Re: Can i make a node just an HDFS client to put/get data into hadoop

2009-04-29 Thread Steve Loughran
Usman Waheed wrote: Hi All, Is it possible to make a node just a hadoop client so that it can put/get files into HDFS but not act as a namenode or datanode? I already have a master node and 3 datanodes but need to execute puts/gets into hadoop in parallel using more than just one machine other

Re: programming java ee and hadoop at the same time

2009-04-29 Thread Steve Loughran
Bill Habermaas wrote: George, I haven't used the Hadoop perspective in Eclipse so I can't help with that specifically but map/reduce is a batch process (and can be long running). In my experience, I've written servlets that write to HDFS and then have a background process perform the map/reduce

Re: I need help

2009-04-28 Thread Steve Loughran
Razen Al Harbi wrote: Hi all, I am writing an application in which I create a forked process to execute a specific Map/Reduce job. The problem is that when I try to read the output stream of the forked process I get nothing and when I execute the same job manually it starts printing the outpu

Re: Storing data-node content to other machine

2009-04-28 Thread Steve Loughran
Vishal Ghawate wrote: Hi, I want to store the contents of all the client machines (datanodes) of the hadoop cluster on a centralized machine with high storage capacity, so that the tasktracker will be on the client machine but the contents are stored on the centralized machine. Can anybody he

Re: Processing High CPU & Memory intensive tasks on Hadoop - Architecture question

2009-04-28 Thread Steve Loughran
Aaron Kimball wrote: I'm not aware of any documentation about this particular use case for Hadoop. I think your best bet is to look into the JNI documentation about loading native libraries, and go from there. - Aaron You could also try 1. Starting the main processing app as a process on the m

Re: No route to host prevents from storing files to HDFS

2009-04-23 Thread Steve Loughran
Stas Oskin wrote: Hi. 2009/4/23 Matt Massie Just for clarity: are you using any type of virtualization (e.g. vmware, xen) or just running the DataNode java process on the same machine? What is "fs.default.name" set to in your hadoop-site.xml? This machine has OpenVZ installed indeed, bu

Re: No route to host prevents from storing files to HDFS

2009-04-22 Thread Steve Loughran
Stas Oskin wrote: Hi again. Other tools, like balancer, or the web browsing from namenode, don't work as well. This because other nodes complain about not reaching the offending node as well. I even tried netcat'ing the IP/port from another node - and it successfully connected. Any advice on t

Re: Error reading task output

2009-04-21 Thread Steve Loughran
Aaron Kimball wrote: Cam, This isn't Hadoop-specific, it's how Linux treats its network configuration. If you look at /etc/host.conf, you'll probably see a line that says "order hosts, bind" -- this is telling Linux's DNS resolution library to first read your /etc/hosts file, then check an exter

Re: getting DiskErrorException during map

2009-04-21 Thread Steve Loughran
Jim Twensky wrote: Yes, here is how it looks: hadoop.tmp.dir /scratch/local/jim/hadoop-${user.name} so I don't know why it still writes to /tmp. As a temporary workaround, I created a symbolic link from /tmp/hadoop-jim to /scratch/... and it works fine now but if you t

Re: Error reading task output

2009-04-21 Thread Steve Loughran
Cam Macdonell wrote: Well, for future googlers, I'll answer my own post. Watch out for the hostname at the end of "localhost" lines on slaves. One of my slaves was registering itself as "localhost.localdomain" with the jobtracker. Is there a way that Hadoop could be made to not be so depe

Re: fyi: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks

2009-04-21 Thread Steve Loughran
Andrew Newman wrote: They are comparing an indexed system with one that isn't. Why is Hadoop faster at loading than the others? Surely no one would be surprised that it would be slower - I'm surprised at how well Hadoop does. Who wants to write a paper for next year, "grep vs reverse index"? 2

Re: How many people is using Hadoop Streaming ?

2009-04-21 Thread Steve Loughran
Tim Wintle wrote: On Fri, 2009-04-03 at 09:42 -0700, Ricky Ho wrote: 1) I can pick the language that offers a different programming paradigm (e.g. I may choose a functional language, or logic programming if they suit the problem better). In fact, I can even choose Erlang at the map() and Prolog

Re: RPM spec file for 0.19.1

2009-04-21 Thread Steve Loughran
Ian Soboroff wrote: Steve Loughran writes: I think from your perspective it makes sense as it stops anyone getting itchy fingers and doing their own RPMs. Um, what's wrong with that? It's really hard to do good RPM spec files. If cloudera are willing to pay Matt to do it, not

Re: Amazon Elastic MapReduce

2009-04-03 Thread Steve Loughran
Brian Bockelman wrote: On Apr 2, 2009, at 3:13 AM, zhang jianfeng wrote: seems like I should pay additional money, so why not configure a hadoop cluster on EC2 myself? This has already been automated using scripts. Not everyone has a support team or an operations team or enough tim

Re: Using HDFS to serve www requests

2009-04-03 Thread Steve Loughran
were accessible under an NIO front end, then applications written for the NIO APIs would work with the supported filesystems, with no need to code specifically for hadoop's not-yet-stable APIs Steve Loughran wrote: Edward Capriolo wrote: It is a little more natural to connect to HDFS f

Re: RPM spec file for 0.19.1

2009-04-03 Thread Steve Loughran
Christophe Bisciglia wrote: Hey Ian, we are totally fine with this - the only reason we didn't contribute the SPEC file is that it is the output of our internal build system, and we don't have the bandwidth to properly maintain multiple RPMs. That said, we chatted about this a bit today, and wer

Re: RPM spec file for 0.19.1

2009-04-03 Thread Steve Loughran
Ian Soboroff wrote: I created a JIRA (https://issues.apache.org/jira/browse/HADOOP-5615) with a spec file for building a 0.19.1 RPM. I like the idea of Cloudera's RPM file very much. In particular, it has nifty /etc/init.d scripts and RPM is nice for managing updates. However, it's for an older

Re: Typical hardware configurations

2009-03-31 Thread Steve Loughran
Scott Carey wrote: On 3/30/09 4:41 AM, "Steve Loughran" wrote: Ryan Rawson wrote: You should also be getting 64-bit systems and running a 64 bit distro on it and a jvm that has -d64 available. For the namenode yes. For the others, you will take a fairly big memory hit (1.5X o

Re: Typical hardware configurations

2009-03-30 Thread Steve Loughran
Ryan Rawson wrote: You should also be getting 64-bit systems and running a 64 bit distro on it and a jvm that has -d64 available. For the namenode yes. For the others, you will take a fairly big memory hit (1.5X object size) due to the longer pointers. JRockit has special compressed pointers

Re: Using HDFS to serve www requests

2009-03-30 Thread Steve Loughran
Edward Capriolo wrote: It is a little more natural to connect to HDFS from apache tomcat. This will allow you to skip the FUSE mounts and just use the HDFS-API. I have modified this code to run inside tomcat. http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample I will not testify to how well

Re: virtualization with hadoop

2009-03-30 Thread Steve Loughran
Oliver Fischer wrote: Hello Vishal, I did the same some weeks ago. The most important fact is that it works. But it is horribly slow if you do not have enough RAM and multiple disks, since all I/O operations go to the same disk. they may go to separate disks underneath, but performance is bad as

Re: JNI and calling Hadoop jar files

2009-03-30 Thread Steve Loughran
jason hadoop wrote: The exception reference to *org.apache.hadoop.hdfs.DistributedFileSystem*, implies strongly that a hadoop-default.xml file, or at least a job.xml file is present. Since hadoop-default.xml is bundled into the hadoop-0.X.Y-core.jar, the assumption is that the core jar is availa

Re: Coordination between Mapper tasks

2009-03-20 Thread Steve Loughran
Stuart White wrote: The nodes in my cluster have 4 cores & 4 GB RAM. So, I've set mapred.tasktracker.map.tasks.maximum to 3 (leaving 1 core for "breathing room"). My process requires a large dictionary of terms (~ 2GB when loaded into RAM). The terms are looked-up very frequently, so I want th
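
One common way to hold a single copy of such a dictionary per JVM (which pays off when combined with the task-JVM reuse discussed elsewhere in this archive) is a lazily initialized static map. A sketch with a placeholder loader, not the poster's actual code:

```java
import java.util.HashMap;
import java.util.Map;

// A lazily initialized, read-only map shared by every task that runs in the
// same JVM, so the ~2GB dictionary is loaded once rather than per task.
// Double-checked locking on a volatile field keeps the fast path cheap.
public class SharedDictionary {
    private static volatile Map<String, Integer> dict;

    public static Map<String, Integer> get() {
        Map<String, Integer> d = dict;
        if (d == null) {
            synchronized (SharedDictionary.class) {
                d = dict;
                if (d == null) {
                    d = load();
                    dict = d;   // publish the fully built map
                }
            }
        }
        return d;
    }

    private static Map<String, Integer> load() {
        // In the real setting this would stream the term dictionary from
        // local disk or the distributed cache; here it is a placeholder.
        Map<String, Integer> m = new HashMap<>();
        m.put("example", 1);
        return m;
    }
}
```

Every caller after the first gets the same map instance, so memory cost stays at one copy per JVM regardless of how many tasks run in it.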

Re: using virtual slave machines

2009-03-12 Thread Steve Loughran
Karthikeyan V wrote: There is no specific procedure for configuring virtual machine slaves. Make sure the following things are done. I've used these as the beginning of a page on this http://wiki.apache.org/hadoop/VirtualCluster

Re: Extending ClusterMapReduceTestCase

2009-03-12 Thread Steve Loughran
jason hadoop wrote: I am having trouble reproducing this one. It happened in a very specific environment that pulled in an alternate sax parser. The bottom line is that jetty expects a parser with particular capabilities and if it doesn't get one, odd things happen. In a day or so I will have h

Re: Persistent HDFS On EC2

2009-03-12 Thread Steve Loughran
e sure that they are created correctly, as there is no direct migration of EBS to different availability zones. View EBS as renting space in SAN and it starts to make sense. -- Steve Loughran http://www.1060.org/blogxter/publish/5 Author: Ant in Action http://antbook.org/

Re: Persistent HDFS On EC2

2009-03-11 Thread Steve Loughran
Malcolm Matalka wrote: If this is not the correct place to ask Hadoop + EC2 questions please let me know. I am trying to get a handle on how to use Hadoop on EC2 before committing any money to it. My question is, how do I maintain a persistent HDFS between restarts of instances. Most of th

Re: Extending ClusterMapReduceTestCase

2009-03-11 Thread Steve Loughran
jason hadoop wrote: The other goofy thing is that the xml parser that is commonly first in the class path, validates xml in a way that is opposite to what jetty wants. What does ant -diagnostics say? It will list the XML parser at work This line in the preamble before the ClusterMapReduceTes

Re: DataNode gets 'stuck', ends up with two DataNode processes

2009-03-09 Thread Steve Loughran
knowledge on how to solve this problem. Thanks for any help! === Garhan Attebury Systems Administrator UNL Research Computing Facility 402-472-7761 === -- Steve Loughran http://www.1060.org/blogxter/publish/5 Author: Ant in Action http://antbook.org/

Re: master trying fetch data from slave using "localhost" hostname :)

2009-03-09 Thread Steve Loughran
pavelkolo...@gmail.com wrote: On Fri, 06 Mar 2009 14:41:57 -, jason hadoop wrote: I see that when the host name of the node is also on the localhost line in /etc/hosts I erased all records with "localhost" from all "/etc/hosts" files and all is fine now :) Thank you :) what does /et

Re: Running 0.19.2 branch in production before release

2009-03-05 Thread Steve Loughran
Aaron Kimball wrote: I recommend 0.18.3 for production use and avoid the 19 branch entirely. If your priority is stability, then stay a full minor version behind, not just a revision. Of course, if everyone stays that far behind, they don't get to find the bugs for other people. * If you pla

Re: [ANNOUNCE] Hadoop release 0.19.1 available

2009-03-03 Thread Steve Loughran
Mar 3, 2009 at 1:16 PM, Steve Loughran wrote: Aviad sela wrote: Nigel Thanks, I have extracted the new project. However, I am having problems building the project I am using Eclipse 3.4 and ant 1.7 I receive an error compiling core classes * compile-core-classes*: BUILD FAILED * D:\Work

Re: [ANNOUNCE] Hadoop release 0.19.1 available

2009-03-03 Thread Steve Loughran
\Hadoop\build.xml:302: java.lang.ExceptionInInitializerError * it points to the the webxml tag Try an ant -verbose and post the full log, we may be able to look at the problem more. Also, run an ant -diagnostics and include what it prints -- Steve Loughran http://www.1060.org

Re: How does NVidia GPU compare to Hadoop/MapReduce

2009-03-02 Thread Steve Loughran
Dan Zinngrabe wrote: On Fri, Feb 27, 2009 at 11:21 AM, Doug Cutting wrote: I think they're complementary. Hadoop's MapReduce lets you run computations on up to thousands of computers potentially processing petabytes of data. It gets data from the grid to your computation, reliably stores outp

Re: HDFS architecture based on GFS?

2009-02-27 Thread Steve Loughran
kang_min82 wrote: Hello Matei, Which Tasktracker did you mean here ? I don't understand that. In general we have many Tasktrackers and each of them runs on one separate Datanode. Why doesn't the JobTracker talk directly to the Namenode for a list of Datanodes and then performs the MapReduce t

Re: the question about the common pc?

2009-02-23 Thread Steve Loughran
Tim Wintle wrote: On Fri, 2009-02-20 at 13:07 +, Steve Loughran wrote: I've been doing MapReduce work over small in-memory datasets using Erlang, which works very well in such a context. I've got some (mainly python) scripts (that will probably be run with hadoop streaming

Re: How to use Hadoop API to submit job?

2009-02-20 Thread Steve Loughran
Wu Wei wrote: Hi, I used to submit Hadoop job with the utility RunJar.main() on hadoop 0.18. On hadoop 0.19, because the commandLineConfig of JobClient was null, I got a NullPointerException error when RunJar.main() calls GenericOptionsParser to get libJars (0.18 didn't do this call). I also

Re: the question about the common pc?

2009-02-20 Thread Steve Loughran
?? wrote: Actually, there's a widespread misunderstanding of this "Common PC". Common PC doesn't mean PCs which are used daily; it means the performance of each node can be measured by a common PC's computing power. As a matter of fact, we don't use Gb Ethernet for daily PCs' communication, we

Re: GenericOptionsParser warning

2009-02-20 Thread Steve Loughran
Rasit OZDAS wrote: Hi, There is a JIRA issue about this problem, if I understand it correctly: https://issues.apache.org/jira/browse/HADOOP-3743 Strangely, I searched all the source code, and this check exists in only 2 places: if (!(job.getBoolean("mapred.used.genericoptionsparser", fal

Re: GenericOptionsParser warning

2009-02-18 Thread Steve Loughran
Sandhya E wrote: Hi All I prepare my JobConf object in a java class, by calling various set apis in JobConf object. When I submit the jobconf object using JobClient.runJob(conf), I'm seeing the warning: "Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for t

Re: HADOOP-2536 supports Oracle too?

2009-02-17 Thread Steve Loughran
sandhiya wrote: Hi, I'm using PostgreSQL and the driver is not getting detected. How do you run it in the first place? I just typed bin/hadoop jar /root/sandy/netbeans/TableAccess/dist/TableAccess.jar at the terminal without the quotes. I didn't copy any files from my local drives into the Had

Re: HDFS architecture based on GFS?

2009-02-17 Thread Steve Loughran
Amr Awadallah wrote: I didn't understand the usage of "malicious" here, but any process using the HDFS API should first ask the NameNode where the Rasit, Matei is referring to the fact that a malicious piece of code can bypass the NameNode and connect to any data node directly, or probe all data nodes for

Re: datanode not being started

2009-02-17 Thread Steve Loughran
Sandy wrote: Since I last used this machine, Parallels Desktop was installed by the admin. I am currently suspecting that somehow this is interfering with the function of Hadoop (though Java_HOME still seems to be ok). Has anyone had any experience with this being a cause of interference? I

Re: stable version

2009-02-16 Thread Steve Loughran
g XML parser on the classpath -and yet refusing to add the four lines of code needed to handle this- then we are letting down the users On 2/13/09, Steve Loughran wrote: Anum Ali wrote: This only occurs in linux , in windows its fine. do a java -version for me, and an ant -diagnostics, st

Re: stable version

2009-02-13 Thread Steve Loughran
Anum Ali wrote: This only occurs in linux , in windows its fine. do a java -version for me, and an ant -diagnostics, stick both on the bugrep https://issues.apache.org/jira/browse/HADOOP-5254 It may be that XInclude only went live in java1.6u5; I'm running a JRockit JVM which predates that

Re: stable version

2009-02-13 Thread Steve Loughran
Anum Ali wrote: yes On Thu, Feb 12, 2009 at 4:33 PM, Steve Loughran wrote: Anum Ali wrote: Iam working on Hadoop SVN version 0.21.0-dev. Having some problems , regarding running its examples/file from eclipse. It gives error for Exception in thread "

Re: Namenode not listening for remote connections to port 9000

2009-02-13 Thread Steve Loughran
Michael Lynch wrote: Hi, As far as I can tell I've followed the setup instructions for a hadoop cluster to the letter, but I find that the datanodes can't connect to the namenode on port 9000 because it is only listening for connections from localhost. In my case, the namenode is called cent

Re: stable version

2009-02-12 Thread Steve Loughran
Anum Ali wrote: yes On Thu, Feb 12, 2009 at 4:33 PM, Steve Loughran wrote: Anum Ali wrote: Iam working on Hadoop SVN version 0.21.0-dev. Having some problems , regarding running its examples/file from eclipse. It gives error for Exception in thread "

Re: Best practices on spliltting an input line?

2009-02-12 Thread Steve Loughran
Stefan Podkowinski wrote: I'm currently using OpenCSV which can be found at http://opencsv.sourceforge.net/ but haven't done any performance tests on it yet. In my case simply splitting strings would not work anyways, since I need to handle quotes and separators within quoted values, e.g. "a","a
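
A minimal sketch of the quote-aware splitting Stefan describes: commas inside double-quoted fields are not separators, and a doubled quote inside a quoted field is an escaped quote. This is a sketch only; real-world CSV (embedded newlines, configurable separators) is exactly why a library like OpenCSV exists.

```java
import java.util.ArrayList;
import java.util.List;

// Split one CSV line into fields, honoring double-quoted values.
public class CsvSplit {
    public static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"');   // "" is an escaped quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    cur.append(c);         // separators are literal in quotes
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());        // last field has no trailing comma
        return fields;
    }
}
```

For example, `a,"b,c",d` splits into three fields, with `b,c` intact.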

Re: stable version

2009-02-12 Thread Steve Loughran
Anum Ali wrote: Iam working on Hadoop SVN version 0.21.0-dev. Having some problems , regarding running its examples/file from eclipse. It gives error for Exception in thread "main" java.lang.UnsupportedOperationException: This parser does not support specification "null" version "null" at jav

Re: File Transfer Rates

2009-02-11 Thread Steve Loughran
Brian Bockelman wrote: Just to toss out some numbers (and because our users are making interesting numbers right now) Here's our external network router: http://mrtg.unl.edu/~cricket/?target=%2Frouter-interfaces%2Fborder2%2Ftengigabitethernet2_2;view=Octets Here's the application-level

Re: anybody knows an apache-license-compatible impl of Integer.parseInt?

2009-02-11 Thread Steve Loughran
Zheng Shao wrote: We need to implement a version of Integer.parseInt/atoi from byte[] instead of String to avoid the high cost of creating a String object. I wanted to take the open jdk code but the license is GPL: http://www.docjar.com/html/api/java/lang/Integer.java.html Does anybody know an
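
A clean-room sketch of what such a method could look like, parsing base-10 digits directly from a byte[] slice with no String allocation. Overflow checking is omitted for brevity, so inputs must fit in an int; a production version would accumulate negatively the way the JDK does to handle Integer.MIN_VALUE.

```java
// Parse a signed base-10 integer from buf[off .. off+len).
public class ByteParse {
    public static int parseInt(byte[] buf, int off, int len) {
        if (len <= 0) throw new NumberFormatException("empty input");
        int i = off;
        boolean negative = false;
        if (buf[i] == '-' || buf[i] == '+') {   // optional sign
            negative = buf[i] == '-';
            i++;
        }
        if (i == off + len) throw new NumberFormatException("sign only");
        int result = 0;
        for (; i < off + len; i++) {
            int d = buf[i] - '0';
            if (d < 0 || d > 9) {
                throw new NumberFormatException("bad digit at offset " + i);
            }
            result = result * 10 + d;
        }
        return negative ? -result : result;
    }
}
```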

Re: Backing up HDFS?

2009-02-11 Thread Steve Loughran
Allen Wittenauer wrote: On 2/9/09 4:41 PM, "Amandeep Khurana" wrote: Why would you want to have another backup beyond HDFS? HDFS itself replicates your data so if the reliability of the system shouldnt be a concern (if at all it is)... I'm reminded of a previous job where a site administrator
