Re: CDH and Hadoop

2011-03-24 Thread Eli Collins
Hey Rita, All software developed by Cloudera for CDH is Apache (v2) licensed and freely available. See these docs [1,2] for more info. We publish source packages (which include the packaging source) and source tarballs; you can find these at http://archive.cloudera.com/cdh/3/. See the CHANGES.t

Re: Permissions issue

2010-11-09 Thread Eli Collins
Adding cdh-user@, BCC common-user@. Hey Steve, Sounds like you need to chmod 777 the staging dir. By default mapreduce.jobtracker.staging.root.dir is ${hadoop.tmp.dir}/mapred/staging, but per the mapred configuration below, setting this to /user is better and should mean you don't need to do the abo
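
For readers hitting the same problem, a minimal sketch of the two fixes above, assuming a stock 0.20-era layout (the staging path and config file location are assumptions):

    # Option 1: open up the default staging dir in HDFS
    # (assumes hadoop.tmp.dir is /tmp/hadoop)
    hadoop fs -chmod 777 /tmp/hadoop/mapred/staging

    # Option 2 (preferred per the thread): point staging at /user
    # in mapred-site.xml:
    #   <name>mapreduce.jobtracker.staging.root.dir</name>
    #   <value>/user</value>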

Re: how to revert from a new version to an older one (CDH3)?

2010-09-07 Thread Eli Collins
restrictive) in CDH3 beta 2 so that it's possible to use more than one build in a cluster. http://archive.cloudera.com/cdh/3/hadoop-0.20.2+320.releasenotes.html Thanks, Eli On Wed, Sep 1, 2010 at 11:24 AM, Eli Collins wrote: > Hey guys, > > In CDH3 you can pin your repo to a

Re: how to revert from a new version to an older one (CDH3)?

2010-09-01 Thread Eli Collins
Hey guys, In CDH3 you can pin your repo to a particular release. Eg in the following docs to use beta 1 specify "redhat/cdh/3b1" instead of "redhat/cdh/3" in the repo file (for RH), or "-cdh3b1" instead of "-cdh3" in the list file (for Debian). You'll need to do a "yum clean metadata" or "apt-get
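
A sketch of that pinning, assuming the standard repo file locations (the exact file names are assumptions):

    # RHEL/CentOS: pin the Cloudera repo to beta 1, then refresh metadata
    sudo sed -i 's|redhat/cdh/3|redhat/cdh/3b1|' /etc/yum.repos.d/cloudera-cdh3.repo
    sudo yum clean metadata

    # Debian/Ubuntu: same idea in the list file
    sudo sed -i 's|-cdh3|-cdh3b1|' /etc/apt/sources.list.d/cloudera.list
    sudo apt-get update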

Re: How to patch Hadoop 0.20.2 with symbolic links patch

2010-07-16 Thread Eli Collins
symlinks: http://people.apache.org/~tomwhite/hadoop-0.21.0-candidate-0/ Thanks, Eli On Fri, Jul 16, 2010 at 7:20 AM, Yujun Wu wrote: > Hello, > > I am new to hadoop. Recently, I installed Hadoop 0.20.2 and it works. I > tried to patch it with the  symbolic links patch by Eli (Mr.
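
For anyone applying such a patch to an 0.20 source tree, the usual Apache-style workflow looks roughly like this (the patch file name is a placeholder):

    cd hadoop-0.20.2
    patch -p0 < symlinks.patch   # patches of that era typically applied with -p0
    ant                          # rebuild after patching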

Re: JAVA_HOME not set

2010-05-21 Thread Eli Collins
Hey David, This issue was fixed in CDH2 and will be in the next beta of CDH3. Appreciate the feedback! Thanks, Eli On Tue, May 18, 2010 at 12:59 PM, David Howell wrote: > Are you using Cloudera's hadoop 0.20.2? > > There's some logic in bin/hadoop-config.sh that seems to be failing if > JAVA_H
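
Until the fix lands, the usual workaround is to set JAVA_HOME explicitly in conf/hadoop-env.sh (the JDK path below is just an example):

    # conf/hadoop-env.sh
    export JAVA_HOME=/usr/lib/jvm/java-6-sun   # adjust to your JDK install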

Re: Silly question about 'rack aware'...

2010-05-09 Thread Eli Collins
Hey Michael, The script specified by dfs.network.script is passed both host names and IPs. In most cases an IP is passed, however in some cases (eg when using dfs.hosts files) a hostname is passed. Thanks, Eli ps - useful pointers: http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200
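
Since the script must therefore handle both forms, a minimal topology script sketch (the rack mapping itself is a made-up example):

    #!/bin/sh
    # Print one rack per argument; arguments may be IPs or hostnames.
    for arg in "$@"; do
      case "$arg" in
        10.1.1.*|node1*) echo "/rack1" ;;
        10.1.2.*|node2*) echo "/rack2" ;;
        *)               echo "/default-rack" ;;
      esac
    done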

Re: Try to mount HDFS

2010-04-26 Thread Eli Collins
The issue that required you to change ports is HDFS-961. Thanks, Eli On Fri, Apr 23, 2010 at 6:30 AM, Christian Baun wrote: > Brian, > > You got it!!! :-) > It works (partly)! > > I switched to port 9000. core-site.xml now includes: > >         >                fs.default.name >                
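
For reference, a sketch of the working setup described above: fs.default.name on port 9000 plus a fuse-dfs mount (the host name, mount point, and wrapper path are assumptions):

    # core-site.xml:
    #   <name>fs.default.name</name>
    #   <value>hdfs://namenode:9000</value>
    mkdir -p /mnt/hdfs
    ./fuse_dfs_wrapper.sh dfs://namenode:9000 /mnt/hdfs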

Re: Symbolic link in Hadoop HDFS?

2010-03-10 Thread Eli Collins
> in shell commands, with one physical copy of a big data set, different people > can easily create different subsets of it to work on by using symlinks. > > Thanks, > > Michael > > --- On Wed, 3/10/10, Eli Collins wrote: > > From: Eli Collins > Subject: Re: Sym

Re: Symbolic link in Hadoop HDFS?

2010-03-10 Thread Eli Collins
dered in the next release? > > Thanks, > > Michael > > --- On Tue, 3/9/10, Eli Collins wrote: > > From: Eli Collins > Subject: Re: Symbolic link in Hadoop HDFS? > To: common-user@hadoop.apache.org > Date: Tuesday, March 9, 2010, 8:01 PM > > Hey Michael, >

Re: Symbolic link in Hadoop HDFS?

2010-03-09 Thread Eli Collins
Hey Michael, Symbolic links have been implemented [1] but are not yet available in a Hadoop release. The implementation is only available to clients that use the new FileContext API so clients like Hive need to be migrated from using FileSystem to FileContext. This is currently being done in Hadoop

Re: why not zookeeper for the namenode

2010-02-23 Thread Eli Collins
> From what I read, I thought that bookkeeper would be the ideal enhancement > for the namenode, to make it distributed and therefore finally highly available. Being distributed doesn't imply high availability. Availability is about minimizing downtime. For example, a primary that can fail over to

Re: Installing in local Maven repository

2010-01-27 Thread Eli Collins
On Wed, Jan 27, 2010 at 8:38 AM, Stuart Sierra wrote: > Hello, > > Does anyone have up-to-date instructions for installing hadoop-core in > a local Maven repository?  The instructions at > http://wiki.apache.org/hadoop/HowToContribute do not work (the > mvn-install target is not defined). > > Than
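
One approach that works for jars of this era is to install the built artifact by hand with install:install-file (the jar path and version are examples):

    ant jar   # build hadoop-core first
    mvn install:install-file \
      -Dfile=build/hadoop-0.20.2-dev-core.jar \
      -DgroupId=org.apache.hadoop \
      -DartifactId=hadoop-core \
      -Dversion=0.20.2 \
      -Dpackaging=jar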

Re: Slowdown with Hadoop Sort benchmark when using Jumbo frames?

2010-01-22 Thread Eli Collins
On Fri, Jan 22, 2010 at 3:57 AM, stephen mulcahy wrote: > Hi, > > I've been running some tests on some new hardware we have acquired. > > As a baseline, I ran the Hadoop sort[1] with 10GB and 100GB of data. As an > experiment, I ran it on 4 systems (1 configured as master+slave and 3 as > slaves)

Re: "Lost" HDFS space

2010-01-14 Thread Eli Collins
On Thu, Jan 14, 2010 at 12:43 AM, Erik Forsberg wrote: > Hi! > > I'm having trouble figuring out the numbers reported by 'hadoop dfs > -dus' versus the numbers reported by the namenode web interface. > > I have a 4 node cluster, 4TB of disk on each node. > > hadoop dfs -dus / > hdfs://hdp01-01:90
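
The usual explanation for the mismatch is replication: dfs -dus reports logical (pre-replication) bytes, while the namenode web UI counts raw bytes across all replicas plus non-DFS usage. A quick way to compare the two:

    hadoop dfs -dus /         # logical size, before replication
    hadoop dfsadmin -report   # raw "DFS Used" and capacity per datanode
    # roughly: raw used ~= logical size * replication factor, plus temp/non-DFS files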

Re: Balancing a cluster when a new node is added

2010-01-10 Thread Eli Collins
Have you verified this new DN's Hadoop configuration files are the same as the others'? Do you see any errors in the NN when restarting HDFS on this new node? Thanks, Eli On Sat, Jan 9, 2010 at 9:44 AM, Saptarshi Guha wrote: > Hello, > I'm using Hadoop 0.20.1. I just added a new node to a 5 node >
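
Once the new datanode registers cleanly, the balancer will spread existing blocks onto it; a typical invocation (the threshold is a percentage):

    # run from any node whose Hadoop config points at the cluster
    hadoop balancer -threshold 10   # stop once nodes are within 10% of mean usage
    # or: bin/start-balancer.sh -threshold 10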

Re: HDFS read/write speeds, and read optimization

2010-01-10 Thread Eli Collins
> data.replication = 2 > > A bit off topic - is it safe to have such a number? About a year ago I heard > only 3-way replication was fully tested, while 2-way had some issues - was > it fixed in subsequent versions? I think that's still a relatively untested configuration, though I'm not aware of any
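
For anyone who wants to move existing data back to 3-way replication, a sketch (the path is an example):

    # set dfs.replication to 3 in hdfs-site.xml for new files, then
    # raise the factor on existing data:
    hadoop fs -setrep -R 3 /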

Re: HDFS read/write speeds, and read optimization

2010-01-10 Thread Eli Collins
> I actually tested it with a simple Java test loader I quickly put together, > which ran on each machine and continuously wrote random data to DFS. I > tuned the writing rate until I got ~77Mb/s - above that, the iowait load on > each disk (measured by iostat) rose above 50%-60%, which is

Re: HDFS read/write speeds, and read optimization

2010-01-02 Thread Eli Collins
Hey Stas, Can you provide more information about your workload and the environment? eg are you running t.o.a.h.h.BenchmarkThroughput, TestDFSIO, or timing hadoop fs -put/get to transfer data to hdfs from another machine, looking at metrics, etc. What else is running on the cluster? Have you profil
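
As a concrete example of the benchmarks mentioned, TestDFSIO can be run from the bundled test jar (file counts and sizes are examples; the jar name varies by release):

    # write 10 files of 1000 MB each, then read them back
    hadoop jar hadoop-0.20.2-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
    hadoop jar hadoop-0.20.2-test.jar TestDFSIO -read  -nrFiles 10 -fileSize 1000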

Re: file system

2009-12-23 Thread Eli Collins
> Could the communication between blade server and disk array be the bottleneck? Yes, depending on the number of blades, the network into the array will bottleneck because it doesn't scale with the number of data nodes.

Re: Why DrWho

2009-12-23 Thread Eli Collins
On Thu, Dec 17, 2009 at 10:13 AM, Owen O'Malley wrote: > For a while there has been a jira about removing all of the cases where we > currently fork a subprocess and replacing it with a jni library. It would be > lovely if someone did that. *smile* Just posted a patch to HADOOP-4998. Heading ou

Re: Help with fuse-dfs

2009-12-22 Thread Eli Collins
; Call to org.apache.hadoop.fs.FileSystem::exists failed! >>>>>   unique: 7, error: -2 (No such file or directory), outsize: 16 >>>>> unique: 8, opcode: LOOKUP (1), nodeid: 1, insize: 46 >>>>> LOOKUP /hdfs: >>>>> getattr /hdfs: >&

Re: Help with fuse-dfs

2009-12-20 Thread Eli Collins
> fuse_dfs TRACE - readdir / >   unique: 4, success, outsize: 200 > unique: 5, opcode: RELEASEDIR (29), nodeid: 1, insize: 64 >   unique: 5, success, outsize: 16 > > Does It seem OK? Hm, seems like it's not finding any directory entries. Mind putting a printf in dfs_readdir after hdfsListDirectori

Re: Help with fuse-dfs

2009-12-18 Thread Eli Collins
Thanks for the info. Please uncomment "//#define DOTRACE" in fuse_dfs.h, recompile, and ls /mnt/dfs again and post the trace. That will help identify the particular error that's causing the failure below. > getdents(3, 0x61fec8, 512)              = -1 EIO (Input/output error) > write(2, "ls: ", 4
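
The suggested debug cycle, as a sketch (paths assume a contrib/fuse-dfs source tree):

    # enable tracing in fuse-dfs and rebuild
    sed -i 's|//#define DOTRACE|#define DOTRACE|' src/contrib/fuse-dfs/src/fuse_dfs.h
    ant -Dlibhdfs=1 -Dfusedfs=1 compile-contrib
    # remount, reproduce, and capture the trace
    ls /mnt/dfs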

Re: Help with fuse-dfs

2009-12-17 Thread Eli Collins
1 >>>>>   max_readahead=0x0002 >>>>>   max_write=0x0002 >>>>>   unique: 1, success, outsize: 40 >>>>> unique: 2, opcode: GETATTR (3), nodeid: 1, insize: 56 >>>>> getattr / >>>>> unique: 3, opcode: GET

Re: Why DrWho

2009-12-17 Thread Eli Collins
> For a while there has been a jira about removing all of the cases where we > currently fork a subprocess and replacing it with a jni library. It would be > lovely if someone did that. *smile* I wrote such a native class for the local implementation of symlinks (eg to make a jni callout for readl

Re: Help with fuse-dfs

2009-12-15 Thread Eli Collins
The "fuse-dfs didn't recognize " and "fuse-dfs ignoring option -d" are expected, they get passed along from fuse-dfs to fuse (via fuse_main). Does it work if you pass -o private? Still nothing reported by dmesg? What does jps indicate is running?What linux distribution and kernel are you using?

Re: Memory. -Xmx. error=12, chmod, JobTracker.

2009-12-15 Thread Eli Collins
See HADOOP-5059. On Fri, Dec 11, 2009 at 7:57 AM, pavel kolodin wrote: > On Fri, 11 Dec 2009 11:52:57 -, Sean Owen wrote: > > "-Xmx900" means "give the entire JVM only 900 bytes of heap space" >> which can't possibly work. >> You do not say what problem you are trying to solve here. What
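
The fix for the quoted error is simply a unit suffix: a bare "900" means 900 bytes. A sketch of the two places heap is commonly set (values are examples):

    # daemon heaps are set in MB via conf/hadoop-env.sh:
    export HADOOP_HEAPSIZE=1000
    # task JVMs take a normal JVM flag; note the suffix:
    #   mapred.child.java.opts = -Xmx900m   (m = megabytes)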

Re: Namenode crashes while rolling edit log from secondary namenode

2009-12-04 Thread Eli Collins
> > Zhang > > > -----Original Message- > From: Eli Collins [mailto:e...@cloudera.com] > Sent: Friday, December 04, 2009 2:03 PM > To: common-user@hadoop.apache.org > Subject: Re: Namenode crashes while rolling edit log from secondary > namenode > > Hey Zhang, >

Re: Namenode crashes while rolling edit log from secondary namenode

2009-12-04 Thread Eli Collins
Hey Zhang, > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Fatal Error : All > storage directories are inaccessible. Are the directories specified by dfs.namenode.[name|edits].dir accessible? Perhaps they're NFS mounts that are flaking out? Thanks, Eli
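
A quick sanity check when that fatal error appears (the directory path is an example):

    # verify each configured name/edits dir is mounted and writable
    df -h /data/dfs/name
    touch /data/dfs/name/.rw-test && rm /data/dfs/name/.rw-test
    # for NFS-backed dirs, look for stale-mount complaints
    dmesg | grep -i nfs | tail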

Re: how to run programs present in the test folder

2009-12-04 Thread Eli Collins
Hey Siddu, Use the testcase flag, eg ant -Dtestcase=TestHDFSCLI test to run TestHDFSCLI.java Thanks, Eli On Tue, Dec 1, 2009 at 10:36 AM, Siddu wrote: > Hi all , > > I am interested in exploring the test folder, which is present in > src/test/org/apache/hadoop/hdfs/* > > Please ca

Re: building Hadoop on Karmic 64-bit

2009-11-29 Thread Eli Collins
Hey Chris, Forgot to mention the patch for HADOOP-5611 is in CDH2 and the patch for HDFS-790 will be in the next update. If you want to see how the src/c++ is built on CDH, check out the cloudera/do-release-build script. Thanks, Eli On Sun, Nov 29, 2009 at 10:21 PM, Eli Collins wrote: >

Re: building Hadoop on Karmic 64-bit

2009-11-29 Thread Eli Collins
Hey Chris, Thanks for reporting. I filed HDFS-790 and uploaded a patch (which you will need to apply after applying HADOOP-5611) and verified it compiles on karmic 64-bit. In the meantime, if you just need to build libhdfs (which doesn't depend on c++-utils) you can do that with ant -Dcompile.c++=

Re: why does not hdfs read ahead ?

2009-11-24 Thread Eli Collins
Hey Martin, It would be an interesting experiment but I'm not sure it would improve things, as the host (and, to some extent, the hardware) is already reading ahead. A useful exercise would be to evaluate whether the new default host parameters for on-demand readahead are suitable for hadoop. http://lw
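
That evaluation is easy to run per device with blockdev (the device name and value are examples):

    blockdev --getra /dev/sda              # current readahead, in 512-byte sectors
    sudo blockdev --setra 1024 /dev/sda    # eg 512KB; then re-run the benchmark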

Re: libhdfs on hadoop 0.20.0 release

2009-10-19 Thread Eli Collins
Hey Yang Jie, The following works for me on hadoop-0.20.1 on Ubuntu 9.04 (amd64) ant -Dcompile.c++=true -Dlibhdfs=true compile-c++-libhdfs You can see how libhdfs (and the rest of hadoop) is built in CDH by looking at the file cloudera/do-release-build in the source: http://archive.cloudera.com

Re: local node Quotas (for an R&D cluster)

2009-09-23 Thread Eli Collins
> These values determine how much HDFS is *not* allowed to use.  There is no > limit on how much MR can take.  This is exactly the opposite of what he and > pretty much every other admin wants.  [Negative math is fun! Or something.] Hey Allen -- is there a JIRA for this? A quick search didn't tur
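
The setting under discussion is presumably dfs.datanode.du.reserved, which caps HDFS only indirectly by reserving space for everything else, i.e. the inverse of a per-node HDFS quota (the value is an example):

    # hdfs-site.xml: reserve 10 GB per volume for non-DFS use
    #   <name>dfs.datanode.du.reserved</name>
    #   <value>10737418240</value>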