Problem to create sequence file for

2009-10-27 Thread bhushan_mahale
Hi, I have written code to create sequence files from given text files. The program takes the following input parameters:
1. Local source directory - contains all the input text files
2. Destination HDFS URI - location on HDFS where the sequence file will be copied
The key for a sequence-record is
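[A minimal sketch of the kind of writer described above, assuming the Hadoop 0.20 API. The key/value layout (file name as key, whole file contents as value) is a guess, since the message is truncated; class and variable names are illustrative.]

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class TextToSequenceFile {
  public static void main(String[] args) throws Exception {
    File localSrcDir = new File(args[0]);   // 1. local source directory
    Path dest = new Path(args[1]);          // 2. destination HDFS URI
    Configuration conf = new Configuration();
    FileSystem fs = dest.getFileSystem(conf);
    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, dest, Text.class, Text.class);
    try {
      for (File f : localSrcDir.listFiles()) {
        if (!f.isFile()) continue;
        byte[] buf = new byte[(int) f.length()];
        DataInputStream in = new DataInputStream(new FileInputStream(f));
        try { in.readFully(buf); } finally { in.close(); }
        // Whole file contents as one value -- this is what later replies in
        // this thread identify as the heap problem for ~300MB inputs.
        writer.append(new Text(f.getName()), new Text(new String(buf, "UTF-8")));
      }
    } finally {
      writer.close();
    }
  }
}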

Mount WebDav in Linux for HDFS-0.20.1

2009-10-27 Thread Zhang Bingjun (Eddy)
Dear Huy Phan and others, Thanks a lot for your efforts in customizing the WebDav server and making it work for Hadoop-0.20.1. After setting up the WebDav server, I could access it using the Cadaver client in Ubuntu without using any username or password. Operati

Re: Mount WebDav in Linux for HDFS-0.20.1

2009-10-27 Thread Huy Phan
Hi Zhang, Here is the patch for davfs2 to solve the "server does not support WebDAV" issue:

diff --git a/src/webdav.c b/src/webdav.c
index 8ec7a2d..4bdaece 100644
--- a/src/webdav.c
+++ b/src/webdav.c
@@ -472,7 +472,7 @@ dav_init_connection(const char *path)
     if (!ret) { initialized =

Re: Mount WebDav in Linux for HDFS-0.20.1

2009-10-27 Thread Zhang Bingjun (Eddy)
Dear Huy Phan, Thanks for your quick reply. I was using fuse-dfs before, but I found a serious memory leak with fuse-dfs: about 10MB of leakage per 10k file reads/writes. When the occupied memory size reached about 150MB, the read/write performance dropped dramatically. Did you encounter these problems?

Re: Mount WebDav in Linux for HDFS-0.20.1

2009-10-27 Thread Huy Phan
Hi Zhang, I didn't play much with fuse-dfs; in my opinion, a memory leak is something solvable, and I can see Apache has made some fixes for this issue in libhdfs. If you encountered these problems with an older version of Hadoop, I think you should give the latest stable version a try. Since I didn

Re: Mount WebDav in Linux for HDFS-0.20.1

2009-10-27 Thread Zhang Bingjun (Eddy)
Dear Huy Phan, I downloaded davfs2-1.4.3, and in this version the patch you sent me seems to have been applied already. I compiled and installed this version. However, the error message is still around, like below...

had...@hdfs2:/mnt$ sudo mount.davfs http://192.168.0.131:9800 hdfs-webdav/
Please enter t

Re: Using Configuration instead of JobConf

2009-10-27 Thread Oliver B. Fischer
Oliver B. Fischer wrote:
> Hi,
> according to the API documentation of 0.20.1, JobConf is deprecated and
> we should use Configuration instead. However, all examples on the webpage
> still reference JobConf.
> Is there a good example for replacin

Re: Using Configuration instead of JobConf

2009-10-27 Thread tim robertson
The org.apache.hadoop.examples.SecondarySort in 0.20.1 is an example using org.apache.hadoop.conf.Configuration. Cheers, Tim

On Tue, Oct 27, 2009 at 2:42 PM, Oliver B. Fischer wrote:
> Oliver B. Fischer wrote:
>> Hi,
>> according to th
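[For readers hunting for the shape of the change: a minimal new-API skeleton (Hadoop 0.20.x) with Configuration/Job in place of JobConf. Class and path names are illustrative; the SecondarySort example Tim points to shows a full working version.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewApiJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // replaces JobConf
    Job job = new Job(conf, "new-api-example"); // Job wraps the Configuration
    job.setJarByClass(NewApiJob.class);
    job.setMapperClass(Mapper.class);           // base Mapper class = identity
    job.setReducerClass(Reducer.class);         // base Reducer class = identity
    job.setOutputKeyClass(LongWritable.class);  // TextInputFormat's byte offsets
    job.setOutputValueClass(Text.class);        // the input lines
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}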

Re: Problem to create sequence file for

2009-10-27 Thread Jason Venner
How large is the string that is being written? Does it contain the entire contents of your file? You may simply need to increase the heap size of your JVM. On Tue, Oct 27, 2009 at 3:43 AM, bhushan_mahale < bhushan_mah...@persistent.co.in> wrote: > Hi, > > I have written code to create sequen

RE: Problem to create sequence file for

2009-10-27 Thread bhushan_mahale
Hi Jason, Thanks for the reply. The string is the entire content of the input text file. It could be as long as ~300MB. I tried increasing the JVM heap, but unfortunately it was giving the same error. The other option I am considering is to split the input files first. - Bhushan -Original Message- From: Jason

Re: Streaming ignoring stderr output

2009-10-27 Thread Jason Venner
Most likely one gets buffered when the file descriptor is a pipe, and the other is at most line-buffered, as it is when the code is run by the streaming mapper task. On Mon, Oct 26, 2009 at 11:06 AM, Ryan Rosario wrote: > Thanks. I think that I may have tripped on some sort of bug. > Unfortunately,

Re: Secondary NameNodes or NFS exports?

2009-10-27 Thread Jason Venner
We have been having some trouble with the secondary namenode on a cluster that has one edit log partition on an NFS server, with the namenode rejecting the merged images due to timestamp mismatches. On Mon, Oct 26, 2009 at 10:14 AM, Stas Oskin wrote: > Hi. > > Thanks for the advice, it seems that the i

Re: Problem to create sequence file for

2009-10-27 Thread Jason Venner
If your string is up to 300MB, you will probably need 1.3+ GB of heap to write it:
- 1 copy in the String: 600MB if your file is all ASCII (Java strings store characters as 16-bit values)
- 1 copy in the byte array as UTF-8: a 1x to 3x expansion, say 600MB
- 1 copy in the on-the-wire format, say 700MB
- possibly 1 copy in a transit buffer on th
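[One way to dodge the first two copies Jason itemizes, sketched below under the assumption that the input really is UTF-8 text: hand the raw bytes to Text directly and never materialize the String or its char[] at all. The class and method names are hypothetical.]

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import org.apache.hadoop.io.Text;

public class FileAsText {
  // Read a file once as bytes and wrap them in a Text value. Text stores
  // UTF-8 bytes internally, so this skips the 16-bit char[] copy and the
  // String -> UTF-8 re-encoding pass entirely.
  static Text fileAsText(File f) throws Exception {
    byte[] buf = new byte[(int) f.length()];
    DataInputStream in = new DataInputStream(new FileInputStream(f));
    try { in.readFully(buf); } finally { in.close(); }
    Text value = new Text();
    value.set(buf, 0, buf.length);
    return value;
  }
}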

Re: Problem to create sequence file for

2009-10-27 Thread Jean-Eric CAZAMEA
There are 2 transposed letters in the name: Jean-Eric. From: Jason Venner To: common-user@hadoop.apache.org Sent: Tue, October 27, 2009 3:54:00 PM Subject: Re: Problem to create sequence file for If your string is up to 300MB you will need probably 1.3+gig

Re: Problem to create sequence file for

2009-10-27 Thread Amogh Vasekar
Hi Bhushan, If splitting input files is an option, why don't you let Hadoop do this for you? If need be, you can use a custom input format and a SequenceFile*OutputFormat; a sketch of that job shape follows below. Amogh On 10/27/09 7:55 PM, "bhushan_mahale" wrote: Hi Jason, Thanks for the reply. The string is the entire content of the
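[A sketch of Amogh's suggestion: let MapReduce split the inputs and emit a SequenceFile, instead of building 300MB values by hand. Identity mapper, map-only job, new 0.20 API assumed; paths and the class name are illustrative.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class TextToSeqJob {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "text-to-seqfile");
    job.setJarByClass(TextToSeqJob.class);
    job.setMapperClass(Mapper.class);               // identity mapper
    job.setNumReduceTasks(0);                       // map-only: one output file per split
    job.setInputFormatClass(TextInputFormat.class); // splits the text inputs for you
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    job.setOutputKeyClass(LongWritable.class);      // byte offset from TextInputFormat
    job.setOutputValueClass(Text.class);            // one line per record
    FileInputFormat.addInputPath(job, new Path(args[0]));
    SequenceFileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}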

Re: Secondary NameNodes or NFS exports?

2009-10-27 Thread Stas Oskin
Hi. You mean, you couldn't recover the NameNode from checkpoints because of timestamps? Regards. On Tue, Oct 27, 2009 at 4:49 PM, Jason Venner wrote: > We have been having some trouble with the secondary on a cluster that has > one edit log partition on an nfs server, with the namenode rejectin

RE: Hadoop User Group (Bay Area) - Nov 18th at Yahoo!

2009-10-27 Thread Dekel Tankel
Hi all, Thank you to all who attended the meeting last week; I hope you found it interesting and helpful. Special thanks to all the presenters! Presentations are available here: http://developer.yahoo.net/blogs/hadoop/2009/10/hadoop_user_group_hug_oct_21st.html RSVP is now open for the next

RE: Task process exit with nonzero status of 1

2009-10-27 Thread Marc Limotte
Just an FYI: found the solution to this problem. Apparently it's an OS limit on the number of sub-directories that can be created within another directory. In this case, we had 31998 sub-directories under hadoop/userlogs/, so any new tasks would fail in Job Setup. From the unix command line, mkd
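[A tiny probe for the limit Marc hit. On ext3, a directory is capped at 31998 subdirectories (the 32000 hard-link limit minus '.' and the parent's entry), which matches the count above. Hypothetical demo class; point it at a scratch directory on an ext3 mount to watch mkdir start failing.]

import java.io.File;

public class SubdirLimit {
  public static void main(String[] args) {
    File parent = new File(args[0]);
    for (int i = 0; ; i++) {
      // mkdir returns false once the filesystem's link limit is reached
      if (!new File(parent, "d" + i).mkdir()) {
        System.out.println("mkdir failed after " + i + " subdirectories");
        break;
      }
    }
  }
}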

Re: Seattle / NW Hadoop, Lucene, Apache "Cloud Stack" Meetup, Wed Oct 28 6:45pm

2009-10-27 Thread Bradford Stephens
Hey guys! Don't forget this is tomorrow (Wednesday). See you there! Cheers, Bradford On Sun, Oct 18, 2009 at 5:10 PM, Bradford Stephens wrote: > Greetings, > > (You're receiving this e-mail because you're on a DL or I think you'd > be interested) > > It's time for another Hadoop/Lucene/Apache "C

Re: Mount WebDav in Linux for HDFS-0.20.1

2009-10-27 Thread Huy Phan
Hi Zhang, I applied my patch to davfs2-1.4.0 and it's working fine with Hadoop 0.20.1. If you didn't define any access restrictions in the account.properties file, you can skip authentication when mounting with davfs2. Best, Huy Phan Zhang Bingjun (Eddy) wrote: Dear Huy Phan, I downloaded davfs

How to give consecutive numbers to output records?

2009-10-27 Thread Mark Kerzner
Hi, I need to number all output records consecutively, like 1, 2, 3... This is no problem with one reducer: make recordId an instance variable in the Reducer class and set conf.setNumReduceTasks(1). However, using multiple reducers is an architectural decision forced by processing needs, where the reducer become

Re: How to give consecutive numbers to output records?

2009-10-27 Thread Aaron Kimball
There is no in-MapReduce mechanism for cross-task synchronization. You'll need to use something like Zookeeper for this, or another external database. Note that this will greatly complicate your life. If I were you, I'd try to either redesign my pipeline elsewhere to eliminate this need, or maybe
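[Not the consecutive numbering Mark asked for, but a common compromise in the spirit of Aaron's "redesign to eliminate this need": derive globally unique, monotonically increasing IDs with no cross-task synchronization by striding on the reducer's partition number. A sketch against the new 0.20 API; the Text-based generics are illustrative.]

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class StridedIdReducer extends Reducer<Text, Text, LongWritable, Text> {
  private long nextId;
  private long stride;

  @Override
  protected void setup(Context context) {
    stride = context.getNumReduceTasks();                     // total reducers
    nextId = context.getTaskAttemptID().getTaskID().getId();  // this reducer's index
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text v : values) {
      context.write(new LongWritable(nextId), v);  // unique across all reducers
      nextId += stride;                            // e.g. reducer 2 of 4 emits 2, 6, 10, ...
    }
  }
}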

Re: How to give consecutive numbers to output records?

2009-10-27 Thread Mark Kerzner
Aaron, although your notes are not a ready solution, they are a great help. Thank you, Mark On Tue, Oct 27, 2009 at 11:27 PM, Aaron Kimball wrote: > There is no in-MapReduce mechanism for cross-task synchronization. You'll > need to use something like Zookeeper for this, or another external

Re: Mount WebDav in Linux for HDFS-0.20.1

2009-10-27 Thread Zhang Bingjun (Eddy)
Dear Huy Phan, Thanks a lot! It seems the diff in the patch you sent me should be the other way around, like the following:

diff --git b/src/webdav.c a/src/webdav.c
index 8ec7a2d..4bdaece 100644
--- b/src/webdav.c
+++ a/src/webdav.c
@@ -472,7 +472,7 @@ dav_init_connection(const cha