Archives larger than 2^31 bytes in DistributedCache

2008-10-31 Thread Christian Kunz
In hadoop-0.17 we tried to use a 2.2GB archive and seemingly ran into http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6599383: java.util.zip.ZipException: error in opening zip file at java.util.zip.ZipFile.open(Native Method) at java.util.zip.ZipFile.&lt;init&gt;(ZipFile.java:114) at java.util.
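
A minimal sketch of how such an archive is typically attached in the 0.17-era API (the paths are hypothetical); since archives over 2^31 bytes trip the JDK bug above, splitting the data into sub-2GB archives is one workaround:

    // Hypothetical paths; each archive is kept under 2GB to avoid the
    // java.util.zip.ZipFile limitation referenced above.
    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    public class CacheArchives {
        public static void configure(JobConf conf) throws Exception {
            DistributedCache.addCacheArchive(new URI("/cache/data-part1.zip"), conf);
            DistributedCache.addCacheArchive(new URI("/cache/data-part2.zip"), conf);
        }
    }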

Please help, don't know how to solve--java.io.IOException: WritableName can't load class

2008-10-31 Thread Mudong Lu
Hello guys, I am very new to Hadoop. I was trying to read Nutch data files using a script I found on http://wiki.apache.org/nutch/Getting_Started . After 2 days of trying, I still cannot get it to work. Now the error I get is "java.lang.RuntimeException: java.io.IOException: WritableName can'
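
Nutch segment data is stored as Hadoop SequenceFiles, and this error usually means the value class (a Nutch type) is not on the classpath. A minimal reader sketch, assuming the Nutch jars are on the classpath; the segment path is hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    public class DumpSequenceFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Hypothetical Nutch segment path.
            Path path = new Path("crawl/segments/20081031/crawl_fetch/part-00000/data");
            SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
            // Key/value classes are recorded in the file header; the classes
            // themselves must be loadable, or WritableName fails as above.
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
            reader.close();
        }
    }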

RE: "Merge of the inmemory files threw an exception" and diffs between 0.17.2 and 0.18.1

2008-10-31 Thread Deepika Khera
Wow, if the issue is fixed with version 0.20, then could we please have a patch for version 0.18? Thanks, Deepika -Original Message- From: Grant Ingersoll [mailto:[EMAIL PROTECTED] Sent: Thursday, October 30, 2008 12:19 PM To: core-user@hadoop.apache.org Subject: Re: "Merge of the inmem

Re: Status FUSE-Support of HDFS

2008-10-31 Thread Pete Wyckoff
It has come a long way since 0.18, and Facebook keeps our (0.17) DFS mounted via FUSE and uses that for some operations. There have recently been some problems with fuse-dfs when used in a multithreaded environment, but those have been fixed in 0.18.2 and 0.19. (Do not use 0.18 or 0.18.1.) The

Re: Mapper settings...

2008-10-31 Thread Owen O'Malley
On Oct 31, 2008, at 3:15 PM, Bhupesh Bansal wrote: Why do we need these setters in JobConf ?? jobConf.setMapOutputKeyClass(String.class); jobConf.setMapOutputValueClass(LongWritable.class); Just historical. The Mapper and Reducer interfaces didn't use to be generic. (Hadoop used to run on
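
A short sketch of the setters under discussion: because the interfaces were not always generic, JobConf cannot infer the intermediate types, so they must be declared to match what the Mapper actually emits (the types here are illustrative):

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;

    public class MapOutputTypes {
        public static void configure(JobConf conf) {
            // Must agree with what the Mapper writes to the OutputCollector.
            conf.setMapOutputKeyClass(Text.class);
            conf.setMapOutputValueClass(LongWritable.class);
        }
    }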

Re: LeaseExpiredException and too many xceiver

2008-10-31 Thread Raghu Angadi
The config on most Y! clusters sets dfs.datanode.max.xcievers to a large value, something like 1k to 2k. You could try that. Raghu. Nathan Marz wrote: Looks like the exception on the datanode got truncated a little bit. Here's the full exception: 2008-10-31 14:20:09,978 ERROR org.apache.hado
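
As a sketch, the setting Raghu describes would go in hadoop-site.xml on each datanode; the value below is illustrative, and the property name keeps its historical spelling:

    <!-- Illustrative value; raise the per-datanode transceiver limit. -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>2048</value>
    </property>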

Mapper settings...

2008-10-31 Thread Bhupesh Bansal
Hey guys, just curious: why do we need these setters in JobConf? jobConf.setMapOutputKeyClass(String.class); jobConf.setMapOutputValueClass(LongWritable.class); We should be able to extract these from the OutputCollector of the Mapper class? IMHO, they have to be consistent with OutputCollecto

Re: To Compute or Not to Compute on Prod

2008-10-31 Thread shahab mehmandoust
Currently, I'm just researching, so I'm just playing with the idea of streaming log data into the HDFS. I'm confused about: "...all you need is a Hadoop install. Your production node doesn't need to be a datanode." If my production node is *not* a datanode, then how can I do "hadoop dfs put"? I w

Re: LeaseExpiredException and too many xceiver

2008-10-31 Thread Nathan Marz
Looks like the exception on the datanode got truncated a little bit. Here's the full exception: 2008-10-31 14:20:09,978 ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(10.100.11.115:50010, storageID=DS-2129547091-10.100.11.115-50010-1225485937590, infoPort=50075, ipcPort=50020):D

LeaseExpiredException and too many xceiver

2008-10-31 Thread Nathan Marz
Hello, We are seeing some really bad errors on our Hadoop cluster. After reformatting the whole cluster, the first job we run immediately fails with "Could not find block locations..." errors. In the namenode logs, we see a ton of errors like: 2008-10-31 14:20:44,799 INFO org.apache.hado

RE: "Merge of the inmemory files threw an exception" and diffs between 0.17.2 and 0.18.1

2008-10-31 Thread Deepika Khera
Hi Devraj, It was pretty consistent with my comparator class in my old email (the one that uses UTF8). While trying to resolve the issue, I changed UTF8 to Text. That made it disappear for a while, but then it came back again. My new Comparator class (with Text) is - public class IncrementalURLInde
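
The class name is truncated above, so the following is a hypothetical reconstruction of a Text-based comparator of the kind described, not Deepika's actual code:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;

    // Hypothetical stand-in for the truncated comparator class above.
    public class ExampleTextComparator extends WritableComparator {
        protected ExampleTextComparator() {
            super(Text.class);
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            // Delegate to Text's own lexicographic ordering.
            return ((Text) a).compareTo((Text) b);
        }
    }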

Re: To Compute or Not to Compute on Prod

2008-10-31 Thread Jerome Boulon
Hi, We have deployed a new monitoring system, Chukwa (http://wiki.apache.org/hadoop/Chukwa), that does exactly that. This system also provides an easy way to post-process your log files and extract useful information using M/R. /Jerome. On 10/31/08 1:46 PM, "Norbert Burger" <[EMAIL PROTECTED]> w

Re: To Compute or Not to Compute on Prod

2008-10-31 Thread Norbert Burger
What are you using to "stream logs into the HDFS"? If the command-line tools (i.e., "hadoop dfs -put") work for you, then all you need is a Hadoop install. Your production node doesn't need to be a datanode. On Fri, Oct 31, 2008 at 2:35 PM, shahab mehmandoust <[EMAIL PROTECTED]> wrote: > I want to
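
An illustrative invocation from a non-datanode machine; the host, port, and paths are hypothetical, and the -fs generic option points the client at the remote namenode:

    hadoop dfs -fs hdfs://namenode:9000 -put /var/log/app/access.log /logs/access.log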

Redirecting Libhdfs output

2008-10-31 Thread Brian Bockelman
Hey all, libhdfs prints out useful information to stderr in the function errnoFromException; unfortunately, in the C application framework I use, stderr is redirected to /dev/null, making debugging miserably hard. Does anyone have any suggestions to make the errnoFromException functio
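
One hedged workaround sketch, assuming the framework itself cannot be changed: re-point the process's stderr at a log file before initializing libhdfs (the path is hypothetical):

    /* Swap the FILE* underlying stderr for a log file so that
     * errnoFromException output is preserved instead of discarded. */
    #include <stdio.h>

    int redirect_stderr(void) {
        if (freopen("/var/log/myapp/libhdfs.log", "a", stderr) == NULL) {
            return -1;  /* could not open the log file */
        }
        setvbuf(stderr, NULL, _IONBF, 0);  /* keep stderr unbuffered */
        return 0;
    }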

Re: To Compute or Not to Compute on Prod

2008-10-31 Thread shahab mehmandoust
Definitely speaking Java. Do you think I'm being paranoid about the possible load? Shahab On Fri, Oct 31, 2008 at 11:52 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote: > Shahab, > > This can be done. > If your client speaks Java you can connect to Hadoop and write as a stream. > > If your client

Re: To Compute or Not to Compute on Prod

2008-10-31 Thread Edward Capriolo
Shahab, This can be done. If your client speaks Java, you can connect to Hadoop and write as a stream. If your client does not have Java, the Thrift API will generate stubs in a variety of languages. Thrift API: http://wiki.apache.org/hadoop/HDFS-APIs Shameless plug -- If you just want to stream da
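
A minimal sketch of the "connect and write as a stream" approach from a Java client; the namenode URI and target path are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsLogWriter {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical namenode address; no datanode runs on this client.
            conf.set("fs.default.name", "hdfs://namenode:9000");
            FileSystem fs = FileSystem.get(conf);
            FSDataOutputStream out = fs.create(new Path("/logs/access.log"));
            out.writeBytes("127.0.0.1 - - [31/Oct/2008] \"GET / HTTP/1.1\" 200\n");
            out.close();
            fs.close();
        }
    }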

To Compute or Not to Compute on Prod

2008-10-31 Thread shahab mehmandoust
I want to stream data from logs into the HDFS in production, but I do NOT want my production machine to be a part of the computation cluster. The reason I want to do it this way is to take advantage of HDFS without putting computation load on my production machine. Is this possible? Furthermor

Re: SecondaryNameNode on separate machine

2008-10-31 Thread Konstantin Shvachko
True, dfs.http.address is the NN Web UI address. This is where the NN HTTP server runs. Besides the Web UI, there is also a servlet running on that server which is used to transfer image and edits from NN to the secondary using HTTP GET. So SNN uses both addresses, fs.default.name and dfs.http.address. W
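
As a sketch, the two addresses would appear in the secondary's hadoop-site.xml roughly as follows; the host names and ports are illustrative:

    <!-- NN RPC address, used by the SNN for namesystem calls. -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode:9000</value>
    </property>
    <!-- NN HTTP server, used to fetch the image and edits via HTTP GET. -->
    <property>
      <name>dfs.http.address</name>
      <value>namenode:50070</value>
    </property>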

Re: ApacheCon US 2008

2008-10-31 Thread Grant Ingersoll
I will also be presenting on Mahout (machine learning) on Wednesday at 3:30 (I think). It will have some Hadoop flavor in it. -Grant On Oct 31, 2008, at 1:46 PM, Owen O'Malley wrote: Just a reminder that ApacheCon US is next week in New Orleans. There will be a lot of Hadoop developers and

Re: ApacheCon US 2008

2008-10-31 Thread Lukáš Vlček
Hi, I hope somebody will record at least a fraction of these talks and put them on the web as soon as possible. Lukas On Fri, Oct 31, 2008 at 6:46 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote: > Just a reminder that ApacheCon US is next week in New Orleans. There will > be a lot of Hadoop developers an

RE: ApacheCon US 2008

2008-10-31 Thread Ashish Thusoo
Owen, Just wanted to mention that there is a talk on Hive as well, on Friday at 9:30 AM... Ashish -Original Message- From: Owen O'Malley [mailto:[EMAIL PROTECTED] Sent: Friday, October 31, 2008 10:47 AM To: [EMAIL PROTECTED] Cc: core-user@hadoop.apache.org Subject: ApacheCon US 2008 Just a r

ApacheCon US 2008

2008-10-31 Thread Owen O'Malley
Just a reminder that ApacheCon US is next week in New Orleans. There will be a lot of Hadoop developers and talks. (I'm CC'ing core-user because it has the widest coverage. Please join the low traffic [EMAIL PROTECTED] list for cross sub-project announcements.) * Hadoop Camp with lots o

Re: SecondaryNameNode on separate machine

2008-10-31 Thread Doug Cutting
Otis Gospodnetic wrote: Konstantin & Co, please correct me if I'm wrong, but looking at hadoop-default.xml makes me think that dfs.http.address is only the URL for the NN *Web UI*. In other words, this is where people go to look at the NN. The secondary NN must then be using only the Primary

Status FUSE-Support of HDFS

2008-10-31 Thread Robert Krüger
Hi, could anyone tell me what the current status of FUSE support for HDFS is? Is this something that can be expected to be usable in production within a few weeks/months? We have been really happy/successful with HDFS in our production system. However, some software we use in our applic

Re: [core-user] Help deflating output files

2008-10-31 Thread Martin Davidsson
You can override this property by passing in -jobconf mapred.output.compress=false to the hadoop binary, e.g. hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.18.0-streaming.jar -input "/user/root/input" -mapper 'cat' -reducer 'wc -l' -output "/user/root/output" -jobconf mapred.job.name="Exp
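
The command above is cut off; a hedged reconstruction of its shape (paths are illustrative, and the job name is omitted because it is truncated in the excerpt):

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.18.0-streaming.jar \
      -input "/user/root/input" \
      -mapper 'cat' \
      -reducer 'wc -l' \
      -output "/user/root/output" \
      -jobconf mapred.output.compress=false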

RE: Browse HDFS file in URL

2008-10-31 Thread Malcolm Matalka
The Hadoop file API allows you to open a file based on a URL: Path file = new Path("hdfs://hadoop00:54313/user/hadoop/conflated.20081016/part-9"); JobConf job = new JobConf(new Configuration(), ReadFileHadoop.class); job.setJobName("test"); FileSyste
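
The snippet is cut off at "FileSyste"; a hedged completion that opens the same path and streams it to stdout (only the Path line is taken from the excerpt, the rest is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class ReadFileHadoop {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Path copied from the excerpt above.
            Path file = new Path(
                "hdfs://hadoop00:54313/user/hadoop/conflated.20081016/part-9");
            FileSystem fs = file.getFileSystem(conf);
            FSDataInputStream in = fs.open(file);
            // Final 'true' closes both streams when the copy finishes.
            IOUtils.copyBytes(in, System.out, conf, true);
        }
    }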

Re: hostname in logs

2008-10-31 Thread Steve Loughran
Alex Loddengaard wrote: Thanks, Steve. I'll look into this patch. As a temporary solution I use a log4j variable to manually set a "hostname" private field in the Appender. This solution is rather annoying, but it'll work for now. Thanks again. What about having the task tracker pass down a

Browse HDFS file in URL

2008-10-31 Thread Neal Lee (RDLV)
Hi All, I'm wondering whether I can browse an HDFS file via URL (e.g. http://host/test.jpeg) so that I can show the file on my webapp directly. Thanks, Neal

Re: hostname in logs

2008-10-31 Thread Alex Loddengaard
Thanks, Steve. I'll look into this patch. As a temporary solution I use a log4j variable to manually set a "hostname" private field in the Appender. This solution is rather annoying, but it'll work for now. Thanks again. Alex On Fri, Oct 31, 2008 at 3:58 AM, Steve Loughran <[EMAIL PROTECTED]>

Re: hostname in logs

2008-10-31 Thread Steve Loughran
Alex Loddengaard wrote: I'd like my log messages to display the hostname of the node they were emitted on. Sure, this information can be grabbed from the log filename, but I would like each log message to also have the hostname. I don't think log4j provides support to include the hostnam
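
One possible approach, not from the thread itself: stash the hostname in log4j's MDC at startup and reference it from the layout (the property names below are illustrative):

    import java.net.InetAddress;
    import org.apache.log4j.MDC;

    public class HostnameLogging {
        public static void init() throws Exception {
            // Every subsequent log event can render this via %X{hostname}.
            MDC.put("hostname", InetAddress.getLocalHost().getHostName());
            // In log4j.properties (illustrative):
            // log4j.appender.A.layout.ConversionPattern=%d [%X{hostname}] %-5p %c - %m%n
        }
    }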

Re: TaskTrackers disengaging from JobTracker

2008-10-31 Thread Aaron Kimball
To complete the picture: not only was our network swamped, I realized tonight that the NameNode/JobTracker was running on a 99% full disk (it hit 100% full about thirty minutes ago). That poor JobTracker was fighting against a lot of odds. As soon as we upgrade to a bigger disk and switch it back o