Re: How to debug org.apache.hadoop.streaming.TestMultipleCachefiles.testMultipleCachefiles

2008-09-06 Thread Abdul Qadeer
I am running a single-node Hadoop. If I try to debug org.apache.hadoop.streaming.TestMultipleCachefiles.testMultipleCachefiles, the following exception says that I haven't put webapps on the classpath. I have in fact put src/webapps on the classpath, so I was wondering what is wrong. 2008-09-06 20:

Re: Hadoop custom readers and writers

2008-09-06 Thread Dennis Kubes
We did something similar with the ARC format, where each record (webpage) is gzipped and then appended. It is not exactly the same but it may help. Take a look at the following classes, they are in the Nutch trunk: org.apache.nutch.tools.arc.ArcInputFormat org.apache.nutch.tools.arc.ArcRecordRea
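
A minimal sketch of plugging that input format into an old-API job driver; the mapper wiring and record types are omitted, and this is only an illustration of the idea, not a tested recipe:

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.nutch.tools.arc.ArcInputFormat;

public class ArcJobSketch {
  public static void main(String[] args) {
    JobConf conf = new JobConf(ArcJobSketch.class);
    // Each gzipped ARC record (one fetched page) then arrives as a single map input record.
    conf.setInputFormat(ArcInputFormat.class);
  }
}
```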

Hadoop Streaming and Multiline Input

2008-09-06 Thread Dennis Kubes
Is it possible to set a multiline text input in streaming to be used as a single record? For example, say I wanted to scan a webpage for a specific regex that spans multiple lines; is this possible in streaming? Dennis

Re: How do specify certain IP to be used by datanode/namenode

2008-09-06 Thread Dennis Kubes
In your hadoop-site.xml file in the conf directory, set the following property: dfs.datanode.dns.interface (default value: "default"), described as "The name of the Network Interface from which a data node should report its IP address." Change "default" to the name of your interface (network card), usually eth0, eth1, etc. D
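
For reference, a minimal hadoop-site.xml fragment along those lines; eth1 is an assumed interface name and the description text is copied from hadoop-default.xml:

```xml
<!-- Sketch only: adjust the value to the interface your datanode should report. -->
<property>
  <name>dfs.datanode.dns.interface</name>
  <value>eth1</value>
  <description>The name of the Network Interface from which a data node
  should report its IP address.</description>
</property>
```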

Re: How to debug org.apache.hadoop.streaming.TestMultipleCachefiles.testMultipleCachefiles

2008-09-06 Thread 叶双明
Any detailed error message? 2008/9/5, Abdul Qadeer <[EMAIL PROTECTED]>: > > I want to debug the test case > org.apache.hadoop.streaming.TestMultipleCachefiles.testMultipleCachefiles< > http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3151/testReport/org.apache.hadoop.streaming/TestMultipleCac

Re: HDFS read/write programmatically

2008-09-06 Thread Chris Douglas
Paths are URIs. Without an authority explicitly specified in the path and without an overriding definition in hadoop-site.xml, fs.default.name will be "file:///" from hadoop-default.xml (which is probably why you're writing to local disk instead of HDFS). If you're running on a single node, f
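
A minimal sketch of the difference; the namenode address localhost:9000 and the paths are placeholders:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUriSketch {
  public static void main(String[] args) throws IOException {
    // An unqualified path like new Path("/tmp/example.txt") is resolved against
    // fs.default.name, which is file:/// unless hadoop-site.xml says otherwise,
    // so writes silently land on the local disk.
    Configuration conf = new Configuration();

    // Scheme + authority in the path pin the target filesystem explicitly.
    Path target = new Path("hdfs://localhost:9000/user/test/example.txt");
    FileSystem fs = target.getFileSystem(conf);

    FSDataOutputStream out = fs.create(target);
    out.writeUTF("written to HDFS, not the local filesystem");
    out.close();
  }
}
```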

Re: How do specify certain IP to be used by datanode/namenode

2008-09-06 Thread Jean-Daniel Cryans
Kevin, I think specifying datanode.dns.interface alone for dfs and mapred is enough (not sure). You only have to set it to eth0 or eth1, etc. J-D On Sat, Sep 6, 2008 at 7:18 PM, Kevin <[EMAIL PROTECTED]> wrote: > Hi J-D, > > I could not try it right now as I am not familiar with setting up DNS >

HDFS read/write programmatically

2008-09-06 Thread Wasim Bari
Hi, I have configured HDFS on Windows and am running it using Cygwin. I am interested in accessing the files and folders in HDFS programmatically (i.e. reading/writing files in HDFS from Java code). I used this example: http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample . The code is running f
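
For the read direction, a minimal sketch along the lines of the wiki example; the namenode address and file path are placeholders:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration(); // picks up hadoop-site.xml from the classpath
    Path file = new Path("hdfs://localhost:9000/user/test/example.txt");
    FileSystem fs = file.getFileSystem(conf);

    // fs.open() returns a stream over the HDFS file; read it line by line.
    BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(file)));
    String line;
    while ((line = reader.readLine()) != null) {
      System.out.println(line);
    }
    reader.close();
  }
}
```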

Re: How do specify certain IP to be used by datanode/namenode

2008-09-06 Thread Kevin
Hi J-D, I could not try it right now as I am not familiar with setting up a DNS server (I assume the parameters you mentioned are those specifying the DNS server). It actually becomes more interesting: why does specifying the IP not suffice? Do you mean that Hadoop will decide the right IP of a node b

Re: OutOfMemoryError with map jobs

2008-09-06 Thread Chris Douglas
From the stack trace you provided, your OOM is probably due to HADOOP-3931, which is fixed in 0.17.2. It occurs when the deserialized key in an outputted record exactly fills the serialization buffer that collects map outputs, causing an allocation as large as the size of that buffer. It ca

Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

2008-09-06 Thread Chris Douglas
FWIW: HADOOP-3940 is merged into the 0.18 branch and should be part of 0.18.1. -C On Sep 4, 2008, at 6:33 AM, Devaraj Das wrote: I started a profile of the reduce-task. I've attached the profiling output. It seems from the samples that ramManager.waitForDataToMerge() doesn't actually wait

Hadoop custom readers and writers

2008-09-06 Thread Amit Simgh
Hi, I have thousands of webpages, each represented as a serialized tree object, compressed (ZLIB) and appended together (file sizes varying from 2.5 GB to 4.5 GB). I have to do some heavy text processing on these pages. What is the best way to read/access these pages? Method1 *** 1) Write Custom

Re: Multiple output files

2008-09-06 Thread Owen O'Malley
On Sep 6, 2008, at 9:35 AM, Ryan LeCompte wrote: I have a question regarding multiple output files that get produced as a result of using multiple reduce tasks for a job (as opposed to only one). If I'm using a custom writable and thus writing to a sequence output, am I guaranteed that all of t
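
A small sketch of the mechanism usually behind that guarantee, assuming the job uses the default HashPartitioner of the old mapred API: the partition is a function of the key alone, so one key never spans two part files.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.HashPartitioner;

public class PartitionSketch {
  public static void main(String[] args) {
    HashPartitioner<Text, IntWritable> partitioner = new HashPartitioner<Text, IntWritable>();
    int numReduces = 4;
    // The partition depends only on the key, so every record with the same key
    // goes to the same reduce task and ends up in the same part-NNNNN file.
    System.out.println(partitioner.getPartition(new Text("apple"), new IntWritable(1), numReduces));
    System.out.println(partitioner.getPartition(new Text("apple"), new IntWritable(99), numReduces));
  }
}
```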

Re: Multiple output files

2008-09-06 Thread Ryan LeCompte
This clears up my concerns. Thanks! Ryan On Sep 6, 2008, at 2:17 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote: On Sep 6, 2008, at 9:35 AM, Ryan LeCompte wrote: I have a question regarding multiple output files that get produced as a result of using multiple reduce tasks for a job (as opp

Re: Multiple input files

2008-09-06 Thread Owen O'Malley
You can give a comma separated list of files and directories to the FileInputFormats, such as TextInputFormat. Directories are expanded one level, so dir1 becomes dir1/*, but not dir1/*/*. -- Owen
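
A minimal sketch of that with the old mapred API; the file and directory names are placeholders:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class InputPathSketch {
  public static void main(String[] args) {
    JobConf conf = new JobConf(InputPathSketch.class);
    conf.setInputFormat(TextInputFormat.class);

    // A comma-separated list of files and directories; each directory is
    // expanded one level (dir1 becomes dir1/*, not dir1/*/*).
    FileInputFormat.setInputPaths(conf, "file000,file001,dir1");

    // Additional paths can also be appended one at a time.
    FileInputFormat.addInputPath(conf, new Path("file002"));
  }
}
```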

RE: OutOfMemoryError with map jobs

2008-09-06 Thread Leon Mergen
Hello, > I'm currently developing a map/reduce program that emits a fair amount > of maps per input record (around 50 - 100), and I'm getting OutOfMemory > errors: Sorry for the noise, I found out I had to set the mapred.child.java.opts JobConf parameter to "-Xmx512m" to make 512MB of heap space
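
A minimal sketch of setting that parameter in a JobConf-based driver (the same property can also go into hadoop-site.xml); the class name is hypothetical:

```java
import org.apache.hadoop.mapred.JobConf;

public class ChildHeapSketch {
  public static void main(String[] args) {
    JobConf conf = new JobConf(ChildHeapSketch.class);
    // Passed as JVM options to every spawned map/reduce child task.
    conf.set("mapred.child.java.opts", "-Xmx512m");
  }
}
```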

Multiple output files

2008-09-06 Thread Ryan LeCompte
Hello, I have a question regarding multiple output files that get produced as a result of using multiple reduce tasks for a job (as opposed to only one). If I'm using a custom writable and thus writing to a sequence output, am I guaranteed that all of the data for a particular key will appear in a

OutOfMemoryError with map jobs

2008-09-06 Thread Leon Mergen
Hello, I'm currently developing a map/reduce program that emits a fair amount of maps per input record (around 50 - 100), and I'm getting OutOfMemory errors: 2008-09-06 15:28:08,993 ERROR org.apache.hadoop.mapred.pipes.BinaryProtocol: java.lang.OutOfMemoryError: Java heap space at org.

Re: is SecondaryNameNode in support for the NameNode?

2008-09-06 Thread Jean-Daniel Cryans
Yes, I agree that that page is confusing. There was a thread named "Confusing NameNodeFailover page in Hadoop Wiki" in August and some stuff was done (like the failover page was removed) but my guess is that there is still work to do. Since this is an open wiki, anyone can edit it (smile). J-D O

Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

2008-09-06 Thread Devaraj Das
These exceptions are apparently coming from the dfs side of things. Could someone from the dfs side please look at these? On 9/5/08 3:04 PM, "Espen Amble Kolstad" <[EMAIL PROTECTED]> wrote: > Hi, > > Thanks! > The patch applies without change to hadoop-0.18.0, and should be > included in a 0.18

Can we retrieve FS metadata from the slaves?

2008-09-06 Thread 叶双明
I know the NameNode is a single point of failure for the HDFS cluster. When the metadata in the NameNode is lost, all data in the filesystem is destroyed, without considering any backup of the metadata. Question: is it valuable to implement retrieving the metadata from the block reports from the slaves? just someth

Re: is SecondaryNameNode in support for the NameNode?

2008-09-06 Thread 叶双明
Actually, I have read this: The term "secondary name-node" is somewhat misleading. It is not a name-node in the sense that data-nodes cannot connect to the secondary name-node, and in no event it can replace the primary name-node in case of its failure. But today, I read another article in the

Re: Multiple input files

2008-09-06 Thread Ryan LeCompte
Hi Sayali, Yes, you can submit a collection of files from HDFS as input to the job. Please take a look at the WordCount example in the Map/Reduce tutorial for an example: http://hadoop.apache.org/core/docs/r0.18.0/mapred_tutorial.html#Example%3A+WordCount+v1.0 Ryan On Sat, Sep 6, 2008 at 9:03

Multiple input files

2008-09-06 Thread Sayali Kulkarni
Hello, When starting a Hadoop job, I need to specify an input file and an output file. Can I instead specify a list of input files? For example, I have the input distributed across the files: file000, file001, file002, file003, ... So can I specify the input files as file*? I can add all my files to HDFS.

Re: is SecondaryNameNode in support for the NameNode?

2008-09-06 Thread Jean-Daniel Cryans
Hi, See http://wiki.apache.org/hadoop/FAQ#7 and http://hadoop.apache.org/core/docs/r0.17.2/hdfs_user_guide.html#Secondary+Namenode Regards, J-D On Sat, Sep 6, 2008 at 5:26 AM, 叶双明 <[EMAIL PROTECTED]> wrote: > Hi all! > > The NameNode is a Single Point of Failure for the HDFS Cluster. There > i

is SecondaryNameNode in support for the NameNode?

2008-09-06 Thread 叶双明
Hi all! The NameNode is a single point of failure for the HDFS cluster. There is support for NameNodeFailover, with a SecondaryNameNode hosted on a separate machine being able to stand in for the original NameNode if it goes down. Is that right? Is the SecondaryNameNode meant to support the NameNode? S

Re: help! how can i control special data to specific datanode?

2008-09-06 Thread 叶双明
It is enough for you to know which directory in HDFS contains your index data, rather than which datanode. 2008/9/6, Jean-Daniel Cryans <[EMAIL PROTECTED]>: > > Hi, > > I suggest that you read how data is stored in HDFS, see > http://hadoop.apache.org/core/docs/r0.18.0/hdfs_design.html > > J-D