Re: Serving contents of large MapFiles/SequenceFiles from memory across many machines

2008-09-18 Thread Miles Osborne
Hello Chris! (Assuming you are talking about serving language models and/or phrase tables.) I had a student look at using HBase for LMs this summer. I don't think it is sufficiently quick to deal with millions of queries per second, but that may be due to blunders on our part. It may be possible that

RE: Need help in hdfs configuration fully distributed way in Mac OSX...

2008-09-18 Thread souravm
Hi Mafish, Thanks for your suggestions. Finally I could resolve the issue. The *-site.xml on the namenode had fs.default.name set to localhost, whereas on the data nodes it was the actual IP. I changed localhost to the actual IP on the namenode and it started working. Regards, Sourav -Original
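For reference, a minimal sketch of the setting in question, in the 0.18-era hadoop-site.xml style (the address and port here are hypothetical; fs.default.name must resolve identically on the namenode and every datanode):

```xml
<!-- hadoop-site.xml, same on every node: point fs.default.name at the
     namenode's routable address, not localhost -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- hypothetical namenode address; substitute your namenode's real IP/hostname -->
    <value>hdfs://192.168.1.10:9000</value>
  </property>
</configuration>
```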

Hadoop tracing

2008-09-18 Thread Naama Kraus
Hi, I am looking for information in the area of Hadoop tracing, instrumentation, benchmarking and so forth. What utilities exist? What's their maturity? Where can I get more info about them? I am curious about statistics on Hadoop behavior (per typical workload? different workloads?). I am

Re: scp to namenode faster than dfs put?

2008-09-18 Thread Steve Loughran
[EMAIL PROTECTED] wrote: thanks for the replies. So it looks like replication might be the real overhead when compared to scp. Makes sense, but there's no reason why you couldn't have the first node you copy the data up to continue and pass that data on to the other nodes. If it's in the same rack,

Re: scp to namenode faster than dfs put?

2008-09-18 Thread Prasad Pingali
On Thursday 18 September 2008 04:12:13 pm Steve Loughran wrote: [EMAIL PROTECTED] wrote: thanks for the replies. So looks like replication might be the real overhead when compared to scp. Makes sense, but there's no reason why you couldn't have first node you copy up the data to, continue

Re: custom writable class

2008-09-18 Thread Shengkai Zhu
Your custom implementation of any interface from hadoop-core should be archived together with the application (i.e. in the same jar). And the jar will be added to the CLASSPATH of the task runner, so your custom Writable class can be found. On Thu, Sep 18, 2008 at 8:09 PM, Deepak Diwakar [EMAIL
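Packaging aside, a custom Writable boils down to a write/readFields pair. A self-contained sketch of that pattern follows; the class and field names are illustrative, and the real class would implement org.apache.hadoop.io.Writable, whose contract is exactly these two methods:

```java
import java.io.*;

// Sketch of the custom-Writable pattern. In a real job this class would
// declare "implements Writable" (org.apache.hadoop.io.Writable).
public class PairWritableSketch {
    private int id;
    private String label;

    // Writables need a no-arg constructor so the framework can instantiate them.
    public PairWritableSketch() {}
    public PairWritableSketch(int id, String label) { this.id = id; this.label = label; }

    // Serialize fields in a fixed order...
    public void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(label);
    }

    // ...and deserialize them in exactly the same order.
    public void readFields(DataInput in) throws IOException {
        id = in.readInt();
        label = in.readUTF();
    }

    public static void main(String[] args) throws IOException {
        // Round-trip through a byte buffer, as the framework does between tasks.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new PairWritableSketch(42, "hello").write(new DataOutputStream(buf));
        PairWritableSketch copy = new PairWritableSketch();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(copy.id + " " + copy.label);
    }
}
```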

Re: custom writable class

2008-09-18 Thread Shengkai Zhu
You can refer to the Hadoop Map-Reduce Tutorial. On Thu, Sep 18, 2008 at 8:40 PM, Shengkai Zhu [EMAIL PROTECTED] wrote: Your custom implementation of any interface from hadoop-core should be archived together with the application (i.e. in the same jar). And the jar will be added to the CLASSPATH

Re: custom writable class

2008-09-18 Thread chanel
Where can you find the Hadoop Map-Reduce Tutorial? Shengkai Zhu wrote: You can refer to the Hadoop Map-Reduce Tutorial On Thu, Sep 18, 2008 at 8:40 PM, Shengkai Zhu [EMAIL PROTECTED] wrote: Your custom implementation of any interface from hadoop-core should be archived together with the

Re: scp to namenode faster than dfs put?

2008-09-18 Thread James Moore
Isn't one of the features of replication a guarantee that when my write finishes, I know there are N replicas written? Seems like if you want the quicker behavior, you write with replication set to 1 for that file, then change the replication count when you're finished. -- James Moore | [EMAIL
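The trick James describes can be done from the shell; a sketch assuming the 0.18-era FsShell (the paths are hypothetical):

```
# Upload with a single replica, so the client pipeline writes only one copy...
hadoop fs -D dfs.replication=1 -put bigfile.dat /data/bigfile.dat

# ...then raise the replication factor and let the cluster re-replicate
# in the background (add -w to block until the target count is reached)
hadoop fs -setrep 3 /data/bigfile.dat
```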

Re: scp to namenode faster than dfs put?

2008-09-18 Thread Raghu Angadi
James Moore wrote: Isn't one of the features of replication a guarantee that when my write finishes, I know there are N replicas written? This is what happens normally, but it is not a guarantee. When there are errors, data might be written to fewer replicas. Raghu. Seems like if you want

[ANNOUNCE] Hadoop release 0.18.1 available

2008-09-18 Thread Nigel Daley
Release 0.18.1 fixes 9 critical bugs in 0.18.0. For Hadoop release details and downloads, visit: http://hadoop.apache.org/core/releases.html Hadoop 0.18.1 Release Notes are at http://hadoop.apache.org/core/docs/r0.18.1/releasenotes.html Thanks to all who contributed to this release! Nigel

slow copy makes reduce hang

2008-09-18 Thread Rong-en Fan
Hi, I'm using 0.17.2.1 and see a reduce hang in the shuffle phase due to an unresponsive node. From the reduce log (sorry that I didn't keep it around), it was stuck copying map output from a dead node (I cannot ssh to that one). At that point, all maps were already finished. I'm wondering why this

Example code for map-side join

2008-09-18 Thread Stuart Sierra
Hello all, Does anyone have some working example code for doing a map-side (inner) join? The documentation at http://tinyurl.com/43j5pp is less than enlightening... Thanks, -Stuart
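In the absence of a posted answer, here is a hedged configuration sketch of the 0.18-era org.apache.hadoop.mapred.join API; the paths are hypothetical, and both inputs must be sorted by key and partitioned identically for the join to work:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.join.CompositeInputFormat;

// Configuration sketch for an inner map-side join (old mapred.join API).
public class MapSideJoinConfigSketch {
    public static JobConf configure() {
        JobConf conf = new JobConf(MapSideJoinConfigSketch.class);
        conf.setInputFormat(CompositeInputFormat.class);
        conf.set("mapred.join.expr", CompositeInputFormat.compose(
                "inner", KeyValueTextInputFormat.class,
                new Path("/data/left"), new Path("/data/right")));
        // Each map() call then receives the join key and a TupleWritable
        // holding one value per joined source.
        return conf;
    }
}
```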

Re: streaming question

2008-09-18 Thread Karl Anderson
On 16-Sep-08, at 1:25 AM, Christian Ulrik Søttrup wrote: OK, I've tried what you suggested and all sorts of combinations, with no luck. Then I went through the source of the streaming lib. It looks like it checks for the existence of the combiner while it is building the jobconf, i.e. before

Re: custom writable class

2008-09-18 Thread Shengkai Zhu
Here is the link http://hadoop.apache.org/core/docs/current/mapred_tutorial.html On Thu, Sep 18, 2008 at 9:16 PM, chanel [EMAIL PROTECTED] wrote: Where can you find the Hadoop Map-Reduce Tutorial? Shengkai Zhu wrote: You can refer to the Hadoop Map-Reduce Tutorial On Thu, Sep 18, 2008 at

Re: slow copy makes reduce hang

2008-09-18 Thread Rong-en Fan
Replying to myself: I'm using streaming and the task timeout was set to 0, so that's why. On Fri, Sep 19, 2008 at 3:34 AM, Rong-en Fan [EMAIL PROTECTED] wrote: Hi, I'm using 0.17.2.1 and see a reduce hang in shuffle phase due to an unresponsive node. From the reduce log (sorry that I didn't keep

Data corruption when using Lzo Codec

2008-09-18 Thread Alex Feinberg
Hello, I am running a custom crawler (written internally) using Hadoop streaming. I am attempting to compress the output using LZO, but instead I am receiving corrupted output that is neither in the format I am aiming for nor a valid LZO-compressed file. Is this a known issue? Is there anything I am
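For comparison, these are the streaming flags commonly used in the 0.17/0.18 line to enable LZO output compression (a sketch; whether LzoCodec is present depends on your build, and the -input/-output/-mapper arguments are placeholders):

```
hadoop jar hadoop-streaming.jar \
  -jobconf mapred.output.compress=true \
  -jobconf mapred.output.compression.codec=org.apache.hadoop.io.compress.LzoCodec \
  -input <in> -output <out> -mapper <cmd>
```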

Re: slow copy makes reduce hang

2008-09-18 Thread Rong-en Fan
This time, I set the task timeout to 10m via -jobconf mapred.task.timeout=60 However, I still see this hang at the shuffle stage, and lots of messages like the one below appear in the log: 2008-09-19 12:34:02,289 INFO org.apache.hadoop.mapred.ReduceTask: task_200809190308_0007_r_01_1 Need 6 map output(s)
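Note that mapred.task.timeout is specified in milliseconds (and 0 disables the timeout entirely, which explains the earlier hang), so a ten-minute timeout in the streaming syntax used above would be:

```
-jobconf mapred.task.timeout=600000   # 10 minutes, in milliseconds; 0 disables the timeout
```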

how to get the filenames stored in dfs as the key

2008-09-18 Thread komagal meenakshi
Hi everybody, can anyone please help me with how to get the input filename in DFS as the key in the output? Example: [filename, value]
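One common answer in the 0.17/0.18 mapred API is to read the per-split "map.input.file" property in the mapper's configure() method. A hedged sketch (the class name is illustrative):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Sketch: emit (filename, line) pairs by reading the per-split
// "map.input.file" property (old 0.17/0.18 mapred API).
public class FilenameKeyMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
    private final Text filename = new Text();

    public void configure(JobConf job) {
        // Set once per task; each split comes from exactly one file.
        filename.set(job.get("map.input.file"));
    }

    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, Text> out, Reporter reporter)
            throws IOException {
        out.collect(filename, line);
    }
}
```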

Re: OutOfMemory Error

2008-09-18 Thread Edward J. Yoon
The key is of the form ID:DenseVector representation in Mahout. I guess the vector size is too large, so it'll need a distributed vector architecture (or 2D partitioning strategies) for large-scale matrix operations. The Hama team is investigating these problem areas. So, it will be improved if

Re: scp to namenode faster than dfs put?

2008-09-18 Thread Prasad Pingali
Even if writes are happening in parallel from a single machine, wouldn't network congestion cause a slowdown due to packet collisions? - Prasad. On Thursday 18 September 2008 10:47:48 pm Raghu Angadi wrote: Steve Loughran wrote: [EMAIL PROTECTED] wrote: thanks for the replies. So looks

RE: OutOfMemory Error

2008-09-18 Thread Palleti, Pallavi
Yeah, that was the problem. And Hama can surely be useful for large-scale matrix operations. But for this problem, I modified the code to pass just the ID information and read the vector information only when it is needed; in this case, it was needed only in the reducer phase. This way,