Shared thread safe variables?

2008-12-24 Thread Jim Twensky
Hello, I was wondering if Hadoop provides thread safe shared variables that can be accessed from individual mappers/reducers along with a proper locking mechanism. To clarify things, let's say that in the word count example, I want to know the word that has the highest frequency and how many

Re: Shared thread safe variables?

2008-12-24 Thread Aaron Kimball
Hi Jim, The ability to perform locking of shared mutable state is a distinct anti-goal of the MapReduce paradigm. One of the major benefits of writing MapReduce programs is knowing that you don't have to worry about deadlock in your code. If mappers could lock objects, then the failure and

Re: How to coordinate nodes of different computing powers in a same cluster?

2008-12-24 Thread Aaron Kimball
Jeremy, A clarification: there is currently no mechanism in Hadoop to slot particular tasks on particular nodes. Hadoop does not take into account a particular node's suitability for a given task; if one node has more CPU, and another node has more IO, you cannot indicate that certain tasks

Re: How to coordinate nodes of different computing powers in a same cluster?

2008-12-24 Thread Devaraj Das
On 12/24/08 3:20 PM, Aaron Kimball aa...@cloudera.com wrote: Jeremy, A clarification: there is currently no mechanism in Hadoop to slot particular tasks on particular nodes. Hadoop does not take into account a particular node's suitability for a given task; if one node has more CPU, and

Re: Architecture question.

2008-12-24 Thread Steve Loughran
aakash shah wrote: We can assume that this record has only one key-value mapping. The value will be updated every minute. Currently we have 1 million of these (key-value) pairs, but I have to make sure that we can scale up to 10 million of these (key-value) pairs. Every 10 minutes I will be

Re: Architecture question.

2008-12-24 Thread tim robertson
I would also consider a DB for this... 10M and 2 columns is not a lot of data so I would look to have it in memory with some DB index or memory hash for querying. (We are keeping the indexes of tables with 150M records, 30M and 10M and joining between them with around 25 indexes on the 150M table

How to run projects via eclipse plugin?

2008-12-24 Thread Raşit Özdaş
Hi, After struggling for a few days with the Eclipse plugin (v. 0.19.0), I finally succeeded with the DFS tree, but I still can't run projects via Eclipse. There is a diff file, which Tvrtko Bedekovic has suggested on the following page:

Re: issues with hadoop in AIX

2008-12-24 Thread ps40
Hi, I saw that a fix was created for this issue. Were you able to run hadoop on AIX after this? We are in a similar situation and are wondering if hadoop will work on AIX and Solaris. Thanks Arun Venugopal-2 wrote: Hi, I am evaluating Hadoop's Portability Across Heterogeneous Hardware

Re: Shared thread safe variables?

2008-12-24 Thread Jim Twensky
Hi Aaron, Thanks for the advice. I actually thought of using multiple combiners and a single reducer, but I was worried that the key sorting phase would be a waste for my purpose. If the input is just a bunch of (word, count) pairs on the order of terabytes, wouldn't sorting be overkill?
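
The idea raised in this message (each combiner keeps only its local maximum, and a single reducer picks the global one) can be sketched in plain Python, without Hadoop. This is only an illustration under stated assumptions: the names `local_max` and `global_max` and the sample partitions are hypothetical, and it assumes every word's counts have already been summed to a single total, as in the output of a word count job. If a word's counts were still split across partitions, a local maximum would not be reliable.

```python
from typing import Iterable, Optional, Tuple

Pair = Tuple[str, int]  # (word, total count)


def local_max(pairs: Iterable[Pair]) -> Optional[Pair]:
    """Combiner-like step: keep only the highest-count pair of one partition."""
    best: Optional[Pair] = None
    for word, count in pairs:
        # On a tie, the pair seen first is kept.
        if best is None or count > best[1]:
            best = (word, count)
    return best


def global_max(candidates: Iterable[Optional[Pair]]) -> Optional[Pair]:
    """Single-reducer-like step: pick the winner among the per-partition candidates."""
    return local_max(c for c in candidates if c is not None)


# Hypothetical input: three partitions of already-summed (word, count) pairs.
partitions = [
    [("the", 120), ("hadoop", 45)],
    [("of", 98), ("map", 30)],
    [],  # an empty partition contributes no candidate
]

candidates = [local_max(p) for p in partitions]
result = global_max(candidates)
print(result)  # ('the', 120)
```

Because each partition forwards at most one candidate, the final step only has to handle one record per partition, so whatever sorting happens before it is applied to a very small input.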

Re: issues with hadoop in AIX

2008-12-24 Thread Brian Bockelman
Hey, I can attest that Hadoop works on Solaris 10 just fine. Brian On Dec 24, 2008, at 10:26 AM, ps40 wrote: Hi, I saw that a fix was created for this issue. Were you able to run hadoop on AIX after this? We are in a similar situation and are wondering if hadoop will work on AIX and

Having trouble accessing MapFiles in the DistributedCache

2008-12-24 Thread Sean Shanny
To all, Version: hadoop-0.17.2.1-core.jar I created a MapFile on a local node. I put the files into the HDFS using the following commands: $ bin/hadoop fs -copyFromLocal /tmp/ur/data/2008-12-19/url/data $ bin/hadoop fs -copyFromLocal /tmp/ur/index /2008-12-19/url/index and placed them