Aaron,
I actually do something different from word count: I count all possible
contiguous phrases for every sentence in my corpus. So for instance, if I
have a sentence like "Hello world", my mappers emit:
Hello 1
world 1
Hello world 1
As you can easily see, for longer sentences the number of emitted phrases
grows quadratically with sentence length.
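The enumeration the mappers perform can be sketched in plain Java (Hadoop types omitted; the class and method names here are illustrative, not from the original job):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the phrase enumeration, without Hadoop types:
// emit every contiguous word sequence of the sentence once.
public class PhraseEnumerator {
    public static List<String> phrases(String sentence) {
        String[] words = sentence.split("\\s+");
        List<String> out = new ArrayList<>();
        for (int start = 0; start < words.length; start++) {
            StringBuilder sb = new StringBuilder();
            for (int end = start; end < words.length; end++) {
                if (end > start) sb.append(' ');
                sb.append(words[end]);
                out.add(sb.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A sentence of n words yields n*(n+1)/2 phrases.
        System.out.println(phrases("Hello world")); // [Hello, Hello world, world]
    }
}
```

This is where the quadratic blow-up comes from: a 40-word sentence already produces 820 phrases.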
Hmm. Check your math on the data set size. Your input corpus may be a few
(dozen, hundred) TB, but how many distinct words are there? The output data
set should be at least a thousand times smaller. If you've got the hardware
to do that initial word count step on a few TB of data, the second pass over
the much smaller output should be comparatively cheap.
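A rough back-of-envelope makes the size argument concrete (all numbers below are assumed for illustration, not taken from the thread):

```java
// Illustrative estimate: even a generous vocabulary makes the word-count
// output tiny compared to a multi-TB input corpus.
public class SizeEstimate {
    // Ratio of input size to word-count output size.
    static long ratio(long corpusBytes, long distinctWords, long bytesPerRecord) {
        return corpusBytes / (distinctWords * bytesPerRecord);
    }

    public static void main(String[] args) {
        long corpusBytes = 10L << 40;     // assume a 10 TB input corpus
        long distinctWords = 10_000_000L; // assume 10M distinct words
        long bytesPerRecord = 30;         // word text + count + record overhead
        // The output is tens of thousands of times smaller than the input.
        System.out.println(ratio(corpusBytes, distinctWords, bytesPerRecord));
    }
}
```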
Hello again,
I think I found an answer to my question. If I write a new
WritableComparable class that extends IntWritable and then override the
compareTo method, I can change the sort order from ascending to descending.
That will solve my problem of getting the top 100 most frequent words.
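The core of that idea can be sketched in plain Java without the Hadoop dependency (in the real job you would extend `IntWritable` and override `compareTo` the same way; the class name here is made up):

```java
import java.util.Arrays;

// Sketch of the inverted-comparison trick: a Comparable whose compareTo
// reverses the natural int ordering, so sorting yields descending values.
public class DescendingInt implements Comparable<DescendingInt> {
    final int value;
    DescendingInt(int value) { this.value = value; }

    @Override
    public int compareTo(DescendingInt other) {
        // Arguments swapped relative to the natural ordering.
        return Integer.compare(other.value, this.value);
    }

    public static void main(String[] args) {
        DescendingInt[] counts = {
            new DescendingInt(3), new DescendingInt(42), new DescendingInt(7)
        };
        Arrays.sort(counts); // 42, 7, 3
        for (DescendingInt c : counts) System.out.println(c.value);
    }
}
```

With counts as keys in this descending order, the first 100 records the reducer sees are the top 100 words.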
Hello,
I was wondering whether Hadoop provides thread-safe shared variables that
can be accessed from individual mappers/reducers, along with a proper
locking mechanism. To clarify, let's say that in the word count example I
want to know which word has the highest frequency and how many times it
occurs.
Hi Jim,
The ability to lock shared mutable state is a distinct anti-goal of the
MapReduce paradigm. One of the major benefits of writing MapReduce programs
is knowing that you don't have to worry about deadlock in your code. If
mappers could lock objects, then the failure and recovery semantics of
individual tasks would become far more complicated.
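The lock-free way to get the most frequent word is to let each mapper compute a purely local result and have a single reducer merge them. A plain-Java sketch of that pattern (names and setup are illustrative, not Hadoop API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Sketch: no shared "current maximum" variable. Each mapper finds its own
// local maximum; one reducer merges the per-mapper candidates. No locks.
public class MaxWithoutLocks {
    // Per-mapper step: most frequent word among this mapper's counts.
    static Map.Entry<String, Integer> localMax(Map<String, Integer> counts) {
        Map.Entry<String, Integer> best = null;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (best == null || e.getValue() > best.getValue()) best = e;
        }
        return best;
    }

    // Reducer step: merge all per-mapper candidates into the global maximum.
    static Map.Entry<String, Integer> globalMax(List<Map.Entry<String, Integer>> candidates) {
        Map.Entry<String, Integer> best = candidates.get(0);
        for (Map.Entry<String, Integer> c : candidates) {
            if (c.getValue() > best.getValue()) best = c;
        }
        return best;
    }

    public static void main(String[] args) {
        Map.Entry<String, Integer> m1 = localMax(Map.of("hello", 4, "world", 2));
        Map.Entry<String, Integer> m2 = localMax(Map.of("foo", 7, "bar", 1));
        System.out.println(globalMax(Arrays.asList(m1, m2))); // foo=7
    }
}
```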
Hi Aaron,
Thanks for the advice. I actually thought of using multiple combiners and a
single reducer, but I was worried that the key sorting phase would be wasted
effort for my purpose. If the input is just a bunch of (word, count) pairs
on the order of terabytes, wouldn't sorting be overkill?
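One way to avoid a full sort, sketched in plain Java (class and method names are illustrative): each combiner keeps only its local top N in a small min-heap, so the single reducer merges a few hundred candidates instead of sorting terabytes of pairs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Sketch: per-combiner top-N via a bounded min-heap, no global sort needed.
public class TopN {
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        // Min-heap ordered by count; the root is the smallest kept candidate.
        PriorityQueue<Map.Entry<String, Integer>> heap =
            new PriorityQueue<>((a, b) -> a.getValue() - b.getValue());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) heap.poll(); // evict the smallest
        }
        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        result.sort((a, b) -> b.getValue() - a.getValue()); // descending
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of("a", 5, "b", 9, "c", 2, "d", 7);
        System.out.println(topN(counts, 2)); // b=9, d=7
    }
}
```

Each combiner emits at most N pairs, so the reducer's merge is over (number of combiners) x N records, which is trivial compared to the raw input.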