Re: How do I generate a histogram?

2011-05-09 Thread Hadooper
I agree. 1. Read each line. 2. Tokenize the string; grab the 'key' and form a pair (not the first token, since you don't care about it). 3. The result is one pair per line. On Tue, May 10, 2011 at 4:22 AM, Soren Flexner wrote: > It's word count. The mapper takes f(value) and outputs that as the key, > with 1
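The steps above can be sketched as a Hadoop Streaming-style mapper in Python. This is a minimal sketch, not the poster's actual code: the tab-separated input format and the identity-style f(value) are assumptions — substitute your real f.

```python
import sys

def f(value):
    # Hypothetical f(value): just the integer itself here (an assumption;
    # replace with your real function).
    return int(value)

def map_line(line):
    # Step 1-2: tokenize, take the value (the key, the first token, is ignored).
    tokens = line.strip().split("\t")
    value = tokens[1]
    # Step 3: emit the word-count-style pair (f(value), 1), one per line.
    return "%d\t1" % f(value)

if __name__ == "__main__":
    for line in sys.stdin:
        if line.strip():
            print(map_line(line))
```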

Re: How do I generate a histogram?

2011-05-09 Thread Alex Kozlov
It looks like the TotalOrderPartitioner will still work, but you'll need to provide a custom Sampler. If you know the distribution in advance, then you don't even need to sample: just write the partitio
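When the distribution is known in advance, the custom partitioner reduces to routing each key into a fixed bin range. A minimal sketch of that binning logic (plain Python rather than the actual Hadoop `Partitioner` API; the cut points are made up for illustration):

```python
import bisect

# Hypothetical upper-bound cut points, chosen from the known distribution
# so each reducer receives roughly the same number of records.
# 4 partitions: < 10, 10-99, 100-999, >= 1000.
CUT_POINTS = [10, 100, 1000]

def get_partition(key, num_partitions):
    # Mirrors what a custom Partitioner.getPartition() would compute:
    # keys go to reducers in sorted ranges, so the reducer output files
    # concatenate into a totally ordered histogram.
    p = bisect.bisect_right(CUT_POINTS, key)
    return min(p, num_partitions - 1)
```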

Re: How do I generate a histogram?

2011-05-09 Thread W.P. McNeill
In general the approximate distribution may not be known in advance, though for the problem I'm currently working on it is. If it is known in advance, I can just create binning in a custom partitioner, right? I was looking at the TotalOrderPartitioner and sampling documentation, but I wasn't sure

Re: How do I generate a histogram?

2011-05-09 Thread Alex Kozlov
Is the approx. distribution of f(value) known in advance? Or you can sample and use TotalOrderPartitioner. On Mon, May 9, 2011 at 2:11 PM, W.P. McNeill wrote: > Oops, I forgot to descr

Where should combiner processes write temporary sequence files?

2011-05-09 Thread W.P. McNeill
I have a reducer that creates a temporary sequence file. I am generating this file by calling SequenceFileOutputFormat.getPathForWorkFile() and passing the result into SequenceFile.createWriter(). Now I also want to use the same reducer code as a combiner. SequenceFileOutputFormat.getPathForWorkF

Re: How do I generate a histogram?

2011-05-09 Thread W.P. McNeill
Oops, I forgot to describe the full extent of what I'm trying to do. Obviously a histogram is just a word count. However, I'm also trying to generate the histogram in order by bins. I want to be able to use more than one reducer, so I'll need to do a total ordering. But I wasn't sure of how to w

Re: How do I generate a histogram?

2011-05-09 Thread Alex Kozlov
You can also just use Hive/Pig to get the answers if you code the UDFs: select f(value), count(1) from your_table group by f(value) Something similar in Pig: a = LOAD 'your_data.txt' AS (value:int); b = FOREACH a GENERATE f($0); c = GROUP b BY $0; d = FOREACH c GENERATE group, COUNT(b); dump d;

RE: RE: Re:Problem debugging MapReduce job under Windows

2011-05-09 Thread Iwona Bialynicka-Birula
It turns out that this problem is specific to a beta Cloudera Hadoop build I was using: 0.20.2-737. It does not reproduce in CDH3u0, so all is good now. Thanks for all the suggestions, Iwona -Original Message- From: Iwona Bialynicka-Birula [mailto:iwona...@atigeo.com] Sent: Monday, May

Behavioral targeting

2011-05-09 Thread sagar kohli
Hi all, Cheers to Hadoop. I was trying to understand behavioral targeting for large data; though it is well explained in *this document*, it would be better if an example were provided with the document. Has someone worked, or is someone working, on it

Re: How do I generate a histogram?

2011-05-09 Thread Soren Flexner
It's word count. The mapper takes f(value) and outputs that as the key, with 1 as the value. The reducer outputs the key (i.e. f(value)) as the key and the sum of all the 1's as the value. You should be able to just tweak WordCount.java to get what you want -s On May 9, 2011, at 1:12 PM, "W.P. M
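In Hadoop Streaming form, the matching reducer just sums the 1's per key. A minimal Python sketch (it assumes Hadoop's shuffle has already sorted the mapper output by key, which it guarantees; the tab-separated line format is an assumption carried over from the mapper side):

```python
import sys
from itertools import groupby

def reduce_lines(lines):
    # Input lines look like "f(value)\t1", sorted by key.
    pairs = (line.strip().split("\t") for line in lines if line.strip())
    # Sum the counts for each run of identical keys.
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield "%s\t%d" % (key, total)

if __name__ == "__main__":
    for out in reduce_lines(sys.stdin):
        print(out)
```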

How do I generate a histogram?

2011-05-09 Thread W.P. McNeill
I have a set of (key, value) pairs. For each value there is a function f(value) that returns an integer. I want to generate a histogram over f(value) for my data set. For example, representing the values as [f(value)]: if I have the data set key1, [3]; key2, [4]; key3, [3]; key4, [5], I'd want to pro
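Setting the distributed machinery aside for a moment, the result being asked for on the sample data above is just a count of f(value) occurrences per bin — a sketch in plain Python:

```python
from collections import Counter

# The example data set from above, as key -> f(value).
data = {"key1": 3, "key2": 4, "key3": 3, "key4": 5}

# Histogram over f(value): bin -> number of keys falling in that bin.
histogram = Counter(data.values())
print(sorted(histogram.items()))  # -> [(3, 2), (4, 1), (5, 1)]
```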

Re: Behavioral targeting

2011-05-09 Thread Daniel McEnnis
Sagar, You should look at mahout.apache.org. The kind of stuff you're looking for is there. Daniel. On Mon, May 9, 2011 at 6:51 AM, Sagar Kohli wrote: > Hi all, > > Cheers to Hadoop, I was trying to understand Behavioral targeting for large > data, though it is well explained in this > documen

Re: So many unexpected "Lost task tracker" errors making the job to be killed Options

2011-05-09 Thread Shantian Purkad
I have been seeing this a lot on my cluster as well. This typically happens for me if there are many maps (more than 5000) in a job. Here is my cluster summary: 342316 files and directories, 94294 blocks = 436610 total. Heap Size is 258.12 MB / 528 MB (48%) Configured Capacity: 26.57 TB DFS

RE: RE: Re:Problem debugging MapReduce job under Windows

2011-05-09 Thread Iwona Bialynicka-Birula
I do have Cygwin on the path, but in this case, it is not using chmod to set permissions, but File.setReadable. Below is what RawLocalFileSystem.setPermissions looks like. It only uses chmod (hence Cygwin) if group != other (lines 490-494), but this is not the case for 700 permissions, so it do

Announcing Seal 0.1.0: BWA alignment on Hadoop

2011-05-09 Thread Luca Pireddu
Hello everyone. If you're working on short DNA read alignment, then you may be interested in this message. We've just released Seal (http://biodoop-seal.sourceforge.net/), a Hadoop-based distributed short read alignment and analysis toolkit. Currently SEAL includes tools for: read alignment

Re: NoSuchMethodError while calling a DAO method from Reducer

2011-05-09 Thread Harsh J
Hey Abhay, Ensure that you do not have other versions of your additional jars on the TT's classpath. Those will interfere with the job's supplied ones. On Mon, May 9, 2011 at 1:27 PM, abhay ratnaparkhi wrote: > I checked the class file and it's having the new methods. > It seems that previous class file fo

Re: NoSuchMethodError while calling a DAO method from Reducer

2011-05-09 Thread abhay ratnaparkhi
I found that XML files are getting cached somewhere while running an MR task. Even if my recently submitted job has new XML files (for ibatis queries), it's using XML files from a previously submitted job. On Mon, May 9, 2011 at 1:27 PM, abhay ratnaparkhi < abhay.ratnapar...@gmail.com> wrote: > > I ch

So many unexpected "Lost task tracker" errors making the job to be killed Options

2011-05-09 Thread Marc Sturlese
Hey there, I have a small cluster running on 0.20.2. Everything is fine, but once in a while, when a job with a lot of map tasks is running, I start getting the error: Lost task tracker: tracker_cluster1:localhost.localdomain/127.0.0.1:x Before getting the error, the task attempt has been r

Re: NoSuchMethodError while calling a DAO method from Reducer

2011-05-09 Thread abhay ratnaparkhi
I checked the class file and it has the new methods. It seems that the previous class file for the DAO is getting cached somewhere. Does Hadoop cache some files in a job? Abhay On Tue, May 3, 2011 at 7:22 PM, abhay ratnaparkhi < abhay.ratnapar...@gmail.com> wrote: > I'm trying to run MR task to p

Which datanode serves the data for MR

2011-05-09 Thread Matthew John
Hi all, I wanted to know details such as: in an MR job, which tasktracker (node-level) works on data (an input split) from which datanode (node-level)? Can some logs provide data on this? Or do I need to print this data myself — if yes, what to print and how? Thanks, Matthew