Re: Duplicated entries with map job reading from HBase

2010-11-09 Thread Adam Phelps
That had been my initial thought; however, dumping the data from the hbase shell found only single entries. Further, in the experiment I ran yesterday (generating the data to a new table as well as the old one), the entries being created should have been identical for each table. The only thing I

Re: Predicting how many values will I see in a call to reduce?

2010-11-09 Thread Owen O'Malley
On Sun, Nov 7, 2010 at 5:38 AM, Anthony Urso wrote: > Is there any way to know how many values I will see in a call to > reduce without first counting through them all with the iterator? No, there currently isn't. The framework doesn't have the information until the iterator is exhausted. The i
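
The point above, that the number of values is only known once the iterator has been exhausted, can be illustrated with a minimal sketch. This is not Hadoop code: `one_pass` is a hypothetical stand-in for the framework's values iterator, and `count_and_buffer` is an illustrative helper name. It assumes the values for one key fit in memory, which may not hold for large groups.

```python
from typing import Iterable, Iterator, List, Tuple


def count_and_buffer(values: Iterable[str]) -> Tuple[int, List[str]]:
    """Consume a one-pass stream and return its length plus a buffered copy."""
    # The single pass over the stream happens here; the count is only
    # known after every value has been read.
    buffered = list(values)
    return len(buffered), buffered


def one_pass(items: List[str]) -> Iterator[str]:
    """Hypothetical stand-in for a reduce-side values iterator (usable once)."""
    for item in items:
        yield item


if __name__ == "__main__":
    count, values = count_and_buffer(one_pass(["a", "b", "c"]))
    print(count, values)
```

Buffering trades memory for the ability to know the count before processing the values; when a key may have too many values to hold in memory, this sketch does not apply.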

AW: Duplicated entries with map job reading from HBase

2010-11-09 Thread Biedermann,S.,Fa. Post Direkt
Hi Adam, Is it possible that you have duplicate entries in your old table (two entries for the same (column family, column, timestamp) tuple)? Sven -Original Message- From: Adam Phelps [mailto:a...@opendns.com] Sent: Tuesday, 9 November 2010 01:30 To: mapreduce-user@hadoop.apa

Question regarding the location in a map reduce function

2010-11-09 Thread Markus Pilman
Hi all, I am very new to Hadoop and I have a question regarding MapReduce that seems like it should be quite simple: I wanted to implement a simple MapReduce job to create an inverted index over a list of files on my DFS. The code is quite easy, but I cannot figure out how I get to the exact lo

Re: Yahoo Open Source Real-Time MapReduce

2010-11-09 Thread Jeff Zhang
Yes, "stream processing" would be more accurate than "real-time". On Tue, Nov 9, 2010 at 6:36 PM, Bibek Paudel wrote: > On Tue, Nov 9, 2010 at 10:49 AM, Jeff Zhang wrote: >> Not sure whether this has been posted on this mailing list, but I strongly >> feel I should tell everyone here about "Yahoo Open Sou

Re: Yahoo Open Source Real-Time MapReduce

2010-11-09 Thread Bibek Paudel
On Tue, Nov 9, 2010 at 10:49 AM, Jeff Zhang wrote: > Not sure whether this has been posted on this mailing list, but I strongly > feel I should tell everyone here about "Yahoo Open Source Real-Time > MapReduce". See http://s4.io/ for more details. > > And thanks again to Yahoo for its contribution to the open source

Re: Yahoo Open Source Real-Time MapReduce

2010-11-09 Thread Niels Basjes
Hi, I had a quick read through the site and I have perhaps "the" question: Assume I have an existing Hadoop MapReduce application written in Java. How hard is it to make it run in real time using this new tool? 2010/11/9 Jeff Zhang > Not sure whether this has been posted on this mailing list, but I s

Yahoo Open Source Real-Time MapReduce

2010-11-09 Thread Jeff Zhang
Not sure whether this has been posted on this mailing list, but I strongly feel I should tell everyone here about "Yahoo Open Source Real-Time MapReduce". See http://s4.io/ for more details. And thanks again to Yahoo for its contribution to the open source world. -- Best Regards Jeff Zhang

Re: How to increase data processed by one JVM instance

2010-11-09 Thread Harsh J
Hi, On Tue, Nov 9, 2010 at 2:55 PM, Jyothish Soman wrote: > Hello, > I wanted to know how to increase the data processed by a single JVM > instance. What options are needed for this, and where should I set them? What exactly do you mean by increasing the "data processed" part? In case you're runni

How to increase data processed by one JVM instance

2010-11-09 Thread Jyothish Soman
Hello, I wanted to know how to increase the data processed by a single JVM instance. What options are needed for this, and where should I set them? Regards, Jyothish Soman