Re: [infinispan-dev] MapReduce limitations and suggestions.

2014-02-18 Thread Radim Vansa
Hi Etienne, how does the requirement that all data be provided to the Reducer as a whole work for distributed caches? There you'd get only a subset of the whole mapped set on each node (afaik each node maps its entries locally and performs a reduction before executing the global reduction). Or are
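Radim's question turns on whether a reduction still gives the right answer when it is first applied to each node's subset. A minimal sketch with made-up values (not from the thread) shows the difference: a sum can be reduced per node and then globally, while a median cannot, so a job computing a median needs the whole value list in one place.

```python
from statistics import median

# Hypothetical intermediate values for a single key, split across two nodes
node1_values = [1, 2, 3]
node2_values = [10, 20]

# Sum is associative: reducing per node, then globally, matches a single reduction
sum_two_stage = sum([sum(node1_values), sum(node2_values)])
sum_single = sum(node1_values + node2_values)

# Median is not: reducing each node's subset first gives a different answer
median_two_stage = median([median(node1_values), median(node2_values)])
median_single = median(node1_values + node2_values)

print(sum_two_stage, sum_single)        # 36 36
print(median_two_stage, median_single)  # 8.5 3
```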

Re: [infinispan-dev] MapReduce limitations and suggestions.

2014-02-18 Thread Dan Berindei
Radim, this is how our M/R algorithm works (Hadoop may do it differently):
* The mapping phase generates a Map<IntKey, Collection<IntValue>> on each node (Int meaning intermediate).
* In the combine (local reduce) phase, a combine operation takes as input an IntKey and a Collection<IntValue> with only
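The phases Dan lists can be sketched in a few lines of Python, using word count as a stand-in job. The node layout, the data and the function names (map_phase, combine, reduce_phase) are hypothetical and are not Infinispan's API; the sketch only illustrates that combine sees one node's values for an IntKey, while the global reduce sees the combined values from every node.

```python
from collections import defaultdict

# Hypothetical input: each node holds its own local entries (key -> text)
nodes = {
    "node1": {"k1": "a b a", "k2": "b c"},
    "node2": {"k3": "a c c"},
}

def map_phase(entries):
    """Map: build this node's IntKey -> [IntValue] collection."""
    intermediate = defaultdict(list)
    for _key, text in entries.items():
        for word in text.split():
            intermediate[word].append(1)
    return intermediate

def combine(int_key, local_values):
    """Combine (local reduce): sees only this node's values for int_key."""
    return sum(local_values)

def reduce_phase(int_key, values):
    """Global reduce: sees the combined values from every node for int_key."""
    return sum(values)

# Map and combine run on each node independently
combined_per_node = []
for entries in nodes.values():
    intermediate = map_phase(entries)
    combined_per_node.append({k: combine(k, v) for k, v in intermediate.items()})

# Collect each IntKey's combined values from all nodes
grouped = defaultdict(list)
for combined in combined_per_node:
    for k, v in combined.items():
        grouped[k].append(v)

result = {k: reduce_phase(k, vs) for k, vs in grouped.items()}
print(result)  # {'a': 3, 'b': 2, 'c': 3}
```

Using sum for both combine and reduce works only because addition is associative; a job whose reduction is not associative would have to skip the combine step.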

Re: [infinispan-dev] MapReduce limitations and suggestions.

2014-02-18 Thread Marcelo Pasin
On 18/Feb/2014, at 10:59, Dan Berindei dan.berin...@gmail.com wrote: I think Hadoop only loads a block of intermediate values in memory at once, and can even sort the intermediate values (with a user-supplied comparison function) so that the reduce function can work on a sorted list without
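The idea Dan describes (sorted blocks of intermediate values, ordered by a user-supplied comparison function and consumed by the reducer as one sorted stream) can be mimicked in Python. This is a sketch of the concept only, not Hadoop's implementation; in Hadoop, ordering values is usually arranged through custom comparators (the secondary sort pattern). The runs, the `descending` comparator and the helper names are hypothetical.

```python
import heapq
from functools import cmp_to_key
from typing import Callable, Iterable, Iterator, List

def descending(a: int, b: int) -> int:
    """A user-supplied comparison function: negative when a should come before b."""
    return b - a

def merged_values(runs: List[List[int]], cmp: Callable[[int, int], int]) -> Iterator[int]:
    """Lazily merge runs that are each already sorted by `cmp` into one sorted stream.

    heapq.merge keeps only the current head of each run, not every value.
    """
    return heapq.merge(*runs, key=cmp_to_key(cmp))

def count_distinct_sorted(values: Iterable[int]) -> int:
    """A reducer over a sorted stream: it only remembers the previous value."""
    count = 0
    previous = object()  # sentinel that never equals a real value
    for v in values:
        if v != previous:
            count += 1
            previous = v
    return count

# Hypothetical sorted blocks (stand-ins for spilled runs) of values for one key
runs = [[9, 5, 5, 1], [7, 5, 2]]
print(list(merged_values(runs, descending)))                   # [9, 7, 5, 5, 5, 2, 1]
print(count_distinct_sorted(merged_values(runs, descending)))  # 5
```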

Re: [infinispan-dev] MapReduce limitations and suggestions.

2014-02-18 Thread Evangelos Vazaios
Hi Radim, Since Hadoop is the most popular implementation of MapReduce I will give a brief overview of how it works and then I'll provide an example where the reducers must run over the whole list of values with the same key. Hadoop MR overview. MAP 1) Input file(s) are split into pieces
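The property Evangelos relies on (each reducer receives every value for its keys) comes from partitioning map output by key and then sorting each reducer's input by key. The sketch below applies the formula of Hadoop's default HashPartitioner, `(hash & Integer.MAX_VALUE) % numReduceTasks`, to a Python port of Java's `String.hashCode()`. That hash is an assumption for illustration: Hadoop `Text` keys compute their hash differently, so real partition numbers would differ. The data and helper names are hypothetical, and the median stands in for a reducer that needs the whole value list.

```python
from collections import defaultdict
from itertools import groupby
from statistics import median

def java_string_hash(s: str) -> int:
    """Java's String.hashCode() (h = 31*h + c) as a signed 32-bit int; exact for BMP characters."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

def partition(key: str, num_reducers: int) -> int:
    """Same formula as Hadoop's default HashPartitioner."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers

# Hypothetical (IntKey, IntValue) pairs emitted by several mappers
map_output = [("x", 5), ("y", 1), ("x", 1), ("z", 4), ("y", 9), ("x", 3)]
NUM_REDUCERS = 2

# Shuffle: every pair for a given key goes to the same reducer
reducer_inputs = defaultdict(list)
for key, value in map_output:
    reducer_inputs[partition(key, NUM_REDUCERS)].append((key, value))

results = {}
for pairs in reducer_inputs.values():
    pairs.sort(key=lambda kv: kv[0])  # sort phase: bring equal keys together
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        values = [v for _, v in group]
        results[key] = median(values)  # a median needs the key's whole value list

print(results)
```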

Re: [infinispan-dev] MapReduce limitations and suggestions.

2014-02-18 Thread Dan Berindei
On Tue, Feb 18, 2014 at 12:21 PM, Evangelos Vazaios vag...@gmail.com wrote: Hi Radim, Since Hadoop is the most popular implementation of MapReduce I will give a brief overview of how it works and then I'll provide an example where the reducers must run over the whole list of values with

Re: [infinispan-dev] Design change in Infinispan Query

2014-02-18 Thread Adrian Nistor
Well, OGM and Infinispan are different species :) So, Infinispan being what it is today - a non-homogeneous, schema-less KV store, without support for entity associations (except embedding) - which simplifies the whole thing a lot, should we or should we not provide transparent

Re: [infinispan-dev] MapReduce limitations and suggestions.

2014-02-18 Thread Evangelos Vazaios
On 02/18/2014 01:40 PM, Dan Berindei wrote: On Tue, Feb 18, 2014 at 12:21 PM, Evangelos Vazaios vag...@gmail.com wrote: Hi Radim, Since Hadoop is the most popular implementation of MapReduce I will give a brief overview of how it works and then I'll provide an example where the

[infinispan-dev] Introducing Infinispan OData server: Remote JSON documents querying

2014-02-18 Thread Tomas Sykora
Hello all! :) It's the right time to make it a little bit more public and share some results of the work on the Infinispan OData server, finally! This solution can serve as a proof of concept where we are able to remotely query JSON documents stored in Infinispan caches, using an industry standard

Re: [infinispan-dev] Design change in Infinispan Query

2014-02-18 Thread Emmanuel Bernard
On Tue 2014-02-18 14:02, Adrian Nistor wrote: Well, OGM and Infinispan are different species :) So, Infinispan being what it is today - a non-homogeneous, schema-less KV store, without support for entity associations (except embedding) - which simplifies the whole thing a lot, should we or

Re: [infinispan-dev] Design change in Infinispan Query

2014-02-18 Thread Sanne Grinovero
On 18 February 2014 13:01, Emmanuel Bernard emman...@hibernate.org wrote: On Tue 2014-02-18 14:02, Adrian Nistor wrote: Well, OGM and Infinispan are different species :) So, Infinispan being what it is today - a non-homogeneous, schema-less KV store, without support for entity associations

Re: [infinispan-dev] MapReduce limitations and suggestions.

2014-02-18 Thread Radim Vansa
Thanks a lot for these explanations, guys (Dan and Evangelos); I was confused by the nomenclature in Hadoop/Infinispan vs. the wiki/something I learned in the past. I was considering M/R to be

node1 | node2 |
------|-------|
K1,V1 | K2,V2 |
K3,V3 | K4,V4 |
| | |

Re: [infinispan-dev] MapReduce limitations and suggestions.

2014-02-18 Thread Dan Berindei
On Tue, Feb 18, 2014 at 2:17 PM, Evangelos Vazaios vag...@gmail.com wrote: On 02/18/2014 01:40 PM, Dan Berindei wrote: On Tue, Feb 18, 2014 at 12:21 PM, Evangelos Vazaios vag...@gmail.com wrote: Hi Radim, Since Hadoop is the most popular implementation of MapReduce I will give a

Re: [infinispan-dev] Design change in Infinispan Query

2014-02-18 Thread Emmanuel Bernard
On Tue 2014-02-18 13:27, Sanne Grinovero wrote: On 18 February 2014 13:01, Emmanuel Bernard emman...@hibernate.org wrote: On Tue 2014-02-18 14:02, Adrian Nistor wrote: There were some points raised previously like /if you search for more than one cache transparently, then you probably need

Re: [infinispan-dev] MapReduce limitations and suggestions.

2014-02-18 Thread Evangelos Vazaios
On 02/18/2014 04:39 PM, Dan Berindei wrote: On Tue, Feb 18, 2014 at 2:17 PM, Evangelos Vazaios vag...@gmail.com wrote: On 02/18/2014 01:40 PM, Dan Berindei wrote: On Tue, Feb 18, 2014 at 12:21 PM, Evangelos Vazaios vag...@gmail.com wrote: Hi Radim, Since Hadoop is the most popular

Re: [infinispan-dev] MapReduce limitations and suggestions.

2014-02-18 Thread Vladimir Blagojevic
On 2/18/2014, 4:59 AM, Dan Berindei wrote: The limitation we have now is that in the reduce phase, the entire list of values for one intermediate key must be in memory at once. I think Hadoop only loads a block of intermediate values in memory at once, and can even sort the intermediate
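The limitation Dan points out is about materializing the value list, not about the reduce logic itself: a reducer that consumes its values as an iterator can run in memory that does not grow with the number of values, provided the framework feeds it block by block. A minimal Python sketch, assuming a hypothetical block-wise source (`values_from_blocks`) in place of intermediate data read from disk; neither function is part of Infinispan or Hadoop.

```python
from typing import Iterator

def reduce_mean(values: Iterator[float]) -> float:
    """Consumes values one at a time; memory use does not grow with their number."""
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    if count == 0:
        raise ValueError("no values for this key")
    return total / count

def values_from_blocks(num_blocks: int, block_size: int) -> Iterator[float]:
    """Hypothetical source yielding intermediate values block by block (e.g. read from disk)."""
    for b in range(num_blocks):
        block = [float(b * block_size + i) for i in range(block_size)]  # only this block is built
        yield from block

# One million values for a key; the full list of values is never built
print(reduce_mean(values_from_blocks(1000, 1000)))  # 499999.5
```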

Re: [infinispan-dev] MapReduce limitations and suggestions.

2014-02-18 Thread Evangelos Vazaios
On 02/18/2014 05:36 PM, Vladimir Blagojevic wrote: On 2/18/2014, 4:59 AM, Dan Berindei wrote: The limitation we have now is that in the reduce phase, the entire list of values for one intermediate key must be in memory at once. I think Hadoop only loads a block of intermediate values in

Re: [infinispan-dev] MapReduce limitations and suggestions.

2014-02-18 Thread Dan Berindei
On Tue, Feb 18, 2014 at 5:46 PM, Evangelos Vazaios vag...@gmail.com wrote: On 02/18/2014 05:36 PM, Vladimir Blagojevic wrote: On 2/18/2014, 4:59 AM, Dan Berindei wrote: The limitation we have now is that in the reduce phase, the entire list of values for one intermediate key must be in