Re: Best practice for partial cartesian product

Sebastian Schelter Tue, 08 Apr 2014 01:38:23 -0700

Have a look at the sampleDown method in RowSimilarityJob:


https://svn.apache.org/viewvc/mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/RowSimilarityJob.java?view=markup

On 04/08/2014 10:33 AM, Reinis Vicups wrote:

Sebastian, thank you very much for your response.

Could you or anyone point me to the Mahout classes where this is being
solved?

thank you guys
reinis

On 08.04.2014 10:27, Sebastian Schelter wrote:

I don't know a good name for that. The problem is that a quadratic
number of pairs needs to be emitted here. In our collaborative
filtering code, we solve this through downsampling.
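One way to read "downsampling" here is capping how many values a single key may contribute before any pairs are built: with a cap of k, a key emits at most k(k-1)/2 pairs no matter how many values arrive. The sketch below uses reservoir sampling ("Algorithm R") to keep a uniform random sample of at most `maxValues` values. It illustrates the general idea under that assumption and is not the code of Mahout's `sampleDown`; the class name, method name, and `maxValues` parameter are hypothetical. To stay Hadoop-free it takes an `Iterator<Long>`, so in a reducer you would feed it the primitive values copied out of each `VarLongWritable`.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Random;
import java.util.stream.LongStream;

// Hypothetical helper, not part of Mahout: bounds the values kept per key.
final class DownsampleSketch {

    /**
     * Keeps a uniform random sample of at most maxValues longs (reservoir sampling).
     * Memory is bounded by maxValues, so building pairs from the result afterwards
     * emits at most maxValues * (maxValues - 1) / 2 pairs.
     */
    static long[] reservoirSample(Iterator<Long> values, int maxValues, Random random) {
        if (maxValues <= 0) {
            throw new IllegalArgumentException("maxValues must be positive");
        }
        long[] reservoir = new long[maxValues];
        int seen = 0;
        while (values.hasNext()) {
            long value = values.next();
            if (seen < maxValues) {
                // Fill the reservoir with the first maxValues values.
                reservoir[seen] = value;
            } else {
                // Keep the new value with probability maxValues / (seen + 1),
                // overwriting a uniformly chosen slot.
                int slot = random.nextInt(seen + 1);
                if (slot < maxValues) {
                    reservoir[slot] = value;
                }
            }
            seen++;
        }
        // Fewer than maxValues inputs: return only the filled part.
        return Arrays.copyOf(reservoir, Math.min(seen, maxValues));
    }

    public static void main(String[] args) {
        long[] sample = reservoirSample(LongStream.range(0, 1_000).boxed().iterator(), 10, new Random(42));
        int k = sample.length;
        // prints "10 values kept, at most 45 pairs"
        System.out.println(k + " values kept, at most " + ((long) k * (k - 1) / 2) + " pairs");
    }
}
```

The trade-off is that pairs involving dropped values are lost, which is usually acceptable for similarity estimates but not when every pair must be produced exactly.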

--sebastian

On 04/08/2014 10:08 AM, Reinis Vicups wrote:

Hi,

this is not a Mahout question directly, but I figured that you guys most
likely can answer it.

Actually I have two questions:

1. This: {(1,2); (1,3); (2,3)} is not the full Cartesian product, right? It
is missing (1,1); (2,2); (3,3); (2,1); ... My question is: what is it
called? Partial Cartesian? Asymmetric Cartesian?

2. If I try to build the product described above in a reducer, what
would be the best practice? My current code looks like this:

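     // Needs java.io.IOException, java.util.ArrayList, java.util.List and Guava's Longs.
     @Override
     public void reduce(final VarLongWritable key,
             final Iterable<VarLongWritable> values, final Context context)
             throws IOException, InterruptedException {

         // Hadoop reuses one VarLongWritable instance while iterating over
         // values, so copy the primitive longs instead of the Writable objects
         // (Iterables.toArray would collect many references to that one object).
         final List<Long> buffer = new ArrayList<>();
         for (final VarLongWritable value : values) {
             buffer.add(value.get());
         }
         final long[] valueArray = Longs.toArray(buffer);

         // Emit each pair (valueArray[i], valueArray[j]) with i < j exactly once.
         // customerPreferenceWritable is assumed to be a field of the reducer (not shown).
         for (int i = 0; i < valueArray.length; i++) {
             for (int j = i + 1; j < valueArray.length; j++) {
                 context.write(new PairWritable(valueArray[i], valueArray[j]),
                         customerPreferenceWritable);
             }
         }
     }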

I don't feel quite right about this solution, since I copy the values
into "valueArray", and I believe this will cause OutOfMemoryErrors with
larger data sets.
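On question 1: for values v_1, ..., v_n, the set of pairs (v_i, v_j) with i < j is the strict upper triangle of the Cartesian product S × S, which is the same as the 2-element combinations of S, so it has n(n-1)/2 elements. The standalone sketch below (class and method names are made up for illustration, and it uses a plain `long[]` instead of Hadoop Writables) enumerates those pairs and puts two sizes side by side: buffering n primitive longs costs roughly 8n bytes, whereas the number of emitted pairs grows quadratically, which is the issue Sebastian points to.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: same i < j loop as the reducer, without Hadoop types.
final class PairSketch {

    /** Returns every pair (values[i], values[j]) with i < j, in loop order. */
    static List<long[]> upperTrianglePairs(long[] values) {
        List<long[]> pairs = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            for (int j = i + 1; j < values.length; j++) {
                pairs.add(new long[] {values[i], values[j]});
            }
        }
        return pairs;
    }

    /** Number of pairs with i < j among n values: the binomial coefficient C(n, 2). */
    static long pairCount(long n) {
        return n * (n - 1) / 2;
    }

    /** Approximate size of a long[] holding n values (8 bytes each, array header ignored). */
    static long bufferBytes(long n) {
        return 8L * n;
    }

    public static void main(String[] args) {
        // prints (1,2), (1,3), (2,3), one per line
        for (long[] p : upperTrianglePairs(new long[] {1, 2, 3})) {
            System.out.println("(" + p[0] + "," + p[1] + ")");
        }
        long n = 100_000;
        // prints "800000 bytes buffered vs 4999950000 pairs emitted"
        System.out.println(bufferBytes(n) + " bytes buffered vs " + pairCount(n) + " pairs emitted");
    }
}
```

For 100,000 values under one key the buffer is under a megabyte, while the output is about five billion pairs, so bounding the values per key matters more than avoiding the copy.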

thanks and br
reinis
