Have a look at the sampleDown method in RowSimilarityJob:
https://svn.apache.org/viewvc/mahout/trunk/core/src/main/java/org/apache/mahout/math/hadoop/similarity/cooccurrence/RowSimilarityJob.java?view=markup
On 04/08/2014 10:33 AM, Reinis Vicups wrote:
Sebastian, thank you very much for your response.
Could you or anyone else point me to the Mahout classes where this is
solved?
thank you guys
reinis
On 08.04.2014 10:27, Sebastian Schelter wrote:
I don't know a good name for that. The problem is that a quadratic
number of pairs needs to be emitted here. In our collaborative
filtering code, we solve this through downsampling.
--sebastian
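
A minimal sketch of the downsampling idea, in plain Java without Hadoop types. It is not Mahout's implementation: the class name PairDownsampler, the methods sample and pairs, and the maxSize parameter are all hypothetical, and reservoir sampling is swapped in here as one common way to cap the number of values per key. Capping the values at maxSize bounds both the memory used and the number of emitted pairs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Mahout.
final class PairDownsampler {

    /**
     * Keeps at most maxSize values from the stream, each with equal
     * probability (reservoir sampling). Assumes fewer than
     * Integer.MAX_VALUE values per key.
     */
    static long[] sample(Iterator<Long> values, int maxSize, Random random) {
        long[] reservoir = new long[maxSize];
        int seen = 0;
        while (values.hasNext()) {
            long value = values.next();
            if (seen < maxSize) {
                // Fill the reservoir with the first maxSize values.
                reservoir[seen] = value;
            } else {
                // Replace a random slot with probability maxSize / (seen + 1).
                int slot = random.nextInt(seen + 1);
                if (slot < maxSize) {
                    reservoir[slot] = value;
                }
            }
            seen++;
        }
        return seen < maxSize ? Arrays.copyOf(reservoir, seen) : reservoir;
    }

    /** Builds every unordered pair (i < j) of the sampled values. */
    static List<long[]> pairs(long[] sampled) {
        List<long[]> out = new ArrayList<>();
        for (int i = 0; i < sampled.length; i++) {
            for (int j = i + 1; j < sampled.length; j++) {
                out.add(new long[] {sampled[i], sampled[j]});
            }
        }
        return out;
    }
}
```

With maxSize = 100, for example, a key never produces more than 100 * 99 / 2 = 4950 pairs, no matter how many values arrive for it.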
On 04/08/2014 10:08 AM, Reinis Vicups wrote:
Hi,
this is not directly a Mahout question, but I figured that you guys can
most likely answer it.
Actually I have two questions:
1. This: {(1,2); (1,3); (2,3)} is not a full Cartesian product, right? It
is missing (1,1); (2,2); (3,3); (2,1); ... My question is: what is it
called? Partial Cartesian? Asymmetric Cartesian?
2. If I try to build the product I described above in a reducer, what
would be the best practice? My current code looks like this:
    @Override
    public void reduce(final VarLongWritable key, final Iterable<VarLongWritable> values,
            final Context context) throws IOException, InterruptedException {
        final VarLongWritable[] valueArray = Iterables.toArray(values, VarLongWritable.class);
        // Emit each unordered pair (i < j) exactly once.
        for (int i = 0; i < valueArray.length; i++) {
            for (int j = i + 1; j < valueArray.length; j++) {
                context.write(new PairWritable(valueArray[i].get(), valueArray[j].get()),
                        customerPreferenceWritable);
            }
        }
    }
I'm not quite comfortable with this solution, since I make a copy of
the values in "valueArray" and believe that it will cost me
OutOfMemoryErrors with larger data sets.
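
A minimal sketch of one way to make the copy cheaper, assuming the usual Hadoop behavior that the reducer hands back the same Writable instance for every value, so that storing references would leave an array full of identical objects. Everything here is hypothetical and Hadoop-free: MutableLong stands in for VarLongWritable, and reusingIterator simulates the reused instance. The idea is to copy only the primitive payload into a long[].

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical helper, not part of Hadoop or Mahout.
final class PrimitiveValueBuffer {

    /** Stand-in for a reused Writable such as VarLongWritable. */
    static final class MutableLong {
        private long value;
        void set(long value) { this.value = value; }
        long get() { return value; }
    }

    /** Simulates value reuse: one holder is mutated and returned for every value. */
    static Iterator<MutableLong> reusingIterator(final long... values) {
        final MutableLong shared = new MutableLong();
        return new Iterator<MutableLong>() {
            private int index = 0;
            @Override public boolean hasNext() { return index < values.length; }
            @Override public MutableLong next() {
                shared.set(values[index++]);
                return shared;
            }
        };
    }

    /** Copies the primitive payload of each holder into a compact, growable long[]. */
    static long[] copyValues(Iterator<MutableLong> values) {
        long[] buffer = new long[16];
        int size = 0;
        while (values.hasNext()) {
            if (size == buffer.length) {
                buffer = Arrays.copyOf(buffer, size * 2);
            }
            buffer[size++] = values.next().get();
        }
        return Arrays.copyOf(buffer, size);
    }

    /** Number of unordered pairs (i < j) over n values: n * (n - 1) / 2. */
    static long pairCount(long n) {
        return n * (n - 1) / 2;
    }
}
```

This still holds all values of one key in memory, only at 8 bytes per value instead of one object each. The number of pairs stays quadratic (1000 values give 499500 pairs), so a hard bound needs something like the downsampling Sebastian mentions.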
thanks and br
reinis