
Re: Fuzzy GroupBy

2015-03-26 Thread Sean Owen

The grouping is determined by the POJO's equals() method. You can also call groupBy() to group by some function of the POJOs. For example, if you're grouping Doubles into nearly-equal bunches, you could group by their .intValue().

On Thu, Mar 26, 2015 at 8:47 PM, Mihran Shahinian wrote:
> I would l
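
The bucketing idea in the reply above can be sketched in plain Java, without a Spark dependency so that it runs on its own. This is only an illustration under assumptions: the class name FuzzyGroupBySketch and the helpers bucketOf and groupByBucket are hypothetical, and java.util.stream's Collectors.groupingBy stands in for the RDD operation. On a JavaRDD the analogous step would be to pass the same kind of key function to groupBy().

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class FuzzyGroupBySketch {

    // Hypothetical key function: maps each Double to a coarse bucket.
    // Values with the same integer part share a bucket.
    static Integer bucketOf(Double value) {
        return value.intValue();
    }

    // Groups the values by their bucket key. A TreeMap keeps the buckets
    // sorted so the printed result is deterministic.
    static Map<Integer, List<Double>> groupByBucket(List<Double> values) {
        return values.stream()
                .collect(Collectors.groupingBy(
                        FuzzyGroupBySketch::bucketOf,
                        TreeMap::new,
                        Collectors.toList()));
    }

    public static void main(String[] args) {
        List<Double> values = Arrays.asList(1.1, 1.9, 2.05, 2.5, 7.0);
        // prints {1=[1.1, 1.9], 2=[2.05, 2.5], 7=[7.0]}
        System.out.println(groupByBucket(values));
    }
}
```

Note that 1.9 and 2.05 are close in value but land in different buckets. Grouping by a derived key only approximates similarity, because every record must map to exactly one key; records near a bucket boundary are split even when they are nearly equal.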

Fuzzy GroupBy

2015-03-26 Thread Mihran Shahinian

I would like to group records, but instead of grouping on the exact key I want to compute the similarity of keys on my own. Is there a recommended way of doing this? Here is my starting point:

    final JavaRDD<pojo> records = spark.parallelize(getListofPojos()).cache();
    class pojo { String