Whoops, I should have mentioned that it's a multivariate median (cf
http://www.pnas.org/content/97/4/1423.full.pdf ). It's easy to compute
when all the values are accessible at once. I'm not sure it's possible
with a combiner. So, I guess the question should be: "Can I use
GraphX's Pregel without a combiner?"

On Wed, Apr 23, 2014 at 7:01 PM, Tom Vacek <minnesota...@gmail.com> wrote:
> Here are some out-of-the-box ideas:  If the elements lie in a fairly small
> range and/or you're willing to work with limited precision, you could use
> counting sort.  Moreover, you could iteratively find the median using
> bisection, which would be associative and commutative.  It's easy to think
> of improvements that would make this approach give a reasonable answer in a
> few iterations.  I have no idea about mixing algorithmic iterations with
> median-finding iterations.
>
>
> On Wed, Apr 23, 2014 at 8:20 PM, Ryan Compton <compton.r...@gmail.com>
> wrote:
>>
>> I'm trying shoehorn a label propagation-ish algorithm into GraphX. I
>> need to update each vertex with the median value of their neighbors.
>> Unlike PageRank, which updates each vertex with the mean of their
>> neighbors, I don't have a simple commutative and associative function
>> to use for mergeMsg.
>>
>> What are my options? It looks like I can choose between:
>>
>> 1. a hacky mergeMsg (i.e. combine a,b -> Array(a,b) and then do the
>> median in vprog)
>> 2. collectNeighbors and then median
>> 3. ignore GraphX and just do the whole thing with joins (which I
>> actually got working, but its slow)
>>
>> Is there another possibility that I'm missing?
>
>

Reply via email to