This would be much, much faster if your set of IDs was simply a Set,
and you passed that to a filter() call that just filtered in the docs
that matched an ID in the set.
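To illustrate the suggestion above, here is a minimal sketch in plain Java (no Spark dependency) of filtering documents by membership in a hash set. The `Document` class, its `id` and `text` fields, and the `filterByIds` helper are hypothetical stand-ins, not taken from the original poster's code. On an RDD, the same predicate would presumably be passed to `filter()`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class DocFilter {
    // Hypothetical stand-in for the Document class mentioned in the question.
    static final class Document {
        final String id;
        final String text;

        Document(String id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    // Keep only the documents whose id is in wantedIds.
    // A HashSet lookup takes constant time on average, so the whole pass
    // is a single scan over the documents with no join or shuffle.
    static List<Document> filterByIds(List<Document> docs, Set<String> wantedIds) {
        return docs.stream()
                   .filter(doc -> wantedIds.contains(doc.id))
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Document> docs = Arrays.asList(
            new Document("doc-1", "first"),
            new Document("doc-2", "second"),
            new Document("doc-3", "third"));
        Set<String> wanted = new HashSet<>(Arrays.asList("doc-1", "doc-3"));

        for (Document d : filterByIds(docs, wanted)) {
            System.out.println(d.id);
        }
    }
}
```

This only works when the set of IDs is small enough to hold in memory on every worker, which is the assumption behind the suggestion.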
On Thu, Apr 16, 2015 at 4:51 PM, Wang, Ningjun (LNG-NPV)
ningjun.w...@lexisnexis.com wrote:
Does anybody have a solution for this?
From: Wang, Ningjun (LNG-NPV)
Sent: Tuesday, April 14, 2015 10:41 AM
To: user@spark.apache.org
Subject: How to join RDD keyValuePairs efficiently
I have an RDD that contains millions of Document objects. Each document has a
unique ID that is a string. I
You could try repartitioning your RDD using a custom partitioner
(HashPartitioner, etc.) and caching the dataset in memory to speed up the
joins.
Thanks
Best Regards
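As a rough illustration of what hash partitioning does, the plain-Java sketch below assigns each key to a partition by taking a non-negative modulo of its hash code, which is the general idea behind a hash partitioner. The class and method names (`HashPartitionSketch`, `partitionFor`, `partition`) are hypothetical and this is not Spark code. In Spark itself, the corresponding call on a pair RDD would be `partitionBy(new HashPartitioner(n))`. Because equal keys always map to the same partition, two datasets partitioned the same way can be joined partition by partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HashPartitionSketch {
    // Map a key to a partition index in [0, numPartitions).
    // Java's % can return a negative value for a negative hashCode,
    // so the result is shifted back into range.
    static int partitionFor(String key, int numPartitions) {
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    // Group keys into numPartitions buckets using partitionFor.
    static List<List<String>> partition(List<String> keys, int numPartitions) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partitionFor(key, numPartitions)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> docIds = Arrays.asList("doc-1", "doc-2", "doc-3", "doc-4");
        List<List<String>> buckets = partition(docIds, 2);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("partition " + i + ": " + buckets.get(i));
        }
    }
}
```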
(LNG-NPV) [mailto:ningjun.w...@lexisnexis.com]
Sent: Thursday, April 16, 2015 9:39 PM
To: user@spark.apache.org
Subject: RE: How to join RDD keyValuePairs efficiently
Evo
partition the large doc RDD based on the hash function on the key ie
the docid
What API to use to do this?
By the way, loading the entire dataset into memory causes an OutOfMemory problem
because it is too large (I only have one machine
Yes, simply look for partitionBy in the Javadoc, e.g. for JavaPairRDD.
From: Jeetendra Gangele [mailto:gangele...@gmail.com]
Sent: Thursday, April 16, 2015 9:57 PM
To: Evo Eftimov
Cc: Wang, Ningjun (LNG-NPV); user
Subject: Re: How to join RDD keyValuePairs efficiently
Does this same