>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-efficiently-join-this-two-complicated-rdds-tp1665p1749.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
it costed by the DAG information?
Or by some variable related to the collect function?
RDD1 doesn't change.
Thank you for your advice; we will have a look at how ALS iterates.
Actually, even without the skew problem, the solution I proposed is
really not efficient, since it generates a lot of data. What you have
is very close to a sparse matrix * sparse vector computation, so in my
opinion you should split your data into blocks
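
To make the blocking idea concrete, here is a minimal sketch in plain Python, not Spark code. It rests on several assumptions that are not in the thread: the per-line operation is cut off in the original post, so the sketch simply sums the values found for each line's keys; keys are assumed to be non-negative integers; zero values are dropped; and `blocked_lookup` and `block_size` are hypothetical names. Each block id stands in for what would be one partition in a cluster.

```python
from collections import defaultdict


def blocked_lookup(lines, pairs, block_size):
    """Sum, per line, the values of the keys on that line, block by block.

    lines: list of lists of integer keys (the "type one" data).
    pairs: iterable of (key, value) tuples (the "type two" data).
    block_size: width of each key range (hypothetical tuning parameter).
    """
    # Split the sparse vector into blocks of keys, dropping zero values
    # (assumption: a zero value contributes nothing to the result).
    vector_blocks = defaultdict(dict)
    for key, value in pairs:
        if value != 0:
            vector_blocks[key // block_size][key] = value

    # Split each line (a sparse matrix row) by the same key ranges.
    row_blocks = defaultdict(list)
    for row_id, keys in enumerate(lines):
        for key in keys:
            row_blocks[key // block_size].append((row_id, key))

    # Each block is joined locally; partial sums are merged per row.
    totals = defaultdict(int)
    for block_id, entries in row_blocks.items():
        local_values = vector_blocks.get(block_id, {})
        for row_id, key in entries:
            if key in local_values:
                totals[row_id] += local_values[key]
    return dict(totals)


result = blocked_lookup(
    lines=[[1, 153, 258], [1, 2240]],
    pairs=[(1, 10), (153, 2), (258, 5), (2240, 0)],
    block_size=100,
)
```

The point of the design is that a line's keys only ever meet the (key, value) pairs of the same key range, so no step has to materialise every (line, key, value) combination at once. A heavily used key still lands in a single block, so skew would need separate handling.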
("3",("L2",33))
("5",("L2",55))
4472 5950 7276 7368 14670 14671 13078 14673 14674
>
> 1 153 258 2240 4486 5953 7276 7368 7678 12683 13096 14673 14674
>
> ...
>
> Type two RDD: a set of (key, value).
>
>
>
> The problem we want to solve:
>
> For each line in RDD one, we need to use the keys of t
number of the keys in one line of type one RDD is about 50. The size of
RDD one file is about 10GB.
The biggest number of key in RDD two is about 450, and we wil
rdd of size RDD1 and containing a number
per line?
Yes
Thank you again. This problem has puzzled us for several days...
he (key, value) if value is zero.
>
> Also, the type one RDD may contain key 1 very often but key 15877 only a few times.
>
>
>
> We want to find a fast way to solve this problem.
>
> Sincerely thanks
>
>
> Bo Han