Re: How to efficiently join this two complicated rdds

2014-02-19 Thread Eugen Cepoi
ion?
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-efficiently-join-this-two-complicated-rdds-tp1665p1749.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: How to efficiently join this two complicated rdds

2014-02-19 Thread Eugen Cepoi
it costed by the DAG
> information?
> Or by some variable related to the collect function?

Re: How to efficiently join this two complicated rdds

2014-02-19 Thread hanbo
RDD1 doesn't change. Thank you for your advice; we will have a look at how ALS iterates.

Re: How to efficiently join this two complicated rdds

2014-02-19 Thread hanbo

Re: How to efficiently join this two complicated rdds

2014-02-19 Thread Guillaume Pitel
Actually, even without the skewness problem, the solution I've proposed is really not efficient, since it generates a lot of data. Since what you have is very close to a sparse matrix * sparse vector computation, in my opinion you should split your data into blocks
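To illustrate the blocking idea, here is a minimal plain-Python sketch (no Spark involved). It assumes that each line of RDD one acts as a sparse matrix row with an implicit weight of 1 per key, that RDD two acts as a sparse vector, and that the per-line result is the sum of the matched values. The message above is cut off, so the function name `blocked_spmv`, the `block_size` parameter, and the sample data are all hypothetical.

```python
from collections import defaultdict

def blocked_spmv(rows, vector, block_size):
    """Blocked sparse matrix * sparse vector (illustrative sketch only).

    rows:   dict line_id -> list of int keys (nonzero columns, weight 1 assumed)
    vector: dict key -> value (zero values are simply absent)
    """
    # Split the vector into blocks by key range.
    vec_blocks = defaultdict(dict)
    for key, value in vector.items():
        vec_blocks[key // block_size][key] = value

    # Split every row into fragments, one per key block it touches.
    row_blocks = defaultdict(list)
    for line_id, keys in rows.items():
        per_block = defaultdict(list)
        for key in keys:
            per_block[key // block_size].append(key)
        for block_id, block_keys in per_block.items():
            row_blocks[block_id].append((line_id, block_keys))

    # Each block needs only its own slice of the vector; partial sums
    # are then combined per line.
    totals = {line_id: 0 for line_id in rows}
    for block_id, fragments in row_blocks.items():
        local = vec_blocks.get(block_id, {})
        for line_id, block_keys in fragments:
            totals[line_id] += sum(local.get(k, 0) for k in block_keys)
    return totals

# Hypothetical sample data: key 7 has no stored value, so it counts as zero.
rows = {"L1": [1, 3, 7], "L2": [3, 5]}
vector = {1: 11, 3: 33, 5: 55}
result = blocked_spmv(rows, vector, block_size=4)
```

In a distributed setting the block id would presumably serve as the join key, so each row fragment meets only the matching slice of the vector instead of one record per (key, line) pair.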

Re: How to efficiently join this two complicated rdds

2014-02-18 Thread hanbo
("3",("L2",33)) ("5",("L2",55))
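Records of this shape, (key, (line id, value)), are what a flatten-then-join approach would produce. The snippet is truncated, so how they were actually generated is not shown. The plain-Python sketch below is only an assumption about that approach; `flatten_and_join`, `sum_per_line`, and the contents of line `L2` are hypothetical.

```python
from collections import defaultdict

def flatten_and_join(lines, values):
    """Emit one (key, (line_id, value)) record per key occurrence.

    lines:  dict line_id -> list of keys (one line of the first data set)
    values: dict key -> value (the second data set)
    """
    # Flatten: index line ids by the keys they contain.
    line_ids_by_key = defaultdict(list)
    for line_id, keys in lines.items():
        for key in keys:
            line_ids_by_key[key].append(line_id)

    # Join on key; keys without a stored value produce no record.
    joined = []
    for key, line_ids in line_ids_by_key.items():
        if key in values:
            for line_id in line_ids:
                joined.append((key, (line_id, values[key])))
    return joined

def sum_per_line(joined):
    """Collapse joined records to one number per line (sum is assumed)."""
    totals = defaultdict(int)
    for _key, (line_id, value) in joined:
        totals[line_id] += value
    return dict(totals)

# Hypothetical input chosen to reproduce the two records quoted above.
lines = {"L2": ["3", "5"]}
values = {"3": 33, "5": 55}
records = flatten_and_join(lines, values)
per_line = sum_per_line(records)
```

Note that the flatten step creates one record per key occurrence, so the intermediate data grows with the total number of keys across all lines.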

Re: How to efficiently join this two complicated rdds

2014-02-18 Thread zhaoxw12

Re: How to efficiently join this two complicated rdds

2014-02-18 Thread Eugen Cepoi
4472 5950 7276 7368 14670 14671 13078 14673 14674
> > 1 153 258 2240 4486 5953 7276 7368 7678 12683 13096 14673 14674
> > ...
> > Type two RDD: a set of (key, value).
> >
> > The problem we want to solve:
> > For each line in RDD one, we need to use the keys of t

Re: How to efficiently join this two complicated rdds

2014-02-18 Thread Guillaume Pitel
number of the keys in one line of type one RDD is about 50. The size of RDD one file is about 10GB. The biggest number of key in RDD two is about 450, and we wil

Re: How to efficiently join this two complicated rdds

2014-02-18 Thread hanbo

Re: How to efficiently join this two complicated rdds

2014-02-18 Thread hanbo
rdd of size RDD1 and containing a number per line?
Yes
Thank you again. This problem has puzzled us for several days...

Re: How to efficiently join this two complicated rdds

2014-02-18 Thread Eugen Cepoi
he (key, value) if value is zero.
> > And maybe the type one RDD has many occurrences of key 1 but only a few of key 15877.
> >
> > We want to find a fast way to solve this problem.
> > Sincerely thanks
> >
> > Bo Han

How to efficiently join this two complicated rdds

2014-02-17 Thread hanbo
://apache-spark-user-list.1001560.n3.nabble.com/How-to-efficiently-join-this-two-complicated-rdds-tp1665.html Sent from the Apache Spark User List mailing list archive at Nabble.com.