Re: PySpark joins fail - please help

2014-10-17 Thread Russell Jurney
There was a bug in the devices line: the key dh.index('id') should have been x[dh.index('id')].
On Fri, Oct 17, 2014 at 5:52 PM, Russell Jurney wrote:
> Is that not exactly what I've done in j3/j4? The keys are identical strings. The k is the same, the value in both instances is an associative array.
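For anyone hitting the same thing, a minimal runnable sketch of the before/after (dh and the field names come from the snippets below in this thread; the sample records are made up):

from pyspark import SparkContext

sc = SparkContext("local", "devices-key-fix")

# Header giving the field order of each raw record (from the thread).
dh = ['id', 'foo', 'bar']
raw = sc.parallelize([('dev1', 1, 2), ('dev2', 3, 4)])  # made-up sample rows

# Buggy: the key was dh.index('id'), i.e. the constant 0 for every record,
# so all rows shared one key and the join could never match per-device ids.
# bad = raw.map(lambda x: (dh.index('id'), {...}))

# Fixed: key each record by its actual id value, x[dh.index('id')].
devices = raw.map(lambda x: (x[dh.index('id')],
                             {'deviceid': x[dh.index('id')],
                              'foo': x[dh.index('foo')],
                              'bar': x[dh.index('bar')]}))

print(devices.collect())
# [('dev1', {'deviceid': 'dev1', 'foo': 1, 'bar': 2}),
#  ('dev2', {'deviceid': 'dev2', 'foo': 3, 'bar': 4})]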

Re: PySpark joins fail - please help

2014-10-17 Thread Russell Jurney
Is that not exactly what I've done in j3/j4? The keys are identical strings. The k is the same, the value in both instances is an associative array.

devices = devices.map(lambda x: (dh.index('id'),
                                 {'deviceid': x[dh.index('id')],
                                  'foo': x[dh.index('foo')],
                                  'bar': x[dh.index('bar')]}))
bytes_in_out
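The snippet is cut off at bytes_in_out, but the shape it describes (string key, dict value) is exactly what join() accepts once the key bug noted above is fixed. A sketch, assuming bytes_in_out is a second RDD keyed by the same device id (all names and records here are hypothetical):

from pyspark import SparkContext

sc = SparkContext("local", "dict-values-join")

# join() only inspects the key; a dict is a perfectly legal value.
devices = sc.parallelize([
    ('dev1', {'deviceid': 'dev1', 'foo': 1, 'bar': 2}),
])
# Hypothetical stand-in for the truncated bytes_in_out RDD.
bytes_in_out = sc.parallelize([
    ('dev1', {'bytes_in': 100, 'bytes_out': 200}),
])

print(devices.join(bytes_in_out).collect())
# [('dev1', ({'deviceid': 'dev1', 'foo': 1, 'bar': 2},
#            {'bytes_in': 100, 'bytes_out': 200}))]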

Re: PySpark joins fail - please help

2014-10-17 Thread Davies Liu
Hey Russell,

join() can only work with an RDD of pairs (key, value), such as:

rdd1: (k, v1)
rdd2: (k, v2)

rdd1.join(rdd2) will be (k, (v1, v2))

Spark SQL will be more useful for you, see http://spark.apache.org/docs/1.1.0/sql-programming-guide.html

Davies

On Fri, Oct 17, 2014 at 5:01 PM, Russell Jurney wrote:
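A runnable illustration of that pair-RDD contract (toy data):

from pyspark import SparkContext

sc = SparkContext("local", "join-contract")

rdd1 = sc.parallelize([('a', 1), ('b', 2)])    # (k, v1)
rdd2 = sc.parallelize([('a', 10), ('b', 20)])  # (k, v2)

# join() matches rows on the key and nests the two values in a tuple.
print(rdd1.join(rdd2).collect())
# [('a', (1, 10)), ('b', (2, 20))]  (order may vary)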

PySpark joins fail - please help

2014-10-17 Thread Russell Jurney
https://gist.github.com/rjurney/fd5c0110fe7eb686afc9

Any way I try to join my data fails. I can't figure out what I'm doing wrong.
--
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com