many-to-many join

2015-07-21 Thread John Berryman
Quick example problem that's stumping me: * Users have 1 or more phone numbers and therefore one or more area codes. * There are 100M users. * States have one or more area codes. * I would like to find the states for the users (as indicated by phone area code). I was thinking about something like this
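The question above is a many-to-many join through a shared key (area code). The sketch below is a minimal illustration in plain Python, not the poster's code. It assumes the data arrive as `(user_id, area_code)` and `(state, area_code)` pairs, and the function name `states_for_users` is hypothetical. The small state table is indexed by area code (the build side of a hash join), and the large user table is streamed past it.

```python
from collections import defaultdict


def states_for_users(user_area_codes, state_area_codes):
    """Hypothetical helper: map each user to the set of states of their area codes.

    user_area_codes: iterable of (user_id, area_code), one pair per phone number.
    state_area_codes: iterable of (state, area_code), one pair per state/area code.
    """
    # Build side: index the small states table by area code.
    area_to_states = defaultdict(set)
    for state, area_code in state_area_codes:
        area_to_states[area_code].add(state)

    # Probe side: stream the (large) user table and look up each area code.
    # A user whose area codes match no state ends up with an empty set.
    result = defaultdict(set)
    for user_id, area_code in user_area_codes:
        result[user_id].update(area_to_states.get(area_code, ()))
    return dict(result)


users = [("alice", "512"), ("alice", "212"), ("bob", "312")]
states = [("TX", "512"), ("NY", "212"), ("IL", "312")]
print(states_for_users(users, states))
```

With 100M users, the same idea in Spark would key both sides by area code. Because the area-code-to-state table is tiny, one option is to ship it to every executor with `sc.broadcast` and do the lookup inside a `map`, rather than shuffling all the user records through an RDD `join`. This is a design suggestion, not something confirmed in the thread.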

Fwd: Spark/PySpark errors on mysterious missing /tmp file

2015-06-12 Thread John Berryman
(This question is also present on StackOverflow http://stackoverflow.com/questions/30656083/spark-pyspark-errors-on-mysterious-missing-tmp-file ) I'm having issues with pyspark and a missing /tmp file. I've narrowed down the behavior to a short snippet. >>> a=sc.parallelize([(16646160,1)]) #