Quick example problem that's stumping me:
* Users have one or more phone numbers and therefore one or more area codes.
* There are 100M users.
* States have one or more area codes.
* I would like to find the states for the users (as indicated by phone area
code).
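To make the desired result concrete, here is a tiny plain-Python sketch on made-up data. The user names, phone numbers, and the states_for_users helper are all hypothetical, and it assumes the area code is the first three digits of each number. At 100M users this would presumably have to become a join against the small area-code table rather than an in-memory loop.

```python
# Toy data only: names, numbers, and the helper below are illustrative assumptions.
user_phones = [
    ("alice", "415-555-0100"),
    ("alice", "212-555-0101"),
    ("bob", "415-555-0102"),
]

# Small lookup table: area code -> state.
area_code_to_state = {"415": "CA", "212": "NY"}


def states_for_users(user_phones, area_code_to_state):
    """Return {user: set of states}, using each phone's area code."""
    result = {}
    for user, phone in user_phones:
        area_code = phone[:3]  # assumes the number starts with the area code
        state = area_code_to_state.get(area_code)
        if state is not None:  # skip area codes missing from the lookup
            result.setdefault(user, set()).add(state)
    return result


user_states = states_for_users(user_phones, area_code_to_state)
print(user_states)
```

A user with numbers in several states ends up with several states, which is the shape of output I am after.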
I was thinking about something like this
(This question is also posted on StackOverflow:
http://stackoverflow.com/questions/30656083/spark-pyspark-errors-on-mysterious-missing-tmp-file)
I'm having issues with pyspark and a missing /tmp file. I've narrowed down
the behavior to a short snippet.
>>> a=sc.parallelize([(16646160,1)]) #