Re: createDataFrame causing a strange error.

2016-11-29 Thread Andrew Holway
Hi Marco, I was not able to find out what was causing the problem, but a "git stash" seems to have fixed it :/ Thanks for your help... :)
On Mon, Nov 28, 2016 at 10:50 PM, Marco Mistroni wrote:
> Hi Andrew,
> sorry but to me it seems s3 is the culprit
> I have

Re: createDataFrame causing a strange error.

2016-11-28 Thread Marco Mistroni
Hi Andrew, sorry, but to me it seems S3 is the culprit. I downloaded your JSON file and stored it locally, then wrote this simple app (a subset of what you have in your GitHub; sorry, I'm a little bit rusty on how to create a new column out of existing ones), which basically reads the JSON file. It's in

Re: createDataFrame causing a strange error.

2016-11-28 Thread Andrew Holway
I extracted the boto bits and tested them in vanilla Python on the nodes. I am pretty sure that the data from S3 is OK. I've applied a public policy to the bucket s3://time-waits-for-no-man. There is a publicly available object here:

Re: createDataFrame causing a strange error.

2016-11-27 Thread Marco Mistroni
Hi, pickle errors normally point to a serialisation issue. I suspect something is wrong with your S3 data, but that is just a wild guess... Is your S3 object publicly available? A few suggestions to nail down the problem: 1 - try to see if you can read your object from S3 using the boto3 library 'offline',
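Since pickle errors usually point at serialisation, one way to narrow things down without Spark or S3 in the loop is to try pickling each record locally. The following is only a rough sketch under assumptions: `find_unpicklable` and the `sample` records are hypothetical names made up for illustration, not anything from the original tick.py, and the real records may look quite different.

```python
import pickle


def find_unpicklable(records):
    """Return (index, error text) pairs for records that fail a pickle round trip.

    Hypothetical helper for illustration; not part of the original script.
    """
    failures = []
    for i, rec in enumerate(records):
        try:
            # Round trip: serialise, then deserialise again.
            pickle.loads(pickle.dumps(rec))
        except Exception as exc:
            failures.append((i, "%s: %s" % (type(exc).__name__, exc)))
    return failures


# Made-up sample data: a plain dict pickles fine, a dict holding a lambda does not.
sample = [{"price": 1.5}, {"handle": (lambda x: x)}]
bad = find_unpicklable(sample)
for index, message in bad:
    print("record %d is not picklable (%s)" % (index, message))
```

If every record survives the round trip, the data itself is probably not what the pickle error is complaining about, and the next place to look would be whatever objects the Spark closures capture.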

Re: createDataFrame causing a strange error.

2016-11-27 Thread Andrew Holway
I get a slightly different error when not specifying a schema:
Traceback (most recent call last):
  File "/home/centos/fun-functions/spark-parrallel-read-from-s3/tick.py", line 61, in
    df = sqlContext.createDataFrame(foo)
  File

createDataFrame causing a strange error.

2016-11-27 Thread Andrew Holway
Hi, can anyone tell me what is causing this error?
Spark 2.0.0
Python 2.7.5
df = sqlContext.createDataFrame(foo, schema)
https://gist.github.com/mooperd/368e3453c29694c8b2c038d6b7b4413a
Traceback (most recent call last):
  File "/home/centos/fun-functions/spark-parrallel-read-from-s3/tick.py",