Re: Difference in R and Spark Output

2017-01-01 Thread Saroj C
Dear Felix, Thanks. Please find the differences:

Cluster   Spark Size   R Size
0         69           114
1         79           141
2         77           93
3         90           44
4         130          53

Spark - Centers: 0.807554406 0.123759 -0.58642 -0.17803 0.624278 -0.06752 0.033517 -0.01504 -0.02794 0.016699 0.20841 -0.00149

24/7 Spark Streaming on YARN in Production

2017-01-01 Thread Bernhard Schäfer
Two weeks ago I published a blog post about our experiences running 24/7 Spark Streaming applications on YARN in production: https://www.inovex.de/blog/247-spark-streaming-on-yarn-in-production/ Amongst others it

Re: What is missing here to use sql in spark?

2017-01-01 Thread Michal Šenkýř
Happy new year, Raymond! Not sure whether I understand your problem correctly, but it seems to me that you are just not processing your result. sqlContext.sql(...) returns a DataFrame on which you have to call an action. Therefore, to get the result you are expecting, you just have to call:
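Michal's point is that DataFrames are lazy. The pyspark lines below are illustrative comments only (they assume a live Spark 1.x `sqlContext` and a registered table, which are assumptions here); the runnable part uses a plain-Python generator to show the same compute-nothing-until-consumed behaviour.

```python
# Spark DataFrames are lazy: sqlContext.sql("SELECT ...") only builds a
# query plan. Nothing executes until an action such as .show(),
# .collect(), or .count() is called:
#   result = sqlContext.sql("SELECT name FROM people")
#   result.show()   # action: triggers execution and prints rows
#
# Plain-Python analogy: a generator also does no work until consumed.
lazy = (x * x for x in range(5))   # nothing computed yet
materialized = list(lazy)          # the "action": forces evaluation
```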

Re: Error when loading json to spark

2017-01-01 Thread Raymond Xie
Thank you very much Marco. Is your code in Scala? Do you have a Python example? Can anyone give me a Python example to handle JSON data on Spark? Sincerely yours, Raymond. On Sun, Jan 1, 2017 at 12:29 PM, Marco Mistroni
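Since Raymond asks for a Python example, here is a minimal sketch. The pyspark calls appear only as comments because they assume a live Spark 1.6 `sqlContext` and the HDFS path from later in this thread; the runnable part demonstrates the one-JSON-document-per-line format that `read.json` consumes.

```python
import json

# With pyspark (Spark 1.6), assuming a live sqlContext (not runnable here):
#   df = sqlContext.read.json("hdfs://localhost:9000/json/world_bank.json")
#   df.printSchema()
#   df.show(10)
#
# read.json expects JSON Lines: each line must be a complete JSON document.
lines = [
    '{"country": "CN", "value": 1}',
    '{"country": "US", "value": 2}',
]
records = [json.loads(line) for line in lines]
countries = [r["country"] for r in records]
```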

Re: Error when loading json to spark

2017-01-01 Thread Marco Mistroni
Hi, you will need to pass the schema, as in the snippet below (even though the code might have been superseded in Spark 2.0):

import sqlContext.implicits._
val jsonRdd = sc.textFile("file:///c:/tmp/1973-01-11.json")
val schema = (new StructType).add("hour",
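A Python counterpart of Marco's idea, hedged: the truncated message only shows the "hour" field, so the schema below is illustrative, and the pyspark lines are comments because they need a live cluster. The runnable part shows the practical effect of an explicit schema: only declared fields survive.

```python
import json

# pyspark equivalent of the Scala snippet (assumes Spark 1.6 and a live
# sqlContext -- illustrative only, not runnable here):
#   from pyspark.sql.types import StructType, StructField, StringType
#   schema = StructType([StructField("hour", StringType(), True)])
#   df = sqlContext.read.json("file:///c:/tmp/1973-01-11.json", schema=schema)
#
# Effect of an explicit schema: declared fields are kept, everything
# else is dropped. A plain-Python illustration of that projection:
schema_fields = ["hour"]
raw = '{"hour": "07", "temp": 21}'
row = {f: json.loads(raw).get(f) for f in schema_fields}
```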

RE: [PySpark - 1.6] - Avoid object serialization

2017-01-01 Thread Sidney Feiner
Thanks everybody, but I've found another way of doing it. Because I didn't actually need an instance of my class, I created a "static" class: all variables are initialized as class variables and all methods are class methods. Thanks a lot anyway, hope my answer will also help someone one day ☺
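Sidney's workaround can be sketched as follows (the class and its contents are hypothetical, not from the thread): keeping all state in class attributes and exposing only classmethods means no instance ever needs to be pickled and shipped to PySpark executors.

```python
class WordNormalizer:
    # All state lives on the class itself; the class is never instantiated.
    suffixes = ("ing", "ed")

    @classmethod
    def normalize(cls, word):
        # Strip the first matching suffix, if any.
        for s in cls.suffixes:
            if word.endswith(s):
                return word[: -len(s)]
        return word

# In a PySpark job this would be used without creating an object:
#   rdd.map(WordNormalizer.normalize)
# Here we just call it directly:
result = [WordNormalizer.normalize(w) for w in ["jumped", "walking", "cat"]]
```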

Re: Error when loading json to spark

2017-01-01 Thread Raymond Xie
I found the cause: I need to "put" the JSON file onto HDFS first before it can be used. Here is what I did:

hdfs dfs -put /root/Downloads/data/json/world_bank.json hdfs://localhost:9000/json
df = sqlContext.read.json("/json/")
df.show(10)

However, there is a new problem here, the json
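Raymond's message is truncated, but a frequent follow-up problem with JSON input in Spark 1.x is that `read.json` expects JSON Lines (one object per line), not a single pretty-printed array. A small stdlib sketch of converting an array to that format before putting the file on HDFS (record contents are hypothetical):

```python
import json

def to_json_lines(records):
    """Serialize a list of dicts as JSON Lines (one object per line),
    the layout sqlContext.read.json consumes in Spark 1.x."""
    return "\n".join(json.dumps(r) for r in records)

records = [{"id": 1, "region": "EAP"}, {"id": 2, "region": "SAS"}]
jsonl = to_json_lines(records)
# Each line now parses on its own, as Spark requires:
parsed = [json.loads(line) for line in jsonl.splitlines()]
```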

Re: Error when loading json to spark

2017-01-01 Thread Raymond Xie
Thank you Miguel, here is the output: >>> df = sqlContext.read.json("/root/Downloads/data/json") 17/01/01 07:28:19 INFO json.JSONRelation: Listing hdfs://localhost:9000/root/Downloads/data/json on driver 17/01/01 07:28:19 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory