Re: Join streams Apache Spark

2017-05-07 Thread Gourav Sengupta
On another note, you might want to try Flume first if you are still in the exploration phase. The advantage of Flume (using push) is that you do not need to write any additional program to sink, or write, your data to any target system. I am not quite sure how well Flume works with SPARK str

Re: take the difference between two columns of a dataframe in pyspark

2017-05-07 Thread Gourav Sengupta
Hi, convert it to a temporary table and write a SQL query; that will also work. Regards, Gourav On Sun, May 7, 2017 at 2:49 AM, Zeming Yu wrote: > Say I have the following dataframe with two numeric columns A and B, what's the best way to add a column showing the difference between the two colu

Re: Issue upgrading to Spark 2.1.1 from 2.1.0

2017-05-07 Thread Irving Duran
I haven't noticed that behavior with ALS. Thank you, Irving Duran On 05/07/2017 04:14 PM, mhornbech wrote: > Hi > > We have just tested the new Spark 2.1.1 release, and observe an issue where > the driver program hangs when making predictions using a random forest. The > issue disappears when d

Issue upgrading to Spark 2.1.1 from 2.1.0

2017-05-07 Thread mhornbech
Hi, We have just tested the new Spark 2.1.1 release and observe an issue where the driver program hangs when making predictions using a random forest. The issue disappears when downgrading to 2.1.0. Has anyone observed similar issues? Recommendations on how to dig into this would also be much ap

Re: org.apache.hadoop.fs.FileSystem: Provider tachyon.hadoop.TFS could not be instantiated

2017-05-07 Thread Rohit Karlupia
Last time I checked, this happens only with Spark < 2.0.0. The reason is the ServiceLoader that is used to load all FileSystem implementations from the classpath. Before Spark 2.0.0, tachyon.hadoop.TFS was packaged with the Spark distribution and was loaded regardless of whether it was used. Moving to Spark 2.0.0+ will

Re: Join streams Apache Spark

2017-05-07 Thread saulshanabrook
The script I wrote in Go? No, I do not, but it's very easy to compile it for whatever platform you are running on! It doesn't need to be written in the same language as the rest of your code. On Sat, May 6, 2017 at 3:13 PM tencas [via Apache Spark User List] <ml+s1001560n28658...@n3.nabble.com> wro