[PySpark] [SparkR] Is it possible to invoke a PySpark function with a SparkR DataFrame?

2019-07-15 Thread Fiske, Danny
sue this? Is it even possible? Many thanks, Danny

Re: Debug Spark

2015-11-29 Thread Danny Stephan
Hi, You can use "jdwp" to debug everything that runs on top of the JVM, including Spark. Specifically for IntelliJ, maybe this link can help you: http://danosipov.com/?p=779 Regards, Danny > On 29 Nov 2015, at 17:34, Masf <masfwo...@gmail.
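
A rough sketch of the JDWP suggestion above: the JVM's debug agent is switched on with an `-agentlib:jdwp=...` option, which `spark-submit` can hand to the driver JVM through `--driver-java-options`. The helper names and port 5005 are assumptions made for illustration; they do not come from the thread.

```python
def jdwp_agent_option(port=5005, suspend=False):
    """Build the JVM option that starts a JDWP debug agent on `port`.

    Hypothetical helper. Port 5005 is only a common convention.
    suspend=True makes the JVM wait for a debugger before it runs.
    """
    return "-agentlib:jdwp=transport=dt_socket,server=y,suspend={},address={}".format(
        "y" if suspend else "n", port
    )


def spark_submit_debug_args(port=5005, suspend=False):
    """Return spark-submit arguments that attach the agent to the driver JVM."""
    return ["--driver-java-options", jdwp_agent_option(port, suspend)]


if __name__ == "__main__":
    # Prints the extra arguments to append to a spark-submit command line.
    print(" ".join(spark_submit_debug_args()))
```

With the driver started this way, a remote-debug run configuration in the IDE would connect to the driver host on the chosen port. Executors would need the same option through `spark.executor.extraJavaOptions`.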

Cluster sizing for recommendations

2015-07-06 Thread Danny Yates
of limitations around how I can size EC2 instances in order to get the CPU I need. But I've been at this for 3 days now and still haven't actually managed to build any recommendations... Thanks in advance, Danny

Spark 1.4 MLLib Bug?: Multiclass Classification requirement failed: sizeInBytes was negative

2015-07-03 Thread Danny Linden
hi, I want to run a multiclass classification with 390 classes on 120k labeled points (tf-idf vectors), but I get the following exception. If I reduce the number of classes to ~20, everything works fine. How can I fix this? I use the LogisticRegressionWithLBFGS class for my classification on a 8

Spark 1.4 MLLib Bug?: Multiclass Classification requirement failed: sizeInBytes was negative

2015-07-03 Thread Danny
hi, I want to run a multiclass classification with 390 classes on 120k labeled points (tf-idf vectors), but I get the following exception. If I reduce the number of classes to ~20, everything works fine. How can I fix this? I use the LogisticRegressionWithLBFGS class for my classification on a 8

Re: which mllib algorithm for large multi-class classification?

2015-06-24 Thread Danny Linden
give you an array larger than MaxInt exception. Could you paste the stack trace? -Xiangrui On Mon, Jun 22, 2015 at 4:21 PM, Danny kont...@dannylinden.de wrote: hi, I am unfortunately not very familiar with MLlib, so I would appreciate a little help: Which multi-class

Re: s3 - Can't make directory for path

2015-06-22 Thread Danny
hi, have you tested s3://ww-sandbox/name_of_path/ instead of s3://ww-sandbox/name_of_path, or have you tried adding your file extension with a wildcard (*), like s3://ww-sandbox/name_of_path/*.gz or s3://ww-sandbox/name_of_path/*.csv, depending on your files? If it does not work, please test with

which mllib algorithm for large multi-class classification?

2015-06-22 Thread Danny
hi, I am unfortunately not very familiar with MLlib, so I would appreciate a little help: Which multi-class classification algorithm should I use if I want to classify texts (100-1000 words each) into categories? The number of categories is between 100 and 500, and the number of training

New Spark Meetup group in Munich

2015-06-22 Thread Danny Linden
in special topics about Spark. It would be nice if someone could add our meetup group to the Spark website (http://spark.apache.org/community.html) :) You can find us here: http://www.meetup.com/de/Spark-Munich/ Thanks, Danny Linden

Re: Spark and S3 server side encryption

2015-01-29 Thread Danny
On Spark 1.2.0 you have the s3a library to work with S3. And there is a config param named fs.s3a.server-side-encryption-algorithm: https://github.com/Aloisius/hadoop-s3a
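
A minimal sketch of how that parameter might be supplied from Spark, assuming the s3a connector is on the classpath and that the `AES256` value (S3-managed keys) is the one wanted. Spark copies settings prefixed with `spark.hadoop.` into the Hadoop configuration. The helper names below are hypothetical.

```python
# Hadoop property named in the message above.
SSE_KEY = "fs.s3a.server-side-encryption-algorithm"


def s3a_sse_conf(algorithm="AES256"):
    """Return Spark settings that forward the s3a encryption property to Hadoop.

    Hypothetical helper. "AES256" is assumed to be the desired algorithm.
    """
    return {"spark.hadoop." + SSE_KEY: algorithm}


def as_submit_args(conf):
    """Turn a settings dict into repeated `--conf key=value` arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", "{}={}".format(key, conf[key])]
    return args


if __name__ == "__main__":
    print(" ".join(as_submit_args(s3a_sse_conf())))
```

The same key and value could instead be set directly on the Hadoop configuration of a running context. Which route is appropriate depends on the deployment.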

ETL process design

2015-01-28 Thread Danny Yates
day, so we need to be able to handle the situation where we're adding events for a day we've already processed. Many thanks, Danny.

Can Spark benefit from Hive-like partitions?

2015-01-26 Thread Danny Yates
) Is there any way to get Spark to use the y, m and d fields to minimise the files it transfers from S3? Thanks, Danny.
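
One way to limit what is read is to build only the paths for the days of interest and pass them to Spark as a comma-separated list, which `sc.textFile` accepts. The sketch below assumes a Hive-style layout of `y=YYYY/m=MM/d=DD` directories with zero-padded values; the actual layout is cut off in the message above, and the bucket name in the example is hypothetical.

```python
from datetime import date, timedelta


def partition_paths(base, start, end):
    """List one path per day from `start` to `end` inclusive.

    Assumes directories named y=YYYY/m=MM/d=DD under `base` (an assumption,
    not confirmed by the thread). Returns an empty list if end < start.
    """
    root = base.rstrip("/")
    days = (end - start).days
    return [
        "{}/y={:04d}/m={:02d}/d={:02d}".format(root, d.year, d.month, d.day)
        for d in (start + timedelta(days=n) for n in range(days + 1))
    ]


if __name__ == "__main__":
    paths = partition_paths("s3n://example-bucket/events", date(2015, 1, 25), date(2015, 1, 26))
    # The joined string could be passed to sc.textFile(...) in a Spark job.
    print(",".join(paths))
```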

Re: Can Spark benefit from Hive-like partitions?

2015-01-26 Thread Danny Yates
Thanks Michael. I'm not actually using Hive at the moment - in fact, I'm trying to avoid it if I can. I'm just wondering whether Spark has anything similar I can leverage? Thanks

Re: Can Spark benefit from Hive-like partitions?

2015-01-26 Thread Danny Yates
Ah, well that is interesting. I'll experiment further tomorrow. Thank you for the info!