SparkML 1.6 GBTRegressor crashes with high maxIter hyper-parameter?

2017-01-25 Thread Aris
When I train a GBTRegressor model from a DataFrame in the latest 1.6.4-SNAPSHOT, with a high value for the hyper-parameter maxIter, say 500, I get a java.lang.StackOverflowError; GBTRegressor does work with maxIter set to about 100. Does this make sense? Are there any known solutions? This is
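A StackOverflowError at high iteration counts usually comes from the very deep RDD lineage the boosting loop builds up. A hedged sketch of the usual mitigation, periodic checkpointing (the checkpoint path and the interval of 10 are illustrative, and `sc`/`training` are assumed to exist):

```scala
import org.apache.spark.ml.regression.GBTRegressor

// Any HDFS or local path the cluster can write to; required before
// checkpointInterval takes effect.
sc.setCheckpointDir("/tmp/spark-checkpoints")

val gbt = new GBTRegressor()
  .setMaxIter(500)
  // Truncate the lineage every 10 iterations so the recursive DAG
  // traversal that overflows the stack stays shallow.
  .setCheckpointInterval(10)

val model = gbt.fit(training)
```

If that is not enough, a larger driver stack (e.g. `--conf spark.driver.extraJavaOptions=-Xss16m`) is another commonly suggested knob.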

Re: Possible Code Generation Bug: Can Spark 2.0 Datasets handle Scala Value Classes?

2016-09-01 Thread Aris
expressed interest in code generation / encoder problems I have found recently. Here is the problem: https://issues.apache.org/jira/browse/SPARK-17368 Thank you On Thu, Sep 1, 2016 at 3:09 PM, Aris <arisofala...@gmail.com> wrote: > Thank you Jakob on two counts > >

Re: Possible Code Generation Bug: Can Spark 2.0 Datasets handle Scala Value Classes?

2016-09-01 Thread Aris
;ja...@odersky.com> wrote: > Hi Aris, > thanks for sharing this issue. I can confirm that value classes > currently don't work, however I can't think of reason why they > shouldn't be supported. I would therefore recommend that you report > this as a bug. > > (Btw, value classes also

Possible Code Generation Bug: Can Spark 2.0 Datasets handle Scala Value Classes?

2016-09-01 Thread Aris
Hello Spark community - Do Spark 2.0 Datasets *not support* Scala value classes (basically "extends AnyVal", with a bunch of limitations)? I am trying to do something like this: case class FeatureId(value: Int) extends AnyVal val seq = Seq(FeatureId(1),FeatureId(2),FeatureId(3)) import
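For reference, the value class itself is fine in plain Scala; it is only the Dataset encoder (SPARK-17368) that chokes on it. A sketch of the pattern and the obvious workaround (`FeatureIdBoxed` is an illustrative name, not from the thread):

```scala
// A value class: erased to a bare Int at runtime in most positions.
case class FeatureId(value: Int) extends AnyVal

// Plain Scala collections handle it without trouble.
val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3))

// Workaround sketch: drop `extends AnyVal` so Spark can generate an
// ordinary case-class encoder for Dataset[FeatureIdBoxed].
case class FeatureIdBoxed(value: Int)
val boxed = seq.map(f => FeatureIdBoxed(f.value))
```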

Re: Spark 2.0.0 JaninoRuntimeException

2016-08-16 Thread Aris
com> > wrote: > >> I see a similar issue being resolved recently: https://issues.apach >> e.org/jira/browse/SPARK-15285 >> >> On Fri, Aug 12, 2016 at 3:33 PM, Aris <arisofala...@gmail.com> wrote: >> >>> Hello folks, >>> >>> I'

Re: Spark 2.0.0 JaninoRuntimeException

2016-08-16 Thread Aris
? > > There was a test: > SPARK-15285 Generated SpecificSafeProjection.apply method grows beyond 64KB > > See if it matches your use case. > > On Tue, Aug 16, 2016 at 8:41 AM, Aris <arisofala...@gmail.com> wrote: > >> I am still working on making a minimal test tha

Re: Spark 2.0.0 JaninoRuntimeException

2016-08-16 Thread Aris
an reproduce the problem in SPARK-15285 with master branch. > Should we reopen SPARK-15285? > > Best Regards, > Kazuaki Ishizaki, > > > > From:Ted Yu <yuzhih...@gmail.com> > To:dhruve ashar <dhruveas...@gmail.com> > Cc:Aris &l

Spark 2.0.0 JaninoRuntimeException

2016-08-12 Thread Aris
Hello folks, I'm on Spark 2.0.0 working with Datasets -- and despite the fact that smaller data unit tests work on my laptop, when I'm on a cluster, I get cryptic error messages: Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method >
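The truncated message is Janino complaining that a generated method grows beyond the JVM's 64 KB bytecode-per-method limit, which large generated projections can hit on real cluster data even when small unit tests pass. One commonly suggested workaround (hedged: this conf key is internal to Spark 2.0 and may change) is to disable whole-stage code generation, at some performance cost:

```scala
// spark is an existing SparkSession; this falls back to the non-fused,
// smaller-method code paths.
spark.conf.set("spark.sql.codegen.wholeStage", "false")
```

Trimming very wide or deeply nested case-class schemas is the other usual mitigation.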

Spark Streaming 1.6 mapWithState not working well with Kryo Serialization

2016-03-02 Thread Aris
ryo.java:472) Since this class is private within Spark Streaming itself, I cannot actually register it with Kryo, and I cannot use registrationRequired to make sure *everything* has been serialized with Kryo. Is this a bug? Can I somehow solve this? Aris
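One workaround sketch (an assumption, not a confirmed fix): a class that is private in Spark's source is still visible to the JVM by name, so it can be looked up reflectively and registered with Kryo even though the type cannot be named directly in user code. The class name below illustrates the pattern:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true")
  // Register Spark-internal state classes reflectively, by name.
  .registerKryoClasses(Array(
    Class.forName("org.apache.spark.streaming.util.OpenHashMapBasedStateMap")
  ))
```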

Re: Streaming mapWithState API has NullPointerException

2016-02-22 Thread Aris
If I build from git branch origin/branch-1.6 will I be OK to test out my code? Thank you so much TD! Aris On Mon, Feb 22, 2016 at 2:48 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote: > There were a few bugs that were solved with mapWithState recently. Would > be available

Streaming mapWithState API has NullPointerException

2016-02-22 Thread Aris
Hello Spark community, and especially TD and Spark Streaming folks: I am using the new Spark 1.6.0 Streaming mapWithState API, in order to accomplish a streaming joining task with data. Things work fine on smaller sets of data, but on a single-node large cluster with JSON strings amounting to
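For context, a minimal mapWithState skeleton looks like this (the key/value types and the running-count state are illustrative, not from the thread):

```scala
import org.apache.spark.streaming.{State, StateSpec}

// Keep a running count per key across batches.
def trackState(key: String, value: Option[Int], state: State[Int]): (String, Int) = {
  val sum = value.getOrElse(0) + state.getOption.getOrElse(0)
  state.update(sum)
  (key, sum)
}

// pairs: a DStream[(String, Int)] built elsewhere from the JSON input.
val stateful = pairs.mapWithState(StateSpec.function(trackState _))
```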

Re: Spark Streaming 1.6 accumulating state across batches for joins

2015-12-02 Thread Aris
d be val rawLEFT: DStream[String] = ssc.textFileStream(dirLEFT) val rawRIGHT: DStream[String] = ssc.textFileStream(dirRIGHT) On Wed, Dec 2, 2015 at 2:12 PM, Aris <arisofala...@gmail.com> wrote: > Hello folks, > > I'm on the newest spark 1.6.0-SNAPSHOT Spark Streaming with the new &

Re: Bug in ElasticSearch and Spark SQL: Using SQL to query out data from JSON documents is totally wrong!

2015-02-11 Thread Aris
this behavior does not happen when I connect to ElasticSearch? Every single JSON document must contain every single key, or else the application crashes? So if one single JSON document is missing key X from the elasticsearch data, the application throws an Exception? Thank you! Aris On Wed, Feb 11, 2015

Bug in ElasticSearch and Spark SQL: Using SQL to query out data from JSON documents is totally wrong!

2015-02-10 Thread Aris
I'm using ElasticSearch with elasticsearch-spark-BUILD-SNAPSHOT and Spark/SparkSQL 1.2.0, from Costin Leau's advice. I want to query ElasticSearch for a bunch of JSON documents from within SparkSQL, and then use a SQL query to simply query for a column, which is actually a JSON key -- normal

SparkSQL 1.2 and ElasticSearch-Spark 1.4 not working together, NoSuchMethodError problems

2015-02-09 Thread Aris
Hello Spark community and Holden, I am trying to follow Holden Karau's SparkSQL and ElasticSearch tutorial from Spark Summit 2014. I am trying to use elasticsearch-spark 2.1.0.Beta3 and SparkSQL 1.2 together. https://github.com/holdenk/elasticsearchspark *(Side Note: This very nice tutorial does

Status of MLLib exporting models to PMML

2014-11-10 Thread Aris
ensemble tree methods be representable in PMML? Thank you!! Aris

MLlib - Does LogisticRegressionModel.clearThreshold() no longer work?

2014-10-14 Thread Aris
getting a realistic probability that is between 0.0 and 1.0, I am only getting back predictions of 0.0 OR 1.0... never anything in between. The API says that clearThreshold is experimental... it was working before! Is it broken now? Thanks! Aris

Re: MLlib - Does LogisticRegressionModel.clearThreshold() no longer work?

2014-10-14 Thread Aris
Wow... I just tried LogisticRegressionWithLBFGS, and using clearThreshold() DOES IN FACT work. It appears that LogisticRegressionWithSGD returns a model whose method is broken!! On Tue, Oct 14, 2014 at 3:14 PM, Aris arisofala...@gmail.com wrote: Hi folks, When I am predicting Binary 1/0
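A minimal sketch of the working path described above (assuming `training` is an existing RDD[LabeledPoint] and `testFeatures` an RDD[Vector]; names are illustrative):

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS

val model = new LogisticRegressionWithLBFGS()
  .setNumClasses(2)
  .run(training)

// With the threshold cleared, predict() returns the raw probability
// instead of a hard 0.0 / 1.0 class label.
model.clearThreshold()
val probs = testFeatures.map(model.predict)
```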

Why is parsing a CSV incredibly wasteful with Java Heap memory?

2014-10-13 Thread Aris
, Iterable[Array[Double]])] = input.groupBy(_.last) Thank you! Aris

Re: Why is parsing a CSV incredibly wasteful with Java Heap memory?

2014-10-13 Thread Aris
it. You can map to Float in this case to halve the memory, if that works for your use case. This is just kind of how Strings and floating-point work in the JVM, nothing Spark-specific. On Mon, Oct 13, 2014 at 9:12 PM, Aris arisofala...@gmail.com wrote: Hi guys, I am trying just parse out values
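The Float suggestion can be sketched in plain Scala (the sample line is made up). Note that the bigger cost is usually the intermediate Strings themselves: a JVM String stores roughly 2 bytes per character plus object headers, so holding the raw lines and the parsed values at once easily multiplies the input size:

```scala
val line = "1.5,2.5,3.5,0.0"

// 8 bytes per element...
val asDoubles: Array[Double] = line.split(',').map(_.toDouble)
// ...versus 4, halving the parsed footprint if Float precision suffices.
val asFloats: Array[Float] = line.split(',').map(_.toFloat)
```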

Memory Leaks? 1GB input file turns into 8GB memory use in JVM... from parsing CSV

2014-10-09 Thread Aris
Hello Spark folks, I am doing a simple parsing of a CSV input file, and the input file is very large (~1GB). It seems I have a memory leak here and I am destroying my server. After using jmap to generate a Java heap dump and using the Eclipse Memory Analyzer, I basically learned that when I read

Re: Anybody built the branch for Adaptive Boosting, extension to MLlib by Manish Amde?

2014-09-24 Thread Aris
to do a normal ./sbt/sbt assembly build, or do I need to do something else? Thank you and take care Aris On Thu, Sep 18, 2014 at 3:50 PM, Manish Amde manish...@gmail.com wrote: Hi Aris, Thanks for the interest. First and foremost, tree ensembles are a top priority for the 1.2 release and we

Re: return probability \ confidence instead of actual class

2014-09-24 Thread Aris
with model.clearThreshold() you can just get the raw predicted scores, removing the threshold which simply translates that into a positive/negative class. API is here http://yhuai.github.io/site/api/scala/index.html#org.apache.spark.mllib.classification.SVMModel Enjoy! Aris On Sun, Sep 21, 2014 at 11

Re: MLlib - Possible to use SVM with Radial Basis Function kernel rather than Linear Kernel?

2014-09-18 Thread Aris
Sorry to bother you guys, but does anybody have any ideas about the status of MLlib with a Radial Basis Function kernel for SVM? Thank you! On Tue, Sep 16, 2014 at 3:27 PM, Aris wrote: Hello Spark Community - I am using the support vector machine / SVM implementation in MLlib

Anybody built the branch for Adaptive Boosting, extension to MLlib by Manish Amde?

2014-09-18 Thread Aris
Thank you, Spark community, you make life much more lovely - suffering in silence is not fun! I am trying to build the Spark Git branch from Manish Amde, available here: https://github.com/manishamde/spark/tree/ada_boost I am trying to build the non-master branch 'ada_boost' (in the link above),

MLlib - Possible to use SVM with Radial Basis Function kernel rather than Linear Kernel?

2014-09-16 Thread Aris
data... which begs the question, is there some kind of support for RBF kernels rather than linear kernels? In small data tests using R the RBF kernel worked really well, and linear kernel never converged...so I would really like to use RBF. Thank you folks for any help! Aris
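For reference, the kernel being asked about is simple to state in plain Scala. This standalone function is just the textbook formula k(x, y) = exp(-gamma * ||x - y||^2), not an MLlib API (MLlib's SVM at the time supported only the linear kernel):

```scala
def rbfKernel(x: Array[Double], y: Array[Double], gamma: Double): Double = {
  // Squared Euclidean distance between the two feature vectors.
  val sqDist = x.zip(y).map { case (a, b) => (a - b) * (a - b) }.sum
  math.exp(-gamma * sqDist)
}
```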

Re: org.apache.spark.SparkException: java.io.FileNotFoundException: does not exist)

2014-09-16 Thread Aris
This should be a really simple problem, but you haven't shared enough code to determine what's going on here. On Tue, Sep 16, 2014 at 8:08 AM, Hui Li littleleave...@gmail.com wrote: Hi, I am new to SPARK. I just set up a small cluster and wanted to run some simple MLLIB examples. By

Re: Spark Streaming with Kafka, building project with 'sbt assembly' is extremely slow

2014-09-04 Thread Aris
the entire running command - bin/spark-submit --master local[*] --jars $(find /home/data/.ivy2/cache/ -iname *.jar | tr '\n' ,) --class KafkaStreamConsumer ~/code_host/data/scala/streamingKafka/target/scala-2.10/streamingkafka_2.10-1.0.jar node1:2181 my-consumer-group aris-topic 1 This is fairly
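One detail in the command above: `tr '\n' ,` leaves a trailing comma on the --jars list. A sketch using paste instead (the cache path is an example; the submit command is only echoed here, since it needs a real cluster):

```shell
# Comma-join the jar paths without a trailing comma.
JARS=$(find "$HOME/.ivy2/cache" -iname '*.jar' 2>/dev/null | paste -sd, -)

# The submit command, shown rather than run:
echo spark-submit --master 'local[*]' --jars "$JARS" \
  --class KafkaStreamConsumer streamingkafka_2.10-1.0.jar \
  node1:2181 my-consumer-group aris-topic 1
```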

Spark Streaming with Kafka, building project with 'sbt assembly' is extremely slow

2014-08-29 Thread Aris
using sbt package? This process is working, but very slow. Any help with speeding up this compilation is really appreciated!! Aris - import AssemblyKeys._ // put this at the top of the file name := "streamingKafka" version := "1.0" scalaVersion := "2.10.4"
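A cleaned-up build.sbt sketch for the fragment above (sbt-assembly 0.x syntax, as in the thread; the Spark version and the "provided" scoping are assumptions). Marking the Spark artifacts "provided" keeps them out of the assembly jar, which is the usual cure for slow `sbt assembly` runs:

```scala
import AssemblyKeys._

assemblySettings

name := "streamingKafka"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"            % "1.0.2" % "provided",
  "org.apache.spark" %% "spark-streaming"       % "1.0.2" % "provided",
  // The Kafka connector (and its transitive deps) must ship in the jar.
  "org.apache.spark" %% "spark-streaming-kafka" % "1.0.2"
)
```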

Re: Client application that calls Spark and receives an MLlib *model* Scala Object, not just result

2014-07-15 Thread Aris
a location back to the client on a successful write. On Mon, Jul 14, 2014 at 4:27 PM, Aris Vlasakakis a...@vlasakakis.com wrote: Hello Spark community, I would like to write an application in Scala that i a model server. It should have an MLlib Linear Regression model that is already trained

Re: Submitting to a cluster behind a VPN, configuring different IP address

2014-07-15 Thread Aris Vlasakakis
Hello! Just curious if anybody could respond to my original message, if anybody knows how to set the configuration variables that are handled by Jetty and not by Spark's native framework, which is Akka I think? Thanks On Thu, Jul 10, 2014 at 4:04 PM, Aris Vlasakakis a...@vlasakakis.com

Client application that calls Spark and receives an MLlib *model* Scala Object, not just result

2014-07-14 Thread Aris Vlasakakis
pattern. Thanks! -- Άρης Βλασακάκης Aris Vlasakakis

Re: Cannot submit to a Spark Application to a remote cluster Spark 1.0

2014-07-10 Thread Aris Vlasakakis
of spark-submit but do remember a very similar issue when trying to run a Spark driver on a windows machine against a Spark Master on Ubuntu cluster (the SPARK_HOME directories were obviously different) On Wed, Jul 9, 2014 at 7:18 PM, Aris Vlasakakis a...@vlasakakis.com wrote: Hello everybody

Submitting to a cluster behind a VPN, configuring different IP address

2014-07-10 Thread Aris Vlasakakis
Master need to be on the same network segment (the VPN subnetwork). Am I on the right track? How can I set spark.fileserver.uri and spark.httpBroadcast.uri? I see that these are actually run by a Jetty server... any thoughts? Thank you so much! -- Άρης Βλασακάκης Aris Vlasakakis