SparkML 1.6 GBTRegressor crashes with high maxIter hyper-parameter?

2017-01-25 Thread Aris
When I train a GBTRegressor model from a DataFrame in the latest 1.6.4-SNAPSHOT with a high value for the maxIter hyper-parameter, say 500, I get a java.lang.StackOverflowError; GBTRegressor does work with maxIter set to about 100. Does this make sense? Are there any known solutions? This is runnin
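Deep boosting runs build a very long lineage, which is a common cause of this kind of StackOverflowError; periodic checkpointing is the usual remedy. A minimal sketch, assuming a SparkContext `sc` and a training DataFrame `trainingDF` with the standard label/features columns (these names are illustrative, not from the thread):

```scala
import org.apache.spark.ml.regression.GBTRegressor

// Truncate the lineage every few iterations so a 500-iteration run does
// not blow the JVM stack. `sc` and `trainingDF` are assumed to exist.
sc.setCheckpointDir("/tmp/spark-checkpoints")

val gbt = new GBTRegressor()
  .setMaxIter(500)
  .setCheckpointInterval(10) // checkpoint every 10 iterations

val model = gbt.fit(trainingDF)
```

Raising the JVM stack size (e.g. via `spark.executor.extraJavaOptions=-Xss8m`) is another workaround sometimes suggested for this class of error.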

Re: Spark 2.0.1 has been published?

2016-10-06 Thread Aris
Mario -- I would recommend downloading and building from source, as the repositories could be lagging. On Thu, Oct 6, 2016 at 4:00 PM, miliofotou wrote: > I can verify this as well. Even though I can download the 2.0.1 binary just > fine, I cannot find the 2.0.1 artifacts on mvnrepository.com or a

Re: Possible Code Generation Bug: Can Spark 2.0 Datasets handle Scala Value Classes?

2016-09-01 Thread Aris
expressed interest in code generation / encoder problems I have found recently. Here is the problem: https://issues.apache.org/jira/browse/SPARK-17368 Thank you On Thu, Sep 1, 2016 at 3:09 PM, Aris wrote: > Thank you Jakob on two counts > > 1. Yes, thanks for pointing out that sp

Re: Possible Code Generation Bug: Can Spark 2.0 Datasets handle Scala Value Classes?

2016-09-01 Thread Aris
> Hi Aris, > thanks for sharing this issue. I can confirm that value classes > currently don't work, however I can't think of a reason why they > shouldn't be supported. I would therefore recommend that you report > this as a bug. > > (Btw, value classes also cu

Possible Code Generation Bug: Can Spark 2.0 Datasets handle Scala Value Classes?

2016-09-01 Thread Aris
Hello Spark community - Does Spark 2.0 Datasets *not support* Scala Value classes (basically "extends AnyVal" with a bunch of limitations) ? I am trying to do something like this: case class FeatureId(value: Int) extends AnyVal val seq = Seq(FeatureId(1),FeatureId(2),FeatureId(3)) import spark.i
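For readers hitting the same wall: the failing shape and a workaround are sketched below, assuming Spark 2.0 with `spark.implicits._` in scope. The boxed class name is made up for illustration.

```scala
// The reported failure: a value class as the Dataset element type.
case class FeatureId(value: Int) extends AnyVal

val seq = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
import spark.implicits._
// seq.toDS()  // triggers the codegen failure tracked in SPARK-17368

// Workaround sketch: drop `extends AnyVal` so the encoder sees an
// ordinary case class.
case class FeatureIdBoxed(value: Int)
val ds = seq.map(f => FeatureIdBoxed(f.value)).toDS()
```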

Re: Spark 2.0.0 JaninoRuntimeException

2016-08-16 Thread Aris
issue being resolved recently: https://issues.apache.org/jira/browse/SPARK-15285 >> >> On Fri, Aug 12, 2016 at 3:33 PM, Aris wrote: >> >>> Hello folks, >>> >>> I'm on Spark 2.0.0 working with Datasets -- and despite the fact that >>&

Re: Spark 2.0.0 JaninoRuntimeException

2016-08-16 Thread Aris
> There was a test: > SPARK-15285 Generated SpecificSafeProjection.apply method grows beyond 64KB > > See if it matches your use case. > > On Tue, Aug 16, 2016 at 8:41 AM, Aris wrote: > >> I am still working on making a minimal test that I can share without my >> work-specific code

Re: Spark 2.0.0 JaninoRuntimeException

2016-08-16 Thread Aris
aster branch. > Should we reopen SPARK-15285? > > Best Regards, > Kazuaki Ishizaki, > > > > From:Ted Yu > To:dhruve ashar > Cc:Aris , "user@spark.apache.org" < > user@spark.apache.org> > Da

Spark 2.0.0 JaninoRuntimeException

2016-08-12 Thread Aris
Hello folks, I'm on Spark 2.0.0 working with Datasets -- and despite the fact that smaller data unit tests work on my laptop, when I'm on a cluster, I get cryptic error messages: Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method > "(Lorg/apache/spark/sql/catalyst/InternalRow;L

Spark Streaming 1.6 mapWithState not working well with Kryo Serialization

2016-03-02 Thread Aris
ava:472) Since this class is private with spark streaming itself, I cannot actually register it with Kryo, and I cannot do registrationRequired in order to make sure *everything* has been serialized with Kryo. Is this a bug? Can I somehow solve this? Aris

Re: Streaming mapWithState API has NullPointerException

2016-02-22 Thread Aris
If I build from git branch origin/branch-1.6 will I be OK to test out my code? Thank you so much TD! Aris On Mon, Feb 22, 2016 at 2:48 PM, Tathagata Das wrote: > There were a few bugs that were solved with mapWithState recently. Would > be available in 1.6.1 (RC to be cut soon). >

Streaming mapWithState API has NullPointerException

2016-02-22 Thread Aris
Hello Spark community, and especially TD and Spark Streaming folks: I am using the new Spark 1.6.0 Streaming mapWithState API, in order to accomplish a streaming joining task with data. Things work fine on smaller sets of data, but on a single-node large cluster with JSON strings amounting to 2.5
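For context, the API in question has the shape below; a minimal sketch assuming a `DStream[(String, Int)]` named `pairs` (names are illustrative, not from the thread):

```scala
import org.apache.spark.streaming.{State, StateSpec}

// Keep a running sum per key across minibatches with mapWithState.
val spec = StateSpec.function(
  (key: String, value: Option[Int], state: State[Int]) => {
    val sum = value.getOrElse(0) + state.getOption.getOrElse(0)
    state.update(sum) // carry the sum into the next batch
    (key, sum)
  })

val sums = pairs.mapWithState(spec)
```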

Re: Spark Streaming 1.6 accumulating state across batches for joins

2015-12-02 Thread Aris
d be val rawLEFT: DStream[String] = ssc.textFileStream(dirLEFT) val rawRIGHT: DStream[String] = ssc.textFileStream(dirRIGHT) On Wed, Dec 2, 2015 at 2:12 PM, Aris wrote: > Hello folks, > > I'm on the newest spark 1.6.0-SNAPSHOT Spark Streaming with the new > trackStateByKey API.

Spark Streaming 1.6 accumulating state across batches for joins

2015-12-02 Thread Aris
Hello folks, I'm on the newest spark 1.6.0-SNAPSHOT Spark Streaming with the new trackStateByKey API. I'm trying to do something fairly simple that requires knowing state across minibatches, so I am trying to see if it can be done. I basically have two types of data to do a join, left-side and ri

Re: Bug in ElasticSearch and Spark SQL: Using SQL to query out data from JSON documents is totally wrong!

2015-02-11 Thread Aris
y X. Are you saying this behavior does not happen when I connect to ElasticSearch? Every single JSON document must contain every single key, or else the application crashes? So If one single JSON document is missing key X from the elasticsearch data, the application throws an Exception? Thank you! Ari

Bug in ElasticSearch and Spark SQL: Using SQL to query out data from JSON documents is totally wrong!

2015-02-10 Thread Aris
I'm using ElasticSearch with elasticsearch-spark-BUILD-SNAPSHOT and Spark/SparkSQL 1.2.0, from Costin Leau's advice. I want to query ElasticSearch for a bunch of JSON documents from within SparkSQL, and then use a SQL query to simply query for a column, which is actually a JSON key -- normal thing

Re: SparkSQL 1.2 and ElasticSearch-Spark 1.4 not working together, NoSuchMethodError problems

2015-02-09 Thread Aris
asticsearch.org/guide/en/elasticsearch/hadoop/ > master/install.html#download-dev > [2] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/ > master/spark.html > > On 2/9/15 9:33 PM, Aris wrote: > >> Hello Spark community and Holden, >> >> I am trying to follow Hol

SparkSQL 1.2 and ElasticSearch-Spark 1.4 not working together, NoSuchMethodError problems

2015-02-09 Thread Aris
Hello Spark community and Holden, I am trying to follow Holden Karau's SparkSQL and ElasticSearch tutorial from Spark Summit 2014. I am trying to use elasticsearch-spark 2.1.0.Beta3 and SparkSQL 1.2 together. https://github.com/holdenk/elasticsearchspark *(Side Note: This very nice tutorial does

Status of MLLib exporting models to PMML

2014-11-10 Thread Aris
g like Manish Amde's boosted ensemble tree methods be representable in PMML? Thank you!! Aris

Re: MLlib - Does LogisticRegressionModel.clearThreshold() no longer work?

2014-10-14 Thread Aris
Wow...I just tried LogisticRegressionWithLBFGS, and using clearThreshold() DOES IN FACT work. It appears that the LogisticRegressionWithSGD returns a model whose method is broken!! On Tue, Oct 14, 2014 at 3:14 PM, Aris wrote: > Hi folks, > > When I am predicting Binary 1/0 respo

MLlib - Does LogisticRegressionModel.clearThreshold() no longer work?

2014-10-14 Thread Aris
getting a "realistic" probability that is between 0.0 and 1.0, I am only getting back predictions of 0.0 OR 1.0...never anything in between. The API says that clearThreshold is "experimental" ...it was working before! Is it broken now? Thanks! Aris
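The behavior under discussion, sketched with hypothetical `training` and `testPoint` data (not from the thread):

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS

// By default predict() applies a 0.5 threshold and returns hard 0.0/1.0
// labels; clearThreshold() switches it to raw scores instead.
val model = new LogisticRegressionWithLBFGS().run(training) // RDD[LabeledPoint]
model.clearThreshold()
val score = model.predict(testPoint.features) // a value in [0, 1], not just 0/1
```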

Re: Why is parsing a CSV incredibly wasteful with Java Heap memory?

2014-10-13 Thread Aris
s? that could > explain it. > > You can map to Float in this case to halve the memory, if that works > for your use case. This is just kind of how Strings and floating-point > work in the JVM, nothing Spark-specific. > > On Mon, Oct 13, 2014 at 9:12 PM, Aris wrote: >
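The Float suggestion from that reply, as a plain-Scala sketch (the parser name is made up):

```scala
// Parsing to Float instead of Double halves the primitive payload
// (4 bytes vs 8) when the reduced precision is acceptable.
def parseLine(line: String): Array[Float] =
  line.split(',').map(_.trim.toFloat)

val row = parseLine("1.5, 2.0, 3.25") // Array(1.5f, 2.0f, 3.25f)
```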

Why is parsing a CSV incredibly wasteful with Java Heap memory?

2014-10-13 Thread Aris
ped: RDD[(Double, Iterable[Array[Double]])] = input.groupBy(_.last) Thank you! Aris

Memory Leaks? 1GB input file turns into 8GB memory use in JVM... from parsing CSV

2014-10-09 Thread Aris
Hello Spark folks, I am doing a simple parsing of a CSV input file, and the input file is very large (~1GB). It seems I have a memory leak here and I am destroying my server. After using jmap to generate a Java heap dump and using the Eclipse Memory Analyzer, I basically learned that when I read i

Re: return probability \ confidence instead of actual class

2014-09-24 Thread Aris
model.clearThreshold() you can just get the raw predicted scores, removing the threshold which simply translates that into a positive/negative class. API is here http://yhuai.github.io/site/api/scala/index.html#org.apache.spark.mllib.classification.SVMModel Enjoy! Aris On Sun, Sep 21, 2014 at 11

Re: Anybody built the branch for Adaptive Boosting, extension to MLlib by Manish Amde?

2014-09-24 Thread Aris
le to do a normal "./sbt/sbt assembly" build, or do I need to do something else? Thank you and take care Aris On Thu, Sep 18, 2014 at 3:50 PM, Manish Amde wrote: > Hi Aris, > > Thanks for the interest. First and foremost, tree ensembles are a top > priority for the 1.2 rele

Anybody built the branch for Adaptive Boosting, extension to MLlib by Manish Amde?

2014-09-18 Thread Aris
Thank you Spark community you make life much more lovely - suffering in silence is not fun! I am trying to build the Spark Git branch from Manish Amde, available here: https://github.com/manishamde/spark/tree/ada_boost I am trying to build the non-master branch 'ada_boost' (in the link above), b

Re: MLlib - Possible to use SVM with Radial Basis Function kernel rather than Linear Kernel?

2014-09-18 Thread Aris
Sorry to bother you guys, but does anybody have any ideas about the status of MLlib with a Radial Basis Function kernel for SVM? Thank you! On Tue, Sep 16, 2014 at 3:27 PM, Aris wrote: > Hello Spark Community - > > I am using the support vector machine / SVM implementation in MLli

Re: org.apache.spark.SparkException: java.io.FileNotFoundException: does not exist)

2014-09-16 Thread Aris
This should be a really simple problem, but you haven't shared enough code to determine what's going on here. On Tue, Sep 16, 2014 at 8:08 AM, Hui Li wrote: > Hi, > > I am new to SPARK. I just set up a small cluster and wanted to run some > simple MLLIB examples. By following the instructions of

MLlib - Possible to use SVM with Radial Basis Function kernel rather than Linear Kernel?

2014-09-16 Thread Aris
data... which begs the question, is there some kind of support for RBF kernels rather than linear kernels? In small data tests using R the RBF kernel worked really well, and linear kernel never converged...so I would really like to use RBF. Thank you folks for any help! Aris

Re: Categorical Features for K-Means Clustering

2014-09-16 Thread Aris
Yeah - another vote here to do what's called One-Hot encoding: just convert the single categorical feature into N columns, where N is the number of distinct values of that feature, with a single column set to one and all the other columns set to zero. On Tue, Sep 16, 2014 at 2:16 PM, Sean Owen wrote:
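A plain-Scala sketch of that encoding, assuming the distinct category values are known up front (function and names are illustrative):

```scala
// Map one categorical value to N columns: a single 1.0 in the slot for
// that value, 0.0 everywhere else.
def oneHot(categories: Seq[String])(value: String): Array[Double] = {
  val vec = Array.fill(categories.length)(0.0)
  val i = categories.indexOf(value)
  if (i >= 0) vec(i) = 1.0
  vec
}

val encode = oneHot(Seq("red", "green", "blue")) _
encode("green") // Array(0.0, 1.0, 0.0)
```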

Re: Spark Streaming with Kafka, building project with 'sbt assembly' is extremely slow

2014-09-04 Thread Aris
iname *.jar | tr '\n' ,) Here's the entire running command - bin/spark-submit --master local[*] --jars $(find /home/data/.ivy2/cache/ -iname *.jar | tr '\n' ,) --class KafkaStreamConsumer ~/code_host/data/scala/streamingKafka/target/scala-2.10/streamingkafka_2.10-1.0.jar node1

Spark Streaming with Kafka, building project with 'sbt assembly' is extremely slow

2014-08-29 Thread Aris
build this JAR file, just using sbt package? This process is working, but very slow. Any help with speeding up this compilation is really appreciated!! Aris - import AssemblyKeys._ // put this at the top of the file name := "streamingKafka"

Re: Submitting to a cluster behind a VPN, configuring different IP address

2014-07-15 Thread Aris Vlasakakis
Hello! Just curious if anybody could respond to my original message, if anybody knows how to set the configuration variables that are handled by Jetty and not Spark's native framework... which is Akka, I think? Thanks On Thu, Jul 10, 2014 at 4:04 PM, Aris Vlasakakis wrote: >

Re: Client application that calls Spark and receives an MLlib *model* Scala Object, not just result

2014-07-15 Thread Aris
e and return a location back > to the client on a successful write. > > > > > > On Mon, Jul 14, 2014 at 4:27 PM, Aris Vlasakakis > wrote: > >> Hello Spark community, >> >> I would like to write an application in Scala that is a model server. It >>

Client application that calls Spark and receives an MLlib *model* Scala Object, not just result

2014-07-14 Thread Aris Vlasakakis
y common design pattern. Thanks! -- Άρης Βλασακάκης Aris Vlasakakis

Submitting to a cluster behind a VPN, configuring different IP address

2014-07-10 Thread Aris Vlasakakis
y spark.fileserver.uri and the Spark Master need to be on the same network segment (the VPN subnetwork ). Am I on the right track? How can I set "spark.fileserver.uri" and "spark.httpBroadcast.uri" ? I see that these are actually run by Jetty server...any thoughts? Thank you so much! -- Άρης Βλασακάκης Aris Vlasakakis

Re: Cannot submit to a Spark Application to a remote cluster Spark 1.0

2014-07-10 Thread Aris Vlasakakis
Andrew, thank you so much! That worked! I had to manually set the spark.home configuration in the SparkConf object using .set("spark.home","/cluster/path/to/spark/"), and then I was able to submit from my laptop to the cluster! Aris On Thu, Jul 10, 2014 at 11:41 AM, Andrew O
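For anyone landing here with the same problem, the fix described above looks roughly like this (the master URL is illustrative; the path is the cluster-side one from the message):

```scala
import org.apache.spark.SparkConf

// Point spark.home at the Spark installation path on the cluster,
// not the path on the laptop submitting the job.
val conf = new SparkConf()
  .setMaster("spark://master-host:7077")
  .setAppName("remote-submit-test")
  .set("spark.home", "/cluster/path/to/spark/")
```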

Re: Cannot submit to a Spark Application to a remote cluster Spark 1.0

2014-07-10 Thread Aris Vlasakakis
he cluster? I'm on an older > drop so not sure about the finer points of spark-submit but do > remember a very similar issue when trying to run a Spark driver on a > windows machine against a Spark Master on Ubuntu cluster (the > SPARK_HOME directories were obviously different)

Cannot submit to a Spark Application to a remote cluster Spark 1.0

2014-07-09 Thread Aris Vlasakakis
a:1207)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
-- Άρης Βλασακάκης Aris Vlasakakis