When I train a GBTRegressor model from a DataFrame on the latest
1.6.4-SNAPSHOT with a high value for the hyper-parameter maxIter, say
500, I get a java.lang.StackOverflowError; GBTRegressor does work with
maxIter set to about 100.
Does this make sense? Are there any known solutions? This may be
related to the code generation / encoder problems I have found
recently.
Here is the problem:
https://issues.apache.org/jira/browse/SPARK-17368
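For context on the mechanism (SPARK-17368 tracks the bug itself): each boosting iteration extends the RDD lineage, and very deep lineage is the classic trigger for StackOverflowError, which is why GBTRegressor's setCheckpointInterval is the usual mitigation. The sketch below is plain Scala only, illustrating the deep-composition mechanism, not the Spark API:

```scala
// Plain-Scala illustration (no Spark): each boosting iteration is analogous
// to composing one more function onto a chain, and evaluating the chain
// recurses once per layer. 500 layers still fit on the stack here, but
// unbounded growth eventually overflows, which is why periodic
// checkpointing (truncating the lineage) is the standard mitigation.
val step: Double => Double = _ + 1.0
val chained: Double => Double =
  (1 to 500).foldLeft((x: Double) => x)((f, _) => f.andThen(step))
println(chained(0.0)) // 500.0
```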
Thank you
On Thu, Sep 1, 2016 at 3:09 PM, Aris <arisofala...@gmail.com> wrote:
> Thank you Jakob on two counts
>
>
> <ja...@odersky.com> wrote:
> Hi Aris,
> thanks for sharing this issue. I can confirm that value classes
> currently don't work, however I can't think of reason why they
> shouldn't be supported. I would therefore recommend that you report
> this as a bug.
>
> (Btw, value classes also
Hello Spark community -
Do Spark 2.0 Datasets *not support* Scala value classes (basically
"extends AnyVal", with a bunch of limitations)?
I am trying to do something like this:
case class FeatureId(value: Int) extends AnyVal
val seq = Seq(FeatureId(1),FeatureId(2),FeatureId(3))
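For contrast, a minimal plain-Scala sketch (no Spark involved) showing that the value class itself behaves normally; per the discussion above it is only the Dataset encoder derivation that fails:

```scala
// The value-class pattern from the question, outside of Datasets.
// Plain collections unwrap it to ordinary Ints at runtime.
case class FeatureId(value: Int) extends AnyVal

val ids = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
val total = ids.map(_.value).sum
println(total) // 6
```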
import
> wrote:
>
>> I see a similar issue being resolved recently: https://issues.apache.org/jira/browse/SPARK-15285
>>
>> On Fri, Aug 12, 2016 at 3:33 PM, Aris <arisofala...@gmail.com> wrote:
>>
>>> Hello folks,
>>>
>>> I'
?
>
> There was a test:
> SPARK-15285 Generated SpecificSafeProjection.apply method grows beyond 64KB
>
> See if it matches your use case.
>
> On Tue, Aug 16, 2016 at 8:41 AM, Aris <arisofala...@gmail.com> wrote:
>
>> I am still working on making a minimal test that can reproduce the problem in SPARK-15285 with master branch.
> Should we reopen SPARK-15285?
>
> Best Regards,
> Kazuaki Ishizaki,
>
>
>
> From: Ted Yu <yuzhih...@gmail.com>
> To: dhruve ashar <dhruveas...@gmail.com>
> Cc: Aris
Hello folks,
I'm on Spark 2.0.0 working with Datasets -- and despite the fact that
smaller data unit tests work on my laptop, when I'm on a cluster, I get
cryptic error messages:
Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method ... grows beyond 64 KB
>
(Kryo.java:472)
Since this class is private within Spark Streaming itself, I cannot
actually register it with Kryo, and I cannot use registrationRequired
to make sure *everything* has been serialized with Kryo.
Is this a bug? Can I somehow solve this?
Aris
If I build from git branch origin/branch-1.6 will I be OK to test out my
code?
Thank you so much TD!
Aris
On Mon, Feb 22, 2016 at 2:48 PM, Tathagata Das <tathagata.das1...@gmail.com>
wrote:
> There were a few bugs that were solved with mapWithState recently. Would
> be available
Hello Spark community, and especially TD and Spark Streaming folks:
I am using the new Spark 1.6.0 Streaming mapWithState API, in order to
accomplish a streaming joining task with data.
Things work fine on smaller sets of data, but on a single-node large
cluster with JSON strings amounting to
val rawLEFT: DStream[String] = ssc.textFileStream(dirLEFT)
val rawRIGHT: DStream[String] = ssc.textFileStream(dirRIGHT)
On Wed, Dec 2, 2015 at 2:12 PM, Aris <arisofala...@gmail.com> wrote:
> Hello folks,
>
> I'm on the newest spark 1.6.0-SNAPSHOT Spark Streaming with the new
&
this behavior does not happen when I connect to
ElasticSearch? Must every single JSON document contain every single
key, or else the application crashes? So if one single JSON document
is missing key X from the ElasticSearch data, the application throws
an exception?
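To illustrate the tolerant behavior I am hoping for, here is a plain-Scala sketch with documents modeled as simple maps (no ElasticSearch or Spark involved; the default value 0 is just for illustration):

```scala
// Missing keys handled as Options instead of crashing the job.
val doc1: Map[String, Any] = Map("name" -> "a", "x" -> 1)
val doc2: Map[String, Any] = Map("name" -> "b") // key "x" is missing
val xs = Seq(doc1, doc2).map(_.get("x").getOrElse(0))
println(xs) // List(1, 0)
```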
Thank you!
Aris
On Wed, Feb 11, 2015
I'm using ElasticSearch with elasticsearch-spark-BUILD-SNAPSHOT and
Spark/SparkSQL 1.2.0, from Costin Leau's advice.
I want to query ElasticSearch for a bunch of JSON documents from within
SparkSQL, and then use a SQL query to simply query for a column, which is
actually a JSON key -- normal
Hello Spark community and Holden,
I am trying to follow Holden Karau's SparkSQL and ElasticSearch tutorial
from Spark Summit 2014. I am trying to use elasticsearch-spark 2.1.0.Beta3
and SparkSQL 1.2 together.
https://github.com/holdenk/elasticsearchspark
*(Side Note: This very nice tutorial does
ensemble tree methods be representable
in PMML?
Thank you!!
Aris
getting a realistic probability that is
between 0.0 and 1.0, I am only getting back predictions of 0.0 OR
1.0...never anything in between.
The API says that clearThreshold is experimental ...it was working
before! Is it broken now?
Thanks!
Aris
Wow... I just tried LogisticRegressionWithLBFGS, and using clearThreshold()
DOES IN FACT work. It appears that the LogisticRegressionWithSGD returns a
model whose method is broken!
On Tue, Oct 14, 2014 at 3:14 PM, Aris arisofala...@gmail.com wrote:
Hi folks,
When I am predicting Binary 1/0
, Iterable[Array[Double]])] =
input.groupBy(_.last)
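Reconstructing the truncated snippet's intent as a plain-Scala sketch (the key type Double and the label-in-last-position layout are assumptions on my part):

```scala
// Group feature rows by their last element, e.g. a class label stored
// as the final entry of each Array[Double].
val input: Seq[Array[Double]] = Seq(
  Array(1.0, 2.0, 0.0),
  Array(3.0, 4.0, 1.0),
  Array(5.0, 6.0, 0.0)
)
val grouped: Map[Double, Seq[Array[Double]]] = input.groupBy(_.last)
println(grouped(0.0).size) // 2
```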
Thank you!
Aris
it.
You can map to Float in this case to halve the memory, if that works
for your use case. This is just kind of how Strings and floating-point
work in the JVM, nothing Spark-specific.
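A minimal sketch of that suggestion (plain Scala, no Spark): parsing to Float instead of Double halves the per-value footprint, 4 bytes versus 8, at the cost of precision:

```scala
val raw = Seq("1.5", "2.25", "3.125")
val asDouble: Seq[Double] = raw.map(_.toDouble) // 8 bytes per value
val asFloat: Seq[Float] = raw.map(_.toFloat)    // 4 bytes per value
println(asFloat.sum) // 6.875
```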
On Mon, Oct 13, 2014 at 9:12 PM, Aris arisofala...@gmail.com wrote:
Hi guys,
I am trying just parse out values
Hello Spark folks,
I am doing a simple parsing of a CSV input file, and the input file is very
large (~1GB). It seems I have a memory leak here and I am destroying my
server. After using jmap to generate a Java heap dump and using the Eclipse
Memory Analyzer, I basically learned that when I read
to do a normal ./sbt/sbt assembly build,
or do I need to do something else?
Thank you and take care
Aris
On Thu, Sep 18, 2014 at 3:50 PM, Manish Amde manish...@gmail.com wrote:
Hi Aris,
Thanks for the interest. First and foremost, tree ensembles are a top
priority for the 1.2 release and we
With model.clearThreshold() you can just get the
raw predicted scores, removing the threshold, which simply translates
the score into a positive/negative class.
API is here
http://yhuai.github.io/site/api/scala/index.html#org.apache.spark.mllib.classification.SVMModel
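The behavior described can be sketched in plain Scala (this illustrates the semantics only, not the MLlib implementation; predictWithThreshold is a made-up name):

```scala
// With a threshold set, a score collapses to a 0.0/1.0 class; after
// clearThreshold() the raw score comes back instead.
def predictWithThreshold(score: Double, threshold: Option[Double]): Double =
  threshold match {
    case Some(t) => if (score > t) 1.0 else 0.0
    case None    => score // as after model.clearThreshold()
  }
println(predictWithThreshold(0.3, Some(0.5))) // 0.0
println(predictWithThreshold(0.3, None))      // 0.3
```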
Enjoy!
Aris
On Sun, Sep 21, 2014 at 11
Sorry to bother you guys, but does anybody have any ideas about the status
of MLlib with a Radial Basis Function kernel for SVM?
Thank you!
On Tue, Sep 16, 2014 at 3:27 PM, Aris wrote:
Hello Spark Community -
I am using the support vector machine / SVM implementation in MLlib
Thank you Spark community you make life much more lovely - suffering in
silence is not fun!
I am trying to build the Spark Git branch from Manish Amde, available here:
https://github.com/manishamde/spark/tree/ada_boost
I am trying to build the non-master branch 'ada_boost' (in the link above),
data...
which raises the question: is there some kind of support for RBF kernels
rather than linear kernels? In small-data tests using R, the RBF kernel
worked really well, and the linear kernel never converged... so I would
really like to use RBF.
Thank you folks for any help!
Aris
This should be a really simple problem, but you haven't shared enough code
to determine what's going on here.
On Tue, Sep 16, 2014 at 8:08 AM, Hui Li littleleave...@gmail.com wrote:
Hi,
I am new to SPARK. I just set up a small cluster and wanted to run some
simple MLLIB examples. By
the entire running command -
bin/spark-submit --master local[*] --jars $(find /home/data/.ivy2/cache/
-iname '*.jar' | tr '\n' ,) --class KafkaStreamConsumer
~/code_host/data/scala/streamingKafka/target/scala-2.10/streamingkafka_2.10-1.0.jar
node1:2181 my-consumer-group aris-topic 1
This is fairly
using sbt
package? This process is working, but it is very slow.
Any help with speeding up this compilation is really appreciated!!
Aris
-
import AssemblyKeys._ // put this at the top of the file
name := "streamingKafka"
version := "1.0"
scalaVersion := "2.10.4"
a location back
to the client on a successful write.
On Mon, Jul 14, 2014 at 4:27 PM, Aris Vlasakakis a...@vlasakakis.com
wrote:
Hello Spark community,
I would like to write an application in Scala that is a model server. It
should have an MLlib Linear Regression model that is already trained
Hello!
Just curious if anybody could respond to my original message: does
anyone know how to set the configuration variables that are handled by
Jetty rather than by Spark's native framework, which is Akka, I think?
Thanks
On Thu, Jul 10, 2014 at 4:04 PM, Aris Vlasakakis a...@vlasakakis.com
pattern.
Thanks!
--
Άρης Βλασακάκης
Aris Vlasakakis
of spark-submit but do
remember a very similar issue when trying to run a Spark driver on a
Windows machine against a Spark Master on an Ubuntu cluster (the
SPARK_HOME directories were obviously different)
On Wed, Jul 9, 2014 at 7:18 PM, Aris Vlasakakis a...@vlasakakis.com
wrote:
Hello everybody
Master need to be on the same network segment (the VPN subnetwork).
Am I on the right track? How can I set spark.fileserver.uri and
spark.httpBroadcast.uri ? I see that these are actually run by Jetty
server...any thoughts?
Thank you so much!
--
Άρης Βλασακάκης
Aris Vlasakakis