When I train a GBTRegressor model from a DataFrame on the latest
1.6.4-SNAPSHOT with a high value for the hyper-parameter maxIter, say
500, I get a java.lang.StackOverflowError; GBTRegressor does work with
maxIter set to about 100.
Does this make sense? Are there any known solutions? This is runnin
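One commonly suggested mitigation for very deep iterative lineages is periodic checkpointing; here is a minimal sketch, assuming sc is the SparkContext and assuming lineage depth is in fact the cause here (the checkpoint directory is a placeholder):

import org.apache.spark.ml.regression.GBTRegressor

// Hedged sketch, not a confirmed fix: checkpointing truncates the long RDD
// lineage that hundreds of boosting iterations build up, a common cause of
// StackOverflowError in deep iterative jobs.
sc.setCheckpointDir("/tmp/gbt-checkpoints")

val gbt = new GBTRegressor()
  .setMaxIter(500)
  .setCheckpointInterval(10) // checkpoint the working RDDs every 10 iterations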
Mario -- I would recommend downloading and building from source, as the
repositories could be lagging behind.
On Thu, Oct 6, 2016 at 4:00 PM, miliofotou wrote:
> I can verify this as well. Even though I can download the 2.0.1 binary just
> fine, I cannot find the 2.0.1 artifacts on mvnrepository.com or a
expressed interest in code
generation / encoder problems I have found recently.
Here is the problem:
https://issues.apache.org/jira/browse/SPARK-17368
Thank you
On Thu, Sep 1, 2016 at 3:09 PM, Aris wrote:
> Thank you Jakob on two counts
>
> 1. Yes, thanks for pointing out that sp
> Hi Aris,
> thanks for sharing this issue. I can confirm that value classes
> currently don't work; however, I can't think of a reason why they
> shouldn't be supported. I would therefore recommend that you report
> this as a bug.
>
> (Btw, value classes also cu
Hello Spark community -
Does Spark 2.0 Datasets *not support* Scala value classes (basically
"extends AnyVal", with a bunch of limitations)?
I am trying to do something like this:
case class FeatureId(value: Int) extends AnyVal
val seq = Seq(FeatureId(1),FeatureId(2),FeatureId(3))
import spark.implicits._
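For what it's worth, a sketch of the obvious workaround, assuming spark is a SparkSession (the boxed class name is illustrative): dropping extends AnyVal yields an ordinary case class, which the built-in encoders do handle.

// Plain case class instead of a value class; encoders handle this fine.
case class FeatureIdBoxed(value: Int)

import spark.implicits._
val ds = Seq(FeatureIdBoxed(1), FeatureIdBoxed(2), FeatureIdBoxed(3)).toDS()
ds.show()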
issue being resolved recently: https://issues.apache.org/jira/browse/SPARK-15285
>>
>> On Fri, Aug 12, 2016 at 3:33 PM, Aris wrote:
>>
>>> Hello folks,
>>>
>>> I'm on Spark 2.0.0 working with Datasets -- and despite the fact that
> There was a test:
> SPARK-15285 Generated SpecificSafeProjection.apply method grows beyond 64KB
>
> See if it matches your use case.
>
> On Tue, Aug 16, 2016 at 8:41 AM, Aris wrote:
>
>> I am still working on making a minimal test that I can share without my
>> work-specific code
aster branch.
> Should we reopen SPARK-15285?
>
> Best Regards,
> Kazuaki Ishizaki,
>
>
>
> From: Ted Yu
> To: dhruve ashar
> Cc: Aris, "user@spark.apache.org" <user@spark.apache.org>
> Da
Hello folks,
I'm on Spark 2.0.0 working with Datasets -- and despite the fact that
smaller data unit tests work on my laptop, when I'm on a cluster, I get
cryptic error messages:
Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method
> "(Lorg/apache/spark/sql/catalyst/InternalRow;L
Since this class is private within Spark Streaming itself, I cannot actually
register it with Kryo, and I cannot use registrationRequired to make sure
*everything* is serialized with Kryo.
Is this a bug? Can I somehow solve this?
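One possible workaround, as a sketch only: a private class can still be passed to registerKryoClasses via reflection. The class name below is a placeholder, not the actual Spark Streaming class.

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true")
  // Look the private class up by name rather than referencing it directly:
  .registerKryoClasses(Array(
    Class.forName("org.apache.spark.streaming.SomePrivateClass") // placeholder
  ))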
Aris
If I build from git branch origin/branch-1.6 will I be OK to test out my
code?
Thank you so much TD!
Aris
On Mon, Feb 22, 2016 at 2:48 PM, Tathagata Das
wrote:
> There were a few bugs that were solved with mapWithState recently. Would
> be available in 1.6.1 (RC to be cut soon).
>
Hello Spark community, and especially TD and Spark Streaming folks:
I am using the new Spark 1.6.0 Streaming mapWithState API in order to
accomplish a streaming join task over my data.
Things work fine on smaller sets of data, but on a single-node large
cluster with JSON strings amounting to 2.5
The two input streams would be:
val rawLEFT: DStream[String] = ssc.textFileStream(dirLEFT)
val rawRIGHT: DStream[String] = ssc.textFileStream(dirRIGHT)
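To make the intended usage concrete, here is a minimal mapWithState sketch under assumptions: extractKey is a hypothetical parser that pulls a join key out of each line, and the state simply remembers the latest left-side value per key; this is illustrative, not the original job.

import org.apache.spark.streaming.{State, StateSpec}

def extractKey(line: String): String = line.takeWhile(_ != ',') // placeholder parser

val leftPairs = rawLEFT.map(line => (extractKey(line), line))

val updateFunc = (key: String, value: Option[String], state: State[String]) => {
  value.foreach(state.update)  // remember the newest value for this key
  (key, state.getOption())     // emit the key with its current state
}

val stateStream = leftPairs.mapWithState(StateSpec.function(updateFunc))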
On Wed, Dec 2, 2015 at 2:12 PM, Aris wrote:
> Hello folks,
>
> I'm on the newest spark 1.6.0-SNAPSHOT Spark Streaming with the new
> trackStateByKey API.
Hello folks,
I'm on the newest spark 1.6.0-SNAPSHOT Spark Streaming with the new
trackStateByKey API. I'm trying to do something fairly simple that requires
knowing state across minibatches, so I am trying to see if it can be done.
I basically have two types of data to do a join, left-side and ri
y X.
Are you saying this behavior does not happen when I connect to
ElasticSearch? Every single JSON document must contain every single key, or
else the application crashes? So if one single JSON document is missing key
X from the ElasticSearch data, the application throws an exception?
Thank you!
Aris
I'm using ElasticSearch with elasticsearch-spark-BUILD-SNAPSHOT and
Spark/SparkSQL 1.2.0, from Costin Leau's advice.
I want to query ElasticSearch for a bunch of JSON documents from within
SparkSQL, and then use a SQL query to simply query for a column, which is
actually a JSON key -- normal thing
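For reference, a sketch of what that query could look like via the Spark SQL data source API, assuming this version of elasticsearch-spark exposes the org.elasticsearch.spark.sql source; the index/type and key name are placeholders.

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
// Register the ES index as a temporary table, then query a JSON key as a column.
sqlContext.sql(
  """CREATE TEMPORARY TABLE docs
    |USING org.elasticsearch.spark.sql
    |OPTIONS (resource 'myindex/mytype')""".stripMargin)
val keys = sqlContext.sql("SELECT mykey FROM docs")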
> [1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/install.html#download-dev
> [2] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/spark.html
>
> On 2/9/15 9:33 PM, Aris wrote:
>
>> Hello Spark community and Holden,
>>
>> I am trying to follow Hol
Hello Spark community and Holden,
I am trying to follow Holden Karau's SparkSQL and ElasticSearch tutorial
from Spark Summit 2014. I am trying to use elasticsearch-spark 2.1.0.Beta3
and SparkSQL 1.2 together.
https://github.com/holdenk/elasticsearchspark
*(Side Note: This very nice tutorial does
Would something like Manish Amde's boosted ensemble tree methods be
representable in PMML?
Thank you!!
Aris
Wow... I just tried LogisticRegressionWithLBFGS, and using clearThreshold()
DOES IN FACT work. It appears that LogisticRegressionWithSGD returns a
model whose clearThreshold() method is broken!
On Tue, Oct 14, 2014 at 3:14 PM, Aris wrote:
> Hi folks,
>
> When I am predicting Binary 1/0 respo
Instead of getting a "realistic" probability that is
between 0.0 and 1.0, I am only getting back predictions of 0.0 OR
1.0... never anything in between.
The API says that clearThreshold is "experimental"... it was working
before! Is it broken now?
Thanks!
Aris
s? that could
> explain it.
>
> You can map to Float in this case to halve the memory, if that works
> for your use case. This is just kind of how Strings and floating-point
> work in the JVM, nothing Spark-specific.
>
> On Mon, Oct 13, 2014 at 9:12 PM, Aris wrote:
>
val grouped: RDD[(Double, Iterable[Array[Double]])] =
input.groupBy(_.last)
Thank you!
Aris
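As a sketch of the Float suggestion quoted above (the input path is a placeholder and sc is the SparkContext):

import org.apache.spark.rdd.RDD

// Parse the CSV into Floats instead of Doubles to halve the per-element
// memory footprint of the parsed arrays.
val doubles: RDD[Array[Double]] =
  sc.textFile("/data/input.csv").map(_.split(',').map(_.toDouble))
val floats: RDD[Array[Float]] = doubles.map(_.map(_.toFloat))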
Hello Spark folks,
I am doing a simple parsing of a CSV input file, and the input file is very
large (~1GB). It seems I have a memory leak here and I am destroying my
server. After using jmap to generate a Java heap dump and using the Eclipse
Memory Analyzer, I basically learned that when I read i
With model.clearThreshold() you can just get the
raw predicted scores, removing the threshold that simply translates them
into a positive/negative class.
API is here
http://yhuai.github.io/site/api/scala/index.html#org.apache.spark.mllib.classification.SVMModel
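A minimal sketch of that usage; training and test are assumed to be RDD[LabeledPoint]s, and the iteration count is illustrative:

import org.apache.spark.mllib.classification.SVMWithSGD

val model = SVMWithSGD.train(training, 100) // 100 iterations, illustrative
model.clearThreshold()                      // predict() now returns raw scores
val scoresAndLabels = test.map(p => (model.predict(p.features), p.label))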
Enjoy!
Aris
On Sun, Sep 21, 2014 at 11
le to do a normal "./sbt/sbt assembly" build,
or do I need to do something else?
Thank you and take care
Aris
On Thu, Sep 18, 2014 at 3:50 PM, Manish Amde wrote:
> Hi Aris,
>
> Thanks for the interest. First and foremost, tree ensembles are a top
> priority for the 1.2 rele
Thank you, Spark community - you make life much more lovely. Suffering in
silence is not fun!
I am trying to build the Spark Git branch from Manish Amde, available here:
https://github.com/manishamde/spark/tree/ada_boost
I am trying to build the non-master branch 'ada_boost' (in the link above),
b
Sorry to bother you guys, but does anybody have any ideas about the status
of MLlib with a Radial Basis Function kernel for SVM?
Thank you!
On Tue, Sep 16, 2014 at 3:27 PM, Aris wrote:
> Hello Spark Community -
>
> I am using the support vector machine / SVM implementation in MLli
This should be a really simple problem, but you haven't shared enough code
to determine what's going on here.
On Tue, Sep 16, 2014 at 8:08 AM, Hui Li wrote:
> Hi,
>
> I am new to SPARK. I just set up a small cluster and wanted to run some
> simple MLLIB examples. By following the instructions of
data...
which raises the question: is there some kind of support for RBF kernels
rather than linear kernels? In small-data tests using R, the RBF kernel
worked really well and the linear kernel never converged... so I would really
like to use RBF.
Thank you folks for any help!
Aris
Yeah - another vote here for what's called one-hot encoding: just convert
the single categorical feature into N columns, where N is the number of
distinct values of that feature, with a single one in the matching column and
all the other features/columns set to zero. A minimal sketch follows.
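In plain Scala (the feature name and values are hypothetical):

// One-hot encode a categorical feature with three distinct values.
val categories = Seq("red", "green", "blue")
val index = categories.zipWithIndex.toMap

def oneHot(value: String): Array[Double] = {
  val vec = Array.fill(categories.size)(0.0) // all zeros...
  vec(index(value)) = 1.0                    // ...except the matching column
  vec
}

// oneHot("green") => Array(0.0, 1.0, 0.0)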
On Tue, Sep 16, 2014 at 2:16 PM, Sean Owen wrote:
iname *.jar | tr '\n' ,)
Here's the entire running command -
bin/spark-submit --master local[*] --jars $(find /home/data/.ivy2/cache/
-iname *.jar | tr '\n' ,) --class KafkaStreamConsumer
~/code_host/data/scala/streamingKafka/target/scala-2.10/streamingkafka_2.10-1.0.jar
node1
build this JAR file, just using sbt
package? This process is working, but very slow.
Any help with speeding up this compilation is really appreciated!!
Aris
-
import AssemblyKeys._ // put this at the top of the file
name := "streamingKafka"
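For reference, a sketch of how the rest of a build.sbt of that era commonly looked (sbt-assembly 0.11 style; versions and the "provided" scope are illustrative). Keeping Spark itself out of the assembly makes the jar much smaller, which also speeds up repeated assembly runs.

assemblySettings

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"            % "1.0.2" % "provided",
  "org.apache.spark" %% "spark-streaming-kafka" % "1.0.2"
)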
Hello!
Just curious if anybody could respond to my original message, if anybody
knows how to set the configuration variables that are handled by
Jetty and not by Spark's native framework... which is Akka, I think?
Thanks
On Thu, Jul 10, 2014 at 4:04 PM, Aris Vlasakakis
wrote:
>
e and return a location back
> to the client on a successful write.
>
>
>
>
>
> On Mon, Jul 14, 2014 at 4:27 PM, Aris Vlasakakis
> wrote:
>
>> Hello Spark community,
>>
>> I would like to write an application in Scala that is a model server. It
>>
a very
common design pattern.
Thanks!
--
Άρης Βλασακάκης
Aris Vlasakakis
y spark.fileserver.uri and the
Spark Master need to be on the same network segment (the VPN subnetwork).
Am I on the right track? How can I set "spark.fileserver.uri" and
"spark.httpBroadcast.uri" ? I see that these are actually run by Jetty
server...any thoughts?
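For what it's worth, a sketch of what I would try, assuming these properties are even honored when set from user code; the addresses and ports below are placeholder values on the VPN subnet:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setMaster("spark://10.8.0.1:7077")                      // placeholder master on the VPN
  .set("spark.fileserver.uri", "http://10.8.0.2:43210")    // placeholder driver-side address
  .set("spark.httpBroadcast.uri", "http://10.8.0.2:43211") // placeholder driver-side address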
Thank you so much!
--
Άρης Βλασακάκης
Aris Vlasakakis
Andrew, thank you so much! That worked! I had to manually set the
spark.home configuration in the SparkConf object using
.set("spark.home","/cluster/path/to/spark/"), and then I was able to submit
from my laptop to the cluster!
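Spelled out, the working configuration looks like this (master URL and app name are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

// Point spark.home at the Spark installation path on the cluster machines.
val conf = new SparkConf()
  .setMaster("spark://cluster-master:7077")
  .setAppName("MyApp")
  .set("spark.home", "/cluster/path/to/spark/")
val sc = new SparkContext(conf)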
Aris
On Thu, Jul 10, 2014 at 11:41 AM, Andrew O
he cluster? I'm on an older
> drop so not sure about the finer points of spark-submit but do
> remember a very similar issue when trying to run a Spark driver on a
> windows machine against a Spark Master on Ubuntu cluster (the
> SPARK_HOME directories were obviously different)
a:1207)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
--
Άρης Βλασακάκης
Aris Vlasakakis