Ignite is the renaming of GridGain, if that helps. It's similar to Oracle Coherence. These do share some similarities -- fault-tolerant, in-memory, distributed processing. But the pieces they're built on differ, the architecture differs, and the APIs differ. So they're fairly different in particulars.
https://wiki.apache.org/incubator/IgniteProposal has, I think, been updated recently and has a good comparison.
Although GridGain has been around since the early Spark days, Apache Ignite is quite new and just getting started, I think, so you will probably want to reach out to the developers.
From: Jay Vyas [mailto:jayunit100.apa...@gmail.com]
Sent: Thursday, February 26, 2015 3:40 PM
To: Sean Owen
Cc: Ognen Duzlevski; user@spark.apache.org
Subject: Re: Apache Ignite vs Apache Spark
Please add the Zagreb Meetup group, too.
http://www.meetup.com/Apache-Spark-Zagreb-Meetup/
Thanks!
On 18.2.2015. 19:46, Johan Beisser wrote:
If you could also add the Hamburg Apache Spark Meetup, I'd appreciate it.
http://www.meetup.com/Hamburg-Apache-Spark-Meetup/
On Tue, Feb 17, 2015 at 5:08 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Thanks! I've added you.
Matei
On Feb 17, 2015, at 4:06 PM, Ralph Bergmann | the4thFloor.eu ra...@the4thfloor.eu wrote:
Hi,
there is a small Spark Meetup group in Berlin, Germany :-)
http://www.meetup.com/Berlin-Apache-Spark-Meetup/
Please add this group to the Meetups list at
https://spark.apache.org/community.html
Ralph
I am getting this error:
Exception in thread main java.lang.NoClassDefFoundError:
org/apache/spark/SparkConf
...
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkConf
Here is my build.sbt file
import _root_.sbt.Keys._
import _root_.sbtassembly.Plugin.AssemblyKeys._
import _root_
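A common cause of this NoClassDefFoundError is launching the assembled jar in a way that supplies no Spark classes at runtime. A minimal build.sbt sketch along these lines, assuming the jar is submitted via spark-submit (the project name and versions are illustrative, not from the original post):

name := "my-spark-app"    // hypothetical project name
scalaVersion := "2.10.4"  // illustrative; match the Scala version of your Spark build
// "provided" keeps spark-core out of the assembly jar; spark-submit then puts
// org.apache.spark.SparkConf on the classpath at runtime. Launching the jar
// with plain `java -jar` instead would reproduce the NoClassDefFoundError.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0" % "provided"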
Hello,
I'm using Apache Spark Streaming 1.2.0 and trying to define a file filter for file names when creating an InputDStream
https://spark.apache.org/docs/1.2.0/api/java/org/apache/spark/streaming/dstream/InputDStream.html
by invoking the fileStream method.
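For reference, a hedged sketch of passing a file-name filter to fileStream in the Scala API; the directory, filter rule, and batch interval are illustrative assumptions:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FileStreamFilterExample {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("file-stream-filter"), Seconds(30))
    // Only pick up .json files; skip anything else (e.g. temporary files).
    val stream = ssc.fileStream[LongWritable, Text, TextInputFormat](
      "hdfs:///incoming",                              // hypothetical directory
      (path: Path) => path.getName.endsWith(".json"),  // the file-name filter
      newFilesOnly = true)
    stream.map(_._2.toString).print()
    ssc.start()
    ssc.awaitTermination()
  }
}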
Since all the data is stored on a single local machine, it does not benefit
from distributed operations on RDDs.
How does it benefit, and what is going on internally when Spark utilizes several logical cores?
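For what it's worth, local mode still splits a job into tasks and runs them on a pool of threads, one task per thread, so CPU-bound stages can use all requested logical cores even without a cluster. A hedged sketch (the numbers are illustrative):

import org.apache.spark.{SparkConf, SparkContext}

object LocalCoresExample {
  def main(args: Array[String]): Unit = {
    // local[4] runs tasks on 4 threads inside a single JVM.
    val sc = new SparkContext(
      new SparkConf().setMaster("local[4]").setAppName("local-cores"))
    // 8 partitions -> 8 tasks, scheduled 4 at a time onto the 4 threads.
    val result = sc.parallelize(1 to 1000000, 8)
      .map(i => math.sqrt(i.toDouble)) // CPU-bound work, no shuffle
      .sum()
    println(result)
    sc.stop()
  }
}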
Spark Committers: Please advise the way forward for this issue.
Thanks for your support.
Regards,
Venkat
From: Venkat, Ankam
Sent: Thursday, January 22, 2015 9:34 AM
To: 'Frank Austin Nothaft'; 'user@spark.apache.org'
Cc: 'Nick Allen'
Subject: RE: How to 'Pipe' Binary Data in Apache Spark
From: Frank Austin Nothaft [mailto:fnoth...@berkeley.edu]
Sent: Wednesday, January 21, 2015 12:30 PM
To: Venkat, Ankam
Cc: Nick Allen; user@spark.apache.org
Subject: Re: How to 'Pipe' Binary Data in Apache Spark
Hi Venkat/Nick,
The Spark RDD.pipe method pipes text data into a subprocess
Venkat,
No problem!
So, creating a custom InputFormat or using sc.binaryFiles alone is not the
right solution. We also need the modified version of RDD.pipe to support
binary data? Is my understanding correct?
Yep! That is correct. The custom InputFormat allows Spark
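To make the contrast concrete, a hedged sketch of the text-only behavior RDD.pipe provides out of the box: each element is written to the subprocess's stdin as a newline-terminated line of text, and each stdout line becomes an element of the result RDD (the command is illustrative):

import org.apache.spark.{SparkConf, SparkContext}

object PipeTextExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("pipe-text"))
    val lines = sc.parallelize(Seq("alpha", "beta", "gamma"))
    // Each element goes to `tr` as one stdin line; binary records would be
    // mangled here, since pipe writes elements via toString, line by line.
    val upper = lines.pipe("tr a-z A-Z")
    upper.collect().foreach(println)
    sc.stop()
  }
}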
What's your take on this?
Regards,
Venkat Ankam
Hi,
I'm using Apache Spark 1.1.0 and I'm currently having an issue with the broadcast method. When I call the broadcast function on a small dataset on a 5-node cluster, I experience the error "Error sending message as driverActor is null" after broadcasting the variables several times (apps running under JBoss).
Any suggestions?
AttributeError: 'function' object has no attribute 'read'
Any suggestions?
Regards,
Venkat Ankam
From: Nick Allen [mailto:n...@nickallen.org]
Sent: Friday, January 16, 2015 11:46 AM
To: user@spark.apache.org
Subject: Re: How to 'Pipe' Binary Data in Apache Spark
I've recently been trying to get to know Apache Spark as a replacement for Scikit-Learn; however, it seems to me that even in simple cases, Scikit converges to an accurate model far faster than Spark does.
For example, I generated 1000 data points for a very simple linear function (z=x+y).
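For context, a hedged sketch of the Spark side of such an experiment with MLlib's LinearRegressionWithSGD; the data generation, step size, and iteration count are illustrative assumptions, and SGD settings strongly affect how fast it converges, which is often the crux of such comparisons:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}
import scala.util.Random

object LinearFitExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("linear-fit"))
    val rng = new Random(42)
    // 1000 noise-free points of z = x + y
    val points = sc.parallelize(Seq.fill(1000) {
      val (x, y) = (rng.nextDouble(), rng.nextDouble())
      LabeledPoint(x + y, Vectors.dense(x, y))
    }).cache()
    val model = LinearRegressionWithSGD.train(points, 100, 1.0) // iterations, step size
    println(s"weights: ${model.weights} (expected close to [1.0, 1.0])")
    sc.stop()
  }
}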
I have an RDD containing binary data. I would like to use 'RDD.pipe' to
pipe that binary data to an external program that will translate it to
string/text data. Unfortunately, it seems that Spark is mangling the binary
data before it gets passed to the external program.
This code is representative
Well it looks like you're reading some kind of binary file as text.
That isn't going to work, in Spark or elsewhere, as binary data is not
even necessarily a valid encoding of a string. There are no line
breaks to delimit lines and thus elements of the RDD.
Your input has some record structure
Per your last comment, it appears I need something like this:
https://github.com/RIPE-NCC/hadoop-pcap
Thanks a ton. That gets me oriented in the right direction.
I just wanted to reiterate the solution for the benefit of the community.
The problem is not from my use of 'pipe', but that 'textFile' cannot be used to read in binary data. (Doh) There are a couple of options to move forward.
1. Implement a custom 'InputFormat' that understands the binary input data.
2. Use 'sc.binaryFiles' to read in the binary data directly.
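A hedged sketch of option 2, assuming Spark 1.2 or later where sc.binaryFiles is available (the input path is an illustrative assumption):

import org.apache.spark.{SparkConf, SparkContext}

object BinaryFilesExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("binary-files"))
    // One (path, PortableDataStream) pair per file; the bytes are NOT decoded
    // as text, so nothing gets mangled the way textFile would mangle it.
    val files = sc.binaryFiles("hdfs:///data/binary-input") // hypothetical path
    val sizes = files.mapValues(_.toArray().length)
    sizes.collect().foreach { case (path, n) => println(s"$path: $n bytes") }
    sc.stop()
  }
}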
Any help would be appreciated. A simple toy example program (or steps) that shows, e.g., how to build such a client (simply creating a Spark context on a local machine, reading a text file, and returning basic stats) would be the ideal answer!
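In that spirit, a hedged toy sketch (not an official sample) that creates a local Spark context, reads a text file, and prints basic stats; the file path is an illustrative assumption:

import org.apache.spark.{SparkConf, SparkContext}

object BasicStatsClient {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("basic-stats"))
    val lines = sc.textFile("/tmp/input.txt") // hypothetical input file
    val lengths = lines.map(_.length.toDouble).cache()
    // count() and mean() are the kind of "basic stats" asked about
    println(s"lines: ${lengths.count()}, mean line length: ${lengths.mean()}")
    sc.stop()
  }
}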
From: olegshirokikh o...@solver.com
To: user@spark.apache.org
Date: 01/16/2015 01:32 PM
Subject: Creating Apache Spark-powered “As Service” applications
The question is about the ways to create a Windows desktop-based and/or
web-based application
Thanks,
Oleg
From: Robert C Senkbeil [mailto:rcsen...@us.ibm.com]
Sent: Friday, January 16, 2015 12:21 PM
To: Oleg Shirokikh
Cc: user@spark.apache.org
Subject: Re: Creating Apache Spark-powered “As Service” applications
Hi,
You can take a look at the Spark Kernel project:
https://github.com/ibm
Well that's confusing. I have the same issue. So you're saying I have to
compile Spark with Yarn set to true to make it work with Hadoop 2.2.0 in
Standalone mode?
I am trying to do hBaseRDD.count(); I am getting the following exception:
java.lang.IllegalStateException (unread block data) [duplicate 1]
pom.xml
http://apache-spark-user-list.1001560.n3.nabble.com/file/n20746/pom.xml
Hi Jai,
Refer to this doc and make sure your network is not blocking:
http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-Spark-job-on-Unix-cluster-from-dev-environment-Windows-td16989.html
Also make sure you are using the same version of Spark in both places (the one on the cluster
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015
My thesis is related to big data mining and I have a cluster in the
laboratory of my university. My task is to install Apache Spark on it and use it for extraction purposes. Is there any understandable guidance on how to do this?
On a rough note:
Step 1: Install Hadoop 2.x on all the machines in the cluster.
Step 2: Check that the Hadoop cluster is working.
Step 3: Set up Apache Spark as described on the documentation page for cluster deployment.
Check the status of the cluster on the master UI.
As it is a data mining project, configure Hive too.
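Once the master UI shows all workers, a quick smoke test can confirm the cluster accepts jobs; a hedged sketch, where the master URL is an illustrative assumption:

import org.apache.spark.{SparkConf, SparkContext}

object ClusterSmokeTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf()
      .setMaster("spark://master-host:7077") // hypothetical standalone master
      .setAppName("smoke-test"))
    // A trivial distributed job; should print 500500.0 if workers run tasks.
    println(sc.parallelize(1 to 1000).sum())
    sc.stop()
  }
}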
I have a question as the title says, the question link is
http://stackoverflow.com/questions/27370170/query-classification-using-apache-spark-mlib
Thanks,
Jin
Hello all
I'm very pleased to announce the launch of http://www.SparkBigData.com: The
Apache Spark Knowledge Base.
It is your one-stop information resource dedicated to Apache Spark. SparkBigData.com provides free, easy, and fast access to hundreds of Apache Spark resources organized in several
Friends,
I am pretty new to Spark, as much as to Scala, MLlib, and the entire Hadoop stack! It would be so much help if I could be pointed to some good books on Spark and MLlib.
Further, does MLlib support any algorithms for B2B cross-sell/upsell or customer retention (out of the box)?
Take a look at the O'Reilly Learning Spark (Early Release) book. I've found
this very useful.
Darin.
From: Saurabh Agrawal saurabh.agra...@markit.com
To: user@spark.apache.org user@spark.apache.org
Sent: Thursday, November 20, 2014 9:04 AM
Subject: Please help me get started on Apache
For Spark,
You can start with a new book like:
https://www.safaribooksonline.com/library/view/learning-spark/9781449359034/ch01.html
I think the paper book is out now.
You can also have a look at the tutorials and documentation guide available at:
https://spark.apache.org/docs/1.1.0/mllib-guide.html
I have some previous experience with Apache Oozie from when I was developing in Apache Pig. Now I am working explicitly with Apache Spark, and I am looking for a tool with similar functionality. Is Oozie recommended? What about Luigi? What do you use / recommend?
Let me know if you are interested in participating in a meetup in Cincinnati,
OH to discuss Apache Spark.
We currently have 4-5 different companies expressing interest but would like a
few more.
Darin.
Hi,
Sorry to bring back this old thread.
What is the state now? Is this problem solved? How does Spark handle categorical data now?
Regards,
Ashutosh
I developed the spark-xml-utils library because we have a large amount of XML in big datasets, and I felt this data could be better served by providing some helpful XML utilities. This includes the ability to filter documents based on an XPath/XQuery expression, return specific nodes for an
Cross-posting an interesting question from Stack Overflow:
http://stackoverflow.com/questions/26321947/multipart-uploads-to-amazon-s3-from-apache-spark
Nick
Oh, that's a straight reversal from their position up until earlier this year:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-read-a-multipart-s3-file-tp5463p5485.html
Was there an announcement explaining the change in recommendation?
Nick
Hi,
Has anyone tried the Mosek http://www.mosek.com/ solver in Spark?
I am getting weird serialization errors. I came to know that Mosek uses shared libraries, which may not be serializable.
Is this the reason they are not serialized, or is it working for anyone?
--
Regards,
Raghuveer Chanda
be a problem (I can tell from the UI). What's worth mentioning is that one RDD is significantly bigger than the other one (much bigger). Does anyone have any idea why?
Thanks,
Edwin
Does the union function cause any data shuffling?
at getCallSite at null:-1), which has no missing parents
19:02:47,085 INFO [org.apache.spark.scheduler.DAGScheduler] (spark-akka.actor.default-dispatcher-14) Submitting 24 missing tasks from Stage 12 (UnionRDD[31] at getCallSite at null:-1)
(src, not the pre-built version); after the sbt assembly, I can run spark-shell successfully but the Python shell does not work.
$ ./bin/pyspark
./bin/pyspark: line 111: exec: python: not found
Have you solved your problem?
Thanks,
Christy
more suitable for Spark, like having multiple worker nodes on a single host.
It's interesting to know: how do you cope with such a challenge? Thanks.
How would you improve the code? Or what Spark configurations should I look for?
(Sorry, I'm quite new to Spark)
it, with at least as many threads as my machine allows. But how do I do the same on Spark? Is there a possibility to call that native component on each worker in multiple threads?
Thanks in advance.
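One common approach here, sketched under the assumption that the blocking call is thread-safe: fan out inside each partition so every worker keeps several blocking calls in flight (callNativeComponent is a hypothetical stand-in):

import org.apache.spark.{SparkConf, SparkContext}

object BlockingIoExample {
  // Hypothetical stand-in for the slow native/blocking call.
  def callNativeComponent(x: Int): Int = { Thread.sleep(100); x * 2 }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[4]").setAppName("blocking-io"))
    // A parallel collection inside mapPartitions runs several blocking
    // calls per task at once, instead of one element at a time.
    val results = sc.parallelize(1 to 100, 4)
      .mapPartitions(_.toVector.par.map(callNativeComponent).seq.iterator)
    println(results.sum())
    sc.stop()
  }
}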
My suspicion is that you're trying to access something (SparkConf?) within the map closures...
Hi Yana
I have done a take and confirmed the existence of data. I have also checked that it is getting connected to Cassandra. That is why I suspect that this particular RDD is not serializable.
Thanks,
Lmk
I've seen a couple of issues posted about this, but I never saw a resolution.
When I'm using Spark 1.0.2 (and the spark-submit script to submit my jobs) and
AWS SDK 1.8.7, I get the stack trace below. However, if I drop back to AWS SDK
1.3.26 (or anything from the AWS SDK 1.4.* family) then
You specify your own log4j configuration in the usual log4j way -- package it in your assembly, or specify it on the command line, for example. See http://logging.apache.org/log4j/1.2/manual.html
The template you can start with is in core/src/main/resources/org/apache/spark/log4j-defaults.properties
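A hedged sketch of such a log4j.properties, shaped like the bundled template; the chosen levels and quieted packages are illustrative assumptions:

# Send everything WARN and above to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Quiet chatty components individually
log4j.logger.org.apache.spark=WARN
log4j.logger.org.eclipse.jetty=WARN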
Hello All,
Basically I need to edit log4j.properties to filter out some of the unnecessary logs in Spark on yarn-client mode. I am not sure where I can find the log4j.properties file (its location). Can anyone help me with this?
Hello @ the mailing list,
We are thinking of using Spark in one of our projects on a Hadoop cluster. During evaluation, several questions remain, which are stated below.
*Preconditions*
Let's assume Apache Spark is deployed on a Hadoop cluster using YARN. Furthermore, a Spark execution is running. How does Spark handle the situations listed below?
*Cases / Questions*
1. One node of the Hadoop cluster fails due to a disc error. However, replication
Thanks, your answers totally cover all my questions ☺
From: Sandy Ryza [mailto:sandy.r...@cloudera.com]
Sent: Wednesday, July 16, 2014 09:41
To: user@spark.apache.org
Subject: Re: How does Apache Spark handle system failure when deployed in YARN?
Hi Matthias,
Answers inline.
-Sandy
Hello,
I am currently learning Apache Spark and I want to see how it integrates
with an existing Hadoop Cluster.
My current Hadoop configuration is version 2.2.0 without YARN. I have built Apache Spark (v1.0.0) following the instructions in the README file, only setting the SPARK_HADOOP_VERSION
Yieldbot is pleased to announce the release of Flambo, our Clojure DSL for
Apache Spark.
Flambo allows one to write Spark applications in pure Clojure as an alternative to the Scala, Java, and Python APIs currently available in Spark.
We have already written a substantial amount of internal code
LIBSVM dataset converters are data dependent, since your input data can be in any serialization format and not necessarily CSV...
We have flows that convert HDFS data to a libsvm/sparse-vector RDD, which is sent to MLlib.
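Once data is in LIBSVM text format, MLlib can load it directly; a hedged sketch, where the path is an illustrative assumption:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.util.MLUtils

object LoadLibSvmExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("load-libsvm"))
    // Each line "label index1:value1 index2:value2 ..." becomes a LabeledPoint
    val data = MLUtils.loadLibSVMFile(sc, "hdfs:///data/train.libsvm") // hypothetical path
    println(s"loaded ${data.count()} labeled points")
    sc.stop()
  }
}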
Sent: Wednesday, June 25, 2014 1:27 PM
To: u...@spark.incubator.apache.org
Subject: RE: Prediction using Classification with text attributes in Apache Spark MLLib
Hi Alexander,
Just one more question on a related note. Should I be following the same
procedure even if my data
Please let me know how I can do this.
Thanks.
To: u...@spark.incubator.apache.org
Subject: Prediction using Classification with text attributes in Apache Spark MLLib
Hi,
I am trying to predict an attribute with a binary value (Yes/No) using SVM.
All my attributes which belong to the training set are text attributes.
I understand that I have
Hi Alexander,
Thanks for your prompt response. Earlier I was doing this prediction using Weka only. But now we are moving to a huge dataset, and hence to Apache Spark MLlib. Is there any other way to convert to LIBSVM format? Or is there any other, simpler algorithm that I can use in MLlib
On Tue, Jun 24, 2014 at 12:28 PM, Ulanov, Alexander alexander.ula...@hp.com wrote:
You need to convert your text to a vector space model:
http://en.wikipedia.org/wiki/Vector_space_model
and then pass it to the SVM. As far as I know, in previous versions of MLlib there was a special class for doing
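As a concrete illustration of that pipeline, a hedged sketch using MLlib's HashingTF for the vector space model and SVMWithSGD for the classifier; note HashingTF appeared in later MLlib releases than the one discussed here, and the data is a toy assumption:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.regression.LabeledPoint

object TextSvmExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("text-svm"))
    val tf = new HashingTF(numFeatures = 1000)
    // Toy labeled documents: 1.0 = Yes, 0.0 = No
    val docs = sc.parallelize(Seq(
      (1.0, "great product works well"),
      (0.0, "terrible broken waste")))
    // Hash each document's terms into a fixed-size feature vector
    val points = docs.map { case (label, text) =>
      LabeledPoint(label, tf.transform(text.split(" ").toSeq))
    }.cache()
    val model = SVMWithSGD.train(points, numIterations = 100)
    println(model.predict(tf.transform(Seq("great", "product"))))
    sc.stop()
  }
}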
Hi,
I would like to set up the Apache platform on a mini cluster. Is there any recommendation for the hardware that I can buy to set it up? I am thinking about processing a significant amount of data, in the range of a few terabytes.
Thanks
Upender
Suggestion: try to get an idea of your hardware requirements by running a sample on Amazon's EC2 or Google Compute Engine. It's relatively easy (and cheap) to get started in the cloud before you invest in your own hardware, IMO.
It depends on what stack you want to run. A quick cut:
- Worker machines (DataNode, HBase RegionServers, Spark worker nodes):
  - Dual 6-core CPU
  - 64 to 128 GB RAM
  - 3 x 3 TB disks (JBOD)
- Master node (NameNode, HBase Master, Spark Master):
  - Dual 6-core CPU
  - 64
Hi, all
*Spark version: bae07e3 [behind 1] fix different versions of commons-lang
dependency and apache/spark#746 addendum*
I have six worker nodes, and four of them get this NoClassDefFoundError when I use the start-slaves.sh script on my driver node. However, running ./bin/spark-class
The exception:
Exception in thread main org.apache.spark.SparkException: Job aborted: Task 0.0:1 failed 32 times (most recent failure: Exception failure: java.lang.IllegalStateException: unread block data)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler
Prashant Sharma
On Fri, May 2, 2014 at 3:56 PM, N.Venkata Naga Ravi nvn_r...@hotmail.com wrote:
Hi,
I am trying to build Apache Spark with Java 8 on my Mac (OS X 10.8.5), but I am getting the following exception. Please help with resolving it.
From: scrapco...@gmail.com
Date: Fri, 2 May 2014 16:02:48 +0530
Subject: Re: Apache Spark is not building in Mac/Java 8
To: user@spark.apache.org
You will need to change the sbt version to 13.2. I think Spark 0.9.1 was released with sbt 13? In case not, it may not work with Java 8. Just wait for the 1.0 release or give
dhcp-173-39-68-28:spark-0.9.1 neravi$ cd sbt/
dhcp-173-39-68-28:sbt neravi$ ls
sbt
sbt-launch-0.12.4.jar