Re: Apache Ignite vs Apache Spark

2015-02-26 Thread Sean Owen
Ignite is the renaming of GridGain, if that helps. It's like Oracle Coherence, if that helps. These do share some similarities -- fault tolerant, in-memory, distributed processing. The pieces they're built on differ, the architecture differs, the APIs differ. So fairly different in particulars. I

Re: Apache Ignite vs Apache Spark

2015-02-26 Thread Jay Vyas
- https://wiki.apache.org/incubator/IgniteProposal has, I think, been updated recently and has a good comparison. - Although GridGain has been around since the early Spark days, Apache Ignite is quite new and just getting started, I think, so - you will probably want to reach out to the developers

RE: Apache Ignite vs Apache Spark

2015-02-26 Thread nate
To: Sean Owen Cc: Ognen Duzlevski; user@spark.apache.org Subject: Re: Apache Ignite vs Apache Spark - https://wiki.apache.org/incubator/IgniteProposal has, I think, been updated recently and has a good comparison. - Although GridGain has been around since the early Spark days, Apache Ignite is quite new

Re: Apache Ignite vs Apache Spark

2015-02-26 Thread Ognen Duzlevski
- From: Jay Vyas [mailto:jayunit100.apa...@gmail.com] Sent: Thursday, February 26, 2015 3:40 PM To: Sean Owen Cc: Ognen Duzlevski; user@spark.apache.org Subject: Re: Apache Ignite vs Apache Spark - https://wiki.apache.org/incubator/IgniteProposal has, I think, been updated recently and has a good

Re: Hamburg Apache Spark Meetup

2015-02-25 Thread Petar Zecevic
Please add the Zagreb Meetup group, too. http://www.meetup.com/Apache-Spark-Zagreb-Meetup/ Thanks! On 18.2.2015. 19:46, Johan Beisser wrote: If you could also add the Hamburg Apache Spark Meetup, I'd appreciate it. http://www.meetup.com/Hamburg-Apache-Spark-Meetup/ On Tue, Feb 17, 2015

Re: Periodic Broadcast in Apache Spark Streaming

2015-02-23 Thread Tathagata Das
it. Thanks in advance. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Periodic-Broadcast-in-Apache-Spark-Streaming-tp21703.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Hamburg Apache Spark Meetup

2015-02-18 Thread Johan Beisser
If you could also add the Hamburg Apache Spark Meetup, I'd appreciate it. http://www.meetup.com/Hamburg-Apache-Spark-Meetup/ On Tue, Feb 17, 2015 at 5:08 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Thanks! I've added you. Matei On Feb 17, 2015, at 4:06 PM, Ralph Bergmann

Periodic Broadcast in Apache Spark Streaming

2015-02-18 Thread aanilpala
: http://apache-spark-user-list.1001560.n3.nabble.com/Periodic-Broadcast-in-Apache-Spark-Streaming-tp21703.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr
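
The replies above are truncated, so the following is only a sketch of one commonly used workaround, not necessarily the answer given in the thread: keep the broadcast variable in a mutable driver-side reference and rebuild it periodically inside foreachRDD, whose body runs on the driver. The names ssc, stream, loadDict and process are illustrative.

    import org.apache.spark.broadcast.Broadcast

    var dict: Broadcast[Map[String, Int]] = ssc.sparkContext.broadcast(loadDict())
    var lastRefresh = System.currentTimeMillis()

    stream.foreachRDD { rdd =>
      // foreachRDD's body executes on the driver, so the broadcast can be swapped here.
      if (System.currentTimeMillis() - lastRefresh > 3600 * 1000) {
        dict.unpersist()
        dict = rdd.sparkContext.broadcast(loadDict())
        lastRefresh = System.currentTimeMillis()
      }
      val current = dict // capture the current broadcast for the closure below
      rdd.foreach(record => process(record, current.value))
    }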

Berlin Apache Spark Meetup

2015-02-17 Thread Ralph Bergmann | the4thFloor.eu
Hi, there is a small Spark Meetup group in Berlin, Germany :-) http://www.meetup.com/Berlin-Apache-Spark-Meetup/ Please add this group to the Meetups list at https://spark.apache.org/community.html Ralph - To unsubscribe, e

Re: Berlin Apache Spark Meetup

2015-02-17 Thread Matei Zaharia
Thanks! I've added you. Matei On Feb 17, 2015, at 4:06 PM, Ralph Bergmann | the4thFloor.eu ra...@the4thfloor.eu wrote: Hi, there is a small Spark Meetup group in Berlin, Germany :-) http://www.meetup.com/Berlin-Apache-Spark-Meetup/ Please add this group to the Meetups list

java.lang.NoClassDefFoundError: org/apache/spark/SparkConf

2015-02-16 Thread siqi chen
for this error Exception in thread main java.lang.NoClassDefFoundError: org/apache/spark/SparkConf ... Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkConf Here is my build.sbt file import _root_.sbt.Keys._ import _root_.sbtassembly.Plugin.AssemblyKeys._ import _root_
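
The build.sbt in this message is truncated; for reference, a minimal sketch of the usual shape of such a build (version numbers and the project name are illustrative, not taken from the thread). Marking spark-core as "provided" keeps it out of the assembly jar, which is correct when launching with spark-submit; launching the same jar with plain java instead is a classic way to hit NoClassDefFoundError: org/apache/spark/SparkConf.

    import sbtassembly.Plugin._
    import AssemblyKeys._

    assemblySettings

    name := "my-spark-app"

    scalaVersion := "2.10.4"

    // Provided: supplied at runtime by spark-submit, excluded from the fat jar.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"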

Re: How to define a file filter for file name patterns in Apache Spark Streaming in Java?

2015-02-03 Thread Emre Sevinc
2, 2015 at 6:34 PM, Emre Sevinc emre.sev...@gmail.com wrote: Hello, I'm using Apache Spark Streaming 1.2.0 and trying to define a file filter for file names when creating an InputDStream https://spark.apache.org/docs/1.2.0/api/java/org/apache/spark/streaming/dstream/InputDStream.html

How to define a file filter for file name patterns in Apache Spark Streaming in Java?

2015-02-02 Thread Emre Sevinc
Hello, I'm using Apache Spark Streaming 1.2.0 and trying to define a file filter for file names when creating an InputDStream https://spark.apache.org/docs/1.2.0/api/java/org/apache/spark/streaming/dstream/InputDStream.html by invoking the fileStream https://spark.apache.org/docs/1.2.0/api/java

Re: How to define a file filter for file name patterns in Apache Spark Streaming in Java?

2015-02-02 Thread Akhil Das
(SequenceFileInputFormat.class)); Thanks Best Regards On Mon, Feb 2, 2015 at 6:34 PM, Emre Sevinc emre.sev...@gmail.com wrote: Hello, I'm using Apache Spark Streaming 1.2.0 and trying to define a file filter for file names when creating an InputDStream https://spark.apache.org/docs/1.2.0/api/java/org
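
Akhil's reply is truncated above; as a rough sketch, the Scala fileStream overload that takes a path filter looks like the following (the directory path and the filter predicate are illustrative, and ssc is an existing StreamingContext):

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    // Only pick up files whose names end in .json; ignore everything else.
    val lines = ssc.fileStream[LongWritable, Text, TextInputFormat](
      "hdfs:///incoming",
      (path: Path) => path.getName.endsWith(".json"),
      newFilesOnly = true
    ).map(_._2.toString)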

Apache Spark standalone mode: number of cores

2015-01-23 Thread olegshirokikh
in context: http://apache-spark-user-list.1001560.n3.nabble.com/Apache-Spark-standalone-mode-number-of-cores-tp21342.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr

Re: Apache Spark standalone mode: number of cores

2015-01-23 Thread Boromir Widas
. Since all the data is stored on a single local machine, it does not benefit from distributed operations on RDDs. How does it benefit and what internally is going on when Spark utilizes several logical cores? -- View this message in context: http://apache-spark-user-list.1001560.n3
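
As a small illustration of the point being discussed (my sketch, not code from the thread): in local mode the "cluster" is a single JVM whose parallelism is the N in local[N]; each of the N task threads works on one partition at a time, so CPU-bound stages still speed up even though nothing is distributed.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._

    val sc = new SparkContext(
      new SparkConf().setMaster("local[4]").setAppName("cores-demo"))

    // 8 partitions scheduled onto 4 task threads: two waves of 4 concurrent tasks.
    val total = sc.parallelize(1 to 1000000, 8).map(_ * 2.0).sum()
    println(total)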

RE: How to 'Pipe' Binary Data in Apache Spark

2015-01-23 Thread Venkat, Ankam
Spark Committers: Please advise the way forward for this issue. Thanks for your support. Regards, Venkat From: Venkat, Ankam Sent: Thursday, January 22, 2015 9:34 AM To: 'Frank Austin Nothaft'; 'user@spark.apache.org' Cc: 'Nick Allen' Subject: RE: How to 'Pipe' Binary Data in Apache Spark How

Re: Is Apache Spark less accurate than Scikit Learn?

2015-01-22 Thread Robin East
at both the PoC and production stages. On 21 Jan 2015, at 20:39, JacquesH jaaksem...@gmail.com wrote: I've recently been trying to get to know Apache Spark as a replacement for Scikit Learn, however it seems to me that even in simple cases, Scikit converges to an accurate model far faster

Re: How to 'Pipe' Binary Data in Apache Spark

2015-01-22 Thread Frank Austin Nothaft
: Frank Austin Nothaft [mailto:fnoth...@berkeley.edu] Sent: Wednesday, January 21, 2015 12:30 PM To: Venkat, Ankam Cc: Nick Allen; user@spark.apache.org Subject: Re: How to 'Pipe' Binary Data in Apache Spark Hi Venkat/Nick, The Spark RDD.pipe method pipes text data into a subprocess

RE: How to 'Pipe' Binary Data in Apache Spark

2015-01-22 Thread Venkat, Ankam
Data in Apache Spark Venkat, No problem! So, creating a custom InputFormat or using sc.binaryFiles alone is not the right solution. We also need the modified version of RDD.pipe to support binary data? Is my understanding correct? Yep! That is correct. The custom InputFormat allows Spark

RE: How to 'Pipe' Binary Data in Apache Spark

2015-01-22 Thread Venkat, Ankam
: What's your take on this? Regards, Venkat Ankam From: Frank Austin Nothaft [mailto:fnoth...@berkeley.edu] Sent: Wednesday, January 21, 2015 12:30 PM To: Venkat, Ankam Cc: Nick Allen; user@spark.apache.org Subject: Re: How to 'Pipe' Binary Data in Apache Spark Hi Venkat/Nick, The Spark RDD.pipe

Apache Spark broadcast error: Error sending message as driverActor is null [message = UpdateBlockInfo(BlockManagerId(4)

2015-01-22 Thread Zijing Guo
Hi, I'm using Apache Spark 1.1.0 and I'm currently having an issue with the broadcast method. When I call the broadcast function on a small dataset on a 5-node cluster, I experience the 'Error sending message as driverActor is null' after broadcasting the variables several times (apps running under JBoss

Re: How to 'Pipe' Binary Data in Apache Spark

2015-01-22 Thread Silvio Fiorito
:09 AM To: user@spark.apache.org Subject: How to 'Pipe' Binary Data in Apache Spark I have an RDD containing binary data. I would like to use 'RDD.pipe' to pipe that binary data to an external program

Apache Spark broadcast error: Error sending message as driverActor is null [message = UpdateBlockInfo(BlockManagerId

2015-01-22 Thread Edwin
I'm using Apache Spark 1.1.0 and I'm currently having an issue with the broadcast method. When I call the broadcast function on a small dataset on a 5-node cluster, I experience the 'Error sending message as driverActor is null' after broadcasting the variables several times (apps running under JBoss). Any

Re: How to 'Pipe' Binary Data in Apache Spark

2015-01-21 Thread Frank Austin Nothaft
: 'function' object has no attribute 'read' Any suggestions? Regards, Venkat Ankam From: Nick Allen [mailto:n...@nickallen.org] Sent: Friday, January 16, 2015 11:46 AM To: user@spark.apache.org Subject: Re: How to 'Pipe' Binary Data in Apache Spark I just wanted to reiterate

RE: How to 'Pipe' Binary Data in Apache Spark

2015-01-21 Thread Venkat, Ankam
different options. AttributeError: 'function' object has no attribute 'read' Any suggestions? Regards, Venkat Ankam From: Nick Allen [mailto:n...@nickallen.org] Sent: Friday, January 16, 2015 11:46 AM To: user@spark.apache.org Subject: Re: How to 'Pipe' Binary Data in Apache Spark I just wanted

Is Apache Spark less accurate than Scikit Learn?

2015-01-21 Thread JacquesH
I've recently been trying to get to know Apache Spark as a replacement for Scikit Learn, however it seems to me that even in simple cases, Scikit converges to an accurate model far faster than Spark does. For example I generated 1000 data points for a very simple linear function (z=x+y

Re: Is Apache Spark less accurate than Scikit Learn?

2015-01-21 Thread Robin East
and production stages. On 21 Jan 2015, at 20:39, JacquesH jaaksem...@gmail.com wrote: I've recently been trying to get to know Apache Spark as a replacement for Scikit Learn, however it seems to me that even in simple cases, Scikit converges to an accurate model far faster than Spark does

Re: Is Apache Spark less accurate than Scikit Learn?

2015-01-21 Thread Jacques Heunis
...@gmail.com wrote: I've recently been trying to get to know Apache Spark as a replacement for Scikit Learn, however it seems to me that even in simple cases, Scikit converges to an accurate model far faster than Spark does. For example I generated 1000 data points for a very simple linear
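
The thread's code isn't shown above, so here is a hedged sketch of the usual first response to this kind of convergence gap: MLlib's SGD-based linear regression is sensitive to numIterations and stepSize (and to feature scaling), whereas scikit-learn's default least-squares solver is exact. trainingData below is an illustrative RDD[LabeledPoint].

    import org.apache.spark.mllib.regression.LinearRegressionWithSGD

    val model = LinearRegressionWithSGD.train(
      trainingData,
      numIterations = 500, // the default of 100 often stops well short of convergence
      stepSize = 0.1)      // the default of 1.0 can oscillate on unscaled features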

How to 'Pipe' Binary Data in Apache Spark

2015-01-16 Thread Nick Allen
I have an RDD containing binary data. I would like to use 'RDD.pipe' to pipe that binary data to an external program that will translate it to string/text data. Unfortunately, it seems that Spark is mangling the binary data before it gets passed to the external program. This code is representative

Re: How to 'Pipe' Binary Data in Apache Spark

2015-01-16 Thread Sean Owen
Well it looks like you're reading some kind of binary file as text. That isn't going to work, in Spark or elsewhere, as binary data is not even necessarily the valid encoding of a string. There are no line breaks to delimit lines and thus elements of the RDD. Your input has some record structure

Re: How to 'Pipe' Binary Data in Apache Spark

2015-01-16 Thread Nick Allen
Per your last comment, it appears I need something like this: https://github.com/RIPE-NCC/hadoop-pcap Thanks a ton. That got me oriented in the right direction. On Fri, Jan 16, 2015 at 10:20 AM, Sean Owen so...@cloudera.com wrote: Well it looks like you're reading some kind of binary file

Re: How to 'Pipe' Binary Data in Apache Spark

2015-01-16 Thread Nick Allen
I just wanted to reiterate the solution for the benefit of the community. The problem is not from my use of 'pipe', but that 'textFile' cannot be used to read in binary data. (Doh) There are a couple of options to move forward. 1. Implement a custom 'InputFormat' that understands the binary input
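
For the archive, a minimal sketch of option 2 (sc.binaryFiles, available from Spark 1.2); note it only fixes the read side. As discussed earlier in the thread, RDD.pipe itself still writes text to the subprocess, so piping raw bytes onward would additionally need a modified pipe. The input path is illustrative.

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._

    val sc = new SparkContext("local[2]", "binary-read")

    // Each element is (file path, PortableDataStream); no line-based text decoding
    // happens, so the bytes are not mangled the way textFile mangles them.
    val raw = sc.binaryFiles("hdfs:///pcap-dumps")
    val sizes = raw.mapValues(stream => stream.toArray().length)
    sizes.collect().foreach { case (file, bytes) => println(s"$file: $bytes bytes") }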

Creating Apache Spark-powered “As Service” applications

2015-01-16 Thread olegshirokikh
this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Creating-Apache-Spark-powered-As-Service-applications-tp21193.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe

Re: Creating Apache Spark-powered “As Service” applications

2015-01-16 Thread Corey Nolet
be appreciated. A simple toy example program (or steps) that shows, e.g., how to build such a client (simply creating a SparkContext on a local machine, say reading a text file and returning basic stats) would be the ideal answer! -- View this message in context: http://apache-spark-user-list.1001560.n3
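
A toy of roughly the kind being asked for, as a minimal sketch (the thread's actual suggestion, below, is the Spark Kernel project; this just embeds a local SparkContext directly):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._

    object StatsService {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setMaster("local[*]").setAppName("stats-service")
        val sc = new SparkContext(conf)

        val lines = sc.textFile(args(0)) // path supplied by the caller
        val stats = lines.map(_.length.toDouble).stats() // count, mean, stdev, min, max
        println(s"lines=${stats.count} meanLength=${stats.mean} maxLength=${stats.max}")

        sc.stop()
      }
    }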

Re: Creating Apache Spark-powered “As Service” applications

2015-01-16 Thread Robert C Senkbeil
Emerging Technology Software Engineer From: olegshirokikh o...@solver.com To: user@spark.apache.org Date: 01/16/2015 01:32 PM Subject: Creating Apache Spark-powered “As Service” applications The question is about the ways to create a Windows desktop-based and/or web-based application

RE: Creating Apache Spark-powered “As Service” applications

2015-01-16 Thread Oleg Shirokikh
? Thanks, Oleg From: Robert C Senkbeil [mailto:rcsen...@us.ibm.com] Sent: Friday, January 16, 2015 12:21 PM To: Oleg Shirokikh Cc: user@spark.apache.org Subject: Re: Creating Apache Spark-powered “As Service” applications Hi, You can take a look at the Spark Kernel project: https://github.com/ibm

Re: Apache Spark, Hadoop 2.2.0 without Yarn Integration

2015-01-02 Thread Moep
Well that's confusing. I have the same issue. So you're saying I have to compile Spark with Yarn set to true to make it work with Hadoop 2.2.0 in Standalone mode? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Apache-Spark-Hadoop-2-2-0-without-Yarn
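
For reference, the Spark 0.9/1.0-era build docs give an assembly invocation along these lines for Hadoop 2.2 (quoted from memory, so double-check against the docs for your version); SPARK_YARN=true is what pulls in the newer Hadoop client dependencies even if you then run the standalone master:

    SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly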

Apache Spark 1.1.1 with Hbase 0.98.8-hadoop2 and hadoop 2.3.0

2014-12-17 Thread Amit Singh Hora
I am trying to do hBaseRDD.count(); I am getting the following exception: java.lang.IllegalStateException (unread block data) [duplicate 1] pom.xml http://apache-spark-user-list.1001560.n3.nabble.com/file/n20746/pom.xml -- View this message in context: http://apache-spark-user-list.1001560

Re: Apache Spark 1.1.1 with Hbase 0.98.8-hadoop2 and hadoop 2.3.0

2014-12-17 Thread Ted Yu
java.lang.IllegalStateException (unread block data) [duplicate 1] pom.xml http://apache-spark-user-list.1001560.n3.nabble.com/file/n20746/pom.xml -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Apache-Spark-1-1-1-with-Hbase-0-98-8-hadoop2-and-hadoop-2-3-0

Re: Accessing Apache Spark from Java

2014-12-16 Thread Akhil Das
Hi Jai, Refer to this doc and make sure your network is not blocking: http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-Spark-job-on-Unix-cluster-from-dev-environment-Windows-td16989.html Also make sure you are using the same version of Spark in both places (the one on the cluster

Accessing Apache Spark from Java

2014-12-15 Thread Jai
. at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015

Install Apache Spark on a Cluster

2014-12-08 Thread riginos
My thesis is related to big data mining and I have a cluster in the laboratory of my university. My task is to install Apache Spark on it and use it for extraction purposes. Is there any understandable guidance on how to do this? -- View this message in context: http://apache-spark-user-list

Re: Install Apache Spark on a Cluster

2014-12-08 Thread Ritesh Kumar Singh
On a rough note: Step 1: Install Hadoop 2.x on all the machines in the cluster. Step 2: Check that the Hadoop cluster is working. Step 3: Set up Apache Spark as given on the documentation page for the cluster, and check the status of the cluster on the master UI. As it is a data mining project, configure Hive too

query classification using Apache spark Mlib

2014-12-08 Thread Huang,Jin
I have a question, as the title says; the question link is http://stackoverflow.com/questions/27370170/query-classification-using-apache-spark-mlib. Thanks, Jin

SparkBigData.com: The Apache Spark Knowledge Base

2014-11-22 Thread Slim Baltagi
Hello all, I'm very pleased to announce the launch of http://www.SparkBigData.com: The Apache Spark Knowledge Base. As your one-stop information resource dedicated to Apache Spark, SparkBigData.com provides free, easy and fast access to hundreds of Apache Spark resources organized in several

Please help me get started on Apache Spark

2014-11-20 Thread Saurabh Agrawal
Friends, I am pretty new to Spark, as much as to Scala, MLlib and the entire Hadoop stack! It would be a great help if I could be pointed to some good books on Spark and MLlib. Further, does MLlib support any algorithms for B2B cross-sell/upsell or customer retention (out of the box

Re: Please help me get started on Apache Spark

2014-11-20 Thread Darin McBeath
Take a look at the O'Reilly Learning Spark (Early Release) book.  I've found this very useful. Darin. From: Saurabh Agrawal saurabh.agra...@markit.com To: user@spark.apache.org user@spark.apache.org Sent: Thursday, November 20, 2014 9:04 AM Subject: Please help me get started on Apache

Re: Please help me get started on Apache Spark

2014-11-20 Thread Guibert. J Tchinde
For Spark, you can start with a new book like: https://www.safaribooksonline.com/library/view/learning-spark/9781449359034/ch01.html I think the paper book is out now. You can also have a look at the tutorials and documentation guide available at: https://spark.apache.org/docs/1.1.0/mllib-guide.html

Re: which is the recommended workflow engine for Apache Spark jobs?

2014-11-11 Thread Harry Brundage
other tools. On Mon, Nov 10, 2014 at 10:34 AM, Adamantios Corais adamantios.cor...@gmail.com wrote: I have some previous experience with Apache Oozie while I was developing in Apache Pig. Now, I am working explicitly with Apache Spark and I am looking for a tool with similar functionality

which is the recommended workflow engine for Apache Spark jobs?

2014-11-10 Thread Adamantios Corais
I have some previous experience with Apache Oozie while I was developing in Apache Pig. Now, I am working explicitly with Apache Spark and I am looking for a tool with similar functionality. Is Oozie recommended? What about Luigi? What do you use \ recommend?

Re: which is the recommended workflow engine for Apache Spark jobs?

2014-11-10 Thread Jimmy McErlain
previous experience with Apache Oozie while I was developing in Apache Pig. Now, I am working explicitly with Apache Spark and I am looking for a tool with similar functionality. Is Oozie recommended? What about Luigi? What do you use \ recommend? -- Nothing under the sun is greater than

Re: which is the recommended workflow engine for Apache Spark jobs?

2014-11-10 Thread Adamantios Corais
in Apache Pig. Now, I am working explicitly with Apache Spark and I am looking for a tool with similar functionality. Is Oozie recommended? What about Luigi? What do you use \ recommend? -- Nothing under the sun is greater than education. By educating one person and sending him/her

Cincinnati, OH Meetup for Apache Spark

2014-11-03 Thread Darin McBeath
Let me know if you are interested in participating in a meetup in Cincinnati, OH to discuss Apache Spark. We currently have 4-5 different companies expressing interest but would like a few more. Darin.

RE: Prediction using Classification with text attributes in Apache Spark MLLib

2014-11-02 Thread ashu
Hi, sorry to bump this old thread. What is the state now? Is this problem solved? How does Spark handle categorical data now? Regards, Ashutosh -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Prediction-using-Classification-with-text-attributes

Re: Prediction using Classification with text attributes in Apache Spark MLLib

2014-11-02 Thread Xiangrui Meng
the old thread. What is the state now? Is this problem solved? How does Spark handle categorical data now? Regards, Ashutosh -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Prediction-using-Classification-with-text-attributes-in-Apache-Spark-MLLib

XML Utilities for Apache Spark

2014-10-29 Thread Darin McBeath
I developed the spark-xml-utils library because we have a large amount of XML in big datasets and I felt this data could be better served by providing some helpful XML utilities. This includes the ability to filter documents based on an XPath/XQuery expression, return specific nodes for an

Multipart uploads to Amazon S3 from Apache Spark

2014-10-13 Thread Nick Chammas
Cross posting an interesting question on Stack Overflow http://stackoverflow.com/questions/26321947/multipart-uploads-to-amazon-s3-from-apache-spark . Nick -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Multipart-uploads-to-Amazon-S3-from-Apache-Spark

Re: Multipart uploads to Amazon S3 from Apache Spark

2014-10-13 Thread Daniil Osipov
-spark . Nick -- View this message in context: Multipart uploads to Amazon S3 from Apache Spark http://apache-spark-user-list.1001560.n3.nabble.com/Multipart-uploads-to-Amazon-S3-from-Apache-Spark-tp16315.html Sent from the Apache Spark User List mailing list archive

Re: Multipart uploads to Amazon S3 from Apache Spark

2014-10-13 Thread Nicholas Chammas
Oh, that's a straight reversal from their position up until earlier this year http://apache-spark-user-list.1001560.n3.nabble.com/How-to-read-a-multipart-s3-file-tp5463p5485.html . Was there an announcement explaining the change in recommendation? Nick On Mon, Oct 13, 2014 at 4:54 PM, Daniil

Mosek Solver with Apache Spark

2014-10-08 Thread Raghuveer Chanda
Hi, Has anyone tried the Mosek http://www.mosek.com/ solver in Spark? I'm getting weird serialization errors. I came to know that Mosek uses shared libraries, which may not be serializable. Is this the reason for the errors, or is it working for anyone? -- Regards, Raghuveer Chanda 4th

apache spark union function cause executors disassociate (Lost executor 1 on 172.32.1.12: remote Akka client disassociated)

2014-09-30 Thread Edwin
be a problem (I can tell from the UI). What's worth mentioning is that one RDD is significantly bigger than the other one (much bigger). Does anyone have any idea why? Thanks, Edwin -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/apache-spark-union-function-cause

Re: apache spark union function cause executors disassociate (Lost executor 1 on 172.32.1.12: remote Akka client disassociated)

2014-09-30 Thread Edwin
Does the union function cause any data shuffling? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/apache-spark-union-function-cause-executors-disassociate-Lost-executor-1-on-172-32-1-12-remote-Akka--tp15442p15444.html Sent from the Apache Spark User List
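
To the question itself: union is a narrow transformation; a UnionRDD just concatenates the parents' partition lists, so no shuffle is involved. A quick way to confirm this on any lineage is toDebugString (rddA and rddB are illustrative names):

    val combined = rddA.union(rddB)
    // The printed lineage shows a UnionRDD with no ShuffledRDD stage in it.
    println(combined.toDebugString)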

Re: apache spark union function cause executors disassociate (Lost executor 1 on 172.32.1.12: remote Akka client disassociated)

2014-09-30 Thread Edwin
] at getCallSite at null:-1), which has no missing parents 19:02:47,085 INFO [org.apache.spark.scheduler.DAGScheduler] (spark-akka.actor.default-dispatcher-14) Submitting 24 missing tasks from Stage 12 (UnionRDD[31] at getCallSite at null:-1) -- View this message in context: http://apache-spark

Re: What is a pre built package of Apache Spark

2014-09-25 Thread Akhil Das
version), after the sbt assembly, I can run spark-shell successfully but the Python shell does not work. $ ./bin/pyspark ./bin/pyspark: line 111: exec: python: not found Have you solved your problem? Thanks, Christy -- View this message in context: http://apache-spark-user-list.1001560.n3

Re: What is a pre built package of Apache Spark

2014-09-24 Thread christy
in context: http://apache-spark-user-list.1001560.n3.nabble.com/What-is-a-pre-built-package-of-Apache-Spark-tp14080p15101.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user

Re: What is a pre built package of Apache Spark

2014-09-24 Thread Denny Lee
(src not pre-built version), after the sbt assembly, I can run spark-shell successfully but the Python shell does not work. $ ./bin/pyspark ./bin/pyspark: line 111: exec: python: not found Have you solved your problem? Thanks, Christy -- View this message in context: http://apache-spark

Re: What is a pre built package of Apache Spark

2014-09-12 Thread andrew.craft
://apache-spark-user-list.1001560.n3.nabble.com/What-is-a-pre-built-package-of-Apache-Spark-tp14080p14088.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr

How do you perform blocking IO in apache spark job?

2014-09-08 Thread DrKhu
multiple worker nodes on a single host. It's interesting to know: how do you cope with such a challenge? Thanks. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-do-you-perform-blocking-IO-in-apache-spark-job-tp13704.html Sent from the Apache Spark User List

Re: How do you perform blocking IO in apache spark job?

2014-09-08 Thread Jörn Franke
? Thanks. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-do-you-perform-blocking-IO-in-apache-spark-job-tp13704.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: How do you perform blocking IO in apache spark job?

2014-09-08 Thread DrKhu
: http://apache-spark-user-list.1001560.n3.nabble.com/How-do-you-perform-blocking-IO-in-apache-spark-job-tp13704p13707.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user

Re: How do you perform blocking IO in apache spark job?

2014-09-08 Thread Sean Owen
, more suitable for Spark, like having multiple worker nodes on a single host. It's interesting to know: how do you cope with such a challenge? Thanks. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-do-you-perform-blocking-IO-in-apache-spark-job

Re: How do you perform blocking IO in apache spark job?

2014-09-08 Thread DrKhu
you improve code? Or what spark configurations to look for? (Sorry, I'm quite new to Spark) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-do-you-perform-blocking-IO-in-apache-spark-job-tp13704p13713.html Sent from the Apache Spark User List mailing list

Re: How do you perform blocking IO in apache spark job?

2014-09-08 Thread Jörn Franke
it, with at least as many threads as my machine allows. But how to do the same on Spark? Is there a possibility to call that native component on each worker in multiple threads? Thanks in advance. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-do-you
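
One way to get that effect, sketched here as a suggestion rather than as anything spelled out in the truncated replies above: run the blocking call concurrently inside each task with scala.concurrent Futures via mapPartitions. blockingNativeCall and rdd are illustrative names.

    import scala.concurrent.{Await, Future}
    import scala.concurrent.duration.Duration
    import scala.concurrent.ExecutionContext.Implicits.global

    val results = rdd.mapPartitions { iter =>
      // Kick off the blocking call for every element in the partition...
      val futures = iter.map(x => Future(blockingNativeCall(x))).toList
      // ...then collect the answers once they are all done.
      futures.map(f => Await.result(f, Duration.Inf)).iterator
    }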

Re: Apache Spark- Cassandra - NotSerializable Exception while saving to cassandra

2014-08-27 Thread Yana
? My suspicion is that you're trying to access something (SparkConf?) within the map closures... -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Apache-Spark-Cassandra-NotSerializable-Exception-while-saving-to-cassandra-tp12906p12960.html Sent from the Apache

Re: Apache Spark- Cassandra - NotSerializable Exception while saving to cassandra

2014-08-27 Thread lmk
Hi Yana, I have done a take and confirmed the existence of data. Also checked that it is getting connected to Cassandra. That is why I suspect that this particular RDD is not serializable. Thanks, Lmk On Aug 28, 2014 5:13 AM, Yana [via Apache Spark User List] ml-node+s1001560n12960...@n3.nabble.com
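
For the archive, a sketch of the shape of the fix Yana is hinting at (CassandraClient and enrich are illustrative names, not the connector's API): build the non-serializable object on the executors inside mapPartitions rather than capturing a driver-side instance in a map closure.

    // Problematic: `client` is created on the driver and dragged into the closure,
    // so Spark tries (and fails) to serialize it:
    //   val client = new CassandraClient(conf)
    //   rdd.map(row => client.enrich(row))

    // Alternative: construct it per partition, on the worker, so nothing
    // non-serializable ever crosses the wire.
    val enriched = rdd.mapPartitions { rows =>
      val client = new CassandraClient() // never shipped from the driver
      rows.map(row => client.enrich(row))
    }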

Issues with S3 client library and Apache Spark

2014-08-15 Thread Darin McBeath
I've seen a couple of issues posted about this, but I never saw a resolution. When I'm using Spark 1.0.2 (and the spark-submit script to submit my jobs) and AWS SDK 1.8.7, I get the stack trace below.  However, if I drop back to AWS SDK 1.3.26 (or anything from the AWS SDK 1.4.* family) then

Re: Need info on log4j.properties for apache spark.

2014-07-23 Thread Sean Owen
You specify your own log4j configuration in the usual log4j way -- package it in your assembly, or specify on the command line for example. See http://logging.apache.org/log4j/1.2/manual.html The template you can start with is in core/src/main/resources/org/apache/spark/log4j-defaults.properties
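
A minimal log4j.properties sketch patterned on that template (adjust logger names and levels to taste):

    # Console appender, as in Spark's log4j-defaults.properties
    log4j.rootCategory=WARN, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

    # Quiet the chattiest packages
    log4j.logger.org.apache.spark=WARN
    log4j.logger.org.eclipse.jetty=WARN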

Need info on log4j.properties for apache spark.

2014-07-22 Thread abhiguruvayya
Hello All, Basically I need to edit log4j.properties to filter some of the unnecessary logs in Spark in yarn-client mode. I am not sure where I can find the log4j.properties file (location). Can anyone help me on this? -- View this message in context: http://apache-spark-user-list.1001560.n3

How does Apache Spark handle system failure when deployed in YARN?

2014-07-16 Thread Matthias Kricke
Hello @ the mailing list, We are thinking of using Spark in one of our projects on a Hadoop cluster. During evaluation several questions remain, which are stated below. Preconditions: Let's assume Apache Spark is deployed on a Hadoop cluster using YARN. Furthermore, a Spark execution is running. How

Re: How does Apache Spark handle system failure when deployed in YARN?

2014-07-16 Thread Sandy Ryza
. Preconditions: Let's assume Apache Spark is deployed on a Hadoop cluster using YARN. Furthermore, a Spark execution is running. How does Spark handle the situations listed below? Cases/Questions: 1. One node of the Hadoop cluster fails due to a disc error. However, replication

Re: How does Apache Spark handle system failure when deployed in YARN?

2014-07-16 Thread Matthias Kricke
Thanks, your answers totally cover all my questions ☺ From: Sandy Ryza [mailto:sandy.r...@cloudera.com] Sent: Wednesday, 16 July 2014 09:41 To: user@spark.apache.org Subject: Re: How does Apache Spark handle system failure when deployed in YARN? Hi Matthias, Answers inline. -Sandy On Wed

Apache Spark, Hadoop 2.2.0 without Yarn Integration

2014-07-09 Thread Nick R. Katsipoulakis
Hello, I am currently learning Apache Spark and I want to see how it integrates with an existing Hadoop Cluster. My current Hadoop configuration is version 2.2.0 without Yarn. I have built Apache Spark (v1.0.0) following the instructions in the README file, only setting the SPARK_HADOOP_VERSION

Re: Apache Spark, Hadoop 2.2.0 without Yarn Integration

2014-07-09 Thread Nick R. Katsipoulakis
, 2014 at 9:27 AM, Nick R. Katsipoulakis kat...@cs.pitt.edu wrote: Hello, I am currently learning Apache Spark and I want to see how it integrates with an existing Hadoop Cluster. My current Hadoop configuration is version 2.2.0 without Yarn. I have built Apache Spark (v1.0.0) following

[ANNOUNCE] Flambo - A Clojure DSL for Apache Spark

2014-07-01 Thread Soren Macbeth
Yieldbot is pleased to announce the release of Flambo, our Clojure DSL for Apache Spark. Flambo allows one to write Spark applications in pure Clojure as an alternative to the Scala, Java and Python currently available in Spark. We have already written a substantial amount of internal code

RE: Prediction using Classification with text attributes in Apache Spark MLLib

2014-06-26 Thread Ulanov, Alexander
Classification with text attributes in Apache Spark MLLib Libsvm dataset converters are data dependent, since your input data can be in any serialization format and not necessarily CSV... We have flows that convert HDFS data to a libSVM/sparse vector RDD which is sent to MLlib. I am not sure

RE: Prediction using Classification with text attributes in Apache Spark MLLib

2014-06-25 Thread Ulanov, Alexander
...@gmail.com] Sent: Wednesday, June 25, 2014 1:27 PM To: u...@spark.incubator.apache.org Subject: RE: Prediction using Classification with text attributes in Apache Spark MLLib Hi Alexander, Just one more question on a related note. Should I be following the same procedure even if my data

Prediction using Classification with text attributes in Apache Spark MLLib

2014-06-24 Thread lmk
text. Please let me know how I can do this. Thanks. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Prediction-using-Classification-with-text-attributes-in-Apache-Spark-MLLib-tp8166.html Sent from the Apache Spark User List mailing list archive

RE: Prediction using Classification with text attributes in Apache Spark MLLib

2014-06-24 Thread Ulanov, Alexander
PM To: u...@spark.incubator.apache.org Subject: Prediction using Classification with text attributes in Apache Spark MLLib Hi, I am trying to predict an attribute with binary value (Yes/No) using SVM. All my attributes which belong to the training set are text attributes. I understand that I have

RE: Prediction using Classification with text attributes in Apache Spark MLLib

2014-06-24 Thread lmk
Hi Alexander, Thanks for your prompt response. Earlier I was executing this prediction using Weka only, but now we are moving to a huge dataset and hence to Apache Spark MLlib. Is there any other way to convert to libSVM format? Or is there any other, simpler algorithm that I can use in MLlib

RE: Prediction using Classification with text attributes in Apache Spark MLLib

2014-06-24 Thread Ulanov, Alexander
: Tuesday, June 24, 2014 3:41 PM To: u...@spark.incubator.apache.org Subject: RE: Prediction using Classification with text attributes in Apache Spark MLLib Hi Alexander, Thanks for your prompt response. Earlier I was executing this Prediction using Weka only. But now we are moving to a huge dataset

Re: Prediction using Classification with text attributes in Apache Spark MLLib

2014-06-24 Thread Sean Owen
On Tue, Jun 24, 2014 at 12:28 PM, Ulanov, Alexander alexander.ula...@hp.com wrote: You need to convert your text to vector space model: http://en.wikipedia.org/wiki/Vector_space_model and then pass it to SVM. As far as I know, in previous versions of MLlib there was a special class for doing
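
As a hedged sketch of that advice in code, using MLlib's HashingTF (which arrived around Spark 1.1, i.e. after some messages in this thread, so treat it as one possible route rather than the thread's answer); docs is an illustrative RDD of (label, tokenized text) pairs:

    import org.apache.spark.mllib.classification.SVMWithSGD
    import org.apache.spark.mllib.feature.HashingTF
    import org.apache.spark.mllib.regression.LabeledPoint

    val tf = new HashingTF(numFeatures = 10000)

    // Hash each token list into a sparse term-frequency vector.
    val points = docs.map { case (label, tokens) =>
      LabeledPoint(label, tf.transform(tokens))
    }
    val model = SVMWithSGD.train(points, 100) // 100 iterations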

Run Apache Spark on Mini Cluster

2014-05-21 Thread Upender Nimbekar
Hi, I would like to set up the Apache platform on a mini cluster. Is there any recommendation for the hardware that I can buy to set it up? I am thinking about processing a significant amount of data, in the range of a few terabytes. Thanks Upender

Re: Run Apache Spark on Mini Cluster

2014-05-21 Thread Soumya Simanta
Suggestion - try to get an idea of your hardware requirements by running a sample on Amazon's EC2 or Google compute engine. It's relatively easy (and cheap) to get started on the cloud before you invest in your own hardware IMO. On Wed, May 21, 2014 at 8:14 PM, Upender Nimbekar

Re: Run Apache Spark on Mini Cluster

2014-05-21 Thread Krishna Sankar
It depends on what stack you want to run. A quick cut: Worker Machines (DataNode, HBase Region Servers, Spark Worker Nodes): dual 6-core CPU, 64 to 128 GB RAM, 3 x 3 TB disk (JBOD). Master Node (NameNode, HBase Master, Spark Master): dual 6-core CPU, 64

java.lang.NoClassDefFoundError: org/apache/spark/deploy/worker/Worker

2014-05-18 Thread Hao Wang
Hi, all Spark version: bae07e3 [behind 1] fix different versions of commons-lang dependency and apache/spark#746 addendum I have six worker nodes and four of them have this NoClassDefFoundError when I use the start-slaves.sh on my driver node. However, running ./bin/spark-class

Apache Spark Throws java.lang.IllegalStateException: unread block data

2014-05-17 Thread sam
The exception: Exception in thread main org.apache.spark.SparkException: Job aborted: Task 0.0:1 failed 32 times (most recent failure: Exception failure: java.lang.IllegalStateException: unread block data) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler

Re: Apache Spark is not building in Mac/Java 8

2014-05-02 Thread Prashant Sharma
Prashant Sharma On Fri, May 2, 2014 at 3:56 PM, N.Venkata Naga Ravi nvn_r...@hotmail.com wrote: Hi, I am trying to build Apache Spark with Java 8 on my Mac system (OS X 10.8.5), but getting the following exception. Please help on resolving

RE: Apache Spark is not building in Mac/Java 8

2014-05-02 Thread N . Venkata Naga Ravi
...@gmail.com Date: Fri, 2 May 2014 16:02:48 +0530 Subject: Re: Apache Spark is not building in Mac/Java 8 To: user@spark.apache.org You will need to change the sbt version to 13.2. I think Spark 0.9.1 was released with sbt 13? In case not, then it may not work with Java 8. Just wait for the 1.0 release or give

Re: Apache Spark is not building in Mac/Java 8

2014-05-02 Thread Prashant Sharma
dhcp-173-39-68-28:spark-0.9.1 neravi$ cd sbt/ dhcp-173-39-68-28:sbt neravi$ ls sbt sbt-launch-0.12.4.jar -- From: scrapco...@gmail.com Date: Fri, 2 May 2014 16:02:48 +0530 Subject: Re: Apache Spark is not building in Mac/Java 8 To: user
