Getting first N messages from a Kafka topic using Spark Streaming

2014-12-03 Thread Hafiz Mujadid
ssc.awaitTermination(5000) ssc.stop(true) } Where sample is a StringBuilder. When I print the contents of this StringBuilder after the getSample method call returns, I get nothing in it. Any help will be appreciated -- Vi
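A hedged sketch of a likely cause and fix (quorum, group id, and topic names are placeholders): appends made inside DStream operations run on the executors against a serialized copy of the builder, so a driver-side StringBuilder stays empty. Collecting to the driver via foreachRDD and take avoids this:

  import org.apache.spark.streaming.StreamingContext
  import org.apache.spark.streaming.kafka.KafkaUtils
  import scala.collection.mutable.ArrayBuffer

  def getSample(ssc: StreamingContext, n: Int): ArrayBuffer[String] = {
    val sample = ArrayBuffer[String]()                        // lives on the driver
    val stream = KafkaUtils.createStream(ssc, "zkhost:2181",  // placeholder quorum,
      "sample-group", Map("mytopic" -> 1))                    // group id and topic
    stream.foreachRDD { rdd =>
      if (sample.size < n)
        sample ++= rdd.take(n - sample.size).map(_._2)        // take() returns to the driver
    }
    ssc.start()
    ssc.awaitTermination(5000)
    ssc.stop(true)
    sample
  }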

Re: Problem creating EC2 cluster using spark-ec2

2014-12-02 Thread Shivaram Venkataraman
+Andrew Actually I think this is because we haven't uploaded the Spark binaries to cloudfront / pushed the change to mesos/spark-ec2. Andrew, can you take care of this? On Tue, Dec 2, 2014 at 5:11 PM, Nicholas Chammas wrote: > Interesting. Do you have any problems when launching in us-east-

Re: Problem creating EC2 cluster using spark-ec2

2014-12-02 Thread Nicholas Chammas
Interesting. Do you have any problems when launching in us-east-1? What is the full output of spark-ec2 when launching a cluster? (Post it to a gist if it’s too big for email.) On Mon, Dec 1, 2014 at 10:34 AM, Dave Challis wrote: > I've been trying to create a Spark cluster on EC2 using the >

Problem creating EC2 cluster using spark-ec2

2014-12-01 Thread Dave Challis
I've been trying to create a Spark cluster on EC2 using the documentation at https://spark.apache.org/docs/latest/ec2-scripts.html (with Spark 1.1.1). Running the script successfully creates some EC2 instances, HDFS etc., but appears to fail to copy the actual files needed to run Spark across. I

Spark cluster with Java 8 using ./spark-ec2

2014-11-25 Thread Jon Chase
I'm trying to use the spark-ec2 command to launch a Spark cluster that runs Java 8, but so far I haven't been able to get the Spark processes to use the right JVM at start up. Here's the command I use for launching the cluster. Note I'm using the user-data feature to install Java 8: ./spark-ec2

Re: Using Spark Context as an attribute of a class cannot be used

2014-11-24 Thread Marcelo Vanzin
That's an interesting question for which I do not know the answer. Probably a question for someone with more knowledge of the internals of the shell interpreter... On Mon, Nov 24, 2014 at 2:19 PM, aecc wrote: > Ok, great, I'm gonna do it that way, thanks :). However I still don't > understand

Re: Using Spark Context as an attribute of a class cannot be used

2014-11-24 Thread aecc

Re: Using Spark Context as an attribute of a class cannot be used

2014-11-24 Thread Marcelo Vanzin
On Mon, Nov 24, 2014 at 1:56 PM, aecc wrote: > I checked sqlContext, they use it in the same way I would like to use my > class, they make the class Serializable with transient. Does this affect > somehow the whole pipeline of data moving? I mean, will I get performance > issues when doing this b

Re: Using Spark Context as an attribute of a class cannot be used

2014-11-24 Thread aecc

Re: Using Spark Context as an attribute of a class cannot be used

2014-11-24 Thread Marcelo Vanzin
Hello, On Mon, Nov 24, 2014 at 12:07 PM, aecc wrote: > This is the stacktrace: > > org.apache.spark.SparkException: Job aborted due to stage failure: Task not > serializable: java.io.NotSerializableException: $iwC$$iwC$$iwC$$iwC$AAA > - field (class "$iwC$$iwC$$iwC$$iwC", name: "aaa", typ

Re: How can I read this avro file using spark & scala?

2014-11-24 Thread Michael Armbrust
Thanks for the feedback, I filed a couple of issues: https://github.com/databricks/spark-avro/issues On Fri, Nov 21, 2014 at 5:04 AM, thomas j wrote: > I've been able to load a different avro file based on GenericRecord with: > > val person = sqlContext.avroFile("/tmp/person.avro") > > When I tr

Re: Using Spark Context as an attribute of a class cannot be used

2014-11-24 Thread aecc
If instead of using myNumber I use the literal value 5, the exception is not thrown. E.g.: aaa.s.parallelize(1 to 10).filter(_ == 5).count works perfectly -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Using-Spark-Context-as-an-attribute-of-a-class

Re: Using Spark Context as an attribute of a class cannot be used

2014-11-24 Thread aecc
", ) - field (class "org.apache.spark.rdd.FilteredRDD", name: "f", type: "interface scala.Function1") - root object (class "org.apache.spark.rdd.FilteredRDD", FilteredRDD[3] at filter at :20) at org.apache.spark.scheduler.DAGSched

Re: Using Spark Context as an attribute of a class cannot be used

2014-11-24 Thread Marcelo Vanzin
d as @transient. But the two examples you posted shouldn't be creating a reference to the "aaa" variable in the serialized task. You could use -Dsun.io.serialization.extendedDebugInfo=true to debug these things. On Mon, Nov 24, 2014 at 10:15 AM, aecc wrote: > Hello guys, >

Using Spark Context as an attribute of a class cannot be used

2014-11-24 Thread aecc
Hello guys, I'm using Spark 1.0.0 and Kryo serialization. In the Spark Shell, when I create a class that has the SparkContext as an attribute, in this way: class AAA(val s: SparkContext) { } val aaa = new AAA(sc) and I execute any action using that attribute, like: val myNumbe
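A hedged sketch of the fix discussed in this thread (names are illustrative): mark the SparkContext field @transient so closures do not drag it in, and capture a local val instead of a field of the enclosing object:

  import org.apache.spark.SparkContext

  class AAA(@transient val s: SparkContext) extends Serializable {
    def countEqualTo(x: Int): Long = {
      val target = x                 // local copy; the closure captures `target`,
      s.parallelize(1 to 10)         // not the enclosing (non-serializable) object
        .filter(_ == target)
        .count()
    }
  }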

Re: Parsing a large XML file using Spark

2014-11-21 Thread andy petrella
(sorry about the previous spam... Google Inbox didn't allow me to cancel the miserable send action :-/) So what I was about to say: it's a real PAIN in the ass to parse the Wikipedia articles in the dump due to these multiline articles... However, there is a way to manage that "quite" easily, a

Re: Parsing a large XML file using Spark

2014-11-21 Thread andy petrella
Actually, it's a real On Tue Nov 18 2014 at 2:52:00 AM Tobias Pfeiffer wrote: > Hi, > > see https://www.mail-archive.com/dev@spark.apache.org/msg03520.html for > one solution. > > One issue with those XML files is that they cannot be processed line by > line in parallel; plus you inherently need

Re: Parsing a large XML file using Spark

2014-11-21 Thread Paul Brown
also) that is stored in HDFS, is it possible >> to parse it in parallel/faster using Spark? Or do we have to use something >> like a PullParser or Iteratee? >> >> My current solution is to read the single XML file in the first pass - >>

Re: How can I read this avro file using spark & scala?

2014-11-21 Thread Simone Franzini
I have also been struggling with reading avro. Very glad to hear that there is a new avro library coming in Spark 1.2 (which by the way, seems to have a lot of other very useful improvements). In the meanwhile, I have been able to piece together several snippets/tips that I found from various sour

Re: How can I read this avro file using spark & scala?

2014-11-21 Thread thomas j
I've been able to load a different avro file based on GenericRecord with: val person = sqlContext.avroFile("/tmp/person.avro") When I try to call `first()` on it, I get "NotSerializableException" exceptions again: person.first() ... 14/11/21 12:59:17 ERROR Executor: Exception in task 0.0 in sta

Re: How can I read this avro file using spark & scala?

2014-11-21 Thread thomas j
Thanks for the pointer Michael. I've downloaded Spark 1.2.0 from https://people.apache.org/~pwendell/spark-1.2.0-snapshot1/ and cloned and built the spark-avro repo you linked to. When I run it against the example avro file linked in the documentation it works. However, when I try to load my av

Re: Parsing a large XML file using Spark

2014-11-21 Thread Prannoy
HDFS, is it possible > to parse it in parallel/faster using Spark? Or do we have to use something > like a PullParser or Iteratee? > > My current solution is to read the single XML file in the first pass - > write it to HDFS and then read the small files in parallel on the Spark >

Re: How can I read this avro file using spark & scala?

2014-11-20 Thread Michael Armbrust
One option (starting with Spark 1.2, which is currently in preview) is to use the Avro library for Spark SQL. This is very new, but we would love to get feedback: https://github.com/databricks/spark-avro On Thu, Nov 20, 2014 at 10:19 AM, al b wrote: > I've read several posts of people strugglin
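A minimal usage sketch of that library, assuming the 1.2-era API and a placeholder file path:

  import org.apache.spark.sql.SQLContext
  import com.databricks.spark.avro._                          // adds avroFile()

  val sqlContext = new SQLContext(sc)
  val episodes = sqlContext.avroFile("/tmp/episodes.avro")    // placeholder path
  episodes.registerTempTable("episodes")
  sqlContext.sql("SELECT * FROM episodes LIMIT 5").collect().foreach(println)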

How can I read this avro file using spark & scala?

2014-11-20 Thread al b
I've read several posts of people struggling to read avro in spark. The examples I've tried don't work. When I try this solution ( https://stackoverflow.com/questions/23944615/how-can-i-load-avros-in-spark-using-the-schema-on-board-the-avro-files) I get errors: spark java.io.NotSerializableExcepti

Re: Parsing a large XML file using Spark

2014-11-18 Thread Tobias Pfeiffer
Hi, see https://www.mail-archive.com/dev@spark.apache.org/msg03520.html for one solution. One issue with those XML files is that they cannot be processed line by line in parallel; plus you inherently need shared/global state to parse XML or check for well-formedness, I think. (Same issue with mul

Parsing a large XML file using Spark

2014-11-18 Thread Soumya Simanta
If there is one big XML file (e.g., the Wikipedia dump, 44GB, or the larger dump that also includes all revision information) that is stored in HDFS, is it possible to parse it in parallel/faster using Spark? Or do we have to use something like a PullParser or Iteratee? My current solution is to read the single
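One common workaround (a sketch under assumptions, not necessarily the thread's final answer): set a custom record delimiter so Hadoop's TextInputFormat splits the dump at page boundaries, letting Spark read the multi-line articles in parallel:

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

  val conf = new Configuration(sc.hadoopConfiguration)
  conf.set("textinputformat.record.delimiter", "</page>")     // one wiki page per record

  val pages = sc.newAPIHadoopFile(
      "hdfs:///data/enwiki-dump.xml",                         // placeholder path
      classOf[TextInputFormat], classOf[LongWritable], classOf[Text], conf)
    .map(_._2.toString)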

Re: External table partitioned by date using Spark SQL

2014-11-11 Thread ehalpern

External table partitioned by date using Spark SQL

2014-11-11 Thread ehalpern

Re: netty on classpath when using spark-submit

2014-11-09 Thread Tobias Pfeiffer
Hi, On Wed, Nov 5, 2014 at 10:23 AM, Tobias Pfeiffer wrote: > > On Tue, Nov 4, 2014 at 8:33 PM, M. Dale wrote: > >>From http://spark.apache.org/docs/latest/configuration.html it seems >> that there is an experimental property: >> >> spark.files.userClassPathFirst >> > > Thank you very much, I

Re: netty on classpath when using spark-submit

2014-11-04 Thread Tobias Pfeiffer
Markus, thanks for your help! On Tue, Nov 4, 2014 at 8:33 PM, M. Dale wrote: > Tobias, >From http://spark.apache.org/docs/latest/configuration.html it seems > that there is an experimental property: > > spark.files.userClassPathFirst > Thank you very much, I didn't know about this. Unfor

Re: netty on classpath when using spark-submit

2014-11-04 Thread M. Dale
Tobias, From http://spark.apache.org/docs/latest/configuration.html it seems that there is an experimental property: spark.files.userClassPathFirst Whether to give user-added jars precedence over Spark's own jars when loading classes in Executors. This feature can be used to mitigate conf
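A minimal way to try that property (application class and jar names are placeholders):

  bin/spark-submit \
    --conf spark.files.userClassPathFirst=true \
    --class com.example.MyApp \
    my-assembly.jar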

netty on classpath when using spark-submit

2014-11-03 Thread Tobias Pfeiffer
Hi, I tried hard to get a version of netty into my jar file created with sbt assembly that works with all my libraries. Now I managed that and was really happy, but it seems like spark-submit puts an older version of netty on the classpath when submitting to a cluster, such that my code ends up wi

Re: Algebird using spark-shell

2014-10-30 Thread Ian O'Connell
LogLogMonoid >>> >>> >>> scala> val hll = new HyperLogLogMonoid(12) >>> hll: com.twitter.algebird.HyperLogLogMonoid = >>> com.twitter.algebird.HyperLogLogMonoid@7bde289a >>> >>> >>> https://github.com/twitter/algebird/wiki/Algebir

Re: Algebird using spark-shell

2014-10-30 Thread Buntu Dev

Re: Algebird using spark-shell

2014-10-30 Thread Ian O'Connell

Re: Algebird using spark-shell

2014-10-30 Thread thadude

Re: Algebird using spark-shell

2014-10-29 Thread Akhil Das
ght jar to pass? Are there any non-streaming examples? > > Thanks! > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Algebird-using-spark-shell-tp17701.html > Sent from the Apache Spark User List mailing list archive at

Algebird using spark-shell

2014-10-29 Thread bdev
~~~ Is that the right jar to pass? Are there any non-streaming examples? Thanks! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Algebird-using-spark-shell-tp17701.html Sent from the Apache Spark User List mailing list arch

Re: Memory requirement of using Spark

2014-10-24 Thread jian.t

Re: Memory requirement of using Spark

2014-10-24 Thread Akhil Das
y, in this case most of the data would reside on the disk and spark will utilize it efficiently. Thanks Best Regards On Fri, Oct 24, 2014 at 8:47 AM, jian.t wrote: > Hello, > I am new to Spark. I have a basic question about the memory requirement of > using Spark. > > I need to j
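A hedged illustration of that point (t1 and t2 are assumed pair RDDs): persisting with MEMORY_AND_DISK lets partitions that do not fit in memory spill to disk instead of failing.

  import org.apache.spark.storage.StorageLevel

  val joined = t1.join(t2)                        // assumed RDD[(K, V)] inputs
  joined.persist(StorageLevel.MEMORY_AND_DISK)    // spill to disk when memory is short
  joined.count()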

Re: submit query to spark cluster using spark-sql

2014-10-23 Thread tridib
Figured it out. spark-sql --master spark://sparkmaster:7077 -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/submit-query-to-spark-cluster-using-spark-sql-tp17182p17183.html Sent from the Apache Spark User List mailing list archive at Nabble.com

submit query to spark cluster using spark-sql

2014-10-23 Thread tridib
I want to submit queries to the spark cluster using spark-sql. I am using the hive metastore, which is in HDFS. But when I query, it does not look like the query gets submitted to the spark cluster; I don't see any entry in the master web UI. How can I confirm the behavior? -- View this message in context: http://ap

Memory requirement of using Spark

2014-10-23 Thread jian.t
Hello, I am new to Spark. I have a basic question about the memory requirement of using Spark. I need to join multiple data sets from multiple data sources. The join is not a straightforward join. The logic is more like: first join T1 on column A with T2, then for all the records that

New research using Spark: Unified Secure On-/Off-line Analytics

2014-10-20 Thread Peter Coetzee
New open-access research published in the journal Parallel Computing demonstrates a novel approach to engineering analytics for deployment in streaming and batch contexts. Increasing numbers of users are extracting real value from their data using tools like IBM InfoSphere Streams for near-real

Re: how to submit multiple jar files when using spark-submit script in shell?

2014-10-17 Thread Marcelo Vanzin
On top of what Andrew said, you shouldn't need to manually add the mllib jar to your jobs; it's already included in the Spark assembly jar. On Thu, Oct 16, 2014 at 11:51 PM, eric wong wrote: > Hi, > > i using the comma separated style for submit multiple jar files in the > follow shell but it doe

Re: how to submit multiple jar files when using spark-submit script in shell?

2014-10-17 Thread Andrew Or
Hm, it works for me. Are you sure you have provided the right jars? What happens if you pass in the `--verbose` flag? 2014-10-16 23:51 GMT-07:00 eric wong : > Hi, > > i using the comma separated style for submit multiple jar files in the > follow shell but it does not work: > > bin/spark-submit -

Re: Help required on exercise Data Exploration using Spark SQL

2014-10-17 Thread Michael Armbrust
et file.. I'm not able to read the content of parquet file directly. > > How to validate the output of these queries with the actual content in the > parquet file. > What is the workaround for this issue. > How to read the file through Spark SQL. > Is there a need to change t

Regarding using spark sql with yarn

2014-10-17 Thread twinkle sachdeva
Hi, I have been using spark sql with yarn. It works fine with yarn-client mode, but with yarn-cluster mode we are facing 2 issues. Is yarn-cluster mode not recommended for spark-sql using hiveContext? *Problem #1* We are not able to use any query with a very simple filtering operation "

Re: Help required on exercise Data Exploration using Spark SQL

2014-10-17 Thread neeraj

how to submit multiple jar files when using spark-submit script in shell?

2014-10-16 Thread eric wong
Hi, I'm using the comma-separated style to submit multiple jar files in the following shell command, but it does not work: bin/spark-submit --class org.apache.spark.examples.mllib.JavaKMeans --master yarn-cluster --executor-memory 2g --jars lib/spark-examples-1.0.2-hadoop2.2.0.jar,lib/spark-mllib_2.10-1.0.0.ja
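For reference, a hedged corrected form per the replies: --jars takes a comma-separated list with no spaces, the application jar is passed on its own, and the mllib jar can be dropped since it is already in the Spark assembly.

  bin/spark-submit \
    --class org.apache.spark.examples.mllib.JavaKMeans \
    --master yarn-cluster \
    --executor-memory 2g \
    lib/spark-examples-1.0.2-hadoop2.2.0.jar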

Re: Help required on exercise Data Exploration using Spark SQL

2014-10-16 Thread Cheng Lian
Hi Neeraj, The Spark Summit 2014 tutorial uses Spark 1.0. I guess you're using Spark 1.1? Parquet support got polished quite a bit since then, and changed the string representation of the query plan, but this output should be OK :) Cheng On 10/16/14 10:45 PM, neeraj wrote: Hi,

Help required on exercise Data Exploration using Spark SQL

2014-10-16 Thread neeraj
Hi, I'm exploring the exercise Data Exploration using Spark SQL from Spark Summit 2014. While running the command "val wikiData = sqlContext.parquetFile("data/wiki_parquet")", I'm getting the following output, which doesn't match the expected output.

Re: How to create Track per vehicle using spark RDD

2014-10-15 Thread manasdebashiskar
It is wonderful to see some ideas. Now the questions: 1) What is a track segment? Ans) It is the line that contains two adjacent points when all points are arranged by time. Say a vehicle moves (t1, p1) -> (t2, p2) -> (t3, p3). Then the segments are (p1, p2), (p2, p3) when the time ordering is (t1

Re: How to create Track per vehicle using spark RDD

2014-10-14 Thread Sean Owen
You say you reduceByKey but are you really collecting all the tuples for a vehicle in a collection, like what groupByKey does already? Yes, if one vehicle has a huge amount of data that could fail. Otherwise perhaps you are simply not increasing memory from the default. Maybe you can consider usi

Re: How to create Track per vehicle using spark RDD

2014-10-14 Thread Mohit Singh
Perhaps it's just me, but the "lag" function isn't familiar to me. But have you tried configuring Spark appropriately? http://spark.apache.org/docs/latest/configuration.html On Tue, Oct 14, 2014 at 5:37 PM, Manas Kar wrote: > Hi, > I have an RDD containing Vehicle Number , timestamp, Position.

How to create Track per vehicle using spark RDD

2014-10-14 Thread Manas Kar
Hi, I have an RDD containing Vehicle Number , timestamp, Position. I want to get the "lag" function equivalent to my RDD to be able to create track segment of each Vehicle. Any help? PS: I have tried reduceByKey and then splitting the List of position in tuples. For me it runs out of memory eve
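A hedged sketch of a "lag" equivalent (the record shape is assumed; per Sean Owen's caveat in this thread, grouping requires each vehicle's points to fit in memory):

  import org.apache.spark.rdd.RDD

  case class Position(lat: Double, lon: Double)

  // input: (vehicleId, (timestamp, position))
  def toSegments(records: RDD[(String, (Long, Position))])
      : RDD[(String, (Position, Position))] =
    records.groupByKey().flatMapValues { points =>
      points.toSeq.sortBy(_._1)       // order each track by time
        .sliding(2)                   // adjacent pairs become segments
        .collect { case Seq((_, p1), (_, p2)) => (p1, p2) }
        .toSeq
    }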

Re: Akka Connection refused - standalone cluster using spark-0.9.0

2014-10-03 Thread irina

Re: Build error when using spark with breeze

2014-09-26 Thread Xiangrui Meng
following build error: >>>>> >>>>> Error:scalac: bad symbolic reference. A signature in RandBasis.class >>>>> refers to term math3 >>>>> in package org.apache.commons which is not available. >>>

Re: Build error when using spark with breeze

2014-09-26 Thread Jaonary Rabarisoa
Error:scalac: bad symbolic reference. A signature in RandBasis.class >>>> refers to term math3* >>>> *in package org.apache.commons which is not available.* >>>> *It may be completely missing from the current classpath, or the >>>> version on*

Re: Build error when using spark with breeze

2014-09-26 Thread Ted Yu
*Error:scalac: bad symbolic reference. A signature in RandBasis.class >>> refers to term math3* >>> *in package org.apache.commons which is not available.* >>> *It may be completely missing from the current classpath, or the version >>> on* >>> *the classpath

Re: Build error when using spark with breeze

2014-09-26 Thread Sean Owen
from the current classpath, or the version on >> the classpath might be incompatible with the version used when compiling >> RandBasis.class. >> >> In my case, I just declare a new Gaussian distribution >> >> val g = new Gaussian(0d,1d) >> >> I'm using sp

Re: Build error when using spark with breeze

2014-09-26 Thread Jaonary Rabarisoa
ompatible with the version used when compiling >> RandBasis.class.* >> >> In my case, I just declare a new Gaussian distribution >> >> *val g = new Gaussian(0d,1d)* >> >> I'm using spark 1.1 >> >> >> Any ideas to fix this ? >> >> >> Best regards, >> >> >> Jao >> > >

Re: Build error when using spark with breeze

2014-09-26 Thread Ted Yu
ng from the current classpath, or the version > on* > *the classpath might be incompatible with the version used when compiling > RandBasis.class.* > > In my case, I just declare a new Gaussian distribution > > *val g = new Gaussian(0d,1d)* > > I'm using spark 1.1 > > > Any ideas to fix this ? > > > Best regards, > > > Jao >

Build error when using spark with breeze

2014-09-26 Thread Jaonary Rabarisoa
rrent classpath, or the version on* *the classpath might be incompatible with the version used when compiling RandBasis.class.* In my case, I just declare a new Gaussian distribution *val g = new Gaussian(0d,1d)* I'm using spark 1.1 Any ideas to fix this? Best regards, Jao
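The usual fix, consistent with the replies (the version number is an assumption; match whatever your breeze release depends on): declare commons-math3 explicitly in build.sbt.

  libraryDependencies += "org.apache.commons" % "commons-math3" % "3.2"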

Re: Specifying Spark Executor Java options using Spark Submit

2014-09-24 Thread Larry Xiao
onf object or the spark-defaults.conf file used with the spark-submit script. Heap size settings can be set with spark.executor.memory. You can find it under Runtime Environment. Larry On 9/24/14 10:52 PM, Arun Ahuja wrote: What is the proper way to specify java options for the Spark executors usi

Specifying Spark Executor Java options using Spark Submit

2014-09-24 Thread Arun Ahuja
What is the proper way to specify java options for the Spark executors using spark-submit? We had previously done this using export SPARK_JAVA_OPTS='..', for example to attach a debugger to each executor or to add "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
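In Spark 1.x the replacement for SPARK_JAVA_OPTS on executors is the spark.executor.extraJavaOptions property; a hedged example (class and jar names are placeholders):

  bin/spark-submit \
    --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
    --class com.example.MyApp \
    my-app.jar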

Re: Solving Systems of Linear Equations Using Spark?

2014-09-22 Thread durin

Spark - Apache Blur Connector : Index Kafka Messages into Blur using Spark Streaming

2014-09-22 Thread Dibyendu Bhattacharya
Hi, Last few days I am working on a Spark - Apache Blur Connector to index Kafka messages into Apache Blur using Spark Streaming. We have been working on to build a distributed search platform for our NRT use cases and we have been playing with Spark Streaming and Apache Blur for the same. We are

Looking for a good sample of Using Spark to do things Hadoop can do

2014-09-12 Thread Steve Lewis
Assume I have a large book with many Chapters and many lines of text. Assume I have a function that tells me the similarity of two lines of text. The objective is to find the most similar line in the same chapter within 200 lines of the line found. The real problem involves biology and is beyond t

RE: problem in using Spark-Cassandra connector

2014-09-11 Thread Karunya Padala
Padala Cc: u...@spark.incubator.apache.org Subject: Re: problem in using Spark-Cassandra connector You will have to create the KeySpace and Table. See the message: Table not found: EmailKeySpace.Emails. Looks like you have not created the Emails table. On Thu, Sep 11, 2014 at 6:04 PM, Karunya

Re: problem in using Spark-Cassandra connector

2014-09-11 Thread Reddy Raja
> I am new to spark. I encountered an issue when trying to connect to > Cassandra using Spark Cassandra connector. Can anyone help me. Following > are the details. > > > > 1) Following Spark and Cassandra versions I am using on LUbuntu12.0. > > i)spark-1.0.2-bin-hadoop

problem in using Spark-Cassandra connector

2014-09-11 Thread Karunya Padala
Hi, I am new to spark. I encountered an issue when trying to connect to Cassandra using the Spark Cassandra connector. Can anyone help me? Following are the details. 1) I am using the following Spark and Cassandra versions on Lubuntu 12.0: i) spark-1.0.2-bin-hadoop2 ii) apache-cassandra-2.0.10 2) In
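Once the keyspace and table exist (per the reply above), a minimal read sketch with the connector (table names taken from the thread, lowercased as Cassandra stores unquoted identifiers):

  import com.datastax.spark.connector._

  val emails = sc.cassandraTable("emailkeyspace", "emails")
  println(emails.count())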

Re: Solving Systems of Linear Equations Using Spark?

2014-09-08 Thread Debasish Das
Yup...this can be a spark community project...I saw a PR for that...interested users fine with lgpl/gpl code can make use of it... On Mon, Sep 8, 2014 at 12:37 PM, Xiangrui Meng wrote: > I asked Tim whether he would change the license of SuiteSparse to an > Apache-friendly license couple months

Re: Solving Systems of Linear Equations Using Spark?

2014-09-08 Thread Xiangrui Meng
I asked Tim whether he would change the license of SuiteSparse to an Apache-friendly license couple months ago, but the answer was no. So I don't think we can use SuiteSparse in MLlib through JNI. Please feel free to create JIRAs for distributed linear programming and SOCP solvers and run the discu

Re: Solving Systems of Linear Equations Using Spark?

2014-09-08 Thread Debasish Das
Xiangrui, Should I open up a JIRA for this? A distributed lp/socp solver through ecos/ldl/amd? I can open source it with gpl license in spark code as that's what our legal cleared (apache + gpl becomes gpl) and figure out the right way to call it... ecos is gpl but we can definitely use the jni v

Re: Solving Systems of Linear Equations Using Spark?

2014-09-08 Thread Debasish Das
Durin, I have integrated ecos with spark, which uses suitesparse under the hood for linear equation solves. I have exposed only the qp solver api in spark since I was comparing ip with proximal algorithms, but we can expose the suitesparse api as well... jni is used to load up ldl, amd and ecos librarie

Re: Solving Systems of Linear Equations Using Spark?

2014-09-07 Thread Xiangrui Meng
You can try LinearRegression with sparse input. It converges the least squares solution if the linear system is over-determined, while the convergence rate depends on the condition number. Applying standard scaling is popular heuristic to reduce the condition number. If you are interested in spars
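A hedged sketch of that suggestion against the MLlib 1.x API (input shape and iteration count are assumptions): treat each row a_i of A as features and b_i as the label, then read the approximate solution off the fitted weights.

  import org.apache.spark.mllib.linalg.{Vector, Vectors}
  import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}
  import org.apache.spark.rdd.RDD

  // rows: the (a_i, b_i) pairs of the system A x = b
  def solveLeastSquares(rows: RDD[(Array[Double], Double)]): Vector = {
    val training = rows.map { case (a, b) => LabeledPoint(b, Vectors.dense(a)) }
    LinearRegressionWithSGD.train(training, 200).weights    // 200 iterations: arbitrary
  }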

Solving Systems of Linear Equations Using Spark?

2014-09-07 Thread durin
an algorithm for Spark? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Solving-Systems-of-Linear-Equations-Using-Spark-tp13674.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Using Spark to add data to an existing Parquet file without a schema

2014-09-04 Thread Jim Carroll
les into the same parquet directory and managing the file names externally, but this seems like a workaround. Is that the way others are doing it? Jim -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Using-Spark-to-add-data-to-an-existing-Parquet-file-witho

Using Spark to add data to an existing Parquet file without a schema

2014-09-04 Thread Jim Carroll
Thanks Jim -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Using-Spark-to-add-data-to-an-existing-Parquet-file-without-a-schema-tp13450.html Sent from the Apache Spark User List mailing list archive at

RE: how to correctly run scala script using spark-shell through stdin (spark v1.0.0)

2014-08-27 Thread Matei Zaharia
user@spark.apache.org Subject: how to correctly run scala script using spark-shell through stdin (spark v1.0.0)   HI All,   Right now I’m trying to execute a script using this command:   nohup $SPARK_HOME/bin/spark-shell < $HOME/my-script.scala > $HOME/my-script.log 2>&1 &   m

RE: how to correctly run scala script using spark-shell through stdin (spark v1.0.0)

2014-08-27 Thread Henry Hung
ore spark-shell even finishes executing the script. Best regards, Henry Hung From: MA33 YTHung1 Sent: Thursday, August 28, 2014 10:01 AM To: user@spark.apache.org Subject: how to correctly run scala script using spark-shell through stdin (spark v1.0.0) Hi All, Right now I'm trying to execute a s

how to correctly run scala script using spark-shell through stdin (spark v1.0.0)

2014-08-27 Thread Henry Hung
Hi All, Right now I'm trying to execute a script using this command: nohup $SPARK_HOME/bin/spark-shell < $HOME/my-script.scala > $HOME/my-script.log 2>&1 & my-script.scala has just one line of code: println("hallo world") But after waiting for a minute, I still don't receive the result from sp
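One alternative worth noting (hedged; behavior can vary by version): pass the script with -i, which hands it to the underlying Scala REPL; ending the script with sys.exit makes the shell quit rather than stay interactive.

  $SPARK_HOME/bin/spark-shell -i $HOME/my-script.scala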

Is there a way to insert data into existing parquet file using spark ?

2014-08-27 Thread rafeeq s
Hi, *Is there a way to insert data into an existing parquet file using spark?* I am using spark streaming and spark sql to store real time data into parquet files and then query it using impala. spark creates multiple subdirectories of parquet files, which makes loading them a challenge

Re: Issue with Spark on EC2 using spark-ec2 script

2014-08-16 Thread rkishore999
I'm also running into the same issue and am blocked here. Were any of you able to get past this issue? I tried using both ephemeral and persistent-hdfs and I'm getting the same issue. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Issue-with-Spark-on

Using Spark Streaming to listen to HDFS directory and handle different files by file name

2014-08-14 Thread ZhangYi
As we know, in Spark, SparkContext provides the wholeTextFiles() method to read all files in a specific directory and generate an RDD of (fileName, content) pairs: scala> val lines = sc.wholeTextFiles("/Users/workspace/scala101/data") 14/08/14 22:43:02 INFO MemoryStore: ensureFreeSpace(35896) called with
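A small sketch of dispatching on the file name, which arrives as the key of each (fileName, content) pair (handleCsv/handleXml are hypothetical stubs):

  def handleCsv(content: String): Unit = println(s"csv: ${content.length} chars")  // stub
  def handleXml(content: String): Unit = println(s"xml: ${content.length} chars")  // stub

  sc.wholeTextFiles("/Users/workspace/scala101/data").collect().foreach {
    case (name, content) if name.endsWith(".csv") => handleCsv(content)
    case (name, content) if name.endsWith(".xml") => handleXml(content)
    case _ => ()
  }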

Re: Issue with Spark on EC2 using spark-ec2 script

2014-08-07 Thread Nick Pentreath
k-ec2 script? >> >> The environment I am running on is a 4 data node 1 master spark cluster >> generated by the spark-ec2 script. I haven't modified anything in the >> environment except for adding data to the ephemeral hdfs. >> >> >> >> -- >>

Can we throttle individual queries using Spark

2014-08-04 Thread Mahesh Govind
Hi, Can we throttle individual queries in Spark, so that one query will not hog the system resources? Regards Mahesh

Re: Issue with Spark on EC2 using spark-ec2 script

2014-08-01 Thread Dean Wampler
master spark cluster > generated by the spark-ec2 script. I haven't modified anything in the > environment except for adding data to the ephemeral hdfs. > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Issue-with-Spark-on

Re: Issue with Spark on EC2 using spark-ec2 script

2014-07-31 Thread ratabora
o the ephemeral hdfs. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Issue-with-Spark-on-EC2-using-spark-ec2-script-tp11088p7.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Issue with Spark on EC2 using spark-ec2 script

2014-07-31 Thread Dean Wampler
Forgot to add that I tried your program with the same input file path. It worked fine. (I used local[2], however...) Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition (O'Reilly) Typesafe @deanwampler

Re: Issue with Spark on EC2 using spark-ec2 script

2014-07-31 Thread Dean Wampler
The stack trace suggests it was trying to create a temporary file, not read your file. Of course, it doesn't say what file it couldn't create. Could there be a configuration file, like a Hadoop config file, that was read with a temp dir setting that's invalid for your machine? dean Dean Wampler,

Issue with Spark on EC2 using spark-ec2 script

2014-07-31 Thread Ryan Tabora
Hey all, I was able to spawn up a cluster, but when I'm trying to submit a simple jar via spark-submit it fails to run. I am trying to run the simple "Standalone Application" from the quickstart. Oddly enough, I could get another application running through the spark-shell. What am I doing wrong

Re: Using Spark Streaming with Kafka 0.7.2

2014-07-29 Thread Andre Schumacher
can't find any documentation specifically for building spark streaming. >> >> >> -- >> View this message in context: >> http://apache-spark-user-list.1001560.n3.nabble.com/Using-Spark-Streaming-with-Kafka-0-7-2-tp10674.html >> Sent from the Apache Spark User List mailing list archive at Nabble.com. >> >

Re: Job using Spark for Machine Learning

2014-07-29 Thread Matei Zaharia
now if not.  Otherwise, if you're interested in using Spark in an R&D machine learning project then please get in touch. We are a startup based in London. Our data sets are on a massive scale- we collect data on over a billion users per month and are second only to Google in the contextual ad

Job using Spark for Machine Learning

2014-07-29 Thread Martin Goodson
I'm not sure if job adverts are allowed on here - please let me know if not. Otherwise, if you're interested in using Spark in an R&D machine learning project then please get in touch. We are a startup based in London. Our data sets are on a massive scale- we collect data on over a

How can I integrate spark cluster into my own program without using spark-submit?

2014-07-26 Thread Lizhengbing (bing, BIPA)
I want to use the spark cluster through a scala function, so I can integrate spark into my program directly. For example: when I call the count function in my own program, my program will deploy the function to the cluster, so I can get the result directly. def count()= { val master = "spark://ma
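A hedged sketch of embedding Spark directly (master URL and jar path are placeholders): build the SparkContext yourself, pointing at the standalone master and the jar containing your classes.

  import org.apache.spark.{SparkConf, SparkContext}

  def count(): Long = {
    val conf = new SparkConf()
      .setMaster("spark://master:7077")           // placeholder master URL
      .setAppName("embedded-count")
      .setJars(Seq("target/my-assembly.jar"))     // placeholder jar with your classes
    val sc = new SparkContext(conf)
    try sc.parallelize(1 to 100).count()
    finally sc.stop()
  }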

Re: Using Spark Streaming with Kafka 0.7.2

2014-07-25 Thread Tathagata Das
ating trying to build spark streaming myself but I > can't find any documentation specifically for building spark streaming. > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Using-Spark-Streaming-with-Kafka-0-7-2-tp10674.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. >

Using Spark Streaming with Kafka 0.7.2

2014-07-25 Thread maddenpj
streaming myself but I can't find any documentation specifically for building spark streaming. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Using-Spark-Streaming-with-Kafka-0-7-2-tp10674.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Simple record matching using Spark SQL

2014-07-24 Thread Yin Huai
On Wed, Jul 23, 2014 at 11:32 PM, Sarath Chandra < > sarathchandra.jos...@algofusiontech.com> wrote: > >> Hi Michael, >> >> Sorry for the delayed response. >> >> I'm using Spark 1.0.1 (pre-built version for hadoop 1). I'm running spark >> pr
