Spark and Scala

2014-09-12 Thread Deep Pradhan
There is one thing that I am confused about. Spark has code that has been implemented in Scala. Now, can we run any Scala code on the Spark framework? What will be the difference in the execution of the Scala code on normal systems and on Spark? The reason for my question is the following: I had

spark and scala-2.11

2015-08-24 Thread Lanny Ripple
Hello, The instructions for building spark against scala-2.11 indicate using -Dspark-2.11. When I look in the pom.xml I find a profile named 'spark-2.11' but nothing that would indicate I should set a property. The sbt build seems to need the -Dscala-2.11 property set. Finally build/mvn does a

Re: Spark and Scala

2014-09-12 Thread Nicholas Chammas
unpersist is a method on RDDs. RDDs are abstractions introduced by Spark. An Int is just a Scala Int. You can't call unpersist on Int in Scala, and that doesn't change in Spark. On Fri, Sep 12, 2014 at 12:33 PM, Deep Pradhan wrote: > There is one thing that I am confused about. > Spark has code
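A minimal sketch of the distinction, assuming a live SparkContext named sc (the values are made up for illustration): persist/unpersist exist on RDD, and an Int stays a plain Int.

    val nums: org.apache.spark.rdd.RDD[Int] = sc.parallelize(1 to 100)
    nums.persist()     // caching is an RDD-level operation
    nums.unpersist()   // and so is releasing the cache
    val x: Int = 42
    // x.unpersist()   // does not compile: Int has no such method, with or without Spark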

Re: Spark and Scala

2014-09-12 Thread Deep Pradhan
I know that unpersist is a method on RDD. But my confusion is that, when we port our Scala programs to Spark, doesn't everything change to RDDs? On Fri, Sep 12, 2014 at 10:16 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > unpersist is a method on RDDs. RDDs are abstractions introduce

Re: Spark and Scala

2014-09-12 Thread Hari Shreedharan
No, Scala primitives remain primitives. Unless you create an RDD using one of the many methods for doing so, you will not be able to access any of the RDD methods. There is no automatic porting. Spark is an application as far as Scala is concerned; there is no compilation (except of course, the Scala, JIT co

Re: Spark and Scala

2014-09-12 Thread Deep Pradhan
Take for example this: I have declared one queue, val queue = Queue.empty[Int], which is a pure Scala line in the program. I actually want the queue to be an RDD, but there is no direct method to create an RDD that is a queue, right? What do you say about this? Does there exist something like: Cr
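One way this might look, as a hedged sketch assuming a live SparkContext named sc: there is no queue-shaped RDD, but a Scala Queue's elements can be distributed explicitly with parallelize.

    import scala.collection.immutable.Queue

    val queue = Queue(1, 2, 3, 4, 5)       // a plain Scala value on the driver
    val queueRdd = sc.parallelize(queue)   // RDD[Int] built from the queue's elements
    // The resulting RDD has no enqueue/dequeue semantics; it is simply the distributed elements.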

Re: Spark and Scala

2014-09-12 Thread Soumya Simanta
An RDD is a fault-tolerant distributed structure. It is the primary abstraction in Spark. I would strongly suggest that you have a look at the following to get a basic idea. http://www.cs.berkeley.edu/~pwendell/strataconf/api/core/spark/RDD.html http://spark.apache.org/docs/latest/quick-start.htm

Re: Spark and Scala

2014-09-12 Thread Deep Pradhan
Is it always true that whenever we apply operations on an RDD, we get another RDD? Or does it depend on the return type of the operation? On Sat, Sep 13, 2014 at 9:45 AM, Soumya Simanta wrote: > > An RDD is a fault-tolerant distributed structure. It is the primary > abstraction in Spark. > > I w

Re: Spark and Scala

2014-09-13 Thread Mark Hamstra
This is all covered in http://spark.apache.org/docs/latest/programming-guide.html#rdd-operations By definition, RDD transformations take an RDD to another RDD; actions produce some other type as a value on the driver program. On Fri, Sep 12, 2014 at 11:15 PM, Deep Pradhan wrote: > Is it always
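A minimal sketch of that distinction, assuming an existing SparkContext named sc and made-up data:

    val rdd = sc.parallelize(1 to 10)
    val squares = rdd.map(x => x * x)    // transformation: yields another RDD, evaluated lazily
    val total = squares.reduce(_ + _)    // action: returns an Int to the driver program
    val firstThree = squares.take(3)     // action: returns Array[Int] to the driver program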

Re: Spark and Scala

2014-09-13 Thread Deep Pradhan
Take for example this: val lines = sc.textFile(args(0)) val nodes = lines.map(s =>{ val fields = s.split("\\s+") (fields(0),fields(1)) }).distinct().groupByKey().cache() val nodeSizeTuple = nodes.map(node => (node._1.toInt, node._2.size)) val rootNode = nodeSizeTuple.

Re: Spark and Scala

2014-09-13 Thread Mark Hamstra
Again, RDD operations are of two basic varieties: transformations, that produce further RDDs; and operations, that return values to the driver program. You've used several RDD transformations and then finally the top(1) action, which returns an array of one element to your driver program. That is
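As a hedged illustration with made-up node data, top(1) is an action and hands back a local Array rather than an RDD:

    val nodeSizeTuple = sc.parallelize(Seq((1, 5), (2, 9), (3, 2)))
    val rootNode: Array[(Int, Int)] = nodeSizeTuple.top(1)(Ordering.by[(Int, Int), Int](_._2))
    println(rootNode.head)   // the single largest tuple by size, now on the driver: (2,9)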

Re: Spark and Scala

2014-09-13 Thread Mark Hamstra
Sorry, posting too late at night. That should be "...transformations, that produce further RDDs; and actions, that return values to the driver program." On Sat, Sep 13, 2014 at 12:45 AM, Mark Hamstra wrote: > Again, RDD operations are of two basic varieties: transformations, that > produce furt

Re: spark and scala-2.11

2015-08-24 Thread Sean Owen
The property "scala-2.11" triggers the profile "scala-2.11" -- and additionally disables the scala-2.10 profile, so that's the way to do it. But yes, you also need to run the script before-hand to set up the build for Scala 2.11 as well. On Mon, Aug 24, 2015 at 8:48 PM, Lanny Ripple wrote: > Hell

Re: spark and scala-2.11

2015-08-24 Thread Jonathan Coveney
I've used the instructions and it worked fine. Can you post exactly what you're doing, and what it fails with? Or are you just trying to understand how it works? 2015-08-24 15:48 GMT-04:00 Lanny Ripple : > Hello, > > The instructions for building spark against scala-2.11 indicate using > -Dspark

Re: spark and scala-2.11

2015-08-24 Thread Lanny Ripple
We're going to be upgrading from Spark 1.0.2 and using hadoop-1.2.1, so we need to build by hand. (Yes, I know. Use hadoop-2.x, but standard resource constraints apply.) I want to build against scala-2.11 and publish to our artifact repository, but finding build/spark-2.10.4 and tracing down what build

Pivot Data in Spark and Scala

2015-10-29 Thread Ascot Moss
Hi, I have data as follows: A, 2015, 4 A, 2014, 12 A, 2013, 1 B, 2015, 24 B, 2013 4 I need to convert the data to a new format: A ,4,12,1 B, 24,,4 Any idea how to make it in Spark Scala? Thanks

XML Parsing with Spark and SCala

2017-08-11 Thread Etisha Jain
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/XML-Parsing-with-Spark-and-SCala-tp29053.html

CI/CD for spark and scala

2018-01-24 Thread Deepak Sharma
Hi All, I just wanted to check if there are any best practices around using CI/CD for Spark / Scala projects running on AWS Hadoop clusters. If there are any specific tools, please do let me know. -- Thanks Deepak

Moving average using Spark and Scala

2015-07-11 Thread Anupam Bagchi
I have to do the following tasks on a dataset using Apache Spark with Scala as the programming language: Read the dataset from HDFS. A few sample lines look like this: deviceid,bytes,eventdate 15590657,246620,20150630 14066921,1907,20150621 14066921,1906,20150626 6522013,2349,20150626 6522013,252
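A hedged sketch of the read-and-parse step; the HDFS path is a placeholder, and the header row and field types are assumptions based on the sample lines above:

    val lines = sc.textFile("hdfs:///path/to/device_aggregates")   // hypothetical location
    val parsed = lines
      .filter(line => !line.startsWith("deviceid"))                // skip the header row
      .map { line =>
        val Array(deviceId, bytes, eventdate) = line.split(",")
        (deviceId.toLong, (eventdate.toInt, bytes.toLong))         // keyed by device for later grouping
      }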

Finding moving average using Spark and Scala

2015-07-13 Thread Anupam Bagchi
I have to do the following tasks on a dataset using Apache Spark with Scala as the programming language: - Read the dataset from HDFS. A few sample lines look like this: deviceid,bytes,eventdate 15590657,246620,20150630 14066921,1907,20150621 14066921,1906,20150626 6522013,2349,20150626 652

Re: Pivot Data in Spark and Scala

2015-10-29 Thread Deng Ching-Mallete
Hi, You could transform it into a pair RDD then use the combineByKey function. HTH, Deng On Thu, Oct 29, 2015 at 7:29 PM, Ascot Moss wrote: > Hi, > > I have data as follows: > > A, 2015, 4 > A, 2014, 12 > A, 2013, 1 > B, 2015, 24 > B, 2013 4 > > > I need to convert the data to a new format: >
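A minimal sketch of that pair-RDD approach using the sample data from the thread; the fixed year range, column order, and empty-string fill for missing years are assumptions:

    val raw = sc.parallelize(Seq(("A", 2015, 4), ("A", 2014, 12), ("A", 2013, 1),
                                 ("B", 2015, 24), ("B", 2013, 4)))
    val years = Seq(2015, 2014, 2013)
    val pivoted = raw
      .map { case (name, year, value) => (name, (year, value)) }
      .combineByKey(
        (yv: (Int, Int)) => Map(yv),                        // createCombiner: start a year -> value map
        (m: Map[Int, Int], yv: (Int, Int)) => m + yv,       // mergeValue: add one (year, value) pair
        (m1: Map[Int, Int], m2: Map[Int, Int]) => m1 ++ m2  // mergeCombiners: union per-partition maps
      )
      .map { case (name, byYear) =>
        (Seq(name) ++ years.map(y => byYear.get(y).map(_.toString).getOrElse(""))).mkString(",")
      }
    pivoted.collect().foreach(println)   // e.g. A,4,12,1 and B,24,,4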

Re: Pivot Data in Spark and Scala

2015-10-30 Thread Adrian Tanase
-spark-ml-pipelines.html -adrian From: Deng Ching-Mallete Date: Friday, October 30, 2015 at 4:35 AM To: Ascot Moss Cc: User Subject: Re: Pivot Data in Spark and Scala Hi, You could transform it into a pair RDD then use the combineByKey function. HTH, Deng On Thu, Oct 29, 2015 at 7:29 PM, Ascot

RE: Pivot Data in Spark and Scala

2015-10-30 Thread Andrianasolo Fanilo
d you would definitely need to use a specialized timeseries library… result.foreach(println) sc.stop() Best regards, Fanilo From: Adrian Tanase [mailto:atan...@adobe.com] Sent: Friday, 30 October 2015 11:50 To: Deng Ching-Mallete; Ascot Moss Cc: User Subject: Re: Pivot Data in Spark and S

Re: Pivot Data in Spark and Scala

2015-10-30 Thread Ali Tajeldin EDU
> > Depending on your downstream processing, I’d probably try to emulate it with > a hash map with years as keys instead of the columns. > > There is probably a nicer solution using the data frames API but I’m not > familiar with it. > > If you actually need vectors I think this

Re: Pivot Data in Spark and Scala

2015-10-30 Thread Ruslan Dautkhanov
https://issues.apache.org/jira/browse/SPARK-8992 Should be in 1.6? -- Ruslan Dautkhanov On Thu, Oct 29, 2015 at 5:29 AM, Ascot Moss wrote: > Hi, > > I have data as follows: > > A, 2015, 4 > A, 2014, 12 > A, 2013, 1 > B, 2015, 24 > B, 2013 4 > > > I need to convert the data to a new format:
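A hedged sketch of the DataFrame pivot that SPARK-8992 adds in 1.6; the column names and the sqlContext provided by the shell are assumptions:

    import org.apache.spark.sql.functions.sum

    val df = sqlContext.createDataFrame(Seq(
      ("A", 2015, 4), ("A", 2014, 12), ("A", 2013, 1), ("B", 2015, 24), ("B", 2013, 4)
    )).toDF("name", "year", "value")
    val wide = df.groupBy("name").pivot("year").agg(sum("value"))
    wide.show()   // one row per name, one column per year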

Re: Pivot Data in Spark and Scala

2015-10-31 Thread ayan guha
p{ case (name, mapOfYearsToValues) => (Seq(name) ++ > sequenceOfYears.map(year => mapOfYearsToValues.getOrElse(year, " " > ))).mkString(",")} // here we assume that sequence of all years isn’t > too big to not fit in memory. If you had to compute for each day

Re: XML Parsing with Spark and SCala

2017-08-11 Thread Jörn Franke
t coming. > I am attaching a file. Can anyone help me to do this > solvePuzzle1.scala <http://apache-spark-user-list.1001560.n3.nabble.com/file/n29053/solvePuzzle1.scala>

Re: Finding moving average using Spark and Scala

2015-07-13 Thread Feynman Liang
> > The call to Sorting.quicksort is not working. Perhaps I am calling it the > wrong way. allaggregates.toArray allocates a new array separate from allaggregates, and it is that new array, not allaggregates, that Sorting.quickSort sorts. Try: val sortedAggregates = allaggregates.toArray Sorting.quickSort(so
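A minimal sketch of that fix; allaggregates here is a stand-in for the thread's collection of aggregates:

    import scala.util.Sorting

    val allaggregates = Seq(3.0, 1.0, 2.0)         // stand-in data
    val sortedAggregates = allaggregates.toArray   // toArray materializes a new array
    Sorting.quickSort(sortedAggregates)            // sorts that array in place
    // Sorting.quickSort(allaggregates.toArray) on its own sorts a copy that is immediately discarded.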

Re: Finding moving average using Spark and Scala

2015-07-13 Thread Feynman Liang
A good example is RegressionMetrics's use of MultivariateOnlineSummarizer to aggregate statistics across labels and residuals; take a look at how aggregateByKey is use
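A hedged sketch of that RegressionMetrics-style pattern, one summarizer per key via aggregateByKey; the (deviceId, bytes) layout is an assumption based on this thread's sample data:

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer

    val byDevice = sc.parallelize(Seq((15590657L, 246620.0), (14066921L, 1907.0), (14066921L, 1906.0)))
    val summaries = byDevice
      .mapValues(bytes => Vectors.dense(bytes))
      .aggregateByKey(new MultivariateOnlineSummarizer())(
        (summ, v) => summ.add(v),    // fold each sample into the running summary
        (s1, s2) => s1.merge(s2)     // combine partial summaries across partitions
      )
    summaries.mapValues(s => (s.mean, s.variance)).collect().foreach(println)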

Re: Finding moving average using Spark and Scala

2015-07-13 Thread Anupam Bagchi
Thank you Feynman for your response. Since I am very new to Scala I may need a bit more hand-holding at this stage. I have been able to incorporate your suggestion about sorting - and it now works perfectly. Thanks again for that. I tried to use your suggestion of using MultiVariateOnlineSummar

Re: Finding moving average using Spark and Scala

2015-07-13 Thread Anupam Bagchi
Thank you Feynman for the lead. I was able to modify the code using clues from the RegressionMetrics example. Here is what I got now. val deviceAggregateLogs = sc.textFile(logFile).map(DailyDeviceAggregates.parseLogLine).cache() // Calculate statistics based on bytes-transferred val deviceIdsM

Re: Finding moving average using Spark and Scala

2015-07-13 Thread Feynman Liang
Dimensions mismatch when adding new sample. Expecting 8 but got 14. Make sure all the vectors you are summarizing over have the same dimension. Why would you want to write a MultivariateOnlineSummarizer object (which can be represented with a couple of Doubles) into a distributed filesystem like HDFS?

Re: Finding moving average using Spark and Scala

2015-07-13 Thread Anupam Bagchi
Hello Feynman, Actually in my case, the vectors I am summarizing over will not have the same dimension since many devices will be inactive on some days. This is at best a sparse matrix where we take only the active days and attempt to fit a moving average over it. The reason I would like to sa

Re: Finding moving average using Spark and Scala

2015-07-14 Thread Feynman Liang
If your rows may have NAs in them, I would process each column individually by first projecting the column ( map(x => x.nameOfColumn) ), filtering out the NAs, then running a summarizer over each column. Even if you have many rows, after summarizing you will only have a vector of length #columns.
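A hedged sketch of that column-wise approach; the record type and the Option-based NA encoding are assumptions made for illustration:

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer

    case class Usage(deviceId: Long, bytes: Option[Double])
    val records = sc.parallelize(Seq(Usage(1L, Some(10.0)), Usage(2L, None), Usage(3L, Some(4.0))))
    val bytesSummary = records
      .map(_.bytes)                     // project the single column of interest
      .filter(_.isDefined)              // filter out the NAs
      .map(b => Vectors.dense(b.get))
      .aggregate(new MultivariateOnlineSummarizer())(
        (summ, v) => summ.add(v),
        (s1, s2) => s1.merge(s2))
    println(s"count=${bytesSummary.count} mean=${bytesSummary.mean}")   // one small summary per column on the driver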

Re: Finding moving average using Spark and Scala

2015-07-17 Thread Anupam Bagchi
Thanks Feynman for your direction. I was able to solve this problem by calling Spark API from Java. Here is a code snippet that may help other people who might face the same challenge. if (args.length > 2) { earliestEventDate = Integer.parseInt(args[2]); } else {

Real Time Recommendation Engines with Spark and Scala

2016-09-05 Thread Mich Talebzadeh
Hi, Has anyone done any work on real-time recommendation engines with Spark and Scala? I have seen a few PPTs with Python but wanted to see if these have been done with Scala. I trust this question makes sense. Thanks. P.S. My prime interest would be in financial markets. Dr Mich Talebzadeh

Re: Real Time Recommendation Engines with Spark and Scala

2016-09-05 Thread Alonso Isidoro Roman

Re: Real Time Recommendation Engines with Spark and Scala

2016-09-05 Thread Mich Talebzadeh

Re: Real Time Recommendation Engines with Spark and Scala

2016-09-05 Thread Alonso Isidoro Roman

Re: Real Time Recommendation Engines with Spark and Scala

2016-09-05 Thread Alonso Isidoro Roman
>> On 5 September 2016 at 15:08, Alonso Isidoro Roman wrote: >>> Hi Mitch, I wrote a tiny project a few months ago with this issue in mind.

Re: Real Time Recommendation Engines with Spark and Scala

2016-09-05 Thread Mich Talebzadeh

Calculating moving average of dataset in Apache Spark and Scala

2015-07-11 Thread Anupam Bagchi
I have to do the following tasks on a dataset using Apache Spark with Scala as the programming language: Read the dataset from HDFS. A few sample lines look like this: deviceid,bytes,eventdate 15590657,246620,20150630 14066921,1907,20150621 14066921,1906,20150626 6522013,2349,20150626 6522013,252

Optimized way to multiply two large matrices and save output using Spark and Scala

2016-01-13 Thread Devi P.V
I want to multiply two large matrices (from CSV files) using Spark and Scala and save the output. I use the following code: val rows=file1.coalesce(1,false).map(x=>{ val line=x.split(delimiter).map(_.toDouble) Vectors.sparse(line.length, line.zipWithIndex.map(e => (e._2

Re: Optimized way to multiply two large matrices and save output using Spark and Scala

2016-01-13 Thread Burak Yavuz
kMatrix to multiply, and CoordinateMatrix to save it back again. Thanks, Burak On Wed, Jan 13, 2016 at 8:16 PM, Devi P.V wrote: > I want to multiply two large matrices (from csv files)using Spark and > Scala and save output.I use the following code > > val rows=file1.c
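A hedged sketch of that route; the entry RDDs and the output path are placeholders for the matrices parsed from the CSV files:

    import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

    val entriesA = sc.parallelize(Seq(MatrixEntry(0, 0, 1.0), MatrixEntry(0, 1, 2.0), MatrixEntry(1, 0, 3.0)))
    val entriesB = sc.parallelize(Seq(MatrixEntry(0, 0, 4.0), MatrixEntry(1, 1, 5.0)))
    val a = new CoordinateMatrix(entriesA).toBlockMatrix().cache()
    val b = new CoordinateMatrix(entriesB).toBlockMatrix().cache()
    val product = a.multiply(b)                    // distributed block-wise multiply
    val asEntries = product.toCoordinateMatrix()   // back to (i, j, value) triples for saving
    asEntries.entries.map(e => s"${e.i},${e.j},${e.value}").saveAsTextFile("hdfs:///tmp/matrix_product")   // hypothetical path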