Hi DB,
Thanks a lot.
Appreciated.
BR,
Aslan
On Sun, Jun 8, 2014 at 2:52 AM, DB Tsai wrote:
> Hi Aslan,
>
> You can check out the unittest code of GradientDescent.runMiniBatchSGD
>
>
> https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/mllib/optimization/Gradient
Hi All,
I am new to Spark.
In the Spark shell, how can I get the help or explanation for the
functions that I can use on a variable or RDD? For example, after I input an
RDD's name with a dot (.) at the end, if I press the Tab key, a list of
functions that I can use on this RDD will be displayed.
You can consult the docs at:
https://spark.apache.org/docs/latest/api/scala/index.html#package
In particular, the RDD docs contain the explanation of each method:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD
Kr, Gerard
On Jun 8, 2014 1:00 PM, "Carter" wrote:
Thank you very much Gerard.
Yes.. but from what I understand that's a "sliding window" so for a window
of (60) over (1) second DStreams, that would save the entire last minute of
data once per second. That's more than I need.
I think what I'm after is probably updateStateByKey... I want to mutate
data structures (probably ev
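For reference, a rough sketch of updateStateByKey (untested; the running
count per key is just an illustrative state, `words` stands in for a
DStream[String], and it needs ssc.checkpoint(...) configured):

    // fold each batch's new values into one piece of state per key
    val counts = words.map((_, 1L)).updateStateByKey[Long] {
      (newValues: Seq[Long], state: Option[Long]) =>
        Some(state.getOrElse(0L) + newValues.sum)
    }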
I read it more carefully, and window() might actually work for some other
stuff like logs. (assuming I can have multiple windows with entirely
different attributes on a single stream..)
Thanks for that!
On Sun, Jun 8, 2014 at 11:11 PM, Jeremy Lee
wrote:
> Yes.. but from what I understand that'
Yeah... Have not tried it, but if you set the slidingDuration == windowDuration
that should prevent overlaps.
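Something like this, maybe (untested; `lines` stands in for your DStream,
and Seconds comes from org.apache.spark.streaming):

    // slide length == window length gives "tumbling" 60s windows: each
    // batch falls into exactly one window, so nothing is saved twice
    val tumbling = lines.window(Seconds(60), Seconds(60))
    tumbling.saveAsTextFiles("minutely")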
Gino B.
> On Jun 8, 2014, at 8:25 AM, Jeremy Lee wrote:
>
> I read it more carefully, and window() might actually work for some other
> stuff like logs. (assuming I can have multiple
Hi All,
I just downloaded the Scala IDE for Eclipse. After I created a Spark project
and clicked "Run", there was an error on this line of code "import
org.apache.spark.SparkContext": "object apache is not a member of package
org". I guess I need to import the Spark dependency into Scala IDE for
Eclipse.
Project->Properties->Java Build Path->Add External Jars
Add the /spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar
Cheers
On Sun, Jun 8, 2014 at 8:06 AM, Carter wrote:
> Hi All,
>
> I just downloaded the Scala IDE for Eclipse. After I created a Spark
> project
> and clicked "Run
This will make the compilation pass, but you may not be able to run it
correctly.
I used Maven, adding these two dependencies (I use Hadoop 1); Maven pulled in
their dependent jars (a lot) for me.

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.0.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>1.2.1</version>
    </dependency>
Best
I shut down my first (working) cluster and brought up a fresh one... and
it's been a bit of a horror, and I need to sleep now. Should I be worried
about these errors? Or did I just have the old log4j.config tuned so I
didn't see them?
I
14/06/08 16:32:52 ERROR scheduler.JobScheduler: Error running
A match clause needs to cover all the possibilities, and not matching
any regex is a distinct possibility. It's not really like 'switch',
because match requires exhaustiveness, and I think that has benefits, like
being able to interpret a match as something with a type. I think it's all
in order, but it's more of
When you use match, the match must be exhaustive. That is, a match error is
thrown if the match fails.
That's why you usually handle the default case using "case _ => ..."
Here it looks like you're taking the text of all statuses - which means not all
of them will be commands... Which means
>
> The solution is either to add a default case which does nothing, or
> probably better to add a .filter such that you filter out anything that's
> not a command before matching.
>
And you probably want to push down that filter into the cluster --
collecting all of the elements of an RDD only to
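Something like this, perhaps (untested; `statuses` and the regex are
placeholders, getText as in twitter4j):

    // drop obvious non-commands on the workers, before any match runs
    val CommandRegex = """!(\w+).*""".r
    val commands = statuses.map(_.getText).filter(_.startsWith("!"))
    val actions = commands.map {
      case CommandRegex(cmd) => Some(cmd)
      case _                 => None // default case keeps the match exhaustive
    }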
Hi,
I am stuck here; my cluster is not efficiently utilized. Appreciate any
input on this.
Thanks
Subacini
On Sat, Jun 7, 2014 at 10:54 PM, Subacini B wrote:
> Hi All,
>
> My cluster has 5 workers, each having 4 cores (so 20 cores in total). It is in
> stand alone mode (not using Mesos or Yarn)
Hi All,
I was writing a simple Streaming job to get more understanding about Spark
streaming.
I don't understand the union behaviour in this particular case.
*WORKS:*
val lines = ssc.socketTextStream("localhost", ,
StorageLevel.MEMORY_AND_DISK_SER)
val words = lines.flatMap(_.
In PySpark you can also do help(my_rdd) and get a nice help page of methods
available.
On Sunday, June 8, 2014, Carter wrote:
> Thank you very much Gerard.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-get-the-help-or-explanation-for-the-funct
Have a look at:
https://spark.apache.org/docs/1.0.0/job-scheduling.html
https://spark.apache.org/docs/1.0.0/spark-standalone.html
The default is to grab resources on all nodes. In your case you could set
spark.cores.max to 2 or less to enable running two apps on a cluster of
4-core machines simultaneously.
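For example, in the application itself (a sketch; the value and app name
are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    // cap this app at 2 cores so a second app can get the rest
    val conf = new SparkConf()
      .setAppName("app-one")
      .set("spark.cores.max", "2")
    val sc = new SparkContext(conf)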
Moving over to the dev list, as this isn't a user-scope issue.
I just ran into this issue with the missing saveAsTextFile, and here's a
little additional information:
- Code ported from 0.9.1 up to 1.0.0; works with local[n] in both cases.
- Driver built as an uberjar via Maven.
- Deployed to sma
Paul,
Could you give the version of Java that you are building with and the
version of Java you are running with? Are they the same?
Just off the cuff, I wonder if this is related to:
https://issues.apache.org/jira/browse/SPARK-1520
If it is, it could appear that certain functions are not in the
Also I should add - thanks for taking time to help narrow this down!
On Sun, Jun 8, 2014 at 1:02 PM, Patrick Wendell wrote:
> Paul,
>
> Could you give the version of Java that you are building with and the
> version of Java you are running with? Are they the same?
>
> Just off the cuff, I wonder
I suspect Patrick is right about the cause. The Maven artifact that
was released does contain this class (phew)
http://search.maven.org/#artifactdetails%7Corg.apache.spark%7Cspark-core_2.10%7C1.0.0%7Cjar
As to the hadoop1 / hadoop2 artifact question -- agree that is often
done. Here the working t
Hi, Patrick --
Java 7 on the development machines:
» java -version
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
And on the deployed boxes:
$ java -version
Okay I think I've isolated this a bit more. Let's discuss over on the JIRA:
https://issues.apache.org/jira/browse/SPARK-2075
On Sun, Jun 8, 2014 at 1:16 PM, Paul Brown wrote:
>
> Hi, Patrick --
>
> Java 7 on the development machines:
>
> » java -version
> java version "1.7.0_51"
> Java(TM)
Thanks Sean, let me try to set spark.deploy.spreadOut as false.
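If I remember the standalone docs right, that one is read by the master
rather than by the application, so it would go into the master's
environment, e.g. in conf/spark-env.sh (then restart the master):

    export SPARK_MASTER_OPTS="-Dspark.deploy.spreadOut=false"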
On Sun, Jun 8, 2014 at 12:44 PM, Sean Owen wrote:
> Have a look at:
>
> https://spark.apache.org/docs/1.0.0/job-scheduling.html
> https://spark.apache.org/docs/1.0.0/spark-standalone.html
>
> The default is to grab resource on al
Gaurav,
I am not sure that the "*" expands to what you expect it to.
Normally bash expands "*" to a space-separated string, not a
colon-separated one. Try specifying all the jars manually, maybe?
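For example (untested):

    # build a colon-separated classpath from all jars in lib/
    CLASSPATH=$(echo lib/*.jar | tr ' ' ':')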
Tobias
On Thu, Jun 5, 2014 at 6:45 PM, Gaurav Dasgupta wrote:
> Hi,
>
> I have written my own cust
On Sun, Jun 8, 2014 at 10:00 AM, Nick Pentreath
wrote:
> When you use match, the match must be exhaustive. That is, a match error
> is thrown if the match fails.
Ahh, right. That makes sense. Scala is applying its "strong typing" rules
here instead of "no ceremony"... but isn't the idea that t
Jeremy,
On Mon, Jun 9, 2014 at 10:22 AM, Jeremy Lee
wrote:
>> When you use match, the match must be exhaustive. That is, a match error
>> is thrown if the match fails.
>
> Ahh, right. That makes sense. Scala is applying its "strong typing" rules
> here instead of "no ceremony"... but isn't the id
I'm having some trouble getting a basic matrix multiply to work with Breeze.
I'm pretty sure it's related to my classpath. My setup is a cluster on AWS
with 8 m3.xlarges. To create the cluster I used the provided ec2 scripts and
Spark 1.0.0.
I've made a gist with the relevant pieces of my app:
ht
Dear All,
I recently installed Spark 1.0.0 on a 10-slave dedicated cluster. However,
the max input rate that the system can sustain with stable latency seems
very low. I use a simple word counting workload over tweets:
theDStream.flatMap(extractWordOnePairs).reduceByKey(sumFunc).count.print
With
Hi,
I had a similar problem; I was using `sbt assembly` to build a jar
containing all my dependencies, but since my file system has a problem
with long file names (due to disk encryption), some class files (which
correspond to functions in Scala) were not included in the jar I
uploaded. Although,
Thanks a lot Krishna, this works for me.
Thanks for your reply Wei, will try this.
Thanks for the quick response. No, I actually build my jar via 'sbt package'
on EC2 on the master itself.
Hi dlaw,
You are using breeze-0.8.1, but the Spark assembly jar depends on
breeze-0.7. If the Spark assembly jar comes first on the classpath
but the method from DenseMatrix is only available in breeze-0.8.1, you
get a NoSuchMethodError. So,
a) If you don't need the features in breeze-0.8.1, do not
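For example, with sbt (a sketch; coordinates from memory, so double-check
them against Maven Central):

    // build.sbt -- compile against the breeze the Spark 1.0.0
    // assembly already ships, so the classpath agrees at runtime
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
      "org.scalanlp"     %% "breeze"     % "0.7"
    )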
Hi Tobias,
Which file system and which encryption are you using?
Best,
Xiangrui
On Sun, Jun 8, 2014 at 10:16 PM, Xiangrui Meng wrote:
> Hi dlaw,
>
> You are using breeze-0.8.1, but the spark assembly jar depends on
> breeze-0.7. If the spark assembly jar comes the first on the classpath
> but t
Without (C), what is the best practice to implement the following scenario?
1. rdd = sc.textFile(FileA)
2. rdd = rdd.map(...) // actually modifying the rdd
3. rdd.saveAsTextFile(FileA)
Since the rdd transformation is 'lazy', rdd will not materialize until
saveAsTextFile(), so FileA must still exist
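One common pattern (a sketch; the temp path and the swap step are
illustrative, not from this thread) is to write somewhere else first and
swap only after the job succeeds:

    import org.apache.hadoop.fs.{FileSystem, Path}

    val rdd = sc.textFile("FileA").map(line => line) // your map() here
    rdd.saveAsTextFile("FileA_tmp")                  // materialize first

    // only after the save succeeded, swap the directories
    val fs = FileSystem.get(sc.hadoopConfiguration)
    fs.delete(new Path("FileA"), true)
    fs.rename(new Path("FileA_tmp"), new Path("FileA"))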
Hi Jacob,
The port configuration docs that we worked on together are now available
at:
http://spark.apache.org/docs/latest/spark-standalone.html#configuring-ports-for-network-security
Thanks for the help!
Andrew
On Wed, May 28, 2014 at 3:21 PM, Jacob Eisinger wrote:
> Howdy Andrew,
>
> This