will be registered as an output stream and therefore
materialized.
Change it to a map, foreach or some other form of transform.
HTH
-Todd
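[Editor's note] Todd's point, that transformations like map/foreach stay lazy while a registered output stream is materialized, can be sketched without Spark in plain Python; here list() stands in for an output operation like print() or saveAs* on a DStream:

```python
# Non-Spark sketch of the lazy-transform vs. output-operation distinction:
# map() is lazy and computes nothing by itself, while list() (standing in
# for an output operation on a DStream) forces the data to materialize.
events = range(5)
transformed = map(lambda x: x * 2, events)  # lazy: nothing computed yet
materialized = list(transformed)            # "output" step: forces evaluation
```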
On Thu, Jul 9, 2015 at 5:24 PM, Su She suhsheka...@gmail.com wrote:
Hello All,
I also posted this on the Spark/Datastax thread, but thought it was also
50% a spark question (or mostly a spark question).
I was wondering what the best practice is for saving streaming Spark SQL (
directory and typing sbin/stop-dfs.sh and
then sbin/start-dfs.sh
Thanks
Best Regards
On Tue, Jun 2, 2015 at 5:03 AM, Su She suhsheka...@gmail.com wrote:
Hello All,
A bit scared I did something stupid...I killed a few PIDs that were
listening to ports 2183 (kafka), 4042 (spark app), some of the PIDs
didn't even seem to be stopped, as they are still running when I do
lsof -i:[port number]
I'm not sure if the problem started after or before I did
I'm trying to set up my own cluster and am having trouble running this script:
./spark-ec2 --key-pair=xx --identity-file=xx.pem --region=us-west-2
--zone=us-west-2c --num-slaves=1 launch my-spark-cluster
based on: https://spark.apache.org/docs/latest/ec2-scripts.html
It just tries to open the
Hmm, just tried to run it again, but opened the script with Python;
the cmd window popped up really quickly and exited.
On Wed, May 13, 2015 at 2:06 PM, Su She suhsheka...@gmail.com wrote:
Hi Ted, Yes I do have Python 3.5 installed. I just ran py from the
ec2 directory and it started up!
On Mon, Apr 27, 2015 at 11:48 AM, Su She suhsheka...@gmail.com wrote:
Hello Xiangrui,
I am using this spark-submit command (as I do for all other jobs):
/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/bin/spark-submit
--class MLlib --master local[2] --jars $(echo
/home/ec2-user/learning-spark/target/simple-project-1.1.jar
Thank you for the help!
Best,
Su
On Mon, Apr 27, 2015 at 9:58 AM, Xiangrui Meng men...@gmail.com wrote:
How did you run the example app? Did you use spark-submit? -Xiangrui
On Thu, Apr 23, 2015 at 2:27 PM, Su She suhsheka...@gmail.com wrote:
Sorry
I had asked this question before, but wanted to ask again as I think
it is related to my pom file or project setup.
I have been trying on/off for the past month to try to run this MLlib example:
Sorry, accidentally sent the last email before finishing.
I had asked this question before, but wanted to ask again as I think
it is now related to my pom file or project setup. Really appreciate the help!
I have been trying on/off for the past month to try to run this MLlib
example:
Hello Everyone,
I am trying to implement this example (Spark Streaming with Twitter).
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/TwitterPopularTags.scala
I am able to do:
hashTags.print() to get a live stream of filtered hashtags,
AM, Xiangrui Meng men...@gmail.com wrote:
+Holden, Joseph
It seems that there is something wrong with the sample data file:
https://github.com/databricks/learning-spark/blob/master/files/ham.txt
-Xiangrui
On Fri, Mar 27, 2015 at 1:03 PM, Su She suhsheka...@gmail.com wrote:
Hello
On Thu, Mar 19, 2015 at 1:15 PM, Su She suhsheka...@gmail.com wrote:
Hi Akhil,
1) How could I see how much time it is spending on stage 1? Or what if,
like above, it doesn't get past stage 1?
2) How could I check if it's GC time? And where would I increase the
parallelism
Hello Everyone,
I am trying to run this MLlib example from Learning Spark:
https://github.com/databricks/learning-spark/blob/master/src/main/scala/com/oreilly/learningsparkexamples/scala/MLlib.scala#L48
Things I'm doing differently:
1) Using spark shell instead of an application
2) instead of
time it spends on Stage 1.
See if it's GC time; if so, try increasing the level of parallelism or
repartition it, e.g. to sc.defaultParallelism * 3 partitions.
Thanks
Best Regards
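[Editor's note] Akhil's repartitioning advice can be pictured with a toy, non-Spark sketch; the repartition function and default_parallelism below are made-up stand-ins for the Spark APIs, not the real ones:

```python
# Non-Spark illustration: spreading the same records over more partitions
# gives smaller per-task workloads, which can ease GC pressure.
def repartition(records, num_partitions):
    parts = [[] for _ in range(num_partitions)]
    for i, rec in enumerate(records):
        parts[i % num_partitions].append(rec)  # round-robin placement
    return parts

default_parallelism = 2  # stand-in for sc.defaultParallelism
parts = repartition(list(range(12)), default_parallelism * 3)
```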
On Thu, Mar 19, 2015 at 12:15 PM, Su She suhsheka...@gmail.com wrote:
Hello Everyone,
I am trying to run this MLlib example
Hello,
So actually solved the problem...see point 3.
Here are a few approaches/errors I was getting:
1) mvn package exec:java -Dexec.mainClass=HelloWorld
Error: java.lang.ClassNotFoundException: HelloWorld
2)
these jars (joda-time-2.7.jar, joda-convert-1.7.jar)
either as part of your build and assembly or via the --jars option to
spark-submit.
HTH.
On Fri, Feb 27, 2015 at 2:48 PM, Su She suhsheka...@gmail.com wrote:
Hello Everyone,
I'm having some issues launching (non-spark) applications via the
spark-submit commands. The common error I am getting is copied below. I am
able to submit a Spark Streaming/Kafka Spark application, but can't start a
DynamoDB Java app. The common error is related to joda-time.
1) I
http://stackoverflow.com/questions/23527941/how-to-write-to-csv-in-spark
Just read this...seems like it should be easily readable. Thanks!
On Sat, Feb 14, 2015 at 1:36 AM, Su She suhsheka...@gmail.com wrote:
Thanks Akhil for the link. Is there a reason why there is a new directory
created
Thanks
Best Regards
On Sat, Feb 14, 2015 at 2:18 AM, Su She suhsheka...@gmail.com wrote:
Thanks Akhil for the suggestion, it is now only giving me one part -
. Is there any way I can just create a file rather than a directory? It
doesn't seem like there is just a saveAsTextFile
with Spark. Here's an example for doing
that https://issues.apache.org/jira/browse/SPARK-944
Thanks
Best Regards
On Sat, Feb 14, 2015 at 2:55 PM, Su She suhsheka...@gmail.com wrote:
Hello Akhil, thank you for your continued help!
1) So, if I can write it programmatically after every batch
to the merged file /merged-output, true (to delete the original dir), null)
Thanks
Best Regards
On Sat, Feb 14, 2015 at 2:18 AM, Su She suhsheka...@gmail.com wrote:
Thanks Akhil for the suggestion, it is now only giving me one part -
.
Is there any way I can just create
at a directory of these files.
On Sat, Feb 14, 2015 at 9:05 PM, Su She suhsheka...@gmail.com wrote:
Thanks Sean and Akhil! I will take out the repartition(1). Please let me
know if I understood this correctly; Spark Streaming writes data like
this:
foo-1001.csv/part-x, part-x
foo
a repartition before the saveAs*
call.
messages.repartition(1).saveAsHadoopFiles("hdfs://user/ec2-user/", "csv", String.class,
String.class, (Class) TextOutputFormat.class);
Thanks
Best Regards
On Fri, Feb 13, 2015 at 11:59 AM, Su She suhsheka...@gmail.com wrote:
Hello Everyone,
I am writing simple word counts to hdfs using
messages.saveAsHadoopFiles("hdfs://user/ec2-user/", "csv", String.class,
String.class, (Class) TextOutputFormat.class);
1) However, every 2 seconds I am getting a new *directory* that is titled as a
csv. So I'll have test.csv, which will be a
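[Editor's note] The directory-per-batch behavior described above follows from how Spark Streaming's saveAsHadoopFiles names its output: per the Spark Streaming docs, it creates one directory per batch interval, named from the prefix, batch time, and suffix, with the part files inside. A small sketch of that naming scheme (the helper function is illustrative, not a Spark API):

```python
# Why "test.csv" appears as a new directory every batch: saveAsHadoopFiles
# generates one directory per batch, named "<prefix>-<batch time in ms>.<suffix>",
# and writes part-NNNNN files inside it.
def batch_output_dir(prefix, batch_time_ms, suffix):
    return f"{prefix}-{batch_time_ms}.{suffix}"

name = batch_output_dir("test", 1423872000000, "csv")
```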
it works.
See here: https://github.com/andypetrella/spark-notebook
From: Su She
Date: Thursday, February 12, 2015 at 1:55 AM
To: Felix C
Cc: Kelvin Chu, user@spark.apache.org
Subject: Re: Can spark job server be used to visualize streaming data?
Hello Felix,
I am already streaming
...@hotmail.com wrote:
Checkout
https://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html
In there are links to how that is done.
--- Original Message ---
From: Kelvin Chu 2dot7kel...@gmail.com
Sent: February 10, 2015 12:48 PM
To: Su She suhsheka...@gmail.com
--- Original Message ---
From: Su She suhsheka...@gmail.com
Sent: February 11, 2015 10:23 AM
To: Felix C felixcheun...@hotmail.com
Cc: Kelvin Chu 2dot7kel...@gmail.com, user@spark.apache.org
Subject: Re: Can spark job server be used to visualize streaming data?
Thank you Felix and Kelvin. I
Hello Everyone,
I was reading this blog post:
http://homes.esat.kuleuven.be/~bioiuser/blog/a-d3-visualisation-from-spark-as-a-service/
and was wondering if this approach can be taken to visualize streaming
data...not just historical data?
Thank you!
-Suh
Hello Everyone,
I wanted to hear the community's thoughts on what (open-source) tools
have been used to visualize data from Spark/Spark Streaming? I've taken a
look at Zeppelin, but had some trouble working with it.
Couple questions:
1) I've looked at a couple of blog posts and it seems like
Hello Everyone,
A bit confused on this one...I have set up the KafkaWordCount found here:
https://github.com/apache/spark/blob/master/examples/scala-2.10/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java
Everything runs fine when I run it using this on instance A:
in that case)
On 27 Jan 2015 07:25, Su She suhsheka...@gmail.com wrote:
Hello Sean and Akhil,
I shut down the services on Cloudera Manager. I shut them down in the
appropriate order and then stopped all services of CM. I then shut down my
instances. I then turned my instances back on, but I
the steps correctly across your whole cluster.
I'm not sure if the stock stop-all.sh script is supposed to work.
Certainly, if you are using CM, by far the easiest is to start/stop
all of these things in CM.
On Wed, Jan 21, 2015 at 6:08 PM, Su She suhsheka...@gmail.com wrote:
Hello Sean and Akhil,
do) and then shut down the machines.
You can execute the following command to disable safe mode:
hdfs dfsadmin -safemode leave
Thanks
Best Regards
On Sat, Jan 17, 2015 at 8:31 AM, Su She suhsheka...@gmail.com wrote:
Hello Everyone,
I am encountering trouble running Spark applications when I shut down my
EC2 instances. Everything else seems to work except Spark. When I try
running a simple Spark application, like sc.parallelize(), I get the message
that the HDFS NameNode is in safe mode.
Has anyone else had this
Hello Everyone,
Quick followup, is there any way I can append output to one file rather
than create a new directory/file every X milliseconds?
Thanks!
Suhas Shekar
University of California, Los Angeles
B.A. Economics, Specialization in Computing 2014
On Thu, Jan 8, 2015 at 11:41 PM, Su She
, then you can repartition the data to 1 before saving.
Another way would be to use Hadoop's copy-merge command/API (available from
2.0 versions)
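[Editor's note] The copy-merge idea mentioned above (Hadoop's FileUtil.copyMerge) can be sketched locally in plain Python; the function, directory layout, and file contents below are all made up for illustration and do not touch HDFS:

```python
import os
import shutil
import tempfile

# Local, non-Hadoop sketch of copyMerge: concatenate the part-* files of an
# output directory into one file, optionally deleting the source directory.
def copy_merge(src_dir, dst_file, delete_src=False):
    parts = sorted(p for p in os.listdir(src_dir) if p.startswith("part-"))
    with open(dst_file, "wb") as out:
        for part in parts:
            with open(os.path.join(src_dir, part), "rb") as f:
                shutil.copyfileobj(f, out)
    if delete_src:
        shutil.rmtree(src_dir)

# Usage sketch with made-up part files:
src = tempfile.mkdtemp()
for i, data in enumerate([b"hello,1\n", b"world,2\n"]):
    with open(os.path.join(src, f"part-{i:05d}"), "wb") as f:
        f.write(data)
dst = os.path.join(tempfile.mkdtemp(), "merged-output.csv")
copy_merge(src, dst, delete_src=True)
```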
On 13 Jan 2015 01:08, Su She suhsheka...@gmail.com wrote:
Hello Everyone,
Quick followup, is there any way I can append output to one file rather
yourStream.saveAsNewAPIHadoopFiles(hdfsUrl, "/output-location", Text.class,
Text.class, outputFormatClass);
Thanks
Best Regards
On Fri, Jan 9, 2015 at 10:22 AM, Su She suhsheka...@gmail.com wrote:
Yes, I am calling the saveAsHadoopFiles on the DStream. However, when I
call print on the DStream it works
calling the saveAsTextFiles on the DStream -- looks like it? Look
at the section called Design Patterns for using foreachRDD in the link
you sent -- you want to do dstream.foreachRDD(rdd => rdd.saveAs...)
On Thu, Jan 8, 2015 at 5:20 PM, Su She suhsheka...@gmail.com wrote:
Hello Everyone,
Thanks in advance for the help!
I successfully got my Kafka/Spark WordCount app to print locally. However,
I want to run it on a cluster, which means that I will have to save it to
HDFS if I want to be able to read the output.
I am running Spark 1.1.0, which means according to