is is an output operator,
> so 'this' DStream will be registered as an output stream and therefore
> materialized.
>
> Change it to a map, foreach or some other form of transform.
>
> HTH
>
> -Todd
>
>
> On Thu, Jul 9, 2015 at 5:24 PM, Su She wrote:
>
Hello All,
I also posted this on the Spark/Datastax thread, but thought it was at
least half a Spark question (if not mostly one).
I was wondering what the best practice is for saving streaming Spark SQL (
https://github.com/Intel-bigdata/spark-streamingsql/blob/master/src/main/scala/org/apach
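For reference, a minimal sketch of the kind of change Todd describes above,
replacing print() with a transformation plus an explicit foreachRDD; the
DStream name results is illustrative, not from the original post:

  // Illustrative only: apply a transformation instead of print(),
  // then handle each batch explicitly in foreachRDD.
  val formatted = results.map(_.toString)   // a transformation, not an output operator
  formatted.foreachRDD { rdd =>
    rdd.take(10).foreach(println)           // e.g. sample a few records per batch
  }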
g sbin/stop-dfs.sh and
> then sbin/start-dfs.sh
>
> Thanks
> Best Regards
>
> On Tue, Jun 2, 2015 at 5:03 AM, Su She wrote:
>>
>> Hello All,
>>
>> A bit scared I did something stupid...I killed a few PIDs that were
>> listening to ports 2183 (kafka),
Hello All,
A bit scared I did something stupid...I killed a few PIDs that were
listening to ports 2183 (Kafka) and 4042 (Spark app). Some of the PIDs
didn't even seem to be stopped, as they are still running when I do
lsof -i:[port number]
I'm not sure if the problem started after or before I did th
Hmm, just tried to run it again, but opened the script with python;
the cmd line seemed to pop up really quickly and then exited.
On Wed, May 13, 2015 at 2:06 PM, Su She wrote:
> Hi Ted, Yes I do have Python 3.5 installed. I just ran "py" from the
> ec2 directory and it started up
I'm trying to set up my own cluster and am having trouble running this script:
./spark-ec2 --key-pair=xx --identity-file=xx.pem --region=us-west-2
--zone=us-west-2c --num-slaves=1 launch my-spark-cluster
based off: https://spark.apache.org/docs/latest/ec2-scripts.html
It just tries to open the s
On Mon, Apr 27, 2015 at 11:48 AM, Su She wrote:
> Hello Xiangrui,
>
> I am using this spark-submit command (as I do for all other jobs):
>
> /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/bin/spark-submit
> --class MLlib --master local[2] --jars $(echo
> /ho
e/ec2-user/sparkApps/learning-spark/target/simple-project-1.1.jar
Thank you for the help!
Best,
Su
On Mon, Apr 27, 2015 at 9:58 AM, Xiangrui Meng wrote:
> How did you run the example app? Did you use spark-submit? -Xiangrui
>
> On Thu, Apr 23, 2015 at 2:27 PM, Su She wrote:
>>
Sorry, accidentally sent the last email before finishing.
I had asked this question before, but wanted to ask again as I think
it is now related to my pom file or project setup. Really appreciate the help!
I have been trying on/off for the past month to try to run this MLlib
example:
https://git
I had asked this question before, but wanted to ask again as I think
it is related to my pom file or project setup.
I have been trying on/off for the past month to try to run this MLlib example:
Hello Everyone,
I am trying to implement this example (Spark Streaming with Twitter).
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/TwitterPopularTags.scala
I am able to do:
hashTags.print() to get a live stream of filtered hashtags, but
tly.
>
> On Mon, Mar 30, 2015 at 10:16 AM, Xiangrui Meng wrote:
>>
>> +Holden, Joseph
>>
>> It seems that there is something wrong with the sample data file:
>> https://github.com/databricks/learning-spark/blob/master/files/ham.txt
>>
>> -Xiangrui
>>
> > Thanks
> > Best Regards
> >
> > On Thu, Mar 19, 2015 at 1:15 PM, Su She wrote:
> >>
> >> Hi Akhil,
> >>
> >> 1) How could I see how much time it is spending on stage 1? Or what if,
> >> like above, it doesn't get past stag
on Stage 1.
> See if its a GC time, then try increasing the level of parallelism or
> repartition it like sc.getDefaultParallelism*3.
>
> Thanks
> Best Regards
>
> On Thu, Mar 19, 2015 at 12:15 PM, Su She wrote:
>
>> Hello Everyone,
>>
>> I am trying to ru
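A one-line sketch of the repartitioning Akhil suggests above; note that the
Scala API spells it sc.defaultParallelism, and the RDD name input is
illustrative:

  // Spread the data across roughly 3x the default parallelism before the heavy stage.
  val repartitioned = input.repartition(sc.defaultParallelism * 3)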
Hello Everyone,
I am trying to run this MLlib example from Learning Spark:
https://github.com/databricks/learning-spark/blob/master/src/main/scala/com/oreilly/learningsparkexamples/scala/MLlib.scala#L48
Things I'm doing differently:
1) Using spark shell instead of an application
2) instead of t
Hello,
So actually solved the problem...see point 3.
Here are a few approaches/errors I was getting:
1) mvn package exec:java -Dexec.mainClass=HelloWorld
Error: java.lang.ClassNotFoundException: HelloWorld
2)
http://stackoverflow.com/questions/26929100/running-a-scala-application-in-maven-pro
Hello Everyone,
I am trying to run the Word Count from here:
https://github.com/holdenk/learning-spark-examples/blob/master/mini-complete-example/src/main/scala/com/oreilly/learningsparkexamples/mini/scala/WordCount.scala
I was able to successfully run the app using SBT, but not Maven. I don't
se
specify these jars (joda-time-2.7.jar, joda-convert-1.7.jar)
> either as part of your build and assembly or via the --jars option to
> spark-submit.
>
> HTH.
>
> On Fri, Feb 27, 2015 at 2:48 PM, Su She wrote:
>
>> Hello Everyone,
>>
>> I'm having some is
Hello Everyone,
I'm having some issues launching (non-Spark) applications via the
spark-submit command. The common error I am getting is copied below. I am
able to submit a Spark Streaming/Kafka application, but can't start a
DynamoDB Java app. The common error is related to joda-time.
1) I r
ectory of these files.
>
> On Sat, Feb 14, 2015 at 9:05 PM, Su She wrote:
> > Thanks Sean and Akhil! I will take out the repartition(1). Please let me
> > know if I understood this correctly: Spark Streaming writes data like
> > this:
> >
> > foo-1001.csv/part -x
til.copyMerge(FileSystem of source (hdfs), /output-location, FileSystem
> > of destination (hdfs), Path to the merged files /merged-output, true (to
> > delete the original dir), null)
> >
> >
> >
> > Thanks
> > Best Regards
> >
> > On Sat, Fe
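A sketch of the copyMerge call Akhil outlines above, using the Hadoop 2.x
FileUtil API; the paths shown are illustrative:

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

  val conf = new Configuration()
  val fs = FileSystem.get(conf)
  // Merge every part file under /output-location into a single file,
  // deleting the source directory afterwards (the 'true' flag).
  FileUtil.copyMerge(fs, new Path("/output-location"),
    fs, new Path("/merged-output/result.csv"),
    true, conf, null)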
http://stackoverflow.com/questions/23527941/how-to-write-to-csv-in-spark
Just read this...seems like it should be easily readable. Thanks!
On Sat, Feb 14, 2015 at 1:36 AM, Su She wrote:
> Thanks Akhil for the link. Is there a reason why there is a new directory
> created for each bat
's an example for doing
> that https://issues.apache.org/jira/browse/SPARK-944
>
> Thanks
> Best Regards
>
> On Sat, Feb 14, 2015 at 2:55 PM, Su She wrote:
>
>> Hello Akhil, thank you for your continued help!
>>
>> 1) So, if I can write it programmatically aft
rue(to delete the original dir),null)
>
>
>
> Thanks
> Best Regards
>
> On Sat, Feb 14, 2015 at 2:18 AM, Su She wrote:
>
>> Thanks Akhil for the suggestion, it is now only giving me one part -
>> . Is there any way I can just create a file rather than a dire
artition before the saveAs*
> call.
>
> messages.repartition(1).saveAsHadoopFiles("hdfs://user/ec2-user/","csv",String.class,
> String.class, (Class) TextOutputFormat.class);
>
>
> Thanks
> Best Regards
>
> On Fri, Feb 13, 2015 at 11:59 AM, Su She
Hello Everyone,
I am writing simple word counts to hdfs using
messages.saveAsHadoopFiles("hdfs://user/ec2-user/","csv",String.class,
String.class, (Class) TextOutputFormat.class);
1) However, every 2 seconds I'm getting a new *directory* that is titled as a
csv. So I'll have test.csv, which will be
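For context, this is the documented behaviour of saveAsHadoopFiles: each
batch interval writes its own directory named prefix-<batch time in ms>.suffix,
which is why a fresh "csv" directory appears every two seconds. A hedged
Scala sketch of an equivalent call, assuming wordCounts is a
DStream[(String, Int)]:

  import org.apache.hadoop.io.{IntWritable, Text}
  import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
  import org.apache.spark.streaming.StreamingContext._   // pair-DStream implicits on older Spark

  // Each batch writes a separate directory "<prefix>-<batchTimeMs>.csv".
  wordCounts
    .map { case (word, count) => (new Text(word), new IntWritable(count)) }
    .saveAsNewAPIHadoopFiles(
      "hdfs:///user/ec2-user/wordcounts", "csv",
      classOf[Text], classOf[IntWritable],
      classOf[TextOutputFormat[Text, IntWritable]])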
dating graphs
>> periodically. I haven’t used it myself yet so not sure how well it works.
>> See here: https://github.com/andypetrella/spark-notebook
>>
>> From: Su She
>> Date: Thursday, February 12, 2015 at 1:55 AM
>> To: Felix C
>> Cc: Kelvin Chu, &qu
-receivers
>
> --- Original Message ---
>
> From: "Su She"
> Sent: February 11, 2015 10:23 AM
> To: "Felix C"
> Cc: "Kelvin Chu" <2dot7kel...@gmail.com>, user@spark.apache.org
> Subject: Re: Can spark job server be used to visualize strea
wrote:
> Checkout
>
> https://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html
>
> In there are links to how that is done.
>
>
> --- Original Message ---
>
> From: "Kelvin Chu" <2dot7kel...@gmail.com>
> Sent: February 10, 201
Hello Everyone,
I was reading this blog post:
http://homes.esat.kuleuven.be/~bioiuser/blog/a-d3-visualisation-from-spark-as-a-service/
and was wondering if this approach can be taken to visualize streaming
data...not just historical data?
Thank you!
-Suh
Hello Everyone,
I wanted to hear the community's thoughts on what (open-source) tools
have been used to visualize data from Spark/Spark Streaming. I've taken a
look at Zeppelin, but had some trouble working with it.
A couple of questions:
1) I've looked at a couple blog posts and it seems like spar
Hello Everyone,
A bit confused on this one...I have set up the KafkaWordCount found here:
https://github.com/apache/spark/blob/master/examples/scala-2.10/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java
Everything runs fine when I run it using this on instance A: reposito
( mostly, it binds to localhost in that case)
> On 27 Jan 2015 07:25, "Su She" wrote:
>
>> Hello Sean and Akhil,
>>
>> I shut down the services on Cloudera Manager. I shut them down in the
>> appropriate order and then stopped all services of CM. I then shu
if you do the steps correctly across your whole cluster.
> I'm not sure if the stock stop-all.sh script is supposed to work.
> Certainly, if you are using CM, by far the easiest is to start/stop
> all of these things in CM.
>
> On Wed, Jan 21, 2015 at 6:08 PM, Su She wrote:
&g
e command for shutting down storage or can I simply stop
hdfs in Cloudera Manager?
Thank you for the help!
On Sat, Jan 17, 2015 at 12:58 PM, Su She wrote:
> Thanks Akhil and Sean for the responses.
>
> I will try shutting down spark, then storage and then the instances.
> Initia
> > stop-all.sh would do) and then shutdown the machines.
> >
> > You can execute the following command to disable safe mode:
> >
> >> hdfs dfsadmin -safemode leave
> >
> >
> >
> > Thanks
> > Best Regards
> >
> > On Sat, Jan 17, 201
Hello Everyone,
I am encountering trouble running Spark applications when I shut down my
EC2 instances. Everything else seems to work except Spark. When I try
running a simple Spark application, like sc.parallelize(), I get the message
that the HDFS name node is in safe mode.
Has anyone else had this i
on the data to 1 before saving.
> Another way would be to use Hadoop's copyMerge command/API (available from
> the 2.0 versions)
> On 13 Jan 2015 01:08, "Su She" wrote:
>
>> Hello Everyone,
>>
>> Quick followup, is there any way I can append output to one fi
Hello Everyone,
Quick followup, is there any way I can append output to one file rather
than create a new directory/file every X milliseconds?
Thanks!
Suhas Shekar
University of California, Los Angeles
B.A. Economics, Specialization in Computing 2014
On Thu, Jan 8, 2015 at 11:41 PM, Su She
; yourStream.saveAsNewAPIHadoopFiles(hdfsUrl, "/output-location",Text.class,
> Text.class, outputFormatClass);
>
>
>
> Thanks
> Best Regards
>
> On Fri, Jan 9, 2015 at 10:22 AM, Su She wrote:
>
>> Yes, I am calling the saveAsHadoopFiles on the Dstream. How
ext files on the DStream --looks like it? Look
> at the section called "Design Patterns for using foreachRDD" in the link
> you sent -- you want to do dstream.foreachRDD(rdd => rdd.saveAs)
>
> On Thu, Jan 8, 2015 at 5:20 PM, Su She wrote:
>
>> Hello Everyone,
Hello Everyone,
Thanks in advance for the help!
I successfully got my Kafka/Spark WordCount app to print locally. However,
I want to run it on a cluster, which means that I will have to save it to
HDFS if I want to be able to read the output.
I am running Spark 1.1.0, which means according to th
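For reference, a minimal sketch of the "Design Patterns for using foreachRDD"
approach mentioned earlier in the thread, writing each micro-batch under its
own time-stamped HDFS path; the stream name messages and the paths are
illustrative:

  // Write each batch to its own HDFS directory, keyed by the batch time.
  messages.foreachRDD { (rdd, time) =>
    rdd.saveAsTextFile(s"hdfs:///user/ec2-user/wordcounts/batch-${time.milliseconds}")
  }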