Re: Spark on Tachyon

2014-12-20 Thread Peng Cheng
IMHO: cache doesn't provide redundancy, and it's in the same JVM, so it's much faster. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-Tachyon-tp1463p20800.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

spark-repl_1.2.0 was not uploaded to central maven repository.

2014-12-20 Thread Peng Cheng
Everything else is there except spark-repl. Can someone check that out this weekend? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-repl-1-2-0-was-not-uploaded-to-central-maven-repository-tp20799.html Sent from the Apache Spark User List mailing list

How to deploy my java code which invokes Spark in Tomcat?

2014-12-20 Thread Tao Lu
Hi, Guys, I have some code which runs well using the spark-submit command: $SPARK_HOME/bin/spark-submit --class com.myorg.service.SparkService ./Search.jar How can I deploy it to Tomcat? If I simply deploy the jar file, I get a ClassNotFound error. Thanks! -

Re: SchemaRDD to Hbase

2014-12-20 Thread Alex Kamil
I'm using JdbcRDD + HBase JDBC driver + SchemaRDD. Make sure to use Spark 1.2. On Sat, Dec 20

Re: SchemaRDD to Hbase

2014-12-20 Thread Subacini B
Hi, can someone help me? Any pointers would help. Thanks, Subacini On Fri, Dec 19, 2014 at 10:47 PM, Subacini B wrote: > Hi All, > > Is there any API that can be used directly to write a SchemaRDD to HBase? > If not, what is the best way to write a SchemaRDD to HBase? > > Thanks > Subacini >

Re: Downloads from S3 exceedingly slow when running on spark-ec2

2014-12-20 Thread Paul Brown
I would suggest checking out disk IO on the nodes in your cluster and then reading up on the limiting behaviors that accompany different kinds of EC2 storage. Depending on how things are configured for your nodes, you may have a local storage configuration that provides "bursty" IOPS where you get

Re: Downloads from S3 exceedingly slow when running on spark-ec2

2014-12-20 Thread Nicholas Chammas
Is the operation slow every time or does it run normally if you repeat the operation within the same app? Nick On Thu, Dec 18, 2014 at 8:56 AM, Jon Chase wrote: > I'm running a very simple Spark application that downloads files from S3, > does a bit of mapping, then uploads new files. Each fi

Re: EC2 VPC script

2014-12-20 Thread Nicholas Chammas
What version of the script are you running? What did you see in the EC2 web console when this happened? Sometimes instances just don't come up in a reasonable amount of time and you have to kill and restart the process. Does this always happen, or was it just once? Nick On Thu, Dec 18, 2014 at

Re: v1.2.0 (re?)introduces Wrong FS behavior in thriftserver

2014-12-20 Thread Matt Mead
That seemed to correct the issue. Thanks for pointing out the lack of diffs between v1.2.0-rc2 and v1.2.0 -- I'm not sure how my git repo ended up not matching its origin. -matt On Sat, Dec 20, 2014 at 4:25 PM, Matt Mead wrote: > Bizarre. I originally cloned from and have been pulling fro

Re: v1.2.0 (re?)introduces Wrong FS behavior in thriftserver

2014-12-20 Thread Matt Mead
Bizarre. I originally cloned from and have been pulling from https://github.com/apache/spark, and my repo shows the following: user@host:~/development/spark$ git diff v1.2.0-rc2..v1.2.0 | wc -l > 1898 If I pull a fresh clone, I get this: user@host:~$ git clone https://github.com/apache/spark >

Re: v1.2.0 (re?)introduces Wrong FS behavior in thriftserver

2014-12-20 Thread Mark Hamstra
This makes no sense. There is no difference between v1.2.0-rc2 and v1.2.0: https://github.com/apache/spark/compare/v1.2.0-rc2...v1.2.0 On Sat, Dec 20, 2014 at 12:44 PM, Matt Mead wrote: > First, thanks for the efforts and contribution to such a useful software > stack! Spark is great! > > I ha

v1.2.0 (re?)introduces Wrong FS behavior in thriftserver

2014-12-20 Thread Matt Mead
First, thanks for the efforts and contribution to such a useful software stack! Spark is great! I have been using the git tags for v1.2.0-rc1 and v1.2.0-rc2 built as follows: ./make-distribution.sh -Dhadoop.version=2.5.0-cdh5.2.0 -Dyarn.version=2.5.0-cdh5.2.0 -Phadoop-2.4 -Phive -Pyarn -Phive-

Using "SparkSubmit.main()" to submit SparkContext in web application

2014-12-20 Thread Corey Nolet
I am looking to run a SparkContext in a web application that is outside of my Spark cluster. I understand that I can use the "client" deployment mode and use the spark-submit script that ships with Spark, but I'm really interested in running this inside of a SpringWeb application that can be starte

Are failures normal / to be expected on an AWS cluster?

2014-12-20 Thread Joe Wass
I have a Spark job running on about 300 GB of log files, on Amazon EC2, with 10 x Large instances (each with 600 GB disk). The job hasn't yet completed. So far, 18 stages have completed (2 of which have retries) and 3 stages have failed. In each failed stage there are ~160 successful tasks, but "C

Re: Including data nucleus tools

2014-12-20 Thread Jakub Dubovsky
Hi DB, I cherry-picked the commit into branch-1.2 and it solved the problem. It solves the problem, but some bits and pieces around it were not finalized, so it was reverted, being late in the release process. Jakub -- "Just out of my curiosity. Do you manually apply this patch and see if t

Interpreting MLLib's linear regression o/p

2014-12-20 Thread Sameer Tilak
Hi All, I use the LIBSVM format to specify my input feature vector, which uses a 1-based index. When I run regression, the o/p is 0-index based. I have a master lookup file that maps these indices back to what they stand for. However, I need to add offset of 2 and not 1 to the regression outcome during
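
The index shift described in this message can be illustrated with a small sketch. LIBSVM files number features from 1, and MLlib's LIBSVM loader converts them to 0-based positions, so the weight at position k would normally correspond to LIBSVM feature k + 1. The sample record below is hypothetical, and the sketch does not explain the offset of 2 that the poster reports.

```shell
# Hypothetical LIBSVM record: a label followed by 1-based index:value pairs.
line="1.0 1:0.5 3:2.0 7:1.5"

# Subtract 1 from each feature index to obtain the 0-based position
# that the regression output would use for the same feature.
zero_based=$(printf '%s\n' "$line" | awk '{
  out = $1                                  # keep the label unchanged
  for (i = 2; i <= NF; i++) {
    split($i, kv, ":")                      # kv[1] = index, kv[2] = value
    out = out " " (kv[1] - 1) ":" kv[2]
  }
  print out
}')

echo "$zero_based"
```

Mapping a 0-based weight position back to the lookup file would then be the reverse step, adding 1 rather than subtracting it.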

Re: Hadoop 2.6 compatibility?

2014-12-20 Thread Sean Owen
To clarify clarify, Ted's got the right formula. You would use -Phadoop-2.4 to set up the build configuration, and then customize -Dhadoop.version= further to the 2.4+ version you want. On Sat, Dec 20, 2014 at 12:35 AM, Denny Lee wrote: > To clarify, there isn't a Hadoop 2.6 profile per se but yo
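
The formula in this reply can be sketched as a command line: pick the hadoop-2.4 profile, then override hadoop.version with the 2.4+ release you want. The version 2.6.0 below is an assumed example value rather than something stated in the thread, and the script only prints the Maven command instead of running it.

```shell
# Profile named in the thread; it sets up the build configuration for Hadoop 2.4+.
HADOOP_PROFILE="hadoop-2.4"

# Assumed example value: replace with the 2.4+ Hadoop release you actually target.
HADOOP_VERSION="2.6.0"

# Assemble the Maven invocation; -DskipTests and "clean package" are the usual
# build goals and are included here as an assumption about the desired build.
BUILD_CMD="mvn -P${HADOOP_PROFILE} -Dhadoop.version=${HADOOP_VERSION} -DskipTests clean package"

# Print the command for review rather than executing it.
echo "$BUILD_CMD"
```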