Hi,
I'm using the IntelliJ IDE for my Spark project.
I've compiled Spark 1.3.0 for Scala 2.11.4, and here is one of the
compiled jars installed in my .m2 folder:
~/.m2/repository/org/apache/spark/spark-core_2.11/1.3.0/spark-core_2.11-1.3.0.jar
But when I add this dependency in my pom file for the
On 19 Jun 2015, at 16:48, Sea 261810...@qq.com wrote:
Hi, all:
I run Spark on YARN and want to see the Jobs UI at http://ip:4040/,
but it redirects to http://${yarn.ip}/proxy/application_1428110196022_924324/,
which cannot be found. Why?
Can anyone help?
whenever you point
How can I know that, in stream processing over a cluster of 8 machines,
all the machines/worker nodes are being used? (My cluster has 8 slaves.)
--
Thanks Regards,
Anshu Shukla
On 17 Jun 2015, at 19:10, jcai jonathon@yale.edu wrote:
Hi,
I am running this in Spark standalone mode. I find that when I examine the
web UI, a couple of bugs arise:
1. There is a discrepancy between the number denoting the duration of the
application when I run the history server
Not sure, but try removing the provided scope, or create a lib directory in the
project home and bring that jar over there.
On 20 Jun 2015 18:08, Ritesh Kumar Singh riteshoneinamill...@gmail.com
wrote:
Hi,
I'm using the IntelliJ IDE for my Spark project.
I've compiled Spark 1.3.0 for Scala 2.11.4 and
Hi,
In the Spark SQL JDBC data source there is an option to specify the upper/lower
bound and the number of partitions. How does Spark handle data distribution if
we do not give the upper/lower bound or the number of partitions? Will all data
from the external data source be skewed into one executor?
In many situations, we do not
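For context on the question above: when those options are given, Spark's JDBC source cuts the partition column's [lower, upper) range into numPartitions strides and issues one query per range; when they are omitted, it issues a single query, so everything lands in one partition. A rough Python sketch of the range-splitting (the column name and bounds here are made up for illustration; this approximates, not reproduces, Spark's internal logic):

```python
def jdbc_partition_predicates(column, lower, upper, num_partitions):
    """Roughly mimic how a JDBC source builds per-partition WHERE clauses:
    cut [lower, upper) into num_partitions equal strides. The first range
    also catches NULLs and anything below lower; the last catches anything
    at or above its start."""
    stride = (upper - lower) // num_partitions
    preds = []
    for i in range(num_partitions):
        lo = lower + i * stride
        hi = lower + (i + 1) * stride
        if i == 0:
            preds.append(f"{column} < {hi} OR {column} IS NULL")
        elif i == num_partitions - 1:
            preds.append(f"{column} >= {lo}")
        else:
            preds.append(f"{column} >= {lo} AND {column} < {hi}")
    return preds

# e.g. splitting ids 0..1000 into 4 partitions
print(jdbc_partition_predicates("id", 0, 1000, 4))
```

With no bounds given there is no stride to compute, which is why the whole table ends up in one partition.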
Is velox NOT open source?
On Saturday, June 20, 2015, Debasish Das debasish.da...@gmail.com wrote:
Hi,
The demo of end-to-end ML pipeline including the model server component at
Spark Summit was really cool.
I was wondering whether the Model Server component is based upon Velox or it
uses a
It is a simple Play-based web application. It exposes a URI for submitting a
SQL query. It then executes that query using CassandraSQLContext provided by
Spark Cassandra Connector. Since it is web-based, I added an authentication and
authorization layer to make sure that only users with the
Can anyone help? I am getting the below error when I try to start the History
Server.
I do not see any org.apache.spark.deploy.yarn.history package inside the
assembly jar; not sure how to get that.
java.lang.ClassNotFoundException:
org.apache.spark.deploy.yarn.history.YarnHistoryProvider
Thanks,
Are you sure you were using all 100 executors even with the receiver model?
Because in receiver mode, the number of partitions is dependent on the batch
duration and block interval. It may not necessarily map directly to the number
of executors in your app unless you've adjusted the block
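To illustrate the point above with made-up numbers: in receiver mode, each block interval produces one block, and each block becomes one partition of the batch's RDD, so partitions per receiver per batch is roughly batch duration divided by block interval:

```python
# Hypothetical numbers: 2 s batches with the 200 ms default block interval
# (spark.streaming.blockInterval). One block per interval, one partition
# per block, per receiver.
batch_duration_ms = 2000
block_interval_ms = 200

partitions_per_batch = batch_duration_ms // block_interval_ms
print(partitions_per_batch)  # 10 partitions: far fewer than 100 executors
```

With a single receiver that is only 10 partitions per batch, so most of the 100 executors would sit idle unless the block interval is lowered or more receivers are added.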
Any suggestions please!
How can I know that, in stream processing over a cluster of 8 machines,
all the machines/worker nodes are being used? (My cluster has 8 slaves.)
I am submitting the job from the master itself over the EC2 cluster created by
the EC2 scripts available with Spark. But I am
Yes, finally solved. It was there in front of my eyes all the time.
Thanks a lot Pete.
Hi Debasish,
The Oryx project (https://github.com/cloudera/oryx), which is Apache 2
licensed, contains a model server that can serve models built with MLlib.
-Sandy
On Sat, Jun 20, 2015 at 8:00 AM, Charles Earl charles.ce...@gmail.com
wrote:
Is velox NOT open source?
On Saturday, June 20,
Hello;
I am fitting ALS models and would like to get an initial idea of the number
of factors. I want to use the reconstruction error on the training data as a
measure. Does the API expose the reconstruction error?
Thanks
Ayman
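For what it's worth, MLlib's ALS does not expose a training reconstruction error directly; the usual approach is to predict the known ratings back and compute the RMSE yourself. A pure-Python sketch of that measure (the tiny ratings and rank-2 factor vectors below are made up; with Spark you would get the factors from the fitted model and do this per rating):

```python
import math

def reconstruction_rmse(ratings, user_factors, item_factors):
    """Reconstruction RMSE: compare each known (user, item, rating) triple
    to the dot product of the corresponding user and item factor vectors."""
    se = 0.0
    for user, item, r in ratings:
        pred = sum(u * v for u, v in zip(user_factors[user], item_factors[item]))
        se += (r - pred) ** 2
    return math.sqrt(se / len(ratings))

# Hypothetical tiny example: one user, two items, rank-2 factors
ratings = [(0, 0, 4.0), (0, 1, 2.0)]
user_factors = {0: [1.0, 1.0]}
item_factors = {0: [2.0, 2.0], 1: [1.0, 1.0]}
print(reconstruction_rmse(ratings, user_factors, item_factors))
```

Fitting models for several ranks and comparing this error on the training data gives the initial idea of the number of factors asked about above.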
Hi Mohammed,
Can you provide more info about the service you developed?
On Jun 20, 2015 7:59 AM, Mohammed Guller moham...@glassbeam.com wrote:
Hi Matthew,
It looks fine to me. I have built a similar service that allows a user to
submit a query from a browser and returns the result in JSON
Hi Roberto,
I'm not an EMR person, but it looks like option -h is deploying the necessary
datanucleus JARs for you. The requirements for HiveContext are hive-site.xml and
the datanucleus JARs. As long as these 2 are there, and Spark is compiled with
-Phive, it should work.
spark-shell runs in
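As a concrete sketch of those two requirements (the paths below are hypothetical and depend on the distribution; this is a config fragment, not a tested command):

```shell
# Ship hive-site.xml and the datanucleus JARs with a -Phive Spark build.
spark-shell \
  --files /etc/hive/conf/hive-site.xml \
  --jars /usr/lib/spark/lib/datanucleus-api-jdo.jar,/usr/lib/spark/lib/datanucleus-core.jar,/usr/lib/spark/lib/datanucleus-rdbms.jar
```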
It looks like you are using parens instead of curly braces on scala.version
On Jun 20, 2015, at 8:38 AM, Ritesh Kumar Singh
riteshoneinamill...@gmail.com wrote:
Hi,
I'm using the IntelliJ IDE for my Spark project.
I've compiled Spark 1.3.0 for Scala 2.11.4 and here is one of the
Hi,
The demo of end-to-end ML pipeline including the model server component at
Spark Summit was really cool.
I was wondering whether the Model Server component is based upon Velox or it
uses a completely different architecture.
https://github.com/amplab/velox-modelserver
We are looking for an open
We worked it out. There were multiple items (like the location of the remote
metastore and DB user auth) to address to make HiveContext happy in
yarn-cluster mode.
For reference
https://github.com/awslabs/emr-bootstrap-actions/blob/master/spark/examples/using-hivecontext-yarn-cluster.md
-Christopher Bozeman
On
Mind if I ask what 1.3/1.4 ML features that you are looking for?
On Saturday, June 20, 2015, Debasish Das debasish.da...@gmail.com wrote:
After getting used to Scala, writing Java is too much work :-)
I am looking for a Scala-based project that's using Netty at its core (spray
is one example).
Integration of model server with ML pipeline API.
On Sat, Jun 20, 2015 at 12:25 PM, Donald Szeto don...@prediction.io wrote:
Mind if I ask what 1.3/1.4 ML features that you are looking for?
On Saturday, June 20, 2015, Debasish Das debasish.da...@gmail.com wrote:
After getting used to
I got the same problem when I upgraded from 1.3.1 to 1.4.
The same conf has been used; 1.3 works, but the 1.4 UI does not work.
So I added the following properties:
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
I confirm,
Christopher was very kind helping me out here. The solution presented in the
linked doc worked perfectly. IMO it should be linked in the official Spark
documentation.
Thanks again,
Roberto
On 20 Jun 2015, at 19:25, Bozeman, Christopher bozem...@amazon.com wrote:
We worked it
Oops, that link was for Oryx 1. Here's the repo for Oryx 2:
https://github.com/OryxProject/oryx
On Sat, Jun 20, 2015 at 10:20 AM, Sandy Ryza sandy.r...@cloudera.com
wrote:
Hi Debasish,
The Oryx project (https://github.com/cloudera/oryx), which is Apache 2
licensed, contains a model server
After getting used to Scala, writing Java is too much work :-)
I am looking for a Scala-based project that's using Netty at its core (spray
is one example).
prediction.io is an option, but it also looks quite complicated and doesn't
use all the ML features that were added in 1.3/1.4.
Velox built on
Hey Andrew,
I tried the following approach: I modified my Spark build on my local machine.
I downloaded the Spark 1.4.0 source code and then made a change to
ResultTask.scala (a simple change to see if it works: I added a print
statement). Then I built Spark using
mvn
If you use rdd.mapPartitions(), you'll be able to get hold of the
iterator for each partition. Then you should be able to do
iterator.grouped(size) on each of the partitions. I think it may mean the
last group in each partition can have fewer than size
elements. If that's okay
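A minimal sketch of that idea, emulating partitions as plain iterators in Python so it runs without a Spark cluster (the data and group size are made up):

```python
from itertools import islice

def grouped(iterator, size):
    """Yield lists of up to `size` elements, like Scala's Iterator.grouped:
    the final group of a partition may be shorter."""
    it = iter(iterator)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Emulate rdd.mapPartitions(_.grouped(2)) over two partitions
partitions = [[1, 2, 3], [4, 5]]
result = [list(grouped(p, 2)) for p in partitions]
print(result)  # [[[1, 2], [3]], [[4, 5]]]
```

Note the first partition's last group has only one element, which is the per-partition remainder described above.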
No, it does not. By default, batch X+1 is started only after all the retries
etc. related to batch X are done.
Yes, one RDD per batch per DStream. However, the RDD could be a union of
multiple RDDs (e.g., RDDs generated by a windowed DStream, or a unioned
DStream).
TD
On Fri, Jun 19, 2015 at
Hi everyone,
I'm trying to use the logstash-logback-encoder
(https://github.com/logstash/logstash-logback-encoder) in my Spark jobs, but
I'm having some problems with the Spark classloader. The
logstash-logback-encoder uses a special version of the slf4j BasicMarker
Thank you very much for confirmation.
On 20 June 2015 at 17:21, Tathagata Das t...@databricks.com wrote:
No, it does not. By default, batch X+1 is started only after all the
retries etc. related to batch X are done.
Yes, one RDD per batch per DStream. However, the RDD could be a
How would you do a .grouped(10) on an RDD? Is it possible? Here is an
example for a Scala list:
scala> List(1,2,3,4).grouped(2).toList
res1: List[List[Int]] = List(List(1, 2), List(3, 4))
I would like to group n elements.
Hey,
I am testing StreamingLinearRegressionWithSGD following the tutorial.
It works, but I could not output the prediction results. I tried
saveAsTextFile, but it only outputs _SUCCESS to the folder.
I am trying to check the prediction results and use
BinaryClassificationMetrics to get
Hi,
I am loading data from a Hive table to HBase after doing some manipulation.
I am getting the error 'Task not serializable'.
My code is as below.
public class HiveToHbaseLoader implements Serializable {
public static void main(String[] args) throws Exception {
String