Building Spark 1.2 jmx and jmxtools issue?

2014-12-29 Thread John Omernik
I am trying to do the sbt assembly for Spark 1.2 (sbt/sbt -Pmapr4 -Phive -Phive-thriftserver assembly) and I am getting the errors below. Any thoughts? Thanks in advance! [warn] :: FAILED DOWNLOADS :: [warn] :: ^ see

re: How to incrementally compile spark examples using mvn

2014-11-24 Thread Yiming (John) Zhang
, On Wed, Nov 19, 2014 at 5:35 PM, Yiming (John) Zhang sdi...@gmail.com wrote: Thank you for your reply. I was wondering whether there is a method of reusing locally-built components without installing them? That is, if I have successfully built the spark project as a whole, how should I configure

re: How to incrementally compile spark examples using mvn

2014-11-19 Thread Yiming (John) Zhang
are compiling all modules at once. If you want to compile everything and reuse the local artifacts later, you need 'install' not 'package'. On Mon, Nov 17, 2014 at 12:27 AM, Yiming (John) Zhang sdi...@gmail.com wrote: Thank you Marcelo. I tried your suggestion (# mvn -pl :spark-examples_2.10 compile

re: How to incrementally compile spark examples using mvn

2014-11-16 Thread Yiming (John) Zhang
at 5:31 PM, Yiming (John) Zhang sdi...@gmail.com wrote: Hi, I have already successfully compiled and run the Spark examples. My problem is that if I make some modifications (e.g., to SparkPi.scala or LogQuery.scala) I have to use “mvn -DskipTests package” to rebuild the whole Spark project

How to incrementally compile spark examples using mvn

2014-11-15 Thread Yiming (John) Zhang
Hi, I have already successfully compiled and run the Spark examples. My problem is that if I make some modifications (e.g., to SparkPi.scala or LogQuery.scala) I have to use mvn -DskipTests package to rebuild the whole Spark project and wait a relatively long time. I also tried mvn scala:cc as
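The fix suggested later in this thread — install once, then rebuild only the examples module — can be sketched as build commands (the module name and flags are assumptions based on a Spark 1.x / Scala 2.10 source checkout):

```shell
# One-time: build every module and install the artifacts into the local
# ~/.m2 repository ('install', not 'package', so later builds can reuse them).
mvn -DskipTests install

# After editing SparkPi.scala etc.: rebuild only the examples module,
# resolving the other Spark modules from the local repository.
mvn -pl :spark-examples_2.10 -DskipTests package
```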

Re: Spark and Play

2014-11-11 Thread John Meehan
You can also build a Play 2.2.x + Spark 1.1.0 fat jar with sbt-assembly for, e.g., yarn-client support or using with spark-shell for debugging: play.Project.playScalaSettings libraryDependencies ~= { _ map { case m if m.organization == "com.typesafe.play" => m.exclude("commons-logging",

How to read BZ2 XML file in Spark?

2014-10-21 Thread John Roberts
http://wiki.openstreetmap.org/wiki/Planet.osm http://wiki.openstreetmap.org/wiki/OSM_XML John -- John S. Roberts, SigInt Technologies LLC, a Novetta Solutions Company

Re: SparkSQL Thriftserver in Mesos

2014-09-22 Thread John Omernik
Any thoughts on this? On Sat, Sep 20, 2014 at 12:16 PM, John Omernik j...@omernik.com wrote: I am running the Thrift server in SparkSQL, and running it on the node I compiled spark on. When I run it, tasks only work if they landed on that node, other executors started on nodes I didn't

SparkSQL Thriftserver in Mesos

2014-09-20 Thread John Omernik
I am running the Thrift server in SparkSQL, and running it on the node I compiled spark on. When I run it, tasks only work if they landed on that node, other executors started on nodes I didn't compile spark on (and thus don't have the compile directory) fail. Should spark be distributed

apply at Option.scala:120 callback in Spark 1.1, but no user code involved?

2014-09-15 Thread John Salvatier
In Spark 1.1, I'm seeing tasks with callbacks that don't involve my code at all! I'd seen something like this before in 1.0.0, but the behavior seems to be back: apply at Option.scala:120 http://localhost:4040/stages/stage?id=52&attempt=0

Potential Thrift Server Bug on Spark SQL,perhaps with cache table?

2014-08-20 Thread John Omernik
, if I don't cache the table through "cache table table1" in Thrift, I get results for all queries. If I uncache, I start getting results again. I hope I was clear enough here; I am happy to help however I can. John

setCallSite for API backtraces not showing up in logs?

2014-08-18 Thread John Salvatier
What's the correct way to use setCallSite to get the change to show up in the Spark logs? I have something like class RichRDD(rdd: RDD[MyThing]) { def mySpecialOperation() { rdd.context.setCallSite("bubbles and candy!") rdd.map() val result = rdd.groupBy()

Re: Spark SQL JDBC

2014-08-12 Thread John Omernik
not just build the thrift server in? (I am not a programming expert, and not trying to judge the decision to have it in a separate profile; I would just like to understand why it's done that way.) On Mon, Aug 11, 2014 at 11:47 AM, Cheng Lian lian.cs@gmail.com wrote: Hi John, the JDBC Thrift

Spark SQL Thrift Server

2014-08-05 Thread John Omernik
I have things working on my cluster with the SparkSQL Thrift server. (Thank you, Yin Huai at Databricks!) That said, I was curious how I can cache a table via my instance here. I tried the Shark-like "create table table_cached as select * from table" and that did not create a cached table.

Re: Spark SQL Thrift Server

2014-08-05 Thread John Omernik
. On Tue, Aug 5, 2014 at 9:02 AM, John Omernik j...@omernik.com wrote: I have things working on my cluster with the SparkSQL Thrift server. (Thank you, Yin Huai at Databricks!) That said, I was curious how I can cache a table via my instance here. I tried the Shark-like create table table_cached

Spark SQL JDBC

2014-08-04 Thread John Omernik
I am using spark-1.1.0-SNAPSHOT right now and trying to get familiar with the JDBC thrift server. I have everything compiled correctly, I can access data in spark-shell on yarn from my hive installation. Cached tables, etc all work. When I execute ./sbin/start-thriftserver.sh I get the error

Re: Is there a way to write spark RDD to Avro files

2014-07-30 Thread Lewis John Mcgibbney
Hi, Have you checked out SchemaRDD? There should be an example of writing to Parquet files there. BTW, FYI, I was discussing this with the SparkSQL developers last week and possibly using Apache Gora [0] for achieving this. HTH, Lewis [0] http://gora.apache.org On Wed, Jul 30, 2014 at 5:14 AM,

SPARK OWLQN Exception: Iteration Stage is so slow

2014-07-29 Thread John Wu
. And there are so many info log lines on stdout like this: BlockManagerMasterActor$BlockManagerInfo: Removed taskresult_xxx on SHXJ-Hx-HBxxx:44126 in memory Thank you. John Wu, Zamplus Advertising (Shanghai) Co., Ltd. (晶赞广告(上海)有限公司)

Re: Configuring Spark Memory

2014-07-24 Thread John Omernik
So this is good information for standalone, but how is memory distributed within Mesos? There's coarse-grained mode, where the executor stays active, and there's fine-grained mode, where it appears each task is its own process in Mesos. How do memory allocations work in these cases? Thanks! On Thu,

Spark on Yarn: Connecting to Existing Instance

2014-07-09 Thread John Omernik
I am trying to get my head around using Spark on Yarn from a perspective of a cluster. I can start a Spark Shell no issues in Yarn. Works easily. This is done in yarn-client mode and it all works well. In multiple examples, I see instances where people have setup Spark Clusters in Stand Alone

Re: Spark on Yarn: Connecting to Existing Instance

2014-07-09 Thread John Omernik
9, 2014, at 8:31 AM, John Omernik j...@omernik.com wrote: I am trying to get my head around using Spark on Yarn from a perspective of a cluster. I can start a Spark Shell no issues in Yarn. Works easily. This is done in yarn-client mode and it all works well. In multiple examples, I see

Re: Spark on Yarn: Connecting to Existing Instance

2014-07-09 Thread John Omernik
, Jul 9, 2014 at 12:41 PM, John Omernik j...@omernik.com wrote: Thank you for the link. In that link the following is written: For those familiar with the Spark API, an application corresponds to an instance of the SparkContext class. An application can be used for a single batch job

Re: Spark on Yarn: Connecting to Existing Instance

2014-07-09 Thread John Omernik
want to write a Spark application that fires off jobs on behalf of remote processes, you would need to implement the communication between those remote processes and your Spark application code yourself. On Wed, Jul 9, 2014 at 10:41 AM, John Omernik j...@omernik.com wrote: Thank you

Re: Re: spark table to hive table

2014-07-01 Thread John Omernik
Michael - Does Spark SQL support rlike and like yet? I am running into that same error with a basic select * from table where field like '%foo%' using the hql() function. Thanks. On Wed, May 28, 2014 at 2:22 PM, Michael Armbrust mich...@databricks.com wrote: On Tue, May 27, 2014 at 6:08 PM,

Streaming aggregation

2014-06-24 Thread john levingston
I have a use case where I cannot figure out the Spark Streaming way to do it. Given two Kafka topics corresponding to two different types of events, A and B, for each element from topic A there is a corresponding element from topic B. Unfortunately, elements can arrive separated by hours. The aggregation
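This is not Spark Streaming code, but the core of the problem — buffering each side per key until its partner arrives, possibly hours later — can be sketched in plain Python (in Spark Streaming, a stateful operation such as updateStateByKey would hold the pending buffers; the event layout below is invented for illustration):

```python
from collections import defaultdict

def join_streams(events):
    """Pair each topic-A event with its matching topic-B event by key,
    buffering whichever side arrives first until its partner shows up."""
    pending_a = defaultdict(list)  # key -> A payloads awaiting a B
    pending_b = defaultdict(list)  # key -> B payloads awaiting an A
    joined = []
    for topic, key, payload in events:
        mine, other = (pending_a, pending_b) if topic == "A" else (pending_b, pending_a)
        if other[key]:                      # partner already buffered: emit the pair
            partner = other[key].pop(0)
            pair = (payload, partner) if topic == "A" else (partner, payload)
            joined.append((key, pair))
        else:
            mine[key].append(payload)       # no partner yet: buffer this side
    return joined

pairs = join_streams([("A", 1, "a1"), ("B", 2, "b2"), ("B", 1, "b1"), ("A", 2, "a2")])
print(pairs)  # [(1, ('a1', 'b1')), (2, ('a2', 'b2'))]
```

In a real deployment the pending buffers would also need an expiry policy, since an unmatched event would otherwise be held forever.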

Re: Spark is now available via Homebrew

2014-06-18 Thread Sheryl John
Cool. Looked at the Pull Requests, the upgrade to 1.0.0 was just merged yesterday. https://github.com/Homebrew/homebrew/pull/30231 https://github.com/Homebrew/homebrew/blob/master/Library/Formula/apache-spark.rb On Wed, Jun 18, 2014 at 1:57 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

Re: Why Scala?

2014-06-04 Thread John Omernik
So Python is used in many of the Spark ecosystem products, but not Streaming at this point. Is there a roadmap to include Python APIs in Spark Streaming? Any time frame on this? Thanks! John On Thu, May 29, 2014 at 4:19 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Quite a few people ask

Re: Why Scala?

2014-06-04 Thread John Omernik
Python integration/support, if possible, would be a home run. On Wed, Jun 4, 2014 at 7:06 PM, Matei Zaharia matei.zaha...@gmail.com wrote: We are definitely investigating a Python API for Streaming, but no announced deadline at this point. Matei On Jun 4, 2014, at 5:02 PM, John Omernik j

Better line number hints for logging?

2014-06-03 Thread John Salvatier
I have created some extension methods for RDDs in RichRecordRDD and these are working exceptionally well for me. However, when looking at the logs, it's impossible to tell what's going on because all the line number hints point to RichRecordRDD.scala rather than the code that uses it. For example:

Re: Announcing Spark 1.0.0

2014-05-30 Thread John Omernik
All: In the pom.xml file I see the MapR repository, but it's not included in the ./project/SparkBuild.scala file. Is this expected? I know to build I have to add it there otherwise sbt hates me with evil red messages and such. John On Fri, May 30, 2014 at 6:24 AM, Kousuke Saruta saru

Re: Announcing Spark 1.0.0

2014-05-30 Thread John Omernik
wondered if there were other options I should consider before building. Thanks! On Fri, May 30, 2014 at 6:52 AM, John Omernik j...@omernik.com wrote: All: In the pom.xml file I see the MapR repository, but it's not included in the ./project/SparkBuild.scala file. Is this expected? I know

Re: spark table to hive table

2014-05-27 Thread John Omernik
Did you try the Hive Context? Look under Hive Support here: http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html On Tue, May 27, 2014 at 2:09 AM, 정재부 itsjb.j...@samsung.com wrote: Hi all, I'm trying to compare functions available in Spark1.0 hql to original

proximity of events within the next group of events instead of time

2014-05-27 Thread Navarro, John
Hi, Spark newbie here with a general question. In a stream consisting of several types of events, how can I detect if event X happened within Z transactions of event Y? Is it just a matter of iterating through all the RDDs: when event type Y is found, take the next Z transactions and check if
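Outside of Spark, the check itself is a simple sliding-window scan over the ordered stream; a minimal sketch in plain Python (event types as plain strings, an invented representation):

```python
def x_within_z_of_y(events, y, x, z):
    """Return True if some event of type x appears among the next z events
    after any occurrence of an event of type y."""
    for i, e in enumerate(events):
        if e == y and x in events[i + 1:i + 1 + z]:
            return True
    return False

stream = ["A", "Y", "B", "X", "C"]
x_within_z_of_y(stream, "Y", "X", 2)  # True: X arrives 2 events after Y
x_within_z_of_y(stream, "Y", "X", 1)  # False: only B is within 1 event of Y
```

The subtlety in the distributed case is exactly the one the question raises: the window of Z events can straddle a partition or batch boundary, so some state (the tail of the previous batch) has to be carried across.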

Re: Running out of memory Naive Bayes

2014-04-26 Thread John King
I'm just wondering: are the SparkVector calculations really taking the sparsity into account, or just converting to dense? On Fri, Apr 25, 2014 at 10:06 PM, John King usedforprinting...@gmail.com wrote: I've been trying to use the Naive Bayes classifier. Each example in the dataset is about 2

Running out of memory Naive Bayes

2014-04-25 Thread John King
I've been trying to use the Naive Bayes classifier. Each example in the dataset is about 2 million features, only about 20-50 of which are non-zero, so the vectors are very sparse. I keep running out of memory though, even for about 1000 examples on 30gb RAM while the entire dataset is 4 million
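For vectors with 2 million features but only 20-50 non-zeros, a truly sparse representation stores just the (index, value) pairs, roughly five orders of magnitude less memory than a dense array. A minimal sketch of that idea in plain Python (not MLlib's actual implementation):

```python
def sparse_dot(indices_a, values_a, indices_b, values_b):
    """Dot product of two sparse vectors, touching only positions that are
    non-zero in both: cost is proportional to the non-zeros, not the dimension."""
    b = dict(zip(indices_b, values_b))
    return sum(v * b[i] for i, v in zip(indices_a, values_a) if i in b)

# Two vectors with 3 non-zeros each, nominally of dimension 2,000,000.
dot = sparse_dot([3, 1999998, 42], [1.0, 2.0, 3.0],
                 [42, 7, 1999998], [5.0, 1.0, 4.0])
# Only indices 1999998 and 42 overlap: 2.0*4.0 + 3.0*5.0 = 23.0
```

If a library internally densifies such a vector instead, each example balloons to millions of floats, which is consistent with the out-of-memory behavior described above.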

Spark mllib throwing error

2014-04-24 Thread John King
./spark-shell: line 153: 17654 Killed $FWDIR/bin/spark-class org.apache.spark.repl.Main $@ Any ideas?

Re: Spark mllib throwing error

2014-04-24 Thread John King
Last command was: val model = new NaiveBayes().run(points) On Thu, Apr 24, 2014 at 4:27 PM, Xiangrui Meng men...@gmail.com wrote: Could you share the command you used and more of the error message? Also, is it an MLlib specific problem? -Xiangrui On Thu, Apr 24, 2014 at 11:49 AM, John King

Re: Trying to use pyspark mllib NaiveBayes

2014-04-24 Thread John King
, Apr 24, 2014 at 11:38 AM, John King usedforprinting...@gmail.com wrote: I receive this error: Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/ubuntu/spark-1.0.0-rc2/python/pyspark/mllib/classification.py", line 178, in train ans = sc

Re: Deploying a python code on a spark EC2 cluster

2014-04-24 Thread John King
the hostnames, this should not happen. Matei On Apr 24, 2014, at 11:36 AM, John King usedforprinting...@gmail.com wrote: Same problem. On Thu, Apr 24, 2014 at 10:54 AM, Shubhabrata mail2shu...@gmail.com wrote: Moreover, it seems all the workers are registered and have sufficient memory

Re: Trying to use pyspark mllib NaiveBayes

2014-04-24 Thread John King
PM, Xiangrui Meng men...@gmail.com wrote: I tried locally with the example described in the latest guide: http://54.82.157.211:4000/mllib-naive-bayes.html , and it worked fine. Do you mind sharing the code you used? -Xiangrui On Thu, Apr 24, 2014 at 1:57 PM, John King usedforprinting

Re: Spark mllib throwing error

2014-04-24 Thread John King
) points.cache() val model = new NaiveBayes().run(points) On Thu, Apr 24, 2014 at 6:57 PM, Xiangrui Meng men...@gmail.com wrote: Do you mind sharing more code and error messages? The information you provided is too little to identify the problem. -Xiangrui On Thu, Apr 24, 2014 at 1:55 PM, John King

Re: Trying to use pyspark mllib NaiveBayes

2014-04-24 Thread John King
Also, when will the official 1.0 be released? On Thu, Apr 24, 2014 at 7:04 PM, John King usedforprinting...@gmail.com wrote: I was able to run simple examples as well. Which version of Spark? Did you use the most recent commit or from branch-1.0? Some background: I tried to build both

Re: Spark mllib throwing error

2014-04-24 Thread John King
examples you have? Also, make sure you don't have negative feature values. The error message you sent did not say NaiveBayes went wrong, but the Spark shell was killed. -Xiangrui On Thu, Apr 24, 2014 at 4:05 PM, John King usedforprinting...@gmail.com wrote: In the other thread I had an issue

Re: Spark is slow

2014-04-21 Thread John Meagher
Yahoo made some changes that drive mailing list posts into spam folders: http://www.virusbtn.com/blog/2014/04_15.xml On Mon, Apr 21, 2014 at 2:50 PM, Marcelo Vanzin van...@cloudera.com wrote: Hi Joe, On Mon, Apr 21, 2014 at 11:23 AM, Joe L selme...@yahoo.com wrote: And, I haven't gotten any

How are exceptions in map functions handled in Spark?

2014-04-04 Thread John Salvatier
, right now we're seeing the task just re-tried over and over again in an infinite loop because there's a value that always generates an exception. John
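One common workaround for a poison record — so that a single bad value doesn't fail (and endlessly re-trigger) the whole task — is to catch exceptions per record inside the mapped function and carry the failures along as data. A plain-Python sketch of the pattern (the function and sample values are invented; in Spark the same wrapper would go inside rdd.map):

```python
def safe_map(f, records):
    """Apply f to each record, collecting per-record failures instead of
    letting one bad value abort the whole pass."""
    oks, errors = [], []
    for r in records:
        try:
            oks.append(f(r))
        except Exception as e:
            errors.append((r, repr(e)))  # keep the offending input and the error
    return oks, errors

oks, errors = safe_map(lambda x: 10 // x, [5, 2, 0, 1])
# oks == [2, 5, 10]; one error captured for the bad record 0
```

The failed records can then be counted, logged, or written to a side output for inspection, rather than crashing the job.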

Re: How are exceptions in map functions handled in Spark?

2014-04-04 Thread John Salvatier
On Apr 4, 2014, at 10:40 AM, John Salvatier jsalvat...@gmail.com wrote: I'm trying to get a clear idea about how exceptions are handled in Spark. Is there somewhere where I can read about this? I'm on Spark 0.7. For some reason I was under the impression that such exceptions are swallowed

Re: How are exceptions in map functions handled in Spark?

2014-04-04 Thread John Salvatier
Btw, thank you for your help. On Fri, Apr 4, 2014 at 11:49 AM, John Salvatier jsalvat...@gmail.com wrote: Is there a way to log exceptions inside a mapping function? logError and logInfo seem to freeze things. On Fri, Apr 4, 2014 at 11:02 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
