zinc invocation examples

2014-12-04 Thread Nicholas Chammas
https://github.com/apache/spark/blob/master/docs/building-spark.md#speeding-up-compilation-with-zinc Could someone summarize how they invoke zinc as part of a regular build-test-etc. cycle? I'll add it to the aforelinked page if appropriate. Nick

Re: [Thrift,1.2 RC] what happened to parquet.hive.serde.ParquetHiveSerDe

2014-12-04 Thread Michael Armbrust
Thanks for reporting. As a workaround you should be able to SET spark.sql.hive.convertMetastoreParquet=false, but I'm going to try to fix this before the next RC. On Wed, Dec 3, 2014 at 7:09 AM, Yana Kadiyska wrote: > Thanks Michael, you are correct. > > I also opened https://issues.apache.org/j
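
A minimal sketch of that workaround from a Spark shell, assuming an existing SparkContext (sc); the table name is illustrative:

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)
    // Fall back to Hive's own SerDe path instead of Spark SQL's native Parquet support
    hiveContext.sql("SET spark.sql.hive.convertMetastoreParquet=false")
    hiveContext.sql("SELECT * FROM my_parquet_table LIMIT 10") // table name is illustrative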

Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-04 Thread Xiangrui Meng
Krishna, could you send me some code snippets for the issues you saw in naive Bayes and k-means? -Xiangrui On Sun, Nov 30, 2014 at 6:49 AM, Krishna Sankar wrote: > +1 > 1. Compiled OSX 10.10 (Yosemite) mvn -Pyarn -Phadoop-2.4 > -Dhadoop.version=2.4.0 -DskipTests clean package 16:46 min (slightly

RE: object xxx is not a member of package com

2014-12-04 Thread flyson
Hi Daoyuan, I had already tried the approach you mentioned, but it didn't work in my case. I still got the same compilation errors. Can anyone tell me how to resolve the library dependency on the 3rd-party jar in sbt? Thanks! Min -- View this message in context: http://apache-spark-d
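
For reference, a minimal build.sbt sketch of the two usual ways to depend on a third-party jar (the coordinates below are placeholders, not the actual library from this thread):

    // Managed dependency: resolved by coordinates from a repository
    libraryDependencies += "com.yourcompany" % "your-thirdparty-lib" % "1.0.0"

    // Unmanaged dependency: alternatively, copy the jar into the project's
    // lib/ directory (sbt's default unmanaged classpath); no build.sbt
    // change is needed for that.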

Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-04 Thread Takeshi Yamamuro
+1 (non-binding) Checked on CentOS 6.5, compiled from source. Ran various examples on a standalone master with three slaves, and browsed the web UI. On Sat, Nov 29, 2014 at 2:16 PM, Patrick Wendell wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.2.0! > >

RDDs for "dimensional" (time series, spatial) data

2014-12-04 Thread RJ Nowling
Hi all, I created a JIRA to discuss adding RDDs for "dimensional" (not sure what else to call it) data like time series and spatial data. Spark could be a better time series and/or spatial "database" than existing approaches out there. https://issues.apache.org/jira/browse/SPARK-4727 I saw that

Re: zinc invocation examples

2014-12-04 Thread Sean Owen
You just run it once with "zinc -start" and leave it running as a background process on your build machine. You don't have to do anything for each build. On Wed, Dec 3, 2014 at 3:44 PM, Nicholas Chammas wrote: > https://github.com/apache/spark/blob/master/docs/building-spark.md#speeding-up-compil

Re: [Thrift,1.2 RC] what happened to parquet.hive.serde.ParquetHiveSerDe

2014-12-04 Thread Michael Armbrust
Here's a fix: https://github.com/apache/spark/pull/3586 On Wed, Dec 3, 2014 at 11:05 AM, Michael Armbrust wrote: > Thanks for reporting. As a workaround you should be able to SET > spark.sql.hive.convertMetastoreParquet=false, but I'm going to try to fix > this before the next RC. > > On Wed, De

Re: packaging spark run time with osgi service

2014-12-04 Thread Lochana Menikarachchi
I think the problem has to do with akka not picking up the reference.conf file in the assembly.jar. We managed to make akka pick up the conf file by temporarily switching the class loaders. Thread.currentThread().setContextClassLoader(JavaSparkContext.class.getClassLoader()); The model gets built
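
A sketch of that workaround in Scala, assuming an already-configured SparkConf; restoring the previous loader afterwards is an addition for safety:

    import org.apache.spark.SparkConf
    import org.apache.spark.api.java.JavaSparkContext

    val previous = Thread.currentThread().getContextClassLoader
    try {
      // Use the loader that can see the Spark assembly, so akka's
      // ConfigFactory.load() finds reference.conf
      Thread.currentThread().setContextClassLoader(classOf[JavaSparkContext].getClassLoader)
      val sc = new JavaSparkContext(new SparkConf().setAppName("osgi-host")) // config illustrative
      // ... build the model here ...
    } finally {
      Thread.currentThread().setContextClassLoader(previous)
    }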

Ooyala Spark JobServer

2014-12-04 Thread Jun Feng Liu
Hi, I am wondering about the status of the Ooyala Spark JobServer. Any plan to get it into the Spark release? Best Regards Jun Feng Liu IBM China Systems & Technology Laboratory in Beijing Phone: 86-10-82452683 E-mail: liuj...@cn.ibm.com BLD 28,ZGC Software Park No.8 Rd.Dong Bei Wang West, Di

scala.MatchError on SparkSQL when creating ArrayType of StructType

2014-12-04 Thread invkrh
Hi, I am using SparkSQL on the 1.1.0 branch. The following code leads to a scala.MatchError at org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:247) val scm = StructType(inputRDD.schema.fields.init :+ StructField("list", ArrayType( StructType(
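
For context, a self-contained sketch of the schema construction being described; inputRDD, rowRDD, and the element fields are assumptions for illustration:

    import org.apache.spark.sql._

    // Replace the last field of an existing schema with an array-of-struct field
    val elementType = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("value", IntegerType, nullable = true)))
    val scm = StructType(inputRDD.schema.fields.init :+
      StructField("list", ArrayType(elementType), nullable = true))
    // Applying the schema to an RDD[Row] is where the MatchError surfaced
    val result = sqlContext.applySchema(rowRDD, scm)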

Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-04 Thread Krishna Sankar
Will do. Am on the road - will annotate an iPython notebook with what works & what didn't work ... Cheers On Wed, Dec 3, 2014 at 4:19 PM, Xiangrui Meng wrote: > Krishna, could you send me some code snippets for the issues you saw > in naive Bayes and k-means? -Xiangrui > > On Sun, Nov 30, 2014

spark osgi class loading issue

2014-12-04 Thread Lochana Menikarachchi
We are trying to call Spark through an OSGi service (with an OSGi-fied version of the assembly.jar). Spark does not work (due to the way it reads akka's reference.conf) unless we switch the class loader as follows. Thread.currentThread().setContextClassLoader(JavaSparkContext.class.getClassLoader());

Re: Spurious test failures, testing best practices

2014-12-04 Thread Ryan Williams
Thanks Marcelo, "this is just how Maven works (unfortunately)" answers my question. Another related question: I tried to use `mvn scala:cc` and discovered that it only seems to scan the src/main and src/test directories (according to its docs

Re: keeping PR titles / descriptions up to date

2014-12-04 Thread Andrew Or
I realize we're not voting, but +1 to this proposal since commit messages can't be changed whereas JIRA issues can always be updated after the fact. 2014-12-02 13:05 GMT-08:00 Patrick Wendell : > Also a note on this for committers - it's possible to re-word the > title during merging, by just run

Dependent on multiple versions of servlet-api jars lead to throw an SecurityException when Spark built for hadoop 2.5.0

2014-12-04 Thread nivdul
Hi! I have the same issue as this one, but with Hadoop 2.5. The bug fix is there for Hadoop 2.4 and 2.3. For now I just changed my version o

a question of Graph build api

2014-12-04 Thread jinkui.sjk
Hi all, when building a graph from edge tuples with the Graph.fromEdgeTuples API, the edges object type is RDD[Edge]. Inside EdgeRDD.fromEdges, it would be better for EdgePartitionBuilder.add to take an Edge object as its parameter, so there is no need to create a new Edge object again. def fromEdgeTuples[VD: ClassTag](
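
For reference, a minimal sketch of the API under discussion (the edge data is illustrative):

    import org.apache.spark.graphx._
    import org.apache.spark.rdd.RDD

    // Build a graph from raw (srcId, dstId) pairs; every vertex gets defaultValue
    val rawEdges: RDD[(VertexId, VertexId)] = sc.parallelize(Seq((1L, 2L), (2L, 3L)))
    val graph: Graph[Int, Int] = Graph.fromEdgeTuples(rawEdges, defaultValue = 1)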

Re: Spurious test failures, testing best practices

2014-12-04 Thread Imran Rashid
I agree we should separate out the integration tests so it's easy for devs to just run the fast tests locally. I opened a JIRA for it: https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-4746 On Nov 30, 2014 3:08 PM, "Matei Zaharia" wrote: > Hi Ryan, > > As a tip (and maybe th

Re: Ooyala Spark JobServer

2014-12-04 Thread Patrick Wendell
Hey Jun, The Ooyala server is being maintained by its original author (Evan Chan) here: https://github.com/spark-jobserver/spark-jobserver This is likely to stay a standalone project for now, since it builds directly on Spark's public APIs. - Patrick On Wed, Dec 3, 2014 at 9:02 PM, Jun Fe

Re: zinc invocation examples

2014-12-04 Thread Nicholas Chammas
Oh, derp. I just assumed from looking at all the options that there was something to it. Thanks Sean. On Thu Dec 04 2014 at 7:47:33 AM Sean Owen wrote: > You just run it once with "zinc -start" and leave it running as a > background process on your build machine. You don't have to do > anything

Re: Unit tests in < 5 minutes

2014-12-04 Thread Nicholas Chammas
fwiw, when we did this work in HBase, we categorized the tests. Then some tests can share a single JVM, while others need to be isolated in their own JVM. Nevertheless, surefire can still run them in parallel by starting/stopping several JVMs. I think we need to do this as well. Perhaps the tes
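
A hedged sketch of what categorization could look like with ScalaTest on the Spark side (the tag and suite below are hypothetical), so slow tests can be excluded from a fast local run:

    import org.scalatest.{FunSuite, Tag}

    // Hypothetical tag marking expensive integration tests
    object IntegrationTest extends Tag("org.example.tags.IntegrationTest")

    class ExampleSuite extends FunSuite {
      test("fast unit check") {
        assert(1 + 1 === 2)
      }
      // Tagged tests can be excluded with ScalaTest's -l runner flag
      test("full cluster round trip", IntegrationTest) {
        // expensive setup would go here
      }
    }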

Re: Unit tests in < 5 minutes

2014-12-04 Thread Ted Yu
Have you seen this thread http://search-hadoop.com/m/JW1q5xxSAa2 ? Test categorization in HBase is done through maven-surefire-plugin Cheers On Thu, Dec 4, 2014 at 4:05 PM, Nicholas Chammas wrote: > fwiw, when we did this work in HBase, we categorized the tests. Then some > tests can share a s

Exception adding resource files in latest Spark

2014-12-04 Thread Jianshi Huang
I got the following error during Spark startup (Yarn-client mode): 14/12/04 19:33:58 INFO Client: Uploading resource file:/x/home/jianshuang/spark/spark-latest/lib/datanucleus-api-jdo-3.2.6.jar -> hdfs://stampy/user/jianshuang/.sparkStaging/application_1404410683830_531767/datanucleus-api-jdo-3.2.

Re: Exception adding resource files in latest Spark

2014-12-04 Thread Jianshi Huang
Looks like somehow Spark failed to find core-site.xml in /etc/hadoop/conf. I've already set the following env variables: export YARN_CONF_DIR=/etc/hadoop/conf export HADOOP_CONF_DIR=/etc/hadoop/conf export HBASE_CONF_DIR=/etc/hbase/conf Should I add $HADOOP_CONF_DIR/* to HADOOP_CLASSPATH? Jia

Re: Exception adding resource files in latest Spark

2014-12-04 Thread Jianshi Huang
Actually my HADOOP_CLASSPATH has already been set to include /etc/hadoop/conf/* export HADOOP_CLASSPATH=/etc/hbase/conf/hbase-site.xml:/usr/lib/hbase/lib/hbase-protocol.jar:$(hbase classpath) Jianshi On Fri, Dec 5, 2014 at 11:54 AM, Jianshi Huang wrote: > Looks like somehow Spark failed to fin

Re: Exception adding resource files in latest Spark

2014-12-04 Thread Jianshi Huang
Looks like the datanucleus*.jar shouldn't appear in the hdfs path in Yarn-client mode. Maybe this patch broke yarn-client. https://github.com/apache/spark/commit/a975dc32799bb8a14f9e1c76defaaa7cfbaf8b53 Jianshi On Fri, Dec 5, 2014 at 12:02 PM, Jianshi Huang wrote: > Actually my HADOOP_CLASSPA

Re: Exception adding resource files in latest Spark

2014-12-04 Thread Jianshi Huang
Correction: According to Liancheng, this hotfix might be the root cause: https://github.com/apache/spark/commit/38cb2c3a36a5c9ead4494cbc3dde008c2f0698ce Jianshi On Fri, Dec 5, 2014 at 12:45 PM, Jianshi Huang wrote: > Looks like the datanucleus*.jar shouldn't appear in the hdfs path in > Yarn

Re: Exception adding resource files in latest Spark

2014-12-04 Thread Jianshi Huang
I created a ticket for this: https://issues.apache.org/jira/browse/SPARK-4757 Jianshi On Fri, Dec 5, 2014 at 1:31 PM, Jianshi Huang wrote: > Correction: > > According to Liancheng, this hotfix might be the root cause: > > > https://github.com/apache/spark/commit/38cb2c3a36a5c9ead4494cbc3dde

Re: Exception adding resource files in latest Spark

2014-12-04 Thread Patrick Wendell
Thanks for flagging this. I reverted the relevant YARN fix in the Spark 1.2 release. We can try to debug this in master. On Thu, Dec 4, 2014 at 9:51 PM, Jianshi Huang wrote: > I created a ticket for this: > > https://issues.apache.org/jira/browse/SPARK-4757 > > > Jianshi > > On Fri, Dec 5, 2014 at

Re: Auto BroadcastJoin optimization failed in latest Spark

2014-12-04 Thread Jianshi Huang
Sorry for the late follow-up. I used Hao's DESC EXTENDED command and found some clues: new (broadcast broken Spark build): parameters:{numFiles=0, EXTERNAL=TRUE, transient_lastDdlTime=1417763892, COLUMN_STATS_ACCURATE=false, totalSize=0, numRows=-1, rawDataSize=-1} old (broadcast working Spark

drop table if exists throws exception

2014-12-04 Thread Jianshi Huang
Hi, I got an exception saying Hive: NoSuchObjectException(message: table not found) when running "DROP TABLE IF EXISTS " Looks like a new regression in the Hive module. Can anyone confirm this? Thanks, -- Jianshi Huang LinkedIn: jianshi Twitter: @jshuang Github & Blog: http://huangjs.github.com/
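
A minimal repro sketch of what is being reported, assuming a HiveContext (the table name is illustrative):

    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    // IF EXISTS should make this a no-op for a missing table, but the
    // reported regression surfaces Hive's NoSuchObjectException instead
    hiveContext.sql("DROP TABLE IF EXISTS nonexistent_table")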

Re: Auto BroadcastJoin optimization failed in latest Spark

2014-12-04 Thread Jianshi Huang
If I run ANALYZE without NOSCAN, then Hive can successfully get the size: parameters:{numFiles=0, EXTERNAL=TRUE, transient_lastDdlTime=1417764589, COLUMN_STATS_ACCURATE=true, totalSize=0, numRows=1156, rawDataSize=76296} Is Hive's PARQUET support broken? Jianshi On Fri, Dec 5, 2014 at 3:30 PM,

Re: Auto BroadcastJoin optimization failed in latest Spark

2014-12-04 Thread Jianshi Huang
Following Liancheng's suggestion, I tried setting spark.sql.hive.convertMetastoreParquet to false, but ANALYZE with NOSCAN still returns -1 for rawDataSize. Jianshi On Fri, Dec 5, 2014 at 3:33 PM, Jianshi Huang wrote: > If I run ANALYZE without NOSCAN, then Hive can successfully get the size: > > parame
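
Putting the thread's checks together, a sketch of the statements involved, run against a HiveContext (the table name is illustrative):

    hiveContext.sql("SET spark.sql.hive.convertMetastoreParquet=false")
    // With NOSCAN the metastore keeps rawDataSize=-1 / numRows=-1 in this build
    hiveContext.sql("ANALYZE TABLE my_parquet_table COMPUTE STATISTICS noscan")
    // Without NOSCAN Hive scans the data and fills in numRows/rawDataSize
    hiveContext.sql("ANALYZE TABLE my_parquet_table COMPUTE STATISTICS")
    hiveContext.sql("DESC EXTENDED my_parquet_table").collect().foreach(println)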