Re: Spark 1.5: How to trigger expression execution through UnsafeRow/TungstenProject

2015-09-09 Thread Ted Yu
Here is the example from Reynold ( http://search-hadoop.com/m/q3RTtfvs1P1YDK8d) : scala> val data = sc.parallelize(1 to size, 5).map(x => (util.Random.nextInt(size / repetitions),util.Random.nextDouble)).toDF("key", "value") data: org.apache.spark.sql.DataFrame = [key: int, value: double] scala>

Re: [ANNOUNCE] Announcing Spark 1.5.0

2015-09-11 Thread Ted Yu
1.5.0, a correct hadoop version >> (default to 2.2.0 though) and there you go :-) >> >> >> On Wed, Sep 9, 2015 at 6:39 PM Ted Yu wrote: >> >>> Jerry: >>> I just tried building hbase-spark module with 1.5.0 and I see: >>> >>

Re: spark dataframe transform JSON to ORC meet “column ambigous exception”

2015-09-12 Thread Ted Yu
Is it possible that Canonical_URL occurs more than once in your json ? Can you check your json input ? Thanks On Sat, Sep 12, 2015 at 2:05 AM, Fengdong Yu wrote: > Hi, > > I am using spark1.4.1 data frame, read JSON data, then save it to orc. the > code is very simple: > > DataFrame json = sql

Re: spark dataframe transform JSON to ORC meet “column ambigous exception”

2015-09-12 Thread Ted Yu
e Engineer > > cel: 158-0164-9103 > wetchat: azuryy > > > On Sat, Sep 12, 2015 at 5:52 PM, Ted Yu wrote: > >> Is it possible that Canonical_URL occurs more than once in your json ? >> >> Can you check your json input ? >> >> Thanks >> >>

Re: (send this email to subscribe)

2015-09-13 Thread Ted Yu
See first section of http://spark.apache.org/community.html Cheers > On Sep 13, 2015, at 6:43 PM, 蒋林 wrote: > > Hi,I need subscribe email list,please send me,thank you > > >

Re: SparkR installation not working

2015-09-19 Thread Ted Yu
Looks like you didn't specify sparkr profile when building. Cheers On Sat, Sep 19, 2015 at 12:30 PM, Devl Devel wrote: > Hi All, > > I've built spark 1.5.0 with hadoop 2.6 with a fresh download : > > build/mvn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package > > I try to run Spark

Re: Using scala-2.11 when making changes to spark source

2015-09-20 Thread Ted Yu
Maybe the following can be used for changing Scala version: http://maven.apache.org/archetype/maven-archetype-plugin/ I played with it a little bit but didn't get far. FYI On Sun, Sep 20, 2015 at 6:18 AM, Stephen Boesch wrote: > > The dev/change-scala-version.sh [2.11] script modifies in-plac

Re: passing SparkContext as parameter

2015-09-21 Thread Ted Yu
You can use broadcast variable for passing connection information. Cheers > On Sep 21, 2015, at 4:27 AM, Priya Ch wrote: > > can i use this sparkContext on executors ?? > In my application, i have scenario of reading from db for certain records in > rdd. Hence I need sparkContext to read from

Re: How to modify Hadoop APIs used by Spark?

2015-09-21 Thread Ted Yu
Can you clarify what you want to do: If you modify existing hadoop InputFormat, etc, it would be a matter of rebuilding hadoop and build Spark using the custom built hadoop as dependency. Do you introduce new InputFormat ? Cheers On Mon, Sep 21, 2015 at 1:20 PM, Dogtail Ray wrote: > Hi all, >

Re: Derby version in Spark

2015-09-22 Thread Ted Yu
Which Spark release are you building ? For master branch, I get the following: lib_managed/jars/datanucleus-api-jdo-3.2.6.jar lib_managed/jars/datanucleus-core-3.2.10.jar lib_managed/jars/datanucleus-rdbms-3.2.9.jar FYI On Tue, Sep 22, 2015 at 1:28 PM, Richard Hillegas wrote: > I see that l

Re: Derby version in Spark

2015-09-22 Thread Ted Yu
r > commons-math-2.2.jar jaxb-impl-2.2.3-1.jar paranamer-2.3.jar > xmlenc-0.52.jar > commons-math3-3.4.1.jar jaxb-impl-2.2.7.jar paranamer-2.6.jar xz-1.0.jar > commons-net-3.1.jar jblas-1.2.4.jar parquet-avro-1.7.0.jar > zookeeper-3.4.5.jar > commons-pool-1.5.4.jar jcl-over-slf4j-1.7.

Re: Derby version in Spark

2015-09-22 Thread Ted Yu
I cloned Hive 1.2 code base and saw: 10.10.2.0 So the version used by Spark is quite close to what Hive uses. On Tue, Sep 22, 2015 at 3:29 PM, Ted Yu wrote: > I see. > I use maven to build so I observe different contents under lib_managed > directory. > > Here is snippe

Re: failed to run spark sample on windows

2015-09-28 Thread Ted Yu
What version of hadoop are you using ? Is that version consistent with the one which was used to build Spark 1.4.0 ? Cheers On Mon, Sep 28, 2015 at 4:36 PM, Renyi Xiong wrote: > I tried to run HdfsTest sample on windows spark-1.4.0 > > bin\run-sample org.apache.spark.examples.HdfsTest > > but

Re: [Build] repo1.maven.org: spark libs 1.5.0 for scala 2.10 poms are broken (404)

2015-10-02 Thread Ted Yu
I tried to access https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.0/spark-streaming_2.10-1.5.0.pom on Chrome and Firefox (on Mac) I got 404 FYI On Fri, Oct 2, 2015 at 10:49 AM, andy petrella wrote: > Yup folks, > > I've been reported by someone building the Spark-Notebo

Re: [Build] repo1.maven.org: spark libs 1.5.0 for scala 2.10 poms are broken (404)

2015-10-02 Thread Ted Yu
ore). Maybe the servers are > having issues. > > On Fri, Oct 2, 2015 at 11:05 AM, Ted Yu wrote: > > I tried to access > > > https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.0/spark-streaming_2.10-1.5.0.pom > > on Chrome and Firefox (on Mac) > >

Re: [Build] repo1.maven.org: spark libs 1.5.0 for scala 2.10 poms are broken (404)

2015-10-02 Thread Ted Yu
Andy: 1.5.1 has many critical bug fixes on top of 1.5.0 http://search-hadoop.com/m/q3RTtGrXP31BVt4l1 Please consider using 1.5.1 Cheers On Fri, Oct 2, 2015 at 11:19 AM, andy petrella wrote: > it's an option but not a solution, indeed > > Le ven. 2 oct. 2015 20:08, Ted Yu a écr

Re: Spark 1.5.1 - Scala 2.10 - Hadoop 1 package is missing from S3

2015-10-04 Thread Ted Yu
hadoop1 package for Scala 2.10 wasn't in RC1 either: http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/ On Sun, Oct 4, 2015 at 5:17 PM, Nicholas Chammas wrote: > I’m looking here: > > https://s3.amazonaws.com/spark-related-packages/ > > I believe this is where one set of offi

Re: IllegalArgumentException: Size exceeds Integer.MAX_VALUE

2015-10-05 Thread Ted Yu
As a workaround, can you set the number of partitions higher in the sc.textFile method ? Cheers On Mon, Oct 5, 2015 at 3:31 PM, Jegan wrote: > Hi All, > > I am facing the below exception when the size of the file being read in a > partition is above 2GB. This is apparently because Java's limita

Re: Scala 2.11 builds broken/ Can the PR build run also 2.11?

2015-10-08 Thread Ted Yu
Interesting https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/job/Spark-Master-Scala211-Compile/ shows green builds. On Thu, Oct 8, 2015 at 6:40 AM, Iulian Dragoș wrote: > Since Oct. 4 the build fails on 2.11 with the dreaded > > [error] /home/ubuntu/workspace/Apache Spark (master)

Re: Scala 2.11 builds broken/ Can the PR build run also 2.11?

2015-10-08 Thread Ted Yu
] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 17:49 min FYI On Thu, Oct 8, 2015 at 6:50 AM, Ted Yu wrote: > Interesting > > > https://amplab.cs.berkeley.edu/jenkins/view/Spar

Re: Compiling Spark with a local hadoop profile

2015-10-08 Thread Ted Yu
In root pom.xml : 2.2.0 You can override the version of hadoop with command similar to: -Phadoop-2.4 -Dhadoop.version=2.7.0 Cheers On Thu, Oct 8, 2015 at 11:22 AM, sbiookag wrote: > I'm modifying hdfs module inside hadoop, and would like the see the > reflection while i'm running spark on
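A minimal sketch of the override described above, composed as a shell variable so it can be inspected before running. The profile and version come from the reply; the build/mvn wrapper, -DskipTests, and the clean package goals are assumptions borrowed from other build commands in this archive, not something this message specifies.

```shell
# Values quoted in the reply above.
HADOOP_PROFILE="hadoop-2.4"
HADOOP_VERSION="2.7.0"

# Assumption: build/mvn wrapper, -DskipTests and "clean package",
# as used in other threads of this archive.
BUILD_CMD="build/mvn -P${HADOOP_PROFILE} -Dhadoop.version=${HADOOP_VERSION} -DskipTests clean package"

# Print the command; run it from the Spark source root once it looks right.
echo "${BUILD_CMD}"
```

For a locally modified Hadoop, the version passed here would presumably need to match whatever version the custom build installed into the local Maven repository.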

Re: taking the heap dump when an executor goes OOM

2015-10-12 Thread Ted Yu
http://stackoverflow.com/questions/542979/using-heapdumponoutofmemoryerror-parameter-for-heap-dump-for-jboss > On Oct 11, 2015, at 10:45 PM, Niranda Perera wrote: > > Hi all, > > is there a way for me to get the heap-dump hprof of an executor jvm, when it > goes out of memory? > > is this c
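The linked answer concerns the JVM's HeapDumpOnOutOfMemoryError option. A hedged sketch of how that option might be passed to executors follows; spark.executor.extraJavaOptions is the Spark setting for executor JVM options, while the dump directory and the application jar name are hypothetical placeholders.

```shell
# Standard HotSpot flags; the dump directory is a hypothetical placeholder.
HEAP_DUMP_DIR="/tmp/executor-dumps"
EXECUTOR_JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${HEAP_DUMP_DIR}"

# The value contains a space, so it is quoted inside the composed argument.
SUBMIT_CONF="--conf \"spark.executor.extraJavaOptions=${EXECUTOR_JAVA_OPTS}\""

# Print a copy-pastable command line; your-app.jar is a placeholder.
echo "spark-submit ${SUBMIT_CONF} your-app.jar"
```

The dump is written on the node where the failing executor runs, so the directory most likely has to exist and be writable on every worker.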

Re: Flaky Jenkins tests?

2015-10-12 Thread Ted Yu
You can go to: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN and see if the test failure(s) you encountered appeared there. FYI On Mon, Oct 12, 2015 at 1:24 PM, Meihua Wu wrote: > Hi Spark Devs, > > I recently encountered several cases that the Jenkin failed tests tha

Re: Flaky Jenkins tests?

2015-10-12 Thread Ted Yu
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43553/console > ] > > Traceback (most recent call last): > File > "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", > line 316, in _get_connection > IndexError

Re: Flaky Jenkins tests?

2015-10-12 Thread Ted Yu
Josh: We're on the same page. I used the term 're-submit your PR' which was different from opening new PR. On Mon, Oct 12, 2015 at 2:47 PM, Personal wrote: > Just ask Jenkins to retest; no need to open a new PR just to re-trigger > the build. > > > On October

Re: Getting started

2015-10-13 Thread Ted Yu
Please see https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark On Tue, Oct 13, 2015 at 5:49 AM, _abhishek wrote: > Hello > I am interested in contributing to apache spark.I am new to open source.Can > someone please help me with how to get started,beginner level bugs etc. > T

Re: SPARK_MASTER_IP actually expects a DNS name, not IP address

2015-10-14 Thread Ted Yu
Some old bits: http://stackoverflow.com/questions/28162991/cant-run-spark-1-2-in-standalone-mode-on-mac http://stackoverflow.com/questions/29412157/passing-hostname-to-netty FYI On Wed, Oct 14, 2015 at 7:10 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > I’m setting the Spark master

Re: Building Spark

2015-10-15 Thread Ted Yu
bq. Access is denied Please check permission of the path mentioned. On Thu, Oct 15, 2015 at 3:45 PM, Annabel Melongo < melongo_anna...@yahoo.com.invalid> wrote: > I was trying to build a cloned version of Spark on my local machine using > the command: > mvn -Pyarn -Phadoop-2.4 -Dhadoop.v

Re: SPARK_MASTER_IP actually expects a DNS name, not IP address

2015-10-16 Thread Ted Yu
codebase for `SPARK_MASTER_IP`, amazingly, does not show it > being used in any place directly by Spark > <https://github.com/apache/spark/search?utf8=%E2%9C%93&q=SPARK_MASTER_IP>. > > Clearly, Spark is using this environment variable (otherwise I wouldn't > see the

Re: Build spark 1.5.1 branch fails

2015-10-17 Thread Ted Yu
Have you set MAVEN_OPTS with the following ? -Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m Cheers On Sat, Oct 17, 2015 at 2:35 PM, Chester Chen wrote: > I was using jdk 1.7 and maven version is the same as pom file. > > ᚛ |(v1.5.1)|$ java -version > java version "1.7.0_51" > Java(T
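As a sketch, the setting suggested above could be exported in the shell that launches the build. The flag values are the ones quoted in the reply; exporting them in the current shell, rather than in a profile script, is an assumption.

```shell
# Flags quoted in the reply above. MaxPermSize applies to JDK 7,
# which the original poster reports using.
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

# Confirm what Maven will inherit before starting the build.
echo "MAVEN_OPTS=${MAVEN_OPTS}"
```

The export only affects builds started from the same shell session.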

test failed due to OOME

2015-10-18 Thread Ted Yu
From https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=spark-test/3846/console : SparkListenerSuite:- basic creation and shutdown of LiveListenerBus- bus.stop() waits for the event queue to completely drain- basic creation of StageInfo- basic c

streaming test failure

2015-10-18 Thread Ted Yu
When I ran the following command on Linux with latest master branch: ~/apache-maven-3.3.3/bin/mvn clean -Phive -Phive-thriftserver -Pyarn -Phadoop-2.4 -Dhadoop.version=2.7.0 package I saw some test failures: http://pastebin.com/1VYZYy5K Has anyone seen similar test failure before ? Thanks

Re: Problem building Spark

2015-10-19 Thread Ted Yu
See this thread http://search-hadoop.com/m/q3RTtV3VFNdgNri2&subj=Re+Build+spark+1+5+1+branch+fails > On Oct 19, 2015, at 6:59 PM, Annabel Melongo > wrote: > > I tried to build Spark according to the build directions and the it failed > due to the following error: > > > > > > > Bui

Re: Trouble creating JIRA issue

2015-10-22 Thread Ted Yu
You can use the following link: https://issues.apache.org/jira/secure/CreateIssue!default.jspa Remember to select Spark as the project. On Thu, Oct 22, 2015 at 9:38 AM, Richard Marscher wrote: > Hi, > > I'm working on following the guidelines for contributing code to Spark and > am trying to cr

Re: Adding support for truncate operator

2015-10-25 Thread Ted Yu
Have you seen the following ? [SPARK-3907][SQL] Add truncate table support Cheers On Sun, Oct 25, 2015 at 9:01 AM, Shagun Sodhani wrote: > Hi! I noticed that SparkSQL does not support truncate operator as of now. > Can we add it? I am willing to send over a PR for it >

Re: [VOTE] Release Apache Spark 1.5.2 (RC1)

2015-10-25 Thread Ted Yu
When I ran the following command: ~/apache-maven-3.3.3/bin/mvn -Phive -Phive-thriftserver -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 package I got: testChildProcLauncher(org.apache.spark.launcher.SparkLauncherSuite) Time elapsed: 0.031 sec <<< FAILURE! java.lang.AssertionError: expected:<0> but

Re: Exception when using some aggregate operators

2015-10-27 Thread Ted Yu
Have you tried using avg in place of mean ? (1 to 5).foreach { i => val df = (1 to 1000).map(j => (j, s"str$j")).toDF("a", "b").save(s"/tmp/partitioned/i=$i") } sqlContext.sql(""" CREATE TEMPORARY TABLE partitionedParquet USING org.apache.spark.sql.parquet OPTIONS ( path '/tm

Re: Exception when using some aggregate operators

2015-10-28 Thread Ted Yu
ng them in the wrong >> way or is it a bug as I asked in my first mail. >> >> On Wed, Oct 28, 2015 at 3:20 AM, Ted Yu wrote: >> >>> Have you tried using avg in place of mean ? >>> >>> (1 to 5).foreach { i => val df = (1 to 1000).map

Re: Exception when using some aggregate operators

2015-10-28 Thread Ted Yu
he other aggregate functions to be treated as bugs or not? >> >> On Wed, Oct 28, 2015 at 4:08 PM, Shagun Sodhani > > wrote: >> >>> Wouldnt it be: >>> >>> +expression[Max]("avg"), >>> >>> On Wed, Oct 28, 2015 at 4:0

Re: test failed due to OOME

2015-10-30 Thread Ted Yu
This happened recently on Jenkins: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.3,label=spark-test/3964/console On Sun, Oct 18, 2015 at 7:54 AM, Ted Yu wrote: > From > https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-wit

Re: test failed due to OOME

2015-10-30 Thread Ted Yu
> > > system ram: 128G > > > > we can bump it pretty easily... it's just a matter of deciding if we > > want to do this globally (super easy, but will affect ALL maven builds > > on our system -- not just spark) or on a per-job basis (this doesn't >

Re: SparkLauncher#setJavaHome does not set JAVA_HOME in child process

2015-10-31 Thread Ted Yu
On Linux, I got the following test failure (with or without suggested change): testChildProcLauncher(org.apache.spark.launcher.SparkLauncherSuite) Time elapsed: 0.036 sec <<< FAILURE! java.lang.AssertionError: expected:<0> but was:<1> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.

Re: unscribe

2015-11-01 Thread Ted Yu
Please take a look at first section of spark.apache.org/community FYI On Sun, Nov 1, 2015 at 1:09 AM, Chenxi Li wrote: > unscribe >

Re: test failed due to OOME

2015-11-02 Thread Ted Yu
Looks like SparkListenerSuite doesn't OOM on QA runs compared to Jenkins builds. I wonder if this is due to difference between machines running QA tests vs machines running Jenkins builds. On Fri, Oct 30, 2015 at 1:19 PM, Ted Yu wrote: > I noticed that the SparkContext created in each

Re: test failed due to OOME

2015-11-02 Thread Ted Yu
bly need to log into Jenkins and > heap dump some running tests and figure out what is going on. > > On Mon, Nov 2, 2015 at 7:42 AM, Ted Yu wrote: > >> Looks like SparkListenerSuite doesn't OOM on QA runs compared to Jenkins >> builds. >> >> I wonder if t

Re: Running individual test classes

2015-11-03 Thread Ted Yu
My experience is that going through tests in each module takes some time before reaching the test specified by the wildcard. Some test, such as SparkLauncherSuite, would run even if not in wildcard. FYI > On Nov 3, 2015, at 1:24 AM, Nitin Goyal wrote: > > In maven, you might want to try fo

Re: SparkLauncher#setJavaHome does not set JAVA_HOME in child process

2015-11-03 Thread Ted Yu
Opening JIRA is fine. Thanks On Tue, Nov 3, 2015 at 4:25 AM, gus wrote: > Thanks, Ted. > The SparkLauncher test suite runs fine for me, with or without the change. > Do you agree this is a bug? If so, should I open a JIRA? > > > > -- > View this message in context: > http://apache-spark-develop

Re: Master build fails ?

2015-11-03 Thread Ted Yu
Interesting, Sbt builds were not all failing: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/ FYI On Tue, Nov 3, 2015 at 5:58 AM, Jean-Baptiste Onofré wrote: > Hi Jacek, > > it works fine with mvn: the problem is with sbt. > > I suspect a different reactor order in sbt compare to

Re: Build a specific module only

2015-11-04 Thread Ted Yu
Please take a look at https://issues.apache.org/jira/browse/SPARK-10883 > On Nov 4, 2015, at 3:27 AM, gsvic wrote: > > Is it possible to build a specific spark module without building the whole > project? > > For example, I am trying to build sql-core project by > > /build/mvn -pl sql/core ins

Re: Master build fails ?

2015-11-05 Thread Ted Yu
> > Is there a solution to this ? > > Regards, > Dilip Biswal > Tel: 408-463-4980 > dbis...@us.ibm.com > > > > From: Jean-Baptiste Onofré > To:Ted Yu > Cc:"dev@spark.apache.org" > Date:

Re: Master build fails ?

2015-11-05 Thread Ted Yu
lip Biswal > Tel: 408-463-4980 > dbis...@us.ibm.com > > > > From:Ted Yu > To:Dilip Biswal/Oakland/IBM@IBMUS > Cc:Jean-Baptiste Onofré , "dev@spark.apache.org" > > Date:11/05/2015 10:46 AM > Subject:Re: Master build fail

Re: State of the Build

2015-11-05 Thread Ted Yu
See previous discussion: http://search-hadoop.com/m/q3RTtPnPnzwOhBr FYI On Thu, Nov 5, 2015 at 4:30 PM, Stephen Boesch wrote: > Yes. The current dev/change-scala-version.sh mutates (/pollutes) the build > environment by updating the pom.xml in each of the subprojects. If you were > able to come

Re: Master build fails ?

2015-11-06 Thread Ted Yu
Since maven is the preferred build vehicle, ivy style dependencies policy would produce surprising results compared to today's behavior. I would suggest staying with current dependencies policy. My two cents. On Fri, Nov 6, 2015 at 6:25 AM, Koert Kuipers wrote: > if there is no strong preferen

Re: State of the Build

2015-11-06 Thread Ted Yu
bq. include an sbt jar in the source repo Can you clarify which sbt jar (by path) ? I tried 'git log' on the following files but didn't see commit history: ./build/sbt-launch-0.13.7.jar ./build/zinc-0.3.5.3/lib/sbt-interface.jar ./sbt/sbt-launch-0.13.2.jar ./sbt/sbt-launch-0.13.5.jar On Fri, No

Re: Build fails due to...multiple overloaded alternatives of constructor RDDInfo define default arguments?

2015-11-07 Thread Ted Yu
Created a PR for the compilation error: https://github.com/apache/spark/pull/9538 Cheers On Sat, Nov 7, 2015 at 4:41 AM, Jacek Laskowski wrote: > Hi, > > Checked out the latest sources and the build failed: > > [error] > /Users/jacek/dev/oss/spark/core/src/main/scala/org/apache/spark/storage/RD

Re: Calling stop on StreamingContext locks up

2015-11-07 Thread Ted Yu
Would the following change work for you ? diff --git a/core/src/main/scala/org/apache/spark/util/AsynchronousListenerBus.scala b/core/src/main/scala/org/apache/spark/util/AsynchronousListenerBus.scala index 61b5a4c..c330d25 100644 --- a/core/src/main/scala/org/apache/spark/util/AsynchronousListene

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-08 Thread Ted Yu
+1 On Sat, Nov 7, 2015 at 4:35 PM, Denny Lee wrote: > +1 > > > On Sat, Nov 7, 2015 at 12:01 PM Mark Hamstra > wrote: > >> +1 >> >> On Tue, Nov 3, 2015 at 3:22 PM, Reynold Xin wrote: >> >>> Please vote on releasing the following candidate as Apache Spark version >>> 1.5.2. The vote is open unti

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-08 Thread Ted Yu
Why did you directly jump to spark-streaming-mqtt module ? Can you drop 'spark-streaming-mqtt' and try again ? Not sure why 1.5.0-SNAPSHOT showed up. Were you using RC2 source ? Cheers On Sun, Nov 8, 2015 at 7:28 PM, 欧锐 <494165...@qq.com> wrote: > > build spark-streaming-mqtt_2.10 failed! > >

Re: OLAP query using spark dataframe with cassandra

2015-11-09 Thread Ted Yu
Please consider using NoSQL engine such as hbase. Cheers > On Nov 9, 2015, at 3:03 PM, Andrés Ivaldi wrote: > > Hi, > I'm also considering something similar, Spark plain is too slow for my case, > a possible solution is use Spark as Multiple Source connector and basic > transformation layer,

Re: Seems jenkins is down (or very slow)?

2015-11-12 Thread Ted Yu
I was able to access the following where response was fast: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45806/ Cheers On Thu, Nov 12, 2015 at 6:21 PM, Yin Huai wrote: > Hi Guys, > > Seems Jenkins is

SparkPullRequestBuilder coverage

2015-11-13 Thread Ted Yu
Hi, I noticed that SparkPullRequestBuilder completes much faster than maven Jenkins build. From https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45871/consoleFull , I couldn't get exact time the builder started but looks like the duration was around 20 minutes. From https://ampl

Re: SparkPullRequestBuilder coverage

2015-11-13 Thread Ted Yu
are impacted by the change. E.g. if you only > modify SQL, it won't run the core or streaming tests. > > > On Fri, Nov 13, 2015 at 11:17 AM, Ted Yu wrote: > >> Hi, >> I noticed that SparkPullRequestBuilder completes much faster than maven >> Jenkins build. >

Re: spark 1.4 GC issue

2015-11-15 Thread Ted Yu
Please take a look at http://www.infoq.com/articles/tuning-tips-G1-GC Cheers On Sat, Nov 14, 2015 at 10:03 PM, Renu Yadav wrote: > I have tried with G1 GC. Please if anyone can provide their setting for GC. > At code level I am : > 1.reading orc table using dataframe > 2.map df to rdd of my cas

Re: releasing Spark 1.4.2

2015-11-16 Thread Ted Yu
See this thread: http://search-hadoop.com/m/q3RTtLKc2ctNPcq&subj=Re+Spark+1+4+2+release+and+votes+conversation+ > On Nov 15, 2015, at 10:53 PM, Niranda Perera wrote: > > Hi, > > I am wondering when spark 1.4.2 will be released? > > is it in the voting stage at the moment? > > rgds > > --

Re: Dropping support for earlier Hadoop versions in Spark 2.0?

2015-11-19 Thread Ted Yu
Should a new job be setup under Spark-Master-Maven-with-YARN for hadoop 2.6.x ? Cheers On Thu, Nov 19, 2015 at 5:16 PM, 张志强(旺轩) wrote: > I agreed > +1 > > -- > From: Reynold Xin > Date: 2015-11-20 06:14:44 > To: dev@spark.apache.org;

Re: [ANNOUNCE] Spark 1.6.0 Release Preview

2015-11-24 Thread Ted Yu
If I am not mistaken, the binaries for Scala 2.11 were generated against hadoop 1. What about binaries for Scala 2.11 against hadoop 2.x ? Cheers On Sun, Nov 22, 2015 at 2:21 PM, Michael Armbrust wrote: > In order to facilitate community testing of Spark 1.6.0, I'm excited to > announce the av

Re: [VOTE] Release Apache Spark 1.6.0 (RC1)

2015-12-02 Thread Ted Yu
I tried to run test suite and encountered the following: http://pastebin.com/DPnwMGrm FYI On Wed, Dec 2, 2015 at 12:39 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > -0 > > If spark-ec2 is still a supported part of the project, then we should > update its version lists as new relea

Re: [VOTE] Release Apache Spark 1.6.0 (RC1)

2015-12-02 Thread Ted Yu
+1 Ran through test suite (minus docker-integration-tests) which passed. Overall experience was much better compared with some of the prior RC's. [INFO] Spark Project External Kafka ... SUCCESS [ 53.956 s] [INFO] Spark Project Examples . SUCCESS [0

Maven build against Hadoop 2.4 times out

2015-12-11 Thread Ted Yu
Hi, You may have noticed that maven build against Hadoop 2.4 times out on Jenkins. The last module is spark-hive-thriftserver This seemed to start with build #4440 FYI - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org

Re: Maven build against Hadoop 2.4 times out

2015-12-13 Thread Ted Yu
ailed > way before the thrift server tests. > > On Fri, Dec 11, 2015 at 10:27 AM, Ted Yu wrote: > >> Hi, >> You may have noticed that maven build against Hadoop 2.4 times out on >> Jenkins. >> >> The last module is spark-hive-thrift

Re: Maven build against Hadoop 2.4 times out

2015-12-14 Thread Ted Yu
> I am wondering if there is any environment related issue. > > On Sun, Dec 13, 2015 at 3:38 PM, Ted Yu wrote: > >> Thanks for checking, Yin. >> >> Looks like the cause might be in one of the commits for build #4438 >> >> Cheers >> >>

Re: spark with label nodes in yarn

2015-12-15 Thread Ted Yu
Please take a look at: https://issues.apache.org/jira/browse/SPARK-7173 Cheers > On Dec 15, 2015, at 1:23 AM, 张志强(旺轩) wrote: > > Hi all, > > Has anyone tried label based scheduling via spark on yarn? I’ve tried that, > it didn’t work, spark 1.4.1 + apache hadoop 2.6.0 > > Any feedbacks are

Re: status of 2.11 support?

2015-12-15 Thread Ted Yu
Please see related JIRA: https://issues.apache.org/jira/browse/SPARK-8013 This question is better suited for user mailing list. Thanks On Mon, Dec 14, 2015 at 10:29 PM, Sachin Aggarwal < different.sac...@gmail.com> wrote: > Hi, > > > adding question from user group to dev group need expert advi

Re: spark with label nodes in yarn

2015-12-15 Thread Ted Yu
I was blocked to get the YARN containers by setting > spark.yarn.executor.nodeLabelExpression property. My question, > https://issues.apache.org/jira/browse/SPARK-7173 will fix this? > > > > Thanks > > Allen > > > > > > *From:* Ted Yu [mailto:yuzhih...@gmail.co

Re: spark with label nodes in yarn

2015-12-15 Thread Ted Yu
ny labels. > > > > It’s weird to me that YARN page shows my application is running, but > actually it is still waiting for its executor > > > > See the attached. > > > > Thanks, > > Allen > > > > *From:* Saisai Shao [mailto:sai.sai.s...@gmail.

Re: does spark really support label expr like && or || ?

2015-12-16 Thread Ted Yu
Allen: Since you mentioned scheduling, I assume you were talking about node label support in YARN. If that is the case, can you give us some more information: How node labels are setup in YARN cluster How you specified node labels in application Hadoop and Spark releases you are using Cheers >

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-16 Thread Ted Yu
Ran test suite (minus docker-integration-tests) All passed +1 [INFO] Spark Project External ZeroMQ .. SUCCESS [ 13.647 s] [INFO] Spark Project External Kafka ... SUCCESS [ 45.424 s] [INFO] Spark Project Examples . SUCCESS [02:06

Re: does spark really support label expr like && or || ?

2015-12-17 Thread Ted Yu
.jar > > so , my question is does the spark.yarn.executor.nodeLabelExpression > and spark.yarn.am.nodeLabelExpression really support "EXPRESSION" like and > &&, or ||, or even ! and so on. > > NOTE: > I didn't change the capacity-scheduler.xml at all,

Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour

2015-12-21 Thread Ted Yu
In Jerry's example, the first SparkContext, sc, has been stopped. So there would be only one SparkContext running at any given moment. Cheers On Mon, Dec 21, 2015 at 8:23 AM, Chester @work wrote: > Jerry > I thought you should not create more than one SparkContext within one > Jvm, ... > C

Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-22 Thread Ted Yu
Running test suite, there was timeout in hive-thriftserver module. This has been fixed by SPARK-11823. So I assume this is test issue. lgtm On Tue, Dec 22, 2015 at 2:28 PM, Benjamin Fradet wrote: > +1 > On 22 Dec 2015 9:54 p.m., "Andrew Or" wrote: > >> +1 >> >> 2015-12-22 12:43 GMT-08:00 Reyn

Re: [DAGScheduler] resubmitFailedStages, failedStages.clear() and submitStage

2015-12-24 Thread Ted Yu
getMissingParentStages(stage) would be called for the stage (being re-submitted) If there is no missing parents, submitMissingTasks() would be called. If there is missing parent(s), the parent would go through the same flow. I don't see issue in this part of the code. Cheers On Thu, Dec 24, 201

recurring test failures against hadoop-2.4 profile

2015-12-25 Thread Ted Yu
Hi, You may have noticed the following test failures: org.apache.spark.sql.hive.execution.HiveUDFSuite.UDFIntegerToString org.apache.spark.sql.hive.execution.SQLQuerySuite.udf_java_method Tracing backwards, they started failing since this build: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Ma

Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

2015-12-25 Thread Ted Yu
I found that SBT build for Scala 2.11 has been failing ( https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-branch-1.6-COMPILE-SBT-SCALA-2.11/3/consoleFull ) I logged SPARK-12527 and sent a PR. FYI On Tue, Dec 22, 2015 at 12:10 PM, Michael Armbrust wrote: > Please vote

Re: Akka with Spark

2015-12-26 Thread Ted Yu
Do you mind sharing your use case ? It may be possible to use a different approach than Akka. Cheers On Sat, Dec 26, 2015 at 10:08 AM, Disha Shrivastava wrote: > Hi, > > I wanted to know how to use Akka framework with Spark starting from > basics. I saw online that Spark uses Akka framework bu

Re: Akka with Spark

2015-12-27 Thread Ted Yu
scale them independently. So consider streaming data >>>> from Akka to Spark Streaming or go the other way, from Spark to Akka >>>> Streams. >>>> >>>> dean >>>> >>>> Dean Wampler, Ph.D. >>>> Author: Programming Scala,

Re: what is the best way to debug spark / mllib?

2015-12-27 Thread Ted Yu
For #1, 9 minutes seem to be normal. Here was duration for recent build on master branch: [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 10:44 mi

Re: Is there any way to stop a jenkins build

2015-12-29 Thread Ted Yu
HiveThriftBinaryServerSuite got stuck. I thought Josh has fixed this issue: [SPARK-11823][SQL] Fix flaky JDBC cancellation test in HiveThriftBinaryServerSuite On Tue, Dec 29, 2015 at 9:56 AM, Herman van Hövell tot Westerflier < hvanhov...@questtec.nl> wrote: > My AMPLAB jenkins build has been s

Re: Is there any way to stop a jenkins build

2015-12-29 Thread Ted Yu
, 2015 at 10:04 AM, Herman van Hövell tot Westerflier < > hvanhov...@questtec.nl> wrote: > >> Thanks. I'll merge the most recent master... >> >> Still curious if we can stop a build. >> >> Kind regards, >> >> Herman van Hövell tot Westerflier

IndentationCheck of checkstyle

2015-12-29 Thread Ted Yu
Hi, I noticed that there are a lot of checkstyle warnings in the following form: To my knowledge, we use two spaces for each tab. Not sure why all of a sudden we have so many IndentationCheck warnings: grep 'hild have incorrect indentati' trunkCheckstyle.xml | wc 3133 52645 678294 If th

Re: IndentationCheck of checkstyle

2015-12-29 Thread Ted Yu
Oops, wrong list :-) > On Dec 29, 2015, at 9:48 PM, Reynold Xin wrote: > > +Herman > > Is this coming from the newly merged Hive parser? > > > >> On Tue, Dec 29, 2015 at 9:46 PM, Allen Zhang wrote: >> >> >> format issue I think, go ahead >

Re: IndentationCheck of checkstyle

2015-12-30 Thread Ted Yu
Right. Pardon my carelessness. > On Dec 29, 2015, at 9:58 PM, Reynold Xin wrote: > > OK to close the loop - this thread has nothing to do with Spark? > > >> On Tue, Dec 29, 2015 at 9:55 PM, Ted Yu wrote: >> Oops, wrong list :-) >> >>> On De

Re: Spark streaming 1.6.0-RC4 NullPointerException using mapWithState

2015-12-30 Thread Ted Yu
I went through StateMap.scala a few times but didn't find any logic error yet. According to the call stack, the following was executed in get(key): } else { parentStateMap.get(key) } This implies that parentStateMap was null. But it seems parentStateMap is properly assigned in readO

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Ted Yu
+1 > On Jan 5, 2016, at 10:49 AM, Davies Liu wrote: > > +1 > > On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas > wrote: >> +1 >> >> Red Hat supports Python 2.6 on REHL 5 until 2020, but otherwise yes, Python >> 2.6 is ancient history and the core Python developers stopped supporting it >> in

Re: Tungsten in a mixed endian environment

2016-01-12 Thread Ted Yu
I logged SPARK-12778 where endian awareness in Platform.java should help in mixed endian set up. There could be other parts of the code base which are related. Cheers On Tue, Jan 12, 2016 at 7:01 AM, Adam Roberts wrote: > Hi all, I've been experimenting with DataFrame operations in a mixed > e

Re: Dependency on TestingUtils in a Spark package

2016-01-12 Thread Ted Yu
There is no annotation in TestingUtils class indicating whether it is suitable for consumption by external projects. You should assume the class is not public since its methods may change in future Spark releases. Cheers On Tue, Jan 12, 2016 at 12:36 PM, Robert Dodier wrote: > Hi, > > I'm putt

Re: Spark 1.6.0 and HDP 2.2 - problem

2016-01-13 Thread Ted Yu
I would suggest trying option #1 first. Thanks > On Jan 13, 2016, at 2:12 AM, Maciej Bryński wrote: > > Hi, > I'm trying to run Spark 1.6.0 on HDP 2.2 > Everything was fine until I tried to turn on dynamic allocation. > According to instruction I need to add shuffle service to yarn classpath.

Re: timeout in shuffle problem

2016-01-24 Thread Ted Yu
Cycling past bits: http://search-hadoop.com/m/q3RTtU5CRU1KKVA42&subj=RE+shuffle+FetchFailedException+in+spark+on+YARN+job On Sun, Jan 24, 2016 at 5:52 AM, wangzhenhua (G) wrote: > Hi, > > I have a problem of time out in shuffle, it happened after shuffle write > and at the start of shuffle read,

Re: BUILD FAILURE at spark-sql_2.11?!

2016-01-27 Thread Ted Yu
Strangely both Jenkins jobs showed green status: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-master-COMPILE-sbt-SCALA-2.11/ https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/SPARK-master-COMPILE-MAVEN-SCALA-2.11/ On Wed, Jan 27, 2016 at 12:47 AM,

Re: Spark 1.6.0 + Hive + HBase

2016-01-28 Thread Ted Yu
For the last two problems, hbase-site.xml seems not to be on classpath. Once hbase-site.xml is put on classpath, you should be able to make progress. Cheers > On Jan 28, 2016, at 1:14 AM, Maciej Bryński wrote: > > Hi, > I'm trying to run SQL query on Hive table which is stored on HBase. > I'

Re: 回复: Spark 1.6.0 + Hive + HBase

2016-01-28 Thread Ted Yu
alsIgnoreCase("string"); > > String tsColName = null; > if (iTimestamp >= 0) { > tsColName = > jobConf.get(serdeConstants.LIST_COLUMNS).split(",")[iTimestamp]; > } > > > > -- Original Message -- > *From:* "Jörn Franke";;
