Archiving XML test reports for analysis

2014-12-15 Thread Nicholas Chammas
Every time we run a test cycle on our Jenkins cluster, we generate hundreds of XML reports covering all the tests we have (e.g. `streaming/target/test-reports/org.apache.spark.streaming.util.WriteAheadLogSuite.xml`). These reports contain interesting information about whether tests succeeded or failed…
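Reports like these follow the common JUnit XML schema, which is easy to mine. A minimal sketch in Python — the sample XML below is invented for illustration (attribute names follow the usual JUnit schema, not a report from this thread):

```python
import xml.etree.ElementTree as ET

# Invented sample report in the common JUnit schema, for illustration only.
report = """<testsuite name="WriteAheadLogSuite" tests="3" failures="1" time="4.2">
  <testcase name="write ahead log - write" time="1.1"/>
  <testcase name="write ahead log - read" time="0.9"/>
  <testcase name="write ahead log - recover" time="2.2">
    <failure message="expected 3, got 2"/>
  </testcase>
</testsuite>"""

suite = ET.fromstring(report)
# A testcase counts as failed if it carries a <failure> child element.
failed = [tc.get("name") for tc in suite.iter("testcase")
          if tc.find("failure") is not None]
print(suite.get("tests"), "tests,", len(failed), "failed:", failed)
# 3 tests, 1 failed: ['write ahead log - recover']
```

Pointing the same loop at an archived directory of reports would give exactly the kind of pass/fail history the thread is after.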

Re: spark kafka batch integration

2014-12-15 Thread Cody Koeninger
For an alternative take on a similar idea, see https://github.com/koeninger/spark-1/tree/kafkaRdd/external/kafka/src/main/scala/org/apache/spark/rdd/kafka An advantage of the approach I'm taking is that the lower and upper offsets of the RDD are known in advance, so it's deterministic. I haven't…
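The determinism point can be illustrated with a toy sketch (the `OffsetRange` name and fields here are modeled on the idea described, not on the actual code in that branch): if each partition is pinned to a fixed half-open offset range up front, recomputing a partition always replays exactly the same messages.

```python
from collections import namedtuple

# Toy model of the idea, not the real API: each partition of the RDD is
# pinned to a fixed [from_offset, until_offset) range decided in advance.
OffsetRange = namedtuple("OffsetRange", "topic partition from_offset until_offset")

def partition_size(r):
    """Number of messages a partition will read -- known before the job runs."""
    return r.until_offset - r.from_offset

ranges = [
    OffsetRange("events", 0, 100, 250),
    OffsetRange("events", 1, 400, 400),  # empty range: also deterministic
]
total = sum(partition_size(r) for r in ranges)
print(total)  # 150, and the same 150 on every recomputation
```

Because the ranges are fixed, a lost partition can be recomputed from Kafka with identical contents, which is what makes the RDD semantics deterministic.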

Spark Web Site

2014-12-15 Thread Pei Chen
Hi Spark Dev, The cTAKES community was looking at revamping their web site and really liked the clean look of the Spark one. Is it just using Bootstrap and static html pages checked into SVN? Or are you using the Apache CMS somehow? Mind if we borrow the layout? --Pei

Re: Archiving XML test reports for analysis

2014-12-15 Thread shane knapp
right now, the following logs are archived on to the master:

local log_files=$(
  find . \
    -name "unit-tests.log" -o \
    -path "./sql/hive/target/HiveCompatibilitySuite.failed" -o \
    -path "./sql/hive/target/HiveCompatibilitySuite.hiveFailed" -o \
    -path "./sql/hive/target/Hive…

Re: Spark Web Site

2014-12-15 Thread Matei Zaharia
It's just Bootstrap checked into SVN and built using Jekyll. You can check out the raw source files from SVN at https://svn.apache.org/repos/asf/spark. IMO it's fine if you guys use the layout, but just make sure it doesn't look exactly the same, because otherwise both sites will look like they…

Re: Spark JIRA Report

2014-12-15 Thread Andrew Ash
Nick, Putting the N most stale issues into a report like your latest one does seem like a good way to tackle the wall-of-text effect that I'm worried about. On Sun, Dec 14, 2014 at 12:28 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote: > Taking after Andrew’s suggestion, perhaps the rep…

Test failures after Jenkins upgrade

2014-12-15 Thread Patrick Wendell
Hey All, It appears that a single test suite is failing after the Jenkins upgrade: "org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDDSuite". My guess is the suite is not resilient in some way to differences in the environment (JVM, OS version, or something else). I'm going to disable the…

Re: Test failures after Jenkins upgrade

2014-12-15 Thread Josh Rosen
There’s a JIRA for this: https://issues.apache.org/jira/browse/SPARK-4826 And two open PRs: https://github.com/apache/spark/pull/3695 https://github.com/apache/spark/pull/3701 We might be close to fixing this via one of those PRs, so maybe we should try using one of those instead? On December…

Re: Test failures after Jenkins upgrade

2014-12-15 Thread Patrick Wendell
Ah cool Josh - I think for some reason we are hitting this every time now. Since this is holding up a bunch of other patches, I just pushed something ignoring the tests as a hotfix. Even waiting for a couple hours is really expensive productivity-wise given the frequency with which we run tests. We…

Re: Test failures after Jenkins upgrade

2014-12-15 Thread Josh Rosen
Opened one more PR for this: https://github.com/apache/spark/pull/3704 On December 15, 2014 at 10:59:00 AM, Patrick Wendell (pwend...@gmail.com) wrote: Ah cool Josh - I think for some reason we are hitting this every time now. Since this is holding up a bunch of other patches, I just pushed…

Re: CrossValidator API in new spark.ml package

2014-12-15 Thread Xiangrui Meng
Yes, regularization path could be viewed as training multiple models at once. -Xiangrui On Sat, Dec 13, 2014 at 6:53 AM, DB Tsai wrote: > Okay, I got it. In Estimator, fit(dataset: SchemaRDD, paramMaps: Array[ParamMap]): Seq[M] can be overridden to implement regularization path. Correct me if…
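The shape of that overload — one call, several parameter maps, a sequence of models back — can be mimicked with a toy sketch. Everything here is an invented stand-in (the "model" is just a shrunk mean), not the spark.ml `Estimator` API:

```python
# Toy stand-in for fit(dataset, paramMaps) -> Seq[M]: one fitted "model"
# per regularization setting, computed in a single call over the data.
# The model here is deliberately trivial -- a mean shrunk toward zero,
# with larger reg shrinking harder -- purely to show the calling shape.
def fit(dataset, reg_params):
    mean = sum(dataset) / len(dataset)
    return [mean / (1.0 + reg) for reg in reg_params]

models = fit([2.0, 4.0, 6.0], reg_params=[0.0, 1.0, 3.0])
print(models)  # [4.0, 2.0, 1.0]
```

A real regularization-path implementation would additionally warm-start each fit from the previous one, which is why training the whole path in one call can be cheaper than fitting each model independently.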

Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-15 Thread Xiangrui Meng
Hi Krishna, Thanks for providing the notebook! I tried and found that the problem is with PySpark's zip. I created a JIRA to track the issue: https://issues.apache.org/jira/browse/SPARK-4841 -Xiangrui On Thu, Dec 11, 2014 at 1:55 PM, Krishna Sankar wrote: > K-Means iPython notebook & data attac…

Re: Spark JIRA Report

2014-12-15 Thread Nicholas Chammas
OK, that's good. Another approach we can take to controlling the number of stale JIRA issues is writing a bot that simply closes issues after N days of inactivity and prompts people to reopen the issue if it's still valid. I believe Sean Owen proposed that at one point (?). I wonder if that might…
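The selection logic for such a bot is small. A hedged toy sketch — the issue records, keys, and the 180-day threshold below are all invented, and a real bot would go through a JIRA client rather than a list of tuples:

```python
from datetime import datetime, timedelta

def find_stale(issues, now, max_idle_days=180):
    """Return keys of issues idle longer than max_idle_days --
    candidates for an auto-close with a 'reopen if still valid' comment."""
    cutoff = now - timedelta(days=max_idle_days)
    return [key for key, last_update in issues if last_update < cutoff]

now = datetime(2014, 12, 15)
issues = [
    ("SPARK-100", datetime(2014, 1, 1)),   # idle ~11 months -> stale
    ("SPARK-200", datetime(2014, 12, 1)),  # recently active -> keep
]
print(find_stale(issues, now))  # ['SPARK-100']
```

Tuning `max_idle_days` is the whole policy debate in the thread: too small and valid issues get closed, too large and the backlog stays a wall of text.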

Re: Archiving XML test reports for analysis

2014-12-15 Thread Nicholas Chammas
How about all of them? How much data per day would it roughly be if we uploaded all the logs for all these builds? Also, would Databricks be willing to offer up an S3 bucket for this purpose? Nick On Mon Dec 15 2014 at 11:48:44 AM shane knapp…
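The per-day volume question is a back-of-envelope calculation. All three numbers below are assumptions for illustration, not figures from the thread:

```python
# Hypothetical sizing estimate -- every input here is an assumption:
reports_per_build = 500   # "hundreds" of XML reports per test cycle
avg_report_kb = 20        # assumed average size of one JUnit XML file
builds_per_day = 100      # assumed daily Jenkins build volume

daily_mb = reports_per_build * avg_report_kb * builds_per_day / 1024
print(f"~{daily_mb:.0f} MB/day")  # ~977 MB/day under these assumptions
```

Even at roughly a gigabyte a day, a year of uncompressed reports is a few hundred gigabytes — small enough that a single donated S3 bucket would comfortably hold it, especially since XML compresses well.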

Re: Archiving XML test reports for analysis

2014-12-15 Thread shane knapp
i have no problem w/storing all of the logs. :) i also have no problem w/donated S3 buckets. :) On Mon, Dec 15, 2014 at 2:39 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote: > How about all of them? How much data per day would…

Re: Tachyon in Spark

2014-12-15 Thread Jun Feng Liu
Thanks for the response. I got the point - sounds like today's Spark lineage does not push to Tachyon lineage. Would be good to see how it works. Jun Feng Liu. Haoyuan Li…

Re: Governance of the Jenkins whitelist

2014-12-15 Thread Patrick Wendell
Hey Andrew, The list of admins is maintained by the AMPLab as part of their donation of this infrastructure. The reason why we need to have admins is that the pull request builder will fetch and then execute arbitrary user code, so we need to do a security audit before we can approve testing new p…