Re: Dataframe.fillna from 1.3.0

2015-04-24 Thread Olivier Girardot
done: https://github.com/apache/spark/pull/5683 and https://issues.apache.org/jira/browse/SPARK-7118 thx On Fri, Apr 24, 2015 at 07:34, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > I'll try thanks > > On Fri, Apr 24, 2015 at 00:09, Reynold Xin wrote: > >> You can do it simil

Re: Should we let everyone set Assignee?

2015-04-24 Thread Steve Loughran
I actually think the assignee JIRA issue is a minor detail; what really matters is do things get in and how. So far, in the bits I've worked on, I've not encountered any problems. And as I've stated in the hadoop-dev lists, my main concern there is long-standing patches that languish because n

Re: Contributing Documentation Changes

2015-04-24 Thread Sean Owen
I think that your own tutorials and such should live on your blog. The goal isn't to pull in a bunch of external docs to the site. On Fri, Apr 24, 2015 at 12:57 AM, madhu phatak wrote: > Hi, > As I was reading contributing to Spark wiki, it was mentioned that we can > contribute external links t

Design docs: consolidation and discoverability

2015-04-24 Thread Punyashloka Biswal
Dear Spark devs, Right now, design docs are stored on Google docs and linked from tickets. For someone new to the project, it's hard to figure out what subjects are being discussed, what organization to follow for new feature proposals, etc. Would it make sense to consolidate future design docs i

Re: Design docs: consolidation and discoverability

2015-04-24 Thread Sean Owen
That would require giving wiki access to everyone or manually adding people any time they make a doc. I don't see how this helps though. They're still docs on the internet and they're still linked from the central project JIRA, which is what you should follow. On Apr 24, 2015 8:14 AM, "Punyashlok

Stackoverflow in createDataFrame.

2015-04-24 Thread Jan-Paul Bultmann
Hey, I get a stack overflow when calling the following method on SQLContext.
    def createDataFrame(rowRDD: JavaRDD[Row], columns: java.util.List[String]): DataFrame = {
      createDataFrame(rowRDD.rdd, columns.toSeq)
    }

Re: Stackoverflow in createDataFrame.

2015-04-24 Thread Jan-Paul Bultmann
Sorry, missed that issue :) > On 24 Apr 2015, at 15:17, yash datta wrote: > > This is already reported : > > https://issues.apache.org/jira/browse/SPARK-6999 > > On 24 Apr 2015 18:11, "Jan-Paul Bultmann" > wrot

Re: Stackoverflow in createDataFrame.

2015-04-24 Thread yash datta
This is already reported : https://issues.apache.org/jira/browse/SPARK-6999 On 24 Apr 2015 18:11, "Jan-Paul Bultmann" wrote: > Hey, > I get a stack overflow when calling the following method on SQLContext. > > def createDataFrame(rowRDD: JavaRDD[Row], columns: > java.util.List[String]): DataFram
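The stack overflow in the reported method is consistent with the call resolving back to the same overload. If implicit conversions are in scope that turn an `RDD[Row]` back into a `JavaRDD[Row]` and a `Seq[String]` back into a `java.util.List[String]`, the compiler can silently pick `createDataFrame(JavaRDD[Row], java.util.List[String])` again. That reading is an assumption, not something stated in the thread. The sketch below reproduces the pattern in plain Scala with hypothetical stand-in types (`ScalaRows`, `JavaRows`, `BrokenApi`, `FixedApi`); they are not Spark classes.

```scala
import scala.language.implicitConversions

// Hypothetical stand-ins: ScalaRows plays the role of RDD[Row] + Seq[String],
// JavaRows the role of JavaRDD[Row] + java.util.List[String]. Not Spark classes.
final case class ScalaRows(values: Seq[String])

final case class JavaRows(values: java.util.List[String]) {
  // Explicit conversion to the Scala-side type (analogous to JavaRDD.rdd).
  def toScala: ScalaRows = {
    val b = Seq.newBuilder[String]
    val it = values.iterator()
    while (it.hasNext) b += it.next()
    ScalaRows(b.result())
  }
}

object JavaRows {
  // Implicit conversion in the companion object, so the compiler can apply it
  // wherever a JavaRows is expected but a ScalaRows is supplied.
  implicit def fromScala(rows: ScalaRows): JavaRows = {
    val out = new java.util.ArrayList[String]()
    rows.values.foreach(v => out.add(v))
    JavaRows(out)
  }
}

// Broken: no overload takes ScalaRows, so the argument is implicitly wrapped
// back into a JavaRows and the method calls itself until the stack overflows.
// (A non-final method in a class, so scalac does not rewrite the self-call
// into a loop.)
class BrokenApi {
  def count(rows: JavaRows): Int = count(rows.toScala)
}

// Fixed: a Scala-side overload exists, so the explicit conversion lands on it.
class FixedApi {
  def count(rows: ScalaRows): Int = rows.values.size
  def count(rows: JavaRows): Int = count(rows.toScala)
}
```

The general lesson, under these assumptions: a Java-friendly wrapper should forward to an overload that actually exists for the converted types; otherwise an implicit conversion can quietly route the call back into the wrapper itself.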

Re: Design docs: consolidation and discoverability

2015-04-24 Thread Cody Koeninger
My 2 cents - I'd rather see design docs in github pull requests (using plain text / markdown). That doesn't require changing access or adding people, and github PRs already allow for conversation / email notifications. Conversation is already split between jira and github PRs. Having a third str

Re: Contributing Documentation Changes

2015-04-24 Thread madhu phatak
Hi, I understand that. The following page http://spark.apache.org/documentation.html has an external tutorials and blogs section which points to other blog pages. I wanted to add mine there. Regards, Madhukara Phatak http://datamantra.io/ On Fri, Apr 24, 2015 at 5:17 PM, Sean Owen wrote: > I think th

Re: Should we let everyone set Assignee?

2015-04-24 Thread Ted Yu
bq. get newly created JIRAs posted onto a list (dev?) +1 On Fri, Apr 24, 2015 at 3:02 AM, Steve Loughran wrote: > > I actually think the assignee JIRA issue is a minor detail; what really > matters is do things get in and how. > > So far, in the bits I've worked on, I've not encountered any pro

Re: Should we let everyone set Assignee?

2015-04-24 Thread Patrick Wendell
It's a bit of a digression - but Steve's suggestion that we have a mailing list for new issues is a great idea and we can do it easily. We could have new-issues@s.a.o or something (we already have issues@s.a.o). - Patrick On Fri, Apr 24, 2015 at 9:50 AM, Ted Yu wrote: > bq. get newly created JIR

Re: Should we let everyone set Assignee?

2015-04-24 Thread Reynold Xin
I like that idea (having a new-issues list instead of directly forwarding them to dev). On Fri, Apr 24, 2015 at 11:08 AM, Patrick Wendell wrote: > It's a bit of a digression - but Steve's suggestion that we have a > mailing list for new issues is a great idea and we can do it easily. > We could

Jenkins down

2015-04-24 Thread shane knapp
jenkins is currently unreachable. i'm not entirely sure why, as i can't ssh in to the box and see what's going on. i've filed a ticket and will let everyone know when i have more information. shane

Re: Design docs: consolidation and discoverability

2015-04-24 Thread Reynold Xin
I'd love to see more design discussions consolidated in a single place as well. That said, there are many practical challenges to overcome. Some of them are out of our control: 1. For large features, it is fairly common to open a PR for discussion, close the PR taking some feedback into account, a

Re: Jenkins down

2015-04-24 Thread shane knapp
looks like we had a power failure on campus, and our datacenter is working to bring things back up: http://systemstatus.berkeley.edu/ On Fri, Apr 24, 2015 at 11:24 AM, shane knapp wrote: > jenkins is currently unreachable. i'm not entirely sure why, as i can't > ssh in to the box and see what'

Re: [discuss] new Java friendly InputSource API

2015-04-24 Thread Mingyu Kim
I see. So, the difference is that the InputSource is instantiated on the driver side and gets sent to the executors, whereas Hadoop’s InputFormats are instantiated via reflection on the executors. That makes sense. Thanks for the clarification! Mingyu From: Reynold Xin mailto:r...@databricks.c
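The distinction Mingyu summarizes can be sketched in plain Scala. The names below (`Source`, `PrefixSource`, `FixedSource`, `Shipping`) are hypothetical and this is not the proposed InputSource API. The first path serializes a fully configured object on the "driver" and deserializes it on the "executor"; the second ships only a class name and constructs the object reflectively. The second path is roughly why Hadoop-style classes need a no-arg constructor and receive their settings separately (Hadoop passes a `Configuration` for this).

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical reader interface; not Spark's or Hadoop's API.
trait Source extends java.io.Serializable {
  def read(): Seq[String]
}

// Configured through constructor arguments on the driver; the arguments
// travel with the serialized instance.
final class PrefixSource(prefix: String, n: Int) extends Source {
  def read(): Seq[String] = (0 until n).map(i => s"$prefix-$i")
}

// Reflection-friendly: a no-arg constructor, so it can be built from its name.
final class FixedSource extends Source {
  def read(): Seq[String] = Seq("fixed")
}

object Shipping {
  // Driver side, InputSource-style: serialize the configured instance.
  def serialize(obj: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.toByteArray
  }

  // Executor side, InputSource-style: deserialize and use it directly.
  def deserialize[T](data: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(data))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Executor side, InputFormat-style: only the class name was shipped, so the
  // object is constructed reflectively via its no-arg constructor.
  def instantiate(className: String): Source =
    Class.forName(className).getDeclaredConstructor().newInstance().asInstanceOf[Source]
}
```

One consequence of the serialized path, under these assumptions: everything captured by the instance (here `prefix` and `n`) must itself be serializable, whereas the reflective path trades that requirement for a fixed constructor shape and out-of-band configuration.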

Re: Design docs: consolidation and discoverability

2015-04-24 Thread Sean Owen
I think it's OK to have design discussions on github, as emails go to ASF lists. After all, loads of PR discussions happen there. It's easy for anyone to follow. I also would rather just discuss on Github, except for all that noise. It's not great to put discussions in something like Google Docs

Re: Design docs: consolidation and discoverability

2015-04-24 Thread Punyashloka Biswal
The Gradle dev team keep their design documents *checked into* their Git repository -- see https://github.com/gradle/gradle/blob/master/design-docs/build-comparison.md for example. The advantages I see to their approach are: - design docs stay on ASF property (since Github is synced to the

Re: Should we let everyone set Assignee?

2015-04-24 Thread Ulanov, Alexander
+1 for new issues in dev list Sent from my iPhone > On 24 Apr 2015, at 11:13, Reynold Xin wrote: > > I like that idea (having a new-issues list instead of directly forwarding > them to dev). > > > On Fri, Apr 24, 2015 at 11:08 AM, Patrick Wendell > wrote: > >> It's a bit of a digressio

Re: Should we let everyone set Assignee?

2015-04-24 Thread Mridul Muralidharan
This is a great suggestion - definitely makes sense to have it. Regards, Mridul On Fri, Apr 24, 2015 at 11:08 AM, Patrick Wendell wrote: > It's a bit of a digression - but Steve's suggestion that we have a > mailing list for new issues is a great idea and we can do it easily. > We could have new

Re: Design docs: consolidation and discoverability

2015-04-24 Thread Sandy Ryza
I think there are maybe two separate things we're talking about? 1. Design discussions and in-progress design docs. My two cents are that JIRA is the best place for this. It allows tracking the progression of a design across multiple PRs and contributors. A piece of useful feedback that I've go

Re: Design docs: consolidation and discoverability

2015-04-24 Thread Punyashloka Biswal
Sandy, doesn't keeping (in-progress) design docs in Git satisfy the history requirement? Referring back to my Gradle example, it seems that https://github.com/gradle/gradle/commits/master/design-docs/build-comparison.md is a really good way to see why the design doc evolved the way it did. When kee

Re: Design docs: consolidation and discoverability

2015-04-24 Thread Sean Owen
Only catch there is it requires commit access to the repo. We need a way for people who aren't committers to write and collaborate (for point #1) On Fri, Apr 24, 2015 at 3:56 PM, Punyashloka Biswal wrote: > Sandy, doesn't keeping (in-progress) design docs in Git satisfy the history > requirement?

Re: Design docs: consolidation and discoverability

2015-04-24 Thread Cody Koeninger
Why can't pull requests be used for design docs in Git if people who aren't committers want to contribute changes (as opposed to just comments)? On Fri, Apr 24, 2015 at 2:57 PM, Sean Owen wrote: > Only catch there is it requires commit access to the repo. We need a > way for people who aren't co

Re: Dataframe.fillna from 1.3.0

2015-04-24 Thread Reynold Xin
The changes look good to me. Jenkins is somehow not responding. Will merge once Jenkins comes back happy. On Fri, Apr 24, 2015 at 2:38 AM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > done : https://github.com/apache/spark/pull/5683 and > https://issues.apache.org/jira/browse/SPA

Re: Design docs: consolidation and discoverability

2015-04-24 Thread Patrick Wendell
Using our ASF git repository as a working area for design docs seems potentially concerning to me. It's difficult process-wise because all commits need to go through committers and also, we'd pollute our git history a lot with random incremental design updates. The git history is used a lot by

Re: Jenkins down

2015-04-24 Thread shane knapp
ok, power has been restored and jenkins is back up. we might be taking things down again to fix up some power mis-cabling (jon and i are in the colo, and the jenkins master wasn't on the UPS and needs to be). more updates as they come. sorry for the inconvenience. On Fri, Apr 24, 2015 at 11:33

[SQL][Feature] Access row by column name instead of index

2015-04-24 Thread Shuai Zheng
Hi All, I want to ask whether there is a plan to implement access to a SQL Row by column name. Currently we can only access a row by index (there is a Python API for access by name, but none for Java). Regards, Shuai --

Re: [SQL][Feature] Access row by column name instead of index

2015-04-24 Thread Reynold Xin
Can you elaborate what you mean by that? (what's already available in Python?) On Fri, Apr 24, 2015 at 2:24 PM, Shuai Zheng wrote: > Hi All, > > I want to ask whether there is a plan to implement the feature to access > the Row in sql by name? Currently we can only allow to access a row by > in

Re: Issue of running partitioned loading (RDD) in Spark External Datasource on Mesos

2015-04-24 Thread Yang Lei
Forwarding to dev. On Mon, Apr 20, 2015 at 10:46 AM, Yang Lei wrote: > I implemented two kinds of DataSource, one loads data during buildScan, > the other returns my RDD class with partition information for future > loading. > > My RDD's compute gets actorSystem from SparkEnv.get.actorSystem, the

Re: [SQL][Feature] Access row by column name instead of index

2015-04-24 Thread Michael Armbrust
Already done :) https://github.com/apache/spark/commit/2e8c6ca47df14681c1110f0736234ce76a3eca9b On Fri, Apr 24, 2015 at 2:37 PM, Reynold Xin wrote: > Can you elaborate what you mean by that? (what's already available in > Python?) > > > On Fri, Apr 24, 2015 at 2:24 PM, Shuai Zheng > wrote: > >

RE: [SQL][Feature] Access row by column name instead of index

2015-04-24 Thread Shuai Zheng
Great, that is exactly what I want. Now I will wait patiently for 1.4.0 :) Regards, Shuai From: Michael Armbrust [mailto:mich...@databricks.com] Sent: Friday, April 24, 2015 5:57 PM To: Reynold Xin Cc: Shuai Zheng; dev Subject: Re: [SQL][Feature] Access row by column name instead o
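Access by name, as Shuai asks for, can be understood as a lookup from column name to position in the row's schema, followed by ordinary positional access. A minimal sketch, assuming a row that carries its own field names; `NamedRow`, `fieldIndex` and `getAs` here are hypothetical, and this is not Spark's implementation or the code in the linked commit.

```scala
// Hypothetical stand-in for a SQL row that knows its schema; not Spark's Row.
final class NamedRow(fieldNames: Seq[String], values: Seq[Any]) {
  require(fieldNames.size == values.size, "schema and values must have the same arity")

  // Built per row here for simplicity; a real implementation would more likely
  // share one name-to-index map across all rows with the same schema.
  private val indexByName: Map[String, Int] = fieldNames.zipWithIndex.toMap

  // Positional access: the only style the original question says Java has.
  def get(i: Int): Any = values(i)

  // Name -> position; fails loudly for unknown column names.
  def fieldIndex(name: String): Int =
    indexByName.getOrElse(name, throw new IllegalArgumentException(s"no field named $name"))

  // By-name access is a lookup plus positional access. The cast is unchecked
  // (type erasure), so a wrong T surfaces as a ClassCastException at the caller.
  def getAs[T](name: String): T = get(fieldIndex(name)).asInstanceOf[T]
}
```

For example, `new NamedRow(Seq("name", "age"), Seq("alice", 30)).getAs[Int]("age")` looks up position 1 and returns the value stored there.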

Re: Jenkins down

2015-04-24 Thread shane knapp
ok, jenkins is back up and building. we have a few things to mop up here (ganglia is sad), but i think we'll be good for the afternoon. shane On Fri, Apr 24, 2015 at 2:17 PM, shane knapp wrote: > ok, power has been restored and jenkins is back up. we might be taking > things down again to fix

Re: Jenkins down

2015-04-24 Thread Reynold Xin
Thanks for looking into this, Shane. On Fri, Apr 24, 2015 at 3:18 PM, shane knapp wrote: > ok, jenkins is back up and building. we have a few things to mop up here > (ganglia is sad), but i think we'll be good for the afternoon. > > shane > > On Fri, Apr 24, 2015 at 2:17 PM, shane knapp wrote:

Re: Jenkins down

2015-04-24 Thread York, Brennon
Ditto to Reynold. Thanks a bunch for all the updates and work Shane! On 4/24/15, 3:25 PM, "Reynold Xin" wrote: >Thanks for looking into this, Shane. > >On Fri, Apr 24, 2015 at 3:18 PM, shane knapp wrote: > >> ok, jenkins is back up and building. we have a few things to mop up >>here >> (gangli

Re: Jenkins down

2015-04-24 Thread shane knapp
thanks everyone! happy friday! :) On Fri, Apr 24, 2015 at 3:37 PM, York, Brennon wrote: > Ditto to Reynold. Thanks a bunch for all the updates and work Shane! > > On 4/24/15, 3:25 PM, "Reynold Xin" wrote: > > >Thanks for looking into this, Shane. > > > >On Fri, Apr 24, 2015 at 3:18 PM, shane

Re: Design docs: consolidation and discoverability

2015-04-24 Thread Punyashloka Biswal
Okay, I can understand wanting to keep Git history clean, and avoid bottlenecking on committers. Is it reasonable to establish a convention of having a label, component or (best of all) an issue type for issues that are associated with design docs? For example, if we used the existing "Brainstormin

Re: Design docs: consolidation and discoverability

2015-04-24 Thread Sean Owen
I know I recently used Google Docs from a JIRA, so am guilty as charged. I don't think there are a lot of design docs in general, but the ones I've seen have simply pushed docs to a JIRA. (I did the same, mirroring PDFs of the Google Doc.) I don't think this is hard to follow. I think you can do w

Re: Issue of running partitioned loading (RDD) in Spark External Datasource on Mesos

2015-04-24 Thread Reynold Xin
This looks like a specific Spray configuration issue (or how Spray reads config files). Maybe Spray is reading some local config file that doesn't exist on your executors? You might need to email the Spray list. On Fri, Apr 24, 2015 at 2:38 PM, Yang Lei wrote: > forward to dev. > > On Mon, Apr

Call for papers at In-Memory Computing Summit SF 2015

2015-04-24 Thread Konstantin Boudnik
Guys, I wanted to reach out to make sure you're aware of the upcoming industry-first in-memory computing summit (http://imcsummit.org) that takes place in San Francisco on June 29-30, 2015. CFP closes on April 30th, so if you want to participate in the - submit your proposal by the deadline. It

Re: Issue of running partitioned loading (RDD) in Spark External Datasource on Mesos

2015-04-24 Thread Yang Lei
The configuration is in the jar I passed in. And if I do not create my own RDD for partitioned loading, everything is fine; in that case the task runs in the executor, right? So it seems some special call path before triggering my RDD compute makes the configuration 'lost'. I will try to see if I can d