ORC separate project

2015-03-19 Thread Owen O'Malley
All, Over the last year, there has been a fair number of projects that want to integrate with ORC, but don't want a dependence on Hive's exec jar. Additionally, we've been working on a C++ reader (and soon writer) and it would be great to host them both in the same project. Toward that end, I'd

Re: ORC separate project

2015-03-19 Thread Xuefu Zhang
Hi Owen, I'd like to get involved. Thanks, Xuefu On Thu, Mar 19, 2015 at 2:44 PM, Owen O'Malley wrote: > All, >Over the last year, there has been a fair number of projects that want > to integrate with ORC, but don't want a dependence on Hive's exec jar. > Additionally, we've been working

Re: ORC separate project

2015-03-19 Thread Nick Dimiduk
This is a great plan, +1! On Thursday, March 19, 2015, Owen O'Malley wrote: > All, >Over the last year, there has been a fair number of projects that want > to integrate with ORC, but don't want a dependence on Hive's exec jar. > Additionally, we've been working on a C++ reader (and soon wri

Re: ORC separate project

2015-03-20 Thread Lefty Leverenz
Count me in. -- Lefty On Thu, Mar 19, 2015 at 9:19 PM, Nick Dimiduk wrote: > This is a great plan, +1! > > > On Thursday, March 19, 2015, Owen O'Malley wrote: > >> All, >>Over the last year, there has been a fair number of projects that want >> to integrate with ORC, but don't want a depe

Re: ORC separate project

2015-03-20 Thread Mostafa Mokhtar
Hi Owen, Please add me as well. Thanks Mostafa On 3/19/15, 3:21 PM, "Xuefu Zhang" wrote: >Hi Owen, > >I'd like to get involved. > >Thanks, >Xuefu > >On Thu, Mar 19, 2015 at 2:44 PM, Owen O'Malley wrote: > >> All, >>Over the last year, there has been a fair number of projects that >>want >

RE: ORC separate project

2015-03-23 Thread Lalam, Chinna R
Hi Owen, I'd like to get involved. Please add me as well. Thanks, Chinna Rao Lalam -- Forwarded message -- From: Owen O'Malley <omal...@apache.org> Date: Fri, Mar 20, 2015 at 3:14 AM Subject: ORC separate project To: "dev@hive.apache.org<ma

Re: ORC separate project

2015-03-31 Thread Owen O'Malley
All, Moving this forward, I'll submit a resolution to the Apache board for the next meeting. One of the concerns that has been mentioned is how to deal with the vectorization and SARG APIs. I'd like to propose that we pull the minimal set of classes in a new Hive module named "storage-api". This

Re: ORC separate project

2015-04-01 Thread Carl Steinbach
Hi Owen, I think you're referring to the following questions I asked last week on the PMC mailing list: 1) How much if any of the code for vectorization/sargs/ACID will migrate over to the new ORC project. 2) Will Hive contributors encounter situations where they are required to make changes to

Re: ORC separate project

2015-04-01 Thread Alan Gates
Carl Steinbach April 1, 2015 at 0:01 Hi Owen, I think you're referring to the following questions I asked last week on the PMC mailing list: 1) How much if any of the code for vectorization/sargs/ACID will migrate over to the new ORC project. 2) Will Hive contr

Re: ORC separate project

2015-04-01 Thread Owen O'Malley
On Wed, Apr 1, 2015 at 10:10 AM, Alan Gates wrote: > > > Carl Steinbach > April 1, 2015 at 0:01 > > Hi Owen, > > I think you're referring to the following questions I asked last week on > the PMC mailing list: > > 1) How much if any of the code for vectorization/sargs/ACID will migrate > over

Re: ORC separate project

2015-04-01 Thread Nick Dimiduk
I think the storage-api would be very helpful for HBase integration as well. On Wed, Apr 1, 2015 at 11:22 AM, Owen O'Malley wrote: > > > On Wed, Apr 1, 2015 at 10:10 AM, Alan Gates wrote: > >> >> >> Carl Steinbach >> April 1, 2015 at 0:01 >> >> Hi Owen, >> >> I think you're referring to the

Re: ORC separate project

2015-04-02 Thread Edward Capriolo
To reiterate, one thing I want to avoid is having Hive rely on code that sits in several tiny silos across Apache projects, or Apache Licensed but not ASF projects. Hive is a mature TLP with a large number of committers and it would not be a good situation if work often gets bottlenecked because c

Re: ORC separate project

2015-04-02 Thread Szehon Ho
I also agree with this goal. As such, I think we should first see the proposal (JIRA?) for the storage-api refactoring and other related work of ORC separating as a TLP before the actual separation happens, to make sure the separation is not done in a way that takes us further from this goal. It may ve

Re: ORC separate project

2015-04-03 Thread Xuefu Zhang
I actually have a different thought to share along the same line. ORC is not a subproject in Hive. I'm not sure that performing surgery on Hive in order to make ORC a TLP is the best we can do. Not only may this bring instability to Hive, but it also makes Hive depend on an incubating project.

Re: ORC separate project

2015-04-03 Thread Alan Gates
A couple of points: 1) ORC isn't going into the incubator. The proposal before the board is for it to go straight to TLP. There's no graduation to depend on. 2) As currently proposed, Hive would not depend on ORC to build. Hive users who wished to use ORC would obviously need to pull in ORC

Re: ORC separate project

2015-04-03 Thread Lefty Leverenz
> > Hive users who wished to use ORC would obviously need to pull in ORC > artifacts in addition to Hive. > What would happen with Hive features that (currently) only work with ORC? Would they be extended to work with other file formats and stay in Hive? What about future features -- would they ha

Re: ORC separate project

2015-04-03 Thread Lefty Leverenz
I guess I'm echoing previous concerns, in less technical language. (Should have reread the thread before sending.) -- Lefty On Fri, Apr 3, 2015 at 4:25 PM, Lefty Leverenz wrote: > Hive users who wished to use ORC would obviously need to pull in ORC >> artifacts in addition to Hive. >> > > What

Re: ORC separate project

2015-04-03 Thread Thejas Nair
On Fri, Apr 3, 2015 at 1:25 PM, Lefty Leverenz wrote: > Hive users who wished to use ORC would obviously need to pull in ORC >> artifacts in addition to Hive. >> > > What would happen with Hive features that (currently) only work with ORC? > Would they be extended to work with other file formats

Re: ORC separate project

2015-04-06 Thread Brock Noland
Hey guys, Good discussion here. One point of order, I feel like this should be a [DISCUSS] thread. Some folks filter on that specific text as it's quite standard in Apache to use that subject prefix for big issues like this one. Brock On Fri, Apr 3, 2015 at 3:59 PM, Thejas Nair wrote: > On Fri,

Re: ORC separate project

2015-04-06 Thread Lefty Leverenz
Is there a way to change this to a DISCUSS thread? Or could everything be copied into a new thread? Or just start a new thread with a reference to this one? -- Lefty On Tue, Apr 7, 2015 at 2:26 AM, Brock Noland wrote: > Hey guys, > > Good discussion here. One point of order, I feel like this

Re: ORC separate project

2015-04-07 Thread Xuefu Zhang
If I understood Alan's #2 comment, we are moving existing ORC code out of Hive and making it a separate project, which I definitely missed. Since the existing Hive PMC has governance over the code, I would expect that to still be the case even after the spinoff. Obviously the proposal doesn't reflect this. Tha

Re: ORC separate project

2015-04-07 Thread Lefty Leverenz
Actually not so -- a spin-off project would have its own PMC and the Hive PMC wouldn't have any say-so. Of course, there would be some overlap of the two PMCs. (I'm not even sure if the PMC has governance of code, technically. That might belong to the committers or the development community. We

Re: [DISCUSS] ORC separate project

2015-04-08 Thread Owen O'Malley
On Mon, Apr 6, 2015 at 11:26 PM, Brock Noland wrote: > Hey guys, > > Good discussion here. One point of order, I feel like this should be a > [DISCUSS] thread. Ok, I've edited the subject on this reply. At the very least, this will hit people's filters. .. Owen

Re: [DISCUSS] ORC separate project

2015-04-08 Thread Owen O'Malley
On Tue, Apr 7, 2015 at 8:49 PM, Xuefu Zhang wrote: > If I understood Alan's #2 comment, we are moving existing ORC code out of > Hive and making it a separate project, which I definitely missed. > I'm sorry that wasn't clear. Yes, most of the code that is currently in org.apache.hadoop.hive.ql.io

Re: [DISCUSS] ORC separate project

2015-04-10 Thread Xuefu Zhang
To Lefty's comment - Yes, anyone can take Apache code and make another project at will. However, for changes made to an existing project as part of that process, such as what Owen described for ORC in Hive, it is certainly something that Hive PMC can control or vote on. Nevertheless, that's not my

Re: [DISCUSS] ORC separate project

2015-04-10 Thread Gopal Vijayaraghavan
On 4/10/15, 8:05 PM, "Xuefu Zhang" wrote: >To Owen's explanation - Thanks. I guess my major concern is that we >seemingly are breaking apart Hive's integrity and making it hard to >release >and maintain due to increasing number of external dependents. Let's say >that Hive depends on a certain v

Re: [DISCUSS] ORC separate project

2015-04-11 Thread Lefty Leverenz
Speaking of the C++ ORC reader and writer, could they be included in the Hive project or do they have to be separate because they aren't Java code? By the way, gmail thwarts adding [DISCUSS] to the subject line. It shows up in the mail archives, although pre- & post-DISCUSS threads are separate.

Re: [DISCUSS] ORC separate project

2015-04-13 Thread Sergey Shelukhin
IMHO there are 2 separate concerns, forking ORC and Hive using "new" ORC. The first one does not really require vote, as discussed on private/board - anyone can fork part of code (in this case, at least). Then, for Hive switching to "new" ORC, I'm not sure that requires a vote either. We didn't vot

Re: [DISCUSS] ORC separate project

2015-04-15 Thread Owen O'Malley
On Mon, Apr 13, 2015 at 10:43 PM, Sergey Shelukhin wrote: > The 2nd concern about fixing issue quickly doesn't make sense - it can > happen with any dependency. What if guava or Kryo or Spark or Tez have a > bug? We can still ship Hive as long as the dependency can be updated to > correct version

Re: [DISCUSS] ORC separate project

2015-04-15 Thread Owen O'Malley
On Mon, Apr 13, 2015 at 10:43 PM, Sergey Shelukhin wrote: > The 2nd concern about fixing issue quickly doesn't make sense - it can > happen with any dependency. What if guava or Kryo or Spark or Tez have a > bug? We can still ship Hive as long as the dependency can be updated to > correct version