Before opening a JIRA for the doc change, I thought we could discuss it here, unless I am misunderstanding how JIRA should be used. My thinking is that the documentation is more than a dev-only concern, and that this "exception" to how we normally document features like Avro is warranted by how Avro support was explained in the past. For this case, I was thinking we establish this alternative doc page, in sync with JIRA, to help turn this issue around. By putting the discussion here, I am hoping to reach more than just JIRA users in asking for opinions/thoughts on the subject.
John

On Sat, Apr 2, 2016 at 10:43 AM, Bob Rumsby <brum...@maprtech.com> wrote:
> Hi John,
>
> I recommend opening a JIRA for the suggested doc updates. Others may have different opinions on what to document.
>
> Thanks,
> Bob
>
> On Sat, Apr 2, 2016 at 6:38 AM, John Omernik <j...@omernik.com> wrote:
> > This has been an interesting topic, and I am sorry I could not participate more since my original post due to traveling. Stefán is obviously frustrated, and I can empathize with him. Being in a position of making architectural decisions as well, I know it can be difficult to define a strategy for your org based on the available documentation, be willing to work through problems (these are "new" projects), and still feel like you are yelling into a canyon. The level of frustration there is real. I do think, as mentioned, that the documentation for Avro should be updated ASAP.
> >
> > To that end, here is a recommendation: Avro needs to be called out as experimental. On the documentation page, under "Query Data -> Querying a File System", let's add "Querying Avro Files". On this page, I think we should state in the first paragraph that Avro support has been moved to experimental, and that the Drill project is currently working through the following problems with Avro files. Basically, let's take Stefán's list and outline the problems, the JIRAs, and the errors that come up, as well as what works and how it works. I am willing to work on this with Stefán. My reasoning is this: Avro support has obviously been implied in the docs thus far, and others who chose Avro based on the documentation may be going down a path like Stefán's and may end up in a similarly frustrated state. I want to avoid that. This situation has caused community tension, and it does nothing for the project if we don't look to fix it.
> > Yes, this is a different approach than we take for other "experimental" features in Drill, but I feel that, to avoid this situation, particularly with Avro, it makes sense to call this out.
> >
> > Now, this does not fix Stefán's current problem. As a user and community member who doesn't code Java, I often struggle to balance asking for help/changes with the fact that I personally can't force that change or write it myself, and so I look for other ways to contribute. Stefán has been contributing, and I do think we need to acknowledge that. We are all busy and we all have commitments, from the developers, to those with day jobs, to Stefán in his job. In this situation it's easy to point fingers and pass the blame around, and yet I don't think any individual is completely to blame; a confluence of circumstances has contributed here. Frustrations are high, but we can handle this, and I think we should be able to handle it in a way that ends positively for Drill, for the community, and for Stefán. To that end, here are my suggestions for discussion:
> >
> > 1. My documentation suggestion above. It states clearly that Avro is experimental and lets users know the risks of using it. As issues get knocked off the list, we can track them there as well as in JIRA. This is "extra" work, and one may ask, "Why can't we just use JIRA?" I think that, since the documentation has been wrong on this in the past, we should respond by using the documentation in this special case to pull out of the situation. I commit to helping with this by facilitating the Avro page; I just need discussion and approval to go this route, and someone with access to change the pages to work with me. In addition, it may help draw in others with Java/Avro knowledge to contribute to some of the fixes.
> > 2. Going forward, let's make sure we consider the challenges of new features like this and mark them as experimental for a while. For new plugins/readers, we could develop a process where we mark them as experimental for a number of releases to help work out test cases from users. The issues users bring up will help identify bugs as well as test cases we can add to the code, not only to ensure solid interfaces but also to help prevent regressions in future releases.
> > 3. I know this one is asking a lot, but if 1 and 2 seem reasonable, let's roll up our sleeves on the Avro work. Identify the "I can't use this" issues and separate them from the "I really want this" issues for prioritization, then work to resolve the issues starting with the blockers. For the Drill project, the docs' presentation of Avro as supported is "our bad", and instead of going back and forth over resources, commitments, etc. on the user lists (which in my opinion really hurts the community), we say, "This sucks, it puts everyone in a bad position, let's steer out of this and get on track." Based on the responses so far, I don't think this is unreasonable. I think Stefán, while I don't speak for him, understands the nature of what the community can provide to "him" and that the community doesn't work for him; at the same time, this is a really good opportunity for us to band together and right the course here.
> >
> > I welcome discussion here. Jacques and Julian, I know there are some challenges around topics like this; you've outlined them, and I can't disagree with your points. At the same time, I don't think anyone is saying that the project path, the project itself, or anything Dremio, MapR, or individual committers are doing is at fault or should be responsible for fixing things on their own.
> > I think, as I've stated before, we have a confluence of little things that have added up, and in the end, looking for a community solution is our best path.
> >
> > Cheers,
> >
> > John
> >
> > On Sat, Apr 2, 2016 at 1:37 AM, Stefán Baxter <ste...@activitystream.com> wrote:
> > > Hi Jason,
> > >
> > > Thank you for writing this up, it's appreciated.
> > >
> > > First things first. We would be more than happy to help with these Avro-related issues, but the Drill code base is quite complex, with a fairly steep learning curve, and lately a lot of my time has been spent dealing with the repercussions of having decided to use Avro for fresh/inbound data. (I realize some here might not see this as a contribution, but I beg to differ. Any project requires regular users to put in the time to adapt new/unhardened projects to their solutions, and in the case of using Avro with Drill it's been more like testing and duct-taping than a "simple adaptation of free software".)
> > >
> > > I find these Avro problems interesting for other reasons as well:
> > >
> > > - They raise the question of the commitment behind accepting a plugin like this (and not marking it experimental)
> > >
> > > - There are design decisions that I think are very wrong
> > >   - enforcing a schema looks to me like a serious violation of the (nowhere to be found) "Drill Manifesto" that I have asked about
> > >   - see the original entry
> > >
> > > - The level of noise required to get feedback on a topic like this
> > >   - I apologize to everyone, but ask them to appreciate that this provocative approach was by no means the first option
> > >
> > > As a "user" I'm obviously not in a position to call for or insist on having these things addressed, but perhaps that changes with time.
> > >
> > > Now on to fixing the outstanding bugs.
> > > If someone can point us in the right direction and discuss the best approach to fixing each bug, then we can at least try to help (and we do so gladly).
> > >
> > > It's at least clear to me that many Drill users, those working with streaming data, need support for a schema-capable format to store their inbound/fresh data before it's converted into Parquet. Currently there seems to be no real alternative.
> > >
> > > So, if we can help, we are willing, and I suggest that, if you want, we take this to JIRA and try to work it from there.
> > >
> > > Regards,
> > > -Stefán
> > >
> > > On Fri, Apr 1, 2016 at 9:56 PM, Jason Altekruse <ja...@dremio.com> wrote:
> > > > I take some responsibility for your lack of response on this, because I had said I would try to take a look at the dirN issue that has been outstanding for some time with Avro. This might have prevented others from jumping in to help, and I will work on communicating when I don't have time to work on something I raise my hand for.
> > > >
> > > > That being said, there are lots of parts of Drill that still need attention. As far as I know, you are the only active user of the Avro support. Even so, I have been trying to make the feature usable for you and for other possible users, like John.
> > > >
> > > > One thing that would likely be worth discussing as a follow-up is our expectations for the quality of code we accept from contributors. There were several issues with Avro when it was merged, and no one ever really took on the task of fully testing it.
> > > > I do know there is another issue around a lack of responses to recent requests, but I'm tabling that for a little bit.
> > > > I would like to see it discussed, but I want to keep this discussion scoped for now.
> > > >
> > > > I don't think the plugin is far from complete, and I have been working to improve the tests each time I fix an issue with it. I think it would be very useful for us to define a clear set of criteria for a feature like a format plugin to be considered fully tested and ready for inclusion in the core project. This would both help users avoid issues and give a clearer definition of the task of writing a format plugin. That kind of community contribution should be easier and more strongly encouraged than it is today, and it could really help new users adopt Drill if they are using other data formats.
> > > >
> > > > Jason Altekruse
> > > > Software Engineer at Dremio
> > > > Apache Drill Committer
> > > >
> > > > On Fri, Apr 1, 2016 at 1:42 PM, Stefán Baxter <ste...@activitystream.com> wrote:
> > > > > Yes Parth, you are 100% right and we are willing to help.
> > > > >
> > > > > The relationship one builds with a community also depends on the "vibe/feeling" of the community, and I know it reflects on me here, as well as on the community, that many of my attempts to help and to get help have not been fruitful.
> > > > >
> > > > > I also acknowledge that this topic gets me frustrated and that my manners could easily improve, but it's not as if that is a "first response"; it's an eventual state caused by indifference on one side and determination to get some response on the other.
> > > > >
> > > > > Marking Avro as experimental is considerate towards new users, and something I wish had been in place before we decided to depend on it and spend all this time trying to make it work.
> > > > >
> > > > > Ideally, for us, the decision would be to support Avro properly.
> > > > >
> > > > > My +1 is for improving Avro support so that it can truly be used as an interim file format before data is converted to Parquet. (I see no real alternative here.)
> > > > >
> > > > > - Stefán
> > > > >
> > > > > On Fri, Apr 1, 2016 at 8:25 PM, Parth Chandra <par...@apache.org> wrote:
> > > > > > +1 on marking Avro experimental.
> > > > > >
> > > > > > @Stefan, we have been trying to help you as much as our time permits. I know that I held up the 1.6 release while Jason fixed the issues you brought up. As was said earlier, this is personal time we are spending to help users in the community, so providing an immediate response to everyone is difficult. Ultimately, it boils down to the relationships one builds within the community. Folks with shared goals help each other, and everyone benefits.
> > > > > >
> > > > > > On Fri, Apr 1, 2016 at 11:10 AM, Jacques Nadeau <jacq...@dremio.com> wrote:
> > > > > > > Stefan,
> > > > > > >
> > > > > > > It makes sense to me to mark the Avro plugin experimental. Clearly, there are bugs. I also want to note that your requirements and expectations haven't always been in alignment with what the Avro plugin developers built/envisioned (especially around schemas). As part of trying to address these gaps, I'd like to ask again for you to provide actual data and test cases so we can make sure the Avro plugin includes them as future test cases.
> > > > > > > (This is absolutely the best way to ensure that the project continues to work for your use case.)
> > > > > > >
> > > > > > > The bigger issue I see here is that you expect the community to spend time doing what you want. You have already received a lot of that via free support and numerous bug fixes by myself, Jason and others. You need to remember: this community is run by a bunch of volunteers. Everybody here has a day job. A lot of the time I spend in the community comes at the cost of my personal life. For others, it is the same.
> > > > > > >
> > > > > > > This is a good place to ask for help, but you should never demand it. If you want paid support, I know Ted offered this from MapR, and I'm sure that if you went that route, your issues would get addressed very quickly. If you don't want to go that route, then I suggest that you help by creating more example data and test cases, and by focusing on the issues that are most important for you to solve. From there, you can continue to expect that people will help you, as they can. There are no guarantees in open source. Everything comes through the kindness and shared goals of those in the community.
> > > > > > >
> > > > > > > thanks,
> > > > > > > Jacques
> > > > > > >
> > > > > > > --
> > > > > > > Jacques Nadeau
> > > > > > > CTO and Co-Founder, Dremio
> > > > > > >
> > > > > > > On Fri, Apr 1, 2016 at 5:43 AM, Stefán Baxter <ste...@activitystream.com> wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Is it at all possible that we are the only company trying to use Avro with Drill to any serious extent?
> > > > > > > >
> > > > > > > > We continue to come across all sorts of embarrassing shortcomings, like the one we are dealing with now, where a schema change exception is thrown even when working with a single Avro file (that has the same schema).
> > > > > > > >
> > > > > > > > Can a non-project member call for a discussion on this topic and on the level of support offered for Avro in Drill?
> > > > > > > >
> > > > > > > > My discussion topics would be:
> > > > > > > >
> > > > > > > > - Strange schema validation that ... :
> > > > > > > >   ... currently fails on a single file
> > > > > > > >   ... prevents dirX variables from working
> > > > > > > >   ... would require Drill to scan all Avro files to establish the schema (even when pruning would be used)
> > > > > > > >   ... would ALWAYS fail for old queries if an old Avro file, containing the original fields, was removed and could not be scanned
> > > > > > > >   ... does not rhyme with the "eliminate ETL" and "Evolving Schema" goals of Drill
> > > > > > > >
> > > > > > > > - Simple union types do not work to declare nullable fields
> > > > > > > >
> > > > > > > > - Drill cannot read Parquet that is created by parquet-mr-avro
> > > > > > > >
> > > > > > > > - What is the intention for Avro in Drill?
> > > > > > > >   - Should we choose some other format to buffer/batch data before creating a Parquet file from it?
> > > > > > > >
> > > > > > > > - The culture here regarding talking about boring/hard topics like this
> > > > > > > >   - Where serious complaints/issues are met with silence
> > > > > > > >   - I know full well that my frustration shines through here and that it's not helping, but this Drill+Avro mess is really getting to be too much for us to handle
> > > > > > > >
> > > > > > > > I look forward to discussing this here or during the next hangout.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > -Stefán (or ... mr. old & frustrated)