Prior to opening a JIRA for the doc change, I thought we could discuss it
here, unless I am misinterpreting how to use JIRA. My thought is that the
documentation is more than a dev-only thing, and this "exception" to how we
document Avro is due to how Avro support was explained in the past. For
this case, I was thinking we establish this alternative doc page in sync
with JIRA to help bring this issue around. By putting the discussion here, I
am hoping to reach more than JIRA users in asking for opinions and thoughts
on the subject.

John

On Sat, Apr 2, 2016 at 10:43 AM, Bob Rumsby <brum...@maprtech.com> wrote:

> Hi John,
> I recommend opening a JIRA for the suggested doc updates. Others may have
> different opinions on what to document.
>
> Thanks,
> Bob
>
> On Sat, Apr 2, 2016 at 6:38 AM, John Omernik <j...@omernik.com> wrote:
>
> > This has been an interesting topic, and I am sorry I could not
> participate
> > more since my original post due to traveling.  Stefán is obviously
> > frustrated, and I can empathize with him. Being in a position of making
> > architectural decisions as well, it can be difficult to help define a
> > strategy for your org based on available documentation, be willing to
> > work through problems (these are "new" projects), and feel like you
> are
> > yelling in a canyon.  The level of frustration there is real.  I do
> think,
> > as mentioned, the documentation for Avro should be updated ASAP.
> >
> > To that end, here is a recommendation:  Avro needs to be called out as
> > experimental.  On the documentation page, under "Query Data -> Querying a
> > File System", let's add "Querying Avro Files".  On this page, I think we
> > should, in the first paragraph, state Avro Support has been moved to
> > experimental, and as of now the Drill project is working through the
> > following problems with Avro files. Basically, let's take Stefán's list,
> > and outline the problems, the JIRAs, and the errors that are coming up, as
> well
> > as outline what works and how it works. I will be willing to work on this
> > with Stefán. My reasoning is this:  obviously Avro support has been
> implied
> > in the docs thus far, others who may have chosen Avro may be going down a
> > path like Stefán based on the documentation, and may end up in a similar
> > frustrated state. I want to avoid that. This situation has caused
> community
> > tension, and does nothing for the project if we don't look to fix it.
> Yes,
> > this is a different approach than other "experimental" type features in
> > Drill, but I feel in order to avoid this situation particularly on Avro,
> it
> > makes sense to call this out.
> >
> > Now, this does not fix Stefán's current problem.  As a user and community
> > member who doesn't code Java, I often struggle to balance asking for
> > help/changes with the fact that  I personally can't force that change or
> > write the change myself, and thus am looking for ways to contribute in other
> > ways.  Stefán has been contributing, and I do think we need to
> acknowledge
> > that. We are all busy, we all have commitments, from the developer side,
> to
> > those with day jobs, and even Stefán in his job.  We all do; in this
> > situation it's easy to point fingers and send the blame around,  and
> yet, I
> > don't think any individual is completely to blame; there is a confluence of
> > situations that has contributed here.  Frustrations are high, but we can
> > handle this, and I think we should be able to handle it in a way that
> ends
> > positively for Drill, for the community, and for Stefán.  To that end,
> here
> > are my suggestions for discussion:
> >
> >
> >    1. My documentation suggestion above. It puts it clearly out there
> that
> >    Avro is experimental, and lets users know the risks of Avro. As the
> > issues
> >    get knocked off the list, we can track there as well as JIRAs. While
> > this
> >    is "extra" work, and one may ask "why can't we just use JIRA?".  I
> think
> >    since the documentation in the past has been wrong on this, in
> response
> > we
> >    should use the documentation in this special case to pull out of the
> >    situation.  I commit to helping this by facilitating the Avro page, I
> > just
> >    need discussion and approval to go this route, and someone who has
> > access
> >    to change the pages to work with me. In addition, it may help pull
> > others
> >    in who have Java/Avro knowledge into contributing to some of the
> fixes.
> >    2. Let's ensure going forward we consider the challenges of new
> features
> >    like this and marking them as experimental for a while.  I think for
> new
> >    plugins/readers we could develop a process where we mark as
> experimental
> >    for a number of releases to help work out test cases from users.   The
> >    issues that are brought up by users will help identify bugs as well as
> > test
> >    cases we can use in the code to not only ensure solid interfaces, but
> > help
> >    prevent regressions in future releases.
> >    3. I know this one will be asking a lot, if 1 and 2 seem reasonable,
> >    let's roll up our sleeves on the Avro stuff.  Identify those "I can't
> > use
> >    this" issues  and separate from "I really want this" issues for
> >    prioritization, and work to resolve the issues starting with the
> > blockers.
> >    For the Drill project, "our bad" on the supported nature of Avro in
> the
> >    docs, and instead of pulling back and forth on resources, commitments,
> > etc,
> >    on user lists (which in my opinion really hurts community), we say,
> > "this
> >    sucks, it puts everyone in a bad position, let's steer out of this and
> > get
> >    on track".  Based on some of the responses, I don't think this is
> >    unreasonable thus far. I think Stefán, while I don't speak for him,
> >    understands the nature of what the community can provide to "him" and
> > that
> >    the community doesn't work for him, at the same time, this is a really
> > good
> >    opportunity for us to band together, and right the course here.
> >
> > I welcome discussion here. Jacques and Julian, I know that there are some
> > challenges around topics like this, and you've outlined them, and I can't
> > disagree with your points. At the same time, I don't think anyone is
> saying
> > the project path, the project itself, or anything Dremio, MapR, or
> > individual  committers are doing is at fault or should be responsible for
> > fixing stuff on their own.  I think as I've stated before we have a
> > confluence of little things that have added up, and in the end looking
> for
> > a community solution is our best path.
> >
> > Cheers,
> >
> > John
> >
> >
> >
> >
> > On Sat, Apr 2, 2016 at 1:37 AM, Stefán Baxter <ste...@activitystream.com
> >
> > wrote:
> >
> > > Hi Jason,
> > >
> > > Thank you for writing this up, it's appreciated.
> > >
> > > First things first. We would be more than happy to help on these Avro
> > > related issues but the Drill code base is quite complex, with a fairly
> > > steep learning curve, and lately a lot of my time has been spent on
> > dealing
> > > with the repercussions of having decided to use Avro for fresh/inbound
> > > data.  (I realize some here might not see this as a contribution but I beg
> > to
> > > differ. Any project requires regular users to put in the time to adapt
> > > new/unhardened projects to their solutions and in the case of using
> Avro
> > > with Drill it's been more like testing and duct-taping than a "simple
> > > adaptation of free software")
> > >
> > > I find these Avro problems interesting for other reasons as well:
> > >
> > >    - They raise the question of the commitment behind accepting a
> plugin
> > >    like this (and not marking it experimental)
> > >
> > >    - There are design decisions that I think are very wrong
> > >    - enforcing schema looks to me like a serious violation of the, nowhere
> > >    to be found, "Drill Manifesto" that I have asked about
> > >    - see the original entry
> > >
> > >    - The level of noise required to get feedback on a topic like this
> > >    - I apologize to everyone but ask them to appreciate that this
> > >    provocative approach was by no means the first option
> > >
> > > As a "user" I'm obviously not a person that can call for or insist on
> > > having these things addressed but perhaps that changes with time.
> > >
> > > Now on towards fixing the outstanding bugs.  If someone can point us in
> > the
> > > right direction and discuss the best approach to fixing each bug then
> we
> > > can at least try to help (and we do so gladly).
> > >
> > > It's at least clear to me that many users of Drill, those working on
> > > streaming data, need the support for a schema capable format to store
> > their
> > > inbound/fresh data before it's converted into Parquet.
> > > Currently there seems to be no real alternative.
> > >
> > > So, if we can help then we are willing and I suggest that, if you want,
> > we
> > > take this to Jira and try to work it from there.
> > >
> > > Regards,
> > >  -Stefán
> > >
> > >
> > >
> > >
> > >
> > > On Fri, Apr 1, 2016 at 9:56 PM, Jason Altekruse <ja...@dremio.com>
> > wrote:
> > >
> > > > I take some responsibility for your lack of response on this,
> because I
> > > had
> > > > said I would try to take a look at the dirN issue that has been
> > > outstanding
> > > > for some time with Avro. This might have prevented others from
> jumping
> > in
> > > > to help and I will work on communicating when I don't have time to
> work
> > > on
> > > > something that I raise my hand for.
> > > >
> > > > That being said, there are lots of parts of Drill that still need
> > > > attention. I do think that you are the only active user of the Avro
> > > support
> > > > that I know of. Even though that is the case, I have been trying to
> > make
> > > > the feature useable for you and other possible users, like John.
> > > >
> > > > One thing that would likely be worth discussing as a follow up to
> this
> > is
> > > > our expectations for code quality we accept from contributors. There
> > were
> > > > several issues with Avro when it was merged, and no one ever really
> > took
> > > on
> > > > the task of fully testing it.
> > > > I do know there is another issue around a lack of responses to recent
> > > > requests, but I'm tabling that for a little bit. I would like to see
> it
> > > > discussed, but I want to scope this discussion for now.
> > > >
> > > > I don't think the plugin is far from fully complete, and I have been
> > > > working to improve the tests each time I fix an issue with it. I
> think
> > it
> > > > would be very useful for us to define a clear set of criteria for a
> > > feature
> > > > like a format plugin to be considered fully tested and ready for
> > > inclusion
> > > > in the core project. I think this would have the benefit of both
> > helping
> > > > users to avoid issues, as well as give a clearer definition of the
> task
> > > of
> > > > writing a format plugin. This is a community contribution that should
> > be
> > > > easier and more strongly encouraged than it is today, and could
> really
> > > help
> > > > new users adopt Drill if they are using other data formats.
> > > >
> > > > Jason Altekruse
> > > > Software Engineer at Dremio
> > > > Apache Drill Committer
> > > >
> > > > On Fri, Apr 1, 2016 at 1:42 PM, Stefán Baxter <
> > ste...@activitystream.com
> > > >
> > > > wrote:
> > > >
> > > > > Yes Parth, you are 100% right and we are willing to help.
> > > > >
> > > > > The relationship one builds with a community also depends on the
> > > > > "vibe/feeling" of the community and I know it reflects on me here,
> as
> > > > well
> > > > > as the community, that many of my attempts to help and get help
> have
> > > not
> > > > > been fruitful.
> > > > >
> > > > > I also acknowledge that this topic gets me frustrated and that
> my
> > > > > manners could easily improve but it's not as if that is a "first
> > > > response"
> > > > > but an eventual state caused by indifference on one side and the
> > > > > determination to get some response on the other.
> > > > >
> > > > > Marking Avro as experimental is considerate towards new users and
> > > > > something I wish was in place before we decided to depend on it and
> > > spend
> > > > > all this time on trying to make it work.
> > > > >
> > > > > Ideally, for us, the decision would be to support Avro properly.
> > > > >
> > > > > My +1 for improving Avro support so that it can truly be used as an
> > > > interim
> > > > > file format before data is converted to Parquet. (I see no real
> > > > alternative
> > > > > here)
> > > > >
> > > > > - Stefán
> > > > >
> > > > >
> > > > > On Fri, Apr 1, 2016 at 8:25 PM, Parth Chandra <par...@apache.org>
> > > wrote:
> > > > >
> > > > > > +1 on marking Avro experimental.
> > > > > >
> > > > > > @Stefan, we have been trying to help you as much as our time
> > > permits. I
> > > > > > know that I held up the 1.6 release while Jason fixed the issues
> > that
> > > > you
> > > > > > brought up. As was said earlier, this is personal time we are
> > > spending
> > > > to
> > > > > > help users in the community, so providing an immediate response
> to
> > > > > everyone
> > > > > > is difficult. Ultimately, it boils down to the relationships one
> > > builds
> > > > > > within the community. Folks with shared goals help each other and
> > > > > everyone
> > > > > > benefits.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Apr 1, 2016 at 11:10 AM, Jacques Nadeau <
> > jacq...@dremio.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Stefan,
> > > > > > >
> > > > > > > It makes sense to me to mark the Avro plugin experimental.
> > Clearly,
> > > > > there
> > > > > > > are bugs. I also want to note your requirements and
> expectations
> > > > > haven't
> > > > > > > always been in alignment with what the Avro plugin developers
> > > > > > > built/envisioned (especially around schemas). As part of trying
> > to
> > > > > > address
> > > > > > > these gaps, I'd like to ask again for you to provide actual
> data
> > > and
> > > > > > > test
> > > > > > > cases so we make sure that the Avro plugin includes those as
> > future
> > > > > test
> > > > > > > cases. (This is absolutely the best way to ensure that the
> > project
> > > > > > > continues to work for your use case.)
> > > > > > >
> > > > > > > The bigger issue I see here is that you expect the community to
> > > spend
> > > > > > time
> > > > > > > doing what you want. You have already received a lot of that
> via
> > > free
> > > > > > > support and numerous bug fixes by myself, Jason and others. You
> > > need
> > > > to
> > > > > > > remember: this community is run by a bunch of volunteers.
> > Everybody
> > > > > here
> > > > > > > has a day job. A lot of time I spend in the community is at the
> > > cost
> > > > of
> > > > > > my
> > > > > > > personal life. For others, it is the same.
> > > > > > >
> > > > > > > This is a good place to ask for help but you should never
> demand
> > > it.
> > > > If
> > > > > > you
> > > > > > > want paid support, I know Ted offered this from MapR and I'm
> sure
> > > if
> > > > > you
> > > > > > > went that route, your issues would get addressed very quickly.
> If
> > > you
> > > > > > don't
> > > > > > > want to go that route, then I suggest that you help by creating
> > > more
> > > > > > > example data and test cases and focusing on what are the most
> > > > important
> > > > > > > issues that you need to solve. From there, you can continue to
> > > expect
> > > > > > that
> > > > > > > people will help you--as they can. There are no guarantees in
> > open
> > > > > > source.
> > > > > > > Everything comes through the kindness and shared goals of those
> > in
> > > > the
> > > > > > > community.
> > > > > > >
> > > > > > > thanks,
> > > > > > > Jacques
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Jacques Nadeau
> > > > > > > CTO and Co-Founder, Dremio
> > > > > > >
> > > > > > > On Fri, Apr 1, 2016 at 5:43 AM, Stefán Baxter <
> > > > > ste...@activitystream.com
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Is it at all possible that we are the only company trying to
> > use
> > > > Avro
> > > > > > > with
> > > > > > > > Drill to some serious extent?
> > > > > > > >
> > > > > > > > > We continue to come across all sorts of embarrassing
> > shortcomings
> > > > > like
> > > > > > > the
> > > > > > > > one we are dealing with now where a schema change exception
> is
> > > > thrown
> > > > > > > even
> > > > > > > > when working with a single Avro file (that has the same
> > schema).
> > > > > > > >
> > > > > > > > Can a non project member call for a discussion on this topic
> > and
> > > > the
> > > > > > > level
> > > > > > > > of support that is offered for Avro in Drill?
> > > > > > > >
> > > > > > > > My discussion topics would be:
> > > > > > > >
> > > > > > > >    - Strange schema validation that ... :
> > > > > > > >    ... currently fails on single file
> > > > > > > >    ... prevents dirX variables to work
> > > > > > > >    ... would require Drill to scan all Avro files to
> establish
> > > > schema
> > > > > > > (even
> > > > > > > >    when pruning would be used)
> > > > > > > > >    ... would ALWAYS fail for old queries if an old Avro
> > file,
> > > > > > > containing
> > > > > > > >    the original fields, was removed and could not be scanned
> > > > > > > >    ... does not rhyme with the "eliminate ETL" and "Evolving
> > > > Schema"
> > > > > > > goals
> > > > > > > >    of Drill
> > > > > > > >
> > > > > > > >    - Simple union types do not work to declare nullable
> fields
> > > > > > > >
> > > > > > > >    - Drill can not read Parquet that is created by
> > > parquet-mr-avro
> > > > > > > >
> > > > > > > >    - What is the intention for Avro in Drill
> > > > > > > >    - Should we select to use some other format to
> buffer/batch
> > > data
> > > > > > > before
> > > > > > > >    creating a Parquet file for it?
> > > > > > > >
> > > > > > > >    - The culture here regarding talking about boring/hard
> > topics
> > > > like
> > > > > > > this
> > > > > > > >    - Where serious complaints/issues are met with silence
> > > > > > > >    - I know full well that my frustration shines through here
> > and
> > > > > that
> > > > > > it
> > > > > > > > >    is not helping but this Drill+Avro mess is really getting too
> > > much
> > > > > for
> > > > > > us
> > > > > > > > to
> > > > > > > >    handle
> > > > > > > >
> > > > > > > > > Look forward to discussing this here or during the next hangout.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > >  -Stefán (or ... mr. old & frustrated)
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
