Hi Jason,

Thank you for writing this up; it's appreciated.

First things first. We would be more than happy to help with these
Avro-related issues, but the Drill code base is quite complex, with a fairly
steep learning curve, and lately a lot of my time has been spent dealing
with the repercussions of having decided to use Avro for fresh/inbound
data. (I realize some here might not see this as a contribution, but I beg to
differ. Any project requires regular users to put in the time to adapt
new/unhardened projects to their solutions, and in the case of using Avro
with Drill it's been more like testing and duct-taping than a "simple
adaptation of free software".)

I find these Avro problems interesting for other reasons as well:

   - They raise the question of the commitment behind accepting a plugin
   like this (and not marking it experimental)

   - There are design decisions that I think are very wrong
      - enforcing a schema looks to me like a serious violation of the
      (nowhere to be found) "Drill Manifesto" that I have asked about
      - see the original entry

   - The level of noise required to get feedback on a topic like this
      - I apologize to everyone but ask them to appreciate that this
      provocative approach was by no means the first option

As a "user", I'm obviously not in a position to call for or insist on
having these things addressed, but perhaps that will change with time.

Now on to fixing the outstanding bugs. If someone can point us in the
right direction and discuss the best approach to fixing each bug, then we
can at least try to help (and we will do so gladly).

It's at least clear to me that many Drill users, those working with
streaming data, need support for a schema-capable format to store their
inbound/fresh data before it's converted into Parquet.
Currently there seems to be no real alternative.

So, if we can help, we are willing, and I suggest that, if you want, we
take this to Jira and try to work on it from there.

Regards,
 -Stefán

On Fri, Apr 1, 2016 at 9:56 PM, Jason Altekruse <ja...@dremio.com> wrote:

> I take some responsibility for the lack of response you got on this, because I
> had said I would try to take a look at the dirN issue that has been outstanding
> for some time with Avro. This might have prevented others from jumping in
> to help, and I will work on communicating when I don't have time to work on
> something that I raise my hand for.
>
> That being said, there are lots of parts of Drill that still need
> attention. I do think that you are the only active user of the Avro support
> that I know of. Even though that is the case, I have been trying to make
> the feature usable for you and other possible users, like John.
>
> One thing that would likely be worth discussing as a follow-up to this is
> our expectations for the quality of code we accept from contributors. There
> were several issues with Avro when it was merged, and no one ever really
> took on the task of fully testing it.
> I do know there is another issue around a lack of responses to recent
> requests, but I'm tabling that for a little bit. I would like to see it
> discussed, but I want to keep this discussion narrowly scoped for now.
>
> I don't think the plugin is far from fully complete, and I have been
> working to improve the tests each time I fix an issue with it. I think it
> would be very useful for us to define a clear set of criteria for a feature
> like a format plugin to be considered fully tested and ready for inclusion
> in the core project. I think this would have the benefit of both helping
> users avoid issues and giving a clearer definition of the task of
> writing a format plugin. This is a community contribution that should be
> easier and more strongly encouraged than it is today, and could really help
> new users adopt Drill if they are using other data formats.
>
> Jason Altekruse
> Software Engineer at Dremio
> Apache Drill Committer
>
> On Fri, Apr 1, 2016 at 1:42 PM, Stefán Baxter <ste...@activitystream.com>
> wrote:
>
> > Yes Parth, you are 100% right and we are willing to help.
> >
> > The relationship one builds with a community also depends on the
> > "vibe/feeling" of the community, and I know it reflects on me here, as well
> > as on the community, that many of my attempts to help and get help have not
> > been fruitful.
> >
> > I also acknowledge that this topic gets me frustrated and that my
> > manners could easily improve, but it's not as if that is a "first response";
> > it is an eventual state caused by indifference on one side and the
> > determination to get some response on the other.
> >
> > Marking Avro as experimental would be considerate towards new users, and
> > it is something I wish had been in place before we decided to depend on it
> > and spend all this time trying to make it work.
> >
> > Ideally, for us, the decision would be to support Avro properly.
> >
> > My +1 for improving Avro support so that it can truly be used as an interim
> > file format before data is converted to Parquet. (I see no real alternative
> > here.)
> >
> > - Stefán
> >
> >
> > On Fri, Apr 1, 2016 at 8:25 PM, Parth Chandra <par...@apache.org> wrote:
> >
> > > +1 on marking Avro experimental.
> > >
> > > @Stefan, we have been trying to help you as much as our time permits. I
> > > know that I held up the 1.6 release while Jason fixed the issues that you
> > > brought up. As was said earlier, this is personal time we are spending to
> > > help users in the community, so providing an immediate response to everyone
> > > is difficult. Ultimately, it boils down to the relationships one builds
> > > within the community. Folks with shared goals help each other and everyone
> > > benefits.
> > >
> > >
> > >
> > > On Fri, Apr 1, 2016 at 11:10 AM, Jacques Nadeau <jacq...@dremio.com>
> > > wrote:
> > >
> > > > Stefan,
> > > >
> > > > It makes sense to me to mark the Avro plugin experimental. Clearly, there
> > > > are bugs. I also want to note that your requirements and expectations
> > > > haven't always been in alignment with what the Avro plugin developers
> > > > built/envisioned (especially around schemas). As part of trying to address
> > > > these gaps, I'd like to ask again for you to provide actual data and test
> > > > cases so we can make sure that the Avro plugin includes those as future
> > > > test cases. (This is absolutely the best way to ensure that the project
> > > > continues to work for your use case.)
> > > >
> > > > The bigger issue I see here is that you expect the community to spend time
> > > > doing what you want. You have already received a lot of that via free
> > > > support and numerous bug fixes by myself, Jason and others. You need to
> > > > remember: this community is run by a bunch of volunteers. Everybody here
> > > > has a day job. A lot of the time I spend in the community is at the cost of
> > > > my personal life. For others, it is the same.
> > > >
> > > > This is a good place to ask for help, but you should never demand it. If you
> > > > want paid support, I know Ted offered this from MapR, and I'm sure if you
> > > > went that route, your issues would get addressed very quickly. If you don't
> > > > want to go that route, then I suggest that you help by creating more
> > > > example data and test cases and focusing on the most important
> > > > issues that you need to solve. From there, you can continue to expect that
> > > > people will help you, as they can. There are no guarantees in open source.
> > > > Everything comes through the kindness and shared goals of those in the
> > > > community.
> > > >
> > > > thanks,
> > > > Jacques
> > > >
> > > >
> > > > --
> > > > Jacques Nadeau
> > > > CTO and Co-Founder, Dremio
> > > >
> > > > On Fri, Apr 1, 2016 at 5:43 AM, Stefán Baxter <ste...@activitystream.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Is it at all possible that we are the only company trying to use Avro
> > > > > with Drill to any serious extent?
> > > > >
> > > > > We continue to come across all sorts of embarrassing shortcomings, like
> > > > > the one we are dealing with now, where a schema change exception is
> > > > > thrown even when working with a single Avro file (that has the same
> > > > > schema).
> > > > >
> > > > > Can a non-project member call for a discussion on this topic and the
> > > > > level of support that is offered for Avro in Drill?
> > > > >
> > > > > My discussion topics would be:
> > > > >
> > > > >    - Strange schema validation that:
> > > > >       ... currently fails on a single file
> > > > >       ... prevents dirX variables from working
> > > > >       ... would require Drill to scan all Avro files to establish the
> > > > >       schema (even when pruning would be used)
> > > > >       ... would ALWAYS fail for old queries if an old Avro file,
> > > > >       containing the original fields, was removed and could not be scanned
> > > > >       ... does not square with the "eliminate ETL" and "evolving schema"
> > > > >       goals of Drill
> > > > >
> > > > >    - Simple union types do not work to declare nullable fields
> > > > >
> > > > >    - Drill cannot read Parquet files created by parquet-mr-avro
> > > > >
> > > > >    - What is the intention for Avro in Drill?
> > > > >       - Should we choose some other format to buffer/batch data before
> > > > >       creating a Parquet file for it?
> > > > >
> > > > >    - The culture here regarding talking about boring/hard topics like this
> > > > >       - Where serious complaints/issues are met with silence
> > > > >       - I know full well that my frustration shines through here and that
> > > > >       it is not helping, but this Drill+Avro mess is really getting too
> > > > >       much for us to handle
> > > > >
> > > > > I look forward to discussing this here or during the next hangout.
> > > > >
> > > > > Regards,
> > > > >  -Stefán (or ... mr. old & frustrated)
> > > > >
> > > >
> > >
> >
>
