I reviewed the spec and posted many comments in the doc. Here are three primary
comments for the community to consider.

1. The design conflicts with the Drill-on-YARN project. Is this a specific fix 
for one unique problem, or is it worth expanding the solution to work with 
Drill-on-YARN deployments? Might be hard to make the two work together later. 
See comments in docs for details.

2. Have we, by chance, looked at how other projects handle code distribution? 
Spark, Storm and others automatically deploy code across the cluster; no manual 
distribution to each node. The key difference between Drill and others is that, 
for Storm, say, code is associated with a job (“topology” in Storm terms.) But, 
in Drill, functions are global and have no obvious life cycle that suggests 
when the code can be unloaded.

3. Have we considered the class loader, dependency and namespace isolation
issues addressed by such products as Tomcat (web apps) or Eclipse (plugins)?
Putting user code in the same namespace as Drill code is quick & dirty. It
turns out, however, that doing so leads to problems that require long,
frustrating debugging sessions to resolve.
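
To illustrate the isolation idea in item 3, here is a minimal sketch, assuming
one class loader per user jar. The class and method names (UdfLoaderRegistry,
register, unregister) are hypothetical and are not Drill APIs. Note that a plain
URLClassLoader delegates to its parent first, so this only separates user jars
from each other and gives each one a life cycle; it does not give the
child-first dependency isolation that Tomcat provides.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry: one class loader per user jar. Not a Drill API. */
public class UdfLoaderRegistry {
  private final Map<String, URLClassLoader> loaders = new HashMap<>();
  private final ClassLoader parent;

  public UdfLoaderRegistry(ClassLoader parent) {
    this.parent = parent;
  }

  /** Creates a dedicated loader for one jar (or class directory). */
  public synchronized ClassLoader register(String name, URL jarUrl) {
    if (loaders.containsKey(name)) {
      throw new IllegalArgumentException("Already registered: " + name);
    }
    // Parent-first delegation: classes already visible to the parent win.
    URLClassLoader loader = new URLClassLoader(new URL[] {jarUrl}, parent);
    loaders.put(name, loader);
    return loader;
  }

  /** Closes and forgets the loader; returns false if the name is unknown. */
  public synchronized boolean unregister(String name) throws IOException {
    URLClassLoader loader = loaders.remove(name);
    if (loader == null) {
      return false;
    }
    loader.close(); // releases the jar file; no new classes can be loaded from it
    return true;
  }
}
```

Classes from two registered jars live in separate loaders, so same-named
classes in different jars do not collide, and dropping a jar is a matter of
closing its loader once nothing references its classes.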

Addressing item 1 might expand scope a bit. Addressing items 2 and 3 would be a
big increase in scope, so I won’t be surprised if we leave those issues for
later. (Though, addressing item 2 might be the best way to address item 1.)

If we want something that requires minimal change, perhaps there is an even
simpler solution. In the proposed design, the user still must distribute code
to all the nodes. The primary change is to tell Drill to load (or unload) that
code. Can we accomplish the same result more easily by having Drill
periodically scan certain directories looking for new (or removed) jars? That
still won’t work with YARN, or solve the namespace issues, but it will work for
existing non-YARN Drill users without new SQL syntax.
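
To make the directory-scan idea concrete, here is a minimal sketch, assuming a
single directory that is polled on some schedule. JarDirectoryScanner and Delta
are hypothetical names, not existing Drill classes. Each call to scan() compares
the jars currently in the directory with those seen on the previous call and
reports the difference.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper that detects added and removed jars. Not a Drill API. */
public class JarDirectoryScanner {

  /** Result of one scan: jars that appeared and jars that vanished. */
  public static final class Delta {
    public final Set<Path> added;
    public final Set<Path> removed;

    Delta(Set<Path> added, Set<Path> removed) {
      this.added = added;
      this.removed = removed;
    }
  }

  private final Path dir;
  private Set<Path> known = new HashSet<>();

  public JarDirectoryScanner(Path dir) {
    this.dir = dir;
  }

  /** Lists *.jar files in the directory and diffs against the last scan. */
  public synchronized Delta scan() throws IOException {
    Set<Path> current = new HashSet<>();
    try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
      for (Path jar : jars) {
        current.add(jar);
      }
    }
    Set<Path> added = new TreeSet<>(current);
    added.removeAll(known);      // present now, absent before
    Set<Path> removed = new TreeSet<>(known);
    removed.removeAll(current);  // present before, absent now
    known = current;
    return new Delta(added, removed);
  }
}
```

A caller would invoke scan() from a timer, load whatever shows up in added and
unload whatever shows up in removed. The sketch ignores two things a real
version would need to handle: a jar that is still being copied when the scan
runs, and a jar that is replaced in place under the same name.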

Thanks,

- Paul

> On Jun 16, 2016, at 2:07 PM, Jacques Nadeau <jacq...@dremio.com> wrote:
> 
> Two quick thoughts:
> 
> - (user) In the design document I didn't see any discussion of
> ownership/conflicts or unloading. Would be helpful to see the thinking there
> - (dev) There is a row oriented facade via the
> FieldReader/FieldWriter/ComplexWriter classes. That would be a good place
> to start when trying to implement an alternative interface.
> 
> 
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
> 
> On Thu, Jun 16, 2016 at 11:32 AM, John Omernik <j...@omernik.com> wrote:
> 
>> Honestly, I don't see it as a priority issue. I think some of the ideas
>> around community java UDFs could be a better approach. I'd hate to take
>> away from other work to hack in something like this.
>> 
>> 
>> 
>> On Thu, Jun 16, 2016 at 1:19 PM, Paul Rogers <prog...@maprtech.com> wrote:
>> 
>>> Ted refers to source code transformation. Drill gains its speed from
>> value
>>> vectors. However, VVs are a far cry from the row-based interface that
>> most
>>> mere mortals are accustomed to using. Since VVs are very type specific,
>>> code is typically generated to handle the specifics of each type.
>> Accessing
>>> VVs in Jython may be a bit of a challenge because of the "impedance
>>> mismatch" between how VVs work and the row-and-column view expected by
>> most
>>> (non-Drill) developers.
>>> 
>>> I wonder if we've considered providing a row-oriented "facade" that can
>> be
>>> used by roll-your own data sources and user-defined row transforms? Might
>>> be a hiccup in the fast VV pipeline, but might be handy for users willing
>>> to trade a bit of speed for convenience. With such a facade, the Jython
>> row
>>> transforms that John mentions could be quite simple.
>>> 
>>> On Thu, Jun 16, 2016 at 10:36 AM, Ted Dunning <ted.dunn...@gmail.com>
>>> wrote:
>>> 
>>>> Since UDF's use source code transformation, using Jython would be
>>>> difficult.
>>>> 
>>>> 
>>>> 
>>>> On Thu, Jun 16, 2016 at 9:42 AM, Arina Yelchiyeva <
>>>> arina.yelchiy...@gmail.com> wrote:
>>>> 
>>>>> Hi Charles,
>>>>> 
>>>>> not that I am aware of. Proposed solution doesn't invent anything
>> new,
>>>> just
>>>>> adds possibility to add UDFs without drillbit restart. But
>>> contributions
>>>>> are welcomed.
>>>>> 
>>>>> On Thu, Jun 16, 2016 at 4:52 PM Charles Givre <cgi...@gmail.com>
>>> wrote:
>>>>> 
>>>>>> Arina,
>>>>>> Has there been any discussion about making it possible via Jython
>> or
>>>>>> something for users to write simple UDFs in Python?
>>>>>> My ideal would be to have this capability integrated in the web GUI
>>>> such
>>>>>> that a user could write their UDF (in Python) right there, submit
>> it
>>>> and
>>>>> it
>>>>>> would be deployed to Drill if it passes validation tests.
>>>>>> —C
>>>>>> 
>>>>>> 
>>>>>>> On Jun 16, 2016, at 09:34, Arina Yelchiyeva <
>>>>> arina.yelchiy...@gmail.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>> Hi all!
>>>>>>> 
>>>>>>> I have created Jira to allow dynamic UDFs support in Drill (
>>>>>>> https://issues.apache.org/jira/browse/DRILL-4726). There is a
>> link
>>>> to
>>>>>>> design document in Jira description.
>>>>>>> Comments or suggestions are welcomed.
>>>>>>> 
>>>>>>> Kind regards
>>>>>>> Arina
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
