Interesting, so the 256 limit is on output, not on input?
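
A quick sketch of that arithmetic, assuming the figures in Vlad's reply quoted below: a 256-byte output buffer (the range reported in the exception) and at most 4 output characters per input byte. The constant and function names are made up for illustration and are not taken from the Drill source.

```python
# Assumptions (from the thread, not from the Drill source code):
OUTPUT_CAPACITY = 256    # range reported in the IndexOutOfBoundsException
WORST_CASE_PER_BYTE = 4  # one input byte may expand to a 4-character escape


def worst_case_output_len(n_input_bytes: int) -> int:
    """Upper bound on the output length when every input byte is escaped."""
    return n_input_bytes * WORST_CASE_PER_BYTE


def max_safe_input_len(capacity: int = OUTPUT_CAPACITY) -> int:
    """Largest input length whose worst-case output still fits in the buffer."""
    return capacity // WORST_CASE_PER_BYTE


print(max_safe_input_len())        # 64
print(worst_case_output_len(80))   # 320
print(worst_case_output_len(200))  # 800
```

Under these assumptions, 60 input bytes always fit (240 <= 256), 80 bytes fit only if enough of them are printable, and 200 bytes can overflow easily.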

Is DRILL-6607 enough to track this? If so, I have one more "feature" to
add to it; I'm not sure whether I should include it in 6607 or create a new
JIRA and link them. Basically, I'd like the ability to pass an int to the
function. (Does Java have optional arguments? It must, because of
substr(data, start) and substr(data, start, nochars).) string_binary(data)
would work as intended (with the limitation fixed), and
string_binary(data, 1) would work on the binary but replace EVERY character
with its hex representation.

An optional third argument would be a format string of some sort so the
user could pick the output, but I like the idea of having every character
as hex for analysis.
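
To make the two modes concrete, here is a rough Python sketch of the behavior I have in mind. It is not Drill's Java implementation: the function name, the `all_hex` flag, the `\xXX` escape format, and the exact set of bytes treated as printable are all assumptions for illustration.

```python
import string

# Assumption: "printable" means ASCII letters, digits, space, and punctuation.
# The backslash is excluded so that escapes in the output stay unambiguous.
PRINTABLE = frozenset(
    (string.ascii_letters + string.digits + " "
     + string.punctuation.replace("\\", "")).encode("ascii")
)


def string_binary_sketch(data: bytes, all_hex: bool = False) -> str:
    """Render bytes as text, escaping bytes as \\xXX.

    all_hex=False: printable bytes pass through, everything else is escaped.
    all_hex=True:  every byte is escaped (the proposed second argument).
    """
    out = []
    for b in data:
        if not all_hex and b in PRINTABLE:
            out.append(chr(b))
        else:
            out.append(f"\\x{b:02X}")
    return "".join(out)


print(string_binary_sketch(b"GET\x00"))            # GET\x00
print(string_binary_sketch(b"GET", all_hex=True))  # \x47\x45\x54
```

In the all-hex mode the output is always exactly 4 characters per input byte, so the required output size is known before any conversion starts.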

John

On Tue, Jul 17, 2018 at 3:40 PM, Vlad Rozov <vro...@apache.org> wrote:

> A. +1.
>
> B. Every byte of binary data may require up to 4 bytes (0xXX) in the
> string representation, so 80 may work, 60 should reliably work.
>
> Thank you,
>
> Vlad
>
>
> On 7/17/18 13:14, John Omernik wrote:
>
>> Yet this works?....
>>
>> string_binary(byte_substr(`data`, 1, 80))
>>
>> On Tue, Jul 17, 2018 at 3:12 PM, John Omernik <j...@omernik.com> wrote:
>>
>>> So on B: I found BYTE_SUBSTR, but even when I send only 200 bytes to
>>> the string_binary function, I still get an error. Something else is
>>> happening here...
>>>
>>> select `type`, `timestamp`, `src_ip`, `dst_ip`, `src_port`, `dst_port`,
>>> string_binary(byte_substr(`data`, 1, 200)) as mydata from
>>> `user/jomernik/bf2_7306.pcap` limit 10
>>>
>>> I get the same
>>>
>>> Error Id: 213075e7-378a-437f-a5dc-408326f123f3 on
>>> zeta3.brewingintel.com:20005]
>>>
>>> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:
>>> IndexOutOfBoundsException: index: 0, length: 379 (expected: range(0,
>>> 256))
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Jul 17, 2018 at 12:56 PM, John Omernik <j...@omernik.com> wrote:
>>>
>>>> Thanks Vlad, a couple of thoughts.
>>>>
>>>>
>>>> A. I think that should be fixed. That seems like a limitation that is
>>>> both unexpected and undocumented.
>>>>
>>>> B. Is there a way, if my data in the table is returned as binary to
>>>> start with, for me to return the first 256 bytes? I tried substring,
>>>> but it tries to force the data to UTF-8, and I am getting some issues
>>>> there.
>>>>
>>>> On Tue, Jul 17, 2018 at 10:33 AM, Vlad Rozov <vro...@apache.org> wrote:
>>>>
>>>>> In the case of DRILL-6607, the issue lies in the implementation of
>>>>> the "string_binary" function: it is not prepared to handle incoming
>>>>> data that, when converted to a binary string, would exceed 256 bytes,
>>>>> as it does not reallocate the output buffer. Until the function code
>>>>> is fixed, the only way to avoid the error is either not to use
>>>>> "string_binary" or to use it with data that meets the
>>>>> "string_binary" limitation.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Vlad
>>>>>
>>>>>
>>>>> On 7/13/18 14:01, Ted Dunning wrote:
>>>>>
>>>>>> There are bounds for acceptable behavior for a function like this.
>>>>>> Array index out of bounds is not acceptable. Aborting with a clean
>>>>>> message about the true problem might be fine, as would returning a
>>>>>> null.
>>>>>>
>>>>>> On Fri, Jul 13, 2018, 13:46 John Omernik <j...@omernik.com> wrote:
>>>>>>
>>>>>> So, as to the actual problem, I opened a JIRA here:
>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/DRILL-6607
>>>>>>>
>>>>>>> The reason I brought this here is my own curiosity: does an issue in
>>>>>>> using this function most likely lie in the function code itself not
>>>>>>> handling good data, or in the pcap plugin, which produces the data
>>>>>>> for this function to consume? I am just curious how something like
>>>>>>> this could be avoided.
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>>
>
