Omer

It can certainly be looked at in time for 1.5, but based on recent
progress and discussion, 1.5 may well kick off rather soon. It is
something someone would need to take up, and that someone need not be a
committer at this point. Your analysis showing that the different
provenance repositories have no real impact is helpful, though, as it
might suggest the issue lies upstream, during the generation of the
events, which should help narrow it down.

Thanks
Joe

On Mon, Dec 11, 2017 at 8:51 AM, Omer Hadari <hadari.o...@gmail.com> wrote:
> Hi, we have looked into it some more and tried using
> WriteAheadProvenanceRepository, to no real avail. Sorry for the ping, but do
> you have any other ideas? We're happy to pursue them, as this problem is
> critical for us. Any chance this could be looked at for 1.5?
>
> Thanks!
>
> On Fri, 24 Nov 2017 at 13:19 Omer Hadari <hadari.o...@gmail.com> wrote:
>>
>> Opened an issue: NIFI-4638
>>
>> On Wed, 22 Nov 2017 at 20:39 Omer Hadari <hadari.o...@gmail.com> wrote:
>>>
>>> Yes, we were thinking in that direction as well; that's why I mentioned
>>> the 1 ms part. I do not know how events are assigned an ordinal, though,
>>> so it's unclear to me whether the disordering always happens but is
>>> usually "hidden" because there is no rollover, or whether there is some
>>> kind of race condition. You'll probably be able to answer these questions
>>> faster than I can. I'll open a JIRA as soon as I get home later today.
>>>
>>> Thanks again!
>>>
>>> On Wed, 22 Nov 2017 at 20:34 Mark Payne <marka...@hotmail.com> wrote:
>>>>
>>>> Omer,
>>>>
>>>> Yes, I think that is sufficient. I think the issue is that the framework
>>>> is creating both the ATTRIBUTES_MODIFIED and DROP events, and the
>>>> generation of these objects is very fast. But if the timestamp happens
>>>> to 'rollover' from millisecond 1 to millisecond 2, for example, those
>>>> events get different timestamps. So I think it's just a timing thing
>>>> that will be somewhat difficult to reproduce reliably. But just a
>>>> description of the behavior that you're experiencing should be fine.
>>>>
>>>> Thanks
>>>> -Mark
>>>>
>>>> On Nov 22, 2017, at 1:04 PM, Omer Hadari <hadari.o...@gmail.com> wrote:
>>>>
>>>> I'll be glad to open a JIRA, though the problem is not very well defined,
>>>> IMO. What would you like to see there? Simply "Disordering of drop events"
>>>> and the explanation I gave here? Sadly, I cannot provide a concrete
>>>> example, since the problem does not reproduce.
>>>>
>>>> On Wed, 22 Nov 2017 at 18:23 Joe Witt <joe.w...@gmail.com> wrote:
>>>>>
>>>>> Also, awesome find! And I'm glad you're at such a level with provenance
>>>>> data to catch that. Thanks, Omer!
>>>>>
>>>>> On Wed, Nov 22, 2017 at 11:21 AM, Mark Payne <marka...@hotmail.com>
>>>>> wrote:
>>>>> > Omer,
>>>>> >
>>>>> > This is likely an issue related to the order in which we generate
>>>>> > those events in the framework.
>>>>> > Do you mind filing a JIRA?
>>>>> >
>>>>> > Thanks
>>>>> > -Mark
>>>>> >
>>>>> >
>>>>> >> On Nov 22, 2017, at 10:51 AM, Omer Hadari <hadari.o...@gmail.com>
>>>>> >> wrote:
>>>>> >>
>>>>> >> Hi!
>>>>> >> We've been using NiFi for a while now, and we save all provenance
>>>>> >> events for logging purposes and such. We encountered an issue while
>>>>> >> looking at the lineages of some flow files: they showed drop events as
>>>>> >> if they happened before another event that in fact preceded them (and
>>>>> >> indeed has a lower event ordinal).
>>>>> >>
>>>>> >> For example, in a SplitJson processor, the original FlowFile is
>>>>> >> dropped after all the splits happen and are assigned fragment counts,
>>>>> >> but the timestamp of the drop event is still earlier than the
>>>>> >> timestamp of the attributes-modified event. That causes the graph to
>>>>> >> look as if the attributes-modified event comes out of the drop event,
>>>>> >> which doesn't really make sense to us (should it?). It's probably
>>>>> >> worth noting that the drop event's ordinal is higher than the
>>>>> >> attributes-modified event's ordinal. We also noticed that:
>>>>> >> 1. This only happens about once every few thousand events.
>>>>> >> 2. It does not reproduce on replay.
>>>>> >> 3. In the cases we encountered, the drop event's timestamp is earlier
>>>>> >> by 1 ms, and its ordinal is always larger by one.
>>>>> >>
>>>>> >> This might be an error in the SplitJson processor or a more general
>>>>> >> one. We'd love any clues or corrections to misconceptions we might
>>>>> >> have (maybe this is not a problem, and drop events can precede other
>>>>> >> events?).
>>>>> >>
>>>>> >> Thank you!
>>>>> >
>>>>
>>>>
>
