Hi Prabhu,

In that case, yes, your assumption is correct: even if the latest
archive exceeds 500MB, it is saved, as long as it was written to disk
successfully.

After that, when the user updates the NiFi flow, the previous archive
will be removed before the new one is created, because max.storage is
exceeded. Then the latest flow will be archived.

Let's simulate the scenario with the logic to be updated by NIFI-3373,
in which the size of flow.xml keeps increasing:

# CASE-1

archive.max.storage=10MB
archive.max.count=5

Time | flow.xml | archives | archive total |
t1 | f1 5MB  | f1 | 5MB
t2 | f2 5MB  | f1, f2 | 10MB
t3 | f3 5MB  | f1, f2, f3 | 15MB
t4 | f4 10MB | f2, f3, f4 | 20MB
t5 | f5 15MB | f4, f5 | 25MB
t6 | f6 20MB | f6 | 20MB
t7 | f7 25MB | f7 | 25MB

* t3: f3 is archived even though the total exceeds 10MB, because f1 +
f2 <= 10MB. A WARN message starts to be logged from this point,
because the total archive size > 10MB.
* t4: The oldest archive, f1, is removed, because f1 + f2 + f3 > 10MB.
* t5: Even though the flow.xml size exceeds max.storage, the latest
archive is created. f4 is kept because f4 <= 10MB.
* t6: f4 and f5 are removed because f4 + f5 > 10MB, and also f5 > 10MB.

In this case, NiFi will keep logging a WARN (or should it be ERROR?)
message indicating that the archive storage size exceeds the limit,
starting from t3. After t6, even though archive.max.count = 5, NiFi
will only keep the latest flow.xml.
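
To make the pruning order concrete, here is a rough Java sketch of the
soft-limit cleanup simulated above. This is illustrative only, not the
actual NIFI-3373 patch; the class and method names are made up.

// Illustrative sketch of the soft-limit cleanup simulated above.
// Not the actual NIFI-3373 implementation; names are made up.
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ArchiveCleanupSketch {

    private final long maxStorageBytes;
    private final int maxCount;

    public ArchiveCleanupSketch(long maxStorageBytes, int maxCount) {
        this.maxStorageBytes = maxStorageBytes;
        this.maxCount = maxCount;
    }

    // The newly written archive is excluded from the size check
    // (soft limit), so it always survives, even if it alone exceeds
    // maxStorageBytes.
    public void cleanup(File archiveDir, File newestArchive) {
        File[] listed = archiveDir.listFiles(f -> !f.equals(newestArchive));
        if (listed == null) {
            return;
        }
        List<File> oldArchives = new ArrayList<>(Arrays.asList(listed));
        oldArchives.sort(Comparator.comparingLong(File::lastModified));

        long oldTotal = oldArchives.stream().mapToLong(File::length).sum();
        // Delete oldest-first until the OLD archives fit within the
        // storage limit and the overall count (old + newest) fits
        // within maxCount. At t4 this removes f1; at t6, f4 and f5.
        while (!oldArchives.isEmpty()
                && (oldTotal > maxStorageBytes
                        || oldArchives.size() + 1 > maxCount)) {
            File oldest = oldArchives.remove(0);
            oldTotal -= oldest.length();
            oldest.delete();
        }
        if (oldTotal + newestArchive.length() > maxStorageBytes) {
            // e.g. t3 in CASE-1: the total exceeds the limit, but nothing
            // more can be removed without touching the newest archive.
            System.err.println("WARN: archive storage exceeds the configured max");
        }
    }
}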

# CASE-2

If you'd like to keep at least 5 archives no matter what, then leave
max.storage and max.time blank.

archive.max.storage=
archive.max.time=
archive.max.count=5  # Only limit archives by count
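
For reference, in nifi.properties these would be the fully qualified
keys (the max.count key is the new property proposed in NIFI-3373, so
its final name may differ):

nifi.flow.configuration.archive.max.time=
nifi.flow.configuration.archive.max.storage=
nifi.flow.configuration.archive.max.count=5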

Time | flow.xml | archives | archive total |
t1 | f1 5MB  | f1 | 5MB
t2 | f2 5MB  | f1, f2 | 10MB
t3 | f3 5MB  | f1, f2, f3 | 15MB
t4 | f4 10MB | f1, f2, f3, f4 | 25MB
t5 | f5 15MB | f1, f2, f3, f4, f5 | 40MB
t6 | f6 20MB | f2, f3, f4, f5, f6 | 55MB
t7 | f7 25MB | f3, f4, f5, f6, (f7) | 50MB, (75MB)
t8 | f8 30MB | f3, f4, f5, f6 | 50MB

* From t6, the oldest archive is removed to keep the number of
archives <= 5.
* At t7, if the disk has only 60MB of space, f7 won't be archived, and
after this point the archive mechanism stops working (it keeps trying
to create a new archive, but keeps getting an exception: no space left
on device). A count-only sketch follows below.
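
A minimal sketch of this count-only behavior, again with made-up names
(not the actual NiFi implementation). The point is the ordering: the
new archive is written first, so a failed write leaves the existing
archives untouched.

// Minimal sketch of count-only retention (CASE-2); names are made up
// and this is not the actual NiFi implementation.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CountOnlyArchiveSketch {

    private static final int MAX_COUNT = 5;

    public static void archive(Path flowXmlGz, Path archiveDir) {
        Path target = archiveDir.resolve(
                System.currentTimeMillis() + "-flow.xml.gz");
        try {
            // Write the new archive first. If this fails (e.g. "no space
            // left on device" at t7), existing archives stay untouched.
            Files.copy(flowXmlGz, target);
        } catch (IOException e) {
            System.err.println("WARN: unable to archive flow: " + e);
            return;
        }
        try (Stream<Path> listing = Files.list(archiveDir)) {
            List<Path> archives = listing
                    .sorted(Comparator.comparingLong(
                            (Path p) -> p.toFile().lastModified()))
                    .collect(Collectors.toList());
            // Prune oldest-first until only MAX_COUNT archives remain,
            // e.g. f1 is removed at t6.
            while (archives.size() > MAX_COUNT) {
                Files.deleteIfExists(archives.remove(0));
            }
        } catch (IOException e) {
            System.err.println("WARN: failed to prune archives: " + e);
        }
    }
}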

In either case above, once flow.xml has grown to that size, some human
intervention would be needed.
Do those simulations look reasonable?

Thanks,
Koji

On Thu, Jan 19, 2017 at 5:48 PM, prabhu Mahendran
<prabhuu161...@gmail.com> wrote:
> Hi Koji,
>
> Thanks for your information.
>
> Actually the task description looks fine. I have one question here: consider
> that the storage limit is 500MB, and suppose my latest workflow exceeds this
> limit. Which behavior is performed with respect to the properties (max.count,
> max.time and max.storage)? My assumption is that the latest archive is saved
> even if it exceeds 500MB, so what happens from there? Will it keep saving the
> single latest archive with the large size, or will it notify the user to
> increase the size and preserve the latest file until we restart the flow?
> If so, what happens if the size keeps increasing beyond 500MB: will it save
> archives based on count, or only the latest archive for as long as NiFi
> is running?
>
> Many thanks
>
> On Thu, Jan 19, 2017 at 12:47 PM, Koji Kawamura <ijokaruma...@gmail.com>
> wrote:
>>
>> Hi Prabhu,
>>
>> Thank you for the suggestion.
>>
>> Keeping latest N archives is nice, it's simple :)
>>
>> The max.time and max.storage settings have other benefits, and since
>> they are already released, we should keep their existing behavior, too.
>> I've created a JIRA to add archive.max.count property.
>> https://issues.apache.org/jira/browse/NIFI-3373
>>
>> Thanks,
>> Koji
>>
>> On Thu, Jan 19, 2017 at 2:21 PM, prabhu Mahendran
>> <prabhuu161...@gmail.com> wrote:
>> > Hi Koji,
>> >
>> >
>> > Thanks for your reply,
>> >
>> > Yes, Solution B may meet my requirement. Currently, if the storage size
>> > limit is met, the complete folder gets deleted and the new flow is not
>> > tracked in the archive folder. This behavior is the drawback here. I need
>> > at least the last workflow to be saved in the archive folder, and the
>> > user to be notified to increase the size. At the same time, until NiFi
>> > restarts, at least the last complete workflow should be backed up.
>> >
>> >
>> > My other suggestion is as follows:
>> >
>> >
>> > Regardless of the max.time and max.storage properties, can we keep only
>> > a few files in the archive (say, only 10 files)? Each action on the NiFi
>> > canvas should be tracked here; if the flow.xml.gz archive file count
>> > reaches the limit, it should delete the oldest file and save the latest
>> > one, so that the count of 10 is maintained. This way we can maintain the
>> > workflow properly, and backup is also achieved without the confusion of
>> > max.time and max.storage. The only remaining case is when the disk size
>> > is exceeded; we should notify the user about this.
>> >
>> >
>> > Many thanks.
>> >
>> >
>> > On Thu, Jan 19, 2017 at 6:36 AM, Koji Kawamura <ijokaruma...@gmail.com>
>> > wrote:
>> >>
>> >> Hi Prabhu,
>> >>
>> >> Thanks for sharing your experience with flow configuration archiving.
>> >> The case in which a single flow.xml.gz file exceeds
>> >> archive.max.storage in size was not considered well when I implemented
>> >> NIFI-2145.
>> >>
>> >> By looking at the code, it currently works as follows:
>> >> 1. The original conf/flow.xml.gz (> 1MB) is archived to conf/archive
>> >> 2. NiFi checks whether there are any expired archive files, and
>> >> deletes them if any
>> >> 3. NiFi checks the total size of all archived files, then deletes
>> >> the oldest archive, repeating until the total size becomes less
>> >> than or equal to the configured archive.max.storage.
>> >>
>> >> In your case, at step 3, the newly created archive is deleted, because
>> >> its size was greater than archive.max.storage.
>> >> In this case, NiFi only logs an INFO-level message, and it's hard for
>> >> the user to know what happened, as you reported.
>> >>
>> >> I'm going to create a JIRA for this, and fix the current behavior
>> >> with one of the following solutions:
>> >>
>> >> A. Treat archive.max.storage as a HARD limit. If the original
>> >> flow.xml.gz exceeds the configured archive.max.storage in size, then
>> >> throw an IOException, which results in a WARN-level log message
>> >> "Unable to archive flow configuration as requested due to ...".
>> >>
>> >> B. Treat archive.max.storage as a SOFT limit, by not including the
>> >> newly created archive file in steps 2 and 3 above, so that it can
>> >> stay there. Maybe a WARN-level log message should be logged.
>> >>
>> >> For a better user experience, I'd prefer solution B, so that the
>> >> flow.xml.gz can be archived even when it exceeds the archive storage
>> >> size; since it was able to be written to disk, the physical disk had
>> >> enough space.
>> >>
>> >> What do you think?
>> >>
>> >> Thanks!
>> >> Koji
>> >>
>> >> On Wed, Jan 18, 2017 at 3:27 PM, prabhu Mahendran
>> >> <prabhuu161...@gmail.com> wrote:
>> >> > I have checked the below properties used for backup operations in
>> >> > NiFi-1.0.0, with respect to this JIRA:
>> >> >
>> >> > https://issues.apache.org/jira/browse/NIFI-2145
>> >> >
>> >> > nifi.flow.configuration.archive.max.time=1 hours
>> >> > nifi.flow.configuration.archive.max.storage=1 MB
>> >> >
>> >> > We have two backups: the first is "conf/flow.xml.gz" and the second
>> >> > is "conf/archive/flow.xml.gz".
>> >> >
>> >> > I have archived workflows (conf/archive/flow.xml.gz) hourly, per the
>> >> > "max.time" property.
>> >> >
>> >> > At a particular time I reached "1 MB" [set as the default storage
>> >> > size].
>> >> >
>> >> > So it deleted the existing conf/archive/flow.xml.gz completely and
>> >> > didn't write new flow files to conf/archive/flow.xml.gz because the
>> >> > size was exceeded.
>> >> >
>> >> > No logs show that the new flow.xml.gz is larger than the specified
>> >> > storage.
>> >> >
>> >> > Why would it delete existing flows and not write new flows due to
>> >> > storage?
>> >> >
>> >> > In this case, has one of the backup operations failed or not?
>> >> >
>> >> > Thanks,
>> >> >
>> >> > prabhu
>> >
>> >
>
>
