I really appreciate your help. It's very helpful. :)
From: Jeff <jtsw...@gmail.com>
To: dev@nifi.apache.org
Date: 2017/11/20 22:07
Subject: Re: How to delete the data in the flowfile?

Hello Boying,

Once flowfiles have completed processing, they may still be archived within the content repository for a certain period of time before they age off. In the NiFi Admin guide, there is a section on Content Repository properties [1] that you can set in nifi.properties. With these you can tune how much space is used for the archive and how long flowfiles are archived, or disable archiving completely.

Lowering the "nifi.content.repository.archive.max.retention.period" and "nifi.content.repository.archive.max.usage.percentage" properties can help limit the amount of disk space the content repository uses for archived flowfiles. If you prefer to have no archive at all, you can disable content archiving by setting "nifi.content.repository.archive.enabled" to false.

If your flow uses a processor like PutFile to place a flowfile in a temporary directory for further processing, or to keep "backups" of the flowfile at various stages of processing, then your flow must be designed to clean up those files once they are no longer needed. There are several ways to do this, one of them being the Wait/Notify processors. Koji has written a blog post [2] with some examples of how to use the Wait and Notify processors. The concepts covered there should be usable in your case: you could use Wait/Notify to signal that files which were explicitly archived or copied by processors like "PutFile", and are no longer needed, can be removed.

Please let me know if neither of these solutions helps with the disk space issues in your flow. If you provide your flow as an example, we can take a look at other ways to try to minimize disk usage.
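As a rough sketch of the archive settings described above, a nifi.properties fragment might look like the following. The three property names are the ones mentioned in this message; the values are illustrative assumptions only, not recommendations, and should be chosen based on how much disk you have and how long you want to be able to view or replay content from provenance. Changes to nifi.properties take effect after NiFi is restarted.

```properties
# Option A: keep the archive but make it smaller (example values, assumed).
nifi.content.repository.archive.enabled=true
# Archived content older than this becomes eligible for removal.
nifi.content.repository.archive.max.retention.period=6 hours
# Archived content is also removed once the content repository disk
# passes this usage percentage.
nifi.content.repository.archive.max.usage.percentage=25%

# Option B: no archive at all. Uncomment this line and remove or
# comment out the "enabled=true" line above.
# nifi.content.repository.archive.enabled=false
```

Note that these settings only affect content that NiFi itself archives in the content repository. Files written elsewhere on disk by a processor such as PutFile are not covered and still need to be cleaned up by the flow.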
[1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#file-system-content-repository-properties
[2] http://ijokarumawak.github.io/nifi/2017/02/02/nifi-notify-batch/#alternative-solution-waitnotify

On Mon, Nov 20, 2017 at 3:16 AM <l...@china-inv.cn> wrote:

> Hi, All,
>
> We use NiFi to import data from an Oracle database into Hive.
>
> The first step is to extract all the data from the Oracle database and
> persist it into a flowfile, which will then 'flow' into other processors
> for further processing.
>
> After persisting the data into Hive, we found that the data persisted
> in the first step was not deleted. This occupies a lot of disk space.
>
> So is there any way to tell NiFi to delete that data after the next
> processor has finished reading it?
>
> Thanks
>
> Boying
>
> This email message may contain confidential and/or privileged information.
> If you are not the intended recipient, please do not read, save, forward,
> disclose or copy the contents of this email or open any file attached to
> this email. We will be grateful if you could advise the sender immediately
> by replying this email, and delete this email and any attachment or links
> to this email completely and immediately from your computer system.