Hello,
I've got ~15 million FlowFiles, each roughly 4KB, totaling about 55GB of
data on my canvas.

However, the content repository (on its own partition) is completely full
with 350GB of data.  I'm pretty certain the way Content Claims store the
data is responsible for this.  In previous experience, we've had files that
are larger, and haven't seen this as much.

My guess is that as data streams through and is added to a claim, the
claim isn't always released as the small files leave the canvas.
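
To make that guess concrete, here is a rough back-of-the-envelope sketch in
Python. It rests on two assumptions of mine that I have not confirmed against
the NiFi source: that a claim file on disk can only be removed once every
FlowFile written into it is gone, and that the FlowFiles still on the canvas
are scattered uniformly at random across everything that was written. The
function name and the "total written" figure are made up for illustration.

```python
def expected_pinned_bytes(total_written, remaining, per_claim, flowfile_size):
    """Estimate bytes held on disk if one live FlowFile pins its whole claim.

    ASSUMPTION: survivors are spread uniformly at random over all FlowFiles
    ever written, and every claim holds exactly `per_claim` FlowFiles.
    """
    claims = total_written / per_claim
    survive_p = remaining / total_written
    # Probability that a given claim still contains at least one live FlowFile.
    pinned_p = 1.0 - (1.0 - survive_p) ** per_claim
    return claims * pinned_p * per_claim * flowfile_size


# Hypothetical numbers: 100 million FlowFiles written in total (a guess),
# 15 million still on the canvas, 100 per claim (our max.flow.files), 4 KB each.
live = 15_000_000 * 4096
pinned = expected_pinned_bytes(100_000_000, 15_000_000, 100, 4096)
print(f"live content:   {live / 1e9:.1f} GB")
print(f"pinned on disk: {pinned / 1e9:.1f} GB")
```

If the assumptions hold, packing many small FlowFiles into one claim means a
small fraction of survivors can keep nearly every claim on disk, which would
match what we're seeing.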

We've run into this issue enough times that I figure there's probably a
"best practice for small files" for the content claims settings.

These are our current settings:
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.content.claim.max.appendable.size=1 MB
nifi.content.claim.max.flow.files=100
nifi.content.repository.directory.default=/var/nifi/repositories/content
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true
nifi.content.repository.always.sync=false

https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#content-repository


There are 1024 folders on the disk (0-1023) for the Content Claims.
Each file inside the folders is roughly 2 MB to 8 MB (which is odd because
I thought the max appendable size would make these no larger than 1 MB).

Is there a way to expand the number of folders and/or reduce the number of
individual FlowFiles that are stored in each claim?

I'm hoping there might be a best practice out there though.

Thanks,
Ryan
