Gentle People,

I understand that we need enough space in the content repository to transfer a 
roughly 400GB CSV. I have two questions:

  1.  Can this content repository space be external S3 instead of an extra 
volume? If so, I will try to update the property value below accordingly.
  2.  Is there any way we can move the data directly without staging it in the 
content repository? The disk I/O for 400GB will hurt performance.
I am planning to create a 500GB Persistent Volume, mount it on my NiFi server, 
and configure the property below to use that mount path:
nifi.content.repository.directory.default=../content_repository
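For reference, roughly what I have in mind is below (the claim name and mount 
path are placeholders, not my actual values):

    # Pod/StatefulSet spec: mount the PVC into the NiFi container
    volumes:
      - name: content-repo
        persistentVolumeClaim:
          claimName: nifi-content-repo     # claim backed by the 500GB PV
    containers:
      - name: nifi
        volumeMounts:
          - name: content-repo
            mountPath: /opt/nifi/content_repository

    # nifi.properties: point the content repository at that mount path
    nifi.content.repository.directory.default=/opt/nifi/content_repository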

From: Mike Thomsen <mikerthom...@gmail.com>
Sent: Thursday, May 11, 2023 11:17 PM
To: users@nifi.apache.org
Subject: [EXT] Re: Need Help in migrating Giant CSV from S3 to SFTP

Nilesh,

The issue is you're running out of space on the disk. Ask your devops team to 
provision a lot more space for the partition where the content repository 
resides. Adding an extra 500GB should give you more than enough space to cover 
it, plus a little buffer in case you want to do something else with the data 
that mutates the content.
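If that partition is backed by a Kubernetes PVC and your StorageClass allows 
volume expansion, growing it can be as simple as raising the storage request on 
the claim; a rough sketch (the claim name and target size are illustrative):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: nifi-content-repo    # illustrative claim name
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Ti           # raised request; needs allowVolumeExpansion: true on the StorageClass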

On Tue, May 9, 2023 at 2:53 PM Joe Witt 
<joe.w...@gmail.com<mailto:joe.w...@gmail.com>> wrote:
Nilesh,

These processors generally are not memory sensitive, as they should only ever 
hold small amounts of data in memory at a time, so this should work well up to 
objects in the 100s of GB and beyond.  We of course don't really test at that 
scale, but it is technically reasonable and designed as such.  So what would be 
the bottleneck?  Exactly what Eric is flagging.

You will need a content repository large enough to hold as much data in flight 
as you'll have at any one time.  It looks like you have single files as large 
as 400GB, with others in the 10s to 100s of GB, and I'm guessing many can 
arrive at or around the same time.  So you'll need a far larger content 
repository than you're currently using.  Your screenshot shows that free space 
on any single node averages about 140GB, which leaves very little head room for 
what you're trying to do.  You should aim to have a TB or more available per 
node for this kind of case.
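Also keep in mind that the content repository archives content claims for a 
while after they leave the flow, which eats into that head room. The relevant 
knobs in nifi.properties look like this (the values shown are roughly the stock 
defaults; tighten them if disk is scarce):

    nifi.content.repository.archive.enabled=true
    nifi.content.repository.archive.max.retention.period=12 hours
    nifi.content.repository.archive.max.usage.percentage=50%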

You mention it fails, but please provide details showing how it fails, along 
with the logs.

Also, please do not use load balancing on every connection.  You want to use 
that feature selectively, as a deliberate design choice.  For now I'd avoid it 
entirely, or use it only between listing and fetching, but certainly not after 
fetching, given how massive the content is that would have to be shuffled 
around.

Thanks

On Tue, May 9, 2023 at 9:07 AM Kumar, Nilesh via users 
<users@nifi.apache.org<mailto:users@nifi.apache.org>> wrote:
Hi Eric

I see the following for my content repository. Can you please help me with how 
to tune it further? I have deployed NiFi on Kubernetes as a 3-replica pod 
cluster with no resource limits, though I expect pod CPU/memory will still be 
bounded by the node capacity itself. I noticed that since I have a single 400GB 
file, all the load goes to whichever node picks up the transfer. I wanted to 
know whether the flow can be configured any other way; if not, please tell me 
which NiFi metrics to tweak.
[Screenshot: content repository storage usage per node, showing roughly 140GB 
free on each node]

From: Eric Secules <esecu...@gmail.com<mailto:esecu...@gmail.com>>
Sent: Tuesday, May 9, 2023 9:26 PM
To: users@nifi.apache.org<mailto:users@nifi.apache.org>; Kumar, Nilesh 
<nileshkum...@deloitte.com<mailto:nileshkum...@deloitte.com>>
Subject: [EXT] Re: Need Help in migrating Giant CSV from S3 to SFTP

Hi Nilesh,

Check the size of your content repository. If you want to transfer a 400GB file 
through NiFi, your content repository must be larger than 400GB; someone else 
might have a better idea of how much bigger you need, but generally it all 
depends on how many of these big files you want to transfer at the same time. 
You can check the content repository metrics in the Node Status, reached from 
the hamburger menu in the top right corner of the canvas.
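If you'd rather script that check, the same numbers should also be available 
from the REST API, e.g. something like (adjust host, port, and authentication 
for your deployment):

    curl https://<nifi-host>:8443/nifi-api/system-diagnostics

The content repository storage usage section of the response reports used and 
free space per repository.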

-Eric

On Tue., May 9, 2023, 8:42 a.m. Kumar, Nilesh via users, 
<users@nifi.apache.org<mailto:users@nifi.apache.org>> wrote:

Hi Team,

I want to move a very large file, around 400GB, from S3 to SFTP. I have used 
ListS3 -> FetchS3Object -> PutSFTP. This works for smaller files up to 30GB but 
fails for larger (100GB+) files. Is there any way to configure this flow so 
that it handles a very large single file? If a template for this exists, please 
share.

My configuration is the standard configuration for all processors.



Thanks,

Nilesh




