1. No, it needs to be local storage.
2. That would be a question for AWS, because you would need an S3 feature that allows you to tell S3 to open an SFTP connection and manage the transfer for you.
400GB is really not that big of a deal on a decent AWS setup, though. The internal transfer rates between S3 and EC2 and EKS are very high.

On Fri, May 12, 2023 at 2:53 AM Kumar, Nilesh via users <users@nifi.apache.org> wrote:

> Gentle People,
>
> I understand that we need to have enough space in the content repo to transfer a 400GB CSV. I have two questions:
>
> 1. Can this space for the content repository be external S3 instead of an extra volume? I will try to update the property value below.
> 2. Is there any way we can move data directly without holding it in the content repo? Disk I/O on 400GB will be less performant.
>
> I am planning to create a Persistent Volume of 500GB, mount it to my nifi server, and configure the property below to use that mount path.
>
> nifi.content.repository.directory.default=../content_repository
>
> *From:* Mike Thomsen <mikerthom...@gmail.com>
> *Sent:* Thursday, May 11, 2023 11:17 PM
> *To:* users@nifi.apache.org
> *Subject:* [EXT] Re: Need Help in migrating Giant CSV from S3 to SFTP
>
> Nilesh,
>
> The issue is you're running out of space on the disk. Ask your devops team to provision a lot more space for the partition where the content repository resides. Adding an extra 500GB should give you more than enough space to cover it, plus a little buffer in case you want to do something else with it that mutates the data.
>
> On Tue, May 9, 2023 at 2:53 PM Joe Witt <joe.w...@gmail.com> wrote:
>
> Nilesh,
>
> These processors generally are not memory sensitive, as they should only ever have small amounts in memory at a time, so this should work well up to 100s-of-GB objects and so on. We of course don't really test at that scale, but it is technically reasonable and designed as such. So what would be the bottleneck? It is exactly what Eric is flagging.
> You will need a content repository large enough to hold as much data in flight as you'll have at any one time. It looks like you have single files as large as 400GB, with some being 100s or 10s of GB as well, and I'm guessing many can happen at/around one time. So you'll need a far larger content repository than you're currently using. It shows that free space on any single node is on average 140GB, which means you have very little head room for what you're trying to do. You should try to have a TB or more available for this kind of case (per node).
>
> You mention it fails, but please provide information showing how, and the logs.
>
> Also, please do not use load balancing on every connection. You want to use that feature selectively, by design choice. For now I'd avoid it entirely, or just use it between listing and fetching. But certainly not after fetching, given how massive the content is that would have to be shuffled around.
>
> Thanks
>
> On Tue, May 9, 2023 at 9:07 AM Kumar, Nilesh via users <users@nifi.apache.org> wrote:
>
> Hi Eric,
>
> I see the following for my content repository. Can you please help me on how to tweak it further? I have deployed nifi on K8s as a 3-replica pod cluster with no resource limits, though I guess pod cpu/memory will be throttled by node capacity anyway. I noticed that since I have one single 400GB file, all the load goes to whichever node picks up the transfer. I wanted to know if there is any other way of configuring the flow; if not, please tell me which nifi settings to tweak.
>
> *From:* Eric Secules <esecu...@gmail.com>
> *Sent:* Tuesday, May 9, 2023 9:26 PM
> *To:* users@nifi.apache.org; Kumar, Nilesh <nileshkum...@deloitte.com>
> *Subject:* [EXT] Re: Need Help in migrating Giant CSV from S3 to SFTP
>
> Hi Nilesh,
>
> Check the size of your content repository.
> If you want to transfer a 400GB file through nifi, your content repository must be greater than 400GB; someone else might have a better idea of how much bigger you need. But generally it all depends on how many of these big files you want to transfer at the same time. You can check the content repository metrics in the Node Status from the hamburger menu in the top right corner of the canvas.
>
> -Eric
>
> On Tue., May 9, 2023, 8:42 a.m. Kumar, Nilesh via users, <users@nifi.apache.org> wrote:
>
> Hi Team,
>
> I want to move a very large file, around 400GB, from S3 to SFTP. I have used listS3 -> FetchS3 -> putSFTP. This works for smaller files up to 30GB but fails for larger (100GB) files. Is there any way to configure this flow so that it handles a very large single file? If any template exists, please share.
>
> My configurations are all standard processor configurations.
>
> Thanks,
>
> Nilesh
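Pulling the sizing advice in this thread together, a content-repository stanza in nifi.properties might look like the sketch below. The mount path and values are illustrative assumptions for a dedicated large volume; the property names are standard NiFi ones:

```properties
# Point the content repository at the large dedicated volume
# (e.g. a 1 TB PersistentVolume mounted at /data/content_repository)
nifi.content.repository.directory.default=/data/content_repository

# Keep archived content from eating the headroom needed for in-flight data
nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
```

With archiving capped at 50% usage, roughly half the volume stays free for in-flight content, which is what matters for a single 400GB transfer.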