1. No, it needs to be local storage.
2. That would be a question for AWS, because you would need an S3 feature
that allows you to tell S3 to open an SFTP connection and manage the
transfer for you.

400GB is really not that big of a deal on a decent AWS setup, though. The
internal transfer rates between S3 and EC2/EKS are very high.

On Fri, May 12, 2023 at 2:53 AM Kumar, Nilesh via users <
users@nifi.apache.org> wrote:

> Gentle People,
>
>
>
> I understand that we need enough space in the content repo to transfer a
> 400GB CSV. I have two questions:
>
>    1. Can the space for the content repository be external S3 instead of an
>    extra volume? If so, I will try to update the property values below.
>    2. Is there any way to move the data directly without holding it in the
>    content repo? The disk I/O for 400GB will hurt performance.
>
> I am planning to create a 500GB Persistent Volume, mount it on my NiFi
> server, and configure the property below to use that mount path.
>
> nifi.content.repository.directory.default=../content_repository
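>
> A minimal sketch of the relevant nifi.properties entries, assuming the PV
> ends up mounted at /data/content_repository (the path and the archive
> values below are illustrative assumptions, not settings from this thread):
>
>   # Point the content repository at the mounted 500GB volume.
>   nifi.content.repository.directory.default=/data/content_repository
>   # The archive keeps already-processed content on the same volume; with
>   # 400GB files, consider disabling it, or cap it via
>   # nifi.content.repository.archive.max.usage.percentage (default 50%).
>   nifi.content.repository.archive.enabled=false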
>
>
>
> *From:* Mike Thomsen <mikerthom...@gmail.com>
> *Sent:* Thursday, May 11, 2023 11:17 PM
> *To:* users@nifi.apache.org
> *Subject:* [EXT] Re: Need Help in migrating Giant CSV from S3 to SFTP
>
>
>
> Nilesh,
>
>
>
> The issue is that you're running out of space on the disk. Ask your devops team
> to provision a lot more space for the partition where the content
> repository resides. Adding an extra 500GB should give you more than enough
> space to cover it, plus a little buffer in case you want to do something else
> with it that mutates the data.
>
>
>
> On Tue, May 9, 2023 at 2:53 PM Joe Witt <joe.w...@gmail.com> wrote:
>
> Nilesh,
>
>
>
> These processors generally are not memory sensitive: they should only ever
> hold small amounts of data in memory at a time, so this should work well up
> to objects of 100s of GB and beyond. We of course don't really test at that
> scale, but it is technically reasonable and designed as such. So what would
> be the bottleneck? It is exactly what Eric is flagging.
>
>
>
> You will need a content repository large enough to hold as much data in
> flight as you'll have at any one time. It looks like you have single files
> as large as 400GB, with others in the 10s to 100s of GB, and I'm guessing
> many can arrive at or around the same time. So you'll need a far larger
> content repository than you're currently using. It shows that free space
> on any single node averages 140GB, which leaves you very little headroom
> for what you're trying to do. You should aim for a TB or more available
> per node for this kind of case.
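>
> As a rough worked example of that sizing (the concurrency and headroom
> figures are assumptions, not measurements from your cluster):
>
>   largest single file in flight:       400 GB
>   other concurrent files (assumed):   ~200 GB
>   archive + working headroom (~50%):  ~300 GB
>   -------------------------------------------
>   suggested content repo per node:    ~1 TB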
>
>
>
> You mention it fails, but please provide information showing how it fails,
> along with the logs.
>
>
>
> Also, please do not use load balancing on every connection. You want to use
> that feature selectively, as a design choice. For now I'd avoid it entirely,
> or use it only between listing and fetching. Certainly do not use it after
> fetching, given how massive the content is that would have to be shuffled
> around.
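>
> A minimal sketch of the flow with that advice applied, assuming the
> ListS3 -> FetchS3Object -> PutSFTP flow from your original message (the
> strategy names are the standard per-connection settings):
>
>   ListS3
>     |  Load Balance Strategy = Round robin
>     |  (only tiny listing FlowFiles cross the cluster here)
>   FetchS3Object
>     |  Load Balance Strategy = Do not load balance
>     |  (the fetched 400GB of content stays on the node that fetched it)
>   PutSFTP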
>
>
>
> Thanks
>
>
>
> On Tue, May 9, 2023 at 9:07 AM Kumar, Nilesh via users <
> users@nifi.apache.org> wrote:
>
> Hi Eric,
>
>
>
> I see the following for my content repository. Can you please help me
> tweak it further? I have deployed NiFi on K8s as a 3-replica pod cluster
> with no resource limits, but I guess the pods' CPU/memory will be throttled
> by node capacity anyway. I noticed that since I have one single 400GB file,
> all the load goes to whichever node picks up the transfer. I wanted to know
> whether the flow can be configured any other way. If not, please tell me
> which NiFi settings to tweak.
>
>
>
> *From:* Eric Secules <esecu...@gmail.com>
> *Sent:* Tuesday, May 9, 2023 9:26 PM
> *To:* users@nifi.apache.org; Kumar, Nilesh <nileshkum...@deloitte.com>
> *Subject:* [EXT] Re: Need Help in migrating Giant CSV from S3 to SFTP
>
>
>
> Hi Nilesh,
>
>
>
> Check the size of your content repository. If you want to transfer a 400GB
> file through NiFi, your content repository must be larger than 400GB;
> someone else might have a better idea of exactly how much bigger. Generally
> it all depends on how many of these big files you want to transfer at the
> same time. You can check the content repository metrics in the Node Status
> from the hamburger menu in the top right corner of the canvas.
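>
> If you'd rather check from a shell inside the node or pod, a quick sketch
> (the path below is an assumption; use whatever
> nifi.content.repository.directory.default points to in your nifi.properties):
>
>   # show free space on the volume backing the content repository
>   df -h /opt/nifi/nifi-current/content_repository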
>
>
>
> -Eric
>
>
>
> On Tue., May 9, 2023, 8:42 a.m. Kumar, Nilesh via users, <
> users@nifi.apache.org> wrote:
>
> Hi Team,
>
> I want to move a very large file, around 400GB, from S3 to SFTP. I have used
> ListS3 -> FetchS3Object -> PutSFTP. This works for smaller files up to 30GB
> but fails for larger (100GB+) files. Is there any way to configure this flow
> so that it handles a very large single file? If a template for this exists,
> please share it.
>
> My configuration uses the standard settings on all processors.
>
>
>
> Thanks,
>
> Nilesh
>
>
>
>
>
>
>
