Thank you both very much, Bryan and Mike. Mike, had you considered the
approach Bryan mentioned - a record Reader that infers the schema - and
found it unsuitable for your use case for some reason? For instance,
perhaps you were running a version of Apache NiFi that did not offer
a CSVReader service or an InferAvroSchema processor?
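
To make concrete what "building the schema on the fly" means here, below is a
rough sketch in plain Python, outside NiFi entirely. It is not how any NiFi
reader or processor is implemented; the function names (infer_type,
infer_avro_schema, csv_to_records), the record name "csv_record", and the
three-type rule (long, double, string) are all assumptions made for
illustration. It reads the header row, guesses a type per column from the
data, emits an Avro-style record schema, and uses that schema to produce JSON.

```python
import csv
import io
import json


def infer_type(values):
    """Pick the narrowest of long/double/string that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"  # nothing to go on, so fall back to string
    for avro_type, cast in (("long", int), ("double", float)):
        try:
            for v in non_empty:
                cast(v)
            return avro_type
        except ValueError:
            continue
    return "string"


def infer_avro_schema(csv_text, record_name="csv_record"):
    """Build an Avro-style record schema (as a dict) from CSV text with a header row.

    Assumption: header names are already legal Avro field names; a real flow
    would need to sanitize them.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    header = reader.fieldnames or []
    fields = []
    for name in header:
        column = [row[name] or "" for row in rows]  # short rows yield None
        # Every field is made nullable so that empty cells stay valid.
        fields.append({"name": name, "type": ["null", infer_type(column)]})
    return {"type": "record", "name": record_name, "fields": fields}


def csv_to_records(csv_text, schema):
    """Convert CSV rows to dicts, casting each cell according to the schema."""
    casts = {"long": int, "double": float, "string": str}
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {}
        for field in schema["fields"]:
            raw = row.get(field["name"])
            avro_type = field["type"][1]
            record[field["name"]] = None if raw in (None, "") else casts[avro_type](raw)
        records.append(record)
    return records


sample = "id,price,name\n1,2.5,apple\n2,,pear\n"
schema = infer_avro_schema(sample)
print(json.dumps(schema, indent=2))
print(json.dumps(csv_to_records(sample, schema)))
```

Note that this sketch reads the whole file before deciding on types, which
would not suit very large CSVs; sampling the first N rows is the obvious
variation, at the cost of occasionally guessing a type too narrowly.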
Jim

On Thu, Apr 6, 2023 at 9:30 AM Mike Sofen <[email protected]> wrote:

> Hi James,
>
>
>
> I don’t have time to go into details, but I had nearly the same scenario
> and solved it by using NiFi as the file processing piece only: it sent
> valid CSV files (valid as in CSV formatting) on to Postgres, which landed
> the CSV data in pre-built staging tables. From there I did content
> validations and packaged the rows into jsonb for storage in a single
> target table.
>
>
>
> In my case, an external file source had to “register” a single file (to
> allow creating the matching staging table) prior to sending data. I used
> NiFi for that pre-staging step to derive the staging-table schema for a
> file, and a complex stored procedure to handle a massive amount of logic
> around the contents of a file when processing the actual files, prior to
> storing them in the destination table.
>
>
>
> NiFi was VERY fast and efficient in this, as was Postgres.
>
>
>
> Mike Sofen
>
>
>
> *From:* James McMahon <[email protected]>
> *Sent:* Thursday, April 06, 2023 4:35 AM
> *To:* users <[email protected]>
> *Subject:* Handling CSVs dynamically with NiFi
>
>
>
> We have a task requiring that we transform incoming CSV files to JSON. The
> CSVs vary in schema.
>
>
>
> There are a number of interesting flow examples out there illustrating how
> one can set up a flow to handle the case where the CSV schema is well known
> and fixed, but none for the generalized case.
>
>
>
> The structure of the incoming CSV files will not be known in advance in
> our use case. Our NiFi flow must be generalized because I cannot configure
> and rely on a schema registry service that defines a specific, fixed Avro schema.
> An Avro schema registry seems to presume an awareness of the CSV
> structure in advance. We don't have that luxury in this use case, with CSVs
> arriving from many different providers and so characterized by schemas that
> are unknown.
>
>
>
> What is the best way to get around this challenge? Does anyone know of an
> example where NiFi builds the schema on the fly as CSVs arrive for
> processing, dynamically defining the Avro schema for the CSV?
>
>
>
> Thanks in advance for any thoughts.
>
