Thank you for the advice, Diego!

We had come across this type of multi-schema text input/output capability in
Talend, and I was hoping we could create our own plugins to accomplish something
similar here.

From: Diego Mainou <[email protected]>
Sent: Monday, March 27, 2023 3:44 PM
To: users <[email protected]>; Austin, Justin 
<[email protected]>
Subject: Re: Custom plugin - multi-schema text input


Hi Justin,

It seems to me that you want to do too many things with one step, and that you
will struggle to find a piece of software, cheap or expensive, that does what
you are describing in one step.

ETL tools are good, but they are not magical; even AI needs to be trained.

Best practice is to separate acquisition from business logic.
So my recommendation would be to grab those files and acquire them in their
native state, plus governance fields (e.g. a load ID), before you do anything to
them.
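As a rough illustration of acquiring a file "in its native state + governance", the Java sketch below wraps every line of a file, unmodified, with a load ID, the source file name and the line number. The names `RawRecord` and `RawAcquisition` and the UUID-based load ID are assumptions made for illustration; the suggestion above is about building this with Hop components, and the code only shows the shape of the resulting data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// One line exactly as it appeared in the source file, plus governance fields.
record RawRecord(String loadId, String sourceFile, long lineNumber, String rawLine) {}

final class RawAcquisition {
    // Hypothetical load ID scheme: one random UUID per load run.
    static String newLoadId() {
        return UUID.randomUUID().toString();
    }

    // Wraps every line, untouched, with the load ID and its position in the source file.
    static List<RawRecord> acquire(String loadId, String sourceFile, List<String> lines) {
        List<RawRecord> out = new ArrayList<>(lines.size());
        for (int i = 0; i < lines.size(); i++) {
            out.add(new RawRecord(loadId, sourceFile, i + 1L, lines.get(i)));
        }
        return out;
    }
}
```

Nothing is parsed at this stage, so a file with an unexpected layout still lands and can be investigated later by its load ID.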

Further, because you are dealing with many files of a distinct nature, you may
wish to segregate the "acquisition" from the loading, e.g. by creating:

  *   A generic and reusable component that 'copies/moves' the files from
      wherever they are located into your landing zone.
  *   A bespoke component that acquires either a specific file or a specific
      file type (e.g. JSON) and outputs it to a generic format, e.g. a
      serialised file.
  *   A generic and reusable component that grabs files of the generic format
      and loads them into a table containing the raw data plus governance.

The above will result in files from all walks of life being loaded into your
staging database in their raw state. This is very important for governance
purposes.
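The first component in the list above, a generic copy into a landing zone, could look roughly like the following Java sketch. It copies a file byte for byte into `<landing root>/<load ID>/`, so every landed file can be traced back to its load. The directory layout and the `LandingZone.land` name are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class LandingZone {
    // Copies a file byte for byte into <landingRoot>/<loadId>/<file name> and returns the new path.
    // REPLACE_EXISTING is deliberately not passed, so landing the same file twice under one
    // load ID fails with FileAlreadyExistsException instead of overwriting the earlier copy.
    static Path land(Path source, Path landingRoot, String loadId) throws IOException {
        Path targetDir = landingRoot.resolve(loadId);
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(source.getFileName());
        return Files.copy(source, target, StandardCopyOption.COPY_ATTRIBUTES);
    }
}
```

Because the component knows nothing about file contents, the same code serves every format; the bespoke, format-aware logic lives only in the second component.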

Potentially your next step is to create a generic and reusable component that
utilises metadata injection to parse JSON into columns plus governance fields.
Rinse and repeat for XML, CSV, etc.
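In Hop, metadata injection supplies a template pipeline's transform settings (such as field layouts) as data at run time instead of hard-coding them. The Java sketch below imitates that idea outside Hop: one generic parser receives a map from record type to column names and applies it to each raw line. The `MetadataDrivenParser` class and the column names are hypothetical, and this is not how Hop's Metadata Injection transform is implemented.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// A generic parser whose column layout is supplied as data (the "metadata"),
// so one parser can serve many file formats. Column names are hypothetical.
final class MetadataDrivenParser {
    private final String delimiter;
    private final Map<String, List<String>> layouts; // record type -> column names

    MetadataDrivenParser(String delimiter, Map<String, List<String>> layouts) {
        this.delimiter = delimiter;
        this.layouts = Map.copyOf(layouts);
    }

    // Splits one raw line and names its fields using the layout for its record type (field 0).
    Map<String, String> parse(String rawLine) {
        // Pattern.quote treats the delimiter literally; limit -1 keeps empty fields such as "||".
        String[] parts = rawLine.split(Pattern.quote(delimiter), -1);
        List<String> columns = layouts.get(parts[0]);
        if (columns == null) {
            throw new IllegalArgumentException("No layout for record type: " + parts[0]);
        }
        if (columns.size() != parts.length - 1) {
            throw new IllegalArgumentException("Expected " + columns.size() + " fields for "
                    + parts[0] + " but found " + (parts.length - 1));
        }
        Map<String, String> row = new LinkedHashMap<>();
        row.put("record_type", parts[0]);
        for (int i = 0; i < columns.size(); i++) {
            row.put(columns.get(i), parts[i + 1]);
        }
        return row;
    }
}
```

Supporting another format then means adding an entry to the layout map rather than writing another parser.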

The next step is mapping your data to your dimensions. Once you have your
surrogate keys (SKs), you can then drop the values that were used to look up
those SKs, and so on.
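The surrogate key step can be pictured as a lookup that swaps a natural key for an SK and then drops the natural value. In the sketch below an in-memory map stands in for the dimension table a real pipeline would query; the class name, the column names and the assign-on-first-sight behaviour are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for a dimension table: natural key -> surrogate key (SK), kept in memory.
final class SurrogateKeyMapper {
    private final Map<String, Long> keysByNaturalKey = new HashMap<>();
    private long nextKey = 1;

    // Returns the SK already assigned to a natural key, or assigns the next free one.
    long keyFor(String naturalKey) {
        return keysByNaturalKey.computeIfAbsent(naturalKey, k -> nextKey++);
    }

    // Copies the row, swaps the natural-key column for an SK column, and drops the natural value.
    Map<String, Object> mapRow(Map<String, Object> row, String naturalKeyColumn, String skColumn) {
        Map<String, Object> out = new LinkedHashMap<>(row);
        Object natural = out.remove(naturalKeyColumn);
        if (natural == null) {
            throw new IllegalArgumentException("Row has no " + naturalKeyColumn + " column");
        }
        out.put(skColumn, keyFor(natural.toString()));
        return out;
    }
}
```

The input row is copied rather than modified, so the raw staging data stays untouched while the mapped row carries only the SK.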

Diego


Diego Mainou
Product Manager
M. +61 415 152 091
E. [email protected]
www.bizcubed.com.au


________________________________
From: "Austin, Justin via users" <[email protected]>
To: "users" <[email protected]>
Sent: Tuesday, 28 March, 2023 1:41:06 AM
Subject: Custom plugin - multi-schema text input

Hi Hop users,

We're evaluating whether Hop is the right tool to solve a common problem for
our business.

We encounter hundreds of different file formats containing similar layers of
one-to-many hierarchy (simplified example below). Getting this to work using
out-of-the-box inputs/outputs and transform components results in a
complex/convoluted set of workflows & pipelines. Since we run into this so
often, we would like to develop a plugin with a custom "input" component that
reads the input file, inserts some ID fields for relationships, and exposes
multiple output rowsets (one for each schema/row type) that can be mapped to
separate downstream transforms. Eventually we'd like to make another custom
"output" component that can accept multiple inputs and load them where we need
them with the hierarchy preserved (JSON, relational DB, etc.).

After reviewing the plugin documentation and samples, I'm still not sure
whether this is possible. It seems that the relevant plugin base classes
assume there will always be a single schema (IRowMeta) and a single rowset
shared by all input and output connections/hops. I believe we would need a
single "transform" to have multiple IRowMeta instances and multiple rowsets,
and the ability to select a specific one for any given hop to a downstream
transform/component.
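To make the question concrete, the plain-Java model below shows the structure being asked for: one transform that owns several named outputs, each with its own schema and its own row buffer, with a downstream consumer selecting a single output by name. It is a hypothetical model, not Hop's plugin API; `MultiSchemaOutputs`, `declare`, `emit` and `select` are invented names standing in for IRowMeta/rowset pairs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model (not Hop's API): one transform that owns several named outputs,
// each with its own schema (field names) and its own buffer of rows.
final class MultiSchemaOutputs {
    record Output(List<String> fields, List<List<String>> rows) {}

    private final Map<String, Output> outputs = new LinkedHashMap<>();

    // Registers an output and its schema, roughly one IRowMeta plus one rowset per output.
    void declare(String name, List<String> fields) {
        outputs.put(name, new Output(List.copyOf(fields), new ArrayList<>()));
    }

    // Sends a row to exactly one named output, checked against that output's schema.
    void emit(String name, List<String> values) {
        Output out = select(name);
        if (values.size() != out.fields().size()) {
            throw new IllegalArgumentException("Output " + name + " expects "
                    + out.fields().size() + " fields, got " + values.size());
        }
        out.rows().add(List.copyOf(values));
    }

    // What a downstream hop would pick: the schema and rows of a single named output.
    Output select(String name) {
        Output out = outputs.get(name);
        if (out == null) {
            throw new IllegalArgumentException("Unknown output: " + name);
        }
        return out;
    }
}
```

Keeping the schema next to its rows means each downstream consumer only ever sees rows that match the one schema it selected.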

Is there a good path to accomplishing this with a Hop plugin? Or perhaps a
better approach to the problem with existing Hop features?

Thanks!

Example file:
REC|Jane Smith|03-20-2003
ADDR|123 Main Street|Apartment 321|Anytown|US|55555
ACT|987654321|$4321.56|02-01-2023|03-02-2023
DTL|debit|$23.45|02-05-2023
DTL|debit|$143.20|02-13-2023
DTL|credit|$652.02|02-14-2023
DTL|debit|$8.78|02-28-2023
ACT|56789123|$7894.56|02-01-2023|03-02-2023
DTL|credit|$0.28|02-14-2023
REC|John Jacobs|03-20-2003
ADDR|876 Big Avenue||Anywhere|US|55556
ACT|5632178|$2256.79|02-01-2023|03-02-2023
DTL|credit|$0.02|02-14-2023
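The custom input component described in the question can be sketched in plain Java against this example file. The sketch assumes the nesting REC > ADDR and REC > ACT > DTL (inferred from the sample, not stated), gives every row a numeric ID, records the ID of its parent row, and collects rows into one list per record type. It is not a Hop transform; `HierarchicalSplitter` and the [id, parent_id, fields...] row layout are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Splits a pipe-delimited hierarchical file into one row list per record type,
// adding an ID to every row and the ID of its parent row.
final class HierarchicalSplitter {
    // Assumed nesting, inferred from the example file: REC > ADDR, REC > ACT > DTL.
    private static final Map<String, String> PARENT = Map.of("ADDR", "REC", "ACT", "REC", "DTL", "ACT");
    private static final String ROOT = "REC";

    // Each output row is [id, parent_id, original fields...]; parent_id is "" for REC rows.
    static Map<String, List<List<String>>> split(List<String> lines) {
        Map<String, List<List<String>>> outputs = new LinkedHashMap<>();
        Map<String, Integer> openId = new HashMap<>(); // ID of the most recent row of each type
        int nextId = 1;
        for (String line : lines) {
            if (line.isBlank()) {
                continue;
            }
            String[] parts = line.split(Pattern.quote("|"), -1); // -1 keeps empty fields
            String type = parts[0];
            String parentId = "";
            if (PARENT.containsKey(type)) {
                Integer pid = openId.get(PARENT.get(type));
                if (pid == null) {
                    throw new IllegalStateException(type + " row has no open " + PARENT.get(type) + " parent");
                }
                parentId = pid.toString();
            } else if (!type.equals(ROOT)) {
                throw new IllegalArgumentException("Unknown record type: " + type);
            }
            int id = nextId++;
            openId.put(type, id);
            // A new parent closes its children, so a DTL after a new REC cannot attach to an old ACT.
            PARENT.forEach((child, parent) -> {
                if (parent.equals(type)) {
                    openId.remove(child);
                }
            });
            List<String> row = new ArrayList<>();
            row.add(Integer.toString(id));
            row.add(parentId);
            row.addAll(List.of(parts).subList(1, parts.length));
            outputs.computeIfAbsent(type, k -> new ArrayList<>()).add(row);
        }
        return outputs;
    }
}
```

Against the 13 example lines this yields 2 REC, 2 ADDR, 3 ACT and 6 DTL rows; the last DTL row gets parent ID 12, the ID assigned to John Jacobs' ACT row.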


