Hi Irene, regarding the mapping from data asset columns to business terms, 
you have shown how it is done manually. Is it possible to automate the 
process, for example, using the fuzzy matching algorithm in the data 
quality report? Thanks!  
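
To make the question concrete, here is a minimal sketch of the kind of automation I have in mind. It runs outside EDG and uses Python's standard difflib module as a stand-in for fuzzy matching; it is not the algorithm used in the data quality report, and the column names, business terms, and threshold are all made-up assumptions.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and unify separators so 'Date_of_Birth' compares like 'date of birth'."""
    return " ".join(name.lower().replace("_", " ").replace("-", " ").split())


def suggest_terms(columns, terms, threshold=0.6):
    """Return {column: (best_term, score)} for columns whose best score meets the threshold."""
    suggestions = {}
    for column in columns:
        scored = [
            (SequenceMatcher(None, normalize(column), normalize(term)).ratio(), term)
            for term in terms
        ]
        score, term = max(scored)
        if score >= threshold:
            suggestions[column] = (term, round(score, 2))
    return suggestions


# Hypothetical column names and business terms, for illustration only.
columns = ["gender", "Date_of_Birth", "zip"]
terms = ["Gender", "Date of Birth", "Postal Code"]
suggestions = suggest_terms(columns, terms)
```

Note that "zip" gets no suggestion here because it shares almost no characters with "Postal Code". String similarity alone would miss synonyms like this, so the output would still need review by a domain expert.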

On Wednesday, April 8, 2020 at 3:37:51 PM UTC-4 Irene Polikoff wrote:

> Hi Fan Li,
>
> In this case, you would represent each spreadsheet as a dataset in a data 
> asset collection. Each column will become a dataset element.
>
> For 6.4, we have added an Import feature that will create this information 
> from a spreadsheet. It will perform some profiling of data to populate 
> metadata.
>
>
> If you are interested, maybe we can arrange a way for you to test it 
> and provide input. 
>
> Without this feature, you could create dataset instances for each 
> spreadsheet, then use the plain spreadsheet importer to create data 
> elements for each dataset. Creating input for the importer will require 
> manipulating your data. Take the first row of your spreadsheet that lists 
> the column names and turn it into columns.
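
The header-to-column step described above could be scripted roughly as follows. This is only a sketch: the output columns ("dataset", "element", "position") are hypothetical and are not the layout the EDG spreadsheet importer requires, and the sample data is invented.

```python
import csv
import io


def header_to_rows(csv_text: str, dataset_name: str):
    """Turn the header row of a CSV export into one row per column name."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # only the first row is needed; data rows are ignored
    return [
        {"dataset": dataset_name, "element": name.strip(), "position": index}
        for index, name in enumerate(header, start=1)
    ]


# Invented sample: one customer spreadsheet exported as CSV.
sample = "Customer ID,Sex,DOB\n1001,F,1980-02-14\n"
rows = header_to_rows(sample, "customer_a")

# Write the transposed header out as CSV text that an importer could read.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["dataset", "element", "position"])
writer.writeheader()
writer.writerows(rows)
importer_input = out.getvalue()
```

Each spreadsheet column becomes one row, which matches the idea of one data element per column.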
>
> With respect to connecting the data elements from different datasets, I 
> would not use crosswalks. Crosswalks are primarily about mapping different 
> reference data, different glossaries, taxonomies, etc. You would not create 
> a separate data asset collection for each dataset. At least, I do not think 
> you would. You would most likely use a single data asset collection.
>
> Then there is the question of whether you would map similar data elements 
> to each other or whether you would map all of them to a common business term. 
> For example, you may have different data elements (spreadsheet columns) 
> capturing gender information. It would make sense to create a business term 
> Gender and map all of them to it, in a hub-and-spoke type of approach. 
> EDG has some capabilities for suggesting such mappings based on the available 
> data and rules about a business term. And, yes, SHACL is used for this.
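
To illustrate why the hub-and-spoke approach helps with merging data for reporting, here is a small sketch outside EDG. Each data element is mapped once to a business term, and records from every dataset are then renamed to those terms. The dataset names, column names, and the mapping table are hypothetical; in practice the mapping would come from whatever is recorded in EDG.

```python
# Hypothetical mapping: (dataset, column) -> business term (the hub).
TERM_FOR_ELEMENT = {
    ("customer_a", "Sex"): "Gender",
    ("customer_b", "gender_cd"): "Gender",
    ("customer_a", "DOB"): "Date of Birth",
    ("customer_b", "birth_date"): "Date of Birth",
}


def harmonize(dataset, records):
    """Rename each record's keys to business terms; unmapped columns are dropped."""
    result = []
    for record in records:
        result.append({
            TERM_FOR_ELEMENT[(dataset, column)]: value
            for column, value in record.items()
            if (dataset, column) in TERM_FOR_ELEMENT
        })
    return result


# Invented records from two customer spreadsheets with different column names.
merged = (
    harmonize("customer_a", [{"Sex": "F", "DOB": "1980-02-14", "Notes": "n/a"}])
    + harmonize("customer_b", [{"gender_cd": "M", "birth_date": "1975-07-01"}])
)
```

With this design, adding a new dataset only requires mapping its own columns to existing terms, rather than mapping them to the columns of every other dataset.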
>
> Mapping one element to another makes sense if you are trying to capture 
> lineage, e.g., data from one dataset is used for another dataset and your 
> goal is to capture this.
>
> Coming back to your question on “giving domain experts a visual tool to 
> create data mapping”, I would probably organize the editor UI for data 
> assets to display data elements and drag and drop to map. I am showing an 
> example in the screenshot below. My first panel contains business terms. My 
> second panel contains data elements. I use it to select a data element to 
> be shown on a form. Then, I could drag and drop from the business term 
> table the relevant term to map the data element to. If you were doing 
> mapping between data elements, you could have in the first panel data 
> elements from one dataset and in another panel data elements from another 
> dataset. 
>
>
> Alternatively or additionally, you could also do batch editing. For 
> example, you could select data elements from different datasets that 
> represent, let’s say, gender and batch edit all of them in one step to 
> connect them to the same business term, as opposed to mapping them one by 
> one. There are various ways to accomplish this. For example, you could 
> use the Asset List panel. You could drag and drop different data elements 
> (using search to find them) into a list in order to assemble everything you 
> want to edit as a group. Then select all of them for editing. If you are 
> familiar with Baskets in TBC, Asset Lists are similar to TBC Baskets, but 
> they are collaborative. Users can name them and store them on the server to 
> share with other users for collaborative work and discussion.
>
> Hopefully, this gives you some useful information.
>
> Regards,
>
> Irene
>
> On Apr 7, 2020, at 12:12 PM, Fan Li <lifa...@gmail.com> wrote:
>
> Hi Irene,
>
> Each spreadsheet represents data we received from a different customer. I 
> would like to capture its metadata/descriptors such as column names, data 
> types, number of records, etc., in EDG. As customers use slightly different 
> terminologies, I also need to map the column names to a single schema so I 
> can merge the data for reporting purposes.
>
>
> On Tuesday, April 7, 2020 at 10:13:21 AM UTC-4, Irene Polikoff wrote:
> Hi Fan Li.
>
> On Apr 7, 2020, at 7:51 AM, Fan Li <lifa...@gmail.com> wrote:
>
> I should have added that the immediate objectives are:
> • Give domain experts a visual tool to create data mapping
> • Use SHACL to describe & validate the harmonized data structure
>
>
> On Tuesday, April 7, 2020 at 7:45:22 AM UTC-4, Fan Li wrote:
> Hi TopBraid Community,
>
> I have a use case where I need to map data sources (spreadsheets) of 
> different formats into a single schema. I was wondering how I should use 
> EDG on the data modeling aspect of this task.
> • Should I use "Data Assets" to model each data source?
>
> What kind of information are you planning to import into EDG? Is it some 
> data in spreadsheets, e.g., the actual information about, let's say, products 
> or companies? Or do these spreadsheets contain information about data 
> sources, e.g., what datasets you have, what the fields in each dataset are, 
> how many records are in each dataset, etc.?
>
> It would be useful if you could provide an example.
>
> • Should I use "Crosswalks" for schema mapping?
> • Is there a concrete example I can follow?
> Any guidance is appreciated!
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "TopBraid Suite Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to topbrai...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/topbraid-users/80a1e323-0d3e-4a1d-bb8b-33898253242a%40googlegroups.com
> .
>
