Hi Holger

> I can see how a manual process (like a spreadsheet import wizard) could help 
>with setting the datatypes, and then you could define a target ontology into which 
>the text files will be imported (if you add a boolean property 
>sml:importToInputGraph and set it to true at your sml:ConvertSpreadsheetToRDF 
>module, then it will try to reuse existing properties with matching column 
>names). We could also improve the Semantic Tables back-end to use column index 
>of existing properties, similar to the Semantic XML back-end.

This would be great (see below as well).

> But, another approach would be to spice up the original files and convert 
> them to Excel, with explicit datatypes set.

I haven't tried the Excel import yet; I should give it a go. Most of
the files I receive are in delimited format. I could convert them
to Excel and set the column types, but in several cases I have many
tens of files to import (all in the same format). I also have the
problem that some files are longer than 65K rows and won't fit
into a single Excel file. I'm not sure which is the lesser evil:
converting each one to an Excel file, or just writing a SPARQLMotion (SM)
script to handle the datatypes. I'm currently working on the latter, but
I will give the Excel import a go anyway, if only to learn more about how
the different modules work.

>
> So, how do other tools do that? I think your use case is very relevant for 
> many users, and we have to figure out the best way of addressing it (balanced 
> with our usual resource limitations of course - we cannot do everything!).

Now there's a question :). I'm not sure whether you mean SW (Semantic
Web) tools or tools in general.

Most general-purpose pipelining tools work a bit like Excel: a wizard
looks at the first x rows, guesses each column's datatype, and uses
that guess for the import. This can of course lead to mistakes, e.g.
when a column that looks numeric in the sampled rows contains text
further down.
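For illustration, the "look at the first x rows and guess" approach can be sketched roughly like this in Python. This is a minimal sketch, not how any particular tool implements it; the function name guess_column_types, the default sample size of 100, and the small set of XSD types considered are all assumptions made for the example.

```python
import csv
import io

# Candidate XSD datatypes, from most to least specific. xsd:string accepts anything.
TYPE_ORDER = ["xsd:boolean", "xsd:integer", "xsd:decimal", "xsd:string"]


def _fits(xsd_type, value):
    """Rough check whether the raw cell text parses as xsd_type.

    Python's int()/float() parsing is used as an approximation of the XSD lexical rules.
    """
    if xsd_type == "xsd:boolean":
        return value.strip().lower() in ("true", "false")
    if xsd_type in ("xsd:integer", "xsd:decimal"):
        cast = int if xsd_type == "xsd:integer" else float
        try:
            cast(value)
            return True
        except ValueError:
            return False
    return True  # xsd:string


def guess_column_types(text, delimiter=",", sample_size=100):
    """Hypothetical helper: guess one datatype per column from the first sample_size data rows only."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    remaining = {name: list(TYPE_ORDER) for name in header}
    seen = {name: False for name in header}
    for row_number, row in enumerate(reader):
        if row_number >= sample_size:
            break  # rows beyond the sample are never inspected
        for name, value in zip(header, row):
            if value == "":
                continue  # empty cells give no evidence either way
            seen[name] = True
            remaining[name] = [t for t in remaining[name] if _fits(t, value)]
    # Most specific surviving type; columns with no sampled values fall back to string.
    return {name: remaining[name][0] if seen[name] else "xsd:string" for name in header}
```

With a sample of only two rows, a column of codes such as "007", "042", "A17" is guessed as an integer, because "A17" sits in the third row and is never looked at; that is exactly the kind of mistake mentioned above. For tab-delimited files, pass delimiter="\t".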

Some SW tools (and this is my preferred option) match the column names
with properties in a supplied ontology and use the property ranges
declared in the ontology. If the SM component did this, it would be a
great help.
Others take this a bit further and let you create a sort of mapping
between the columns and an ontology, and save it as a template.
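The ontology-driven approach could look roughly like the following: instead of guessing, each column name is looked up as a property and the datatype is taken from that property's rdfs:range. This is a hypothetical sketch, not an existing SPARQLMotion module. To keep it self-contained, the ontology is represented as a plain dictionary (in practice the pairs would be read from the ontology's rdfs:range triples), and the namespace http://example.org/onto#, the row-numbered subjects, and the function name rows_to_ntriples are made up for illustration. Column names are assumed to be valid IRI local names.

```python
import csv
import io

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical ontology excerpt: property local name -> rdfs:range.
ONTOLOGY_RANGES = {
    "id": XSD + "integer",
    "price": XSD + "decimal",
    "code": XSD + "string",
}


def _escape(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def rows_to_ntriples(text, ranges, base="http://example.org/onto#", delimiter=","):
    """Turn delimited text into N-Triples, typing each value by its property's range."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    lines = []
    for i, row in enumerate(reader, start=1):
        subject = f"<{base}row{i}>"  # hypothetical subject naming scheme
        for column, value in row.items():
            if column is None or not value:
                continue  # skip surplus fields and empty or missing cells
            # Columns without a matching property fall back to xsd:string.
            datatype = ranges.get(column, XSD + "string")
            lines.append(f'{subject} <{base}{column}> "{_escape(value)}"^^<{datatype}> .')
    return lines
```

Because the range of "code" is declared as xsd:string, a value like "007" keeps its leading zeros, no matter what the first rows happen to look like. A saved column-to-property mapping, as in the template approach, would generalize the dictionary to columns whose names differ from the property names.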

I have written (badly) a component in a pipelining tool that lets
you specify the namespaces, properties and classes that you want to
create from the columns, including the network of relationships through
object properties (i.e. it doesn't just create one class). It takes
text input and produces RDF.
It does everything in one pass, and I've created 50Gb of RDF data in
a single pass. This is my cruncher tool: it is purely designed to get the
best result from little user input, but it's ugly to use.


Cheers

Phil

--

You received this message because you are subscribed to the Google Groups 
"TopBraid Composer Users" group.
To post to this group, send email to topbraid-composer-us...@googlegroups.com.
To unsubscribe from this group, send email to 
topbraid-composer-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/topbraid-composer-users?hl=en.