Hi folks

I'm having an issue importing/converting a local HTML file using 
SPARQLMotion scripts, as opposed to a web file at a URL. A web HTML file 
works fine in my tests.

I can use the importXHTML module on a web URL without problems, e.g. 
wwww.examplesite.com/htmlfile
But if I point it at a local file it fails. I've tried the following 
file URL form without success: file:///fileLocation/htmlfile 
(specifying the .html file type makes no difference, nor does having the 
.html extension on the file on disk). I also tried 
file://localhost/fileLocation/htmlfile with no success.
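
As a sanity check on the URL syntax itself: both forms above are 
syntactically valid file URIs, so one thing worth ruling out is a path 
that needs percent-encoding (spaces, for instance). The following is a 
minimal Python sketch run outside TopBraid. The helper name and the paths 
are hypothetical, and it says nothing about whether the importXHTML module 
accepts file: URIs at all.

```python
from urllib.parse import quote


def to_file_uri(absolute_path: str) -> str:
    """Build a file: URI with an empty host from an absolute POSIX-style path."""
    if not absolute_path.startswith("/"):
        raise ValueError("expected an absolute path starting with '/'")
    # quote() leaves '/' alone and percent-encodes characters such as spaces.
    return "file://" + quote(absolute_path)


if __name__ == "__main__":
    # Hypothetical paths, mirroring the placeholders used in this post.
    print(to_file_uri("/fileLocation/htmlfile.html"))
    print(to_file_uri("/file location/html file.html"))
```

If the generated URI matches what was typed by hand, the URL syntax is 
probably not the cause.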

I also tried converting the HTML file to XHTML using oXygen XML, but this 
made no difference.

Is there some aspect of the tidy function, or something else at play, that 
I'm either missing or can't control for a local file?

As a related question on debugging this: is it possible to see more 
information anywhere about these modules, other than the basic info in the 
TBCME help and in the SPIN vocab files, which are only of limited help? 
For example, more details on the underlying classes and signatures?


I then tried another route, using the convertXMLtoRDF module. The 
usage note for the module says that sml:xmlType can be set to XHTML so 
that it "treats the input as HTML source". See the reference below.


" sml:xml: The XML document that shall be converted to RDF. To avoid 
character encoding issues, we strongly recommend this value to be a 
reference to an already parsed XML document, and not a literal. In other 
words, use "Add SPARQL expression" from the drop down menu and enter 
?varName and do not use a string value such as {?varName}. The actual 
document parsing should be handled by predecessing modules such as 
sml:ImportXMLFromURL.
 

sml:xmlType (xsd:string): [Optional] An (optional) type indicator for the 
Semantic XML conversion. Current supported values are "XHTML" (treats the 
input as HTML source, and may run a tidy algorithm in case the HTML is not 
well-formed XHTML).  "

I experimented with a few ways of getting the HTML in as XML, such as 
importTextFile and importXMLfile, but I assume that because the HTML is 
not valid XML this doesn't work.

For example:

warnings:ImportTextFile_2
  a sml:ImportTextFile ;
  sm:next warnings:Convert_html_XMLToRDF_2 ;
  sm:nodeX 617 ;
  sm:nodeY 39 ;
  sm:outputVariable "textOut" ;
  sml:sourceFilePath "mfu@id=4851.txt" ;
  rdfs:label "Import text file xml test" ;
.

# The xmlToRDF module below fails. A character encoding issue, by the 
# looks of it. Exception message:
Caused by: org.topbraid.spin.sparqlmotion.modules.SMException: 
org.xml.sax.SAXParseException; lineNumber: 9; columnNumber: 43; The 
reference to entity "l" must end with the ';' delimiter.

warnings:Convert_html_XMLToRDF_2
  a sml:ConvertXMLToRDF ;
  sm:nodeX 601 ;
  sm:nodeY 272 ;
  sml:baseURI "www.example2.com" ;
  sml:xml [
      sp:varName "textOut" ;
    ] ;
  sml:xmlType "xhtml" ;
  rdfs:label "Convert html XMLTo RDF 2" ;
.
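
A note on the exception above: a Java XML parser reports this message when 
an "&" is followed by a name with no closing ";". That typically comes 
from an unescaped ampersand in an HTML link, such as "?id=4851&l=en". So 
it points at XML well-formedness rather than character encoding. The 
following is a minimal Python sketch, run outside TopBraid, that 
reproduces this class of error and shows that escaping bare ampersands 
lets the same markup parse. The sample fragment and the helper names are 
hypothetical, and Python's parser words its error differently from the 
Java one.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fragment: a bare '&' in a query string, which is common in
# real-world HTML but is not allowed in XML.
HTML_FRAGMENT = '<p><a href="page?id=4851&l=en">link</a></p>'

# Matches an '&' that does not start a named or numeric entity reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def escape_bare_ampersands(markup: str) -> str:
    """Replace bare '&' characters with '&amp;', leaving entities intact."""
    return BARE_AMP.sub("&amp;", markup)


if __name__ == "__main__":
    print(is_well_formed(HTML_FRAGMENT))
    print(is_well_formed(escape_bare_ampersands(HTML_FRAGMENT)))
```

Looking at line 9, column 43 of the source file should show whether a bare 
"&" is in fact what the parser tripped over.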
