[ 
https://issues.apache.org/jira/browse/UIMA-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608048#action_12608048
 ] 

Adam Lally commented on UIMA-1080:
----------------------------------

This doesn't seem to handle spaces in the file path.  For example, if you run 
the Document Analyzer with this input dir:
C:\Program Files\apache-uima\examples\data

Then the output files are produced with the generic names doc0, doc1, etc., 
indicating that the filename wasn't extracted from the URI.  As I recall, the 
URI class is much less lenient than the URL class when it comes to spaces.
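
To illustrate the difference in leniency, here is a minimal sketch. It assumes the uri field holds a string of the form file:/C:/Program Files/... with an unescaped space; the class and method names are made up for the demonstration and are not part of UIMA. Note that the URL(String) constructor is deprecated in recent JDKs but still behaves as shown.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical demo class, not part of UIMA.
class UriSpaceDemo {

  /** Returns true if java.net.URI accepts the string as-is. */
  static boolean parsesAsUri(String s) {
    try {
      new URI(s);
      return true;
    } catch (URISyntaxException e) {
      // A raw space is an illegal character in a URI path.
      return false;
    }
  }

  /** Returns true if java.net.URL accepts the string as-is. */
  static boolean parsesAsUrl(String s) {
    try {
      new URL(s);
      return true;
    } catch (MalformedURLException e) {
      // Thrown for a missing or unknown protocol, not for spaces.
      return false;
    }
  }

  public static void main(String[] args) {
    // Assumed shape of the value stored in the uri field.
    String withSpace = "file:/C:/Program Files/apache-uima/examples/data/doc.txt";
    System.out.println("URI accepts it: " + parsesAsUri(withSpace));
    System.out.println("URL accepts it: " + parsesAsUrl(withSpace));
  }
}
```

If the URI constructor throws here, a consumer that catches the exception and falls back to a counter would produce exactly the doc0, doc1 names described above.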

This might be considered a problem with the FileSystemCollectionReader, which 
populates the SourceDocumentInformation.uri field.  Perhaps it should not be 
putting unescaped spaces in there.  However, I am somewhat nervous about 
changing this to URL-encode the uri, since I think it is likely there's some 
user code out there that is relying on the current behavior.

Also, whatever change is applied to XmiWriterCasConsumer probably should also 
be applied to XCasWriterCasConsumer.  And there are also example versions of 
these classes in the uimaj-examples project.
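
One possible way to cope with both cases without touching the reader is a lenient helper that the consumers could share. The following is only a sketch under that assumption, not the attached patch; DocNameExtractor and extractFileName are hypothetical names. It tries the strict URI parse first and, if that fails (for example because of a raw space), falls back to cutting the raw string at the last separator.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of UIMA or of the attached patch.
class DocNameExtractor {

  /**
   * Best-effort extraction of the last path segment of a URI string.
   * Returns null when no usable name is found, so the caller can fall
   * back to a generated name.
   */
  static String extractFileName(String uriString) {
    if (uriString == null) {
      return null;
    }
    String path;
    try {
      // Strict parse; getPath() is decoded and is null for opaque URIs.
      path = new URI(uriString).getPath();
    } catch (URISyntaxException e) {
      // Not a valid URI (e.g. contains a raw space): use the raw string.
      path = uriString;
    }
    if (path == null) {
      return null;
    }
    int cut = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
    String name = path.substring(cut + 1);
    return name.isEmpty() ? null : name;
  }

  public static void main(String[] args) {
    System.out.println(extractFileName("annolab://default/myfile"));
    System.out.println(extractFileName(
        "file:/C:/Program Files/apache-uima/examples/data/doc.txt"));
  }
}
```

The fallback keeps existing unescaped values working, which avoids changing what the reader writes into the uri field.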

> [Patch] Wrong usage of URL in XmiWriterCasConsumer
> --------------------------------------------------
>
>                 Key: UIMA-1080
>                 URL: https://issues.apache.org/jira/browse/UIMA-1080
>             Project: UIMA
>          Issue Type: Improvement
>          Components: InternalTools
>    Affects Versions: 2.2.2
>            Reporter: Richard Eckart
>            Priority: Minor
>         Attachments: UIMA-1080.patch
>
>
> The XmiWriterCasConsumer wraps the value of 
> SourceDocumentInformation.getUri() in a URL to extract the path. This only 
> works if the value returned by getUri() is actually a URL starting with 
> http, ftp or some other known protocol. It does not work if a framework user 
> puts self-defined URIs in there, such as annolab://default/myfile. 

