[ https://issues.apache.org/jira/browse/UIMA-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608048#action_12608048 ]
Adam Lally commented on UIMA-1080:
----------------------------------

This doesn't seem to handle spaces in the file path. For example, if you run the document analyzer with this input dir:

C:\Program Files\apache-uima\examples\data

then the output files are produced with the generic names doc0, doc1, etc., indicating that the filename wasn't extracted from the URI. As I recall, the URI class is much less lenient than the URL class when it comes to spaces.

This might be considered a problem with the FileSystemCollectionReader, which populates the SourceDocumentInformation.uri field. Perhaps it should not be putting spaces in there. However, I am somewhat nervous about changing this to URL-encode the uri, since I think it is likely that there's some user code out there that relies on the current behavior.

Also, whatever change is applied to XmiWriterCasConsumer should probably also be applied to XCasWriterCasConsumer. There are also example versions of these classes in the uimaj-examples project.

> [Patch] Wrong usage of URL in XmiWriterCasConsumer
> --------------------------------------------------
>
>                 Key: UIMA-1080
>                 URL: https://issues.apache.org/jira/browse/UIMA-1080
>             Project: UIMA
>          Issue Type: Improvement
>          Components: InternalTools
>    Affects Versions: 2.2.2
>            Reporter: Richard Eckart
>            Priority: Minor
>         Attachments: UIMA-1080.patch
>
>
> The XmiWriterCasConsumer wraps the value of
> SourceDocumentInformation.getUri() in a URL to extract the path. This only
> works if the value returned by getUri() is actually a URL starting with
> http, ftp or some other known protocol. It does not work if a framework user
> puts some self-defined URIs in there, such as annolab://default/myfile.
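
The difference in space handling mentioned in the comment can be reproduced with a small standalone Java sketch. It does not use any UIMA code: the class name UriSpaceDemo, its helper methods, and the sample strings are hypothetical and only illustrate how java.net.URI and java.net.URL treat the same input. File.toURI() is shown as one possible way to obtain an encoded URI; whether the collection reader should use it is exactly the compatibility question raised above.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical demo class, not part of UIMA.
public class UriSpaceDemo {

    // True if the string can be parsed by the strict java.net.URI parser.
    static boolean parsesAsUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // True if java.net.URL accepts the string (the protocol must be known).
    @SuppressWarnings("deprecation") // the URL(String) constructor is deprecated in recent JDKs
    static boolean parsesAsUrl(String s) {
        try {
            new URL(s);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    // Path component according to java.net.URI, or null if the string is not a valid URI.
    static String pathOf(String s) {
        try {
            return new URI(s).getPath();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // Builds a file URI in which spaces are percent-encoded.
    static String encodedFileUri(String path) {
        return new File(path).toURI().toString();
    }

    public static void main(String[] args) {
        String withSpace = "file:/C:/Program Files/apache-uima/examples/data/doc.txt";
        String custom = "annolab://default/myfile";

        System.out.println("URI accepts unencoded space: " + parsesAsUri(withSpace));
        System.out.println("URL accepts unencoded space: " + parsesAsUrl(withSpace));
        System.out.println("URI accepts custom scheme:   " + parsesAsUri(custom));
        System.out.println("URL accepts custom scheme:   " + parsesAsUrl(custom));
        System.out.println("Path of custom URI:          " + pathOf(custom));
        System.out.println("Encoded file URI:            " + encodedFileUri("Program Files/doc.txt"));
    }
}
```

In this sketch the unencoded space is rejected by URI but accepted by URL, while the custom annolab scheme is accepted by URI but rejected by URL. A consumer that needs to handle both kinds of input would therefore need a fallback for values that contain raw spaces.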