[ https://issues.apache.org/jira/browse/UIMA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jérôme Rocheteau updated UIMA-1502:
-----------------------------------

    Attachment: wst.patch

This is a patch that makes it possible to use the Whitespace tokenizer regardless of how the collection reader sets the sofas.

> Using getSofaDataStream instead of getDocumentText
> --------------------------------------------------
>
>                 Key: UIMA-1502
>                 URL: https://issues.apache.org/jira/browse/UIMA-1502
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Sandbox-WhitespaceTokenizer
>            Reporter: Jérôme Rocheteau
>            Priority: Minor
>         Attachments: wst.patch
>
>   Original Estimate: 0.17h
>  Remaining Estimate: 0.17h
>
> I would like to know whether it would be better to get the CAS text content
> by calling the getSofaDataStream method of the CAS class instead of the
> getDocumentText method.
> A CAS sofa can be set by calling setSofaDataString (aka setDocumentText),
> setSofaDataArray, or setSofaDataURI. However, getDocumentText (aka
> getSofaDataString) only provides the content of CASes whose sofas were set
> with the first of these methods, whereas getSofaDataStream retrieves the
> content no matter which of the three was used. A method that reads a String
> from an InputStream is therefore needed.
> Am I wrong in thinking this is an improvement?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
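The "method able to get String from an InputStream" mentioned in the description could look roughly like this. It is a minimal plain-Java sketch, not the code in wst.patch. The class name SofaStreams and the method readAll are hypothetical, and the caller is assumed to know the character encoding the collection reader used for the sofa data; UTF-8 is only an example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

/**
 * Hypothetical helper: drains an InputStream (for example the one returned
 * by CAS.getSofaDataStream()) into a String. A byte stream carries no
 * encoding information, so the caller must supply the charset.
 */
final class SofaStreams {
    private SofaStreams() {}

    static String readAll(InputStream in, Charset charset) throws IOException {
        if (in == null) {
            return null; // defensive: treat a missing stream as "no text"
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // Copy raw bytes until end of stream; decode only once at the end.
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        // The stream is left open: closing it is the caller's responsibility.
        return new String(buffer.toByteArray(), charset);
    }
}
```

Decoding the collected bytes once at the end, rather than chunk by chunk, avoids splitting a multi-byte character across buffer boundaries. In an annotator the call would be along the lines of SofaStreams.readAll(aCAS.getSofaDataStream(), StandardCharsets.UTF_8), where the charset is an assumption about how the sofa data was stored. On Java 9 and later, InputStream.readAllBytes() could replace the manual copy loop.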