Hi Tim,

No worries, I should have added the issue details to my account request. I have resubmitted.

Kind regards,
Ruairidh

On Wed, 25 Sept 2024 at 17:48, Tim Allison <[email protected]> wrote:

> Thank you for raising this issue. Please re-request a jira account, and
> we'll accept it. Sorry about that.
>
> On Wed, Sep 25, 2024 at 11:06 AM Ruairidh Williamson <
> [email protected]> wrote:
>
>> Hello,
>>
>> We are using tika to extract text from XPS files and have hit an issue
>> where whitespace is not emitted where we would expect. See the attached
>> example file where opening the file it visually has a large gap between "x"
>> and "abcde1234f" but when extracted by tika it calls `characters` with "x"
>> and then `characters` on "abcde1234f". We would expect a
>> `ignorableWhitespace` in between those calls but we don't get one.
>>
>> I've taken a look through the XPS source code and think I've identified
>> the issue and how to fix it. I would like to submit a pull request on
>> github. The contribution requirements say I must have a tika issue open
>> first. My request to make an ASF account was denied so if anyone is able to
>> create an issue for me I will create my pull request against that.
>>
>> Any help or feedback would be appreciated.
>>
>> Kind regards,
>> Ruairidh
>>
>>
>> Next DLP, Huckletree West, Mediaworks, 191 Wood Ln, London W12 7FP.
>> Company number 13785405.
>
>

--
Ruairidh Williamson
Software Engineer
[email protected]
www.nextdlp.com
