Hi Tim,

No worries. I should have added the issue details to my account request. I
have resubmitted it.

Kind regards,
Ruairidh

On Wed, 25 Sept 2024 at 17:48, Tim Allison <[email protected]> wrote:

> Thank you for raising this issue. Please re-request a Jira account, and
> we'll accept it. Sorry about that.
>
> On Wed, Sep 25, 2024 at 11:06 AM Ruairidh Williamson <
> [email protected]> wrote:
>
>> Hello,
>>
>> We are using Tika to extract text from XPS files and have hit an issue
>> where whitespace is not emitted where we would expect. See the attached
>> example file: when opened, it shows a large visual gap between "x" and
>> "abcde1234f", but when Tika extracts it, it calls `characters` with "x"
>> and then `characters` with "abcde1234f". We would expect an
>> `ignorableWhitespace` call in between, but we don't get one.
>>
>> I've taken a look through the XPS source code and think I've identified
>> the issue and how to fix it. I would like to submit a pull request on
>> GitHub. The contribution requirements say I must have a Tika issue open
>> first. My request for an ASF account was denied, so if anyone is able to
>> create an issue for me, I will create my pull request against it.
>>
>> Any help or feedback would be appreciated.
>>
>> Kind regards,
>> Ruairidh
>>
>>
>> Next DLP, Huckletree West, Mediaworks, 191 Wood Ln, London W12 7FP.
>> Company number 13785405.
>
>

-- 

*Ruairidh Williamson*
Software Engineer
[email protected]
www.nextdlp.com

