Saurabh Patil created TIKA-2650:
-----------------------------------

             Summary: Soft-hyphen is not extracted properly
                 Key: TIKA-2650
                 URL: https://issues.apache.org/jira/browse/TIKA-2650
             Project: Tika
          Issue Type: Bug
          Components: app
    Affects Versions: 1.18
            Reporter: Saurabh Patil
         Attachments: Peter Rabbit.pdf

We are trying to extract text from a PDF. If a long word falls at the end of a line, it is broken with a soft hyphen after the first part of the word, and the rest of the word continues on the next line. When extracting this text, Tika automatically replaces the hyphen with a space, so the word comes out as two separate fragments.
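
To make the report concrete, here is a minimal sketch of the behavior being asked for, written as plain-text post-processing rather than as anything inside Tika. It assumes the soft hyphen is available as U+00AD followed by a line break; the helper name `join_soft_hyphens` and the sample string are hypothetical and are not taken from Tika or from the attached PDF.

```python
import re

SOFT_HYPHEN = "\u00ad"  # Unicode SOFT HYPHEN


def join_soft_hyphens(text: str) -> str:
    """Rejoin words that were split by a soft hyphen at a line end.

    Hypothetical helper for illustration only; not Tika's implementation.
    """
    # Remove a soft hyphen together with the line break that follows it,
    # so the two halves of the word are joined without a space.
    text = re.sub(SOFT_HYPHEN + r"[ \t]*\r?\n[ \t]*", "", text)
    # Remove any remaining soft hyphens, which are only break hints mid-line.
    return text.replace(SOFT_HYPHEN, "")


# Made-up sample: "hyphenated" is split across two lines by a soft hyphen.
raw = "a very long hyphen\u00ad\nated word"

# Behavior the reporter expects: the word is rejoined.
expected = join_soft_hyphens(raw)

# Behavior described in the report, simulated here: hyphen becomes a space.
reported = raw.replace(SOFT_HYPHEN + "\n", " ")

print(expected)
print(reported)
```

Ordinary line breaks that are not preceded by a soft hyphen are left alone, so only words that were split at a line end are affected.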