missing spaces in text extraction of BodyContentHandler
-------------------------------------------------------

                 Key: TIKA-532
                 URL: https://issues.apache.org/jira/browse/TIKA-532
             Project: Tika
          Issue Type: Bug
    Affects Versions: 0.8
            Reporter: Reinhard Schwab
             Fix For: 0.8


BodyContentHandler extracts text correctly from most pages,
but not from this one:

http://www.lucidimagination.com/developers/whitepapers/whats-new-solr-14

The page contains a selection list of countries.
The text returned by BodyContentHandler for that list contains

"...Country: *
  -- Select a Country -- United 
StatesCanadaArgentinaAustraliaBrazilChinaFranceGermanyIndiaIndonesiaItalyJapanMexicoRussiaSaudi"

It would be preferable to have a space between the country names.
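
To illustrate the behaviour, here is a minimal sketch using only the JDK's
SAX parser. It is not Tika's code and not a proposed patch; the class
OptionSpacingDemo, its nested TextCollector and the padOptions flag are
hypothetical names chosen for this example. It assumes the country names
arrive as character data of adjacent option elements with no whitespace
between them, and shows how emitting a space at the end of each option
element would separate them.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of Tika.
final class OptionSpacingDemo {

    /** Collects character data; optionally appends a space after each option element. */
    static final class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private final boolean padOptions;

        TextCollector(boolean padOptions) {
            this.padOptions = padOptions;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // The default SAX parser is not namespace-aware, so the tag name is in qName.
            if (padOptions && "option".equals(qName)) {
                text.append(' ');
            }
        }

        String text() {
            return text.toString().trim();
        }
    }

    /** Parses well-formed XHTML and returns the collected text. */
    static String extract(String xhtml, boolean padOptions) {
        TextCollector handler = new TextCollector(padOptions);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return handler.text();
    }

    public static void main(String[] args) {
        String xhtml = "<select><option>United States</option>"
                + "<option>Canada</option><option>Argentina</option></select>";
        System.out.println(extract(xhtml, false)); // United StatesCanadaArgentina
        System.out.println(extract(xhtml, true));  // United States Canada Argentina
    }
}
```

With padOptions set to false the names run together as in the output quoted
above; with it set to true they are separated by single spaces.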

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.