I have a PDF file that uses embedded underscores to identify headers. It also uses lots of zero-width spaces, which confuse things further. If a period represents a zero-width space, PDFBox text parsing might give me back a string something like:

n.orm.al. _H.E__A.D_E_.R

where 'normal' text and 'header' text are in the same string. It is pretty ugly, but that's what there is. I can scan that correctly, but I would like to identify the header text as such and treat it as equivalent to bold text. I looked for a way to do that with TextPosition, but because the class is final I cannot add a field to hold that piece of information. It is not a flag that applies to the whole string, only to the characters that are underlined. Could you suggest an elegant way to do that? Thanks
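One possible approach, offered only as a sketch: because TextPosition is final you cannot subclass it, but you can wrap it. The idea is to pair each original object with a per-character header flag and leave the original untouched. The names HeaderTagging, TaggedGlyph and tag below are hypothetical. The sketch makes two further assumptions. First, it assumes the "zero length space" is U+200B (ZERO WIDTH SPACE). Second, it uses a made-up scanning rule in place of yours: an underscore opens a header run that lasts until the next whitespace. To stay runnable without PDFBox, the demo uses Character instead of TextPosition. In PDFBox 2.x you would presumably call tag(textPositions, TextPosition::getUnicode) from an override of PDFTextStripper.writeString(String, List<TextPosition>).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class HeaderTagging {

    // Assumption: the PDF's "zero length space" is U+200B ZERO WIDTH SPACE.
    static final String ZWSP = "\u200B";

    /** Composition instead of inheritance: wraps a final glyph object (e.g. TextPosition) with a flag. */
    static final class TaggedGlyph<T> {
        final T glyph;        // the original object, left unmodified
        final String text;    // its Unicode text
        final boolean header; // true if underscore-marked; treat like bold downstream

        TaggedGlyph(T glyph, String text, boolean header) {
            this.glyph = glyph;
            this.text = text;
            this.header = header;
        }
    }

    /**
     * Drops zero-width spaces and underscores, and flags the remaining glyphs.
     * Hypothetical rule: an underscore starts a header run that ends at the next whitespace.
     */
    static <T> List<TaggedGlyph<T>> tag(List<T> glyphs, Function<T, String> textOf) {
        List<TaggedGlyph<T>> out = new ArrayList<>();
        boolean inHeader = false;
        for (T g : glyphs) {
            String s = textOf.apply(g);
            if (s.equals(ZWSP)) {
                continue; // invisible filler: skip entirely
            }
            if (s.equals("_")) {
                inHeader = true; // underscore is markup, not content
                continue;
            }
            if (s.isBlank()) {
                inHeader = false; // whitespace closes the header run
            }
            out.add(new TaggedGlyph<>(g, s, inHeader));
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for PDFBox output: one Character per glyph instead of a TextPosition.
        String raw = "n\u200Borm\u200Bal\u200B _H\u200BE__A\u200BD_E_\u200BR";
        List<Character> glyphs = new ArrayList<>();
        for (char ch : raw.toCharArray()) {
            glyphs.add(ch);
        }

        StringBuilder plain = new StringBuilder();
        StringBuilder header = new StringBuilder();
        for (TaggedGlyph<Character> t : tag(glyphs, c -> c.toString())) {
            plain.append(t.text);
            if (t.header) {
                header.append(t.text);
            }
        }
        System.out.println(plain);  // normal HEADER
        System.out.println(header); // HEADER
    }
}
```

A parallel java.util.BitSet indexed by position would work just as well. The wrapper has one advantage: the flag travels with its glyph if you later filter or reorder the list.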

