This message is logged by Apache PDFBox, the library Tika uses to parse
PDFs. You're seeing it now because Solr 4.x bundles a different Tika version
than Solr 3.x.

It looks like the message is logged when Tika parses a PDF in which a font
doesn't have a ToUnicode mapping.
https://issues.apache.org/jira/browse/PDFBOX-1408

Another user reported that this might be related to special characters, but
PDFBox developers haven't been able to reproduce the bug.
https://issues.apache.org/jira/browse/PDFBOX-1706

Since this isn't an issue in the Solr code, if you're concerned about it,
you'll probably have better luck asking the PDFBox developers directly, via
Jira or their mailing list.
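
If you do take it to them, it would help to tie the message to a specific
document first. Here is a rough sketch, not a tested tool, that counts the
PDSimpleFont messages per minute so that bursts can be matched against your
update requests. It assumes every log line looks like the one in your excerpt
(timestamp, level, logger name); the function name and the regex are my own
and will need adjusting if your log format differs.

```python
import re
import sys
from collections import Counter
from datetime import datetime

# Assumed line format, copied from the log excerpt in the original report:
#   2/15/2016, 4:32:22 PM ERROR PDSimpleFont Can't determine the width ...
LINE_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{4}, \d{1,2}:\d{2}:\d{2} [AP]M)\s+ERROR\s+PDSimpleFont\b"
)


def count_per_minute(lines):
    """Map each minute to the number of PDSimpleFont ERROR lines seen in it."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # some other logger or level; ignore it
        stamp = datetime.strptime(match.group(1), "%m/%d/%Y, %I:%M:%S %p")
        counts[stamp.replace(second=0)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log path is passed on the command line; the file name is up to you.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for minute, n in sorted(count_per_minute(handle).items()):
            print(minute.strftime("%Y-%m-%d %H:%M"), n)
```

Minutes with high counts can then be compared against the timestamps of the
extract requests your clients sent, which should narrow the search to a
handful of PDFs.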


On Tue, Feb 16, 2016 at 12:08 PM, Joseph Hagerty <joa...@gmail.com> wrote:

> Does literally nobody else see this error in their logs? I see this error
> hundreds of times per day, in occasional bursts. Should I file this as a
> bug?
>
> On Mon, Feb 15, 2016 at 4:56 PM, Joseph Hagerty <joa...@gmail.com> wrote:
>
> > After migrating from 3.5 to 4.10.3, I'm seeing the following error with
> > alarming regularity in the master's error log:
> >
> > 2/15/2016, 4:32:22 PM ERROR PDSimpleFont Can't determine the width of the
> > space character using 250 as default
> >
> > I can't seem to glean much information about this one from the web. Has
> > anyone else fought this error?
> >
> > In case this helps, here's some technical/miscellaneous info:
> >
> > - I'm running a master-slave set-up.
> >
> > - I rely on the ERH (tika/solr-cell/whatever) for extracting plaintext
> > from .docs and .pdfs. I'm guessing that PDSimpleFont is a component of
> > this, but I don't know the first thing about it.
> >
> > - I have the clients specifying 'autocommit=6s' in their requests, which I
> > realize is a pretty aggressive commit interval, but so far that hasn't
> > caused any problems I couldn't surmount.
> >
> > - There are north of 11 million docs in my index, which is 36 gigs thick.
> > The storage volume is only 10% full.
> >
> > - When I migrated from 3.5 to 4.10.3, I correctly performed a reindex due
> > to incompatibility between versions.
> >
> > - Both master and slave are running on AWS instances, C4.4XL's (16 cores,
> > 30 gigs of RAM).
> >
> > So far, I have been unable to reproduce this error on my own: I can only
> > observe it in the logs. I haven't been able to tie it to any specific
> > document.
> >
> > Let me know if further information would be helpful.
> >
> >
> >
> >
>
>
> --
> - Joe
>
