I'm running the latest version of 1.4. We just rebuilt it last week. Is that patch included?
And where would it get multiple titles from? How do I tell what the titles are so I can see if they're valid or not?

On Tue, Nov 1, 2011 at 4:33 PM, Markus Jelsma <[email protected]> wrote:

> This should work around the problem in most cases. The parser can output
> two titles of which one is actually empty. This patch (in 1.4) skips empty
> titles.
>
> If this doesn't work you really have two _valid_ titles coming from your
> document.
>
> https://issues.apache.org/jira/browse/NUTCH-1004
>
> > It looks like the issue I'm encountering is the same one as here.
> >
> > http://lucene.472066.n3.nabble.com/multiple-values-encountered-for-non-multiValued-field-title-td1446817.html
> >
> > I'm not really sure what the linked bug is since that involves the HTML
> > parser and I'm seeing this problem with a PDF file.
> >
> > On Tue, Nov 1, 2011 at 3:41 PM, Bai Shen <[email protected]> wrote:
> > > I'm getting an exception when I try to commit to Solr. Looking at the
> > > Solr log, it's showing that title is getting multiple values when it's
> > > not a multivalue field. None of my code does anything with the title,
> > > so I'm not sure why this is happening.
> > >
> > > How can I look at the pending commit and determine why and/or delete
> > > the extraneous values? The document in question is a pdf if that makes
> > > a difference.
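
For illustration only, here is a rough sketch of the "skip empty titles" idea described in the quoted reply. It is not the NUTCH-1004 patch and does not use any Nutch or Solr API; the function name `non_empty_titles` and the sample title values are hypothetical.

```python
def non_empty_titles(titles):
    """Return the titles that still contain text after trimming whitespace."""
    return [t.strip() for t in titles if t and t.strip()]


# Hypothetical parser output: one real title and one empty one.
parsed_titles = ["Annual Report 2011", ""]
kept = non_empty_titles(parsed_titles)
print(kept)

# If more than one title survives the filter, the document really carries
# several non-empty titles, and a single-valued field would still reject it.
if len(kept) > 1:
    print("multiple non-empty titles:", kept)
```

If more than one entry is left after filtering, that would match the "two _valid_ titles" case mentioned in the reply, and the values themselves are what need inspecting.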

