Hi,

Just as a side note, the latest 1.4 development version can be found in the
trunk SVN repository:

https://svn.apache.org/repos/asf/nutch/trunk/
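
On the empty-title workaround quoted below, here is a rough sketch of the idea
of skipping empty titles. It is an illustration only and not the actual
NUTCH-1004 patch; the class name TitleFilter and the method
firstNonEmptyTitle are hypothetical and do not exist in Nutch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class TitleFilter {

    // Hypothetical helper (not part of Nutch): returns the first candidate
    // title that is neither null nor blank, trimmed of surrounding whitespace.
    static Optional<String> firstNonEmptyTitle(List<String> titles) {
        return titles.stream()
                .filter(t -> t != null && !t.trim().isEmpty())
                .map(String::trim)
                .findFirst();
    }

    public static void main(String[] args) {
        // Assumed input: a parser that emitted one empty and one real title.
        List<String> titles = Arrays.asList("", "Annual Report");
        // prints "Annual Report"
        System.out.println(firstNonEmptyTitle(titles).orElse("(no title)"));
    }
}
```

If two different non-empty titles remain after a filter like this, the
document really does carry two valid titles, and a single-valued title field
in Solr would still reject it.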

On Tue, Nov 1, 2011 at 8:47 PM, Bai Shen <[email protected]> wrote:

> I'm running the latest version of 1.4.  We just rebuilt it last week.  Is
> that patch included?
>
> And where would it get multiple titles from?  How do I tell what the titles
> are so I can see if they're valid or not?
>
> On Tue, Nov 1, 2011 at 4:33 PM, Markus Jelsma <[email protected]> wrote:
>
> > This should work around the problem in most cases. The parser can output
> > two titles of which one is actually empty. This patch (in 1.4) skips
> > empty titles.
> >
> > If this doesn't work you really have two _valid_ titles coming from your
> > document.
> >
> > https://issues.apache.org/jira/browse/NUTCH-1004
> >
> > > It looks like the issue I'm encountering is the same one as here.
> > >
> > > http://lucene.472066.n3.nabble.com/multiple-values-encountered-for-non-multiValued-field-title-td1446817.html
> > >
> > > I'm not really sure what the linked bug is since that involves the HTML
> > > parser and I'm seeing this problem with a PDF file.
> > >
> > > On Tue, Nov 1, 2011 at 3:41 PM, Bai Shen <[email protected]> wrote:
> > > > I'm getting an exception when I try to commit to Solr.  Looking at the
> > > > Solr log, it's showing that title is getting multiple values when it's
> > > > not a multivalue field.  None of my code does anything with the title,
> > > > so I'm not sure why this is happening.
> > > >
> > > > How can I look at the pending commit and determine why and/or delete the
> > > > extraneous values?  The document in question is a pdf if that makes a
> > > > difference.
> >
>



-- 
*Lewis*
