Not quite. It relies on a scoring filter; check the IndexerMapReduce code
around line 181.
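
To illustrate what "relies on a scoring filter" means, here is a minimal, self-contained sketch. It is not the Nutch source: the class name, the ScoringFilter interface and its method signature are simplified, hypothetical stand-ins. It assumes the indexer starts from a boost of 1.0 and lets each active scoring filter adjust it, which would explain a boost of 1.0 when no active filter uses the crawldb score.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BoostSketch {

    /** Hypothetical stand-in for a scoring filter's indexing hook (not the real Nutch API). */
    interface ScoringFilter {
        float indexerScore(String url, float dbScore, float initScore);
    }

    /** Runs every filter in order; each one receives the result of the previous one. */
    static float indexerScore(List<ScoringFilter> filters, String url, float dbScore) {
        float boost = 1.0f; // assumed starting value before any filter runs
        for (ScoringFilter filter : filters) {
            boost = filter.indexerScore(url, dbScore, boost);
        }
        return boost;
    }

    public static void main(String[] args) {
        float dbScore = 0.16124992f; // score shown in the crawldb dump in this thread

        // A filter that ignores the crawldb score and returns its input unchanged.
        ScoringFilter passThrough = (url, score, init) -> init;
        // A filter that folds the crawldb score into the boost.
        ScoringFilter useDbScore = (url, score, init) -> score * init;

        String url = "http://example.org/"; // placeholder URL
        System.out.println("no filters:   "
                + indexerScore(Collections.<ScoringFilter>emptyList(), url, dbScore));
        System.out.println("pass-through: "
                + indexerScore(Arrays.asList(passThrough), url, dbScore));
        System.out.println("db score:     "
                + indexerScore(Arrays.asList(useDbScore), url, dbScore));
    }
}
```

In this sketch only the last chain produces a boost different from 1.0, so the thing to check is which scoring plugins are active at indexing time and what their indexing hook does with the crawldb score.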

On Wednesday 12 October 2011 15:35:08 Marek Bachmann wrote:
> Not sure. What does it do? I thought solrindex would take the score
> directly from the crawldb?
> 
> On 12.10.2011 15:32, Markus Jelsma wrote:
> > Are you using the scoring-link plugin?
> > 
> > On Wednesday 12 October 2011 15:18:12 Marek Bachmann wrote:
> >> Hey Folks,
> >> 
> >> Sorry for this second request on this topic. I managed to figure out
> >> that the problem is Nutch-related.
> >> 
> >> Once again: I have a set of URLs (~182k) fetched, parsed and ranked via
> >> WebGraph. All went very well.
> >> 
> >> After that I want to index them into Solr. This works fine too, except
> >> that the boost isn't set.
> >> 
> >> I have debugged this issue for an example URL:
> >> 
> >> nutch@hrz-pc318:/nutch/dumps/dbdump$ cat part-00001 | grep -A 9 http://www.mathematik.uni-kassel.de/~fgcaadm/fachgruppe-computeralgebra.de/JdM/beitrag-hohenwarter/bezier3cons.html
> >> 
> >> http://www.mathematik.uni-kassel.de/~fgcaadm/fachgruppe-computeralgebra.de/JdM/beitrag-hohenwarter/bezier3cons.html Version: 7
> >> Status: 2 (db_fetched)
> >> Fetch time: Fri Oct 14 14:03:18 CEST 2011
> >> Modified time: Thu Jan 01 01:00:00 CET 1970
> >> Retries since fetch: 0
> >> Retry interval: 603450 seconds (6 days)
> >> Score: 0.16124992
> >> Signature: 02ab7d9e6655082ff139e8a9c9afb97f
> >> Metadata: _pst_: success(1), lastModified=0
> >> 
> >> As you can see, the score isn't 1.0.
> >> I ran the solrindex command and logged the traffic via tcpmon. Here is
> >> the extract of the document that is sent to Solr:
> >> 
> >> POST /solr/update?wt=javabin&version=2 HTTP/1.1
> >> User-Agent:
> >> Solr[org.apache.solr.client.solrj.impl.CommonsHttpSolrServer] 1.0
> >> Host: localhost:8080
> >> Transfer-Encoding: chunked
> >> Content-Type: application/xml; charset=UTF-8
> >> 
> >> 2000
> >> <add>
> >> 
> >>           <doc boost="1.0">
> >>           
> >>                   <field name="site">
> >>                   
> >>                           www.mathematik.uni-kassel.de
> >>                   
> >>                   </field>
> >>                   <field name="host">
> >>                   
> >>                           www.mathematik.uni-kassel.de
> >>                   
> >>                   </field>
> >>                   <field name="lastModified">
> >>                   
> >>                           2008-03-03T13:22:14.000Z
> >>                   
> >>                   </field>
> >>                   <field name="segment">
> >>                   
> >>                           20111007135815
> >>                   
> >>                   </field>
> >>                   <field name="digest">
> >>                   
> >>                           02ab7d9e6655082ff139e8a9c9afb97f
> >>                   
> >>                   </field>
> >>                   <field name="tstamp">
> >>                   
> >>                           2011-10-07T12:25:48.230Z
> >>                   
> >>                   </field>
> >>                   <field name="date">
> >>                   
> >>                           2008-03-03T13:22:14.000Z
> >>                   
> >>                   </field>
> >>                   <field name="type">
> >>                   
> >>                           text/html
> >>                   
> >>                   </field>
> >>                   <field name="id">
> >> 
> >> http://www.mathematik.uni-kassel.de/~fgcaadm/fachgruppe-computeralgebra.de/JdM/beitrag-hohenwarter/bezier3cons.html</field>
> >> 
> >>                   <field name="url">
> >> 
> >> http://www.mathematik.uni-kassel.de/~fgcaadm/fachgruppe-computeralgebra.de/JdM/beitrag-hohenwarter/bezier3cons.html</field>
> >> 
> >>                   <field name="anchor">
> >>                   
> >>                           bezier3cons.html
> >>                   
> >>                   </field>
> >>                   <field name="content">
> >>                   
> >>                           [...]
> >>                   
> >>                   </field>
> >>                   <field name="title">
> >>                   
> >>                           Kubische Bézierkurve - GeoGebra Dynamisches Arbeitsblatt
> >> 
> >>                   </field>
> >>                   <field name="boost">
> >>                   
> >>                           1.0
> >>                   
> >>                   </field>
> >>                   <field name="contentLength">
> >>                   
> >>                           1570
> >>                   
> >>                   </field>
> >>           
> >>           </doc>
> >>           [...]
> >> 
> >> </add>
> >> 
> >> So the boost is set to 1.0. I can't figure out why this happens. I need
> >> your help. :)

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
