Not quite. The index-time boost relies on a scoring filter; check the IndexerMapReduce code around line 181.
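A minimal sketch of the idea, assuming the indexer starts from a boost of 1.0 and hands it through each configured scoring filter's index-time hook in turn, so the CrawlDb score only reaches Solr if some filter in that chain actually uses it. The `IndexScorer` interface, `LINK_SCORER` and `computeBoost` are hypothetical, simplified stand-ins, not Nutch's real `ScoringFilter` API, and the "multiply by the CrawlDb score" behaviour is an assumption made for illustration.

```java
import java.util.List;

/**
 * Simplified, hypothetical model of deriving an index-time boost
 * from a chain of scoring filters. NOT Nutch's actual ScoringFilter API.
 */
public class BoostChainSketch {

    /** Hypothetical stand-in for a scoring filter's index-time hook. */
    interface IndexScorer {
        float indexerScore(float crawlDbScore, float initScore);
    }

    /** Assumed behaviour of a link-analysis filter: scale the boost by the CrawlDb score. */
    static final IndexScorer LINK_SCORER =
        (crawlDbScore, initScore) -> initScore * crawlDbScore;

    /** Start at 1.0 and let every configured filter transform the boost in order. */
    static float computeBoost(List<IndexScorer> filters, float crawlDbScore) {
        float boost = 1.0f;
        for (IndexScorer filter : filters) {
            boost = filter.indexerScore(crawlDbScore, boost);
        }
        return boost;
    }

    public static void main(String[] args) {
        float crawlDbScore = 0.16124992f; // score taken from Marek's CrawlDb dump

        // No filter that reads the CrawlDb score: the boost never leaves 1.0.
        System.out.println(computeBoost(List.of(), crawlDbScore));

        // With the (assumed) link-analysis filter: the boost follows the CrawlDb score.
        System.out.println(computeBoost(List.of(LINK_SCORER), crawlDbScore));
    }
}
```

The point of modelling it as a chain is that an empty or unrelated set of filters in `plugin.includes` leaves every document at the starting value, which would match the `boost="1.0"` seen in the tcpmon capture.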
On Wednesday 12 October 2011 15:35:08 Marek Bachmann wrote:
> Not sure. What does it do? I thought solrindex would take the score
> directly from the crawldb?
>
> On 12.10.2011 15:32, Markus Jelsma wrote:
> > Are you using the scoring-link plugin?
> >
> > On Wednesday 12 October 2011 15:18:12 Marek Bachmann wrote:
> >> Hey Folks,
> >>
> >> sorry for this second request on this topic. I managed to figure out
> >> that the problem is Nutch-related.
> >>
> >> Once again: I have a set of URLs (~182k) fetched, parsed and ranked via
> >> WebGraph. All went very well.
> >>
> >> After that I want to index them to Solr. This works fine too, except
> >> that the boost isn't set.
> >>
> >> I have debugged this issue for an example URL:
> >>
> >> nutch@hrz-pc318:/nutch/dumps/dbdump$ cat part-00001 | grep -A 9 http://www.mathematik.uni-kassel.de/~fgcaadm/fachgruppe-computeralgebra.de/JdM/beitrag-hohenwarter/bezier3cons.html
> >>
> >> http://www.mathematik.uni-kassel.de/~fgcaadm/fachgruppe-computeralgebra.de/JdM/beitrag-hohenwarter/bezier3cons.html Version: 7
> >> Status: 2 (db_fetched)
> >> Fetch time: Fri Oct 14 14:03:18 CEST 2011
> >> Modified time: Thu Jan 01 01:00:00 CET 1970
> >> Retries since fetch: 0
> >> Retry interval: 603450 seconds (6 days)
> >> Score: 0.16124992
> >> Signature: 02ab7d9e6655082ff139e8a9c9afb97f
> >> Metadata: _pst_: success(1), lastModified=0
> >>
> >> As you can see, the score isn't 1.0.
> >> I ran the solrindex command and logged the traffic via tcpmon; here is
> >> an extract of the document that is sent to Solr:
> >>
> >> POST /solr/update?wt=javabin&version=2 HTTP/1.1
> >> User-Agent: Solr[org.apache.solr.client.solrj.impl.CommonsHttpSolrServer] 1.0
> >> Host: localhost:8080
> >> Transfer-Encoding: chunked
> >> Content-Type: application/xml; charset=UTF-8
> >>
> >> 2000
> >> <add>
> >> <doc boost="1.0">
> >>   <field name="site">www.mathematik.uni-kassel.de</field>
> >>   <field name="host">www.mathematik.uni-kassel.de</field>
> >>   <field name="lastModified">2008-03-03T13:22:14.000Z</field>
> >>   <field name="segment">20111007135815</field>
> >>   <field name="digest">02ab7d9e6655082ff139e8a9c9afb97f</field>
> >>   <field name="tstamp">2011-10-07T12:25:48.230Z</field>
> >>   <field name="date">2008-03-03T13:22:14.000Z</field>
> >>   <field name="type">text/html</field>
> >>   <field name="id">http://www.mathematik.uni-kassel.de/~fgcaadm/fachgruppe-computeralgebra.de/JdM/beitrag-hohenwarter/bezier3cons.html</field>
> >>   <field name="url">http://www.mathematik.uni-kassel.de/~fgcaadm/fachgruppe-computeralgebra.de/JdM/beitrag-hohenwarter/bezier3cons.html</field>
> >>   <field name="anchor">bezier3cons.html</field>
> >>   <field name="content">[...]</field>
> >>   <field name="title">Kubische Bézierkurve - GeoGebra Dynamisches Arbeitsblatt</field>
> >>   <field name="boost">1.0</field>
> >>   <field name="contentLength">1570</field>
> >> </doc>
> >> [...]
> >> </add>
> >>
> >> So the boost is set to 1.0, and I can't figure out why this happens.
> >> Need your help. :)

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

