See also my comment on your PageRank question, which I take to be the
same issue. Namely, you might look into the external file stuff. I've
used that to store oft-changing boost information that then factors
into the score.
-Grant
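A minimal sketch of the external-file idea, for illustration only. The message above does not say what the file looks like, so the one-pair-per-line "key=value" layout, the comment syntax, and the class name ExternalBoostFile are all assumptions of this sketch and not part of Lucene or Solr.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Lucene or Solr class): turns the lines of an
// external boost file into an in-memory lookup table.
final class ExternalBoostFile {
    private ExternalBoostFile() {}

    // Assumed format: one "key=value" pair per line, where value is a float.
    // Blank lines and lines starting with '#' are skipped.
    static Map<String, Float> parse(List<String> lines) {
        Map<String, Float> boosts = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // Split on the last '=' so that keys may themselves contain '='.
            int eq = line.lastIndexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed line: " + line);
            }
            String key = line.substring(0, eq).trim();
            float value = Float.parseFloat(line.substring(eq + 1).trim());
            boosts.put(key, value);
        }
        return boosts;
    }
}
```

The lines could come from java.nio.file.Files.readAllLines, so the table can be rebuilt whenever the boosts change without touching the index itself.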
On Apr 24, 2009, at 12:03 PM, Michael McCandless wrote:
I think something like this (NOTE: not at all tested, and I have no
real experience with function queries):
    ValueSource vals = new MyPageRankScores(...);  // your ValueSource subclass
    ValueSourceQuery prQuery = new ValueSourceQuery(vals);
    Query realQuery = ...;  // the user's parsed query
    Query q = new CustomScoreQuery(realQuery, prQuery);
    TopDocs hits = searcher.search(q, 10);  // top 10 hits by combined score
MyPageRankScores is your class, subclassing ValueSource and
implementing the getValues method.
You could subclass CustomScoreQuery if you want to tweak just how the
"real" Query scores and your page-rank scores are combined.
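To make the combining step concrete, here is a standalone model of it. It deliberately uses no Lucene classes, so it is not the actual ValueSource or CustomScoreQuery API. PageRankRescorer, its method names, and the choice to multiply the two scores are assumptions of this sketch.

```java
import java.util.Arrays;

// Standalone model of "query score combined with an external per-document
// score". Hypothetical class; it does not depend on Lucene.
final class PageRankRescorer {
    // Index = internal document number, value = that document's page-rank score.
    private final float[] pageRankByDoc;

    PageRankRescorer(float[] pageRankByDoc) {
        this.pageRankByDoc = pageRankByDoc.clone();
    }

    // Plays the role of the per-document value lookup.
    float pageRank(int doc) {
        return pageRankByDoc[doc];
    }

    // Combination rule (an assumption here): multiply the two scores.
    // This is the method you would change to combine them differently.
    float customScore(int doc, float queryScore) {
        return queryScore * pageRank(doc);
    }

    // Returns up to n document numbers ordered by combined score, best first.
    int[] rank(int[] docs, float[] queryScores, int n) {
        Integer[] order = new Integer[docs.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Float.compare(
                customScore(docs[b], queryScores[b]),
                customScore(docs[a], queryScores[a])));
        int[] top = new int[Math.min(n, docs.length)];
        for (int i = 0; i < top.length; i++) {
            top[i] = docs[order[i]];
        }
        return top;
    }
}
```

With page-rank values {0.5, 2.0, 1.0} and query scores {3.0, 1.0, 1.0} for documents 0, 1 and 2, the combined scores are 1.5, 2.0 and 1.0, so document 1 ranks ahead of document 0 even though its query score is lower.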
Mike
On Fri, Apr 24, 2009 at 5:20 AM, Marcus Herou
<marcus.he...@tailsweep.com> wrote:
Yes, I am thinking of something like that.
Could you elaborate on what that would look like in pseudocode?
Kindly
//Marcus
On Fri, Apr 24, 2009 at 9:05 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:
Could function queries be used here? E.g., you could implement a
ValueSource that pulls in the external scores.
Mike
On Thu, Apr 23, 2009 at 4:01 PM, Marcus Herou
<marcus.he...@tailsweep.com> wrote:
Hi.
Confusing subject, eh? Let me try to be a little clearer in a few
sentences.
We have a Solr/Lucene index where each document is a blog entry. We
have just implemented the PageRank algorithm for blogs and are about to
add a field called score to the index and perhaps adjust the document
boost.
We have also decided that it is the blog itself, not the individual
pages, that is to be ranked, so all entries belonging to one blog will
receive the same score.
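Because the score is per blog, only a small blog-level table changes at each recalculation, and the per-entry values can be derived from it. A minimal sketch of that fan-out follows. The names BlogScoreFanOut and perEntryScores, the array-of-blog-ids input, and the default for blogs without a score are all hypothetical.

```java
import java.util.Map;

// Hypothetical helper: expands a per-blog score table into one score per
// entry, so that every entry of a blog receives the same value.
final class BlogScoreFanOut {
    private BlogScoreFanOut() {}

    // blogIdByDoc[doc] is the blog that entry number doc belongs to.
    // Entries whose blog has no score yet receive defaultScore.
    static float[] perEntryScores(String[] blogIdByDoc,
                                  Map<String, Float> scoreByBlog,
                                  float defaultScore) {
        float[] scores = new float[blogIdByDoc.length];
        for (int doc = 0; doc < scores.length; doc++) {
            Float score = scoreByBlog.get(blogIdByDoc[doc]);
            scores[doc] = (score != null) ? score : defaultScore;
        }
        return scores;
    }
}
```

Recomputing PageRank then means replacing the entries of scoreByBlog and rebuilding one float array, and the indexed fields stay untouched.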
I have not found a way to apply a document score without actually
re-indexing all fields in the affected entries (which could very well
be 100% of them at every PageRank recalculation). Reindexing currently
takes a week or so and will take longer and longer as the index grows
(100M blog entries at present and rapidly increasing), which
effectively renders the process useless.
I guess we have run into the issue where we have some "static" data
that we do not want to touch at all, but we also want to update certain
"dynamic" fields.
I know Lucene is not a database, but is there a way to implement
external search-time scoring or to update individual fields? Would it
be possible to do some kind of join (parallel searches across separate
index types)? Or to send the result to a separate sorting algorithm?
Hmmm... perhaps a subclass of Sort? Grasping at straws here, folks...
Hope one of the core experts can help us.
Cheers
//Marcus Herou
--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search