Thanks Erick.

Your summary about doc IDs is very helpful.

I tested the second-level sort with a small data set (10K records) and
didn't see a significant impact.  I will test with 10M records at
some later time.
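
For reference, here is a minimal sketch of how the secondary sort could be
passed on a request. The host, port, core name ("mycore") and query string are
hypothetical placeholders, and "id" is assumed to be the <uniqueKey> field.
The snippet only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_select_url(base_url, query, sort):
    """Build a Solr /select URL with an explicit sort parameter."""
    params = {"q": query, "sort": sort, "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core URL and query; adjust to your own setup.
url = build_select_url(
    "http://localhost:8983/solr/mycore",
    "title:lucene",
    "score desc,id asc",  # primary sort: score; tie-break: uniqueKey
)

# Decode the query string again to confirm the sort survived URL encoding.
decoded_sort = parse_qs(urlsplit(url).query)["sort"][0]
print(url)
print(decoded_sort)
```

Swapping "id asc" for a timestamp or counter field (sorted desc) would be the
real tie-break once such a field is indexed.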

Steve

On Mon, Aug 24, 2015 at 11:03 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Getting the most recent doc first in the case of a tie
> will _not_ "just happen". I don't think you really get the
> nuance here...
>
> You index doc1, and doc2 later. Let's
> claim that doc1 gets internal Lucene doc ID of 1 and
> doc2 gets an internal doc ID of 2. So far you're golden.
> Let's further claim that doc1 is in a different segment than
> doc2. Sometime later, as you add/update/delete docs,
> segments are merged and doc1 and doc2 may or may
> not be in the merged segment. At that point, doc1 can get an
> internal Lucene doc ID of, say, 823 and doc2 can get an internal
> doc ID of, say 64. So their relative order is changed.
>
> You have to have a secondary sort criterion then. And it has to be
> something monotonically increasing by time that won't ever change
> like internal doc IDs can. Adding a timestamp
> to every doc is certainly an option. Adding your own counter
> is also reasonable.
>
> But this is a _secondary_ sort, so it's not even consulted if the
> first sort (score) is not a tie. You can get a sense of how this would
> affect your query time/CPU usage/RAM by just specifying
> sort=score desc,id asc
> where id is your <uniqueKey> field. This won't do what you want,
> but it will simulate it without having to re-index.
>
> Best,
> Erick
>
> On Mon, Aug 24, 2015 at 11:54 AM, Steven White <swhite4...@gmail.com>
> wrote:
> > Thanks Hoss.
> >
> > I understand the dynamic nature of doc-IDs.  All that I care about is the
> > most recent docs be at the top of the hit list when there is a tie.  From
> > your reply, it is not clear if that's what happens.  If not, then I have to
> > sort, but this is something I want to avoid so it won't add cost to my
> > queries (CPU and RAM).
> >
> > Can you help me answer those two questions?
> >
> > Steve
> >
> > On Mon, Aug 24, 2015 at 2:16 PM, Chris Hostetter <hossman_luc...@fucit.org>
> > wrote:
> >
> >>
> >> : A follow up question.  Is the sub-sorting on the lucene internal doc IDs
> >> : ascending or descending order?  That is, do the most recently index doc
> >>
> >> you cannot make any generic assumptions about the order of the internal
> >> lucene doc IDs -- the secondary sort on the internal IDs is stable (and
> >> FWIW: ascending) for static indexes, but as mentioned before: the *actual*
> >> order of the IDs changes as the index changes -- if there is an index
> >> merge, the ids can be totally different and docs can be re-arranged into a
> >> different order...
> >>
> >> : > However, internal Lucene Ids can change when index changes. (merges,
> >> : > updates etc).
> >>
> >> ...
> >>
> >> : show up first in this set of docs that have tied score?  If not, how can I
> >> : have the most recent be first?  Do I have to sort on lucene's internal doc
> >>
> >> add a "timestamp" or "counter" field when you index your documents that
> >> means whatever you want it to mean (order added, order updated, order
> >> according to some external sort criteria from some external system) and
> >> then do an explicit sort on that.
> >>
> >>
> >> -Hoss
> >> http://www.lucidworks.com/
> >>
>
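
The tie-breaking behavior discussed in this thread can be sketched with a small
pure-Python simulation. It does not use Lucene or Solr; the field names
("indexed_at", "doc_id") and the third document are made up for illustration,
while the doc IDs 1, 2, 823 and 64 follow Erick's example. It is only meant to
show why an explicit timestamp or counter keeps tied docs in a stable order
when internal doc IDs are reassigned by a merge.

```python
# Three documents; doc1 and doc2 have tied scores.
docs = [
    {"key": "doc1", "score": 1.0, "indexed_at": 1, "doc_id": 1},
    {"key": "doc2", "score": 1.0, "indexed_at": 2, "doc_id": 2},
    {"key": "doc3", "score": 2.0, "indexed_at": 3, "doc_id": 3},
]


def rank_by_doc_id(docs):
    """Score descending, ties broken by internal doc ID ascending."""
    ordered = sorted(docs, key=lambda d: (-d["score"], d["doc_id"]))
    return [d["key"] for d in ordered]


def rank_by_timestamp(docs):
    """Score descending, ties broken by most recently indexed first."""
    ordered = sorted(docs, key=lambda d: (-d["score"], -d["indexed_at"]))
    return [d["key"] for d in ordered]


before_doc_id = rank_by_doc_id(docs)
before_ts = rank_by_timestamp(docs)

# Simulate a segment merge that reassigns internal doc IDs.
docs[0]["doc_id"] = 823
docs[1]["doc_id"] = 64
docs[2]["doc_id"] = 5  # arbitrary new ID for the illustrative third doc

after_doc_id = rank_by_doc_id(docs)
after_ts = rank_by_timestamp(docs)

print(before_doc_id, after_doc_id)  # order of the tied docs flips
print(before_ts, after_ts)          # order is unchanged
```

The doc-ID tie-break changes after the simulated merge, while the timestamp
tie-break gives the same order before and after.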
