Hi Martin,
thanks for sharing your experience with EFF and saving me a lot of time
figuring it out myself. I was afraid of exactly this kind of problem.

Mikhail, thanks for expanding the thread with even more useful information!

Simone


2012/11/20 Martin Koch <m...@issuu.com>

> Solr 4.0 does support using EFFs, but it might not give you what you're
> hoping for.
>
> We tried using Solr Cloud, and have given up again.
>
> The EFF is placed in the parent of the index directory in each core; each
> core reads the entire EFF and picks out the IDs that it is responsible for.
>
> In the current 4.0.0 release of Solr, Solr blocks (doesn't answer queries)
> while re-reading the EFF. Even worse, it seems that the time to re-read the
> EFF is multiplied by the number of cores in use (i.e. the EFF is re-read by
> each core sequentially). The contents of the EFF become active after the
> first EXTERNAL commit (commitWithin does NOT work here) after the file has
> been updated.
>
> In our case, the EFF was quite large - around 450MB - and we use 16 shards,
> so when we triggered an external commit to force re-reading, the whole
> system would block for several (10-15) minutes. This won't work in a
> production environment. The reason for the size of the EFF is that we have
> around 7M documents in the index; each document has a 45 character ID.
>
> We got some help to try to fix the problem so that the re-read of the EFF
> proceeds in the background (see
> here<https://issues.apache.org/jira/browse/SOLR-3985> for
> a fix on the 4.1 branch). However, even though the re-read proceeds in the
> background, launching Solr now takes at least as long as re-reading the
> EFFs. Again, this is not good enough for our needs.
>
> The next issue is that you cannot sort on EFF fields (though you can return
> them as values using &fl=field(my_eff_field)). This is also fixed in the 4.1
> branch here <https://issues.apache.org/jira/browse/SOLR-4022>.
>
> So: Even after these fixes, EFF performance is not that great. Our solution
> is as follows: The actual value of the popularity measure (say, reads) that
> we want to report to the user is inserted into the search response
> post-query by our query front-end. This value will then be the
> authoritative value at the time of the query. The value of the popularity
> measure that we use for boosting in the ranking of the search results is
> only updated when the value has changed enough so that the impact on the
> boost will be significant (say, more than 2%). This does require frequent
> re-indexing of the documents that have significant changes in the number of
> reads, but at least we won't have to update a document if it moves from,
> say, 1000000 to 1000001 reads.
>
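
The thresholded update described above could be sketched roughly as follows. This is only a minimal illustration, not Martin's actual code: the thread does not say which boost function is used, so `log_boost` is a hypothetical stand-in, and the names `needs_reindex` and `docs_to_reindex` are made up for this example. The 2% threshold is the figure mentioned above.

```python
import math
from typing import Callable, Dict, List


def log_boost(reads: int) -> float:
    """Hypothetical boost function; the real one is not given in the thread."""
    return math.log10(1 + reads)


def needs_reindex(
    indexed_reads: int,
    current_reads: int,
    boost: Callable[[int], float] = log_boost,
    threshold: float = 0.02,
) -> bool:
    """Return True if the boost has moved by more than `threshold` (relative)."""
    old = boost(indexed_reads)
    new = boost(current_reads)
    if old == 0:
        # No meaningful relative change from zero; re-index on any change.
        return new != 0
    return abs(new - old) / abs(old) > threshold


def docs_to_reindex(
    indexed: Dict[str, int],
    current: Dict[str, int],
    threshold: float = 0.02,
) -> List[str]:
    """List the IDs whose indexed read count is stale enough to matter."""
    return [
        doc_id
        for doc_id, reads in current.items()
        # Documents that were never indexed count as having 0 reads.
        if needs_reindex(indexed.get(doc_id, 0), reads, threshold=threshold)
    ]
```

With a boost like this, a document going from 1000000 to 1000001 reads is left alone, while one going from 10 to 100 reads is queued for re-indexing. The exact count shown to the user would still be filled in by the query front-end after the search.
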
> /Martin Koch - ISSUU - senior systems architect.
>
> On Mon, Nov 19, 2012 at 3:22 PM, Simone Gianni <simo...@apache.org> wrote:
>
> > Hi all,
> > I'm planning to move a quite big Solr index to SolrCloud. However, in this
> > index, an external file field is used for popularity ranking.
> >
> > Does SolrCloud support external file fields? How does it cope with
> > sharding and replication? Where should the external file be placed now that
> > the index folder is not local but in the cloud?
> >
> > Otherwise, are there other best practices for the use cases that
> > external file fields were used for, like popularity/ranking, in SolrCloud?
> > Custom ValueSources going to something external?
> >
> > Thanks in advance,
> > Simone
> >
>
