Hi Martin, thanks for sharing your experience with EFF and saving me a lot of time figuring it out myself. I was afraid of exactly this kind of problem.
Mikhail, thanks for expanding the thread with even more useful information!

Simone

2012/11/20 Martin Koch <m...@issuu.com>

> Solr 4.0 does support using EFFs, but it might not give you what you're
> hoping for.
>
> We tried using Solr Cloud, and have given up again.
>
> The EFF is placed in the parent of the index directory in each core; each
> core reads the entire EFF and picks out the IDs that it is responsible for.
>
> In the current 4.0.0 release of Solr, Solr blocks (doesn't answer queries)
> while re-reading the EFF. Even worse, it seems that the time to re-read the
> EFF is multiplied by the number of cores in use (i.e. the EFF is re-read by
> each core sequentially). The contents of the EFF become active after the
> first EXTERNAL commit (commitWithin does NOT work here) after the file has
> been updated.
>
> In our case, the EFF was quite large - around 450MB - and we use 16 shards,
> so when we triggered an external commit to force re-reading, the whole
> system would block for several (10-15) minutes. This won't work in a
> production environment. The reason for the size of the EFF is that we have
> around 7M documents in the index; each document has a 45 character ID.
>
> We got some help to try to fix the problem so that the re-read of the EFF
> proceeds in the background (see
> here <https://issues.apache.org/jira/browse/SOLR-3985> for
> a fix on the 4.1 branch). However, even though the re-read proceeds in the
> background, launching Solr now takes at least as long as re-reading the
> EFFs. Again, this is not good enough for our needs.
>
> The next issue is that you cannot sort on EFF fields (though you can return
> them as values using &fl=field(my_eff_field)). This is also fixed in the 4.1
> branch here <https://issues.apache.org/jira/browse/SOLR-4022>.
>
> So: Even after these fixes, EFF performance is not that great. Our solution
> is as follows: The actual value of the popularity measure (say, reads) that
> we want to report to the user is inserted into the search response
> post-query by our query front-end. This value will then be the
> authoritative value at the time of the query. The value of the popularity
> measure that we use for boosting in the ranking of the search results is
> only updated when the value has changed enough so that the impact on the
> boost will be significant (say, more than 2%). This does require frequent
> re-indexing of the documents that have significant changes in the number of
> reads, but at least we won't have to update a document if it moves from,
> say, 1000000 to 1000001 reads.
>
> /Martin Koch - ISSUU - senior systems architect.
>
> On Mon, Nov 19, 2012 at 3:22 PM, Simone Gianni <simo...@apache.org> wrote:
>
> > Hi all,
> > I'm planning to move a quite big Solr index to SolrCloud. However, in
> > this index, an external file field is used for popularity ranking.
> >
> > Does SolrCloud support external file fields? How does it cope with
> > sharding and replication? Where should the external file be placed now
> > that the index folder is not local but in the cloud?
> >
> > Are there otherwise other best practices to deal with the use cases
> > external file fields were used for, like popularity/ranking, in
> > SolrCloud? Custom ValueSources going to something external?
> >
> > Thanks in advance,
> > Simone
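
For anyone wanting to try the approach Martin describes, here is a rough sketch of the two pieces: deciding when a document's boost has drifted enough to justify re-indexing, and overwriting the reported read count in the response after the query. Everything named here is an assumption for illustration: the thread does not say which boost function ISSUU uses (the log-based `boost` below is made up), and `needs_reindex` and `merge_live_reads` are hypothetical helpers, not part of Solr.

```python
import math


def boost(reads: int) -> float:
    """Hypothetical boost function (assumption; the thread does not name one)."""
    return math.log10(reads + 10)


def needs_reindex(indexed_reads: int, current_reads: int,
                  threshold: float = 0.02) -> bool:
    """Return True if the boost changed by more than `threshold` (2% by default)
    relative to the boost computed from the value currently in the index."""
    old = boost(indexed_reads)
    new = boost(current_reads)
    return abs(new - old) / old > threshold


def merge_live_reads(docs: list, live_reads: dict) -> list:
    """Post-query step: replace each doc's 'reads' with the authoritative value
    from an external store, keeping the indexed value if none is available."""
    return [
        {**doc, "reads": live_reads.get(doc["id"], doc.get("reads"))}
        for doc in docs
    ]
```

With this made-up boost, a move from 1000000 to 1000001 reads does not trigger a re-index, while a move from 100 to 1000 reads does. The front-end would call `merge_live_reads` on every response, so the number shown to the user is always current even when the indexed boost lags behind.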