Hi everybody,

Let's say I have an index with 10M large-ish documents, and as people log into a website and view them, each document's "last viewed date" is updated to the current time. We index a document's last-viewed-date because a) users can search on it alongside all other searchable criteria, and b) the results of any search can be ordered by it.

The problem is that in a given 5-minute period we may have many thousands of updated documents (due to nothing more than this last-viewed-date). We have a task that looks for changed documents, loads the full documents, and then feeds them into Solr to update the index. Unfortunately, reading these changed documents and continually feeding them to Solr generates *far* more load on our system (both Solr and the database) than any of the searches. In a given day, *we may have more updates to documents than we have total documents indexed*. (Databases don't handle this well either: the contention on rows for updates slows the database down significantly.)

How should we approach this problem? It seems like such a waste of resources to be doing so much work in the applications, the database, and Solr only for last-viewed-dates.

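To make the pipeline concrete, here is a rough Python sketch of the kind of task described above. It is illustrative only: the function names, the (doc_id, viewed_at) event shape, and the load_full_document callback are assumptions rather than the real code, and the final push to Solr is left out. It shows that however many times a document is viewed within one polling window, it is reloaded and re-indexed once, but in full.

```python
from datetime import datetime, timedelta


def changed_docs(view_events, window_start, window_end):
    """Collapse view events in [window_start, window_end) to one entry per document.

    view_events is an iterable of (doc_id, viewed_at) pairs (hypothetical shape).
    Returns {doc_id: latest viewed_at inside the window}.
    """
    latest = {}
    for doc_id, viewed_at in view_events:
        if window_start <= viewed_at < window_end:
            if doc_id not in latest or viewed_at > latest[doc_id]:
                latest[doc_id] = viewed_at
    return latest


def build_updates(latest, load_full_document):
    """Reload every changed document in full and attach its new last-viewed date.

    load_full_document is a hypothetical callback standing in for the database read.
    The full reload is the expensive step: only one field actually changed.
    """
    updates = []
    for doc_id, viewed_at in latest.items():
        doc = dict(load_full_document(doc_id))
        doc["last_viewed_date"] = viewed_at
        updates.append(doc)
    return updates
```

The list returned by build_updates is what would then be sent to Solr as ordinary full-document adds. A longer window collapses more repeat views per document, but the batch still contains every distinct document viewed in that window.
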
Solutions we've looked at include:

1) Update only part of a document.
   -- Apparently this isn't supported in Solr yet (we're currently using nightly Solr 1.4 builds).

2) Use "near-real-time updates".
   -- Not supported yet. Also, the "freshness" of the data isn't as much of a concern as the sheer volume of changes we have to make. For example, we could update Solr less frequently, but then we'd just have many more documents to update each time. The data only has to be fresh to within, say, 30 minutes.

3) Use a separate index for the last-viewed-date.
   -- This won't work because we need to search on the last-viewed-date alongside other criteria, and we use it as a scoring criterion for all our searches.

Any suggestions?

Sincerely,
Daryl.