Hi,

Maybe you could use the external file field type as an example of how to hook up values from a DB: https://lucene.apache.org/solr/guide/6_6/working-with-external-files-and-processes.html
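For instance, a minimal sketch of the wiring, adapted from that page (the names entryRankFile, pkId and entryRank are just the guide's placeholders). Note that ExternalFileField only holds float values keyed by document, so it illustrates the pattern of pulling per-document values from outside the index rather than a way to store large text itself:

    <!-- schema.xml: a field type whose values come from a file, not the index -->
    <fieldType name="entryRankFile" keyField="pkId" defVal="0"
               stored="false" indexed="false"
               class="solr.ExternalFileField"/>
    <field name="entryRank" type="entryRankFile"/>

    <!-- solrconfig.xml: re-read the external values when a new searcher opens -->
    <listener event="newSearcher"
              class="org.apache.solr.schema.ExternalFileFieldReloader"/>

The values live in a plain file named external_entryRank (typically in the core's data directory), one key=value line per document, e.g. doc33=1.414, which you could regenerate from the database on whatever schedule suits you.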
HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

> On 20 Feb 2018, at 20:39, Roman Chyla <roman.ch...@gmail.com> wrote:
>
> Say there is a high load and I'd like to bring up a new machine and let it
> replicate the index; if 100 GB or more can be shaved off, that will have a
> significant impact on how quickly the new searcher is ready and added to
> the cluster. The impact on search speed is likely minimal.
>
> We are investigating the idea of two clusters, but I have to say it seems
> more complex to me than storing/loading a field from an external source.
> Having said that, I wonder why this was not done before (maybe it was) and
> what the cons are (besides the obvious ones: maintenance, and the database
> being a potential point of failure; well, in that case I'd miss highlights -
> I can live with that...)
>
> On Tue, Feb 20, 2018 at 10:36 AM, David Hastings <
> hastings.recurs...@gmail.com> wrote:
>
>> It really depends on what you consider too large, and why the size is a
>> big issue, since most replication will go at about 100 MB/second give or
>> take, and replicating a 300 GB index is only an hour or two. What I do for
>> this purpose is store my text in a separate index altogether, and call on
>> that core for highlighting. So for my use case, the primary index with no
>> stored text is around 300 GB and replicates as needed, and the full-text
>> indexes with stored text total around 500 GB and are replicating non-stop.
>> All searching goes against the primary index, and for highlighting I call
>> on the full-text indexes, which have a stupid-simple schema. This has
>> worked pretty well for me, at least.
>>
>> On Tue, Feb 20, 2018 at 10:27 AM, Roman Chyla <roman.ch...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> We have a use case of a very large index (master-slave; for unrelated
>>> reasons the search cannot work in cloud mode) - one of the fields is a
>>> very large text, stored mostly for highlighting. To cut down the index
>>> size (for purposes of replication/scaling), I thought I could try to
>>> save it in a database - and not in the index.
>>>
>>> Lucene has codecs - one of the codec methods is for 'stored fields' - so
>>> that seems like a natural path for me.
>>>
>>> However, I'd expect somebody else has had a similar problem before. I
>>> googled and couldn't find any solutions. Using the codecs seems like a
>>> really good fit for this particular problem - am I missing something? Is
>>> there a better way to cut down on index size (besides SolrCloud/sharding
>>> and compression)?
>>>
>>> Thank you,
>>>
>>> Roman
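For the codec route Roman asks about, a minimal sketch against the Lucene 6.x codec API could look like the following. It is an illustration, not a tested implementation: the class names and the "fulltext" field are made up, the read-side database lookup at highlight time is left to the application, and a real codec also needs SPI registration (a META-INF/services/org.apache.lucene.codecs.Codec entry) so that segments written with it can be read back:

    import java.io.IOException;

    import org.apache.lucene.codecs.Codec;
    import org.apache.lucene.codecs.FilterCodec;
    import org.apache.lucene.codecs.StoredFieldsFormat;
    import org.apache.lucene.codecs.StoredFieldsReader;
    import org.apache.lucene.codecs.StoredFieldsWriter;
    import org.apache.lucene.index.FieldInfo;
    import org.apache.lucene.index.FieldInfos;
    import org.apache.lucene.index.IndexableField;
    import org.apache.lucene.index.SegmentInfo;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.IOContext;

    /** Wraps the default codec but never writes the large text field. */
    public class DroppedTextCodec extends FilterCodec {

        // Hypothetical name of the huge stored field kept in the database.
        private static final String BIG_FIELD = "fulltext";

        private final StoredFieldsFormat storedFields = new StoredFieldsFormat() {
            @Override
            public StoredFieldsReader fieldsReader(Directory dir, SegmentInfo si,
                    FieldInfos fn, IOContext ctx) throws IOException {
                // Reading is unchanged; the dropped field simply is not there.
                return delegate.storedFieldsFormat().fieldsReader(dir, si, fn, ctx);
            }

            @Override
            public StoredFieldsWriter fieldsWriter(Directory dir, SegmentInfo si,
                    IOContext ctx) throws IOException {
                return new DroppingWriter(
                        delegate.storedFieldsFormat().fieldsWriter(dir, si, ctx));
            }
        };

        public DroppedTextCodec() {
            // Delegate everything except stored fields to the default codec.
            super("DroppedTextCodec", Codec.getDefault());
        }

        @Override
        public StoredFieldsFormat storedFieldsFormat() {
            return storedFields;
        }

        /** Delegates all calls, but silently skips the big field. */
        private static final class DroppingWriter extends StoredFieldsWriter {
            private final StoredFieldsWriter delegate;

            DroppingWriter(StoredFieldsWriter delegate) {
                this.delegate = delegate;
            }

            @Override
            public void startDocument() throws IOException {
                delegate.startDocument();
            }

            @Override
            public void finishDocument() throws IOException {
                delegate.finishDocument();
            }

            @Override
            public void writeField(FieldInfo info, IndexableField field) throws IOException {
                if (!BIG_FIELD.equals(info.name)) {
                    delegate.writeField(info, field);
                }
                // else: the text goes to (or already lives in) the database,
                // keyed by the document's unique id.
            }

            @Override
            public void finish(FieldInfos fis, int numDocs) throws IOException {
                delegate.finish(fis, numDocs);
            }

            @Override
            public void close() throws IOException {
                delegate.close();
            }
        }
    }

You would plug it in at index time with IndexWriterConfig.setCodec(new DroppedTextCodec()) and, at query time, fetch the text from the database by unique key for the documents being highlighted. Whether that ends up simpler than David's two-index setup is exactly the maintenance trade-off Roman mentions.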