I worked for Rakuten, and they had a custom stack: Cassandra, Hadoop, and Solr. Docs were indexed into Cassandra, then Hadoop crunched them into segments with a plugin, and those segments got merged into the live Solr index.
On Thu, Aug 14, 2025, 7:02 PM Gus Heck <[email protected]> wrote:

> Only good way. ;) I've seen folks feed one doc at a time and force a
> commit once per doc. It's not transactional per se, but definitely not bulk
> either.
>
> http://www.needhamsoftware.com (work)
> https://a.co/d/b2sZLD9 (my fantasy fiction book)
>
> On Thu, Aug 14, 2025, 9:14 PM Walter Underwood <[email protected]>
> wrote:
>
> > Short version, bulk upload is the only way to get data into Solr. There is
> > no transactional interface.
> >
> > wunder
> > Walter Underwood
> > [email protected]
> > http://observer.wunderwood.org/ (my blog)
> >
> > > On Aug 14, 2025, at 5:53 PM, Gus Heck <[email protected]> wrote:
> > >
> > > Hi and welcome :)
> > >
> > > There are a variety of interfaces that you can use to send multiple
> > > documents at a time. (You can start on this page; there is more info on
> > > the next few pages as well:
> > > https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-with-update-handlers.html
> > > )
> > >
> > > Sending in batches using those interfaces is standard practice.
> > >
> > > If you mean pre-calculating the indexed data to minimize load on the
> > > server, then pre-indexed fields are one option. (
> > > https://solr.apache.org/guide/solr/latest/indexing-guide/external-files-processes.html#the-preanalyzedfield-type
> > > )
> > > - for that JesterJ (a project I (mostly) wrote) has some built in
> > > support:
> > > https://github.com/nsoft/jesterj/blob/master/code/ingest/src/main/java/org/jesterj/ingest/processors/PreAnalyzeFields.java
> > > - I've used that particular processor in one project successfully so
> > > far.
> > >
> > > I've also heard of folks indexing on one system and then copying or
> > > replicating indexes to a destination system. (this is a custom
> > > engineered type of thing)
> > >
> > > All of the above is subject to our commit intervals and/or manual
> > > commit requests (typically only use the manual requests in special
> > > cases with careful planning)
> > >
> > > So there are several possibilities (and also some I haven't mentioned
> > > involving streaming expressions), but it would help to have a more
> > > detailed description of the problem you are trying to solve (as opposed
> > > to asking after the solution you expect to need).
> > >
> > > -Gus
> > >
> > > http://www.needhamsoftware.com (work)
> > > https://a.co/d/b2sZLD9 (my fantasy fiction book)
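
For anyone reading the thread later, here is a rough sketch of the "send in batches" approach described above. It is not taken from anyone's production setup. It assumes a Solr instance at http://localhost:8983/solr and a collection named "demo", both placeholders, and the function names (batched, build_update_request, index_all) are made up for illustration. It posts JSON arrays of documents to the collection's /update handler and uses the commitWithin parameter so Solr handles the commit rather than forcing one per request. The batch size of 500 is arbitrary.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def batched(docs: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield lists of at most `size` documents from any iterable."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: list[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly short, batch
        yield batch


def build_update_request(
    solr_url: str,
    collection: str,
    batch: list[dict],
    commit_within_ms: int = 10000,
) -> urllib.request.Request:
    """Build a POST to <solr_url>/<collection>/update carrying a JSON array of docs.

    The URL and collection name are supplied by the caller; nothing here is
    specific to any particular deployment.
    """
    url = f"{solr_url.rstrip('/')}/{collection}/update?commitWithin={commit_within_ms}"
    return urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_all(solr_url: str, collection: str, docs: Iterable[dict], batch_size: int = 500) -> None:
    """Send every document to Solr, one HTTP request per batch."""
    for batch in batched(docs, batch_size):
        with urllib.request.urlopen(build_update_request(solr_url, collection, batch)) as resp:
            resp.read()  # drain the response; urlopen raises on HTTP errors
```

Calling index_all("http://localhost:8983/solr", "demo", docs) would send the documents in groups of 500. Leaving commits to commitWithin (or to the autoCommit settings on the server) avoids the commit-per-document pattern mentioned earlier in the thread.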
