I worked for Rakuten and they had a custom stack: Cassandra, Hadoop, and Solr.
Documents were indexed into Cassandra, Hadoop crunched them into segments with
a plugin, and those segments got merged into the live Solr index.
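
For the more common case Gus describes in the quoted message below (sending
documents in batches through the update handlers), here is a rough Python
sketch. It is only an illustration, not what Rakuten ran. The base URL, the
collection name "mycollection", the batch size of 500, and the helper names
are all assumptions; it posts a JSON array of documents to the collection's
/update handler and passes commitWithin so that no explicit commit is forced
per request.

```python
import json
from itertools import islice
from urllib.request import Request, urlopen


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_update_request(base_url, collection, batch, commit_within_ms=10000):
    """Build one POST that carries a whole batch as a JSON array."""
    # commitWithin asks Solr to commit within the given number of
    # milliseconds instead of forcing a commit on every request.
    url = f"{base_url}/{collection}/update?commitWithin={commit_within_ms}"
    body = json.dumps(batch).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_all(docs, base_url="http://localhost:8983/solr",
              collection="mycollection", size=500):
    """Send every document, one HTTP request per batch (hypothetical defaults)."""
    for batch in batches(docs, size):
        with urlopen(build_update_request(base_url, collection, batch)) as resp:
            resp.read()  # drain the response; urlopen raises on HTTP errors
```

Calling index_all(docs) with any iterable of dicts would send them 500 at a
time. The right batch size and commitWithin value depend on document size and
on how quickly changes need to become visible, so treat both as starting
points.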

On Thu, Aug 14, 2025, 7:02 PM Gus Heck <[email protected]> wrote:

> Only good way. ;) I've seen folks feed one doc at a time and force a
> commit once per doc. It's not transactional per se, but definitely not bulk
> either.
>
> http://www.needhamsoftware.com (work)
> https://a.co/d/b2sZLD9 (my fantasy fiction book)
>
> On Thu, Aug 14, 2025, 9:14 PM Walter Underwood <[email protected]>
> wrote:
>
> > Short version, bulk upload is the only way to get data into Solr. There is
> > no transactional interface.
> >
> > wunder
> > Walter Underwood
> > [email protected]
> > http://observer.wunderwood.org/  (my blog)
> >
> > > On Aug 14, 2025, at 5:53 PM, Gus Heck <[email protected]> wrote:
> > >
> > > Hi and welcome :)
> > >
> > > There are a variety of interfaces that you can use to send multiple
> > > documents at a time. (You can start on this page; there is more info on
> > > the next few pages as well:
> > > https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-with-update-handlers.html
> > > )
> > >
> > > Sending in batches using those interfaces is standard practice.
> > >
> > > If you mean pre-calculating the indexed data to minimize load on the
> > > server, then pre-indexed fields are one option. (
> > > https://solr.apache.org/guide/solr/latest/indexing-guide/external-files-processes.html#the-preanalyzedfield-type
> > > )
> > > - for that JesterJ (a project I (mostly) wrote) has some built in support:
> > > https://github.com/nsoft/jesterj/blob/master/code/ingest/src/main/java/org/jesterj/ingest/processors/PreAnalyzeFields.java
> > > - I've used that particular processor in one project successfully so far.
> > >
> > > I've also heard of folks indexing on one system and then copying or
> > > replicating indexes to a destination system. (this is a custom engineered
> > > type of thing)
> > >
> > > All of the above is subject to our commit intervals and/or manual commit
> > > requests (typically only use the manual requests in special cases with
> > > careful planning)
> > >
> > > So there are several possibilities (and also some I haven't mentioned
> > > involving streaming expressions), but it would help to have a more detailed
> > > description of the problem you are trying to solve (as opposed to asking
> > > after the solution you expect to need).
> > >
> > > -Gus
> > >
> > > http://www.needhamsoftware.com (work)
> > > https://a.co/d/b2sZLD9 (my fantasy fiction book)
> >
> >
>
