: We have about 5M records ranging in size all coming from a DB source (only 2
: tables). What will be the most efficient way of indexing all of these
: documents? I am looking at DIH but before I go down that road I wanted to

The main question to ask yourself is what your indexing freshness 
requirements are.  

If you have a small amount of data, or if a large percentage of your data 
is changing all the time, and you can tolerate some lag in how quickly 
updates to your data make it into the index, then doing complete full 
re-builds (with DIH or anything else) periodically is certainly the 
simplest way to go.
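
For example, a minimal DIH data-config.xml for periodic full rebuilds 
might look like the sketch below; the driver, connection URL, table, 
and field names are all placeholders you'd swap for your own:

  <dataConfig>
    <dataSource type="JdbcDataSource"
                driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost/mydb"
                user="user" password="secret"/>
    <document>
      <!-- full-import re-runs this query over every row -->
      <entity name="item"
              query="SELECT id, name, description FROM item">
        <field column="id" name="id"/>
        <field column="name" name="name"/>
        <field column="description" name="description"/>
      </entity>
    </document>
  </dataConfig>

You'd kick it off with command=full-import on the handler's URL (e.g. 
/solr/yourcore/dataimport?command=full-import, assuming the handler is 
registered at /dataimport in solrconfig.xml).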

If you have a lot of data, or only a small percentage of your data 
changes within the largest interval of time you are willing to wait 
before your index is updated, then a "batch delta indexing" approach 
like the one DIH's deltaQuery provides is only a little more effort on 
top of implementing full builds.
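
As a rough sketch of that, the same entity can grow deltaQuery and 
deltaImportQuery attributes, something like the following (it assumes 
your table has an updated_at timestamp column; the names are again 
placeholders):

  <entity name="item"
          query="SELECT id, name, description FROM item"
          deltaQuery="SELECT id FROM item
                      WHERE updated_at &gt; '${dataimporter.last_index_time}'"
          deltaImportQuery="SELECT id, name, description FROM item
                            WHERE id='${dataimporter.delta.id}'">
    <field column="id" name="id"/>
    <field column="name" name="name"/>
    <field column="description" name="description"/>
  </entity>

Running command=delta-import then only re-fetches and re-indexes the 
rows whose ids came back from deltaQuery, based on the timestamp DIH 
recorded after the previous run.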

If you really need your index to be updated as soon as the authoritative 
data changes, then having your publishing flow immediately make the 
corresponding changes to the index, by pushing them over HTTP to the 
/update API, is probably your best bet.
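
For instance, each write in your publishing flow could POST an XML 
update message like this to /solr/yourcore/update (the core name and 
fields are placeholders):

  <add>
    <doc>
      <field name="id">1234</field>
      <field name="name">some name</field>
      <field name="description">some description</field>
    </doc>
  </add>

followed by a <commit/> (or let autoCommit in solrconfig.xml handle 
commits) so the change becomes visible to searchers.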



-Hoss
