: We have about 5M records ranging in size all coming from a DB source (only 2
: tables). What will be the most efficient way of indexing all of these
: documents? I am looking at DIH but before I go down that road I wanted to
The main question to ask yourself is what your index freshness requirements are.

If you have a small amount of data, or a large percentage of your data is changing all the time, and you can tolerate some lag before updates to your data make it into the index, then doing complete full rebuilds (with DIH or anything else) periodically is certainly the simplest way to go.

If you have a lot of data, or only a small percentage of your data changes within the longest interval you are willing to wait before the index is updated, then a "batch delta indexing" approach like DIH's deltaQuery provides is only a little more effort on top of implementing full builds.

If you really need your index to be updated as soon as the authoritative data changes, then having your publishing flow push those changes directly to the index over HTTP via the /update API is probably your best bet.

-Hoss
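For the push-over-HTTP option, a minimal sketch of what the publishing flow might do, assuming a Solr core named `records` on localhost (both the core name and the document fields here are hypothetical). The request is built but not sent, so the snippet can be inspected without a running Solr instance:

```python
import json
import urllib.request

# Hypothetical core name; adjust to your Solr setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/records/update"

def build_update_request(docs, commit_within_ms=5000):
    """Build an HTTP POST that adds/replaces docs in the index.

    commitWithin asks Solr to make the docs searchable within the
    given window without forcing an immediate (expensive) commit
    on every single update.
    """
    url = "%s?commitWithin=%d" % (SOLR_UPDATE_URL, commit_within_ms)
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Example: push one changed record as soon as the authoritative DB row changes.
req = build_update_request([{"id": "row-42", "title": "updated title"}])
# urllib.request.urlopen(req)  # uncomment against a live Solr instance
```

Using `commitWithin` rather than an explicit commit per update keeps near-real-time freshness without hammering the index with commits when many rows change at once.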