bbi123 [bbar...@gmail.com] wrote:
> We have a requirement for a large data set, billing data for example.
> The Business wants to do sorting and type ahead functions for it.  For
> example, when I start typing “8164…” they want to list ALL the unique number
> and the associated attributes displayed (name, description, etc).

So either a prefix search or a lookup with TermsComponent? I do not like
the "ALL" in the requirements, though. What if the prefix matches 5M documents?
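
To make the TermsComponent option concrete, here is a minimal sketch that only
builds the request URL for a capped prefix lookup. The host, the collection
name "billing" and the field name "phone_number" are made up for illustration,
and it assumes a /terms request handler is configured on the collection. The
terms.limit parameter is what keeps a very common prefix from returning
millions of entries.

```python
from urllib.parse import urlencode


def terms_prefix_url(base_url, field, prefix, limit=10):
    """Build a TermsComponent URL that returns at most `limit` matching terms."""
    params = {
        "terms": "true",
        "terms.fl": field,        # field to read terms from (hypothetical name)
        "terms.prefix": prefix,   # only terms starting with what the user typed
        "terms.limit": limit,     # cap the result instead of returning "ALL"
        "wt": "json",
    }
    return f"{base_url}/terms?{urlencode(params)}"


# Hypothetical host, collection and field, for illustration only.
url = terms_prefix_url("http://localhost:8983/solr/billing", "phone_number", "8164")
print(url)
```

Note that TermsComponent returns only the terms and their document counts. The
associated attributes (name, description, etc.) would need a regular query on
top, which is one reason to test both approaches.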

> We have about 50TB of files that need to be indexed. I haven't indexed this
> much data before hence thought of getting your valuable inputs. I am
> thinking of using SOLR cloud and use SSD for faster IO. I might need your
> inputs on hardware requirements too.

The index size is next to impossible to predict without more knowledge. Try to
acquire just a few GB of content and experiment, so that you can get an idea of
the final index size. The estimated number of documents and unique values in
your lookup field are also very valuable to know.
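
The extrapolation from such an experiment could look like the sketch below. The
sample figures (5 GB of raw files producing a 1.5 GB index) are invented
placeholders, not measurements, and the calculation assumes the index grows
linearly with the raw data. That assumption is rough, since the share taken by
the terms dictionary usually shrinks as the corpus grows, so treat the result
as a ballpark figure.

```python
GB = 1000 ** 3
TB = 1000 ** 4


def estimate_index_bytes(sample_raw_bytes, sample_index_bytes, total_raw_bytes):
    """Linearly extrapolate the index size measured on a sample to the full corpus."""
    if sample_raw_bytes <= 0:
        raise ValueError("sample_raw_bytes must be positive")
    ratio = sample_index_bytes / sample_raw_bytes  # index bytes per raw byte
    return ratio * total_raw_bytes


# Placeholder sample numbers; replace them with what the experiment shows.
estimate = estimate_index_bytes(
    sample_raw_bytes=5 * GB,
    sample_index_bytes=1.5 * GB,
    total_raw_bytes=50 * TB,
)
print(f"Estimated index size: {estimate / TB:.1f} TB")
```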

As for storage, the question these days should be "Are there any reasons not to
use SSDs for index storage?" The amount of RAM needed will have to be
determined experimentally: type-ahead requires very low latency and might
need more caching than usual.

- Toke Eskildsen, State and University Library, Denmark
