ideas for indexing large amount of pdf docs

Rode Gonzalez (libnova) Sat, 13 Aug 2011 01:50:28 -0700

Hi all,

I would like to ask about the best way to index a large number of PDF
documents, each between 10 and 60 MB, with 100 to 1000 users connected
simultaneously.


I currently have a single Solr 3.3.0 core, and it works fine for a small
number of PDF docs, but I'm worried about what will happen when we go into
production.

Some possibilities:

i. Clustering. I have no experience with this, so venturing into it would
probably be a bad idea.

ii. Multicore solution. Use some kind of hash to choose a single core for
each exact query, which reduces the size of the index that has to be
consulted, or consult all the cores at the same time for complex queries.

iii. Do nothing more and wait for the response times to become
catastrophic :P
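
To make option ii more concrete, here is a minimal Python sketch of the
routing I have in mind. It is only an illustration under assumptions: the
host, the core names and the helper functions are all hypothetical, and the
only Solr feature it relies on is the `shards` request parameter for
distributed search. It also assumes each document is indexed into the core
chosen by the same hash of its key.

```python
import hashlib
from urllib.parse import urlencode

# Hypothetical deployment: one Solr instance hosting four cores.
SOLR_HOST = "localhost:8983"
CORES = ["core0", "core1", "core2", "core3"]


def core_for_key(key: str) -> str:
    """Map a document key to one core with a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and so would not route consistently.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return CORES[int(digest, 16) % len(CORES)]


def exact_query_url(key: str, query: str) -> str:
    """Exact query: consult only the core that owns the key."""
    core = core_for_key(key)
    return f"http://{SOLR_HOST}/solr/{core}/select?{urlencode({'q': query})}"


def complex_query_url(query: str) -> str:
    """Complex query: ask one core to fan out to all cores via `shards`."""
    shards = ",".join(f"{SOLR_HOST}/solr/{core}" for core in CORES)
    params = urlencode({"q": query, "shards": shards})
    return f"http://{SOLR_HOST}/solr/{CORES[0]}/select?{params}"


if __name__ == "__main__":
    print(exact_query_url("doc-42", "id:doc-42"))
    print(complex_query_url("text:invoice"))
```

One caveat with a plain modulo hash like this: changing the number of cores
changes where most keys land, so the documents would have to be reindexed.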


Could someone with experience help me decide?

Thanks a lot in advance.
