Hi All,

I have a huge number of documents to index per hour, and I cannot complete the 
job within the hour using a single machine. Distributing the documents across 
multiple boxes and indexing them in parallel is not an option either, as my 
target document volume per hour can itself be very large (3-6M). So I am 
considering using HDFS and MapReduce to finish the indexing job in time.
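
Roughly, this is the shape of the job I have in mind (just a sketch; the class 
name, field names and paths below are placeholders, and the exact Lucene/Hadoop 
calls will depend on the versions we end up on). Each reducer builds one index 
shard on its local disk with a plain Lucene IndexWriter and then copies the 
finished shard into HDFS:

import java.io.File;
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class ShardIndexReducer extends Reducer<Text, Text, Text, Text> {

    private IndexWriter writer;
    private File localShardDir;

    @Override
    protected void setup(Context context) throws IOException {
        // Build the shard on local disk first; whether the finished index
        // should then live in HDFS or on a local disk is exactly question 1.
        localShardDir = new File("/tmp/shard-" + context.getTaskAttemptID());
        localShardDir.mkdirs();
        writer = new IndexWriter(FSDirectory.open(localShardDir),
                new StandardAnalyzer(), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
    }

    @Override
    protected void reduce(Text docId, Iterable<Text> bodies, Context context)
            throws IOException {
        // One Lucene document per input record; the fields are placeholders.
        for (Text body : bodies) {
            Document doc = new Document();
            doc.add(new Field("id", docId.toString(),
                    Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("body", body.toString(),
                    Field.Store.NO, Field.Index.ANALYZED));
            writer.addDocument(doc);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        writer.optimize();   // merge segments before shipping the shard
        writer.close();
        // Copy the finished shard into HDFS so all shards end up in one place.
        FileSystem fs = FileSystem.get(context.getConfiguration());
        fs.copyFromLocalFile(new Path(localShardDir.getAbsolutePath()),
                new Path("/indexes/" + context.getTaskAttemptID()));
    }
}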

In that regard, I have the following queries about using Solr with Hadoop.

1. After creating the index with Hadoop, would storing it in HDFS for query 
purposes mean additional performance overhead (compared to storing it on the 
actual disk of a single machine)?

2. What kind of change is needed to make Solr queries read from an index that 
is stored in HDFS?

Regards,
Sourav

