Hi All, I have a huge number of documents to index per hour, and I cannot complete the job within an hour using a single machine. Splitting the documents across multiple boxes and indexing them in parallel is not an option either, since my target document volume per hour itself can be very large (3-6M). So I am considering using HDFS and MapReduce to finish the indexing job in time.
In that regard, I have the following queries about using Solr with Hadoop:

1. After creating the index using Hadoop, would storing it in HDFS for query serving mean additional performance overhead, compared to storing it on the local disk of one machine?

2. What kind of change is needed to make Solr queries read from an index that is stored in HDFS?

Regards,
Sourav
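For question 2, my current guess (based on Solr's HdfsDirectoryFactory support) is that the change would be in solrconfig.xml, something like the sketch below. The NameNode host/port and the Hadoop conf directory here are placeholders for my environment, and I may well be missing options, so please correct me if this is the wrong direction:

```xml
<!-- Sketch only: point Solr's DirectoryFactory at HDFS instead of local disk.
     hdfs://namenode:8020/solr and /etc/hadoop/conf are placeholder values. -->
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
  <!-- Block cache to offset HDFS read latency -->
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
</directoryFactory>

<!-- Lock type must also change when the index lives in HDFS -->
<indexConfig>
  <lockType>hdfs</lockType>
</indexConfig>
```

Is this roughly the right approach, and does the block cache meaningfully close the gap with local-disk performance?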