On 11/22/2011 11:52 PM, Husain, Yavar wrote:
Hi Shawn
That was so great of you to explain the architecture in such detail. I
enjoyed reading it multiple times.
I have a question here:
You mentioned that we can use crc32(DocumentId)% NumServers. Now actually I am
using that in my data-config.xml in the sql query itself, something like:
For Documents to be indexed on Server 1: select DocumentId,PNum,... from Sample
where crc32(DocumentId)%2=0;
For Documents to be indexed on Server 2: select DocumentId,PNum,... from Sample
where crc32(DocumentId)%2=1;
Will that be a right way? Will it not be a slow query?
Thanks once again.
Those queries look good. The crc32 has to be computed for every row, so
it will be slower than an unqualified SELECT, but unless your database
hardware is not up to the job, Solr will probably be more of a
bottleneck than the DB.
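To see how a crc32 modulus spreads documents across servers, here is a minimal Python sketch. It assumes the id is hashed as its decimal string and that zlib's CRC-32 matches the database's crc32() function, which is worth verifying against your own DB. The name shard_for and the id range are made up for illustration.

```python
import zlib
from collections import Counter


def shard_for(doc_id, num_shards):
    """Mimic crc32(DocumentId) % NumServers for a single document id."""
    # Assumption: the database hashes the string form of the id.
    return zlib.crc32(str(doc_id).encode("utf-8")) % num_shards


if __name__ == "__main__":
    # Count how many of 100000 hypothetical ids land on each of 2 servers.
    counts = Counter(shard_for(doc_id, 2) for doc_id in range(1, 100001))
    for shard in sorted(counts):
        print(f"server {shard + 1}: {counts[shard]} documents")
```

Every id maps to exactly one remainder, so the two WHERE clauses never overlap and together cover the whole table.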
You can have a generic DIH config and pass the shard-specific values in
as request parameters on the dataimport URL:
url="jdbc:mysql://${dataimporter.request.dbHost}/${dataimporter.request.dbSchema}?zeroDateTimeBehavior=convertToNull"
<snip>
SELECT * FROM ${dataimporter.request.dataView}
WHERE (
(
did > ${dataimporter.request.minDid}
AND did <= ${dataimporter.request.maxDid}
)
${dataimporter.request.extraWhere}
) AND (crc32(did) % ${dataimporter.request.numShards})
IN (${dataimporter.request.modVal})
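To preview what this query looks like once the request parameters are filled in, here is a rough Python sketch. It is not how DIH resolves variables internally; the render helper and the sample values are hypothetical, and it assumes a parameter that is not supplied resolves to an empty string.

```python
import re

QUERY_TEMPLATE = """SELECT * FROM ${dataimporter.request.dataView}
WHERE (
  (
    did > ${dataimporter.request.minDid}
    AND did <= ${dataimporter.request.maxDid}
  )
  ${dataimporter.request.extraWhere}
) AND (crc32(did) % ${dataimporter.request.numShards})
  IN (${dataimporter.request.modVal})"""


def render(template, request_params):
    """Replace each ${dataimporter.request.NAME} with request_params[NAME]."""
    def substitute(match):
        # Assumption: a missing parameter becomes an empty string.
        return str(request_params.get(match.group(1), ""))
    return re.sub(r"\$\{dataimporter\.request\.(\w+)\}", substitute, template)


if __name__ == "__main__":
    # Sample values only; extraWhere is deliberately left out.
    print(render(QUERY_TEMPLATE, {
        "dataView": "sample_view",
        "minDid": 0,
        "maxDid": 1000000,
        "numShards": 2,
        "modVal": 1,
    }))
```

Because modVal sits inside IN (...), a comma-separated list such as 0,1 would select more than one remainder in a single import.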
This is the URL template that will work with the above DIH config:
http://HOST:PORT/solr/CORE/dataimport?command=COMMAND&dbHost=DBSERVER&dbSchema=DBSCHEMA&dataView=DATAVIEW&numShards=NUMSHARDS&modVal=MODVAL&minDid=MINDID&maxDid=MAXDID&extraWhere=EXTRAWHERE
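Under normal circumstances extraWhere is blank. It's there for
special-purpose importing. Because it is inserted right after the did
range check, anything you put in it has to start with AND (or OR).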
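As an illustration of filling in the URL template, here is a small Python sketch that assembles the request with proper URL encoding. The function name build_dataimport_url and every sample value (host, port, core, schema, view) are placeholders, not values from a real setup.

```python
from urllib.parse import urlencode


def build_dataimport_url(host, port, core, command, **request_params):
    """Build a dataimport URL; request_params become dataimporter.request.* values."""
    # urlencode escapes spaces and symbols, which matters for extraWhere.
    query = urlencode({"command": command, **request_params})
    return f"http://{host}:{port}/solr/{core}/dataimport?{query}"


# Placeholder values for the first of two shards.
url = build_dataimport_url(
    "localhost", 8983, "shard0", "full-import",
    dbHost="dbserver", dbSchema="mydb", dataView="sample_view",
    numShards=2, modVal=0, minDid=0, maxDid=1000000, extraWhere="",
)
print(url)
```

Changing only modVal (and the host or core) gives the URL for the other shard, so the same DIH config can be shared by every server.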
Thanks,
Shawn