On 11/22/2011 11:52 PM, Husain, Yavar wrote:
Hi Shawn

It was great of you to explain the architecture in such detail. I 
enjoyed reading it multiple times.

I have a question here:

You mentioned that we can use crc32(DocumentId) % NumServers. I am actually 
using that in my data-config.xml, in the SQL query itself, something like:

For Documents to be indexed on Server 1: select DocumentId,PNum,... from Sample 
where crc32(DocumentId)%2=0;
For Documents to be indexed on Server 2: select DocumentId,PNum,... from Sample 
where crc32(DocumentId)%2=1;

Is that the right way to do it? Won't it be a slow query?

Thanks once again.

Those queries look good. Compared to an unqualified SELECT, I'm sure the crc32 filter will slow it down, but unless your database hardware is not up to the job, Solr will probably be more of a bottleneck than the DB.
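To make the partitioning rule concrete, here is a minimal Python sketch of the same crc32(DocumentId) % NumServers assignment. The function name shard_for is hypothetical, and the sketch assumes that the database's crc32() computes the standard CRC-32 over the string form of the ID, which is what zlib.crc32 implements; check that against your own database before relying on it.

```python
import zlib


def shard_for(document_id, num_servers):
    """Return the shard number (0-based) for a document ID.

    Assumption: the database's crc32() hashes the string form of the ID
    with the standard CRC-32 polynomial, the same one zlib uses.
    """
    checksum = zlib.crc32(str(document_id).encode("utf-8"))
    return checksum % num_servers


if __name__ == "__main__":
    # Every ID lands on exactly one shard, so the per-server queries
    # (remainder 0, remainder 1, ...) never overlap and never miss a row.
    for doc_id in (1, 2, 3, 1000, 123456789):
        print(doc_id, "-> shard", shard_for(doc_id, 2))
```

Because the remainder depends only on the ID, a document always goes back to the same server on reindexing, as long as the number of servers does not change.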

You can have a generic DIH config and pass the information in as request parameters on the dataimport URL:

url="jdbc:mysql://${dataimporter.request.dbHost}/${dataimporter.request.dbSchema}?zeroDateTimeBehavior=convertToNull"
<snip>
        SELECT * FROM ${dataimporter.request.dataView}
        WHERE (
          (
            did &gt; ${dataimporter.request.minDid}
            AND did &lt;= ${dataimporter.request.maxDid}
          )
          ${dataimporter.request.extraWhere}
        ) AND (crc32(did) % ${dataimporter.request.numShards})
          IN (${dataimporter.request.modVal})

This is the URL template that will work with the above DIH config:

http://HOST:PORT/solr/CORE/dataimport?command=COMMAND&dbHost=DBSERVER&dbSchema=DBSCHEMA&dataView=DATAVIEW&numShards=NUMSHARDS&modVal=MODVAL&minDid=MINDID&maxDid=MAXDID&extraWhere=EXTRAWHERE

Under normal circumstances extraWhere is blank. It's there for special-purpose importing.
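As an illustration of how the URL template might be filled in for each shard, here is a small Python sketch. The helper names (dataimport_url, shard_urls) and all host, schema, and view values are hypothetical; only the parameter names come from the template above. It assumes one Solr host per shard, with modVal set to the shard's remainder.

```python
from urllib.parse import urlencode


def dataimport_url(host, port, core, command, **params):
    """Build one dataimport URL; params become dataimporter.request.* values."""
    query = urlencode({"command": command, **params})
    return f"http://{host}:{port}/solr/{core}/dataimport?{query}"


def shard_urls(hosts, port, core, num_shards, **common):
    """One full-import URL per shard host, with modVal set to the shard number."""
    return [
        dataimport_url(host, port, core, "full-import",
                       numShards=num_shards, modVal=i, **common)
        for i, host in enumerate(hosts)
    ]


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    for url in shard_urls(
        ["idx1", "idx2"], 8983, "core0", 2,
        dbHost="db1", dbSchema="docs", dataView="sample_view",
        minDid=0, maxDid=1000, extraWhere="",
    ):
        print(url)
```

urlencode takes care of escaping, which matters if extraWhere ever carries a SQL fragment with spaces or operators in it.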

Thanks,
Shawn
