[ https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12603662#action_12603662 ]
patrick o'leary commented on SOLR-469:
--------------------------------------

There's a slight problem using Connector/J for MySQL: it doesn't fully implement the JDBC spec for setFetchSize, so all rows in MySQL are read into memory at once. Connector/J states that you must pass ?useCursorFetch=true in the connect string, but that exposes another MySQL bug, where server-side parsed queries throw an "incorrect key file" error on the temp tables generated by the cursor. As far as I know there isn't a fix in MySQL yet.

Something that seems to work is to set the batchSize to Integer.MIN_VALUE:

JdbcDataSource.java
{code}
if (bsz != null) {
  try {
    batchSize = Integer.parseInt(bsz);
    if (batchSize < 0)
      batchSize = Integer.MIN_VALUE; // pjaol: a batchSize < 0 in the dataSource forces Connector/J to use Integer.MIN_VALUE
  } catch (NumberFormatException e) {
    LOG.log(Level.WARNING, "Invalid batch size: " + bsz);
  }
}
{code}

This effectively streams the result set one row at a time. It's a little slow, but if you can't set your JVM memory settings high enough it gives you an option.

I'd also suggest null-ing the row hashmap in DocBuilder immediately after use, to allow GC to clean up the reference faster within eden space.

DocBuilder.java
{code}
if (entity.isDocRoot) {
  if (stop.get())
    return;
  boolean result = writer.upload(doc);
  doc = null;
  if (result)
    importStatistics.docCount.incrementAndGet();
}
arow = null; // pjaol: set the hashmap reference to null to eliminate the strong reference
} catch (DataImportHandlerException e)
..........
{code}

> Data Import RequestHandler
> --------------------------
>
>                 Key: SOLR-469
>                 URL: https://issues.apache.org/jira/browse/SOLR-469
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Grant Ingersoll
>             Fix For: 1.3
>
>         Attachments: SOLR-469-contrib.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch
>
>
> We need a RequestHandler which can import data from a DB or other dataSources into the Solr index. Think of it as an advanced form of the SqlUpload plugin (SOLR-103).
> The way it works is as follows.
> * Provide a configuration file (XML) to the Handler which takes in the necessary SQL queries and mappings to a Solr schema
>     - It also takes in a properties file for the data source configuration
> * Given the configuration it can also generate the Solr schema.xml
> * It is registered as a RequestHandler which can take two commands: do-full-import, do-delta-import
>     - do-full-import: dumps all the data from the database into the index (based on the SQL query in the configuration)
>     - do-delta-import: dumps all the data that has changed since the last import (we assume a modified-timestamp column in the tables)
> * It provides an admin page
>     - where we can schedule it to be run automatically at regular intervals
>     - it shows the status of the Handler (idle, full-import, delta-import)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
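The batchSize clamping from pjaol's comment can be sketched as a small self-contained method; the class and method names here are illustrative, not the actual JdbcDataSource code:

```java
// Illustrative sketch: clamp any negative batchSize to Integer.MIN_VALUE,
// the sentinel value Connector/J requires for row-by-row result streaming.
public class BatchSizeDemo {

    // Returns the fetch size to hand to Statement.setFetchSize().
    // A negative value is mapped to Integer.MIN_VALUE; an unparsable
    // value falls back to the supplied default.
    public static int parseBatchSize(String bsz, int defaultSize) {
        if (bsz == null) {
            return defaultSize;
        }
        try {
            int batchSize = Integer.parseInt(bsz);
            return (batchSize < 0) ? Integer.MIN_VALUE : batchSize;
        } catch (NumberFormatException e) {
            return defaultSize;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseBatchSize("-1", 500));   // prints -2147483648
        System.out.println(parseBatchSize("500", 500));  // prints 500
        System.out.println(parseBatchSize("oops", 500)); // prints 500
    }
}
```

In real JDBC code the returned value would be passed to Statement.setFetchSize(); note that MySQL Connector/J only streams results when the statement is also created with ResultSet.TYPE_FORWARD_ONLY and ResultSet.CONCUR_READ_ONLY.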