[ https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12603662#action_12603662 ]
patrick o'leary commented on SOLR-469:
--------------------------------------

There's a slight problem using Connector/J for MySQL: it doesn't fully implement the JDBC spec for setFetchSize, so all rows in MySQL are read into memory at once. Connector/J states that you must pass ?useCursorFetch=true in the connect string, but that exposes another MySQL bug, where server-side parsed queries throw an "incorrect key file" error on the temp tables generated by the cursor. As far as I know there isn't a fix in MySQL yet.

Something that seems to work is to set the batchSize to Integer.MIN_VALUE:

JdbcDataSource.java
{code}
if (bsz != null) {
  try {
    batchSize = Integer.parseInt(bsz);
    if (batchSize < 0)
      batchSize = Integer.MIN_VALUE; // pjaol: a batchSize < 0 in the dataSource forces Connector/J to use Integer.MIN_VALUE
  } catch (NumberFormatException e) {
    LOG.log(Level.WARNING, "Invalid batch size: " + bsz);
  }
}
{code}

This effectively streams the result set one row at a time. It's a little slow, but if you can't set your JVM memory settings high enough it gives you an option.

I'd also suggest null-ing the row hashmap in DocBuilder immediately after use, to allow GC to clean up the reference faster within eden space.

DocBuilder.java
{code}
if (entity.isDocRoot) {
  if (stop.get())
    return;
  boolean result = writer.upload(doc);
  doc = null;
  if (result)
    importStatistics.docCount.incrementAndGet();
}
arow = null; // pjaol: set the hashmap reference to null to eliminate the strong reference
} catch (DataImportHandlerException e)
..........
{code}

> Data Import RequestHandler
> --------------------------
>
>                 Key: SOLR-469
>                 URL: https://issues.apache.org/jira/browse/SOLR-469
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Grant Ingersoll
>             Fix For: 1.3
>
>         Attachments: SOLR-469-contrib.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch
>
>
> We need a RequestHandler which can import data from a DB or other dataSources into the Solr index. Think of it as an advanced form of the SqlUpload plugin (SOLR-103).
> The way it works is as follows.
> * Provide a configuration file (XML) to the Handler which takes in the necessary SQL queries and mappings to a Solr schema
>     - It also takes in a properties file for the data source configuration
> * Given the configuration it can also generate the Solr schema.xml
> * It is registered as a RequestHandler which can take two commands: do-full-import, do-delta-import
>     - do-full-import: dumps all the data from the database into the index (based on the SQL query in the configuration)
>     - do-delta-import: dumps all the data that has changed since the last import (we assume a modified-timestamp column in the tables)
> * It provides an admin page
>     - where we can schedule it to be run automatically at regular intervals
>     - it shows the status of the Handler (idle, full-import, delta-import)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
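The batchSize clamping from pjaol's comment can be sketched as a small self-contained method; the class and method names here are illustrative, not the actual JdbcDataSource code:

```java
// Illustrative sketch: clamp any negative batchSize to Integer.MIN_VALUE,
// the sentinel value Connector/J requires for row-by-row result streaming.
public class BatchSizeDemo {

    // Returns the fetch size to hand to Statement.setFetchSize().
    // A negative value is mapped to Integer.MIN_VALUE; an unparsable
    // value falls back to the supplied default.
    public static int parseBatchSize(String bsz, int defaultSize) {
        if (bsz == null) {
            return defaultSize;
        }
        try {
            int batchSize = Integer.parseInt(bsz);
            return (batchSize < 0) ? Integer.MIN_VALUE : batchSize;
        } catch (NumberFormatException e) {
            return defaultSize;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseBatchSize("-1", 500));   // prints -2147483648
        System.out.println(parseBatchSize("500", 500));  // prints 500
        System.out.println(parseBatchSize("oops", 500)); // prints 500
    }
}
```

In real JDBC code the returned value would be passed to Statement.setFetchSize(); note that MySQL Connector/J only streams results when the statement is also created with ResultSet.TYPE_FORWARD_ONLY and ResultSet.CONCUR_READ_ONLY.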