[ 
https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-469:
---------------------------------------

    Attachment: SOLR-469.patch

A new patch consisting of a few bug fixes and some major new features. The 
changes include:

 * No need to write fields in data-config if the field name from DB/XML and 
field-name in schema.xml are the same. This removes a lot of useless verbosity 
from data-config.xml 
 * A cool new interactive development page, in which you write/change 
data-config.xml and see the results immediately, making iterations extremely 
fast! Use http://host:port/solr/admin/dataimport.jsp or, if using multi-core, 
http://host:port/solr/core-name/admin/dataimport.jsp
 * You can start using the interactive mode without specifying a data-config 
file in solrconfig.xml. However, the data sources must still be specified in 
solrconfig.xml
 * Interactive development uses a new debug mode in DataImportHandler. Add 
debug=on to the full-import command to see the actual documents which are 
created by DataImportHandler. This shows the first 10 documents created by 
DataImportHandler using the existing config, without committing them to Solr. 
It supports the start and rows parameters (just like query params), which you 
can use to see any document. This is very useful when, say, the 1000th 
document failed during indexing and you want to see the reason. If there are 
exceptions, the stack trace is shown with the response.
 * Verbose mode with verbose=on as a request parameter (used in conjunction 
with debug=on) which shows exactly how DataImportHandler created each document. 
 ** What query was executed?
 ** How much time did it take?
 ** What rows did it return?
 ** What transformers were applied and what was the result?
 ** Another advantage is that you can see the fields which are indexed but not 
stored
 * A show-config command has been added which gives the data-config.xml as a 
raw response (uses RawResponseWriter)
 * A new interface called Evaluator has been added which makes it possible to 
plug in new expression evaluators (for resolving variable names)
 * Using the same Evaluator interface, a few new evaluators have been added
 ** formatDate - use as 
${dataimporter.functions.formatDate('NOW',yyyy-MM-dd HH:mm)}
This formats NOW as per the given format and returns a string which can be 
used in queries or URLs. It supports the full DateMathParser syntax. You can 
also format fields, e.g. 
${dataimporter.functions.formatDate(A.purchase_date,dd-MM-yyyy)}
 ** encodeUrl - useful for URL-encoding parameters when making an HTTP call. 
Use as ${dataimporter.functions.encodeUrl(emp.name)}
 ** escapeSql - useful for escaping parameters supplied in SQL statements. It 
replaces quotes with two quotes to avoid SQL syntax errors. Use as 
${dataimporter.functions.escapeSql(emp.name)}
 * Custom Evaluators can be specified in data-config.xml (more details and 
example will be added to the wiki)
 * HttpDataSource now reads the content encoding from the response by default. 
Previously it assumed the default encoding to be UTF-8. This behavior can be 
overridden by explicitly specifying an encoding in solrconfig.xml
 * A FileDataSource has been added which can read content from local files 
(e.g. XML feed files on local disk).
 * Transformers can signal skipping a document by adding a key "$skipDoc" with 
value "true" in the returned map.
 * NumberFormatTransformer is a new transformer which can be used to 
extract/convert numbers from strings. It uses Java's java.text.NumberFormat 
class to provide its features.
 * The Context interface has been enhanced with new methods for 
getting/setting session variables, which Transformers can use to share data. 
Also, a new method called getParentContext lets a Transformer/EntityProcessor 
get the parent entity's context in full imports.
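
As a rough illustration of the debug and verbose parameters described above, 
here is a minimal Python sketch that builds such a request URL. The base URL 
(http://localhost:8983/solr/dataimport), the "command" parameter name and the 
zero-based start offset are assumptions made for the example, not something 
this patch note specifies; adjust them to your own setup.

```python
from urllib.parse import urlencode


def debug_import_url(base_url, start=0, rows=10, verbose=False):
    """Build a full-import request URL that runs in debug mode.

    Assumptions (hypothetical, for illustration only): the handler accepts
    the command as a "command" request parameter, and "start" is zero-based
    like the usual query parameter.
    """
    params = {
        "command": "full-import",
        "debug": "on",      # show the created documents instead of committing
        "start": start,     # offset of the first document to show
        "rows": rows,       # number of documents to show
    }
    if verbose:
        # verbose=on is only meaningful together with debug=on
        params["verbose"] = "on"
    return f"{base_url}?{urlencode(params)}"


# Look at the 1000th document only, with the per-entity details.
url = debug_import_url(
    "http://localhost:8983/solr/dataimport", start=999, rows=1, verbose=True
)
print(url)
```

Keeping the parameters in one helper makes it easy to step through documents 
by changing only start and rows between requests.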

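To make the intent of the escapeSql and encodeUrl evaluators listed above more 
concrete, here is a small Python sketch of the kind of string handling they 
are described as doing. It is not the DataImportHandler code: the function 
names escape_sql and encode_url, the sample query and the example.com URL are 
hypothetical, and the sketch assumes that "quotes" means single quotes.

```python
from urllib.parse import quote_plus


def escape_sql(value):
    """Double every single quote so the value can sit inside a SQL literal.

    Assumption: only single quotes are handled, mirroring the description
    "replace quotes with two quotes".
    """
    return value.replace("'", "''")


def encode_url(value):
    """URL-encode a value for use as an HTTP query parameter."""
    return quote_plus(value)


name = "O'Brien & Sons"

# Without escaping, the lone quote would end the SQL string literal early.
query = f"select * from orders where customer = '{escape_sql(name)}'"

# Without encoding, the '&' would be read as a parameter separator.
feed_url = f"http://example.com/feed?name={encode_url(name)}"

print(query)
print(feed_url)
```

The same value needs different treatment depending on where it is 
substituted, which is why the two evaluators are separate.
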
Please let us know your comments and feedback. More details and examples will 
soon be added to the wiki page at http://wiki.apache.org/solr/DataImportHandler

> Data Import RequestHandler
> --------------------------
>
>                 Key: SOLR-469
>                 URL: https://issues.apache.org/jira/browse/SOLR-469
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Grant Ingersoll
>             Fix For: 1.3
>
>         Attachments: SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, 
> SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch
>
>
> We need a RequestHandler which can import data from a DB or other dataSources 
> into the Solr index. Think of it as an advanced form of SqlUpload Plugin 
> (SOLR-103).
> The way it works is as follows.
>     * Provide a configuration file (xml) to the Handler which takes in the 
> necessary SQL queries and mappings to a solr schema
>           - It also takes in a properties file for the data source 
> configuration
>     * Given the configuration it can also generate the solr schema.xml
>     * It is registered as a RequestHandler which can take two commands 
> do-full-import, do-delta-import
>           -  do-full-import - dumps all the data from the Database into the 
> index (based on the SQL query in configuration)
>           - do-delta-import - dumps all the data that has changed since last 
> import. (We assume a modified-timestamp column in tables)
>     * It provides an admin page
>           - where we can schedule it to be run automatically at regular 
> intervals
>           - It shows the status of the Handler (idle, full-import, 
> delta-import)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
