[jira] Commented: (SOLR-469) Data Import RequestHandler

Shalin Shekhar Mangar (JIRA) Thu, 26 Jun 2008 07:18:09 -0700

    [ https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608447#action_12608447 ]


Shalin Shekhar Mangar commented on SOLR-469:
--------------------------------------------

bq. Patch applies cleanly, tests pass, although I notice several @ignore in 
there.
The @Ignore annotations are in TestJdbcDataSource (for lack of a MySQL instance 
to test with) and in TestScriptTransformer (the script tests can only run on 
Java 6, which ships with a JavaScript ScriptEngine by default). We can rewrite 
the JDBC test to use Derby if needed.

bq. Also, I notice several interfaces that have a number of methods on them. 
Have you thought about abstract base classes instead?
Apart from the ones Noble pointed out, there's Evaluator, which users can 
implement to extend the power of VariableResolver. EvaluatorBag provides some 
generally useful implementations. The Context could probably be passed to 
Evaluator as well. Beyond that, I'm not sure if or how these interfaces would 
change in the future. An AbstractDataSource can be added -- maybe we can make 
the query type generic as well, in addition to the return type.
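
As a rough illustration of that last point, an abstract base class could carry 
type parameters for both the query and the result. This is only a sketch: the 
name AbstractDataSource comes from the comment above, but the type parameters, 
the method names and the in-memory subclass are assumptions made up for the 
example, not the API in the patch.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class GenericDataSourceSketch {

    // Hypothetical base class: Q is the query type, T is the result type.
    static abstract class AbstractDataSource<Q, T> {
        // No-op defaults, so a subclass overrides only what it needs.
        public void init() { }
        public void close() { }

        // The one method every data source has to supply.
        public abstract T getData(Q query);
    }

    // Toy subclass for illustration: the "query" is a prefix and the
    // result is an iterator over the rows that start with it.
    static class InMemoryDataSource extends AbstractDataSource<String, Iterator<String>> {
        private final List<String> rows = Arrays.asList("solr-1", "solr-2", "lucene-1");

        @Override
        public Iterator<String> getData(String prefix) {
            return rows.stream().filter(r -> r.startsWith(prefix)).iterator();
        }
    }

    public static void main(String[] args) {
        Iterator<String> it = new InMemoryDataSource().getData("solr");
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

With a base class like this, adding a method later means adding a default 
implementation in one place instead of breaking every implementor.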

bq. What relation does the Context have to the HttpDataSource? 
The Context is independent of any data source. It's just extra information that 
is passed along in case someone needs it. Most of the implementations do not 
actually use it.

bq. What if I wanted to slurp from a table on the fly?
If you mean passing an SQL query on the fly as a request parameter then no, it 
is not supported. We haven't seen a use-case for it yet -- since the schema and 
indexing are well defined in advance, there is no harm in putting the query 
in the configuration. However, if someone really wants to do something like 
that, they can pass a full data-config as a request parameter (debug mode), 
which is then executed. The interactive mode uses this approach. An alternative 
is to extend SqlEntityProcessor and override the getQuery method to read 
Context#getRequestParameters and, if an sql param is present, use that as the 
query instead of the SQL in the configuration.
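
The override described above could look roughly like the sketch below. To keep 
it self-contained, Context and SqlEntityProcessor here are minimal stand-ins 
written for this example; their constructors, fields and exact signatures are 
assumptions and will not match the classes in the patch. Only the idea is the 
same: prefer the "sql" request parameter and fall back to the configured query.

```java
import java.util.HashMap;
import java.util.Map;

class RequestSqlSketch {

    // Stand-in for the handler's Context; only the accessor used here.
    interface Context {
        Map<String, Object> getRequestParameters();
    }

    // Stand-in for SqlEntityProcessor: by default the query comes from
    // the configuration.
    static class SqlEntityProcessor {
        protected final Context context;
        protected final String configuredQuery;

        SqlEntityProcessor(Context context, String configuredQuery) {
            this.context = context;
            this.configuredQuery = configuredQuery;
        }

        public String getQuery() {
            return configuredQuery;
        }
    }

    // Uses the "sql" request parameter when it is present and non-blank,
    // otherwise falls back to the configured query.
    static class RequestSqlEntityProcessor extends SqlEntityProcessor {
        RequestSqlEntityProcessor(Context context, String configuredQuery) {
            super(context, configuredQuery);
        }

        @Override
        public String getQuery() {
            Object sql = context.getRequestParameters().get("sql");
            if (sql != null && !sql.toString().trim().isEmpty()) {
                return sql.toString();
            }
            return super.getQuery();
        }
    }

    public static void main(String[] args) {
        Map<String, Object> params = new HashMap<>();
        params.put("sql", "select id, name from item");
        Context ctx = () -> params;
        SqlEntityProcessor p = new RequestSqlEntityProcessor(ctx, "select * from item");
        System.out.println(p.getQuery());
    }
}
```

Note that anything along these lines lets whoever can send a request run 
arbitrary SQL, so it only makes sense where the handler is not publicly 
reachable.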

bq. Interactive mode has a bit of a chicken and the egg problem when it comes 
to JDBC, right, in that the Driver needs to be present in Solr/lib right?
Yes, to play interactively while using a JdbcDataSource, one would need to have 
the driver jar present in the class-path beforehand. The interactive mode 
itself is independent of this, however -- HttpDataSource does not have this 
limitation (see the slashdot example on the wiki).

bq. In the JDBCDataSource, not sure I follow the connection stuff. Can you 
explain a bit? 
The connection is acquired once and used throughout the import process. It is 
closed if not used for 10 seconds. The idea behind the time-out was to avoid 
having the server close the connection on us due to inactivity. Apart from 
that scenario, there is very little chance of a connection error happening -- 
and even if one did, we may not have a way to deal with it.
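
For illustration, the acquire-once, drop-when-idle behaviour can be sketched as 
below. This is not the code in JdbcDataSource: the class name, the injected 
clock and the choice to check idleness lazily on the next access are all 
assumptions made to keep the example small and testable.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

class IdleTimeoutSketch {

    // Holds one lazily opened resource and replaces it once it has sat
    // idle for longer than maxIdleMillis.
    static class IdleClosingHolder<R extends AutoCloseable> {
        private final Supplier<R> factory;
        private final long maxIdleMillis;
        private final LongSupplier clock;
        private R resource;
        private long lastUsed;

        IdleClosingHolder(Supplier<R> factory, long maxIdleMillis, LongSupplier clock) {
            this.factory = factory;
            this.maxIdleMillis = maxIdleMillis;
            this.clock = clock;
        }

        // Returns the current resource, reopening it if the previous one
        // was idle for too long.
        synchronized R get() {
            long now = clock.getAsLong();
            if (resource != null && now - lastUsed > maxIdleMillis) {
                try {
                    resource.close();
                } catch (Exception e) {
                    // The stale resource is being discarded anyway.
                }
                resource = null;
            }
            if (resource == null) {
                resource = factory.get();
            }
            lastUsed = now;
            return resource;
        }
    }

    public static void main(String[] args) {
        long[] now = {0};     // fake clock, in milliseconds
        int[] opened = {0};   // counts how often the factory ran
        IdleClosingHolder<AutoCloseable> holder = new IdleClosingHolder<AutoCloseable>(
                () -> { opened[0]++; return () -> { }; }, 10_000, () -> now[0]);
        holder.get();                     // opens the first "connection"
        now[0] = 5_000;  holder.get();    // idle 5 s: reused
        now[0] = 20_000; holder.get();    // idle 15 s: closed and reopened
        System.out.println("connections opened: " + opened[0]); // prints 2
    }
}
```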



> Data Import RequestHandler
> --------------------------
>
>                 Key: SOLR-469
>                 URL: https://issues.apache.org/jira/browse/SOLR-469
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Grant Ingersoll
>             Fix For: 1.3
>
>         Attachments: SOLR-469-contrib.patch, SOLR-469-contrib.patch, 
> SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469-contrib.patch, 
> SOLR-469-contrib.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, 
> SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, 
> SOLR-469.patch, SOLR-469.patch
>
>
> We need a RequestHandler which can import data from a DB or other data sources 
> into the Solr index. Think of it as an advanced form of the SqlUpload Plugin 
> (SOLR-103).
> The way it works is as follows.
>     * Provide a configuration file (xml) to the Handler which takes in the 
> necessary SQL queries and mappings to a solr schema
>           - It also takes in a properties file for the data source 
> configuration
>     * Given the configuration it can also generate the solr schema.xml
>     * It is registered as a RequestHandler which can take two commands 
> do-full-import, do-delta-import
>           -  do-full-import - dumps all the data from the Database into the 
> index (based on the SQL query in configuration)
>           - do-delta-import - dumps all the data that has changed since last 
> import. (We assume a modified-timestamp column in tables)
>     * It provides an admin page
>           - where we can schedule it to be run automatically at regular 
> intervals
>           - It shows the status of the Handler (idle, full-import, 
> delta-import)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
