You can write the details to a file from a Transformer itself.

It is wise to stick to the public API as far as possible. We will
maintain backward compatibility, so your code will keep working with newer versions.
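A minimal sketch of the file-writing side, assuming you persist the id next to the timestamp that DIH already keeps in dataimport.properties. The property key "last_index_id" and the class name are illustrative, not existing DIH names; a custom Transformer would call persistLastId(...) from its transform method.

```java
import java.io.*;
import java.nio.file.*;
import java.util.Properties;

// Sketch only: keep a "last_index_id" entry in dataimport.properties,
// alongside the last_index_time key that DIH itself maintains.
// The key name "last_index_id" is an assumption, not an existing DIH key.
class LastIdStore {
    private final Path file;

    LastIdStore(Path file) { this.file = file; }

    void persistLastId(String id) {
        try {
            Properties props = new Properties();
            if (Files.exists(file)) {
                try (InputStream in = Files.newInputStream(file)) {
                    props.load(in); // preserve existing keys such as last_index_time
                }
            }
            props.setProperty("last_index_id", id);
            try (OutputStream out = Files.newOutputStream(file)) {
                props.store(out, "DIH custom state");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    String readLastId() {
        try (InputStream in = Files.newInputStream(file)) {
            Properties props = new Properties();
            props.load(in);
            return props.getProperty("last_index_id");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On the next run, the custom delta query could then interpolate the stored id instead of the timestamp.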


On Tue, Dec 2, 2008 at 5:12 PM, Marc Sturlese <[EMAIL PROTECTED]> wrote:
>
> Thanks, I really appreciate your help.
>
> I didn't explain myself very well here:
>
>> 2.-This is probably my most difficult goal.
>> Delta-import reads a timestamp from dataimport.properties and
>> modifies/adds all documents from the DB which were inserted after that
>> date. What I want is to be able to save in the field the id of the
>> last indexed doc, so that the next time I execute the indexer it
>> starts indexing from that last indexed doc id.
> You can use a Transformer to write something to the DB.
> Context#getDataSource(String) for each row
>
> When I said:
>
>> be able to save in the field the id of the last indexed doc
> I made a mistake; I meant to say:
>
> be able to save in the file (dataimport.properties) the id of the last
> indexed doc.
> The point would be to do my own deltaQuery indexing from the last
> indexed doc id instead of the timestamp.
> So I think this would not work in that case (my mistake, because of the
> bad explanation):
>
>>You can use a Transformer to write something to the DB.
>>Context#getDataSource(String) for each row
>
> That is why I was saying:
>> I think I should begin by modifying SolrWriter.java and DocBuilder.java,
>> creating functions like getStartTime, persistStartTime... for ID control
>
> Am I headed in the right direction?
> Sorry for my English, and thanks in advance.
>
>
> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>
>> On Tue, Dec 2, 2008 at 3:01 PM, Marc Sturlese <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> Hey there,
>>>
>>> I have my DataImportHandler almost completely configured, but I am
>>> missing three goals. I don't think I can reach them just via XML
>>> config or a Transformer and SqlEntityProcessor plugin, but I need to
>>> be sure of that.
>>> If there's no other way I will hack some Solr source classes, and
>>> would like to know the best way to do that. Once I have it solved, I
>>> can upload or post the source to the forum in case someone thinks it
>>> can be helpful.
>>>
>>> 1.- Every time I execute DataImportHandler (to index data from a DB),
>>> at start or end time I need to delete some expired documents. I have
>>> to delete them from the database and from the index. I know which
>>> documents must be deleted because a field in the DB says so. I would
>>> rather not delete everything from the DB first or everything from the
>>> index first, but instead delete one from the index and one from the
>>> DB each time.
>>
>> You can override the init() and destroy() of SqlEntityProcessor and
>> use it as the processor for the root entity. At that point you can run
>> the necessary DB queries and Solr delete queries. Look at
>> Context#getSolrCore() and Context#getDataSource(String)
>>
>>
>> The "delete mark" is set as an update on the DB row, so I think I
>> could use deltaImport. I don't know if deletedPkQuery is the way to do
>> that; I can't find much information about how to make it work. Since
>> deltaQuery modifies docs (deletes the old and inserts the new), I
>> suppose there must be an easy way to do this by just doing the delete
>> and not the new insert.
>> deletedPkQuery is run first: it executes the query and uses the
>> results to identify the deleted rows.
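For reference, a sketch of how deletedPkQuery is typically wired into data-config.xml; the table name, column names, and the deleted=1 marker below are illustrative assumptions, not taken from your schema:

```xml
<entity name="item" pk="id"
        query="SELECT id, name FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE updated &gt; '${dataimporter.last_index_time}'"
        deletedPkQuery="SELECT id FROM item WHERE deleted = 1">
  <field column="id" name="id"/>
  <field column="name" name="name"/>
</entity>
```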
>>>
>>> 2.-This is probably my most difficult goal.
>>> Delta-import reads a timestamp from dataimport.properties and
>>> modifies/adds all documents from the DB which were inserted after
>>> that date. What I want is to be able to save in the field the id of
>>> the last indexed doc, so that the next time I execute the indexer it
>>> starts indexing from that last indexed doc id.
>> You can use a Transformer to write something to the DB.
>> Context#getDataSource(String) for each row
>>
>>> The point of doing this is that if I do a full import from a DB with
>>> lots of rows, the app could hit a problem in the middle of the
>>> execution and abort the process. The way deltaQuery works, I would
>>> have to restart the execution from the beginning. With this new
>>> functionality I could optimize the index and start from the last
>>> indexed doc.
>>> I think I should begin by modifying SolrWriter.java and DocBuilder.java,
>>> creating functions like getStartTime, persistStartTime... for ID control
>>>
>>> 3.-I commented on this last point before. I want to boost doc fields
>>> at indexing time.
>>>>> Adding field boost is a planned item.
>>>
>>>>> It must work as follows: add a special value $fieldBoost.<fieldname>
>>>>> to the row map, and DocBuilder should respect that. You can raise a
>>>>> bug and we can commit it soon.
>>> How can I raise a bug?
>> https://issues.apache.org/jira/secure/CreateIssue!default.jspa
>>>
>>> Thanks in advance
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db--lastIndexed-id-feature-tp20788755p20788755.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> --Noble Paul
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db--lastIndexed-id-feature-tp20788755p20790542.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
--Noble Paul