delta-import file?

On Wed, Dec 3, 2008 at 12:08 AM, Lance Norskog <[EMAIL PROTECTED]> wrote:
> Does the DIH delta feature rewrite the delta-import file for each set of 
> rows? If it does not, that sounds like a bug/enhancement.
> Lance
>
> -----Original Message-----
> From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, December 02, 2008 8:51 AM
> To: solr-user@lucene.apache.org
> Subject: Re: DataImportHandler: Deleting from index and db; lastIndexed id feature
>
> You can write the details to a file using a Transformer itself.
>
> It is wise to stick to the public API as far as possible. We will maintain
> backward compatibility, so your code will remain usable with newer versions.
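A minimal sketch of "writing the details to a file from a Transformer", assuming the goal is to record the id of the last row seen. DIH's own Transformer is a class with `transformRow(Map<String, Object> row, Context context)`. The class below only mirrors that shape without extending it, so it compiles on its own. The `id` column and the `last_indexed_id` key are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch: mirrors the shape of DIH's Transformer#transformRow,
// but takes no Context and does not extend the real class.
class LastIdRecorder {
    // Hypothetical property key; DIH does not read it on its own.
    static final String KEY = "last_indexed_id";

    private final Path file;
    private final String idColumn;

    LastIdRecorder(Path file, String idColumn) {
        this.file = file;
        this.idColumn = idColumn;
    }

    // Called once per row: saves the row's id, then returns the row unchanged.
    Object transformRow(Map<String, Object> row) {
        Object id = row.get(idColumn);
        if (id != null) {
            Properties props = load(file);        // keep any other keys already in the file
            props.setProperty(KEY, id.toString());
            try (OutputStream out = Files.newOutputStream(file)) {
                props.store(out, null);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return row;
    }

    // Returns the saved id, or null if nothing has been recorded yet.
    static String readLastId(Path file) {
        return load(file).getProperty(KEY);
    }

    private static Properties load(Path file) {
        Properties props = new Properties();
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return props;
    }
}
```

Rewriting the file on every row is simple but costly for large imports, and writing only every N rows is a plausible variation. Note also that a row reaching a transformer may not yet be committed to the index, so after a crash the saved id can be ahead of what was actually committed. Pointing this at `dataimport.properties` itself could clash with DIH's own writes to that file, so a separate file may be safer.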
>
>
> On Tue, Dec 2, 2008 at 5:12 PM, Marc Sturlese <[EMAIL PROTECTED]> wrote:
>>
>> Thanks, I really appreciate your help.
>>
>> I didn't explain myself very well here:
>>
>>> 2. This is probably my most difficult goal.
>>> Delta-import reads a timestamp from dataimport.properties and
>>> modifies/adds all documents from the DB that were inserted after that date.
>>> What I want is to be able to save in the field the id of the last
>>> indexed doc, so that the next time I execute the indexer it starts
>>> indexing from that last indexed doc id.
>> You can use a Transformer to write something to the DB, using
>> Context#getDataSource(String) for each row.
>>
>> When I said:
>>
>>> be able to save in the field the id of the last indexed doc
>> I made a mistake; I meant:
>>
>> be able to save in the file (dataimport.properties) the id of the last
>> indexed doc.
>> The point would be to do my own delta-query indexing starting from the id
>> of the last indexed doc instead of from the timestamp.
>> So I think this would not work in that case (my fault, because of my
>> poor explanation):
>>
>>> You can use a Transformer to write something to the DB, using
>>> Context#getDataSource(String) for each row.
>>
>> That is why I was saying:
>>> I think I should begin by modifying SolrWriter.java and DocBuilder.java,
>>> creating functions like getStartTime, persistStartTime... for ID
>>> control.
>>
>> Am I heading in the right direction?
>> Sorry for my English, and thanks in advance.
>>
>>
>> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>
>>> On Tue, Dec 2, 2008 at 3:01 PM, Marc Sturlese
>>> <[EMAIL PROTECTED]>
>>> wrote:
>>>>
>>>> Hey there,
>>>>
>>>> I have my DataImportHandler almost completely configured, but three
>>>> goals remain. I don't think I can reach them just via the XML config
>>>> or a Transformer and SqlEntityProcessor plugin, but I need to be
>>>> sure of that.
>>>> If there's no other way, I will hack some Solr source classes, and I
>>>> would like to know the best way to do that. Once I have it solved, I
>>>> can upload or post the source to the forum in case someone finds it
>>>> helpful.
>>>>
>>>> 1. Every time I execute DataImportHandler (to index data from a
>>>> DB), at the start or the end I need to delete some expired
>>>> documents, both from the database and from the index. I know which
>>>> documents must be deleted because a field in the DB marks them. I
>>>> would not like to delete everything from the DB first or everything
>>>> from the index first, but rather one from the index and one from the
>>>> DB each time.
>>>
>>> You can override init() and destroy() of SqlEntityProcessor and
>>> use it as the processor for the root entity. At that point you can
>>> run the necessary DB queries and Solr delete queries. Look at
>>> Context#getSolrCore() and Context#getDataSource(String).
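A minimal sketch of the cleanup step one might run from an overridden `SqlEntityProcessor#init(Context)` on the root entity, deleting one expired doc at a time from the index and then from the DB, as asked above. The `Database` and `Index` interfaces are hypothetical stand-ins: in DIH the DB side would come from `Context#getDataSource(String)` and the index side from `Context#getSolrCore()`. The SQL in the comments is illustrative only.

```java
import java.util.List;

// Hypothetical sketch of an interleaved purge; not DIH code.
class ExpiredDocCleaner {
    interface Database {
        List<String> findExpiredIds();   // e.g. SELECT id FROM item WHERE expired = 1
        void deleteRow(String id);       // e.g. DELETE FROM item WHERE id = ?
    }

    interface Index {
        void deleteById(String id);      // a Solr delete-by-id
    }

    // Deletes each expired doc from the index, then from the DB, one id at a time.
    static int purge(Database db, Index index) {
        int count = 0;
        for (String id : db.findExpiredIds()) {
            // Index first: if this call fails, the DB row still carries the
            // delete mark and the next run will retry it.
            index.deleteById(id);
            db.deleteRow(id);
            count++;
        }
        return count;
    }
}
```

Deleting from the index before the DB is a design choice: the DB mark acts as the durable record of "still to be deleted", so a failure part-way through leaves work that a later run can pick up.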
>>>
>>>
>>>> The "delete mark" is set as an update on the DB row, so I think I
>>>> could use delta-import. I don't know if deletedPkQuery is the way to
>>>> do that; I cannot find much information about how to make it work. As
>>>> deltaQuery modifies docs (deletes the old one and inserts the new
>>>> one), I suppose there must be an easy way to do just the delete
>>>> without the new insert.
>>> deletedPkQuery is run before everything else: it runs the query and
>>> uses the result to identify the deleted rows.
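Because the delete mark here is itself an UPDATE, a marked row would likely also be returned by a deltaQuery that filters on a last-modified column. The toy model below is not DocBuilder's code. It only illustrates the idea that keys returned by deletedPkQuery are deleted from the index and should not be re-created even if deltaQuery also returned them. The class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model of planning one delta pass; not DIH's implementation.
class DeltaPassModel {
    static final class Plan {
        final Set<String> toDelete;
        final Set<String> toReindex;

        Plan(Set<String> toDelete, Set<String> toReindex) {
            this.toDelete = toDelete;
            this.toReindex = toReindex;
        }
    }

    // deltaPks:   keys returned by deltaQuery (rows changed since the last run)
    // deletedPks: keys returned by deletedPkQuery (rows carrying the delete mark)
    static Plan plan(Set<String> deltaPks, Set<String> deletedPks) {
        Set<String> reindex = new LinkedHashSet<>(deltaPks);
        reindex.removeAll(deletedPks); // a marked row must not be re-created
        return new Plan(new LinkedHashSet<>(deletedPks), reindex);
    }
}
```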
>>>>
>>>> 2. This is probably my most difficult goal.
>>>> Delta-import reads a timestamp from dataimport.properties and
>>>> modifies/adds all documents from the DB that were inserted after that date.
>>>> What I want is to be able to save in the field the id of the last
>>>> indexed doc, so that the next time I execute the indexer it starts
>>>> indexing from that last indexed doc id.
>>> You can use a Transformer to write something to the DB, using
>>> Context#getDataSource(String) for each row.
>>>
>>>> The point of doing this is that if I do a full import from a DB with
>>>> lots of rows, the app could hit a problem in the middle of the
>>>> execution and abort the process. The way delta-query works now, I
>>>> would have to restart the execution from the beginning. With this new
>>>> functionality I could optimize the index and start from the last
>>>> indexed doc.
>>>> I think I should begin by modifying SolrWriter.java and DocBuilder.java,
>>>> creating functions like getStartTime, persistStartTime... for ID
>>>> control.
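A minimal sketch of resuming from a saved id instead of a timestamp, assuming ids are positive, increase with insertion order, and were saved under a hypothetical `last_indexed_id` key. The table `item` and column `id` are also hypothetical. The query uses a `?` placeholder so the id can be bound through a JDBC `PreparedStatement` rather than concatenated into the SQL.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch of a keyset-style resume; not part of DIH.
class ResumeQuery {
    static final String KEY = "last_indexed_id"; // hypothetical property key

    // Reads the checkpoint; 0 means "start from the beginning" (assumes positive ids).
    static long lastIndexedId(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String v = props.getProperty(KEY);
        return v == null ? 0L : Long.parseLong(v.trim());
    }

    // Only rows after the checkpoint, in id order; bind lastIndexedId(...) to "?".
    static String query() {
        return "SELECT * FROM item WHERE id > ? ORDER BY id";
    }
}
```

The `ORDER BY id` matters: only if rows are processed in id order does "the last id seen" mean "every smaller id is already done", which is what makes it a safe restart point.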
>>>>
>>>> 3. I commented on this last point before: I want to boost doc
>>>> fields at indexing time.
>>>>>>Adding field boost is a planned item.
>>>>
>>>>>>It should work as follows:
>>>>>>Add a special value $fieldBoost.<fieldname> to the row map,
>>>>
>>>>>>and DocBuilder should respect that. You can raise a bug and we can
>>>>>>commit it soon.
>>>> How can I raise a bug?
>>> https://issues.apache.org/jira/secure/CreateIssue!default.jspa
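The `$fieldBoost.<fieldname>` convention described above was only a planned feature in this thread. The sketch below illustrates the proposed idea and nothing more: a transformer-side helper adds the special key to the row map, and a builder-side helper removes those keys, so they are not indexed as fields, and returns field-to-boost pairs. Both methods and the class name are hypothetical.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch of the proposed "$fieldBoost.<fieldname>" convention.
class FieldBoostConvention {
    static final String PREFIX = "$fieldBoost.";

    // Transformer side: request a boost for one field of the row.
    static void setBoost(Map<String, Object> row, String field, float boost) {
        row.put(PREFIX + field, boost);
    }

    // Builder side: strip the special keys from the row and return field -> boost.
    static Map<String, Float> extractBoosts(Map<String, Object> row) {
        Map<String, Float> boosts = new HashMap<>();
        Iterator<Map.Entry<String, Object>> it = row.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Object> e = it.next();
            if (e.getKey().startsWith(PREFIX)) {
                Object v = e.getValue();
                // Accept numbers or numeric strings (e.g. a value read from SQL or XML).
                float f = (v instanceof Number)
                        ? ((Number) v).floatValue()
                        : Float.parseFloat(v.toString());
                boosts.put(e.getKey().substring(PREFIX.length()), f);
                it.remove();
            }
        }
        return boosts;
    }
}
```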
>>>>
>>>> Thanks in advance
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db--lastIndexed-id-feature-tp20788755p20788755.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> --Noble Paul
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db--lastIndexed-id-feature-tp20788755p20790542.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
>
>
> --
> --Noble Paul
>
>



-- 
--Noble Paul
