delta-import file?
On Wed, Dec 3, 2008 at 12:08 AM, Lance Norskog <[EMAIL PROTECTED]> wrote:
> Does the DIH delta feature rewrite the delta-import file for each set of
> rows? If it does not, that sounds like a bug/enhancement.
> Lance
>
> -----Original Message-----
> From: Noble Paul നോബിള് नोब्ळ् [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, December 02, 2008 8:51 AM
> To: solr-user@lucene.apache.org
> Subject: Re: DataImportHandler: Deleteing from index and db; lastIndexed id
> feature
>
> You can write the details to a file using a Transformer itself.
>
> It is wise to stick to the public API as far as possible. We will maintain
> back compat and your code will be usable w/ newer versions.
>
>
> On Tue, Dec 2, 2008 at 5:12 PM, Marc Sturlese <[EMAIL PROTECTED]> wrote:
>>
>> Thanks, I really appreciate your help.
>>
>> I didn't explain myself very well here:
>>
>>> 2.- This is probably my most difficult goal.
>>> Delta-import reads a timestamp from dataimport.properties and
>>> modifies/adds all documents from the db which were inserted after
>>> that date. What I want is to be able to save in the field the id of
>>> the last indexed doc, so that the next time I execute the indexer it
>>> starts indexing from that last indexed doc id.
>> You can use a Transformer to write something to the DB.
>> Context#getDataSource(String) for each row
>>
>> When I said:
>>
>>> be able to save in the field the id of the last indexed doc
>> I made a mistake. I meant:
>>
>> be able to save in the file (dataimport.properties) the id of the last
>> indexed doc.
>> The point would be to run my own deltaQuery, indexing from the last
>> indexed doc id instead of from the timestamp.
>> So I think this would not work in that case (my mistake, because of
>> the bad explanation):
>>
>>> You can use a Transformer to write something to the DB.
>>> Context#getDataSource(String) for each row
>>
>> That is why I was saying:
>>> I think I should begin modifying SolrWriter.java and DocBuilder.java.
>>> Creating functions like getStartTime, persistStartTime... for ID
>>> control
>>
>> Am I going in the right direction?
>> Sorry for my English and thanks in advance
>>
>>
>> Noble Paul നോബിള് नोब्ळ् wrote:
>>>
>>> On Tue, Dec 2, 2008 at 3:01 PM, Marc Sturlese
>>> <[EMAIL PROTECTED]>
>>> wrote:
>>>>
>>>> Hey there,
>>>>
>>>> I have my dataimporthandler almost completely configured. I am
>>>> missing three goals. I don't think I can reach them just via xml
>>>> conf or a transformer and SqlEntityProcessor plugin, but I need to
>>>> be sure of that.
>>>> If there's no other way I will hack some solr source classes, and
>>>> would like to know the best way to do that. Once I have it solved, I
>>>> can upload or post the source in the forum in case someone thinks it
>>>> can be helpful.
>>>>
>>>> 1.- Every time I execute dataimporthandler (to index data from a
>>>> db), at the start time or end time I need to delete some expired
>>>> documents. I have to delete them from the database and from the
>>>> index. I know which documents must be deleted because a field in
>>>> the db says so. I would not like to delete first all from DB or
>>>> first all from index but one from index and one from doc every time.
>>>
>>> You can override the init() and destroy() of the SqlEntityProcessor
>>> and use it as the processor for the root entity. At this point you
>>> can run the necessary db queries and solr delete queries. Look at
>>> Context#getSolrCore() and Context#getDataSource(String)
>>>
>>>
>>>> The "delete mark" is set as an update in the db row so I think I
>>>> could use deltaImport. I don't know if deletedPkQuery is the way to
>>>> do that. I cannot find much information about how to make it work.
>>>> As deltaQuery modifies docs (delete old and insert new) I suppose
>>>> there must be an easy way to do just the delete and not the new
>>>> insert.
>>> deletedPkQuery does everything first. It runs the query and uses that
>>> to identify the deleted rows.
>>>>
>>>> 2.- This is probably my most difficult goal.
>>>> Delta-import reads a timestamp from dataimport.properties and
>>>> modifies/adds all documents from the db which were inserted after
>>>> that date. What I want is to be able to save in the field the id of
>>>> the last indexed doc, so that the next time I execute the indexer it
>>>> starts indexing from that last indexed doc id.
>>> You can use a Transformer to write something to the DB.
>>> Context#getDataSource(String) for each row
>>>
>>>> The point of doing this is that if I do a full import from a db with
>>>> lots of rows, the app could encounter a problem in the middle of the
>>>> execution and abort the process. The way deltaQuery works, I would
>>>> have to restart the execution from the beginning. Having this new
>>>> functionality I could optimize the index and start from the last
>>>> indexed doc.
>>>> I think I should begin modifying SolrWriter.java and DocBuilder.java.
>>>> Creating functions like getStartTime, persistStartTime... for ID
>>>> control
>>>>
>>>> 3.- I commented on this last point before. I want to give boost to
>>>> doc fields at indexing time.
>>>>>> Adding fieldboost is a planned item.
>>>>
>>>>>> It must work as follows.
>>>>>> Add a special value $fieldBoost.<fieldname> to the row map
>>>>
>>>>>> And DocBuilder should respect that. You can raise a bug and we can
>>>>>> commit it soon.
>>>> How do I raise a bug?
>>> https://issues.apache.org/jira/secure/CreateIssue!default.jspa
>>>>
>>>> Thanks in advance
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db--lastIndexed-id-feature-tp20788755p20788755.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>
>>>
>>> --
>>> --Noble Paul
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db--lastIndexed-id-feature-tp20788755p20790542.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
> --
> --Noble Paul
>

--
--Noble Paul
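
To make the "write the details to a file using a Transformer" suggestion in this thread a bit more concrete, here is a minimal sketch of the file-handling part only. It deliberately does not depend on Solr: LastIndexedIdStore, the key name last.indexed.id and the "id" column are all hypothetical names chosen for illustration, and the transformRow method here only mirrors the shape of DIH's Transformer#transformRow(Map, Context) rather than extending the real class. It assumes numeric ids and writes to its own file instead of touching dataimport.properties.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper: remembers the id of the last row seen in a properties file. */
class LastIndexedIdStore {
    // Hypothetical key name, not something DIH itself reads or writes.
    private static final String KEY = "last.indexed.id";
    private final Path file;

    LastIndexedIdStore(Path file) {
        this.file = file;
    }

    /** Returns the stored id, or the fallback when the file or key is missing. */
    long load(long fallback) {
        if (!Files.exists(file)) {
            return fallback;
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty(KEY);
        return value == null ? fallback : Long.parseLong(value.trim());
    }

    /** Overwrites the file with the given id. */
    void save(long id) {
        Properties props = new Properties();
        props.setProperty(KEY, Long.toString(id));
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "last document id handed to the indexer");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Records the row's numeric "id" column (assumed name) and returns the row unchanged. */
    Map<String, Object> transformRow(Map<String, Object> row) {
        Object id = row.get("id");
        if (id instanceof Number) {
            save(((Number) id).longValue());
        }
        return row;
    }
}
```

A custom Transformer could delegate to transformRow above for each row, and the value returned by load could then be fed into a hand-written delta query. Note that this records the id when the row passes through the transformer, not when the document is committed, and that rewriting the file on every row is slow; a real setup would probably write less often.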