Shalin,

Thanks for your questions. The mystery was solved this morning: my "unique" key was only unique within an entity, not across entities. There was only one instance of overlap: the no-longer-mysterious record and its doppelganger.
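A minimal sketch of that collision, for illustration only. This is plain Python, not Solr or DIH code: a dict stands in for an index that keeps one document per unique key. The entity name "other_entity", the field values, and the entity-qualified key are hypothetical; only "organization" and the id "000001" come from this thread.

```python
# Sketch: an index that, like a uniqueKey field, keeps one document per key.
# "other_entity" and all field values are hypothetical placeholders.

def import_entities(entities, make_key):
    """Add every row to the index; a later row with the same key replaces the earlier one."""
    index = {}
    for entity_name, rows in entities.items():
        for row in rows:
            index[make_key(entity_name, row)] = {"entity": entity_name, **row}
    return index


entities = {
    "organization": [{"id": "000001", "name": "Org A"}, {"id": "000002", "name": "Org B"}],
    "other_entity": [{"id": "000001", "name": "Doppelganger"}],  # hypothetical second entity
}

# Key that is unique only within an entity: the two "000001" rows collide.
colliding = import_entities(entities, lambda entity, row: row["id"])

# Key qualified by entity name (one possible remedy): unique across entities.
qualified = import_entities(entities, lambda entity, row: f"{entity}-{row['id']}")

print(len(colliding))                 # 2 documents for 3 rows
print(colliding["000001"]["entity"])  # other_entity
print(len(qualified))                 # 3
```

With the bare id as the key, three rows produce two documents and nothing is logged as an error, which matches the "exactly 1 row is missing" symptom.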
All the other symptoms were side effects of how I was troubleshooting. For example, if I did a full import, the doppelganger record (which I didn't know about) would be imported, but my test query was only looking for the one that didn't make it in. However, if I imported only that entity, it would, as expected, update the index record and things would appear fine to me.

So, no bug. Just plain old bad/narrow troubleshooting combined with coincidence (the only record not getting imported was the first row, etc.).

-justin

On Mon, Mar 18, 2013 at 7:34 PM, Shalin Shekhar Mangar <[email protected]> wrote:

> That does sound perplexing.
>
> Justin, can you tell us which field in the query is your record id? What is
> the record id's type in database and in solr schema? What is your unique
> key and its type in solr schema?
>
>
> On Tue, Mar 19, 2013 at 5:19 AM, Justin L. <[email protected]> wrote:
>
> > Every time I do an import, DataImportHandler is not importing 1 row from my
> > database.
> >
> > I have 3 entities each defined with a single query. I have confirmed, by
> > looking at totals from solr as well as comparing a "*:*" query to direct db
> > queries-- exactly 1 row is missing every time. And its the same row- the
> > first row of one of my entities when sorted by primary key. The other two
> > entities are fully imported without trouble.
> >
> > There are no errors in the log- even when DIH logging is turned up to FINE.
> > When I alter the query to retrieve only the mysterious record, it shows up
> > as "Fetched: 1 Skipped: 0 Processed: 1". But when I do a query for *:* it
> > returns 0 documents.
> >
> > Ready for a twist? The DIH query for this entity does not have an ORDER BY
> > clause- when I add one to sort by primary key DESC it imports all of the
> > rows for that entity, including the mysterious record.
> >
> > Ready to have your mind blown? I am using the alternative method for doing
> > delta imports (see query below).
> > When I make clean=false, and update the
> > timestamp on the mysterious record- yup- it gets imported properly.
> >
> >
> >
> > Because I have the ORDER BY DESC hack, I can get by and live to fight
> > another day. But I thought someone might like to know this because I think
> > I am hitting a bug in DIH- specifically, something after the querying but
> > before the posting to solr. If someone familiar with DIH innards wants to
> > suggest where I should look or how to step through it, I'd be willing to
> > take a look.
> >
> > xoxo,
> > Justin
> >
> >
> > * Fun facts:
> > Solr 4.0
> > Oracle 11g
> > The mysterious record's id is "000001"
> > I use field elements to rename the columns rather than in-the-sql aliases
> > because of a problem I had with them earlier. But I will try changing that.
> >
> >
> > * Alternative delta import method:
> >
> > http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport
> >
> >
> > * DIH query that should import mysterious record:
> >
> > select organization_name, organization_id, address
> > from organization o
> > join rolodex r on r.rolodex_id = o.contact_address_id
> > and r.sponsor_address_flag = 'N'
> > and r.actv_ind = 'Y'
> > where '${dataimporter.request.clean}' = 'true'
> > or to_char(o.update_timestamp,'YYYY-MM-DD HH24:MI:SS') >
> > '${dataimporter.organization.last_index_time
> >
> >
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
