Shalin,

Thanks for your questions. The mystery was solved this morning: my "unique"
key was only unique within an entity, not across entities. There was only
one instance of overlap: the no-longer-mysterious record and its
doppelganger.

All the other symptoms were side effects of how I was troubleshooting.
For example, if I did a full import, the doppelganger record (which I didn't
know about) would be imported, but my test query was only looking for the
one that didn't make it in. However, if I imported only that entity, it
would, as expected, update the index record and things would appear fine to
me.
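
For anyone who finds this thread later, here is a minimal Python sketch of
the overwrite. It is only an illustration, not my actual config: the index is
modeled as a dict keyed by the unique key, the rows and entity names are made
up, index_docs is a hypothetical helper, and I am assuming the doppelganger's
entity gets imported after the organization entity.

```python
# Stand-in for the index: a dict keyed by the unique key field ("id").
# Assumption: a document whose id already exists replaces the earlier one.

def index_docs(index, docs):
    """Add docs to the index; a repeated id overwrites the existing entry."""
    for doc in docs:
        index[doc["id"]] = doc
    return index

# Hypothetical rows; only "000001" overlaps between the two entities.
organization_rows = [
    {"id": "000001", "entity": "organization"},
    {"id": "000002", "entity": "organization"},
]
other_rows = [
    {"id": "000001", "entity": "other"},  # the doppelganger
    {"id": "900001", "entity": "other"},
]

# Full import: both entities go in, and the doppelganger lands last.
index = index_docs({}, organization_rows + other_rows)
rows_in_db = len(organization_rows) + len(other_rows)
docs_after_full_import = len(index)
owner_after_full_import = index["000001"]["entity"]

# Re-import only the organization entity without cleaning the index.
index_docs(index, organization_rows)
owner_after_entity_import = index["000001"]["entity"]

print(rows_in_db, docs_after_full_import)  # 4 3
print(owner_after_full_import, owner_after_entity_import)  # other organization
```

The document count comes up one short of the row count after the full
import, and whichever entity is imported last owns id "000001", which matches
what I was seeing.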

So, no bug. Just plain old bad/narrow troubleshooting combined with
coincidence (the only record not getting imported being the first row, etc.).

-justin


On Mon, Mar 18, 2013 at 7:34 PM, Shalin Shekhar Mangar <
[email protected]> wrote:

> That does sound perplexing.
>
> Justin, can you tell us which field in the query is your record id? What is
> the record id's type in database and in solr schema? What is your unique
> key and its type in solr schema?
>
>
> On Tue, Mar 19, 2013 at 5:19 AM, Justin L. <[email protected]> wrote:
>
> > Every time I do an import, DataImportHandler is not importing 1 row from my
> > database.
> >
> > I have 3 entities, each defined with a single query. I have confirmed, by
> > looking at totals from solr as well as comparing a "*:*" query to direct db
> > queries, that exactly 1 row is missing every time. And it's the same row: the
> > first row of one of my entities when sorted by primary key. The other two
> > entities are fully imported without trouble.
> >
> > There are no errors in the log, even when DIH logging is turned up to FINE.
> > When I alter the query to retrieve only the mysterious record, it shows up
> > as "Fetched: 1 Skipped: 0 Processed: 1". But when I do a query for *:* it
> > returns 0 documents.
> >
> > Ready for a twist? The DIH query for this entity does not have an ORDER BY
> > clause. When I add one to sort by primary key DESC, it imports all of the
> > rows for that entity, including the mysterious record.
> >
> > Ready to have your mind blown? I am using the alternative method for doing
> > delta imports (see query below). When I set clean=false and update the
> > timestamp on the mysterious record, yup, it gets imported properly.
> >
> >
> >
> > Because I have the ORDER BY DESC hack, I can get by and live to fight
> > another day. But I thought someone might like to know this because I think
> > I am hitting a bug in DIH: specifically, something after the querying but
> > before the posting to solr. If someone familiar with DIH innards wants to
> > suggest where I should look or how to step through it, I'd be willing to
> > take a look.
> >
> > xoxo,
> > Justin
> >
> >
> > * Fun facts:
> > Solr 4.0
> > Oracle 11g
> > The mysterious record's id is "000001"
> > I use field elements to rename the columns rather than in-the-sql aliases
> > because of a problem I had with them earlier. But I will try changing that.
> >
> >
> > * Alternative delta import method:
> >
> > http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport
> >
> >
> > * DIH query that should import mysterious record:
> >
> > select organization_name, organization_id, address
> > from organization o
> > join rolodex r on r.rolodex_id = o.contact_address_id
> > and r.sponsor_address_flag = 'N'
> > and r.actv_ind = 'Y'
> > where '${dataimporter.request.clean}' = 'true'
> > or to_char(o.update_timestamp,'YYYY-MM-DD HH24:MI:SS') >
> > '${dataimporter.organization.last_index_time
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
