I have the same problem. I had thought we could specify multiple <document>...</document> elements, each mapping one table in the RDBMS, but that turned out not to be the case. DIH only picks up the first <document>...</document> for indexing.
I think Rupert's request and mine are pretty common. Basically, there are multiple tables in the RDBMS, and we want each row in each table to become a document in the Lucene index. How can we write one data-config.xml file that lets DataImportHandler import multiple tables at the same time?

Rupert, have you figured out a way to do it?

Thanks.

On Tue, Sep 8, 2009 at 3:42 PM, Rupert Fiasco <rufia...@gmail.com> wrote:
> Maybe I should be more clear: I have multiple tables in my DB that I
> need to save to my Solr index. In my app code I have logic to persist
> each table, which maps to an application model to Solr. This is fine.
> I am just trying to speed up indexing time by using DIH instead of
> going through my application. From what I understand of DIH I can
> specify one dataSource element and then a series of document/entity
> sets, for each of my models. But like I said before, DIH only appears
> to want to index the first document declared under the dataSource tag.
>
> -Rupert
>
> On Tue, Sep 8, 2009 at 4:05 PM, Rupert Fiasco<rufia...@gmail.com> wrote:
> > I am using the DataImportHandler with a JDBC datasource. From my
> > understanding of DIH, for each of my "content types" e.g. Blog posts,
> > Mesh Categories, etc I would construct a series of document/entity
> > sets, like
> >
> > <dataConfig>
> > <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://...."
> > />
> >
> > <!-- BLOG ENTRIES -->
> > <document name="blog_entries">
> >   <entity name="blog_entries" query="select
> >     id,title,keywords,summary,data,title as name_fc,'BlogEntry' as type
> >     from blog_entries">
> >     <field column="id" name="pk_i" />
> >     <field column="id" name="id" />
> >     <field column="title" name="text_t" />
> >     <field column="data" name="text_t" />
> >   </entity>
> > </document>
> >
> > <!-- MESH CATEGORIES -->
> > <document name="mesh_category">
> >   <entity name="mesh_categories" query="select
> >     id,name,node_key,name as name_fc,'MeshCategory' as type from
> >     mesh_categories">
> >     <field column="id" name="pk_i" />
> >     <field column="id" name="id" />
> >     <field column="name" name="text_t" />
> >     <field column="node_key" name="string" />
> >     <field column="name_fc" name="facet_value" />
> >     <field column="type" name="type_t" />
> >   </entity>
> > </document>
> > </datasource>
> > </dataConfig>
> >
> >
> > Solr parses this just fine and allows me to issue a
> > /dataimport?command=full-import and it runs, but it only runs against
> > the "first" document (blog_entries). It doesnt run against the 2nd
> > document (mesh_categories).
> >
> > If I remove the 2 document elements and wrap both entity sets in just
> > one document tag, then both sets get indexed, which seemingly achieves
> > my goal. This just doesnt make sense from my understanding of how DIH
> > works. My 2 content types are indeed separate so they logically
> > represent two document types, not one.
> >
> > Is this correct? What am I missing here?
> >
> > Thanks
> > -Rupert
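
For reference, here is a minimal sketch of the layout Rupert says worked for him: a single <document> element wrapping both entity sets, so that each root-level <entity> maps one table. The queries and field mappings are copied from his quoted config, and the elided JDBC URL is left as he wrote it. I am assuming that the <document> tag is only a container for the import and that each row returned by a root entity becomes its own Solr document, which matches what Rupert observed. It has not been tested against his schema.

```xml
<dataConfig>
  <!-- dataSource is self-closing and sits beside document; there is no closing datasource tag -->
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://...." />

  <!-- A single document element; each root-level entity below maps one table -->
  <document>

    <!-- BLOG ENTRIES: one Solr document per row of blog_entries -->
    <entity name="blog_entries"
            query="select id,title,keywords,summary,data,title as name_fc,'BlogEntry' as type from blog_entries">
      <field column="id" name="pk_i" />
      <field column="id" name="id" />
      <field column="title" name="text_t" />
      <field column="data" name="text_t" />
    </entity>

    <!-- MESH CATEGORIES: one Solr document per row of mesh_categories -->
    <entity name="mesh_categories"
            query="select id,name,node_key,name as name_fc,'MeshCategory' as type from mesh_categories">
      <field column="id" name="pk_i" />
      <field column="id" name="id" />
      <field column="name" name="text_t" />
      <field column="node_key" name="string" />
      <field column="name_fc" name="facet_value" />
      <field column="type" name="type_t" />
    </entity>

  </document>
</dataConfig>
```

With this layout, /dataimport?command=full-import should run both root entities, and adding an entity parameter (for example &entity=mesh_categories) should restrict the run to one of them. One thing to watch, assuming the schema's uniqueKey is the id field: rows from the two tables that share the same id value would overwrite each other in the index, so the id may need a per-table prefix in the SQL.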