On Sun, Nov 8, 2009 at 8:25 AM, Bertie Shen <bertie.s...@gmail.com> wrote:
> I have figured out a way to solve this problem: just specify a
> single <document> blah blah blah </document>. Under <document>, specify
> multiple top level entity entries, each of which corresponds to one table
> data.
>
> So each top level entry will map one row in it to a document in Lucene
> index. <document> in DIH is *NOT* mapped to a document in Lucene index while
> top-level entity is. I feel <document> tag is redundant and misleading in
> data config and thus should be removed.
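
To make the layout described above concrete, here is a rough sketch of Rupert's two tables placed under a single <document>, each as its own top-level <entity>. The queries and field mappings are copied from his original message quoted further down, and the elided JDBC URL is left as he wrote it. The name attribute on <document> is dropped only to keep the sketch short. Treat it as an illustration rather than a tested config.

```xml
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://...." />

  <!-- A single <document> is the container for every table to import. -->
  <document>

    <!-- Each top-level entity produces one index document per row. -->
    <entity name="blog_entries"
            query="select id,title,keywords,summary,data,title as name_fc,'BlogEntry' as type from blog_entries">
      <field column="id" name="pk_i" />
      <field column="id" name="id" />
      <field column="title" name="text_t" />
      <field column="data" name="text_t" />
    </entity>

    <entity name="mesh_categories"
            query="select id,name,node_key,name as name_fc,'MeshCategory' as type from mesh_categories">
      <field column="id" name="pk_i" />
      <field column="id" name="id" />
      <field column="name" name="text_t" />
      <field column="node_key" name="string" />
      <field column="name_fc" name="facet_value" />
      <field column="type" name="type_t" />
    </entity>

  </document>
</dataConfig>
```

This matches what Rupert reported: once both entity sets sit inside one <document>, a /dataimport?command=full-import indexes both tables.
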
There are some common attributes specified at the <document> level. It still acts as a container tag.
>
> Cheers.
>
> On Sat, Nov 7, 2009 at 9:43 AM, Bertie Shen <bertie.s...@gmail.com> wrote:
>
>> I have the same problem. I had thought we could specify multiple <document>
>> blah blah blah</document>s, each of which is mapping one table in the RDBMS.
>> But I found it was not the case. It only picks the first <document>blah blah
>> blah</document> to do indexing.
>>
>> I think Rupert's and my request are pretty common. Basically there are
>> multiple tables in RDBMS, and we want each row in each table become a
>> document in Lucene index. How can we write one data config.xml file to let
>> DataImportHandler import multiple tables at the same time?
>>
>> Rupert, have you figured out a way to do it?
>>
>> Thanks.
>>
>>
>>
>> On Tue, Sep 8, 2009 at 3:42 PM, Rupert Fiasco <rufia...@gmail.com> wrote:
>>
>>> Maybe I should be more clear: I have multiple tables in my DB that I
>>> need to save to my Solr index. In my app code I have logic to persist
>>> each table, which maps to an application model to Solr. This is fine.
>>> I am just trying to speed up indexing time by using DIH instead of
>>> going through my application. From what I understand of DIH I can
>>> specify one dataSource element and then a series of document/entity
>>> sets, for each of my models. But like I said before, DIH only appears
>>> to want to index the first document declared under the dataSource tag.
>>>
>>> -Rupert
>>>
>>> On Tue, Sep 8, 2009 at 4:05 PM, Rupert Fiasco <rufia...@gmail.com> wrote:
>>> > I am using the DataImportHandler with a JDBC datasource. From my
>>> > understanding of DIH, for each of my "content types" e.g. Blog posts,
>>> > Mesh Categories, etc I would construct a series of document/entity
>>> > sets, like
>>> >
>>> > <dataConfig>
>>> >   <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://...."
>>> >   />
>>> >
>>> >   <!-- BLOG ENTRIES -->
>>> >   <document name="blog_entries">
>>> >     <entity name="blog_entries" query="select
>>> > id,title,keywords,summary,data,title as name_fc,'BlogEntry' as type
>>> > from blog_entries">
>>> >       <field column="id" name="pk_i" />
>>> >       <field column="id" name="id" />
>>> >       <field column="title" name="text_t" />
>>> >       <field column="data" name="text_t" />
>>> >     </entity>
>>> >   </document>
>>> >
>>> >   <!-- MESH CATEGORIES -->
>>> >   <document name="mesh_category">
>>> >     <entity name="mesh_categories" query="select
>>> > id,name,node_key,name as name_fc,'MeshCategory' as type from
>>> > mesh_categories">
>>> >       <field column="id" name="pk_i" />
>>> >       <field column="id" name="id" />
>>> >       <field column="name" name="text_t" />
>>> >       <field column="node_key" name="string" />
>>> >       <field column="name_fc" name="facet_value" />
>>> >       <field column="type" name="type_t" />
>>> >     </entity>
>>> >   </document>
>>> > </datasource>
>>> > </dataConfig>
>>> >
>>> >
>>> > Solr parses this just fine and allows me to issue a
>>> > /dataimport?command=full-import and it runs, but it only runs against
>>> > the "first" document (blog_entries). It doesnt run against the 2nd
>>> > document (mesh_categories).
>>> >
>>> > If I remove the 2 document elements and wrap both entity sets in just
>>> > one document tag, then both sets get indexed, which seemingly achieves
>>> > my goal. This just doesnt make sense from my understanding of how DIH
>>> > works. My 2 content types are indeed separate so they logically
>>> > represent two document types, not one.
>>> >
>>> > Is this correct? What am I missing here?
>>> >
>>> > Thanks
>>> > -Rupert
>>> >
>>>
>>
>>
>

--
-----------------------------------------------------
Noble Paul | Principal Engineer | AOL | http://aol.com