Hi all,

I am a Java/J2EE programmer and very new to Solr. I would like to index a table in a PostgreSQL database into Solr, then search the records from a GUI (a JSP page) and show the results in tabular form. Could anyone help me out with some simple sample code?
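For the search side of this question, a JSP or servlet typically sends an HTTP GET to Solr's `/select` handler and renders the response. The sketch below only builds that request URL with the JDK's `URLEncoder`. The base URL `http://localhost:8983/solr` (the Solr example server's default address), the `name` field in the sample query, and the class and method names are assumptions. Fetching and parsing the response is left out; `wt=xml` is requested so the response could be parsed with the JDK's own XML APIs. Indexing the PostgreSQL table itself is usually done with DIH's `JdbcDataSource`, configured in data-config.xml.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SolrQueryUrl {

    // Assumption: Solr's example server address; change to match your install.
    static final String SOLR_BASE = "http://localhost:8983/solr";

    /** Hypothetical helper: builds a /select URL for a user-entered query. */
    static String selectUrl(String userQuery, int rows) {
        try {
            // Encode the query so characters such as ':' and spaces survive in the URL.
            String q = URLEncoder.encode(userQuery, "UTF-8");
            return SOLR_BASE + "/select?q=" + q + "&rows=" + rows + "&wt=xml";
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "name" is a hypothetical field in your Solr schema.
        System.out.println(selectUrl("name:acme", 10));
        // prints http://localhost:8983/solr/select?q=name%3Aacme&rows=10&wt=xml
    }
}
```

`rows` caps the number of results per page. Solr's `start` parameter can be appended in the same way if the JSP table needs paging.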
Thank you.

Regards,
Ashique

On Fri, May 13, 2011 at 4:53 AM, Weiss, Eric <wei...@llnl.gov> wrote:
> Apologies in advance if this topic/question has been previously answered… I
> have scoured the docs, mail archives, and the web looking for an answer(s), with
> no luck. I am sure I am just being dense or missing something obvious… please
> point out my stupidity, as my head hurts trying to get this working.
>
> Solr 3.1
> Java 1.6
> Eclipse/Tomcat 7/Maven 2.x
>
> Goal: to extract manufacturer names from a repeating list of keywords, each
> denoted by a Category, one of which is "Manufacturer", and load them into a
> MsgKeywordMF field (see XML below).
>
> I have XML files I am loading via DIH. This is an abbreviated example of the
> XML data (each file has repeating "Report" items; each report has repeating
> MsgSet, Msg, MsgList, etc. items). Notice the nested repeating groups,
> namely MsgItems, within each document (Report):
>
> <Report>
>   <ReportMeta>
>     <ReportDate>02/22/2011</ReportDate>
>     …
>   </ReportMeta>
>   <MsgSet>
>     <Msg>
>       <SourceDocID>http://someurl.com/path/to/doc</SourceDocID>
>       …
>       <DocumentText>........blah blah</DocumentText>
>       <MsgList>
>         <MsgItem>
>           <MsgType>SomeType</MsgType>
>           <Category>Location</Category>
>           <Keyword>USA</Keyword>
>         </MsgItem>
>         <MsgItem>
>           <MsgType>AnotherType</MsgType>
>           <Category>Manufacturer</Category>
>           <Keyword>Apple</Keyword>
>         </MsgItem>
>         …
>       </MsgList>
>     </Msg>
>   </MsgSet>
> </Report>
> <Report>
>   …
> </Report>
> <Report>
>   …
> </Report>
> …
>
> Here is my data-config.xml:
>
> <dataConfig>
>   <dataSource type="FileDataSource" encoding="UTF-8" />
>   <document>
>     <entity name="fileload" rootEntity="false"
>             processor="FileListEntityProcessor" fileName="^.*\.xml$"
>             recursive="false" baseDir="/files/xml/">
>       <entity name="report"
>               rootEntity="true" pk="id"
>               url="${fileload.fileAbsolutePath}"
>               processor="XPathEntityProcessor"
>               forEach="/Report/MsgSet/Msg" onError="skip"
>               transformer="DateFormatTransformer,RegexTransformer">
>         <field column="DocumentText"
>                xpath="/Report/MsgSet/Msg/DocumentText"/>
>         <field column="id" xpath="/Report/MsgSet/Msg/SourceDocID"/>
>         <field column="MsgCategory"
>                xpath="/Report/MsgSet/Msg/MsgList/MsgItem/Category" />
>         <field column="MsgKeyword"
>                xpath="/Report/MsgSet/Msg/MsgList/MsgItem/Keyword" />
>         <field column="MsgKeywordMF"
>                xpath="/Report/MsgSet/Msg/MsgList/MsgItem[Category='Manufacturer']/Keyword" />
>         …
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> As seen in my config and sample data above, I am extracting the repeating
> "Keywords" into the MsgKeyword field. Also, and this is the part that does
> NOT work, I am trying to extract into a separate field just the keywords
> that have a "Category" of "Manufacturer":
>
> <field column="MsgKeywordMF"
>        xpath="/Report/MsgSet/Msg/MsgList/MsgItem[Category='Manufacturer']/Keyword" />
>
> I have also tried:
>
> <field column="MsgKeywordMF"
>        xpath="/Report/MsgSet/Msg/MsgList/MsgItem[@Category='Manufacturer']/Keyword" />
>
> …after changing "Category" to an attribute of MsgItem (<MsgItem
> Category="Location">), but it too fails to match.
>
> I have tested my XPath notation against my XML data file using various
> XPath evaluator tools, such as the one in Eclipse, and it matches
> perfectly… but I can't get it to match during import.
>
> As far as I understand it, DIH does not support nested/correlated
> entities, at least not with XML data sources using nested entity tags. I've
> tried without success to nest entities, but I can't "correlate" the nested
> entity with the parent. I think the way I'm trying should work, but no luck
> so far….
>
> BTW, I can't easily change the XML format, although it is possible with
> some pain…
>
> Any ideas?
>
> TIA,
> -- Eric
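DIH's XPathEntityProcessor implements only a subset of XPath. As far as I know, a predicate on a child element's text, such as `[Category='Manufacturer']`, is not part of that subset, even though full XPath evaluators accept it. One workaround is to keep the two multivalued columns that already import fine (MsgCategory and MsgKeyword) and pair them up after extraction. This could be done, for example, in a custom Java transformer (a class extending DIH's `Transformer` and overriding `transformRow`) listed in the entity's `transformer` attribute. The sketch below shows only the pairing logic in plain Java. The class and method names are hypothetical, and the sketch assumes both columns arrive as lists in the same MsgItem order, with exactly one Category and one Keyword per MsgItem. A real transformer would also have to handle a column that holds a single String rather than a List.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ManufacturerKeywords {

    /**
     * Hypothetical helper: returns the keywords whose paired category equals
     * {@code wanted}. Assumes categories.get(i) and keywords.get(i) come from
     * the same MsgItem, i.e. both lists are in document order and every
     * MsgItem has exactly one Category and one Keyword.
     */
    static List<String> keywordsForCategory(List<String> categories,
                                            List<String> keywords,
                                            String wanted) {
        if (categories.size() != keywords.size()) {
            // Counts differ, so the i-th Category cannot be trusted to belong
            // to the i-th Keyword; fail loudly instead of indexing bad data.
            throw new IllegalArgumentException(
                    "Category/Keyword counts differ: " + categories.size()
                    + " vs " + keywords.size());
        }
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < categories.size(); i++) {
            if (wanted.equals(categories.get(i))) {
                result.add(keywords.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Values as they might appear in the MsgCategory and MsgKeyword
        // columns for the sample <Msg> in the quoted message.
        List<String> categories = Arrays.asList("Location", "Manufacturer");
        List<String> keywords = Arrays.asList("USA", "Apple");
        System.out.println(keywordsForCategory(categories, keywords, "Manufacturer"));
        // prints [Apple]
    }
}
```

The method throws on mismatched counts rather than guessing. If an XML file can contain a MsgItem without a Category or Keyword, positional pairing is unsafe. In that case, pre-processing the XML (for example, moving Category into an attribute) is likely the more reliable route.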