Hi All,

I am a Java/J2EE programmer and very new to Solr. I would like to index a
table in a PostgreSQL database into Solr, then search the records from a
GUI (a JSP page) and show the results in tabular form. Could anyone help
me out with some simple sample code?

Thank you.

Regards,
Ashique

On Fri, May 13, 2011 at 4:53 AM, Weiss, Eric <wei...@llnl.gov> wrote:

> Apologies in advance if this topic/question has been previously answered…I
> have scoured the docs, mail archives, web looking for an answer(s) with no
> luck.  I am sure I am just being dense or missing something obvious…please
> point out my stupidity as my head hurts trying to get this working.
>
> Solr 3.1
> Java 1.6
> Eclipse/Tomcat 7/Maven 2.x
>
> Goal: to extract manufacturer names from a repeating list of keywords each
> denoted by a Category, one of which is "Manufacturer", and load them into a
> MsgKeywordMF field  (see xml below)
>
> I have xml files I am loading via DIH.  This is an abbreviated example xml
> data (each file has repeating "Report" items, each report has repeating
> MsgSet, Msg, MsgList, etc items).  Notice the nested repeating groups,
> namely MsgItems, within each document (Report):
>
>
> <Report>
>  <ReportMeta>
>    <ReportDate>02/22/2011</ReportDate>
>    …
>  </ReportMeta>
>  <MsgSet>
>    <Msg>
>      <SourceDocID>http://someurl.com/path/to/doc</SourceDocID>
>      …
>      <DocumentText>........blah blah</DocumentText>
>      <MsgList>
>        <MsgItem>
>          <MsgType>SomeType</MsgType>
>          <Category>Location</Category>
>          <Keyword>USA</Keyword>
>        </MsgItem>
>        <MsgItem>
>          <MsgType>AnotherType</MsgType>
>          <Category>Manufacturer</Category>
>          <Keyword>Apple</Keyword>
>        </MsgItem>
>        …
>      </MsgList>
>    </Msg>
>  </MsgSet>
> </Report>
> <Report>
> …
> </Report>
> <Report>
> …
> </Report>
> …
>
> Here is my data-config.xml:
>
>
> <dataConfig>
>  <dataSource type="FileDataSource" encoding="UTF-8" />
>  <document>
>    <entity name="fileload" rootEntity="false"
>            processor="FileListEntityProcessor" fileName="^.*\.xml$"
>            recursive="false" baseDir="/files/xml/">
>      <entity name="report"
>              rootEntity="true" pk="id"
>              url="${fileload.fileAbsolutePath}"
>              processor="XPathEntityProcessor"
>              forEach="/Report/MsgSet/Msg" onError="skip"
>              transformer="DateFormatTransformer,RegexTransformer">
>          <field column="DocumentText"
>                 xpath="/Report/MsgSet/Msg/DocumentText"/>
>          <field column="id" xpath="/Report/MsgSet/Msg/SourceDocID"/>
>          <field column="MsgCategory"
>                 xpath="/Report/MsgSet/Msg/MsgList/MsgItem/Category" />
>          <field column="MsgKeyword"
>                 xpath="/Report/MsgSet/Msg/MsgList/MsgItem/Keyword" />
>          <field column="MsgKeywordMF"
>                 xpath="/Report/MsgSet/Msg/MsgList/MsgItem[Category='Manufacturer']/Keyword" />
>          …
>      </entity>
>    </entity>
>  </document>
> </dataConfig>
>
>
> As seen in my config and sample data above, I am extracting the repeating
> "Keywords" into the MsgKeyword field.  Also, and the part that does NOT
> work, I am trying to extract into a separate field just the keywords that
> have a "Category" of "Manufacturer" -->   <field column="MsgKeywordMF"
> xpath="/Report/MsgSet/Msg/MsgList/MsgItem[Category='Manufacturer']/Keyword"
> />
>
> I have also tried: <field column="MsgKeywordMF"
> xpath="/Report/MsgSet/Msg/MsgList/MsgItem[@Category='Manufacturer']/Keyword"
> />
> …after changing the "Category" to an attribute of MsgItem (<MsgItem
> Category="Location">) but it too fails to match.
>
> I have tested my xpath notation against my xml data file using various
> xpath evaluator tools, like within Eclipse, and it matches perfectly…but I
> can't get it to match/work during import.
>
> As far as I understand it, DIH does not support nested/correlated
> entities, at least not with XML data sources using nested entity tags.  I've
> tried without success to nest entities but I can't "correlate" the nested
> entity with the parent.  I think the way I'm trying should work, but no luck
> so far….
>
> BTW, I can't easily change the xml format, although it is possible with
> some pain…
>
> Any ideas?
>
> TIA,
> -- Eric
>
>
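
For reference, the standalone XPath check described in the quoted message (the
expression matching outside of DIH) can be reproduced with nothing but the
JDK's javax.xml.xpath API. The sketch below is only an illustration: the class
name XPathCheck, the method name manufacturerKeywords, and the inlined,
shortened copy of the sample data are assumptions made here, not part of the
original setup. It shows that a full XPath 1.0 engine accepts the
[Category='Manufacturer'] predicate; it says nothing about what DIH's
XPathEntityProcessor accepts, which the Solr documentation describes as a
streaming parser supporting only a subset of XPath.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathCheck {
    // Shortened copy of the sample data from the quoted message (hypothetical inline fixture).
    static final String SAMPLE =
        "<Report><MsgSet><Msg>"
      + "<SourceDocID>http://someurl.com/path/to/doc</SourceDocID>"
      + "<MsgList>"
      + "<MsgItem><MsgType>SomeType</MsgType>"
      + "<Category>Location</Category><Keyword>USA</Keyword></MsgItem>"
      + "<MsgItem><MsgType>AnotherType</MsgType>"
      + "<Category>Manufacturer</Category><Keyword>Apple</Keyword></MsgItem>"
      + "</MsgList>"
      + "</Msg></MsgSet></Report>";

    // The expression used for the MsgKeywordMF field in the quoted data-config.xml.
    static final String EXPR =
        "/Report/MsgSet/Msg/MsgList/MsgItem[Category='Manufacturer']/Keyword";

    // Evaluates EXPR against the given XML text and returns the text of each matching node.
    static List<String> manufacturerKeywords(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            EXPR, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> keywords = new ArrayList<String>();
        for (int i = 0; i < nodes.getLength(); i++) {
            keywords.add(nodes.item(i).getTextContent());
        }
        return keywords;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(manufacturerKeywords(SAMPLE)); // prints [Apple]
    }
}
```

If this prints the expected keyword while the import still leaves MsgKeywordMF
empty, the difference most likely lies in how DIH parses the expression rather
than in the expression or the data.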
