And here is a use case: Tika blows up on some files, but we still want
the other data (file name etc.) indexed, along with an empty text field. So:

<entity rootEntity="false">
    <field set unique id and file name etc./>
    <entity blah blah is a document
            use "Tika Empty Parser">
        <field failed="true"/>
    </entity>
    <entity blah blah is a document
            use "Tika Auto Parser"
            onError="skip">
        <field failed="false"/>
    </entity>
</entity>

Both documents have the same unique id. If the Tika auto parser picks
the PDF parser and parsing succeeds, the second document overwrites the
first. If the PDF parser blows up, onError="skip" drops the second
document, so the first (empty-text) document is the one that goes in.

It's ugly, yes, but it's a testament to the maturity of DIH that it has
enough tools to work around a Tika weakness. Oh, and the AutoDetectParser
does not work; see SOLR-2116:
https://issues.apache.org/jira/browse/SOLR-2116
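A more concrete version of the workaround might look like the config
below. It is an untested sketch, not a drop-in file: the base directory,
the file pattern, the schema field names (id, filename, text, failed) and
the choice of Tika's PDFParser (named explicitly instead of relying on
auto-detection, because of SOLR-2116) are all assumptions. The outer
FileListEntityProcessor entity has rootEntity="false", so each Tika child
entity yields its own document, and both share the id set by the outer
entity.

```xml
<dataConfig>
  <!-- Streams the raw bytes of each file to the Tika entities. -->
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- Walks the file system. baseDir and fileName are placeholders.
         rootEntity="false": each child entity row becomes a document. -->
    <entity name="f" processor="FileListEntityProcessor" dataSource="null"
            baseDir="/path/to/files" fileName=".*\.pdf" rootEntity="false">
      <!-- Shared fields: both child documents get the same unique id. -->
      <field column="fileAbsolutePath" name="id"/>
      <field column="fileName" name="filename"/>

      <!-- Document 1: EmptyParser does not fail and yields empty text. -->
      <entity name="empty" processor="TikaEntityProcessor" dataSource="bin"
              url="${f.fileAbsolutePath}" format="text"
              parser="org.apache.tika.parser.EmptyParser"
              transformer="TemplateTransformer">
        <field column="text" name="text"/>
        <field column="failed" template="true"/>
      </entity>

      <!-- Document 2: a real parser (assumed: PDFParser). On success it
           overwrites document 1 (same id); if it throws, onError="skip"
           drops only this document and document 1 stays in the index. -->
      <entity name="parsed" processor="TikaEntityProcessor" dataSource="bin"
              url="${f.fileAbsolutePath}" format="text"
              parser="org.apache.tika.parser.pdf.PDFParser"
              transformer="TemplateTransformer" onError="skip">
        <field column="text" name="text"/>
        <field column="failed" template="false"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The ordering matters: the empty-parser entity has to come first so that
a successful real parse replaces it rather than the other way around.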

In my previous example, the innermost entities below should be <field>
not <entity>. Sorry for any confusion.
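With that correction applied, the earlier multiple-documents example might
be filled in roughly as follows. This is a hedged sketch: the JDBC driver,
URL, credentials, the tables (master, t1, t2), the item_id column and the
schema fields id and doc_type are all hypothetical placeholders. Since each
child entity produces its own documents, TemplateTransformer is used here to
prefix the unique id per table so the two kinds of documents do not
overwrite each other.

```xml
<dataConfig>
  <!-- Placeholder JDBC source: driver, url, user and password are not real values. -->
  <dataSource name="ds1" type="JdbcDataSource" driver="com.example.jdbc.Driver"
              url="jdbc:example://dbhost/mydb" user="solr" password="secret"/>
  <document>
    <!-- rootEntity="false": rows of "outer" are not indexed as documents;
         every row of each child entity becomes its own document instead. -->
    <entity name="outer" dataSource="ds1" rootEntity="false"
            query="SELECT item_id FROM master">
      <!-- First kind of document, from hypothetical table t1. -->
      <entity name="t1" dataSource="ds1" transformer="TemplateTransformer"
              query="SELECT * FROM t1 WHERE item_id = '${outer.item_id}'">
        <!-- The innermost elements are fields, not entities.
             The "t1-" prefix keeps this unique id distinct from t2's. -->
        <field column="id" template="t1-${t1.item_id}"/>
        <field column="doc_type" template="t1"/>
      </entity>
      <!-- Second kind of document, from hypothetical table t2. -->
      <entity name="t2" dataSource="ds1" transformer="TemplateTransformer"
              query="SELECT * FROM t2 WHERE item_id = '${outer.item_id}'">
        <field column="id" template="t2-${t2.item_id}"/>
        <field column="doc_type" template="t2"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

A constant field like doc_type is one way to tell the two kinds of
documents apart at query time (e.g. a filter query on doc_type:t1).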

On Sat, Dec 18, 2010 at 4:22 PM, Lance Norskog <goks...@gmail.com> wrote:
> You can have multiple documents generated by the same data-config:
>
> <dataConfig>
>  <dataSource name="ds1" .../>
>  <dataSource name="ds2" .../>
>  <dataSource name="ds3" .../>
>  <document>
>   <entity blah blah rootEntity="false">
>       <entity blah blah this is a document>
>          <entity sets unique id/>
>       </entity>
>       <entity blah blah this is another document>
>          <entity sets unique id/>
>       </entity>
>   </entity>
>  </document>
> </dataConfig>
>
> It's the rootEntity="false" on the outer entity that makes each child entity a document.
>
> On Sat, Dec 18, 2010 at 7:43 AM, Dennis Gearon <gear...@sbcglobal.net> wrote:
>> Just curious, do these tables have the same schema, like a set of shards 
>> would?
>>
>> If not, how do you map them to the index?
>>
>>  Dennis Gearon
>>
>>
>> ----- Original Message ----
>> From: Koji Sekiguchi <k...@r.email.ne.jp>
>> To: solr-user@lucene.apache.org
>> Sent: Sat, December 18, 2010 5:19:08 AM
>> Subject: Re: Is there a way to create multiple <doc> using DIH and access the
>> data pertaining to a particular <doc name> ?
>>
>> (10/11/11 1:57), bbarani wrote:
>>>
>>> Hi,
>>>
>>> I have a peculiar situation where we are trying to use SOLR for indexing
>>> multiple tables (there is no relation between these tables). We are trying
>>> to use the SOLR index instead of the source tables, and hence we are
>>> trying to build the SOLR index to mirror the source tables.
>>>
>>> There are 3 tables which need to be indexed.
>>>
>>> Table 1, table 2 and table 3.
>>>
>>> I am trying to index each table in a separate doc tag with a different doc
>>> tag name, and the tables have some field names in common. For Ex:
>>>
>>> <document name="DataStoreElement">
>>>     <entity name="DataStoreElement" query="">
>>>     <field column="DATA_STOR" name="DATA_STO"/>
>>>     </entity>
>>> </document>
>>> <document name="DataStore">
>>>     <entity name="DataStore" query="">
>>>     <field column="DATA_STOR" name="DATA_STO"/>
>>>     </entity>
>>> </document>
>>
>> Barani,
>>
>> You cannot have multiple documents in a data-config, but you can
>> have multiple entities in a document. And if your tables 1, 2, and 3
>> come from different dataSources, you can have multiple data sources
>> in a data-config. If so, use the dataSource attribute of each entity
>> element to refer to the name of its dataSource:
>>
>> <dataConfig>
>>  <dataSource name="ds1" .../>
>>  <dataSource name="ds2" .../>
>>  <dataSource name="ds3" .../>
>>  <document>
>>    <entity name="t1" dataSource="ds1" query="SELECT * from t1 ..." .../>
>>    <entity name="t2" dataSource="ds2" query="SELECT * from t2 ..." .../>
>>    <entity name="t3" dataSource="ds3" query="SELECT * from t3 ..." .../>
>>  </document>
>> </dataConfig>
>>
>> Koji
>> -- http://www.rondhuit.com/en/
>>
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com
