Hi,
You should change your data-config so that the fields produced by
FileListEntityProcessor are declared on its own entity, one level up.
Try this configuration:

<dataConfig>
    <dataSource name="bin" type="BinFileDataSource" />
    <document>
        <entity name="f" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            transformer="TemplateTransformer"
            baseDir="/home/luca/Documents"
            fileName=".*\.(DOC|PDF|pdf|doc|docx|ppt)" onError="skip"
            recursive="true">

            <field column="fileAbsolutePath" name="path" />
            <field column="fileSize" name="size" />
            <field column="fileLastModified" name="lastmodified" />

            <entity name="tika-test" dataSource="bin"
                processor="TikaEntityProcessor"
                url="${f.fileAbsolutePath}" format="text" onError="skip">
                <field column="Author" name="author" meta="true"/>
                <field column="title" name="title" meta="true"/>

                <!--<field column="text" />-->

            </entity>
        </entity>
    </document>
</dataConfig>
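
If you would rather keep the mappings inside the Tika entity, as you asked,
one alternative is sketched below. It puts TemplateTransformer on the child
entity and copies the parent's variables into the child rows. This is an
untested sketch for 3.5, not a confirmed recipe: the baseDir is simply the
one from my example, and lastmodified is left out on purpose because a
templated value may be rendered as a plain string that does not parse as a
Solr date.

```xml
<dataConfig>
    <dataSource name="bin" type="BinFileDataSource" />
    <document>
        <!-- Parent entity: lists files and exposes f.fileAbsolutePath, f.fileSize, ... -->
        <entity name="f" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="/home/luca/Documents"
            fileName=".*\.(DOC|PDF|pdf|doc|docx|ppt)" onError="skip"
            recursive="true">

            <!-- Child entity: TemplateTransformer fills columns from parent variables -->
            <entity name="tika-test" dataSource="bin"
                processor="TikaEntityProcessor"
                transformer="TemplateTransformer"
                url="${f.fileAbsolutePath}" format="text" onError="skip">
                <field column="Author" name="author" meta="true"/>
                <field column="title" name="title" meta="true"/>
                <!-- Assumption: column names match the schema fields "path" and "size" -->
                <field column="path" template="${f.fileAbsolutePath}" />
                <field column="size" template="${f.fileSize}" />
            </entity>
        </entity>
    </document>
</dataConfig>
```

Declaring the fields on the FileListEntityProcessor entity, as in the
configuration above, is still the simpler option.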


On Wed, Mar 28, 2012 at 3:50 AM, ZHANG Liang F <
liang.f.zh...@alcatel-sbell.com.cn> wrote:

> Could you please show me how to get those values inside
> TikaEntityProcessor?
>
> -----Original Message-----
> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> Sent: March 27, 2012 22:43
> To: solr-user@lucene.apache.org
> Subject: Re: how to store file path in Solr when using TikaEntityProcessor
>
>
> > I am using DIH to index the local file system, but the file path, size and
> > lastmodified fields were not stored. In schema.xml I defined:
> >
> >  <fields>
> >    <field name="title" type="string" indexed="true" stored="true"/>
> >    <field name="author" type="string" indexed="true" stored="true" />
> >    <!--<field name="text" type="text" indexed="true" stored="true" />
> >     liang added-->
> >    <field name="path" type="string" indexed="true" stored="true" />
> >    <field name="size" type="long" indexed="true" stored="true" />
> >    <field name="lastmodified" type="date" indexed="true" stored="true" />
> >  </fields>
> >
> >
> > And also defined tika-data-config.xml:
> >
> > <dataConfig>
> >     <dataSource name="bin" type="BinFileDataSource" />
> >     <document>
> >         <entity name="f" dataSource="null" rootEntity="false"
> >             processor="FileListEntityProcessor"
> >             baseDir="E:/my_project/ecmkit/infotouch"
> >             fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)"
> >             onError="skip"
> >             recursive="true">
> >             <entity name="tika-test" dataSource="bin"
> >                 processor="TikaEntityProcessor"
> >                 url="${f.fileAbsolutePath}" format="text"
> >                 onError="skip">
> >                 <field column="Author" name="author" meta="true"/>
> >                 <field column="title" name="title" meta="true"/>
> >                 <!-- <field column="text" name="text"/> -->
> >                 <field column="fileAbsolutePath" name="path" />
> >                 <field column="fileSize" name="size" />
> >                 <field column="fileLastModified" name="lastmodified" />
> >             </entity>
> >         </entity>
> >     </document>
> > </dataConfig>
> >
> >
> > The Solr version is 3.5. Any idea?
>
> The implicit fields fileDir, file, fileAbsolutePath, fileSize, and
> fileLastModified are generated by the FileListEntityProcessor. They should
> be declared on that entity, above the nested TikaEntityProcessor entity.
>
