You can use a configuration like this in the Data Import Handler XML file to
index different types of files:

<dataConfig>
  <!-- Binary data source so that Tika can read the raw file contents -->
  <dataSource type="BinFileDataSource" />
  <document>
    <!-- Outer entity: walks baseDir and emits one row per matching file -->
    <entity name="files" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="(enter the file repository path)"
            fileName=".*\.(doc|pdf|docx|txt|ppt|xls|xlsx|sql|vsd|zip)$"
            onError="skip" recursive="true">
      <field column="fileAbsolutePath" name="id" />
      <field column="fileSize" name="size" />
      <field column="fileLastModified" name="lastModified" />
      <!-- Inner entity: extracts text and metadata from each file with Tika -->
      <entity name="tika-documentimport" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" format="text">
        <field column="File" name="fileName" />
        <field column="Author" name="author" meta="true" />
      </entity>
    </entity>
  </document>
</dataConfig>
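
For Solr to pick up this configuration, the Data Import Handler also has to be registered in solrconfig.xml. A minimal sketch, assuming the configuration above is saved as data-config.xml in the core's conf directory (the file name and the /dataimport path are assumptions here, not something stated above), and that the DIH and Tika extraction jars are already on the classpath:

```xml
<!-- solrconfig.xml: register the Data Import Handler (sketch) -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <!-- Assumed file name; point this at wherever the DIH config is saved -->
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
```

With a handler registered like this, a full import can be started by requesting /dataimport?command=full-import on the core.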

Hope this helps.




