You can use a configuration like this in the Data Import Handler XML file to index different types of files.
<dataConfig>
  <dataSource type="BinFileDataSource" />
  <document>
    <!-- Outer entity: walks baseDir and emits one row per matching file.
         rootEntity="false" so the nested Tika entity produces the Solr documents. -->
    <entity name="files" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="(enter the file repository path)"
            fileName=".*\.(doc|pdf|docx|txt|ppt|xls|xlsx|sql|vsd|zip)$"
            onError="skip" recursive="true">
      <field column="fileAbsolutePath" name="id" />
      <field column="fileSize" name="size" />
      <field column="fileLastModified" name="lastModified" />
      <!-- Inner entity: extracts text and metadata from each file with Tika. -->
      <entity name="tika-documentimport" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" format="text">
        <field column="File" name="fileName"/>
        <field column="Author" name="author" meta="true"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Note that fileName is a regular expression. The extensions are grouped in a single alternation after an escaped dot and anchored at the end of the name, so that only files ending in one of the listed extensions match.

Hope this helps.

--
View this message in context: http://lucene.472066.n3.nabble.com/Apache-Solr-tp4114996p4115102.html
Sent from the Solr - User mailing list archive at Nabble.com.
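
P.S. For the configuration above to be used, the Data Import Handler also has to be registered in solrconfig.xml. A minimal sketch, assuming the file is saved as data-config.xml in the core's conf directory and the handler is exposed at /dataimport (both names are assumptions for illustration, not taken from this thread):

```xml
<!-- solrconfig.xml: register the Data Import Handler.
     The handler path and the config file name are assumed; adjust to your setup. -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <!-- Points at the DIH configuration file shown in this message (assumed name). -->
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
```

With a handler registered this way, a full import is normally started by requesting the handler with command=full-import. The DIH and Tika extraction jars also need to be on the core's classpath; where they live depends on your installation. The target fields (id, size, lastModified, fileName, author) must exist in the schema as well.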