I am able to successfully index PDF, DOC, PPT, and similar files using the Data Import Handler in Solr 4.3.0.
My data-config.xml looks like this:

<dataConfig>
  <dataSource name="bin" type="BinFileDataSource" />
  <document>
    <!-- Extensions are grouped in a single alternation so that the ".*\." prefix applies to every one of them -->
    <entity name="f" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="C:\Users\aroraarc\Desktop\Impdo"
            fileName=".*\.(DOC|PDF|pdf|doc|docx|ppt|pptx|xls|xlsx|txt)"
            onError="skip" recursive="true">
      <field column="fileAbsolutePath" name="path" />
      <field column="fileSize" name="size" />
      <field column="fileLastModified" name="lastmodified" />
      <field column="file" name="fileName" />
      <entity name="tika-test" dataSource="bin"
              processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text" onError="skip">
        <field column="Author" name="author" meta="true" />
        <field column="title" name="title" meta="true" />
        <field column="text" name="content" />
      </entity>
    </entity>
  </document>
</dataConfig>

However, in the fileName field I want to store the bare file name without its extension. For example, instead of 'HelloWorld.txt' I want only 'HelloWorld' to be stored in the fileName field. How do I achieve this?

Thanks in advance!

--
View this message in context: http://lucene.472066.n3.nabble.com/Extract-file-name-without-extension-while-indexing-using-Data-Import-Handler-in-Solr-tp4074991.html
Sent from the Solr - User mailing list archive at Nabble.com.
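
One possible approach to the question above, as an untested sketch rather than a confirmed solution: DIH's RegexTransformer can populate a column from another column through its sourceColName, regex, and replaceWith attributes. The fragment below assumes the schema field is still named fileName and that this field definition replaces the existing file-to-fileName mapping, so the field does not receive two values. The regular expression is an illustrative assumption. It strips only the last extension, so 'archive.tar.gz' would become 'archive.tar', and a name with no dot is left unchanged.

```xml
<!-- Sketch only: add the transformer to the outer entity and derive fileName from the "file" column -->
<entity name="f" dataSource="null" rootEntity="false"
        processor="FileListEntityProcessor"
        transformer="RegexTransformer"
        baseDir="C:\Users\aroraarc\Desktop\Impdo"
        fileName=".*\.(DOC|PDF|pdf|doc|docx|ppt|pptx|xls|xlsx|txt)"
        onError="skip" recursive="true">
  <field column="fileAbsolutePath" name="path" />
  <field column="fileSize" name="size" />
  <field column="fileLastModified" name="lastmodified" />
  <!-- Assumed regex: capture everything before the final ".ext" and keep only that part -->
  <field column="fileName" sourceColName="file"
         regex="(.+)\.[^.]+$" replaceWith="$1" />
  <!-- the nested tika-test entity stays as it is in the original config -->
</entity>
```

With this in place, a file named 'HelloWorld.txt' should be indexed with fileName 'HelloWorld', while the path field still carries the full name.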