I'm able to index PDF, DOC, PPT, and similar files using the Data Import
Handler in Solr 4.3.0.

My data-config.xml looks like this:

<dataConfig>
    <dataSource name="bin" type="BinFileDataSource" />
    <document>
        <entity name="f" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="C:\Users\aroraarc\Desktop\Impdo"
            fileName=".*\.(DOC|PDF|pdf|doc|docx|ppt|pptx|xls|xlsx|txt)"
            onError="skip"
            recursive="true">

            <field column="fileAbsolutePath" name="path" />
            <field column="fileSize" name="size" />
            <field column="fileLastModified" name="lastmodified" />
            <field column="file" name="fileName"/>

            <entity name="tika-test" dataSource="bin"
                processor="TikaEntityProcessor"
                url="${f.fileAbsolutePath}" format="text" onError="skip">
                <field column="Author" name="author" meta="true"/>
                <field column="title" name="title" meta="true"/>
                <field column="text" name="content"/>
            </entity>
        </entity>
    </document>
</dataConfig>

However, in the fileName field I want to store the bare file name without
the extension. For example, instead of 'HelloWorld.txt' I want only
'HelloWorld' to be indexed in the fileName field. How do I achieve this?
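
One direction I have looked at but not verified on 4.3.0 is DIH's
RegexTransformer, which as far as I understand can rewrite a column through
its regex, replaceWith, and sourceColName attributes. A minimal sketch of the
outer entity, assuming the transformer is enabled there; the regex itself is
my own guess at what is needed:

```xml
<!-- Sketch only: baseDir, fileName, onError and recursive stay as in the
     config above and are omitted here for brevity. -->
<entity name="f" dataSource="null" rootEntity="false"
    processor="FileListEntityProcessor"
    transformer="RegexTransformer">

    <!-- Assumption: read the "file" column, capture everything before the
         last dot, and store that capture in the fileName field.
         Names without a dot should pass through unchanged. -->
    <field column="fileName" sourceColName="file"
        regex="^(.+)\.[^.]+$" replaceWith="$1"/>

    <!-- nested tika-test entity unchanged -->
</entity>
```

If that is roughly right, 'HelloWorld.txt' should come out as 'HelloWorld',
and 'archive.tar.gz' as 'archive.tar', since only the last extension is
dropped.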

Thanks in advance!




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Extract-file-name-without-extension-while-indexing-using-Data-Import-Handler-in-Solr-tp4074991.html
Sent from the Solr - User mailing list archive at Nabble.com.
