Hi everyone,

 

I've configured the import of a document folder with FileListEntityProcessor,
and everything went smoothly on the first try, but I have a simple question.
I can map metadata without any problem, but I'd like to import all of a
document's metadata into my index, not only the fields I've configured with
field nodes. In this example I've imported Author and title, but I don't know
in advance which metadata a document might have, and I'd like all of them in
my index.

 

Here is my import config. This is my first attempt at importing with Tika, so
I'm probably missing something simple.

 

<dataConfig>
  <dataSource type="BinFileDataSource" />
  <document>
    <!-- Outer entity: walks the folder and emits one row per matching file -->
    <entity name="files" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="c:/temp/docs"
            fileName=".*\.(doc|docx|pdf)"
            onError="skip"
            recursive="true">
      <field column="file" name="id" />
      <field column="fileAbsolutePath" name="path" />
      <field column="fileSize" name="size" />
      <field column="fileLastModified" name="lastModified" />

      <!-- Inner entity: parses each listed file with Tika -->
      <entity name="documentImport"
              processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}"
              format="text">
        <field column="file" name="fileName"/>
        <!-- Fields marked meta="true" are read from the document metadata -->
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
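
For comparison, this is roughly the behaviour being asked for. Solr's
ExtractingRequestHandler (Solr Cell, a different code path from the
DataImportHandler used above) has an uprefix parameter that prefixes every
extracted field not defined in the schema, so a matching dynamic field can
catch all of them. The fragment below is only a sketch of that idea: the
attr_ prefix, the attr_* dynamic field and the text_general type are
assumptions for illustration, and it does not show that TikaEntityProcessor
has an equivalent option.

```xml
<!-- solrconfig.xml (sketch): Solr Cell handler, not the DIH config above -->
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Normalise metadata names to lowercase with underscores -->
    <str name="lowernames">true</str>
    <!-- Assumed prefix: extracted fields unknown to the schema get attr_ prepended -->
    <str name="uprefix">attr_</str>
  </lst>
</requestHandler>

<!-- schema.xml (sketch): assumed catch-all dynamic field for the prefixed metadata -->
<dynamicField name="attr_*" type="text_general"
              indexed="true" stored="true" multiValued="true"/>
```

With a setup like this, metadata that has no explicit field mapping would end
up in fields such as attr_author rather than being dropped.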

 

 

--

Gian Maria Ricci

Mobile: +39 320 0136949

 <http://mvp.microsoft.com/en-us/mvp/Gian%20Maria%20Ricci-4025635>
<http://www.linkedin.com/in/gianmariaricci>
<https://twitter.com/alkampfer>   <http://feeds.feedburner.com/AlkampferEng>
<skype://alkampferaok/> 

 

 
