In the job, I set up the following connections:

  1.  Repository: WinShare
  2.  Transformation: Allowed Documents
  3.  Transformation: TikaExternal
  4.  Transformation: MetadataExtractor
  5.  Output: SolrShare

Then, in Allowed Documents, I entered the allowed MIME types and extensions.

In the field mapping I added the entries shown in the screenshot:
[image: field mapping screenshot]
and I unchecked “Keep all metadata”.

In the metadata expressions I checked “Keep all incoming metadata” and “Remove 
empty metadata values”.

Of course, my Solr schema has to contain the fields last_author and author, in 
addition to the fields that I specified in the Schema tab of the SolrShare 
output connection:
[image: SolrShare Schema tab screenshot]
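
As an illustration of the schema requirement above, here is a minimal Python sketch that builds Solr Schema API "add-field" commands for last_author and author. It only applies if the core uses a managed schema; with a hand-edited schema.xml the two fields would be declared there instead. The host, the core name "mycore", the field type "string", and the stored/indexed/multiValued settings are all assumptions, not values from my setup.

```python
import json


def add_field_command(name, field_type="string", stored=True,
                      indexed=True, multi_valued=False):
    """Build the JSON body of one Solr Schema API 'add-field' command.

    The field type and flags are assumptions; adjust them to your schema.
    """
    return {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": stored,
            "indexed": indexed,
            "multiValued": multi_valued,
        }
    }


# Hypothetical endpoint: host and core name are placeholders.
SCHEMA_URL = "http://localhost:8983/solr/mycore/schema"

# One command per field that the job is expected to send to Solr.
commands = [add_field_command(name) for name in ("last_author", "author")]

for command in commands:
    # Only print what would be POSTed; nothing is sent over the network.
    print("POST", SCHEMA_URL)
    print(json.dumps(command, indent=2))
```

Each printed body could then be sent with any HTTP client as a POST with Content-Type application/json.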


It works: in the Solr index I now find the added fields last_author and author 
(for documents where they are not empty).
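
One way to check this kind of result is to ask Solr only for documents that have a value in the new field. The sketch below builds such a select URL in Python; the range query field:[* TO *] matches documents where the field exists. The host and the core name "mycore" are placeholders, and build_check_url is a hypothetical helper written for this example.

```python
from urllib.parse import urlencode


def build_check_url(base_url, field, extra_fields=("id",), rows=5):
    """Return a Solr select URL listing documents that have `field` set."""
    params = {
        "q": f"{field}:[* TO *]",                 # documents where the field exists
        "fl": ",".join((*extra_fields, field)),   # fields to return
        "rows": rows,                             # keep the sample small
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core URL; replace host and core name with your own.
url = build_check_url("http://localhost:8983/solr/mycore", "last_author")
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should list a few documents together with their last_author value.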

I hope that my approach is the right way to set up the ManifoldCF-Solr-Tika 
architecture.

Thanks a lot, Karl, for your patience.

Mario




From: Karl Wright <daddy...@gmail.com>
Sent: Tuesday, 16 October 2018 13:11
To: user@manifoldcf.apache.org
Subject: Re: Add field to Output Solr

If it's not in your PDFs, Tika won't extract it.
If you merely want to copy another field, you can use the Metadata Adjuster 
transformer to do that.

Karl


On Tue, Oct 16, 2018 at 4:38 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
Hello,
I am using Tika server to process PDF, DOC, and other files.

I configured the following in my Solr output connection:
[image: Solr output connection screenshot]
so when I index the documents I see these fields:
id
last_modified
resourcename
content_type
allow_token_document
deny_token_document
allow_token_share
deny_token_share
stream_size
creator
deny_token_parent
allow_token_parent
content
_version_


In my Solr schema I have the field last_author, which I would like to be 
indexed.
How can I add it?

Thanks a lot

Mario
