Hello.
I start the Tika server from the command line:
java -jar /opt/tika/tika-server-1.19.1.jar

I configured a connector to Solr with ManifoldCF.

When I start ingesting PDF and .xls documents, I see this in the Tika server log:

INFO  Setting the server's publish address to be http://localhost:9998/
INFO  Logging initialized @1053ms to org.eclipse.jetty.util.log.Slf4jLog
INFO  jetty-9.4.z-SNAPSHOT; built: 2018-06-05T18:24:03.829Z; git: d5fc0523cfa96bfebfbda19606cad384d772f04c; jvm 10.0.2+13-Ubuntu-1ubuntu0.18.04.2
INFO  Started ServerConnector@f74e835{HTTP/1.1,[http/1.1]}{localhost:9998}
INFO  Started @1134ms
WARN  Empty contextPath
INFO  Started o.e.j.s.h.ContextHandler@68d6972f{/,null,AVAILABLE}
INFO  Started Apache Tika server at http://localhost:9998/
INFO  meta (application/pdf)
INFO  meta (application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
WARN  Using fallback font 'LiberationSans' for 'TimesNewRomanPS-BoldMT'
WARN  Using fallback font 'LiberationSans' for 'Arial-Black'
WARN  Using fallback font 'LiberationSans' for 'TimesNewRomanPSMT'
WARN  Using fallback font 'LiberationSans' for 'Arial-BoldMT'
WARN  Using fallback font 'LiberationSans' for 'ArialMT'
WARN  Using fallback font 'LiberationSans' for 'CourierNewPSMT'
WARN  Using fallback font 'LiberationSans' for 'TimesNewRomanPS-ItalicMT'
INFO  tika (application/pdf)
WARN  Using fallback font 'LiberationSans' for 'TimesNewRomanPS-BoldMT'
WARN  Using fallback font 'LiberationSans' for 'Arial-Black'
WARN  Using fallback font 'LiberationSans' for 'TimesNewRomanPSMT'
WARN  Using fallback font 'LiberationSans' for 'Arial-BoldMT'
WARN  Using fallback font 'LiberationSans' for 'ArialMT'
WARN  Using fallback font 'LiberationSans' for 'CourierNewPSMT'
WARN  Using fallback font 'LiberationSans' for 'TimesNewRomanPS-ItalicMT'
INFO  tika (application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)

So it seems that the Tika server processes the documents, but the Solr server
doesn't ingest them.
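
To check the Tika side independently of ManifoldCF and Solr, a document can be
sent to the server by hand. The following is only a minimal sketch in Python,
assuming the server is reachable at the publish address shown in the log and
that its /tika endpoint accepts a PUT with the raw file and returns plain text
when asked via the Accept header. The names TIKA_URL, build_tika_request and
extract_text are mine, not part of Tika.

```python
import urllib.request

# Assumption: publish address taken from the server log above.
TIKA_URL = "http://localhost:9998"


def build_tika_request(path, endpoint="tika", accept="text/plain"):
    """Build a PUT request that uploads the file at `path` to a Tika endpoint."""
    with open(path, "rb") as f:
        body = f.read()
    return urllib.request.Request(
        f"{TIKA_URL}/{endpoint}",
        data=body,
        method="PUT",
        headers={"Accept": accept},
    )


def extract_text(path):
    """Send the file to the /tika endpoint and return the response body as text."""
    with urllib.request.urlopen(build_tika_request(path)) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling extract_text("some.pdf") (a hypothetical file name) while the server is
running should return the extracted text if Tika handles the file, which would
point to the rejection happening on the ManifoldCF/Solr connector side.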

I get these errors:
Solr connector rejected document due to mime type restrictions: (application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
Solr connector rejected document due to mime type restrictions: (application/pdf)

My understanding was that Tika converts all documents to text so that they can
be indexed in Solr. Or are there any restrictions on the MIME types handled by
the Tika server?

Thanks a lot

Mario
