Thank you, Erik, for your valuable advice.
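
For the archive, here is a rough sketch of how the extractOnly check quoted below could be scripted to list the files that fail to parse. It is only an illustration under assumptions: Solr is running locally with a collection named "test" (as in Erik's example), Solr answers with an HTTP error status when Tika cannot parse a file, and the names EXTRACT_URL, post_extract_only and find_bad_files are hypothetical helpers, not part of Solr or bin/post.

```python
import urllib.parse
import urllib.request
from pathlib import Path

# Assumption: local Solr with a collection named "test", as in Erik's example.
EXTRACT_URL = "http://localhost:8983/solr/test/update/extract"


def post_extract_only(path, url=EXTRACT_URL):
    """POST one file to /update/extract with extractOnly=true.

    Nothing is indexed; Solr only returns what Tika extracted.
    urlopen raises urllib.error.HTTPError on a 4xx/5xx response.
    """
    query = urllib.parse.urlencode({"extractOnly": "true", "wt": "json"})
    request = urllib.request.Request(
        f"{url}?{query}",
        data=Path(path).read_bytes(),
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def find_bad_files(paths, extract=post_extract_only):
    """Return (path, error message) pairs for files that could not be parsed."""
    bad = []
    for path in paths:
        try:
            extract(path)
        except OSError as exc:
            # HTTPError and URLError are subclasses of OSError, so this also
            # covers unreadable files and connection failures.
            bad.append((str(path), str(exc)))
    return bad
```

Calling find_bad_files(["a.pdf", "b.doc"]) would then return only the entries that raised an error, together with the message, which can be written to a log for later inspection. The extract argument exists so that the loop can be tried out without a running Solr instance.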

2016-01-14 17:24 GMT+00:00 Erik Hatcher <erik.hatc...@gmail.com>:

> And also, bin/post can be your friend when it comes to troubleshooting or
> introspecting Tika parsing via /update/extract.  Like this:
>
> $ bin/post -c test -params "extractOnly=true&wt=ruby&indent=yes" -out yes
> docs/SYSTEM_REQUIREMENTS.html
> java -classpath /Users/erikhatcher/solr-5.3.0/dist/solr-core-5.3.0.jar
> -Dauto=yes -Dparams=extractOnly=true&wt=ruby&indent=yes -Dout=yes -Dc=test
> -Ddata=files org.apache.solr.util.SimplePostTool
> /Users/erikhatcher/solr-5.3.0/docs/SYSTEM_REQUIREMENTS.html
> SimplePostTool version 5.0.0
> Posting files to [base] url
> http://localhost:8983/solr/test/update?extractOnly=true&wt=ruby&indent=yes...
> Entering auto mode. File endings considered are
> xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> POSTing file SYSTEM_REQUIREMENTS.html (text/html) to [base]/extract
> {
>   'responseHeader'=>{
>     'status'=>0,
>     'QTime'=>3},
>   ''=>'<?xml version="1.0" encoding="UTF-8"?>
> <html xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <meta...
>
>    - from
> https://lucidworks.com/blog/2015/08/04/solr-5-new-binpost-utility/
>
> But I also recommend having the Tika desktop app handy, in which you can
> drag and drop a file and see the gory details of how it parses the file.
>
> —
> Erik Hatcher, Senior Solutions Architect
> http://www.lucidworks.com
>
>
>
> > On Jan 14, 2016, at 10:55 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
> >
> > No good way except to try them. For getting details on Tika parsing
> > failures, I much prefer the SolrJ process that the link I sent you
> > outlines.
> >
> > Best,
> > Erick
> >
> > On Thu, Jan 14, 2016 at 7:52 AM, kostali hassan
> > <med.has.kost...@gmail.com> wrote:
> >> Thank you, Eric. I have a problem with these files. One last question: how
> >> can I get the list of files that cannot be indexed (the bad files)?
