Rename the shell script’s extension to end in .bat and you should be good to go.


From: kostali hassan [mailto:med.has.kost...@gmail.com]
Sent: Friday, July 15, 2016 1:26 PM
To: user@tika.apache.org
Subject: Re: detect corrupt file and build a list of them before indexing in solr

I'm using tika-app 1.12.

2016-07-15 18:20 GMT+01:00 Allison, Timothy B. <talli...@mitre.org>:
Can you share the shell script/bat file you’re using?

From: kostali hassan [mailto:med.has.kost...@gmail.com]
Sent: Friday, July 15, 2016 1:13 PM
To: user@tika.apache.org
Subject: Re: detect corrupt file and build a list of them before indexing in solr

When I set inputDIR to d:\test, the log says: java.lang.RuntimeException: Crawler couldn't find this directory:D:\tika_batch_config\test
The same happens if I set inputDIR to d:\Cvs; the log says: java.lang.RuntimeException: Crawler couldn't find this directory: D:\tika_batch_config\Cvs

2016-07-15 17:54 GMT+01:00 kostali hassan <med.has.kost...@gmail.com>:
I added this directory and it's still not working.

2016-07-15 17:42 GMT+01:00 Allison, Timothy B. <talli...@mitre.org>:
Yes, the log shows that the input directory wasn't specified correctly:

1375 2016-07-15 17:33:17,354 [Thread-2] INFO  org.apache.tika.batch.BatchProcessDriverCLI  - BatchProcess: java.lang.RuntimeException: Crawler couldn't find this directory:D:\tika_batch_config\test
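A guess that the thread does not confirm: the path in this message looks like a relative name ("test") that was resolved against the current working directory, D:\tika_batch_config. A minimal Python pre-flight check along those lines is below. check_input_dir is a hypothetical helper, not part of Tika; it only reports the absolute path an argument resolves to and fails early if that directory does not exist.

```python
import os


def check_input_dir(path: str) -> str:
    r"""Return the absolute form of `path`, or raise if it is not an existing directory.

    Hypothetical pre-flight helper (not part of Tika). A relative path such as
    "test" is resolved against the current working directory, so running from
    D:\tika_batch_config would turn it into D:\tika_batch_config\test.
    """
    resolved = os.path.abspath(path)
    if not os.path.isdir(resolved):
        raise FileNotFoundError(f"input directory not found: {resolved}")
    return resolved
```

For example, on Windows check_input_dir(r"d:\test") returns the absolute path if the directory exists and raises FileNotFoundError otherwise. It cannot tell you how the shell script or .bat file passes the path on to Tika; that still has to be checked in the script itself.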

From: kostali hassan [mailto:med.has.kost...@gmail.com]
Sent: Friday, July 15, 2016 12:40 PM
To: user@tika.apache.org
Subject: Re: detect corrupt file and build a list of them before indexing in solr

Only -JXmx1g works, the inputDIR is empty, and I get these empty files in the logs:
batch-driver-warn.log
batch-process-warn.log
tika-batch-pdfbox.log

and the attached files.

2016-07-15 16:36 GMT+01:00 Allison, Timothy B. <talli...@mitre.org>:
Try changing the max heap to something that will work on your computer:

-JXmx5g

To (say):

-JXmx1g

From: kostali hassan [mailto:med.has.kost...@gmail.com]
Sent: Friday, July 15, 2016 11:27 AM
To: user@tika.apache.org
Subject: Re: detect corrupt file and build a list of them before indexing in solr

I get these files in the logs, and when I run the script it doesn't finish; it keeps restarting.

2016-07-15 13:19 GMT+01:00 Allison, Timothy B. <talli...@mitre.org>:
Sorry, you'll get 0-byte files when an error causes Tika batch to restart (a hang or an out-of-memory error), and, depending on the cause, you may also get an error logged in batch-process-error.xml. If your OS kills the process or something truly catastrophic happens, the only trace you'll have is the 0-byte file.

For regular caught exceptions, you can find the stack trace in the .json file (key: TikaCoreProperties.TIKA_META_EXCEPTION_PREFIX+"runtime"), or you can look in the logs as described below.
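To pull those stack traces out of many .json files at once, here is a minimal Python sketch. It assumes the recursive JSON output is a list of metadata objects (container first, then embedded documents) and that TikaCoreProperties.TIKA_META_EXCEPTION_PREFIX+"runtime" resolves to the key X-TIKA:EXCEPTION:runtime; find_exceptions is a hypothetical helper name. A file is flagged if any of its metadata records carries that key, so an exception in an embedded document also flags its container.

```python
import json
from pathlib import Path
from typing import Dict, List

# Assumed string value of TikaCoreProperties.TIKA_META_EXCEPTION_PREFIX + "runtime".
EXCEPTION_KEY = "X-TIKA:EXCEPTION:runtime"


def find_exceptions(output_dir: str) -> Dict[str, str]:
    """Map each .json output file (path relative to output_dir) to a stack trace found in it."""
    root = Path(output_dir)
    failures: Dict[str, str] = {}
    for json_file in sorted(root.rglob("*.json")):
        # Skip directories and 0-byte outputs (0 bytes means a restart or kill,
        # not a caught exception, so there is no JSON to read).
        if not json_file.is_file() or json_file.stat().st_size == 0:
            continue
        name = json_file.relative_to(root).as_posix()
        try:
            data = json.loads(json_file.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            failures[name] = "<output is not valid JSON>"
            continue
        # Expect a list of metadata records; tolerate a single object as well.
        records: List[dict] = data if isinstance(data, list) else [data]
        for record in records:
            trace = record.get(EXCEPTION_KEY)
            if trace:
                # A metadata value may be a single string or a list of strings.
                failures[name] = trace if isinstance(trace, str) else trace[0]
                break
    return failures
```

The keys of the returned dict are output names, so the input file name is the output name minus its .json suffix, assuming the output naming matches the input.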

From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Friday, July 15, 2016 8:11 AM
To: user@tika.apache.org
Subject: RE: detect corrupt file and build a list of them before indexing in solr

Checking for 0-byte files is one option. The other is to configure the logs to capture exceptions. I've attached the config files and the shell script that I use when running our large-scale regression testing here:
https://wiki.apache.org/tika/TikaBatchUsage?action=AttachFile&do=view&target=tika-batch-sh.zip

To run those, unzip the archive, put tika-app.jar in the bin/ directory, update the shell script with your <input_dir> and <output_dir>, and you should be good to go. You may need to create a “logs” directory. Exceptions will be recorded in batch-process-warn.log, along with the original file names and stack traces.

From: kostali hassan [mailto:med.has.kost...@gmail.com]
Sent: Friday, July 15, 2016 5:17 AM
To: user@tika.apache.org
Subject: detect corrupt file and build a list of them before indexing in solr

I'm looking to index MS Word and PDF files by uploading data with Solr Cell, which uses Apache Tika.
I'd like to use Tika to detect corrupt files before indexing and get a list of the corrupted files, if that's possible.
I tried running java -jar tika-app.jar <input_dir> <output_dir>. In <output_dir> I get every file from <input_dir> in XML format, and every corrupt file comes out with a size of 0 KB (empty).
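Building the list of corrupt files from those 0-byte outputs can be sketched in Python as below. It assumes each output file is named after its input plus a suffix (for example reports/a.pdf becoming reports/a.pdf.xml); zero_byte_outputs is a hypothetical helper, and the suffix should be adjusted to whatever your output actually uses. A 0-byte output only covers hangs, out-of-memory restarts, and killed processes, so regular parse exceptions still have to come from the .json files or the logs.

```python
from pathlib import Path
from typing import List


def zero_byte_outputs(output_dir: str, suffix: str = ".xml") -> List[str]:
    """Return the input-relative names of documents whose output file is 0 bytes.

    Assumes each output is named <input relative path><suffix>,
    e.g. reports/a.pdf -> reports/a.pdf.xml (use suffix=".json" for JSON output).
    """
    root = Path(output_dir)
    corrupt: List[str] = []
    for out in sorted(root.rglob("*" + suffix)):
        if out.is_file() and out.stat().st_size == 0:
            rel = out.relative_to(root).as_posix()
            # Strip the output suffix to recover the original input name.
            corrupt.append(rel[: len(rel) - len(suffix)])
    return corrupt
```

The returned names can then be excluded from the Solr Cell upload or written out as the list of suspect files.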




