This is bad. I’m sorry for your experience with this.

Let me see if I can get something working with our v2 external parsers.

At the least, I agree that we need to fix our documentation.

On Tue, Aug 26, 2025 at 6:11 AM Adrian Bird <[email protected]>
wrote:

> Hi,
>   I tried requesting a Jira account (birdya22) to report this issue
> but the request was denied.
>
> The reply said I could submit PRs on github (I have an account), but I
> didn't see how to do it (https://github.com/apache/tika/).
>
> So I've subscribed to this list and here are the details.
>
> I tried to get Tika and ExifTool to work together to process some JPEG
> image files and came across a number of issues.
> 1) Tika and ExifTool don't work on Windows
> I used the Wiki page
> 'https://cwiki.apache.org/confluence/display/TIKA/EXIFToolParser' to
> understand how to do the integration.
> Because I wasn't getting the metadata I expected, I used the
> '--verbose' option and got a Java Exception which contained this text:
>  "WARN  [main] 07:13:34,699
> org.apache.tika.parser.external.ExternalParser problem with process
> exec
> java.io.IOException: Cannot run program "env": CreateProcess error=2,
> The system cannot find the file specified"
> The exception occurs because 'env' is not a valid Windows command.
> I tracked this down to the file
> 'org\apache\tika\parser\external\tika-external-parsers.xml' in the
> Tika App jar where the command is:
> '<command>env FOO=${OUTPUT} exiftool ${INPUT}</command>'
> This doesn't work on Windows because 'env' does not exist as a command.
>
> 2) In the same file I noticed an entry for 'sox'. For the same reason
> as ExifTool, Tika and sox won't work together on Windows.
> The command is:
> <command>env FOO=${OUTPUT} sox --info ${INPUT}</command>
> Note I didn't find any information on 'sox' on the Wiki.
>
> 3) Looking at the file
> 'org\apache\tika\parser\external\tika-external-parsers.xml' I noticed
> that it only contains video related mime-types, meaning that I cannot
> use it with image files. The Wiki page says:
> 'EXIFTool is a wonderful tool that reads videos, images, audio and
> other media files and that extracts EXIF metadata from them.'
> I took this to mean that Tika can extract metadata from all 3 file
> types, but that isn't the case as it only supports video files.
> Given this, can I suggest that the Wiki page be updated to make this
> clear?
>
> Adrian
>
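
On issue 1 in the quoted message: the failure comes from launching 'env' as the program, which Windows does not have. One possible way around it, sketched below, is to run the tool directly and pass the variable through ProcessBuilder.environment(), which works on both Windows and Unix. This is only an illustration and not how Tika's ExternalParser is written. The class name ExternalCommandSketch and the method build are hypothetical, and I am assuming FOO only needs to be visible in the child process's environment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tika: prepares an external tool invocation
// without relying on the Unix-only "env" command.
final class ExternalCommandSketch {

    // Builds (but does not start) a process equivalent to
    //   env FOO=<output> <tool> <toolArgs...> <input>
    // by setting FOO in the child's environment instead of calling "env".
    static ProcessBuilder build(String tool, List<String> toolArgs,
                                String input, String output) {
        List<String> argv = new ArrayList<>();
        argv.add(tool);          // e.g. "exiftool" or "sox"
        argv.addAll(toolArgs);   // e.g. "--info" for sox
        argv.add(input);         // path that replaces ${INPUT}

        ProcessBuilder pb = new ProcessBuilder(argv);
        // Assumption: FOO is only needed as an environment variable of the child.
        pb.environment().put("FOO", output);
        return pb;
    }
}
```

Calling ExternalCommandSketch.build("exiftool", List.of(), "photo.jpg", "out.txt").start() would then launch exiftool directly, provided it is on the PATH. The file names here are placeholders.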
