Hello, I have been stuck on this for too long now, so I decided to ask for some information here.
I need Tika to extract content and metadata from thousands of files in S3. What I wanted to do is run Tika as a standalone server and use S3 fetchers and emitters in conjunction with /async. However, I am having difficulty tracking what is going on on the server side. My client code is in Python, using `python-tika`.

A payload is built programmatically and sent to the /async endpoint, but I need to be able to track either the whole async task or the individual tuples in it: have they failed, succeeded, or are they still running? I am struggling to achieve that and could not find any information on whether it is possible. I came across some information that it can be achieved by checking `/tika/async/<task_id>`, and that when you send a PUT the response should contain an X-Tika-id header, but neither of these seems to work.

Additionally, from Confluence:

> As default the fetchKey is used as the id for logging. However, if users
> need a distinct task id for the request, they may add an id element:
>
> {
>   "id": "myTaskId",
>   "fetcher": "fsf",
>   "fetchKey": "hello_world.pdf",
>   "emitter": "fse",
>   "emitKey": "hello_world.pdf.json"
> }

Is there a way to track the task id when running from /async? From all that I have seen so far it looks like there is, but I can't figure out how to actually do it. If I do a `GET` on /async/<task_id>, nothing happens (no resource); if I use /tika/async/<task_id>, I get a 405.

I have tried using /pipes, which captures errors etc. and is handy, but what about async? As long as the payload is valid, /async doesn't seem to throw any errors no matter what actually happens in Tika: processing errors or bad-credential errors for AWS, for example, are all just skipped.

Any pointers in the right direction will be welcome.

Thanks,
Georgi
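
For illustration, a minimal sketch of how such a payload could be built on the client side with a distinct `id` per tuple, plus a lookup table so that any log line or emitted output carrying that id can be matched back to its S3 key. The fetcher and emitter names (`s3f`, `s3e`) and the helper names are hypothetical, and the sketch assumes that /async accepts a JSON array of tuples shaped like the Confluence example; it does not by itself answer whether the server exposes task status.

```python
import json
import uuid


def build_async_payload(keys, fetcher="s3f", emitter="s3e"):
    """Build one fetch-emit tuple per S3 key, each with its own id.

    The fetcher/emitter names are placeholders (assumptions); they would
    have to match the names configured on the Tika server.
    """
    tuples = []
    for key in keys:
        tuples.append({
            "id": str(uuid.uuid4()),      # distinct task id per tuple
            "fetcher": fetcher,
            "fetchKey": key,
            "emitter": emitter,
            "emitKey": key + ".json",
        })
    return tuples


def index_by_id(tuples):
    """Map each task id to its fetchKey for client-side bookkeeping."""
    return {t["id"]: t["fetchKey"] for t in tuples}


payload = build_async_payload(["docs/a.pdf", "docs/b.docx"])
body = json.dumps(payload)       # request body that would be sent to /async
tracking = index_by_id(payload)  # keep this to correlate results later
```

Keeping the id-to-key mapping on the client means the ids remain useful even if the only place they show up is the server log or the emitted output.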