Hi all,

I am trying to build a pipeline that needs to process content recursively and store the binary bytes of all embedded children in addition to their text and other metadata.
I am looking at two options:

1. Using Tika's /rmeta API and having my code call it synchronously.
   - Is there a way to get the bytes of embedded children when doing this? Basically, some way to combine what /unpack/all does with /rmeta.
   - If this isn't built in, is there any guidance on extending my own recursive handler to do it? I'd like to keep tika-server as-is and just configure this extension, so that I can keep up with updates.

2. Using /async or /pipes. Here I have two questions:
   - Is there an emitter configuration that commits both the bytes and the text for all children?
   - Is there a way to pass the input in with my HTTP request and use an emitter only for storage? Basically, some sort of fetcher that reads from the request's input stream. This would let me avoid one external request.

Thank you for any help!
Samuel
