Hi All,

I am trying to build a pipeline that needs to process content recursively
and store the binary bytes of all embedded children in addition to their
text and other metadata.

I was looking at two options:

1. Using Tika's /rmeta API and having my code just call it synchronously.
- Is there a way to get the bytes for embedded children when doing this?
Basically, some way to combine what /unpack/all does with /rmeta.
- If it's not built in, is there any guidance on writing my own recursive
handler to do this? I'd like to keep tika-server as is and just configure
this extension so I can keep up with updates.

2. Using /async or /pipes. With this I had two questions:
- Is there an emitter configuration to commit both bytes and text for all
children?
- Is there a way to pass the input in with my HTTP request and use an
emitter only for storage? Basically, some sort of fetcher that reads from
the request input stream; this would help me avoid one external request.
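
To make option 1 concrete, here is a rough sketch of the synchronous call I
have in mind. It assumes tika-server is running on its default port 9998 and
that the /rmeta JSON uses the X-TIKA:content and
X-TIKA:embedded_resource_path keys; the function names are just placeholders
of mine. It gets text and metadata for each child but no bytes, which is the
gap I'm asking about.

```python
import json
import urllib.request

TIKA_URL = "http://localhost:9998"  # assumption: default tika-server address


def fetch_rmeta(path, base_url=TIKA_URL):
    """PUT a local file to /rmeta and return the raw JSON response text."""
    with open(path, "rb") as f:
        req = urllib.request.Request(
            base_url + "/rmeta",
            data=f.read(),
            method="PUT",
            headers={"Accept": "application/json"},
        )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def summarize_rmeta(json_text):
    """Return (embedded path, text length) for each metadata object.

    /rmeta returns a JSON list with one object per document: the container
    first, then each embedded child. The container has no embedded path,
    so it is reported with an empty string.
    """
    rows = []
    for md in json.loads(json_text):
        path = md.get("X-TIKA:embedded_resource_path", "")
        text = md.get("X-TIKA:content") or ""
        rows.append((path, len(text)))
    return rows
```

Calling summarize_rmeta(fetch_rmeta("some.docx")) would list every child with
its extracted text length; what I'd like is the same list with each child's
raw bytes attached.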

Thank you for any help!
Samuel
