Thanks Tim/Josh,
My use case is to recursively get all three things: text (as plain text),
metadata and bytes. I made a mistake: /rmeta/all was wrong, and /rmeta/text
does what I need. (I thought it would output only text and no metadata,
but I see everything is still there.) Sorry.
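
For anyone following the thread, here is a rough sketch of the /rmeta/text call discussed above. It covers only the text and metadata parts, not the raw bytes. The base URL (Tika server's default port on localhost) and the helper names build_rmeta_text_request and summarize_rmeta are my own assumptions for illustration, not part of Tika. It also assumes the response is a JSON array with one object per document, with the text under "X-TIKA:content".

```python
import json
import urllib.request

# Assumption: a Tika server listening on its default port on this machine.
TIKA_URL = "http://localhost:9998"


def build_rmeta_text_request(data: bytes, base_url: str = TIKA_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT to /rmeta/text that asks for JSON."""
    return urllib.request.Request(
        url=f"{base_url}/rmeta/text",
        data=data,
        method="PUT",
        headers={"Accept": "application/json"},
    )


def summarize_rmeta(body: str) -> list[tuple[str, str]]:
    """Return (embedded path, plain text) for each document in the response.

    The container document usually has no embedded path, so it gets "".
    """
    docs = json.loads(body)
    return [
        (
            doc.get("X-TIKA:embedded_resource_path", ""),
            (doc.get("X-TIKA:content") or "").strip(),
        )
        for doc in docs
    ]
```

To actually send it, pass the request to urllib.request.urlopen, decode the response body as UTF-8 and hand it to summarize_rmeta. Every other key in each object is that document's metadata.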

Thanks for working on TIKA-4207. I am trying to adapt my pipeline to work
asynchronously, but there are some architecture issues with my own
code/infrastructure. If you create a synchronous endpoint, I'd suggest
/runpack (for recursive unpack, to go with rmeta).

Thank you,
Samuel



On Thu, Mar 21, 2024 at 3:14 PM Tim Allison <[email protected]> wrote:

> I’m making progress on TIKA-4207, which will allow you to specify one
> emitter for /rmeta-like output and a separate emitter for the raw bytes
> from all embedded files.
>
> That uses the /pipes or /async endpoints.
>
> After I finish that, I’ll try to add another endpoint that returns a zip
> with embedded raw bytes and the rmeta content.
>
> Not sure what to call that endpoint. Recommendations?
>
> On Thu, Mar 21, 2024 at 6:10 PM Tim Allison <[email protected]> wrote:
>
>> If rmeta/text is not returning text extracted from embedded files that’s
>> a bug.
>>
>> I don’t think /rmeta/all is a thing.
>>
>> On Thu, Mar 21, 2024 at 5:21 PM Zig Zag <[email protected]> wrote:
>>
>>> Thanks Josh, that's correct. rmeta/text lets you control this, but it
>>> only returns one level of text (not documents embedded within others).
>>> When you use the recursive interface rmeta/all, it always returns content
>>> as HTML, and similarly unpack/all returns meta as CSV.
>>>
>>> On Thu, Mar 21, 2024 at 1:40 PM Josh Burchard <[email protected]>
>>> wrote:
>>>
>>>> Samuel - Well, I use Tika server and I get my data back in JSON format
>>>> because I use the /rmeta/text endpoint and send the HTTP header
>>>> Accept:application/json.  If you were to send Accept:text/plain would that
>>>> work for you? I've only done that in the context of the /tika endpoint and
>>>> that was long ago.  Not sure how to do anything similar in the app because
>>>> I never use that.  By the way, in the context of using the server I find
>>>> this table very helpful:
>>>>
>>>>
>>>> https://cwiki.apache.org/confluence/display/TIKA/TikaServerEndpointsCompared
>>>>
>>>> From:        "Zig Zag" <[email protected]>
>>>> To:        [email protected]
>>>> Date:        03/21/2024 03:49 PM
>>>> Subject:        Re: Meta output format of tika server /unpack/all
>>>> ------------------------------
>>>>
>>>> Similarly is it possible to have /rmeta/all format content/text as text
>>>> instead of HTML?
>>>>
>>>> On Thu, Mar 21, 2024 at 9:50 AM Zig Zag <[email protected]> wrote:
>>>> Hi All,
>>>>
>>>> Is there a way to get the __META__ output of /unpack/all as JSON
>>>> rather than CSV?
>>>>
>>>> Thank you,
>>>> Samuel
>>>>
>>>>
