My 0.02...
1) It is important that we do what we can to make it easy for people
to integrate Tika into the dense vector/LLM/RAG landscape. I see A LOT
of projects reinventing the wheel (without multi-parser full recursion
like we have), or just running pdftotext and declaring victory. So, if
we ca
Your approach sounds great as well, Nick.
> On Apr 9, 2024, at 2:21 AM, Michael Wechner wrote:
>
> Thanks for sharing your approach!
>
> Do you already have some code to share?
>
> Today I read about https://github.com/infiniflow/ragflow which might also
> have some interesting chunking approaches.
Thanks for sharing your approach!
Do you already have some code to share?
Today I read about https://github.com/infiniflow/ragflow which might
also have some interesting chunking approaches.
Thanks
Michael
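Since the thread is comparing chunking approaches, here is a minimal, dependency-free sketch of similarity-based ("semantic") chunking: start a new chunk whenever the next sentence is dissimilar from the running chunk. This is not Tika or ragflow code; the bag-of-words `embed` below is a toy stand-in for a real sentence-embedding model, and the `threshold` value is an illustrative assumption.

```python
import math
import re

def embed(sentence):
    # Toy bag-of-words "embedding": a dict of word counts.
    # A real pipeline would use a sentence-embedding model here.
    vec = {}
    for word in re.findall(r"[a-z']+", sentence.lower()):
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(text, threshold=0.2):
    # Naive sentence split on terminal punctuation; dependency-free.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], []
    for sent in sentences:
        # Start a new chunk when the next sentence drifts off-topic.
        if current and cosine(embed(" ".join(current)), embed(sent)) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

The point of the sketch is only the boundary rule: compare each incoming sentence against the accumulated chunk and cut where similarity drops, rather than cutting at a fixed byte count.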
On 09.04.24 at 01:25, Nick Burch wrote:
On Mon, 8 Apr 2024, Tim Allison wrote:
> Not sure we should jump on the bandwagon, but anything we can do to
> support smart chunking would benefit us.
Could just be more integrations with parsers that turn out to be useful. I
haven't had much joy with some. Here's one that I haven't evaluated yet:
I am also very interested in this vector-based search. Indexes are a big
thing right now.
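To illustrate the vector-based search mentioned above: once chunks have embeddings, the simplest retrieval is a brute-force scan that scores every chunk vector against the query by cosine similarity and returns the top-k. A minimal sketch, with the `search` helper and its toy vectors being hypothetical; production systems would use an approximate-nearest-neighbor index instead of a full scan.

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query_vec, index, top_k=3):
    # index: list of (chunk_text, embedding_vector) pairs.
    # Brute-force: score everything, return the best-scoring texts.
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in scored[:top_k]]
```

Example: with `index = [("chunk-a", [1.0, 0.0]), ("chunk-b", [0.0, 1.0]), ("chunk-c", [0.9, 0.1])]`, a query vector of `[1.0, 0.0]` ranks chunk-a first, then chunk-c.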
On Mon, Apr 8, 2024, 4:16 PM Michael Wechner wrote:
> It would be great to have good "semantic chunking" in order to generate
> vector embeddings.
>
> Thanks for the link below, will try to test it.
>
> Thanks
> Michael
It would be great to have good "semantic chunking" in order to generate
vector embeddings.
Thanks for the link below, will try to test it.
Thanks
Michael
On 08.04.24 at 18:29, Tim Allison wrote:
Not sure we should jump on the bandwagon, but anything we can do to support
smart chunking would benefit us.