Joe,
In this scenario we are talking about very similar use cases for these
processors, which would result in very similar processor code: probably
similar properties and similar functions used by all of these
processors. That would result in a common codebase, which after some
refactoring would resu
Gabor
Thanks. While I understand the logical grouping (*these all do doc parsing
things*), why is it important for them to be in the same package? Why not
have separate document parsing packages, each of which can be built/deployed
separately?
Thanks
On Tue, Sep 24, 2024 at 9:29 AM Gábor Gyimesi wrot
David, Joe,
You are right, it's easier to understand such a use case with an
example. We currently have a ParseDocument processor in our Python
extensions with PLAIN_TEXT, HTML, MARKDOWN, PDF, WORD, EXCEL,
POWERPOINT input format support, using the unstructured library on its
own or through langch
Gabor,
On a similar note, it would be helpful to provide a concrete example.
Unlike Java NARs, Python Processors do not have the same concept of
multiple layers of parent class loaders right now. Virtual
environments provide dependency sharing, but there isn't the same
concept of sharing dependen
Gabor
Can you please describe a specific case or cases where ProcessorA and
ProcessorB should be in the same package/module and yet have such vastly
different dependency requirements (100s of MB or even GB)?
Thanks
Joe
On Tue, Sep 24, 2024 at 7:32 AM Ferenc Gerlits wrote:
> Hi Gabor,
>
> I
Hi Gabor,
I like this approach, and I think the restriction you propose (that
all utility files in the package use the same dependencies, and extra
dependencies for processor A are only used in ProcessorA.py) is
reasonable. I would be happy to implement this if there are no
objections.
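
To make the restriction concrete, here is a minimal sketch of how the
package-level and processor-level dependency lists could be combined.
All names in it (ProcessorSpec, resolve_dependencies, the package
names) are hypothetical and only illustrate the idea; this is not the
existing API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessorSpec:
    """Hypothetical description of one processor file in a package."""
    name: str
    # Dependencies used only inside this processor's own .py file.
    extra_dependencies: List[str] = field(default_factory=list)


def resolve_dependencies(package_requirements: List[str],
                         processor: ProcessorSpec) -> List[str]:
    """Return the package-level dependencies followed by the
    processor-only extras, keeping order and dropping duplicates."""
    resolved = list(package_requirements)
    for dep in processor.extra_dependencies:
        if dep not in resolved:
            resolved.append(dep)
    return resolved


# Shared utility files may rely only on the package-level list.
common = ["shared-parsing-lib"]          # placeholder package name
processor_a = ProcessorSpec("ProcessorA", ["large-pdf-lib"])
processor_b = ProcessorSpec("ProcessorB")

deps_a = resolve_dependencies(common, processor_a)
deps_b = resolve_dependencies(common, processor_b)
```

With this split, installing ProcessorB would not pull in the large
dependency that only ProcessorA.py needs.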
Thanks,
F
Hi Team,
I would like to discuss the current dependency management and possible
improvements for Python processors. At the moment there are two ways
to specify dependencies, either on a package level using a
requirements.txt file to list all the dependencies for the processors
in that package or i