Re: [DISCUSS] Python processor dependency management

2024-09-24 Thread Gábor Gyimesi
Joe, In this scenario we are talking about very similar use cases for these processors, which would result in very similar processor code. Probably similar properties, similar functions used by all of these processors. That would result in a common codebase, which after some refactoring would resu

Re: [DISCUSS] Python processor dependency management

2024-09-24 Thread Joe Witt
Gabor, Thanks. While I understand the logical grouping (*these all do doc parsing things*), why is it important for them to be in the same package? Why not have separate document parsing packages, each of which can be built/deployed separately? Thanks On Tue, Sep 24, 2024 at 9:29 AM Gábor Gyimesi wrot

Re: [DISCUSS] Python processor dependency management

2024-09-24 Thread Gábor Gyimesi
David, Joe, You are right, it's easier to understand such a use case with an example. We currently have a ParseDocument processor in our Python extensions with PLAIN_TEXT, HTML, MARKDOWN, PDF, WORD, EXCEL, POWERPOINT input format support, using the unstructured library on its own or through langch

Re: [DISCUSS] Python processor dependency management

2024-09-24 Thread David Handermann
Gabor, On a similar note, it would be helpful to provide a concrete example. Unlike Java NARs, Python Processors do not have the same concept of multiple layers of parent class loaders right now. Virtual environments provide dependency sharing, but there isn't the same concept of sharing dependen

Re: [DISCUSS] Python processor dependency management

2024-09-24 Thread Joe Witt
Gabor, Can you please describe a specific case or cases where ProcessorA and ProcessorB should be in the same package/module and yet have such vastly different dependency requirements (100s of MB or even GB)? Thanks Joe On Tue, Sep 24, 2024 at 7:32 AM Ferenc Gerlits wrote: > Hi Gabor, > > I

Re: [DISCUSS] Python processor dependency management

2024-09-24 Thread Ferenc Gerlits
Hi Gabor, I like this approach, and I think the restriction you propose (that all utility files in the package use the same dependencies, and extra dependencies for processor A are only used in ProcessorA.py) is reasonable. I would be happy to implement this if there are no objections. Thanks, F
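
For readers skimming the thread, the restriction discussed above (shared dependencies for every utility file in the package, extras used only inside the processor file that needs them) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name resolve_dependencies, the processor names, and the example requirement strings are hypothetical and are not taken from the NiFi or MiNiFi code base.

```python
from typing import Dict, List


def resolve_dependencies(shared: List[str],
                         extras: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Hypothetical helper: build the dependency list for each processor.

    `shared` stands for the package-level requirements that every utility
    file may rely on; `extras` maps a processor name to the additional
    requirements used only inside that processor's own file.
    """
    resolved: Dict[str, List[str]] = {}
    for processor, extra in extras.items():
        seen = set()
        merged: List[str] = []
        # Shared requirements come first, then the processor-specific
        # ones; duplicates are dropped while keeping the original order.
        for requirement in [*shared, *extra]:
            if requirement not in seen:
                seen.add(requirement)
                merged.append(requirement)
        resolved[processor] = merged
    return resolved


if __name__ == "__main__":
    # Example values are made up for illustration only.
    plan = resolve_dependencies(
        shared=["langchain"],
        extras={"ProcessorA": ["unstructured"], "ProcessorB": []},
    )
    for name, requirements in plan.items():
        print(name, requirements)
```

With a split like this, a processor that needs no extras would only pull in the shared requirements, which is the size saving the proposal is after.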

[DISCUSS] Python processor dependency management

2024-09-18 Thread Gábor Gyimesi
Hi Team, I would like to discuss the current dependency management and possible improvements for Python processors. At the moment there are two ways to specify dependencies: either on a package level, using a requirements.txt file to list all the dependencies for the processors in that package, or i