Hi Julien,

You must understand that a job with a complex pipeline is not really running N independent jobs; it is running ONE job. Every document is processed through the pipeline exactly once. The pipeline may have faster components and slower components, but that doesn't matter: each document takes the sum of the time that all components need to fetch and process it.
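To make the timing claim concrete, here is a minimal sketch (not ManifoldCF source code; the connector costs and document count are made-up numbers): because one worker thread hands each document to every output in turn, the per-document cost is the sum of the per-output costs, not the maximum.

```java
// Hypothetical illustration only, not ManifoldCF code: models a single
// worker thread pushing each document through every output sequentially.
public class SequentialOutputs {
    // Assumed per-document indexing costs in milliseconds (made up):
    // one output is twice as fast as the other.
    static final long FAST_OUTPUT_MS = 10;
    static final long SLOW_OUTPUT_MS = 20;

    // Cost of indexing one document: the outputs run one after the other
    // on the same worker thread, so their costs add up.
    static long perDocumentCost() {
        return FAST_OUTPUT_MS + SLOW_OUTPUT_MS; // sum, not max
    }

    public static void main(String[] args) {
        int documents = 1000;
        long jobCost = documents * perDocumentCost();
        System.out.println("Per-document cost: " + perDocumentCost() + " ms");
        System.out.println("Job cost for " + documents + " docs: " + jobCost + " ms");
        // At any point in time both outputs have seen exactly the same
        // number of documents, because each document visits both outputs
        // before the worker thread moves on to the next document.
    }
}
```

This also shows why the faster output never "runs ahead": there is no separate queue per output, only one pass per document.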
Karl

On Tue, Sep 10, 2019 at 12:48 PM Julien Massiera <julien.massi...@francelabs.com> wrote:

> Ok, so to be sure I understood what you are saying:
>
> Suppose a job with two output connections, where one of the outputs is
> twice as fast as the other at indexing documents. At any given time t,
> both outputs will have indexed the same number of documents, no matter
> that one output is faster than the other.
> In other words: the faster output will not have indexed all the crawled
> documents while the other one still has half of them left to index.
>
> Am I wrong?
>
> On 10/09/2019 18:09, Karl Wright wrote:
>
>> The output connection contract is that a request to index is made to
>> the connector, and the connector returns when it is done.
>> When there are multiple output connections, these are each handed a
>> copy of the document, one after the other, and told to index it. This
>> is all done by one worker thread. Multiple worker threads are not used
>> for multiple outputs of the same document.
>>
>> The framework is smart enough not to hand a document to a connector if
>> it hasn't changed (according to how the connector computes the
>> connector-specific output version string).
>>
>> Karl
>>
>> On Tue, Sep 10, 2019 at 11:00 AM Julien Massiera <
>> julien.massi...@francelabs.com> wrote:
>>
>>> Hi,
>>>
>>> I would like an explanation of the behavior of a job when several
>>> outputs are configured. My main question is: for each output, how is
>>> document ingestion managed? More precisely, are the ingest processes
>>> synchronized or not? (In other words, does the ingestion of the next
>>> document wait until the current ingestion has completed for both
>>> outputs?) Also, if one output is configured to send a commit at the
>>> end of the job, is this commit held back until the last ingestion has
>>> occurred in the other output?
>>>
>>> Thanks for your help,
>>> Julien
>
> --
> Julien MASSIERA
> Product Development Director
> France Labs – The Search experts
> Datafari – Winner of the 2018 Big Data trophy at the Digital Innovation Makers Summit
> www.francelabs.com