Re: Document Splitter

2020-07-08 Thread Michael Cizmar
Cool.  I'll shift to that approach.  We have a lot of cases where we are 
indexing a CSV, XML, or JSON file and want it split up.


--

Michael Cizmar


Re: Document Splitter

2020-07-08 Thread Karl Wright
Hi all,
Julien is correct; all documents must originate in the document
repository.  You can create document components this way, but they're all
subsidiaries of the principal document, so really the framework only tracks
the principal document in that case.

So you have a choice: either use the component approach, or have each row
be a full document in its own right.

From what I see, the component approach would be the best one.

Karl




Re: Document Splitter

2020-07-08 Thread Michael Cizmar
Good point.  I was thinking that I could call:

    return activities.sendDocument(documentURI, docCopy);

for each row of the XML or JSON.
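
For illustration only, here is a rough sketch of what a per-row loop around a call like the one above might look like. A bare `return` inside such a loop would exit after the first row, so the sketch collects one status per row instead. `RowSender`, `sendAll`, and the `#row=` URI suffix are hypothetical names invented for this example; `RowSender` is a stand-in, not the ManifoldCF activities interface. Note that the other replies in this thread say a transformation connector cannot turn one incoming document into several, so this shows the shape of the idea rather than a working ManifoldCF approach.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only; RowSender is a hypothetical stand-in, not a ManifoldCF interface. */
final class PerRowSendSketch {

  /** Hypothetical stand-in for whatever accepts one document and reports a status code. */
  interface RowSender {
    int send(String documentUri, String rowContent);
  }

  /** Sends every row and collects one status per row instead of returning after the first. */
  static List<Integer> sendAll(String documentUri, List<String> rows, RowSender sender) {
    List<Integer> statuses = new ArrayList<>();
    for (int i = 0; i < rows.size(); i++) {
      // Assumption: each row gets its own URI so rows do not collide on the same key.
      String rowUri = documentUri + "#row=" + i;
      statuses.add(sender.send(rowUri, rows.get(i)));
    }
    return statuses;
  }

  public static void main(String[] args) {
    List<String> sent = new ArrayList<>();
    // A recording sender that remembers what it was given and always reports status 0.
    RowSender recorder = (uri, content) -> {
      sent.add(uri + " -> " + content);
      return 0;
    };
    List<Integer> statuses =
        sendAll("file:///data/items.json", List.of("{\"id\":1}", "{\"id\":2}"), recorder);
    System.out.println(sent);
    System.out.println(statuses);
  }
}
```

The caller decides what to do with the collected statuses; how a real connector should combine them is not covered here.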







RE: Document Splitter

2020-07-08 Thread julien.massiera
Hi Michael, 

If I am not mistaken (and if Karl confirms), what you want to do is not possible 
in a transformation connector. A transformation connector cannot transform one 
incoming document into several. The only way to do that is in a repository 
connector, but it would then be bound to the type of the repository source.

Regards,
Julien




Re: Document Splitter

2020-07-08 Thread Karl Wright
Not that I know of.  But I'll let others answer as to what they may have
written.
Karl


On Tue, Jul 7, 2020 at 7:38 PM Michael Cizmar 
wrote:

> I have a JSON file which has an array of objects that I want to index as
> separate documents.  Before I build a transformer to split it, is there a
> ready-made transformer to do this?
>
> Thanks!
>
> Michael
>
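
Whichever connector type ends up doing the work, the splitting step asked about above can be sketched in plain Java. The following is a minimal sketch, assuming well-formed input whose top level is a JSON array; `JsonArraySplitter` and `split` are hypothetical names made up for this example and are not part of ManifoldCF. It avoids any JSON library by tracking bracket depth and string state, so it returns the raw text of each element without validating it. A real connector would probably prefer a proper JSON parser.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of ManifoldCF: splits a top-level JSON array into element strings. */
final class JsonArraySplitter {

  /** Returns the raw text of each top-level element of the JSON array in {@code json}. */
  static List<String> split(String json) {
    List<String> elements = new ArrayList<>();
    String text = json.trim();
    if (text.length() < 2 || text.charAt(0) != '[' || text.charAt(text.length() - 1) != ']') {
      throw new IllegalArgumentException("Expected a top-level JSON array");
    }
    int depth = 0;            // nesting depth of {} and [] inside the top-level array
    boolean inString = false; // true while scanning inside a JSON string literal
    boolean escaped = false;  // true when the previous character in a string was a backslash
    int start = 1;            // index where the current element begins
    for (int i = 1; i < text.length() - 1; i++) {
      char c = text.charAt(i);
      if (inString) {
        if (escaped) {
          escaped = false;
        } else if (c == '\\') {
          escaped = true;
        } else if (c == '"') {
          inString = false;
        }
      } else if (c == '"') {
        inString = true;
      } else if (c == '{' || c == '[') {
        depth++;
      } else if (c == '}' || c == ']') {
        depth--;
      } else if (c == ',' && depth == 0) {
        // A comma at depth 0 separates two top-level elements.
        addIfNotBlank(elements, text.substring(start, i));
        start = i + 1;
      }
    }
    addIfNotBlank(elements, text.substring(start, text.length() - 1));
    return elements;
  }

  private static void addIfNotBlank(List<String> out, String piece) {
    String trimmed = piece.trim();
    if (!trimmed.isEmpty()) {
      out.add(trimmed);
    }
  }

  public static void main(String[] args) {
    String json = "[{\"id\":1,\"tags\":[\"a\",\"b\"]}, {\"id\":2,\"note\":\"x, y\"}]";
    for (String element : split(json)) {
      System.out.println(element);
    }
  }
}
```

Each returned string could then be handed on as its own document or component, depending on which of the approaches discussed in this thread is chosen.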