Re: Technical question on repo connector dev

2019-10-05 Thread Karl Wright
Yes, that is what I suggest.
Karl




RE: Technical question on repo connector dev

2019-10-05 Thread julien.massiera
Hi Karl, 

Thanks for the answer. 

Is your suggestion something like:

processDocuments(...) {

    if (documentIdentifier.isURI) {
        jsonDocs = getJsonDocsFromURI(documentIdentifier);
        jsonDocs.forEach(jsonDoc -> {
            String jsonDocID = "jsonDoc+" + jsonDoc.toJsonString();
            activities.addDocumentReference(jsonDocID);
        });
    } else if (documentIdentifier.isJsonDoc) {
        jsonDoc = getJsonDoc(documentIdentifier);
        jsonDocVersion = jsonDoc.getVersion();
        jsonDocUri = jsonDoc.getUri();
        if (activities.checkDocumentNeedsReindexing(documentIdentifier, jsonDocVersion)) {
            activities.ingestDocumentWithException(documentIdentifier, jsonDoc, jsonDocUri);
        }
    }
}

?

Julien
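
[Editor's note: one detail worth flagging in the sketch above is that the child document identifier embeds the entire JSON payload via toJsonString(), which makes identifiers large and unstable if any field changes. Since each JSON object carries an id field (per the original question), a shorter stable identifier is possible. A hypothetical variant, where getId() stands in for however the connector extracts that id field:]

    String jsonDocID = "jsonDoc+" + jsonDoc.getId();
    activities.addDocumentReference(jsonDocID);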



Re: Technical question on repo connector dev

2019-10-04 Thread Karl Wright
Hi Julien,

The checkDocumentNeedsReindexing() method is meant to be used inside
processDocuments() for the specific document you are checking.  So if the
document identifier is a URI, you can convert it to a set of JSON documents.
But you will probably want to put the actual data for each document in
carrydown information.  You will also need to create some kind of non-URI
document ID for each of them.

Karl
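
[Editor's note: a pseudocode sketch of the carrydown approach described above. In ManifoldCF, IProcessActivity has addDocumentReference overloads that accept carrydown data names and values, and retrieveParentData reads that data back in the child; the exact signatures below are indicative rather than exact, and helpers such as getJsonDocsFromURI, parseJson, and getId are hypothetical:]

    processDocuments(...) {
        if (documentIdentifier.isURI) {
            jsonDocs = getJsonDocsFromURI(documentIdentifier);
            for (jsonDoc : jsonDocs) {
                // Non-URI child identifier, e.g. built from the JSON id field
                childID = "jsonDoc+" + jsonDoc.getId();
                // Attach the JSON payload as carrydown data so the child
                // does not need to re-fetch the URI
                activities.addDocumentReference(childID, documentIdentifier,
                    relationshipType,
                    new String[]{"jsonData"},
                    new String[][]{{jsonDoc.toJsonString()}});
            }
        } else {
            // Child document: read the payload back from carrydown data
            jsonData = activities.retrieveParentData(documentIdentifier, "jsonData");
            jsonDoc = parseJson(jsonData[0]);
            if (activities.checkDocumentNeedsReindexing(documentIdentifier,
                    jsonDoc.getVersion())) {
                activities.ingestDocumentWithException(documentIdentifier,
                    jsonDoc, jsonDoc.getUri());
            }
        }
    }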




Technical question on repo connector dev

2019-10-04 Thread julien.massiera
Hi, 

 

I am facing a simple technical question concerning repository connector
development that I am not sure how to deal with.

 

I want to develop a repo connector using the ADD_CHANGE_DELETE model: it
will normally add seed documents, and each seed document will produce
several documents.
The problem is that each document produced from a seed doc is instantly
ingestable and does not need further processing.

 

The use case here is that the addSeedDocuments method will call an API that
provides several URIs (seeds).

In the processDocuments method, each URI yields a JSON array of JSON
objects, and those JSON objects are meant to become repository documents
and be ingested.
So the logic would be to use activities.addDocumentReference for each JSON
object before I can use activities.checkDocumentNeedsReindexing (each JSON
object has an id and a version field) and then ingest the document. But by
doing this, I am afraid that the processDocuments method will be called
again with those newly referenced docs even though they do not need to be
processed.

 

Any suggestion about how to deal with this use case is welcome. 

 

Thanks,
Julien
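
[Editor's note: for reference, the seeding side described above could be sketched as follows. addSeedDocument is the ISeedingActivity method used to register a seed; callApiForSeedUris is a hypothetical stand-in for the API call mentioned in the question:]

    addSeedDocuments(...) {
        uris = callApiForSeedUris();
        for (uri : uris) {
            activities.addSeedDocument(uri);
        }
    }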