Hi Ritika,

> My requirement is to abort a job whenever a seed-corresponding site is
> down or returning some 5xx response codes.

(1) Connector methods, like addSeedDocuments(), are called by the
framework.  You do not call them yourself when you write a connector.  So
you are looking in the wrong place here.
(2) All that addSeedDocuments does in the web connector is add seed URLs to
the queue for the job.  You do not want to change this implementation.
(3) The only time the web connector fetches anything is when it is
processing documents, in the processDocuments() method.
(4) You don't get to control the queue.  Documents are processed by the
framework in the order *it* determines they should be processed.  You can
create an "event" which must be satisfied before processing can occur but
that is all the control you get at the connector level.
(5) Similarly, you don't get told which document URLs are seeds.  This
information is in the job, and it is recorded in the "isSeed" field of the
job queue for each document, but it is never sent to any connector method.

It would be possible to add "isSeed" to the IRepositoryConnector
processDocuments() method, but that would change the contract for all
connectors.  You might be able to prevent carnage by creating a
BaseRepositoryConnector method implementation and abstract method that
would provide a shim for most connectors.
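
A minimal sketch of what such a shim could look like.  The types here
(SketchRepositoryConnector, SketchBaseConnector, and the two connector
classes) and the simplified two-argument processDocuments() are
hypothetical and for illustration only; they are not the real ManifoldCF
signatures, which take more parameters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified contract; NOT the real IRepositoryConnector signature.
interface SketchRepositoryConnector {
  // New-style call: the framework would pass one isSeed flag per document.
  void processDocuments(String[] documentIdentifiers, boolean[] isSeed);
}

// Hypothetical stand-in for BaseRepositoryConnector.
abstract class SketchBaseConnector implements SketchRepositoryConnector {
  // Shim: ignore the seed flags and delegate to the old-style method.
  @Override
  public void processDocuments(String[] documentIdentifiers, boolean[] isSeed) {
    processDocuments(documentIdentifiers);
  }

  // Old-style method that existing connectors already implement.
  protected abstract void processDocuments(String[] documentIdentifiers);
}

// An existing connector: its code stays unchanged and never sees the flags.
class LegacyConnector extends SketchBaseConnector {
  final List<String> processed = new ArrayList<>();

  @Override
  protected void processDocuments(String[] documentIdentifiers) {
    for (String id : documentIdentifiers) {
      processed.add(id);
    }
  }
}

// A connector that opts in by overriding the new-style method.
class SeedAwareConnector extends SketchBaseConnector {
  final List<String> seeds = new ArrayList<>();

  @Override
  public void processDocuments(String[] documentIdentifiers, boolean[] isSeed) {
    for (int i = 0; i < documentIdentifiers.length; i++) {
      if (isSeed[i]) {
        seeds.add(documentIdentifiers[i]);
      }
    }
  }

  @Override
  protected void processDocuments(String[] documentIdentifiers) {
    // Old-style callers carry no seed information; treat nothing as a seed.
    processDocuments(documentIdentifiers, new boolean[documentIdentifiers.length]);
  }
}
```

With this arrangement only a connector that cares about seeds, such as
the web connector, would need to override the new method.  The framework
side would still have to be changed to read "isSeed" from the job queue
and pass it in.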

Karl

On Mon, Jul 6, 2020 at 8:52 AM ritika jain <ritikajain5...@gmail.com> wrote:

> Hi All,
>
> I am confused about the WebCrawler connector code. My requirement is to
> abort a job whenever a seed-corresponding site is down or returning some
> 5xx response codes.
> So I have used the jobManager errorAbort method for this
> in the addSeedDocuments method of Webcrawlerconnector.java, and the
> JobStatus class to get a Job ID.
>
> My confusion here is how to get all the seeds corresponding to a given
> job ID. So I used the getAllSeeds() method declared in the IJobManager class.
>
> My query is that getAllSeeds, when used, always returns a zero-length
> array. I suspect this method does not have a corresponding
> definition in its implementation class.
> *Why has this method not been implemented in its implementation class,
> JobManager?*
>
> *Code done is:*
>     String[] array1 = jobManager.getAllSeeds(Long.parseLong(jsr[k].getJobID()));
> array1 is always an empty array.
>
> *Another query is:*
>     public String addSeedDocuments(ISeedingActivity activities, Specification spec,
>         String lastSeedVersion, long seedTime, int jobMode)
>         throws ManifoldCFException, ServiceInterruption
>
> The activities object holds the job ID of the job that is calling this
> addSeedDocuments method, but neither the interface nor its implementation
> class has a getter method to obtain the job ID inside the method (it is
> only set in the constructor).
>
>
> Can anybody please guide me on this?
>
> Thanks
> Ritika