Hi Ritika,

For performance reasons, you do not want to load the list of seeds every
time a document is processed.  The connector API does not support
accessing arbitrary job data, in part for this reason.

You should also NEVER call JobManager methods from a connector.
You have *Activity methods that you can call instead.
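
For the shim idea in the earlier message quoted below (a base-class
implementation plus an abstract method, so that most connectors are
unaffected by an added "isSeed" argument), a minimal sketch might look like
this.  Every type and signature here is a hypothetical, simplified stand-in
chosen for illustration; the real IRepositoryConnector.processDocuments()
takes more arguments, and this is not the actual ManifoldCF API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins. These are NOT the real ManifoldCF
// signatures; they only illustrate the shape of the shim.
final class SeedShimSketch {

  /** Extended contract: the framework also says which documents are seeds. */
  interface ConnectorSketch {
    void processDocuments(String[] documentIdentifiers, boolean[] isSeed);
  }

  /** Plays the role of BaseRepositoryConnector: it supplies the shim. */
  abstract static class BaseConnectorSketch implements ConnectorSketch {
    // Shim: connectors that do not care about seeds inherit this method and
    // keep implementing only the legacy one-argument method below.
    @Override
    public void processDocuments(String[] documentIdentifiers, boolean[] isSeed) {
      processDocuments(documentIdentifiers);
    }

    /** Legacy entry point that existing connectors already implement. */
    protected abstract void processDocuments(String[] documentIdentifiers);
  }

  /** An existing connector: unchanged, unaware of seeds. */
  static final class LegacyConnector extends BaseConnectorSketch {
    final List<String> processed = new ArrayList<>();

    @Override
    protected void processDocuments(String[] documentIdentifiers) {
      for (String id : documentIdentifiers) {
        processed.add(id);
      }
    }
  }

  /** A connector that opts in by overriding the two-argument method. */
  static final class SeedAwareConnector extends BaseConnectorSketch {
    final List<String> seedsSeen = new ArrayList<>();

    @Override
    public void processDocuments(String[] documentIdentifiers, boolean[] isSeed) {
      for (int i = 0; i < documentIdentifiers.length; i++) {
        if (isSeed[i]) {
          seedsSeen.add(documentIdentifiers[i]);
        }
      }
    }

    @Override
    protected void processDocuments(String[] documentIdentifiers) {
      // Legacy callers carry no seed information: treat nothing as a seed.
      processDocuments(documentIdentifiers, new boolean[documentIdentifiers.length]);
    }
  }

  public static void main(String[] args) {
    String[] ids = {"http://example.com/", "http://example.com/page"};
    boolean[] isSeed = {true, false};

    LegacyConnector legacy = new LegacyConnector();
    legacy.processDocuments(ids, isSeed); // goes through the shim
    System.out.println(legacy.processed);

    SeedAwareConnector aware = new SeedAwareConnector();
    aware.processDocuments(ids, isSeed);
    System.out.println(aware.seedsSeen);
  }
}
```

In this sketch only a connector that wants the seed flag overrides the
two-argument method; every other connector compiles and behaves as before.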

Karl


On Tue, Jul 7, 2020 at 4:04 AM ritika jain <ritikajain5...@gmail.com> wrote:

> Hi Karl,
>
> Many thanks for your response!
>
> The problem I faced was getting the current job ID, which is why I used the
> JobStatus class. The other problem is getting the seeds that correspond to
> the running job ID.
>
> The activities object has the job ID set in its constructor, but there is
> no way to read that value in WebCrawlerConnector.java because no getter is
> defined.
>
> Also, JobManager has a getAllSeeds function that is declared in its
> interface, IJobManager, but is not defined in its implementation class,
> JobManager, so it always returns an empty value.
>
> Thanks
>
>
> On Mon, Jul 6, 2020 at 6:44 PM Karl Wright <daddy...@gmail.com> wrote:
>
>> Hi Ritika,
>>
>> "My requirement is to abort a job whenever a seed-corresponding site is
>> down or returning some 5xx response codes."
>>
>> (1) Connector methods, like addSeedDocuments(), are called by the
>> framework.  You do not call them yourself when you write a connector.  So
>> you are looking in the wrong place here.
>> (2) All that addSeedDocuments does in the web connector is add seed URLs
>> to the queue for the job.  You do not want to change this implementation.
>> (3) The only time the web connector fetches anything is when it is
>> processing documents, in the processDocuments() method.
>> (4) You don't get to control the queue.  Documents are processed by the
>> framework in the order *it* determines they should be processed.  You can
>> create an "event" which must be satisfied before processing can occur but
>> that is all the control you get at the connector level.
>> (5) Similarly, you don't get told which document URLs are seeds.  This
>> information is in the job, and it is included in the job queue "isSeed"
>> field for each document, but it is never sent to any connector method.
>>
>> It would therefore be possible to add "isSeed" to the IRepositoryConnector
>> processDocuments() method, but that would change the contract for all
>> connectors.  You might be able to prevent carnage by creating a
>> BaseRepositoryConnector method implementation and abstract method that
>> would provide a shim for most connectors.
>>
>> Karl
>>
>>
>>
>>
>>
>>
>> On Mon, Jul 6, 2020 at 8:52 AM ritika jain <ritikajain5...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I am confused about the WebCrawler connector code. My requirement is
>>> to abort a job whenever a seed-corresponding site is down or returning some
>>> 5xx response codes.
>>> So I have used the jobManager errorAbort method for this
>>> in the addSeedDocuments method of Webcrawlerconnector.java, and the
>>> JobStatus class to get a job ID.
>>>
>>> My confusion is how to get all the seeds corresponding to a given job
>>> ID. I used the getAllSeeds() method declared in the IJobManager interface.
>>>
>>> My question is why getAllSeeds always returns a zero-length
>>> array. I suspect this method does not have a corresponding
>>> definition in its implementation class.
>>> *Why has this method not been implemented in its implementation class,
>>> JobManager?*
>>>
>>> *The code is:*
>>>     String[] array1 = jobManager.getAllSeeds(Long.parseLong(jsr[k].getJobID()));
>>> array1 is always an empty array.
>>>
>>> *Another question:*
>>> public String addSeedDocuments(ISeedingActivity activities,
>>> Specification spec,
>>>     String lastSeedVersion, long seedTime, int jobMode)
>>>     throws ManifoldCFException, ServiceInterruption
>>>
>>> The activities object has the job ID of the job that is calling this
>>> addSeedDocuments method, but neither the interface nor its implementation
>>> class has a getter method to read the job ID (it is only set in the
>>> constructor).
>>>
>>>
>>> Can anybody please guide me on this?
>>>
>>> Thanks
>>> Ritika
>>>
>>>
>>>
>>>
