Re: Crawling / Indexation Query

2020-05-30 Thread Karl Wright
We can't.  You need to follow the instructions and send email to the
appropriate address, listed here:

http://manifoldcf.apache.org/en_US/mail.html

Karl


On Sat, May 30, 2020 at 4:40 PM Shashank Saurabh 
wrote:

> Please unsubscribe me from your mailing list.
>
> Thanks,
> Shashank
>
> On Thu, May 7, 2020 at 4:11 PM Karl Wright  wrote:
>
>> Hi,
>>
>> ManifoldCF is not a crawler, it's a synchronizer.  If robots says not to
>> crawl something, then it will not be indexed.  If robots is changed to
>> prohibit crawling of certain documents, then yes, those documents will be
>> removed from the index.
>>
>> But you can override the robots behavior in the document specification or
>> configuration, I believe.
>>
>> Karl
>>
>>
>> On Thu, May 7, 2020 at 6:27 AM ritika jain 
>> wrote:
>>
>>> Hi All,
>>>
>>> Can anybody explain:
>>> If a URL was indexed, and afterwards a noindex tag was added, will that
>>> URL then be deleted from the index when it is visited again by the crawler?
>>>
>>>
>>> Say a URL previously had a meta tag that allowed indexing and was
>>> present in the Elastic index, but then afterwards
>>> 
>>> was added to the page design.
>>>
>>> Should it be deleted from the index when the ManifoldCF job crawls that
>>> URL again, or will the URL still be present in the index?
>>>
>>> Thanks
>>>
>>>
>>>
>>
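
The synchronizer behaviour Karl describes above (a document that robots
rules no longer allow is removed from the index on the next run) can be
sketched roughly as follows. This is only an illustration, not ManifoldCF
code: the function name synchronize, the example.com URLs, and the use of
Python's standard urllib.robotparser are all assumptions, and it covers
robots.txt rules only, not the noindex meta tag from the original question.

```python
from urllib.robotparser import RobotFileParser


def synchronize(indexed_urls, robots_lines, user_agent="*"):
    """Hypothetical helper: split indexed URLs into kept and removed sets.

    A URL is kept only if the given robots.txt lines still allow it.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from an in-memory list of lines
    kept = {url for url in indexed_urls if parser.can_fetch(user_agent, url)}
    removed = set(indexed_urls) - kept
    return kept, removed


# Hypothetical index contents.
index = {
    "https://example.com/public/b.html",
    "https://example.com/private/a.html",
}

# First run: robots.txt allows everything, so nothing is removed.
kept_before, removed_before = synchronize(index, ["User-agent: *", "Disallow:"])

# Later run: robots.txt now prohibits /private/, so that document is dropped.
kept_after, removed_after = synchronize(
    index, ["User-agent: *", "Disallow: /private/"]
)
```

A real job would also issue the delete to the output index (Elasticsearch
in this thread); the sketch only computes which documents would go.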


Re: Crawling / Indexation Query

2020-05-30 Thread Shashank Saurabh
Please unsubscribe me from your mailing list.

Thanks,
Shashank

On Thu, May 7, 2020 at 4:11 PM Karl Wright  wrote:

> Hi,
>
> ManifoldCF is not a crawler, it's a synchronizer.  If robots says not to
> crawl something, then it will not be indexed.  If robots is changed to
> prohibit crawling of certain documents, then yes, those documents will be
> removed from the index.
>
> But you can override the robots behavior in the document specification or
> configuration, I believe.
>
> Karl
>
>
> On Thu, May 7, 2020 at 6:27 AM ritika jain 
> wrote:
>
>> Hi All,
>>
>> Can anybody explain:
>> If a URL was indexed, and afterwards a noindex tag was added, will that
>> URL then be deleted from the index when it is visited again by the crawler?
>>
>>
>> Say a URL previously had a meta tag that allowed indexing and was
>> present in the Elastic index, but then afterwards
>> 
>> was added to the page design.
>>
>> Should it be deleted from the index when the ManifoldCF job crawls that
>> URL again, or will the URL still be present in the index?
>>
>> Thanks
>>
>>
>>
>