If you are using the MCF plugin, and selecting the appropriate version of
SharePoint in the connection configuration, there is no hard limit I'm
aware of for any SharePoint job.  We have lots of other people using
SharePoint, and nobody has ever reported this before.

If your SharePoint connection has "SharePoint 2003" selected as the
SharePoint version, then yes, this would be expected behavior.  So please
check that first.

The other question I have is about your description of first getting 10,001
documents and then later 20,002.  That's not how ManifoldCF works.  At the
start of the crawl, seeds are added; this would start out being just the
root, and then other documents would be discovered as the crawl proceeded,
after subsites and libraries are discovered.  So I am still trying to
square that with your description of how this is working for you.
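To illustrate what I mean by seed-and-discover: this is not ManifoldCF code, just a minimal sketch of the crawl model, with a made-up toy site tree.  The document count grows as containers (subsites, libraries) are expanded, rather than arriving as one fixed batch:

```python
from collections import deque

def crawl(seed, children):
    """Breadth-first discovery: start from the seed and add new
    documents as containers (subsites, libraries) are expanded.
    `children` maps a container id to the ids it contains."""
    seen = set()
    frontier = deque([seed])
    while frontier:
        doc = frontier.popleft()
        if doc in seen:
            continue
        seen.add(doc)
        frontier.extend(children.get(doc, []))
    return seen

# Hypothetical site tree: root -> two libraries -> documents
tree = {
    "root": ["libA", "libB"],
    "libA": ["libA/doc%d" % i for i in range(3)],
    "libB": ["libB/doc%d" % i for i in range(4)],
}
print(len(crawl("root", tree)))  # 1 root + 2 libraries + 7 docs = 10
```

In this model, an exact plateau at a round number like 20,000 points to a per-request cap somewhere in the pipeline, not to the crawl "finishing".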

Are all of your documents in one library?  Or two libraries?

Karl

On Thu, Dec 19, 2019 at 11:42 AM Jorge Alonso Garcia <jalon...@gmail.com>
wrote:

> Hi,
> The UI shows 20,002 documents (in a first phase it showed 10,001, and
> after some further processing it rose to 20,002).
> It looks like a hard limit; there are more files on SharePoint matching
> the criteria used.
>
>
> Jorge Alonso Garcia
>
>
>
> El jue., 19 dic. 2019 a las 16:05, Karl Wright (<daddy...@gmail.com>)
> escribió:
>
>> Hi Jorge,
>>
>> When you run the job, do you see more than 20,000 documents as part of it?
>>
>> Do you see *exactly* 20,000 documents as part of it?
>>
>> Unless you are seeing a hard number like that in the UI for that job on
>> the job status page, I doubt very much that the problem is a numerical
>> limitation in the number of documents.  I would suspect that the inclusion
>> criteria, e.g. the mime type or maximum length, is excluding documents.
>>
>> Karl
>>
>>
>> On Thu, Dec 19, 2019 at 8:51 AM Jorge Alonso Garcia <jalon...@gmail.com>
>> wrote:
>>
>>> Hi Karl,
>>> We have installed the SharePoint plugin, and can access
>>> http://server/_vti_bin/MCPermissions.asmx properly.
>>>
>>> SharePoint has more than 20,000 documents, but when the job executes it
>>> only extracts those 20,000.  How can I check where the issue is?
>>>
>>> Regards
>>>
>>>
>>> Jorge Alonso Garcia
>>>
>>>
>>>
>>> El jue., 19 dic. 2019 a las 12:52, Karl Wright (<daddy...@gmail.com>)
>>> escribió:
>>>
>>>> By "stop at 20,000" do you mean that it finds more than 20,000 but
>>>> stops crawling at that time?  Or what exactly do you mean here?
>>>>
>>>> FWIW, the behavior you describe sounds like you may not have installed
>>>> the SharePoint plugin and may have selected a version of SharePoint that is
>>>> inappropriate.  All SharePoint versions after 2008 limit the number of
>>>> documents returned using the standard web services methods.  The plugin
>>>> allows us to bypass that hard limit.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Thu, Dec 19, 2019 at 6:37 AM Jorge Alonso Garcia <jalon...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>> We have an issue with the SharePoint connector.
>>>>> There is a job that crawls a SharePoint 2016 site, but it is not
>>>>> retrieving all files; it stops at 20,000 documents without any error.
>>>>> Is there any parameter that should be changed to avoid this limitation?
>>>>>
>>>>> Regards
>>>>> Jorge Alonso Garcia
>>>>>
>>>>>
