Re: sharepoint crawler documents limit

2020-01-27 Thread Karl Wright
I'm glad you got past this.  Thanks for letting us know what the issue was.
Karl

On Mon, Jan 27, 2020 at 4:05 AM Jorge Alonso Garcia wrote:

> Hi,
> We changed the timeout on the SharePoint IIS site and now the process is
> able to crawl all of the documents.
> Thanks for your help
>
>
>
> On Mon, Dec 30, 2019 at 12:18, Gaurav G wrote:
>
>> We had faced a similar issue, wherein our repo had 100,000 documents but
>> our crawler stopped after 5 documents. The issue turned out to be that
>> the SharePoint query fired by the SharePoint web service gets
>> progressively slower, and eventually the connection starts timing out
>> before the next 1,000 records are returned. We increased a timeout
>> parameter on SharePoint to 10 minutes, and after that we were able to
>> crawl all documents successfully.  I believe we had increased the
>> parameter indicated in the link below:
>>
>>
>> https://weblogs.asp.net/jeffwids/how-to-increase-the-timeout-for-a-sharepoint-2010-website
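>>
>> From memory, the change that article describes is along these lines, in
>> the web application's web.config on the SharePoint server (values are
>> illustrative; 600 seconds is the 10 minutes mentioned above):
>>
>>     <!-- web.config of the SharePoint IIS web application -->
>>     <system.web>
>>       <httpRuntime executionTimeout="600" maxRequestLength="51200" />
>>     </system.web>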
>>
>>
>>
>> On Fri, Dec 20, 2019 at 6:27 PM Karl Wright  wrote:
>>
>>> Hi Priya,
>>>
>>> This has nothing to do with anything in ManifoldCF.
>>>
>>> Karl
>>>
>>>
>>> On Fri, Dec 20, 2019 at 7:56 AM Priya Arora  wrote:
>>>
 Hi All,

 Does this issue have something to do with the values/parameters below,
 set in properties.xml?
 [image: image.png]
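
 (The screenshot does not survive in the archive. For context, the
 ManifoldCF properties.xml entries usually tuned for crawl throughput
 look like the following; the property names are real ManifoldCF
 properties, the values are illustrative only:)

     <property name="org.apache.manifoldcf.crawler.threads" value="30"/>
     <property name="org.apache.manifoldcf.database.maxhandles" value="50"/>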


 On Fri, Dec 20, 2019 at 5:21 PM Jorge Alonso Garcia wrote:

> And what other SharePoint parameter could I check?
>
> Jorge Alonso Garcia
>
>
>
> On Fri, Dec 20, 2019 at 12:47, Karl Wright wrote:
>
>> The code seems correct and many people are using it without
>> encountering this problem.  There may be another SharePoint configuration
>> parameter you also need to look at somewhere.
>>
>> Karl
>>
>>
>> On Fri, Dec 20, 2019 at 6:38 AM Jorge Alonso Garcia <jalon...@gmail.com> wrote:
>>
>>>
>>> Hi Karl,
>>> On SharePoint the list view threshold is 150,000, but we only receive
>>> 20,000 from MCF.
>>> [image: image.png]
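>>>
>>> (A quick way to read that limit from the server object model, assuming
>>> farm-admin access on the server and an illustrative URL, is something
>>> like:)
>>>
>>>     using System;
>>>     using Microsoft.SharePoint.Administration;
>>>
>>>     class CheckThreshold
>>>     {
>>>         static void Main()
>>>         {
>>>             // Prints the web application's "List View Threshold"
>>>             // (throttle limit). The URL is illustrative.
>>>             SPWebApplication webApp =
>>>                 SPWebApplication.Lookup(new Uri("http://sharepoint-server/"));
>>>             Console.WriteLine(webApp.MaxItemsPerThrottledOperation);
>>>         }
>>>     }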
>>>
>>>
>>> Jorge Alonso Garcia
>>>
>>>
>>>
>>> On Thu, Dec 19, 2019 at 19:19, Karl Wright wrote:
>>>
 If the job finished without error, it implies that the number of
 documents returned from this one library was 10,000 when the service was
 called the first time (starting at doc 0), 10,000 when it was called the
 second time (starting at doc 10,000), and zero when it was called the
 third time (starting at doc 20,000).
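
 Spelled out, assuming the connector requests 10,000 rows per call (which
 is what those numbers suggest): call 1 returns docs 0-9,999, call 2
 returns docs 10,000-19,999, and call 3 returns nothing, so the crawl
 stops at exactly the 20,000 documents you are seeing.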

 The plugin code is unremarkable and actually gets results in chunks
 of 1000 under the covers:

    SPQuery listQuery = new SPQuery();
    // The CAML strings below were eaten by the mail archive's HTML
    // stripping; they are reconstructed here and may differ in detail.
    listQuery.Query =
        "<OrderBy Override=\"TRUE\"><FieldRef Name=\"FileRef\" /></OrderBy>";
    listQuery.QueryThrottleMode = SPQueryThrottleOption.Override;
    listQuery.ViewAttributes = "Scope=\"Recursive\"";
    listQuery.ViewFields = "<FieldRef Name='FileRef' />";
    listQuery.RowLimit = 1000;

    XmlDocument doc = new XmlDocument();
    retVal = doc.CreateElement("GetListItems",
        "http://schemas.microsoft.com/sharepoint/soap/directory/");
    XmlNode getListItemsNode = doc.CreateElement("GetListItemsResponse");

    uint counter = 0;
    do
    {
        // Stop once we have walked past the requested window.
        if (counter >= startRowParam + rowLimitParam)
            break;

        SPListItemCollection collListItems = oList.GetItems(listQuery);

        foreach (SPListItem oListItem in collListItems)
        {
            // Only items inside [startRowParam, startRowParam + rowLimitParam)
            // are copied into the response.
            if (counter >= startRowParam &&
                counter < startRowParam + rowLimitParam)
            {
                XmlNode resultNode = doc.CreateElement("GetListItemsResult");
                XmlAttribute idAttribute = doc.CreateAttribute("FileRef");
                idAttribute.Value = oListItem.Url;
                resultNode.Attributes.Append(idAttribute);
                XmlAttribute urlAttribute = doc.CreateAttribute("ListItemURL");
                //urlAttribute.Value = oListItem.ParentList.DefaultViewUrl;
                urlAttribute.Value = string.Format("{0}?ID={1}",
                    oListItem.ParentList.Forms[PAGETYPE.PAGE_DISPLAYFORM].ServerRelativeUrl,
                    oListItem.ID);
                resultNode.Attributes.Append(urlAttribute);
                getListItemsNode.AppendChild(resultNode);
            }
            counter++;
        }
        // The posted snippet was truncated here; the loop tail below is the
        // standard SPQuery paging idiom and is assumed, not quoted.
        listQuery.ListItemCollectionPosition = collListItems.ListItemCollectionPosition;
    }
    while (listQuery.ListItemCollectionPosition != null);
