Re: sharepoint crawler documents limit
I'm glad you got past this. Thanks for letting us know what the issue was.

Karl

On Mon, Jan 27, 2020 at 4:05 AM Jorge Alonso Garcia wrote:
> Hi,
> We had changed the timeout on the SharePoint IIS and now the process is
> able to crawl all documents.
> Thanks for your help
>
> On Mon, Dec 30, 2019 at 12:18, Gaurav G () wrote:
>> We had faced a similar issue, wherein our repo had 100,000 documents but
>> our crawler stopped after 5 documents. The issue turned out to be that
>> the SharePoint query fired by the SharePoint web service gets
>> progressively slower, and eventually the connection starts timing out
>> before the next 1 records get returned. We increased a timeout parameter
>> on SharePoint to 10 minutes, and after that we were able to crawl all
>> documents successfully. I believe we had increased the parameter
>> indicated in the link below:
>>
>> https://weblogs.asp.net/jeffwids/how-to-increase-the-timeout-for-a-sharepoint-2010-website
>>
>> On Fri, Dec 20, 2019 at 6:27 PM Karl Wright wrote:
>>> Hi Priya,
>>>
>>> This has nothing to do with anything in ManifoldCF.
>>>
>>> Karl
>>>
>>> On Fri, Dec 20, 2019 at 7:56 AM Priya Arora wrote:
>>>> Hi All,
>>>>
>>>> Is this issue something to do with the below values/parameters set in
>>>> properties.xml?
>>>> [image: image.png]
>>>>
>>>> On Fri, Dec 20, 2019 at 5:21 PM Jorge Alonso Garcia wrote:
>>>>> And what other SharePoint parameters could I check?
>>>>>
>>>>> Jorge Alonso Garcia
>>>>>
>>>>> On Fri, Dec 20, 2019 at 12:47, Karl Wright wrote:
>>>>>> The code seems correct and many people are using it without
>>>>>> encountering this problem. There may be another SharePoint
>>>>>> configuration parameter you also need to look at somewhere.
>>>>>> Karl
>>>>>>
>>>>>> On Fri, Dec 20, 2019 at 6:38 AM Jorge Alonso Garcia <jalon...@gmail.com> wrote:
>>>>>>> Hi Karl,
>>>>>>> On SharePoint the list view threshold is 150,000, but we only
>>>>>>> receive 20,000 from MCF.
>>>>>>> [image: image.png]
>>>>>>>
>>>>>>> Jorge Alonso Garcia
>>>>>>>
>>>>>>> On Thu, Dec 19, 2019 at 19:19, Karl Wright wrote:
>>>>>>>> If the job finished without error, it implies that the number of
>>>>>>>> documents returned from this one library was 1 when the service is
>>>>>>>> called the first time (starting at doc 0), 1 when it's called the
>>>>>>>> second time (starting at doc 1), and zero when it is called the
>>>>>>>> third time (starting at doc 2).
>>>>>>>>
>>>>>>>> The plugin code is unremarkable and actually gets results in
>>>>>>>> chunks of 1000 under the covers:
>>>>>>>>
>>>>>>>>     SPQuery listQuery = new SPQuery();
>>>>>>>>     listQuery.Query = "<OrderBy Override=\"TRUE\"><FieldRef Name=\"FileRef\" /></OrderBy>";
>>>>>>>>     listQuery.QueryThrottleMode = SPQueryThrottleOption.Override;
>>>>>>>>     listQuery.ViewAttributes = "Scope=\"Recursive\"";
>>>>>>>>     listQuery.ViewFields = "<FieldRef Name='FileRef' />";
>>>>>>>>     listQuery.RowLimit = 1000;
>>>>>>>>
>>>>>>>>     XmlDocument doc = new XmlDocument();
>>>>>>>>     retVal = doc.CreateElement("GetListItems",
>>>>>>>>         "http://schemas.microsoft.com/sharepoint/soap/directory/");
>>>>>>>>     XmlNode getListItemsNode = doc.CreateElement("GetListItemsResponse");
>>>>>>>>
>>>>>>>>     uint counter = 0;
>>>>>>>>     do
>>>>>>>>     {
>>>>>>>>         if (counter >= startRowParam + rowLimitParam)
>>>>>>>>             break;
>>>>>>>>
>>>>>>>>         SPListItemCollection collListItems = oList.GetItems(listQuery);
>>>>>>>>
>>>>>>>>         foreach (SPListItem oListItem in collListItems)
>>>>>>>>         {
>>>>>>>>             if (counter >= startRowParam && counter < startRowParam + rowLimitParam)
>>>>>>>>             {
>>>>>>>>                 XmlNode resultNode = doc.CreateElement("GetListItemsResult");
>>>>>>>>                 XmlAttribute idAttribute = doc.CreateAttribute("FileRef");
>>>>>>>>                 idAttribute.Value = oListItem.Url;
>>>>>>>>                 resultNode.Attributes.Append(idAttribute);
>>>>>>>>                 XmlAttribute urlAttribute = doc.CreateAttribute("ListItemURL");
>>>>>>>>                 //urlAttribute.Value = oListItem.ParentList.DefaultViewUrl;
>>>>>>>>                 urlAttribute.Value = string.Format("{0}?ID={1}",
>>>>>>>>                     oListItem.ParentList.Forms[PAGETYPE.PAGE_DISPLAYFORM].ServerRelativeUrl,
>>>>>>>>                     oListItem.ID);
>>>>>>>>                 resultNode.Attributes.Append(urlAttribute);
>>>>>>>>                 getListItemsNode.AppendChild(resultNode);
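[Editor's note] The fix described above — raising the timeout on the SharePoint IIS site — amounts to increasing the ASP.NET execution timeout in the web.config of the SharePoint web application, per the linked article. A minimal sketch (the file path and exact element placement are assumptions; 600 seconds matches the 10 minutes reported in the thread):

```xml
<!-- web.config of the SharePoint web application, e.g. under
     C:\inetpub\wwwroot\wss\VirtualDirectories\<port>\ -->
<configuration>
  <system.web>
    <!-- executionTimeout is in seconds; 600 = 10 minutes -->
    <httpRuntime executionTimeout="600" />
  </system.web>
</configuration>
```

Note that executionTimeout is only enforced when compilation debug="false"; an IIS application-pool recycle (or iisreset) makes the change take effect.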
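[Editor's note] Karl's diagnosis can be modeled in a few lines: the crawl loop keeps requesting the next chunk at an increasing start row until a call returns zero documents, so a server-side timeout that yields an empty response is indistinguishable from a clean end-of-library, and the job "finishes without error" early. A toy sketch, not the ManifoldCF code — all names and the chunk size are illustrative:

```python
def get_list_items(repo, start_row, row_limit, timeout_after=None):
    """Return up to row_limit items starting at start_row.

    An empty list simulates either the library being exhausted or the
    service connection timing out before the chunk is returned.
    """
    if timeout_after is not None and start_row >= timeout_after:
        return []  # connection timed out: looks just like end-of-list
    return repo[start_row:start_row + row_limit]

def crawl(repo, row_limit=10_000, timeout_after=None):
    """Model of the crawl loop: page forward until a call returns nothing."""
    seen, start = [], 0
    while True:
        chunk = get_list_items(repo, start, row_limit, timeout_after)
        if not chunk:          # zero items => crawler assumes it is done
            return seen
        seen.extend(chunk)
        start += row_limit

repo = list(range(150_000))                 # a library of 150,000 documents
print(len(crawl(repo)))                     # healthy server: full crawl
print(len(crawl(repo, timeout_after=20_000)))  # times out after 20,000 docs
```

This is consistent with Jorge's symptom: a 150,000-item library, a clean job completion, and only 20,000 documents received.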