I'm glad you got past this. Thanks for letting us know what the issue was.

Karl
On Mon, Jan 27, 2020 at 4:05 AM Jorge Alonso Garcia <jalon...@gmail.com> wrote:
> Hi,
> We changed the timeout on the SharePoint IIS site and now the process is
> able to crawl all documents.
> Thanks for your help
>
> On Mon, Dec 30, 2019 at 12:18, Gaurav G (<goyalgaur...@gmail.com>) wrote:
>
>> We had faced a similar issue: our repo had 100,000 documents, but our
>> crawler stopped after 50,000. The issue turned out to be that the
>> SharePoint query fired by the SharePoint web service gets progressively
>> slower, and eventually the connection starts timing out before the next
>> 10,000 records are returned. We increased a timeout parameter on
>> SharePoint to 10 minutes, and after that we were able to crawl all
>> documents successfully. I believe we increased the parameter indicated
>> in the link below:
>>
>> https://weblogs.asp.net/jeffwids/how-to-increase-the-timeout-for-a-sharepoint-2010-website
>>
>> On Fri, Dec 20, 2019 at 6:27 PM Karl Wright <daddy...@gmail.com> wrote:
>>
>>> Hi Priya,
>>>
>>> This has nothing to do with anything in ManifoldCF.
>>>
>>> Karl
>>>
>>> On Fri, Dec 20, 2019 at 7:56 AM Priya Arora <pr...@smartshore.nl> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Is this issue something to do with the values/parameters set below in
>>>> properties.xml?
>>>> [image: image.png]
>>>>
>>>> On Fri, Dec 20, 2019 at 5:21 PM Jorge Alonso Garcia <jalon...@gmail.com>
>>>> wrote:
>>>>
>>>>> And what other SharePoint parameter could I check?
>>>>>
>>>>> Jorge Alonso Garcia
>>>>>
>>>>> On Fri, Dec 20, 2019 at 12:47, Karl Wright (<daddy...@gmail.com>)
>>>>> wrote:
>>>>>
>>>>>> The code seems correct and many people are using it without
>>>>>> encountering this problem. There may be another SharePoint
>>>>>> configuration parameter you also need to look at somewhere.
>>>>>> Karl
>>>>>>
>>>>>> On Fri, Dec 20, 2019 at 6:38 AM Jorge Alonso Garcia
>>>>>> <jalon...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Karl,
>>>>>>> On SharePoint the list view threshold is 150,000, but we only
>>>>>>> receive 20,000 from MCF.
>>>>>>> [image: image.png]
>>>>>>>
>>>>>>> Jorge Alonso Garcia
>>>>>>>
>>>>>>> On Thu, Dec 19, 2019 at 19:19, Karl Wright (<daddy...@gmail.com>)
>>>>>>> wrote:
>>>>>>>
>>>>>>>> If the job finished without error, it implies that the number of
>>>>>>>> documents returned from this one library was 10000 when the service
>>>>>>>> is called the first time (starting at doc 0), 10000 when it's called
>>>>>>>> the second time (starting at doc 10000), and zero when it is called
>>>>>>>> the third time (starting at doc 20000).
>>>>>>>>
>>>>>>>> The plugin code is unremarkable and actually gets results in chunks
>>>>>>>> of 1000 under the covers:
>>>>>>>>
>>>>>>>> >>>>>>
>>>>>>>> SPQuery listQuery = new SPQuery();
>>>>>>>> listQuery.Query = "<OrderBy Override=\"TRUE\"><FieldRef Name=\"FileRef\" /></OrderBy>";
>>>>>>>> listQuery.QueryThrottleMode = SPQueryThrottleOption.Override;
>>>>>>>> listQuery.ViewAttributes = "Scope=\"Recursive\"";
>>>>>>>> listQuery.ViewFields = "<FieldRef Name='FileRef' />";
>>>>>>>> listQuery.RowLimit = 1000;
>>>>>>>>
>>>>>>>> XmlDocument doc = new XmlDocument();
>>>>>>>> retVal = doc.CreateElement("GetListItems",
>>>>>>>>     "http://schemas.microsoft.com/sharepoint/soap/directory/");
>>>>>>>> XmlNode getListItemsNode = doc.CreateElement("GetListItemsResponse");
>>>>>>>>
>>>>>>>> uint counter = 0;
>>>>>>>> do
>>>>>>>> {
>>>>>>>>     if (counter >= startRowParam + rowLimitParam)
>>>>>>>>         break;
>>>>>>>>
>>>>>>>>     SPListItemCollection collListItems = oList.GetItems(listQuery);
>>>>>>>>
>>>>>>>>     foreach (SPListItem oListItem in collListItems)
>>>>>>>>     {
>>>>>>>>         if (counter >= startRowParam && counter < startRowParam + rowLimitParam)
>>>>>>>>         {
>>>>>>>>             XmlNode resultNode = doc.CreateElement("GetListItemsResult");
>>>>>>>>             XmlAttribute idAttribute = doc.CreateAttribute("FileRef");
>>>>>>>>             idAttribute.Value = oListItem.Url;
>>>>>>>>             resultNode.Attributes.Append(idAttribute);
>>>>>>>>             XmlAttribute urlAttribute = doc.CreateAttribute("ListItemURL");
>>>>>>>>             //urlAttribute.Value = oListItem.ParentList.DefaultViewUrl;
>>>>>>>>             urlAttribute.Value = string.Format("{0}?ID={1}",
>>>>>>>>                 oListItem.ParentList.Forms[PAGETYPE.PAGE_DISPLAYFORM].ServerRelativeUrl,
>>>>>>>>                 oListItem.ID);
>>>>>>>>             resultNode.Attributes.Append(urlAttribute);
>>>>>>>>             getListItemsNode.AppendChild(resultNode);
>>>>>>>>         }
>>>>>>>>         counter++;
>>>>>>>>     }
>>>>>>>>
>>>>>>>>     listQuery.ListItemCollectionPosition = collListItems.ListItemCollectionPosition;
>>>>>>>>
>>>>>>>> } while (listQuery.ListItemCollectionPosition != null);
>>>>>>>>
>>>>>>>> retVal.AppendChild(getListItemsNode);
>>>>>>>> <<<<<<
>>>>>>>>
>>>>>>>> The code is clearly working if you get 20000 results returned, so I
>>>>>>>> submit that perhaps there's a configured limit in your SharePoint
>>>>>>>> instance that prevents listing more than 20000. That's the only way
>>>>>>>> I can explain this.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Thu, Dec 19, 2019 at 12:51 PM Jorge Alonso Garcia
>>>>>>>> <jalon...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>> The job finishes OK (several times), but always with these 20000
>>>>>>>>> documents; for some reason the loop only executes twice.
>>>>>>>>>
>>>>>>>>> Jorge Alonso Garcia
>>>>>>>>>
>>>>>>>>> On Thu, Dec 19, 2019 at 18:14, Karl Wright (<daddy...@gmail.com>)
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> If they are all in one library, then you'd be running this code:
>>>>>>>>>>
>>>>>>>>>> >>>>>>
>>>>>>>>>> int startingIndex = 0;
>>>>>>>>>> int amtToRequest = 10000;
>>>>>>>>>> while (true)
>>>>>>>>>> {
>>>>>>>>>>   com.microsoft.sharepoint.webpartpages.GetListItemsResponseGetListItemsResult itemsResult =
>>>>>>>>>>     itemCall.getListItems(guid,Integer.toString(startingIndex),Integer.toString(amtToRequest));
>>>>>>>>>>
>>>>>>>>>>   MessageElement[] itemsList = itemsResult.get_any();
>>>>>>>>>>
>>>>>>>>>>   if (Logging.connectors.isDebugEnabled()){
>>>>>>>>>>     Logging.connectors.debug("SharePoint: getChildren xml response: " + itemsList[0].toString());
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>>   if (itemsList.length != 1)
>>>>>>>>>>     throw new ManifoldCFException("Bad response - expecting one outer 'GetListItems' node, saw "+Integer.toString(itemsList.length));
>>>>>>>>>>
>>>>>>>>>>   MessageElement items = itemsList[0];
>>>>>>>>>>   if (!items.getElementName().getLocalName().equals("GetListItems"))
>>>>>>>>>>     throw new ManifoldCFException("Bad response - outer node should have been 'GetListItems' node");
>>>>>>>>>>
>>>>>>>>>>   int resultCount = 0;
>>>>>>>>>>   Iterator iter = items.getChildElements();
>>>>>>>>>>   while (iter.hasNext())
>>>>>>>>>>   {
>>>>>>>>>>     MessageElement child = (MessageElement)iter.next();
>>>>>>>>>>     if (child.getElementName().getLocalName().equals("GetListItemsResponse"))
>>>>>>>>>>     {
>>>>>>>>>>       Iterator resultIter = child.getChildElements();
>>>>>>>>>>       while (resultIter.hasNext())
>>>>>>>>>>       {
>>>>>>>>>>         MessageElement result = (MessageElement)resultIter.next();
>>>>>>>>>>         if (result.getElementName().getLocalName().equals("GetListItemsResult"))
>>>>>>>>>>         {
>>>>>>>>>>           resultCount++;
>>>>>>>>>>           String relPath = result.getAttribute("FileRef");
>>>>>>>>>>           String displayURL = result.getAttribute("ListItemURL");
>>>>>>>>>>           fileStream.addFile( relPath, displayURL );
>>>>>>>>>>         }
>>>>>>>>>>       }
>>>>>>>>>>     }
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>>   if (resultCount < amtToRequest)
>>>>>>>>>>     break;
>>>>>>>>>>
>>>>>>>>>>   startingIndex += resultCount;
>>>>>>>>>> }
>>>>>>>>>> <<<<<<
>>>>>>>>>>
>>>>>>>>>> What this does is request library content URLs in chunks of 10000.
>>>>>>>>>> It stops when it receives fewer than 10000 documents from any one
>>>>>>>>>> request.
>>>>>>>>>>
>>>>>>>>>> If the documents were all in one library, then one call to the web
>>>>>>>>>> service yielded 10000 documents, the second call yielded 10000
>>>>>>>>>> documents, and there was no third call, for reasons I cannot
>>>>>>>>>> figure out. Since 10000 documents were returned each time, the
>>>>>>>>>> loop ought to just continue, unless there was some kind of error.
>>>>>>>>>> Does the job succeed, or does it abort?
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>> On Thu, Dec 19, 2019 at 12:05 PM Karl Wright <daddy...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> If you are using the MCF plugin, and selecting the appropriate
>>>>>>>>>>> version of SharePoint in the connection configuration, there is
>>>>>>>>>>> no hard limit I'm aware of for any SharePoint job. We have lots
>>>>>>>>>>> of other people using SharePoint and nobody has ever reported
>>>>>>>>>>> this before.
>>>>>>>>>>>
>>>>>>>>>>> If your SharePoint connection says "SharePoint 2003" as the
>>>>>>>>>>> SharePoint version, then sure, that would be expected behavior.
>>>>>>>>>>> So please check that first.
>>>>>>>>>>>
>>>>>>>>>>> The other question I have is about your description of first
>>>>>>>>>>> getting 10001 documents and then later 20002. That's not how
>>>>>>>>>>> ManifoldCF works.
>>>>>>>>>>> At the start of the crawl, seeds are added; this would start out
>>>>>>>>>>> being just the root, and then other documents would be discovered
>>>>>>>>>>> as the crawl proceeded, after subsites and libraries are
>>>>>>>>>>> discovered. So I am still trying to square that with your
>>>>>>>>>>> description of how this is working for you.
>>>>>>>>>>>
>>>>>>>>>>> Are all of your documents in one library? Or two libraries?
>>>>>>>>>>>
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Dec 19, 2019 at 11:42 AM Jorge Alonso Garcia
>>>>>>>>>>> <jalon...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> The UI shows 20,002 documents (in a first phase it showed
>>>>>>>>>>>> 10,001, and after some time of processing it rose to 20,002).
>>>>>>>>>>>> It looks like a hard limit; there are more files on SharePoint
>>>>>>>>>>>> matching the criteria used.
>>>>>>>>>>>>
>>>>>>>>>>>> Jorge Alonso Garcia
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Dec 19, 2019 at 16:05, Karl Wright (<daddy...@gmail.com>)
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Jorge,
>>>>>>>>>>>>>
>>>>>>>>>>>>> When you run the job, do you see more than 20,000 documents as
>>>>>>>>>>>>> part of it?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you see *exactly* 20,000 documents as part of it?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Unless you are seeing a hard number like that in the UI for
>>>>>>>>>>>>> that job on the job status page, I doubt very much that the
>>>>>>>>>>>>> problem is a numerical limitation on the number of documents.
>>>>>>>>>>>>> I would suspect that the inclusion criteria, e.g. the mime type
>>>>>>>>>>>>> or maximum length, are excluding documents.
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Dec 19, 2019 at 8:51 AM Jorge Alonso Garcia
>>>>>>>>>>>>> <jalon...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>> We have installed the SharePoint plugin, and can access
>>>>>>>>>>>>>> http:/server/_vti_bin/MCPermissions.asmx properly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> SharePoint has more than 20,000 documents, but when we execute
>>>>>>>>>>>>>> the job it only extracts those 20,000. How can I check where
>>>>>>>>>>>>>> the issue is?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jorge Alonso Garcia
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Dec 19, 2019 at 12:52, Karl Wright
>>>>>>>>>>>>>> (<daddy...@gmail.com>) wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> By "stop at 20,000" do you mean that it finds more than
>>>>>>>>>>>>>>> 20,000 but stops crawling at that point? Or what exactly do
>>>>>>>>>>>>>>> you mean here?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> FWIW, the behavior you describe sounds like you may not have
>>>>>>>>>>>>>>> installed the SharePoint plugin, and may have selected a
>>>>>>>>>>>>>>> version of SharePoint that is inappropriate. All SharePoint
>>>>>>>>>>>>>>> versions after 2008 limit the number of documents returned
>>>>>>>>>>>>>>> using the standard web services methods. The plugin allows
>>>>>>>>>>>>>>> us to bypass that hard limit.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Dec 19, 2019 at 6:37 AM Jorge Alonso Garcia
>>>>>>>>>>>>>>> <jalon...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> We have an issue with the SharePoint connector.
>>>>>>>>>>>>>>>> There is a job that crawls a SharePoint 2016 site, but it
>>>>>>>>>>>>>>>> is not retrieving all files; it stops at 20,000 documents
>>>>>>>>>>>>>>>> without any error.
>>>>>>>>>>>>>>>> Is there any parameter that should be changed to avoid this
>>>>>>>>>>>>>>>> limitation?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>> Jorge Alonso Garcia
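
The timeout fix described at the top of the thread (and in the linked
weblogs.asp.net article) is an ASP.NET request-timeout change on the
SharePoint web application. A minimal sketch of that kind of change,
assuming the standard httpRuntime setting; the web.config path and the
other attributes already present vary by installation and should be
preserved:

```xml
<!-- web.config of the SharePoint web application (path varies by farm).
     executionTimeout is in seconds; 600 matches the 10 minutes that
     resolved the crawl in this thread. Merge with existing attributes
     rather than replacing the element wholesale. -->
<configuration>
  <system.web>
    <httpRuntime executionTimeout="600" />
  </system.web>
</configuration>
```

After editing web.config, an IIS application pool recycle (or iisreset) is
typically needed for the new timeout to take effect.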
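
The chunked-request loop Karl quotes mid-thread stops as soon as one call
returns fewer results than it asked for, which is why a timeout on the
third call can silently end the crawl at exactly 20,000 documents. A
self-contained sketch of that stop condition, with a hypothetical
fetchPage standing in for the real itemCall.getListItems web-service call:

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedCrawl {
    // Hypothetical stand-in for the SharePoint web-service call: returns
    // up to 'limit' item URLs starting at index 'start', out of 'total'.
    static List<String> fetchPage(int start, int limit, int total) {
        List<String> page = new ArrayList<>();
        for (int i = start; i < Math.min(start + limit, total); i++) {
            page.add("/sites/docs/item" + i);
        }
        return page;
    }

    // Same pattern as the connector loop: keep requesting chunks and stop
    // on the first page that is shorter than the chunk size.
    static int crawlAll(int chunkSize, int total) {
        int startingIndex = 0;
        int crawled = 0;
        while (true) {
            List<String> page = fetchPage(startingIndex, chunkSize, total);
            crawled += page.size();
            if (page.size() < chunkSize) {
                break; // short (or empty) page: no more documents
            }
            startingIndex += page.size();
        }
        return crawled;
    }

    public static void main(String[] args) {
        // 25,000 documents take three calls: 10000 + 10000 + 5000.
        System.out.println(crawlAll(10000, 25000)); // prints 25000
        // An exact multiple of the chunk size needs one extra, empty call
        // (10000 + 10000 + 0), mirroring the zero-row third call Karl
        // describes; the loop still terminates correctly.
        System.out.println(crawlAll(10000, 20000)); // prints 20000
    }
}
```

The key point for debugging: if the server times out instead of returning
a short page, the loop never sees its stop condition and the caller sees a
truncated crawl rather than an error, which matches the behavior reported
here.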