Hi Karl,
On SharePoint the list view threshold is 150,000, but we only receive 20,000 from MCF.

Jorge Alonso Garcia
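A note on the exact 20,000 figure: the plugin query below runs with SPQueryThrottleOption.Override, and an overridden query is capped not by the normal list view threshold (150,000 here) but by the separate list view threshold for auditors and administrators, SPWebApplication.MaxItemsPerThrottledOperationOverride, which defaults to exactly 20,000. A minimal sketch for reading both limits, assuming farm-level access, a reference to Microsoft.SharePoint.dll, and a placeholder web application URL:

>>>>>>
using System;
using Microsoft.SharePoint.Administration;

class ThrottleCheck
{
    static void Main()
    {
        // "http://server/" is a placeholder for the real web application URL.
        SPWebApplication webApp = SPWebApplication.Lookup(new Uri("http://server/"));

        // Normal list view threshold (reported as 150,000 on this farm).
        Console.WriteLine("List view threshold: "
            + webApp.MaxItemsPerThrottledOperation);

        // Cap applied to SPQueryThrottleOption.Override queries (default 20,000).
        Console.WriteLine("Auditor/admin override threshold: "
            + webApp.MaxItemsPerThrottledOperationOverride);
    }
}
<<<<<<

If that second number is 20,000, it would line up exactly with where the crawl stops.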
On Thu, Dec 19, 2019 at 7:19 PM Karl Wright (<daddy...@gmail.com>) wrote:

If the job finished without error, it implies that the number of documents returned from this one library was 10000 when the service was called the first time (starting at doc 0), 10000 when it was called the second time (starting at doc 10000), and zero when it was called the third time (starting at doc 20000).

The plugin code is unremarkable and actually gets results in chunks of 1000 under the covers:

>>>>>>
SPQuery listQuery = new SPQuery();
listQuery.Query = "<OrderBy Override=\"TRUE\"><FieldRef Name=\"FileRef\" /></OrderBy>";
listQuery.QueryThrottleMode = SPQueryThrottleOption.Override;
listQuery.ViewAttributes = "Scope=\"Recursive\"";
listQuery.ViewFields = "<FieldRef Name='FileRef' />";
listQuery.RowLimit = 1000;

XmlDocument doc = new XmlDocument();
retVal = doc.CreateElement("GetListItems",
    "http://schemas.microsoft.com/sharepoint/soap/directory/");
XmlNode getListItemsNode = doc.CreateElement("GetListItemsResponse");

uint counter = 0;
do
{
    if (counter >= startRowParam + rowLimitParam)
        break;

    SPListItemCollection collListItems = oList.GetItems(listQuery);

    foreach (SPListItem oListItem in collListItems)
    {
        if (counter >= startRowParam && counter < startRowParam + rowLimitParam)
        {
            XmlNode resultNode = doc.CreateElement("GetListItemsResult");
            XmlAttribute idAttribute = doc.CreateAttribute("FileRef");
            idAttribute.Value = oListItem.Url;
            resultNode.Attributes.Append(idAttribute);

            XmlAttribute urlAttribute = doc.CreateAttribute("ListItemURL");
            //urlAttribute.Value = oListItem.ParentList.DefaultViewUrl;
            urlAttribute.Value = string.Format("{0}?ID={1}",
                oListItem.ParentList.Forms[PAGETYPE.PAGE_DISPLAYFORM].ServerRelativeUrl,
                oListItem.ID);
            resultNode.Attributes.Append(urlAttribute);

            getListItemsNode.AppendChild(resultNode);
        }
        counter++;
    }

    listQuery.ListItemCollectionPosition = collListItems.ListItemCollectionPosition;

} while (listQuery.ListItemCollectionPosition != null);

retVal.AppendChild(getListItemsNode);
<<<<<<

The code is clearly working if you get 20000 results returned, so I submit that perhaps there's a configured limit in your SharePoint instance that prevents listing more than 20000. That's the only way I can explain this.

Karl

On Thu, Dec 19, 2019 at 12:51 PM Jorge Alonso Garcia <jalon...@gmail.com> wrote:

Hi,
The job finishes OK (several times), but always with these same 20000 documents; for some reason the loop only executes twice.

Jorge Alonso Garcia
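If a configured 20,000-document cap is indeed the cause, one candidate remedy is raising the override threshold on the web application (Central Administration exposes the same setting as "List View Threshold for Auditors and Administrators"). A hedged sketch using the server object model; the URL and the 60000 value are placeholders, and raising throttle limits has farm-wide performance implications:

>>>>>>
using System;
using Microsoft.SharePoint.Administration;

class RaiseOverrideThreshold
{
    static void Main()
    {
        // Placeholder URL; must run on a farm server with sufficient rights.
        SPWebApplication webApp = SPWebApplication.Lookup(new Uri("http://server/"));

        // Example value only: choose something larger than the biggest library.
        webApp.MaxItemsPerThrottledOperationOverride = 60000;
        webApp.Update(); // persist the change to the configuration database
    }
}
<<<<<<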
On Thu, Dec 19, 2019 at 6:14 PM Karl Wright (<daddy...@gmail.com>) wrote:

If they are all in one library, then you'd be running this code:

>>>>>>
int startingIndex = 0;
int amtToRequest = 10000;
while (true)
{
  com.microsoft.sharepoint.webpartpages.GetListItemsResponseGetListItemsResult itemsResult =
    itemCall.getListItems(guid, Integer.toString(startingIndex), Integer.toString(amtToRequest));

  MessageElement[] itemsList = itemsResult.get_any();

  if (Logging.connectors.isDebugEnabled()){
    Logging.connectors.debug("SharePoint: getChildren xml response: " + itemsList[0].toString());
  }

  if (itemsList.length != 1)
    throw new ManifoldCFException("Bad response - expecting one outer 'GetListItems' node, saw " + Integer.toString(itemsList.length));

  MessageElement items = itemsList[0];
  if (!items.getElementName().getLocalName().equals("GetListItems"))
    throw new ManifoldCFException("Bad response - outer node should have been 'GetListItems' node");

  int resultCount = 0;
  Iterator iter = items.getChildElements();
  while (iter.hasNext())
  {
    MessageElement child = (MessageElement)iter.next();
    if (child.getElementName().getLocalName().equals("GetListItemsResponse"))
    {
      Iterator resultIter = child.getChildElements();
      while (resultIter.hasNext())
      {
        MessageElement result = (MessageElement)resultIter.next();
        if (result.getElementName().getLocalName().equals("GetListItemsResult"))
        {
          resultCount++;
          String relPath = result.getAttribute("FileRef");
          String displayURL = result.getAttribute("ListItemURL");
          fileStream.addFile(relPath, displayURL);
        }
      }
    }
  }

  if (resultCount < amtToRequest)
    break;

  startingIndex += resultCount;
}
<<<<<<

What this does is request library content URLs in chunks of 10000. It stops when it receives fewer than 10000 documents from any one request.

If the documents were all in one library, then one call to the web service yielded 10000 documents, the second call yielded 10000 documents, and there was no third call, for no reason I can figure out. Since 10000 documents were returned each time, the loop ought to just continue, unless there was some kind of error. Does the job succeed, or does it abort?

Karl

On Thu, Dec 19, 2019 at 12:05 PM Karl Wright <daddy...@gmail.com> wrote:

If you are using the MCF plugin, and selecting the appropriate version of SharePoint in the connection configuration, there is no hard limit I'm aware of for any SharePoint job. We have lots of other people using SharePoint, and nobody has ever reported this before.

If your SharePoint connection says "SharePoint 2003" as the SharePoint version, then sure, that would be expected behavior. So please check that first.

The other question I have is about your description of first getting 10,001 documents and then later 20,002. That's not how ManifoldCF works. At the start of the crawl, seeds are added; this would start out as just the root, and then other documents would be discovered as the crawl proceeded, after subsites and libraries are discovered. So I am still trying to square that with your description of how this is working for you.

Are all of your documents in one library? Or two libraries?

Karl
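For what it's worth, a server-side cap of exactly 20,000 would reproduce the reported behavior in the loop quoted above: the third call (starting at doc 20000) returns zero rows, zero is below amtToRequest, and the loop exits cleanly with no error. A small self-contained simulation of that arithmetic; all the constants are assumptions for illustration:

>>>>>>
using System;

class PagingSimulation
{
    const int TotalOnServer = 150000; // items actually in the library (assumed)
    const int ServerCap = 20000;      // hard cap applied server-side (assumed)
    const int ChunkSize = 10000;      // MCF's amtToRequest

    // Stand-in for the web-service call: how many items a capped server
    // would return for a given starting index.
    static int GetListItems(int startingIndex)
    {
        int available = Math.Min(TotalOnServer, ServerCap) - startingIndex;
        return Math.Max(0, Math.Min(ChunkSize, available));
    }

    static void Main()
    {
        int startingIndex = 0;
        while (true)
        {
            int resultCount = GetListItems(startingIndex);
            Console.WriteLine("start={0} returned={1}", startingIndex, resultCount);
            if (resultCount < ChunkSize)
                break; // same exit condition as the MCF loop
            startingIndex += resultCount;
        }
        // Prints 10000, 10000, then 0: the loop exits without error after
        // exactly 20000 documents, matching the observed job behavior.
        Console.WriteLine("Total crawled: " + startingIndex);
    }
}
<<<<<<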
On Thu, Dec 19, 2019 at 11:42 AM Jorge Alonso Garcia <jalon...@gmail.com> wrote:

Hi,
The UI shows 20,002 documents (in a first phase it showed 10,001, and after some further processing it rose to 20,002). It looks like a hard limit; there are more files in SharePoint that match the criteria used.

Jorge Alonso Garcia

On Thu, Dec 19, 2019 at 4:05 PM Karl Wright (<daddy...@gmail.com>) wrote:

Hi Jorge,

When you run the job, do you see more than 20,000 documents as part of it?

Do you see *exactly* 20,000 documents as part of it?

Unless you are seeing a hard number like that in the UI for that job on the job status page, I doubt very much that the problem is a numerical limitation on the number of documents. I would suspect that the inclusion criteria, e.g. the mime type or maximum length, are excluding documents.

Karl

On Thu, Dec 19, 2019 at 8:51 AM Jorge Alonso Garcia <jalon...@gmail.com> wrote:

Hi Karl,
We have installed the SharePoint plugin, and we can access http://server/_vti_bin/MCPermissions.asmx properly.

SharePoint has more than 20,000 documents, but when the job executes it only extracts those 20,000. How can I check where the issue is?

Regards

Jorge Alonso Garcia

On Thu, Dec 19, 2019 at 12:52 PM Karl Wright (<daddy...@gmail.com>) wrote:

By "stop at 20,000" do you mean that it finds more than 20,000 but stops crawling at that point? Or what exactly do you mean here?

FWIW, the behavior you describe sounds like you may not have installed the SharePoint plugin, and may have selected a version of SharePoint that is inappropriate. All SharePoint versions after 2008 limit the number of documents returned through the standard web services methods. The plugin allows us to bypass that hard limit.

Karl

On Thu, Dec 19, 2019 at 6:37 AM Jorge Alonso Garcia <jalon...@gmail.com> wrote:

Hi,
We have an issue with the SharePoint connector.
There is a job that crawls a SharePoint 2016 instance, but it is not retrieving all the files; it stops at 20,000 documents without any error.
Is there any parameter that should be changed to avoid this limitation?

Regards,
Jorge Alonso Garcia
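One quick sanity check for the plugin itself is the MCPermissions.asmx endpoint mentioned above: if it answers, the plugin service is at least deployed on the web application being crawled. A minimal sketch, assuming Windows authentication and a placeholder server URL:

>>>>>>
using System;
using System.Net.Http;
using System.Threading.Tasks;

class PluginCheck
{
    static async Task Main()
    {
        // Placeholder URL; on-prem SharePoint normally wants Windows auth.
        var handler = new HttpClientHandler { UseDefaultCredentials = true };
        using (var client = new HttpClient(handler))
        {
            HttpResponseMessage response =
                await client.GetAsync("http://server/_vti_bin/MCPermissions.asmx");
            // 200 means the service page is reachable; 404 suggests the plugin
            // is not installed on this web application.
            Console.WriteLine((int)response.StatusCode + " " + response.ReasonPhrase);
        }
    }
}
<<<<<<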