And what other SharePoint parameter could I check?

Jorge Alonso Garcia
On Fri, Dec 20, 2019 at 12:47 PM, Karl Wright <[email protected]> wrote:

The code seems correct and many people are using it without encountering
this problem. There may be another SharePoint configuration parameter you
also need to look at somewhere.

Karl

On Fri, Dec 20, 2019 at 6:38 AM, Jorge Alonso Garcia <[email protected]> wrote:

Hi Karl,
On SharePoint the list view threshold is 150,000, but we only receive
20,000 from MCF.
[image: image.png]

Jorge Alonso Garcia

On Thu, Dec 19, 2019 at 7:19 PM, Karl Wright <[email protected]> wrote:

If the job finished without error, it implies that the number of documents
returned from this one library was 10000 when the service was called the
first time (starting at doc 0), 10000 when it was called the second time
(starting at doc 10000), and zero when it was called the third time
(starting at doc 20000).

The plugin code is unremarkable and actually gets results in chunks of
1000 under the covers:

>>>>>>
SPQuery listQuery = new SPQuery();
listQuery.Query = "<OrderBy Override=\"TRUE\"><FieldRef Name=\"FileRef\" /></OrderBy>";
listQuery.QueryThrottleMode = SPQueryThrottleOption.Override;
listQuery.ViewAttributes = "Scope=\"Recursive\"";
listQuery.ViewFields = "<FieldRef Name='FileRef' />";
listQuery.RowLimit = 1000;

XmlDocument doc = new XmlDocument();
retVal = doc.CreateElement("GetListItems",
    "http://schemas.microsoft.com/sharepoint/soap/directory/");
XmlNode getListItemsNode = doc.CreateElement("GetListItemsResponse");

uint counter = 0;
do
{
    if (counter >= startRowParam + rowLimitParam)
        break;

    SPListItemCollection collListItems = oList.GetItems(listQuery);

    foreach (SPListItem oListItem in collListItems)
    {
        if (counter >= startRowParam && counter < startRowParam + rowLimitParam)
        {
            XmlNode resultNode = doc.CreateElement("GetListItemsResult");
            XmlAttribute idAttribute = doc.CreateAttribute("FileRef");
            idAttribute.Value = oListItem.Url;
            resultNode.Attributes.Append(idAttribute);

            XmlAttribute urlAttribute = doc.CreateAttribute("ListItemURL");
            //urlAttribute.Value = oListItem.ParentList.DefaultViewUrl;
            urlAttribute.Value = string.Format("{0}?ID={1}",
                oListItem.ParentList.Forms[PAGETYPE.PAGE_DISPLAYFORM].ServerRelativeUrl,
                oListItem.ID);
            resultNode.Attributes.Append(urlAttribute);

            getListItemsNode.AppendChild(resultNode);
        }
        counter++;
    }

    listQuery.ListItemCollectionPosition =
        collListItems.ListItemCollectionPosition;

} while (listQuery.ListItemCollectionPosition != null);

retVal.AppendChild(getListItemsNode);
<<<<<<

The code is clearly working if you get 20000 results returned, so I submit
that perhaps there's a configured limit in your SharePoint instance that
prevents listing more than 20000. That's the only way I can explain this.

Karl

On Thu, Dec 19, 2019 at 12:51 PM, Jorge Alonso Garcia <[email protected]> wrote:

Hi,
The job finishes OK (several times), but always with these 20,000
documents; for some reason the loop only executes twice.

Jorge Alonso Garcia

On Thu, Dec 19, 2019 at 6:14 PM, Karl Wright <[email protected]> wrote:

If they are all in one document library, then you'd be running this code:

>>>>>>
int startingIndex = 0;
int amtToRequest = 10000;
while (true)
{
  com.microsoft.sharepoint.webpartpages.GetListItemsResponseGetListItemsResult itemsResult =
    itemCall.getListItems(guid,Integer.toString(startingIndex),Integer.toString(amtToRequest));

  MessageElement[] itemsList = itemsResult.get_any();

  if (Logging.connectors.isDebugEnabled()){
    Logging.connectors.debug("SharePoint: getChildren xml response: " + itemsList[0].toString());
  }

  if (itemsList.length != 1)
    throw new ManifoldCFException("Bad response - expecting one outer 'GetListItems' node, saw "+Integer.toString(itemsList.length));

  MessageElement items = itemsList[0];
  if (!items.getElementName().getLocalName().equals("GetListItems"))
    throw new ManifoldCFException("Bad response - outer node should have been 'GetListItems' node");

  int resultCount = 0;
  Iterator iter = items.getChildElements();
  while (iter.hasNext())
  {
    MessageElement child = (MessageElement)iter.next();
    if (child.getElementName().getLocalName().equals("GetListItemsResponse"))
    {
      Iterator resultIter = child.getChildElements();
      while (resultIter.hasNext())
      {
        MessageElement result = (MessageElement)resultIter.next();
        if (result.getElementName().getLocalName().equals("GetListItemsResult"))
        {
          resultCount++;
          String relPath = result.getAttribute("FileRef");
          String displayURL = result.getAttribute("ListItemURL");
          fileStream.addFile( relPath, displayURL );
        }
      }
    }
  }

  if (resultCount < amtToRequest)
    break;

  startingIndex += resultCount;
}
<<<<<<
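To see what a clean finish at exactly 20,000 documents implies, the termination condition of a loop of this shape can be modeled in isolation. This is a minimal standalone sketch, not the connector's actual code: `fetchChunk` is a hypothetical stand-in for the `itemCall.getListItems` web service call, and the `cap` parameter models a server-side limit on how many items the service will ever list.

```java
import java.util.ArrayList;
import java.util.List;

public class PaginationSketch {
  // Hypothetical stand-in for the getListItems web service call: a source
  // holding 'total' documents that returns at most 'limit' of them starting
  // at index 'start'. A server-side listing cap is modeled by 'cap'.
  static List<Integer> fetchChunk(int total, int cap, int start, int limit) {
    List<Integer> out = new ArrayList<>();
    int end = Math.min(Math.min(start + limit, total), cap);
    for (int i = start; i < end; i++) out.add(i);
    return out;
  }

  // The connector loop's shape: request chunks of 10000 and stop as soon as
  // a chunk comes back smaller than the amount requested.
  static int crawl(int total, int cap) {
    int startingIndex = 0;
    final int amtToRequest = 10000;
    int seen = 0;
    while (true) {
      int got = fetchChunk(total, cap, startingIndex, amtToRequest).size();
      seen += got;
      if (got < amtToRequest)
        break;                 // a short (or empty) chunk ends the crawl
      startingIndex += got;
    }
    return seen;
  }

  public static void main(String[] args) {
    // No cap: 25000 docs arrive in chunks of 10000, 10000, 5000.
    System.out.println(crawl(25000, Integer.MAX_VALUE)); // 25000
    // With a server-side cap of 20000: the third call returns zero results
    // and the loop exits cleanly -- two full chunks, no error.
    System.out.println(crawl(25000, 20000)); // 20000
  }
}
```

Under this model, two full chunks followed by an empty third response and a clean job finish reproduces the reported symptom exactly, which is why a server-side limit is the prime suspect rather than a bug in the loop itself.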
What the connector code above does is request library content URLs in
chunks of 10000. It stops when it receives fewer than 10000 documents from
any one request.

If the documents were all in one library, then one call to the web service
yielded 10000 documents, the second call yielded 10000 documents, and there
was no third call, for no reason I can figure out. Since 10000 documents
were returned each time, the loop ought to just continue, unless there was
some kind of error. Does the job succeed, or does it abort?

Karl

On Thu, Dec 19, 2019 at 12:05 PM, Karl Wright <[email protected]> wrote:

If you are using the MCF plugin, and selecting the appropriate version of
SharePoint in the connection configuration, there is no hard limit I'm
aware of for any SharePoint job. We have lots of other people using
SharePoint and nobody has ever reported this before.

If your SharePoint connection says "SharePoint 2003" as the SharePoint
version, then sure, that would be expected behavior. So please check that
first.

The other question I have is about your description of first getting 10001
documents and then later 20002. That's not how ManifoldCF works. At the
start of the crawl, seeds are added; this would start out being just the
root, and then other documents would be discovered as the crawl proceeded,
after subsites and libraries are discovered. So I am still trying to square
that with your description of how this is working for you.

Are all of your documents in one library? Or two libraries?

Karl

On Thu, Dec 19, 2019 at 11:42 AM, Jorge Alonso Garcia <[email protected]> wrote:

Hi,
The UI shows 20,002 documents (in a first phase it showed 10,001, and after
some processing time it rose to 20,002).
It looks like a hard limit; there are more files on SharePoint matching the
criteria used.

Jorge Alonso Garcia

On Thu, Dec 19, 2019 at 4:05 PM, Karl Wright <[email protected]> wrote:

Hi Jorge,

When you run the job, do you see more than 20,000 documents as part of it?

Do you see *exactly* 20,000 documents as part of it?

Unless you are seeing a hard number like that in the UI for that job on the
job status page, I doubt very much that the problem is a numerical
limitation in the number of documents. I would suspect that the inclusion
criteria, e.g. the mime type or maximum length, are excluding documents.

Karl

On Thu, Dec 19, 2019 at 8:51 AM, Jorge Alonso Garcia <[email protected]> wrote:

Hi Karl,
We have installed the SharePoint plugin, and can access
http://server/_vti_bin/MCPermissions.asmx properly.

[image: image.png]

SharePoint has more than 20,000 documents, but when the job executes it
only extracts these 20,000. How can I check where the issue is?

Regards

Jorge Alonso Garcia

On Thu, Dec 19, 2019 at 12:52 PM, Karl Wright <[email protected]> wrote:

By "stop at 20,000" do you mean that it finds more than 20,000 but stops
crawling at that point? Or what exactly do you mean here?
FWIW, the behavior you describe sounds like you may not have installed the
SharePoint plugin, or may have selected a version of SharePoint that is
inappropriate. All SharePoint versions after 2008 limit the number of
documents returned using the standard web services methods. The plugin
allows us to bypass that hard limit.

Karl

On Thu, Dec 19, 2019 at 6:37 AM, Jorge Alonso Garcia <[email protected]> wrote:

Hi,
We have an issue with the SharePoint connector.
There is a job that crawls a SharePoint 2016 instance, but it is not
recovering all files; it stops at 20,000 documents without any error.
Is there any parameter that should be changed to avoid this limitation?

Regards
Jorge Alonso Garcia
