Hi all,

Is this issue related to the values/parameters set in properties.xml below?

[image: image.png]
On Fri, Dec 20, 2019 at 5:21 PM Jorge Alonso Garcia <[email protected]> wrote:

And what other SharePoint parameter could I check?

Jorge Alonso Garcia

On Fri, Dec 20, 2019 at 12:47 PM Karl Wright <[email protected]> wrote:

The code seems correct and many people are using it without encountering this problem. There may be another SharePoint configuration parameter you also need to look at somewhere.

Karl

On Fri, Dec 20, 2019 at 6:38 AM Jorge Alonso Garcia <[email protected]> wrote:

Hi Karl,
On SharePoint the list view threshold is 150,000, but we only receive 20,000 documents from MCF.

[image: image.png]

Jorge Alonso Garcia

On Thu, Dec 19, 2019 at 7:19 PM Karl Wright <[email protected]> wrote:

If the job finished without error, it implies that the number of documents returned from this one library was 10000 when the service was called the first time (starting at doc 0), 10000 when it was called the second time (starting at doc 10000), and zero when it was called the third time (starting at doc 20000).

The plugin code is unremarkable and actually gets results in chunks of 1000 under the covers:

>>>>>>
SPQuery listQuery = new SPQuery();
listQuery.Query = "<OrderBy Override=\"TRUE\"><FieldRef Name=\"FileRef\" /></OrderBy>";
listQuery.QueryThrottleMode = SPQueryThrottleOption.Override;
listQuery.ViewAttributes = "Scope=\"Recursive\"";
listQuery.ViewFields = "<FieldRef Name='FileRef' />";
listQuery.RowLimit = 1000;

XmlDocument doc = new XmlDocument();
retVal = doc.CreateElement("GetListItems",
    "http://schemas.microsoft.com/sharepoint/soap/directory/");
XmlNode getListItemsNode = doc.CreateElement("GetListItemsResponse");

uint counter = 0;
do
{
    if (counter >= startRowParam + rowLimitParam)
        break;

    SPListItemCollection collListItems = oList.GetItems(listQuery);

    foreach (SPListItem oListItem in collListItems)
    {
        if (counter >= startRowParam && counter < startRowParam + rowLimitParam)
        {
            XmlNode resultNode = doc.CreateElement("GetListItemsResult");
            XmlAttribute idAttribute = doc.CreateAttribute("FileRef");
            idAttribute.Value = oListItem.Url;
            resultNode.Attributes.Append(idAttribute);
            XmlAttribute urlAttribute = doc.CreateAttribute("ListItemURL");
            //urlAttribute.Value = oListItem.ParentList.DefaultViewUrl;
            urlAttribute.Value = string.Format("{0}?ID={1}",
                oListItem.ParentList.Forms[PAGETYPE.PAGE_DISPLAYFORM].ServerRelativeUrl,
                oListItem.ID);
            resultNode.Attributes.Append(urlAttribute);
            getListItemsNode.AppendChild(resultNode);
        }
        counter++;
    }

    listQuery.ListItemCollectionPosition = collListItems.ListItemCollectionPosition;

} while (listQuery.ListItemCollectionPosition != null);

retVal.AppendChild(getListItemsNode);
<<<<<<

The code is clearly working if you get 20000 results returned, so I submit that perhaps there's a configured limit in your SharePoint instance that prevents listing more than 20000. That's the only way I can explain this.

Karl
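A plausible candidate for that configured limit, offered here as a hypothesis the thread itself does not confirm: the plugin issues its query with QueryThrottleMode = SPQueryThrottleOption.Override, and SharePoint throttles even overridden queries at the web application's MaxItemsPerThrottledOperationOverride value (surfaced in Central Administration as the "List View Threshold for Auditors and Administrators"), which defaults to exactly 20,000. Raising the ordinary list view threshold to 150,000 leaves that second ceiling untouched. Below is a minimal sketch of how a farm administrator might inspect and raise it, run on the SharePoint server with a reference to Microsoft.SharePoint.dll; the web application URL and the new value are illustrative assumptions.

>>>>>>
// Hedged sketch: inspect the two throttle ceilings on the crawled web application.
// Run on a SharePoint server as a farm administrator; "http://server/" is an
// assumption standing in for the real web application URL.
using System;
using Microsoft.SharePoint.Administration;

class ThrottleCheck
{
    static void Main()
    {
        SPWebApplication wa = SPWebApplication.Lookup(new Uri("http://server/"));

        // Ordinary list view threshold (the value raised to 150,000 above).
        Console.WriteLine("MaxItemsPerThrottledOperation: "
            + wa.MaxItemsPerThrottledOperation);

        // Ceiling for queries issued with SPQueryThrottleOption.Override;
        // the default is 20,000, which would match the crawl stopping there.
        Console.WriteLine("MaxItemsPerThrottledOperationOverride: "
            + wa.MaxItemsPerThrottledOperationOverride);

        // To raise the ceiling, uncomment deliberately (illustrative value):
        // wa.MaxItemsPerThrottledOperationOverride = 300000;
        // wa.Update();
    }
}
<<<<<<

If this ceiling is the cause, the third service call simply returns zero results and the job finishes cleanly at 20,000, exactly as described above.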
On Thu, Dec 19, 2019 at 12:51 PM Jorge Alonso Garcia <[email protected]> wrote:

Hi,
The job finishes OK (several times), but always with these 20,000 documents; for some reason the loop only executes twice.

Jorge Alonso Garcia

On Thu, Dec 19, 2019 at 6:14 PM Karl Wright <[email protected]> wrote:

If they are all in one library, then you'd be running this code:

>>>>>>
int startingIndex = 0;
int amtToRequest = 10000;
while (true)
{
  com.microsoft.sharepoint.webpartpages.GetListItemsResponseGetListItemsResult itemsResult =
    itemCall.getListItems(guid, Integer.toString(startingIndex), Integer.toString(amtToRequest));

  MessageElement[] itemsList = itemsResult.get_any();

  if (Logging.connectors.isDebugEnabled()) {
    Logging.connectors.debug("SharePoint: getChildren xml response: " + itemsList[0].toString());
  }

  if (itemsList.length != 1)
    throw new ManifoldCFException("Bad response - expecting one outer 'GetListItems' node, saw " + Integer.toString(itemsList.length));

  MessageElement items = itemsList[0];
  if (!items.getElementName().getLocalName().equals("GetListItems"))
    throw new ManifoldCFException("Bad response - outer node should have been 'GetListItems' node");

  int resultCount = 0;
  Iterator iter = items.getChildElements();
  while (iter.hasNext())
  {
    MessageElement child = (MessageElement)iter.next();
    if (child.getElementName().getLocalName().equals("GetListItemsResponse"))
    {
      Iterator resultIter = child.getChildElements();
      while (resultIter.hasNext())
      {
        MessageElement result = (MessageElement)resultIter.next();
        if (result.getElementName().getLocalName().equals("GetListItemsResult"))
        {
          resultCount++;
          String relPath = result.getAttribute("FileRef");
          String displayURL = result.getAttribute("ListItemURL");
          fileStream.addFile( relPath, displayURL );
        }
      }
    }
  }

  if (resultCount < amtToRequest)
    break;

  startingIndex += resultCount;
}
<<<<<<

What this does is request library content URLs in chunks of 10000. It stops when it receives fewer than 10000 documents from any one request.

If the documents were all in one library, then one call to the web service yielded 10000 documents, the second call yielded 10000 documents, and there was no third call, for no reason I can figure out. Since 10000 documents were returned each time, the loop ought to just continue, unless there was some kind of error. Does the job succeed, or does it abort?

Karl
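To make the loop's exit condition concrete, here is a self-contained sketch of the same chunked-paging contract, written in C# for consistency with the plugin code above rather than as a claim about the Java connector. The simulated server is hypothetical: it silently caps every crawl at 20,000 items, so the loop sees chunks of 10,000, then 10,000, then 0, and exits without any error; that reproduces a job that "finishes OK" at exactly 20,000 documents.

>>>>>>
// Minimal sketch of the connector's chunked-paging contract against a
// hypothetical server that silently caps results at 20,000 items.
using System;
using System.Collections.Generic;
using System.Linq;

class PagingSketch
{
    // Hypothetical stand-in for the GetListItems web-service call.
    // Simulates a library holding 150,000 items whose server caps
    // any single crawl at 20,000 (the suspected configured limit).
    static List<string> GetChunk(int startingIndex, int amtToRequest)
    {
        const int serverCap = 20000;
        int available = Math.Max(0, serverCap - startingIndex);
        return Enumerable.Range(startingIndex, Math.Min(available, amtToRequest))
                         .Select(i => "item" + i)
                         .ToList();
    }

    static void Main()
    {
        int startingIndex = 0;
        const int amtToRequest = 10000;
        int total = 0;

        while (true)
        {
            List<string> chunk = GetChunk(startingIndex, amtToRequest);
            total += chunk.Count;

            // The loop only stops when a chunk comes back short. A capped
            // third call returns 0 items, so the job ends cleanly: no
            // error, just a crawl frozen at 20,000 documents.
            if (chunk.Count < amtToRequest)
                break;

            startingIndex += chunk.Count;
        }

        Console.WriteLine(total); // prints 20000
    }
}
<<<<<<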
On Thu, Dec 19, 2019 at 12:05 PM Karl Wright <[email protected]> wrote:

If you are using the MCF plugin, and selecting the appropriate version of SharePoint in the connection configuration, there is no hard limit I'm aware of for any SharePoint job. We have lots of other people using SharePoint, and nobody has ever reported this before.

If your SharePoint connection says "SharePoint 2003" as the SharePoint version, then sure, that would be expected behavior. So please check that first.

The other question I have is about your description of first getting 10001 documents and then later 20002. That's not how ManifoldCF works. At the start of the crawl, seeds are added; this would start out just being the root, and then other documents would be discovered as the crawl proceeded, after subsites and libraries are discovered. So I am still trying to square that with your description of how this is working for you.

Are all of your documents in one library? Or two libraries?

Karl

On Thu, Dec 19, 2019 at 11:42 AM Jorge Alonso Garcia <[email protected]> wrote:

Hi,
The UI shows 20,002 documents (in a first phase it showed 10,001, and after some processing time it rose to 20,002). It looks like a hard limit; there are more files on SharePoint matching the criteria used.

Jorge Alonso Garcia

On Thu, Dec 19, 2019 at 4:05 PM Karl Wright <[email protected]> wrote:

Hi Jorge,

When you run the job, do you see more than 20,000 documents as part of it?

Do you see *exactly* 20,000 documents as part of it?

Unless you are seeing a hard number like that in the UI for that job on the job status page, I doubt very much that the problem is a numerical limitation in the number of documents. I would suspect that the inclusion criteria, e.g. the MIME type or maximum length, are excluding documents.

Karl

On Thu, Dec 19, 2019 at 8:51 AM Jorge Alonso Garcia <[email protected]> wrote:

Hi Karl,
We have installed the SharePoint plugin, and can access http://server/_vti_bin/MCPermissions.asmx properly.

[image: image.png]

SharePoint has more than 20,000 documents, but when the job executes it only extracts these 20,000. How can I check where the issue is?

Regards

Jorge Alonso Garcia

On Thu, Dec 19, 2019 at 12:52 PM Karl Wright <[email protected]> wrote:

By "stop at 20,000" do you mean that it finds more than 20,000 but stops crawling at that point? Or what exactly do you mean here?

FWIW, the behavior you describe sounds like you may not have installed the SharePoint plugin and may have selected a version of SharePoint that is inappropriate. All SharePoint versions after 2008 limit the number of documents returned using the standard web services methods. The plugin allows us to bypass that hard limit.

Karl
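Karl's first check (is the plugin actually installed?) can be partially verified from outside: a deployed .asmx service answers a plain GET with its service description page. A minimal probe follows, in which the server URL and the crawl account's credentials are assumptions for illustration.

>>>>>>
// Sketch: confirm the MCF plugin's web service is deployed on the server.
// The URL and credentials below are placeholders, not values from the thread.
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

class PluginEndpointCheck
{
    static async Task Main()
    {
        var handler = new HttpClientHandler
        {
            // Windows credentials of the crawl account (assumed for illustration).
            Credentials = new NetworkCredential("crawluser", "password", "DOMAIN")
        };
        using var client = new HttpClient(handler);

        // HTTP 200 means MCPermissions.asmx is deployed; 404 would mean the
        // MCF SharePoint plugin is not installed on this front end.
        HttpResponseMessage resp =
            await client.GetAsync("http://server/_vti_bin/MCPermissions.asmx");
        Console.WriteLine((int)resp.StatusCode + " " + resp.ReasonPhrase);
    }
}
<<<<<<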
On Thu, Dec 19, 2019 at 6:37 AM Jorge Alonso Garcia <[email protected]> wrote:

Hi,
We have an issue with the SharePoint connector.
There is a job that crawls a SharePoint 2016 instance, but it is not recovering all the files; it stops at 20,000 documents without any error.
Is there any parameter that should be changed to avoid this limitation?

Regards,
Jorge Alonso Garcia
