Hi Priya,

This has nothing to do with anything in ManifoldCF.

Karl

On Fri, Dec 20, 2019 at 7:56 AM Priya Arora <pr...@smartshore.nl> wrote:

Hi All,

Could this issue have something to do with the values/parameters set in properties.xml, shown below?
[image: image.png]

On Fri, Dec 20, 2019 at 5:21 PM Jorge Alonso Garcia <jalon...@gmail.com> wrote:

And what other SharePoint parameter could I check?

Jorge Alonso Garcia

On Fri, Dec 20, 2019 at 12:47, Karl Wright (<daddy...@gmail.com>) wrote:

The code seems correct and many people are using it without encountering this problem. There may be another SharePoint configuration parameter you also need to look at somewhere.

Karl

On Fri, Dec 20, 2019 at 6:38 AM Jorge Alonso Garcia <jalon...@gmail.com> wrote:

Hi Karl,
On SharePoint the list view threshold is 150,000, but we only receive 20,000 from MCF.
[image: image.png]

Jorge Alonso Garcia

On Thu, Dec 19, 2019 at 19:19, Karl Wright (<daddy...@gmail.com>) wrote:

If the job finished without error, it implies that the number of documents returned from this one library was 10000 when the service was called the first time (starting at doc 0), 10000 when it was called the second time (starting at doc 10000), and zero when it was called the third time (starting at doc 20000).
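That call sequence can be sketched with a small mock (hypothetical names and a simulated server-side cap at 20000 items; this is not the actual connector or SharePoint code):

```java
import java.util.ArrayList;
import java.util.List;

public class PagingSketch {
    // Hypothetical stand-in for the list service: the server silently caps
    // the listing at 20000 items even though the library holds more.
    static final int SERVER_CAP = 20000;
    static final int LIBRARY_SIZE = 35000;

    // Returns item indices in [startRow, startRow + rowLimit), truncated
    // at the simulated server-side cap.
    static List<Integer> getListItems(int startRow, int rowLimit) {
        List<Integer> out = new ArrayList<>();
        int end = Math.min(startRow + rowLimit, Math.min(SERVER_CAP, LIBRARY_SIZE));
        for (int i = startRow; i < end; i++) {
            out.add(i);
        }
        return out;
    }

    public static void main(String[] args) {
        int startingIndex = 0;
        int amtToRequest = 10000;
        while (true) {
            int resultCount = getListItems(startingIndex, amtToRequest).size();
            System.out.println("call at " + startingIndex + " returned " + resultCount);
            // Fewer results than requested is treated as "no more documents".
            if (resultCount < amtToRequest)
                break;
            startingIndex += resultCount;
        }
        System.out.println("total: " + startingIndex);
    }
}
```

With such a cap in place, the first call returns 10000, the second returns 10000, the third returns 0, and the loop exits cleanly at 20000 with no error, which matches the observed behavior.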
The plugin code is unremarkable and actually gets results in chunks of 1000 under the covers:

    SPQuery listQuery = new SPQuery();
    listQuery.Query = "<OrderBy Override=\"TRUE\"><FieldRef Name=\"FileRef\" /></OrderBy>";
    listQuery.QueryThrottleMode = SPQueryThrottleOption.Override;
    listQuery.ViewAttributes = "Scope=\"Recursive\"";
    listQuery.ViewFields = "<FieldRef Name='FileRef' />";
    listQuery.RowLimit = 1000;

    XmlDocument doc = new XmlDocument();
    retVal = doc.CreateElement("GetListItems",
        "http://schemas.microsoft.com/sharepoint/soap/directory/");
    XmlNode getListItemsNode = doc.CreateElement("GetListItemsResponse");

    uint counter = 0;
    do
    {
        if (counter >= startRowParam + rowLimitParam)
            break;

        SPListItemCollection collListItems = oList.GetItems(listQuery);

        foreach (SPListItem oListItem in collListItems)
        {
            if (counter >= startRowParam && counter < startRowParam + rowLimitParam)
            {
                XmlNode resultNode = doc.CreateElement("GetListItemsResult");
                XmlAttribute idAttribute = doc.CreateAttribute("FileRef");
                idAttribute.Value = oListItem.Url;
                resultNode.Attributes.Append(idAttribute);
                XmlAttribute urlAttribute = doc.CreateAttribute("ListItemURL");
                //urlAttribute.Value = oListItem.ParentList.DefaultViewUrl;
                urlAttribute.Value = string.Format("{0}?ID={1}",
                    oListItem.ParentList.Forms[PAGETYPE.PAGE_DISPLAYFORM].ServerRelativeUrl,
                    oListItem.ID);
                resultNode.Attributes.Append(urlAttribute);
                getListItemsNode.AppendChild(resultNode);
            }
            counter++;
        }

        listQuery.ListItemCollectionPosition = collListItems.ListItemCollectionPosition;

    } while (listQuery.ListItemCollectionPosition != null);

    retVal.AppendChild(getListItemsNode);

The code is clearly working if you get 20000 results returned, so I submit that perhaps there's a configured limit in your SharePoint instance that prevents listing more than 20000. That's the only way I can explain this.

Karl

On Thu, Dec 19, 2019 at 12:51 PM Jorge Alonso Garcia <jalon...@gmail.com> wrote:

Hi,
The job finishes OK (several times) but always with these 20000 documents; for some reason the loop only executes twice.

Jorge Alonso Garcia

On Thu, Dec 19, 2019 at 18:14, Karl Wright (<daddy...@gmail.com>) wrote:

If they are all in one library, then you'd be running this code:

    int startingIndex = 0;
    int amtToRequest = 10000;
    while (true)
    {
      com.microsoft.sharepoint.webpartpages.GetListItemsResponseGetListItemsResult itemsResult =
        itemCall.getListItems(guid,Integer.toString(startingIndex),Integer.toString(amtToRequest));

      MessageElement[] itemsList = itemsResult.get_any();

      if (Logging.connectors.isDebugEnabled()){
        Logging.connectors.debug("SharePoint: getChildren xml response: " + itemsList[0].toString());
      }

      if (itemsList.length != 1)
        throw new ManifoldCFException("Bad response - expecting one outer 'GetListItems' node, saw "+Integer.toString(itemsList.length));

      MessageElement items = itemsList[0];
      if (!items.getElementName().getLocalName().equals("GetListItems"))
        throw new ManifoldCFException("Bad response - outer node should have been 'GetListItems' node");

      int resultCount = 0;
      Iterator iter = items.getChildElements();
      while (iter.hasNext())
      {
        MessageElement child = (MessageElement)iter.next();
        if (child.getElementName().getLocalName().equals("GetListItemsResponse"))
        {
          Iterator resultIter = child.getChildElements();
          while (resultIter.hasNext())
          {
            MessageElement result = (MessageElement)resultIter.next();
            if (result.getElementName().getLocalName().equals("GetListItemsResult"))
            {
              resultCount++;
              String relPath = result.getAttribute("FileRef");
              String displayURL = result.getAttribute("ListItemURL");
              fileStream.addFile( relPath, displayURL );
            }
          }
        }
      }

      if (resultCount < amtToRequest)
        break;

      startingIndex += resultCount;
    }

What this does is request library content URLs in chunks of 10000. It stops when it receives fewer than 10000 documents from any one request.

If the documents were all in one library, then one call to the web service yielded 10000 documents, the second call yielded 10000 documents, and there was no third call, for no reason I can figure out. Since 10000 documents were returned each time, the loop ought to just continue, unless there was some kind of error. Does the job succeed, or does it abort?

Karl

On Thu, Dec 19, 2019 at 12:05 PM Karl Wright <daddy...@gmail.com> wrote:

If you are using the MCF plugin, and selecting the appropriate version of SharePoint in the connection configuration, there is no hard limit I'm aware of for any SharePoint job. We have lots of other people using SharePoint and nobody has reported this ever before.

If your SharePoint connection says "SharePoint 2003" as the SharePoint version, then sure, that would be expected behavior.
So please check that first.

The other question I have is about your description of first getting 10001 documents and then later 20002. That's not how ManifoldCF works. At the start of the crawl, seeds are added; this would start out just being the root, and then other documents would be discovered as the crawl proceeded, after subsites and libraries are discovered. So I am still trying to square that with your description of how this is working for you.

Are all of your documents in one library? Or two libraries?

Karl

On Thu, Dec 19, 2019 at 11:42 AM Jorge Alonso Garcia <jalon...@gmail.com> wrote:

Hi,
The UI shows 20,002 documents (in a first phase it showed 10,001, and after some processing time it rose to 20,002).
It looks like a hard limit; there are more files on SharePoint matching the criteria used.

Jorge Alonso Garcia

On Thu, Dec 19, 2019 at 16:05, Karl Wright (<daddy...@gmail.com>) wrote:

Hi Jorge,

When you run the job, do you see more than 20,000 documents as part of it?

Do you see *exactly* 20,000 documents as part of it?

Unless you are seeing a hard number like that in the UI for that job on the job status page, I doubt very much that the problem is a numerical limitation in the number of documents. I would suspect that the inclusion criteria, e.g. the mime type or maximum length, is excluding documents.

Karl

On Thu, Dec 19, 2019 at 8:51 AM Jorge Alonso Garcia <jalon...@gmail.com> wrote:

Hi Karl,
We have installed the SharePoint plugin, and can access http://server/_vti_bin/MCPermissions.asmx properly.

[image: image.png]

SharePoint has more than 20,000 documents, but when the job executes it only extracts these 20,000. How can I check where the issue is?

Regards

Jorge Alonso Garcia

On Thu, Dec 19, 2019 at 12:52, Karl Wright (<daddy...@gmail.com>) wrote:

By "stop at 20,000" do you mean that it finds more than 20,000 but stops crawling at that point? Or what exactly do you mean here?

FWIW, the behavior you describe sounds like you may not have installed the SharePoint plugin and may have selected a version of SharePoint that is inappropriate. All SharePoint versions after 2008 limit the number of documents returned using the standard web services methods. The plugin allows us to bypass that hard limit.

Karl

On Thu, Dec 19, 2019 at 6:37 AM Jorge Alonso Garcia <jalon...@gmail.com> wrote:

Hi,
We have an issue with the SharePoint connector.
There is a job that crawls a SharePoint 2016 instance, but it is not retrieving all files; it stops at 20,000 documents without any error.
Is there any parameter that should be changed to avoid this limitation?

Regards
Jorge Alonso Garcia