Hi,
The job finishes OK (several times), but always with these 20,000 documents; for some reason the loop only executes twice.
Jorge Alonso Garcia

On Thu, Dec 19, 2019 at 6:14 PM Karl Wright (<daddy...@gmail.com>) wrote:

> If they are all in one library, then you'd be running this code:
>
> >>>>>>
> int startingIndex = 0;
> int amtToRequest = 10000;
> while (true)
> {
>   com.microsoft.sharepoint.webpartpages.GetListItemsResponseGetListItemsResult itemsResult =
>     itemCall.getListItems(guid, Integer.toString(startingIndex), Integer.toString(amtToRequest));
>
>   MessageElement[] itemsList = itemsResult.get_any();
>
>   if (Logging.connectors.isDebugEnabled()){
>     Logging.connectors.debug("SharePoint: getChildren xml response: " + itemsList[0].toString());
>   }
>
>   if (itemsList.length != 1)
>     throw new ManifoldCFException("Bad response - expecting one outer 'GetListItems' node, saw "+Integer.toString(itemsList.length));
>
>   MessageElement items = itemsList[0];
>   if (!items.getElementName().getLocalName().equals("GetListItems"))
>     throw new ManifoldCFException("Bad response - outer node should have been 'GetListItems' node");
>
>   int resultCount = 0;
>   Iterator iter = items.getChildElements();
>   while (iter.hasNext())
>   {
>     MessageElement child = (MessageElement)iter.next();
>     if (child.getElementName().getLocalName().equals("GetListItemsResponse"))
>     {
>       Iterator resultIter = child.getChildElements();
>       while (resultIter.hasNext())
>       {
>         MessageElement result = (MessageElement)resultIter.next();
>         if (result.getElementName().getLocalName().equals("GetListItemsResult"))
>         {
>           resultCount++;
>           String relPath = result.getAttribute("FileRef");
>           String displayURL = result.getAttribute("ListItemURL");
>           fileStream.addFile( relPath, displayURL );
>         }
>       }
>     }
>   }
>
>   if (resultCount < amtToRequest)
>     break;
>
>   startingIndex += resultCount;
> }
> <<<<<<
>
> What this does is request library content URLs in chunks of 10,000. It stops when it receives fewer than 10,000 documents from any one request.
>
> If the documents were all in one library, then one call to the web service yielded 10,000 documents, and the second call yielded 10,000 documents, and there was no third call, for reasons I can't figure out. Since 10,000 documents were returned each time, the loop ought to just continue, unless there was some kind of error. Does the job succeed, or does it abort?
>
> Karl
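For reference, here is a minimal, self-contained sketch of the chunked-paging pattern in the code quoted above. The names (PagingSketch, PageSource, crawlAll) are hypothetical stand-ins, not the actual connector classes; fetch() plays the role of itemCall.getListItems(guid, startingIndex, amtToRequest). With two full pages of 10,000 items the loop should issue a third request, and it only stops once a request comes back short:

import java.util.Arrays;
import java.util.List;

public class PagingSketch
{
  // Hypothetical stand-in for the SOAP call that returns one page of item URLs.
  interface PageSource
  {
    // Returns at most amtToRequest item URLs starting at startingIndex.
    List<String> fetch(int startingIndex, int amtToRequest);
  }

  static int crawlAll(PageSource source, int amtToRequest)
  {
    int startingIndex = 0;
    while (true)
    {
      List<String> page = source.fetch(startingIndex, amtToRequest);
      System.out.println("Offset " + startingIndex + " returned " + page.size() + " items");
      startingIndex += page.size();
      // Stop only when a request returns fewer items than were asked for.
      if (page.size() < amtToRequest)
        break;
    }
    return startingIndex; // total items seen
  }

  public static void main(String[] args)
  {
    // Fake library with 25,000 items: expect three requests (10000, 10000, 5000).
    PageSource fake = (start, amt) -> {
      int remaining = Math.max(0, 25000 - start);
      String[] items = new String[Math.min(amt, remaining)];
      Arrays.fill(items, "item");
      return Arrays.asList(items);
    };
    System.out.println("Total: " + crawlAll(fake, 10000));
  }
}

If the real service silently capped the total at 20,000, the second page would still be full and a third request would be made anyway (returning zero items), so logging the per-request count as above would show where the loop actually exits.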
> On Thu, Dec 19, 2019 at 12:05 PM Karl Wright <daddy...@gmail.com> wrote:
>
>> If you are using the MCF plugin, and selecting the appropriate version of SharePoint in the connection configuration, there is no hard limit I'm aware of for any SharePoint job. We have lots of other people using SharePoint and nobody has ever reported this before.
>>
>> If your SharePoint connection says "SharePoint 2003" as the SharePoint version, then sure, that would be expected behavior. So please check that first.
>>
>> The other question I have is about your description of first getting 10,001 documents and then later 20,002. That's not how ManifoldCF works. At the start of the crawl, seeds are added; this would start out as just the root, and then other documents would be discovered as the crawl proceeded, after subsites and libraries are discovered. So I am still trying to square that with your description of how this is working for you.
>>
>> Are all of your documents in one library? Or two libraries?
>>
>> Karl
>>
>> On Thu, Dec 19, 2019 at 11:42 AM Jorge Alonso Garcia <jalon...@gmail.com> wrote:
>>
>>> Hi, the UI shows 20,002 documents (in a first phase it showed 10,001, and after some further processing it rose to 20,002). It looks like a hard limit; there are more files on SharePoint that match the criteria used.
>>>
>>> Jorge Alonso Garcia
>>>
>>> On Thu, Dec 19, 2019 at 4:05 PM Karl Wright (<daddy...@gmail.com>) wrote:
>>>
>>>> Hi Jorge,
>>>>
>>>> When you run the job, do you see more than 20,000 documents as part of it?
>>>>
>>>> Do you see *exactly* 20,000 documents as part of it?
>>>>
>>>> Unless you are seeing a hard number like that in the UI for that job on the job status page, I doubt very much that the problem is a numerical limitation on the number of documents. I would suspect that the inclusion criteria, e.g. the mime type or maximum length, are excluding documents.
>>>>
>>>> Karl
>>>>
>>>> On Thu, Dec 19, 2019 at 8:51 AM Jorge Alonso Garcia <jalon...@gmail.com> wrote:
>>>>
>>>>> Hi Karl,
>>>>> We have installed the SharePoint plugin, and we can access http://server/_vti_bin/MCPermissions.asmx properly.
>>>>>
>>>>> [image: image.png]
>>>>>
>>>>> SharePoint has more than 20,000 documents, but when the job executes it only extracts these 20,000. How can I check where the issue is?
>>>>>
>>>>> Regards
>>>>>
>>>>> Jorge Alonso Garcia
>>>>>
>>>>> On Thu, Dec 19, 2019 at 12:52 PM Karl Wright (<daddy...@gmail.com>) wrote:
>>>>>
>>>>>> By "stop at 20,000" do you mean that it finds more than 20,000 but stops crawling at that point? Or what exactly do you mean here?
>>>>>>
>>>>>> FWIW, the behavior you describe sounds like you may not have installed the SharePoint plugin and may have selected a version of SharePoint that is inappropriate. All SharePoint versions after 2008 limit the number of documents returned using the standard web services methods. The plugin allows us to bypass that hard limit.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Thu, Dec 19, 2019 at 6:37 AM Jorge Alonso Garcia <jalon...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> We have an issue with the SharePoint connector.
>>>>>>> There is a job that crawls a SharePoint 2016 site, but it is not retrieving all the files; it stops at 20,000 documents without any error.
>>>>>>> Is there any parameter that should be changed to avoid this limitation?
>>>>>>>
>>>>>>> Regards
>>>>>>> Jorge Alonso Garcia