[jira] [Commented] (CONNECTORS-1449) Add support for respecting the NoCrawl flag in Sharepoint
[ https://issues.apache.org/jira/browse/CONNECTORS-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816578#comment-16816578 ]

Karl Wright commented on CONNECTORS-1449:
-----------------------------------------

Hi, the method currently used to get the metadata for a document via SOAP is the following:

{code}
metadataValues = proxy.getFieldValues( sortedMetadataFields, encodePath(sitePath), listID,
  "/Lists/" + decodedItemPath.substring(cutoff+1), dspStsWorks );
{code}

This calls:

{code}
{
  // SharePoint 2010: Get field values some other way
  // Sharepoint 2010; use Lists service instead
  ListsWS lservice = new ListsWS(baseUrl + site, userName, password, configuration, httpClient );
  ListsSoapStub stub1 = (ListsSoapStub)lservice.getListsSoapHandler();

  String sitePlusDocId = serverLocation + site + docId;
  if (sitePlusDocId.startsWith("/"))
    sitePlusDocId = sitePlusDocId.substring(1);

  GetListItemsQuery q = buildMatchQuery("FileRef","Text",sitePlusDocId);
  GetListItemsViewFields viewFields = buildViewFields(fieldNames);

  GetListItemsResponseGetListItemsResult items =
    stub1.getListItems(docLibrary, "", q, viewFields, "1", buildNonPagingQueryOptions(), null);
  if (items == null)
    return result;

  MessageElement[] list = items.get_any();
  final String xmlResponse = list[0].toString();
  if (Logging.connectors.isDebugEnabled()){
    Logging.connectors.debug("SharePoint: getListItems FileRef value '"+sitePlusDocId+"', xml response: '" + xmlResponse + "'");
  }
{code}

So it is calling the Lists service to do this right now (SharePoint 2010 and higher). For SharePoint 2003, it used the dspsts service, but that has been broken for a while, and I see no need to support this feature for that version of SharePoint.

If you introduce a new service or method, I will also need a configuration switch that enables the code that calls it, or backwards compatibility will not be maintained.
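The configuration switch asked for above might look roughly like the following. This is only a minimal sketch in Java, not connector code: the parameter name "respectNoCrawl" and the class are hypothetical, and a plain Map stands in for whatever configuration object the connector actually uses. The point it illustrates is that a missing setting defaults to "off", so existing connections keep their current behavior.

```java
import java.util.Map;

public class NoCrawlSwitchSketch {
  // Hypothetical parameter name; nothing in the connector defines it today.
  static final String HYPOTHETICAL_PARAMETER = "respectNoCrawl";

  /**
   * Returns true only when the switch is explicitly set to "true".
   * A missing or unrecognized value yields false, which preserves
   * the existing behavior for connections configured before the switch existed.
   */
  static boolean isNoCrawlCheckEnabled(Map<String, String> configuration) {
    // Boolean.parseBoolean(null) is false, so an absent key means "disabled".
    return Boolean.parseBoolean(configuration.get(HYPOTHETICAL_PARAMETER));
  }
}
```

The new service call would then be made only when isNoCrawlCheckEnabled(...) returns true.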
> Add support for respecting the NoCrawl flag in Sharepoint
> ---------------------------------------------------------
>
>                 Key: CONNECTORS-1449
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1449
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: SharePoint connector
>            Reporter: Markus Schuch
>            Assignee: Markus Schuch
>            Priority: Major
>             Fix For: ManifoldCF next
>
>
> There is a flag {{NoCrawl}} in sharepoint that indicates whether an object should be crawled or not:
> Lists https://msdn.microsoft.com/en-us/library/office/microsoft.sharepoint.splist.nocrawl.aspx
> Web https://msdn.microsoft.com/en-us/library/office/microsoft.sharepoint.spweb.nocrawl.aspx
> Field https://msdn.microsoft.com/en-us/library/office/microsoft.sharepoint.spfield.nocrawl.aspx
> Wouldn't it be nice to respect that flag?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1449) Add support for respecting the NoCrawl flag in Sharepoint
[ https://issues.apache.org/jira/browse/CONNECTORS-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816568#comment-16816568 ]

Drai commented on CONNECTORS-1449:
----------------------------------

Karl,

Regarding your recommendation:

"I would propose (if either the dspsts, webs, or versions services do not handle this themselves) that we either add a new MCPermissions service that wraps whatever is currently used to obtain document metadata with one that also adds the "NoCrawl" flag to the result,"

I can get the MCPermissions C# code changed to add a new method for this purpose, then deploy and test it. Could you give a specification for this method (name, input, output, etc.)? I am not clear on whether you need the "NoCrawl" flag returned along with the list items from the getListItems method, or a new method that returns document metadata by ID plus the NoCrawl flag. Please advise.

Once this is done, what would it take for you to modify the SharePoint 2013 repository connector to recognize and respect this flag?

Regards,
Durai Kalaiselvan
[jira] [Commented] (CONNECTORS-1449) Add support for respecting the NoCrawl flag in Sharepoint
[ https://issues.apache.org/jira/browse/CONNECTORS-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816164#comment-16816164 ]

Karl Wright commented on CONNECTORS-1449:
-----------------------------------------

The MCPermissions plugin at present furnishes two services: one to get permissions for users, and the other to list documents without restrictions imposed by SharePoint. I would propose (if none of the dspsts, webs, or versions services handles this itself) that we either add a new MCPermissions service that wraps whatever is currently used to obtain document metadata with one that also adds the "NoCrawl" flag to the result, OR we put it in the existing Lists service wrapper we currently have.

Note that the problem isn't going to be adequately addressed unless we can get this information on a per-document basis, somehow. We need to be able to tell the framework to delete the document when the connector looks at it. Doing this in a transformation connector won't work for that very same reason: the document won't be sent to the transformer unless it's noticed to have been changed in some way. So the repository connector really has to handle this.
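The per-document handling described above could be sketched as follows. This is an illustration only, assuming the flag has already been obtained for the document: the DocumentSink interface, its method names, and the RecordingSink helper are hypothetical stand-ins and are not the ManifoldCF framework API.

```java
import java.util.ArrayList;
import java.util.List;

public class NoCrawlDecisionSketch {
  /** Hypothetical stand-in for the framework callbacks; not the ManifoldCF API. */
  interface DocumentSink {
    void ingest(String documentId);
    void delete(String documentId);
  }

  /**
   * Decides, each time the repository connector looks at a document,
   * whether it should be removed from the index or (re)ingested.
   */
  static void processDocument(String documentId, boolean noCrawl, DocumentSink sink) {
    if (noCrawl) {
      // Flag set: ask the framework to remove the document.
      sink.delete(documentId);
    } else {
      // Flag cleared (or never set): index the document as usual.
      sink.ingest(documentId);
    }
  }

  /** Records calls so the decision logic can be exercised without a framework. */
  static class RecordingSink implements DocumentSink {
    final List<String> calls = new ArrayList<>();
    @Override public void ingest(String documentId) { calls.add("ingest:" + documentId); }
    @Override public void delete(String documentId) { calls.add("delete:" + documentId); }
  }
}
```

Because the decision is made inside the repository connector, it applies whenever the connector processes the document, which is the property a transformation connector cannot provide.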
[jira] [Commented] (CONNECTORS-1449) Add support for respecting the NoCrawl flag in Sharepoint
[ https://issues.apache.org/jira/browse/CONNECTORS-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816106#comment-16816106 ]

Drai commented on CONNECTORS-1449:
----------------------------------

Crawling needs to be avoided because content authors/managers mark a library/list as non-crawlable using the 'NoCrawl' setting. In the current use case, fetching is fine because the crawl user has access to the content anyway. They do not want the items stored in these libraries to show up in search results. When the flag is set back to false, those items need to be indexed again.

Would inserting a transformer between the SharePoint repository connection and the Solr output connection achieve this? That way, depending on whether the flag switches from true to false or vice versa, content would be pushed to Solr or ignored.

Regarding the modified Lists service: I will take a look at it. If we modify the Lists service, should it be deployed to SP 2013 in the same way as MCPermissions.asmx?

Thanks,
Durai Kalaiselvan
[jira] [Commented] (CONNECTORS-1449) Add support for respecting the NoCrawl flag in Sharepoint
[ https://issues.apache.org/jira/browse/CONNECTORS-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816017#comment-16816017 ]

Karl Wright commented on CONNECTORS-1449:
-----------------------------------------

It depends on why you want to avoid crawling something. If it's to prevent fetching it, then you can't do it at the transformer level. The right solution is to look for the flag in the SOAP response.

There is another solution, which is to modify the ManifoldCF SharePoint plugin for SharePoint 2013 to return it from the modified Lists service. That would involve C# code changes, but would definitely allow us access to the flag in the connector. The code is checked in under https://svn.apache.org/repos/asf/manifoldcf/integration/sharepoint-2013/trunk . Have a look.
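Looking for the flag in the SOAP response might be sketched as below. This is a sketch under loud assumptions: the attribute name "ows_NoCrawl" and its "1"/"true" encoding are hypothetical, and nothing in this thread confirms that the Lists service returns such an attribute on a row (the plugin change discussed above may be needed before it does). The sample XML in the usage note is invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NoCrawlResponseSketch {
  // Hypothetical attribute name; an assumption, not a documented SharePoint field.
  static final String HYPOTHETICAL_ATTRIBUTE = "ows_NoCrawl";

  /**
   * Returns true when the first z:row element in the response carries the
   * hypothetical flag attribute with the value "1" or "true".
   * Returns false when there is no row or no such attribute.
   */
  static boolean isNoCrawl(String xmlResponse) {
    try {
      // Default factory is not namespace-aware, so "z:row" is matched as a plain name.
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document doc = builder.parse(new InputSource(new StringReader(xmlResponse)));
      NodeList rows = doc.getElementsByTagName("z:row");
      if (rows.getLength() == 0) {
        return false;
      }
      // getAttribute returns an empty string when the attribute is absent.
      String value = ((Element) rows.item(0)).getAttribute(HYPOTHETICAL_ATTRIBUTE);
      return "1".equals(value) || "true".equalsIgnoreCase(value);
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse response", e);
    }
  }
}
```

For example, isNoCrawl("<listitems xmlns:z='#RowsetSchema'><z:row ows_FileRef='site/doc.docx' ows_NoCrawl='1'/></listitems>") yields true, while the same row without the attribute yields false.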