[jira] [Commented] (CONNECTORS-1668) Use of Wild Characters in SharePoint Connector.

2021-05-23 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17350025#comment-17350025
 ] 

Karl Wright commented on CONNECTORS-1668:
-

If you think you have a web service call that will locate the list of virtual 
sites given a root site, I'd create a method in SPSProxyHelper that implements 
that.  If you can show it works, the next thing to do is:

(1) Come up with a document identifier format that represents the root site.
(2) Change the processDocuments() method of the connector to recognize that 
document identifier format and call your new method.  The results should be 
added to the queue using "processActivities.addDocumentReference()".
(3) Decide how the document specification for this connector would need to be 
extended to support virtual site discovery - this is actually the tricky part, 
because you will need to modify the HTML editor for document specification 
editing to include this.

I can help you with (3) but first you need to prove you can do (1).
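
Steps (1) and (2) can be sketched as follows. This is only an illustration, not connector code: the "discover:" and "site:" identifier prefixes, the SiteLister interface (standing in for the new SPSProxyHelper method) and the ReferenceQueue interface (standing in for the single processActivities.addDocumentReference() call) are all hypothetical names invented for this example.

```java
import java.util.List;

/** Hypothetical stand-in for the new SPSProxyHelper method that lists virtual sites. */
interface SiteLister {
  List<String> listSubsites(String rootSitePath);
}

/** Hypothetical stand-in for the queueing call named in step (2). */
interface ReferenceQueue {
  void addDocumentReference(String documentIdentifier);
}

final class SiteDiscoverySketch {
  // Assumed identifier prefixes; the real identifier format is a design decision.
  static final String DISCOVERY_PREFIX = "discover:";
  static final String SITE_PREFIX = "site:";

  /** Step (1): an identifier format that represents the root site. */
  static String rootSiteIdentifier(String rootSitePath) {
    return DISCOVERY_PREFIX + rootSitePath;
  }

  static boolean isRootSiteIdentifier(String documentIdentifier) {
    return documentIdentifier.startsWith(DISCOVERY_PREFIX);
  }

  /** Step (2): recognize the format, call the lister, queue every result. */
  static void processDocument(String documentIdentifier, SiteLister lister, ReferenceQueue queue) {
    if (!isRootSiteIdentifier(documentIdentifier)) {
      return; // every other identifier format is handled elsewhere
    }
    String root = documentIdentifier.substring(DISCOVERY_PREFIX.length());
    for (String site : lister.listSubsites(root)) {
      queue.addDocumentReference(SITE_PREFIX + root + "/" + site);
    }
  }
}
```

Keeping the lister behind an interface means the identifier handling can be exercised without a SharePoint instance.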
 

> Use of Wild Characters in SharePoint Connector.
> ---
>
> Key: CONNECTORS-1668
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1668
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: SharePoint connector
>Affects Versions: ManifoldCF 2.16
>Reporter: Shashank Dwivedi
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.16
>
> Attachments: image-2021-05-23-00-36-45-378.png
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Hi, 
> My SharePoint site is of the following *Format* :
> -*Projects(root)*
>    -*Project 1*
>         -Project Library
>         -Folder 1
>         -Folder 2 ... Folder N
>    -*Project 2 ... Project N*
>         -Project Library
>         -Folder 1 .. Folder N
> We have the *Projects(root site)* in this fashion from Project 1 to *Project 
> N(2)*, where N is a *large number.* I wish to process all files present 
> inside the *Project Library folder* of all the projects.
> So, as a Path rule I am currently supplying "*Projects/**/*Project Library/* 
> *". There is no space between / and * in the last.
> However, this is *not working out*. It is also pulling documents inside 
> *Folder 1, Folder2,..Folder N.* I want it to Process files only inside 
> Project Library.
> Please suggest me the right way to accomplish this Task.
> I could not identify any suggestion regarding the same in the End user 
> Documentation.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CONNECTORS-1668) Use of Wild Characters in SharePoint Connector.

2021-05-23 Thread Shashank Dwivedi (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17350010#comment-17350010
 ] 

Shashank Dwivedi commented on CONNECTORS-1668:
--

Hello Karl, 

I have an idea, since my path rules follow a strict pattern like this:

1) ""

2) ""

N) ""

where N = 20,000

I plan to first make a *webservice call* to SharePoint and get the *list of 
projects* by supplying *"/Projects"* *as the parent site to the getSite method in 
SPSProxyHelper*, then build the above paths with string operations and append 
the complete list to the *docspec* field for the *jobID in the database*.

I want this call and the database update to happen automatically just *before 
the SharePoint job* is triggered.

The SPSProxyHelper class helps to make a webservice call to SharePoint.

So my questions are: what are the *minimum parameters required to make this 
webservice call*, and how can I make the call and update the database just 
before the SharePoint job starts running?

Where in the code should I make this call and update the database?



[jira] [Commented] (CONNECTORS-1668) Use of Wild Characters in SharePoint Connector.

2021-05-22 Thread Shashank Dwivedi (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17349818#comment-17349818
 ] 

Shashank Dwivedi commented on CONNECTORS-1668:
--

Thank you, Karl. I will go through the code you suggested and then update you 
so that this ticket can be closed.



[jira] [Commented] (CONNECTORS-1668) Use of Wild Characters in SharePoint Connector.

2021-05-22 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17349817#comment-17349817
 ] 

Karl Wright commented on CONNECTORS-1668:
-

About whether we can implement "site discovery" in this connector: the problem 
is that Microsoft has deprecated the entire API we use, and the connector must 
be 100% redeveloped.  The old API did not have any ability to do site 
discovery.  Not sure what the new API has, but nobody on the MCF team has the 
six free weeks of coding time and access to a MS Sharepoint instance to build 
what is needed.  Volunteers welcome.




[jira] [Commented] (CONNECTORS-1668) Use of Wild Characters in SharePoint Connector.

2021-05-22 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17349815#comment-17349815
 ] 

Karl Wright commented on CONNECTORS-1668:
-

The logic for path rules is as follows:

{code}
  if (sn.getType().equals("pathrule"))
  {
    // New-style rule.
    // Here's the trick: We do what the first matching rule tells us to do.
    String pathMatch = sn.getAttributeValue("match");
    String action = sn.getAttributeValue("action");
    String ruleType = sn.getAttributeValue("type");

    // First, find out if we match EXACTLY.
    if (checkMatch(libraryPath,0,pathMatch))
    {
      // If this is true, the type also has to match if the rule is to apply.
      if (ruleType.equals("library"))
      {
        if (Logging.connectors.isDebugEnabled())
          Logging.connectors.debug("SharePoint: Library '"+libraryPath+"' exactly matched rule path '"+pathMatch+"'");
        if (action.equals("include"))
        {
          // For include rules, partial match is good enough to proceed.
          if (Logging.connectors.isDebugEnabled())
            Logging.connectors.debug("SharePoint: Including library '"+libraryPath+"'");
          return true;
        }
        if (Logging.connectors.isDebugEnabled())
          Logging.connectors.debug("SharePoint: Excluding library '"+libraryPath+"'");
        return false;
      }
    }
    else if (ruleType.equals("file") && checkPartialPathMatch(libraryPath,0,pathMatch,1) && action.equals("include"))
    {
      if (Logging.connectors.isDebugEnabled())
        Logging.connectors.debug("SharePoint: Library '"+libraryPath+"' partially matched file rule path '"+pathMatch+"' - including");
      return true;
    }
    else if (ruleType.equals("folder") && checkPartialPathMatch(libraryPath,0,pathMatch,1) && action.equals("include"))
    {
      if (Logging.connectors.isDebugEnabled())
        Logging.connectors.debug("SharePoint: Library '"+libraryPath+"' partially matched folder rule path '"+pathMatch+"' - including");
      return true;
    }
  }
}
{code}

I need to see the rule type; as you can see, to include a library, you need a 
library rule, and to include a site, you need a site rule.

The checkMatch() method does this:

{code}
  /** Recursive worker method for checkMatch.  Returns 'true' if there is a path that consumes both
  * strings in their entirety in a matched way.
  *@param caseSensitive is true if file names are case sensitive.
  *@param sourceMatch is the source string (w/o wildcards)
  *@param match is the match string (w/wildcards)
  *@return true if there is a match.
  */
  protected static boolean checkMatch(boolean caseSensitive, String sourceMatch, String match)
{code}
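
To make the "consumes both strings in their entirety" idea concrete, here is a minimal, self-contained sketch of a recursive wildcard matcher. It is not the ManifoldCF implementation: the class name and method shape are invented for illustration, and it assumes case-sensitive comparison. As in the connector, "*" and "?" are allowed to match a "/".

```java
final class WildcardMatchSketch {
  /** Convenience entry point: match all of source against all of pattern. */
  static boolean matches(String source, String pattern) {
    return matches(source, 0, pattern, 0);
  }

  /**
   * Returns true only if the rest of source (from si) and the rest of
   * pattern (from pi) can both be used up completely.
   */
  static boolean matches(String source, int si, String pattern, int pi) {
    if (pi == pattern.length()) {
      // Pattern is used up; the source must be used up as well.
      return si == source.length();
    }
    char p = pattern.charAt(pi);
    if (p == '*') {
      // Let '*' absorb zero characters, then one more on each iteration.
      // No character is special here, so '*' also absorbs '/'.
      for (int k = si; k <= source.length(); k++) {
        if (matches(source, k, pattern, pi + 1)) {
          return true;
        }
      }
      return false;
    }
    if (si == source.length()) {
      return false; // pattern still needs a character but the source is done
    }
    if (p == '?' || p == source.charAt(si)) {
      return matches(source, si + 1, pattern, pi + 1);
    }
    return false; // a non-wildcard character failed to match
  }
}
```

With this sketch, matches("/Projects/A/B/Project Library", "/Projects/*/Project Library") is true because "*" absorbs "A/B", which is one reason a single wildcard can reach further down the tree than expected.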

The partial path match method looks like this:

{code}
  protected static boolean checkPartialPathMatch( String sourceMatch, int sourceIndex, String match, int requiredExtraPathSections )
  {
    // The partial match must be of a complete path, with at least a specified number of trailing path components possible in what remains.
    // Path components can include everything but the "/" character itself.
    //
    // The match string is the one containing the wildcards.  Both the "*" wildcard and the "?" wildcard will match a "/", which is intended but is why this
    // matcher is a little tricky to write.
    //
    // Note also that it is OK to return "true" more than strictly necessary, but it is never OK to return "false" incorrectly.

    // This is a partial path match.  That means that we don't have to completely use up the match string, but what's left on the match string after the source
    // string is used up MUST either be capable of being null, or be capable of starting with a "/"integral path sections, and MUST include at least n of these sections.
    //
{code}


If you look at the code, you will note there's quite a bit of debug logging 
around path matching.  The basic point though is that the entire match string 
must be consumed for the full match, meaning that anything that is not a 
wildcard MUST match, and for a partial match the source string must be used up 
with at least N path sections still possible in what is left of the match 
string.

To summarize:

(1) You need a Site rule to include a site.
(2) You need a Library rule to include a library.
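
The summary can be illustrated with a small first-match-wins evaluator. This is a simplified sketch, not the connector's code: the Rule class and isIncluded method are hypothetical, wildcard matching is done by converting the path to a regular expression, and the partial-match branches for file and folder rules shown in the excerpt above are deliberately omitted.

```java
import java.util.List;
import java.util.regex.Pattern;

final class PathRuleSketch {
  /** Hypothetical rule record: wildcard path, rule type, and action. */
  static final class Rule {
    final String match;
    final String type;
    final String action;

    Rule(String match, String type, String action) {
      this.match = match;
      this.type = type;
      this.action = action;
    }
  }

  /** Converts a wildcard path to a regex; '*' and '?' may match '/'. */
  static Pattern toPattern(String wildcard) {
    StringBuilder sb = new StringBuilder();
    for (char c : wildcard.toCharArray()) {
      if (c == '*') {
        sb.append(".*");
      } else if (c == '?') {
        sb.append('.');
      } else {
        sb.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return Pattern.compile(sb.toString(), Pattern.DOTALL);
  }

  /** The first rule of the right type whose path matches exactly decides. */
  static boolean isIncluded(List<Rule> rules, String path, String objectType) {
    for (Rule rule : rules) {
      if (rule.type.equals(objectType) && toPattern(rule.match).matcher(path).matches()) {
        return rule.action.equals("include");
      }
    }
    return false; // no rule of this type matched, so nothing includes the object
  }
}
```

In this sketch a rule of type "file" never includes a site, however broad its path is; a site is included only when a "site" rule matches it.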








[jira] [Commented] (CONNECTORS-1668) Use of Wild Characters in SharePoint Connector.

2021-05-22 Thread Shashank Dwivedi (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17349816#comment-17349816
 ] 

Shashank Dwivedi commented on CONNECTORS-1668:
--

This is what the structure looks like. Project 1, Project 2 ... Project N are 
SharePoint sites, and I need to process only doc.pdf with ManifoldCF. This is 
just a sample; there are many other documents under Project Library that need 
to be processed. For this task I am using the path rule stated above.

!image-2021-05-23-00-36-45-378.png!



[jira] [Commented] (CONNECTORS-1668) Use of Wild Characters in SharePoint Connector.

2021-05-22 Thread Shashank Dwivedi (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17349813#comment-17349813
 ] 

Shashank Dwivedi commented on CONNECTORS-1668:
--

Hey Karl,

*Thanks* for the quick reply.

So it is slightly difficult for me to put the screen shot here.

The docspec of my job has only one rule :-



Here the star denotes all projects under Projects. The *bold portion is a 
site,* so I have to add every project manually by clicking the *add site 
button*; only then can I provide Project Library/* to get all files under 
Project Library (a library) in that specific project.

There are around 2 projects, and the number will grow in future. I was trying 
to find a way to use wildcard characters so that all sites under Projects can 
be added automatically.

This is similar to the Connector-1668 issue, which I commented on.

 



[jira] [Commented] (CONNECTORS-1668) Use of Wild Characters in SharePoint Connector.

2021-05-22 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17349807#comment-17349807
 ] 

Karl Wright commented on CONNECTORS-1668:
-

Could you view your job and include a screenshot of the inclusion rules you 
have?  There are several different kinds of rules, so I have to see what you're 
actually trying.


