[ 
https://issues.apache.org/jira/browse/SOLR-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will White updated SOLR-16820:
------------------------------
    Description: 
It's possible to create a collection via the CreateCollectionAPI whose name 
[passes validation from the 
SolrIdentifierValidator|https://github.com/apache/solr/blob/main/solr/solrj/src/java/org/apache/solr/client/solrj/util/SolrIdentifierValidator.java#L50-L52]
 (a regex which, among other characters, allows '.'), but that same collection 
name then fails validation when a package is deployed to or undeployed from it 
via the PackageTool, because of the [packagemanager.PackageUtils 
validateCollections() 
method|https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/packagemanager/PackageUtils.java#L271].
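
A minimal sketch of the mismatch, for illustration only. The two regexes below are assumptions that approximate the patterns in the linked sources and are not copied from them verbatim; the class and method names are hypothetical and are not part of Solr.

```java
import java.util.regex.Pattern;

public class CollectionNameMismatch {
    // Assumption: approximates SolrIdentifierValidator's identifier pattern,
    // which allows '.', '_', '-', letters and digits, but no leading '-'.
    static final Pattern IDENTIFIER_PATTERN =
        Pattern.compile("^(?!\\-)[\\._A-Za-z0-9\\-]+$");

    // Assumption: approximates the regex used by PackageUtils, which has no '.'.
    static final Pattern PACKAGE_PATTERN = Pattern.compile("^[a-zA-Z0-9_-]*$");

    // Stand-in for the check applied when a collection is created.
    static boolean passesCreateValidation(String name) {
        return IDENTIFIER_PATTERN.matcher(name).matches();
    }

    // Stand-in for the check applied by the package tooling.
    static boolean passesPackageValidation(String name) {
        return PACKAGE_PATTERN.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"products", "products.v2"}) {
            System.out.println(name
                + " create=" + passesCreateValidation(name)
                + " package=" + passesPackageValidation(name));
        }
    }
}
```

Under these assumed patterns, a name such as {{products.v2}} is accepted at creation time but rejected by the package check, which is the behaviour described above.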

A change [like this, using the existing 
SolrIdentifierValidator|https://github.com/apache/solr/commit/638fd768ebd7ed7908029ced08e56bed05a4a2a5]
 would bring the two validation steps back in line, although there's presumably 
a better approach.

*Potential risks*

As highlighted by Gus Heck [in this 
thread|https://lists.apache.org/thread/h7hnksgqwxxl7nkwkhn01r6jn8xjkjjs], 
changing the validation of collection names could be a risky change to make. 
The PackageUtils regex appears to originate from 
[https://github.com/apache/lucene-solr/pull/994], from before Solr split from 
the Lucene project. It does not seem to have been crafted to deliberately 
exclude the '.' character for particular use cases; it just appears to be the 
regex implemented at the time.

Using the {{SolrIdentifierValidator}} approach mentioned above as an example, 
other than disallowing a collection name that begins with a '-' character, the 
{{SolrIdentifierValidator.identifierPattern}} would be a strict expansion of 
the collection names allowed by {{PackageUtils.validateCollections}}. 
Any other solution (such as [this more naive 
example|https://github.com/apache/solr/blame/998fffdccf51a0560589e2cb413e9da127a5f26e/solr/core/src/java/org/apache/solr/packagemanager/PackageUtils.java#L271])
 could similarly mitigate much of the potential risk by only expanding the set 
of allowed collection names.
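
One way to reason about the risk is to classify sample names by how their outcome would change. The sketch below does this under two assumed regexes that approximate the current PackageUtils pattern and the {{SolrIdentifierValidator}} pattern; neither is copied verbatim from the source, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class ExpansionCheck {
    // Assumption: approximates the regex currently used by PackageUtils.
    static final Pattern CURRENT = Pattern.compile("^[a-zA-Z0-9_-]*$");

    // Assumption: approximates SolrIdentifierValidator's identifier pattern.
    static final Pattern PROPOSED =
        Pattern.compile("^(?!\\-)[\\._A-Za-z0-9\\-]+$");

    // Reports how the outcome for a name would change if PROPOSED replaced CURRENT.
    static String classify(String name) {
        boolean current = CURRENT.matcher(name).matches();
        boolean proposed = PROPOSED.matcher(name).matches();
        if (current && proposed) return "unchanged: accepted";
        if (!current && proposed) return "newly accepted";
        if (current) return "newly rejected";
        return "unchanged: rejected";
    }

    public static void main(String[] args) {
        for (String name : new String[] {"logs", "logs.2023", "-logs", "logs/2023"}) {
            System.out.println(name + " -> " + classify(name));
        }
    }
}
```

Only names in the "newly rejected" group could break existing usage. With these assumed patterns, a name with a leading '-' falls into that group, which matches the exception noted above.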

  was:
It's possible to create a collection via the CreateCollectionAPI which [passes 
validation from the 
SolrIdentifierValidation|https://github.com/apache/solr/blob/main/solr/solrj/src/java/org/apache/solr/client/solrj/util/SolrIdentifierValidator.java#L50-L52]
 (a regex which includes the '.' character), but that same collection name 
won't then pass validation when deployed/undeployed via the PackageTool because 
of the [packagemanager.PackageUtils validateCollection() 
method|https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/packagemanager/PackageUtils.java#L271].

A change [like this, using the existing 
SolrIdentifierValidator|https://github.com/apache/solr/commit/638fd768ebd7ed7908029ced08e56bed05a4a2a5]
 would bring the two validation steps back in line, although there's presumably 
a better approach.

*Potential risks*

As highlighted by Gus Heck [in this 
thread|https://lists.apache.org/thread/h7hnksgqwxxl7nkwkhn01r6jn8xjkjjs] 
changing the validation of collection names could be a risky change to make. 
The source of the PackageUtils regex appears to be 
[https://github.com/apache/lucene-solr/pull/994] from before Solr split from 
the Lucene project, and it seems that the regex wasn't crafted for a specific 
subset of use cases that specifically excluded the '.' character - it just 
appears to be the regex implemented at the time.

Using the {{SolrIdentifierValidator}} approach mentioned above as an example, 
other than disallowing a collection name that begins with a '-' character, the 
{{SolrIdentifierValidator.identifierPattern}} would be a strict expansion of 
the allowed collection names for the {{PackageUtils.validateCollections}}. Any 
other solution (such as [this more naive 
example|https://github.com/apache/solr/blame/998fffdccf51a0560589e2cb413e9da127a5f26e/solr/core/src/java/org/apache/solr/packagemanager/PackageUtils.java#L271])
 could similarly mitigate a lot of the potential risk by only expanding the 
allowed collection names.


> PackageUtils collection validation is more restrictive than 
> CreateCollectionAPI allows
> --------------------------------------------------------------------------------------
>
>                 Key: SOLR-16820
>                 URL: https://issues.apache.org/jira/browse/SOLR-16820
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Package Manager
>            Reporter: Will White
>            Priority: Minor
>              Labels: packagemanager
>
> It's possible to create a collection via the CreateCollectionAPI whose name 
> [passes validation from the 
> SolrIdentifierValidator|https://github.com/apache/solr/blob/main/solr/solrj/src/java/org/apache/solr/client/solrj/util/SolrIdentifierValidator.java#L50-L52]
>  (a regex which, among other characters, allows '.'), but that same 
> collection name then fails validation when a package is deployed to or 
> undeployed from it via the PackageTool, because of the 
> [packagemanager.PackageUtils validateCollections() 
> method|https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/packagemanager/PackageUtils.java#L271].
> A change [like this, using the existing 
> SolrIdentifierValidator|https://github.com/apache/solr/commit/638fd768ebd7ed7908029ced08e56bed05a4a2a5]
>  would bring the two validation steps back in line, although there's 
> presumably a better approach.
> *Potential risks*
> As highlighted by Gus Heck [in this 
> thread|https://lists.apache.org/thread/h7hnksgqwxxl7nkwkhn01r6jn8xjkjjs], 
> changing the validation of collection names could be a risky change to make. 
> The PackageUtils regex appears to originate from 
> [https://github.com/apache/lucene-solr/pull/994], from before Solr split from 
> the Lucene project. It does not seem to have been crafted to deliberately 
> exclude the '.' character for particular use cases; it just appears to be the 
> regex implemented at the time.
> Using the {{SolrIdentifierValidator}} approach mentioned above as an example, 
> other than disallowing a collection name that begins with a '-' character, 
> the {{SolrIdentifierValidator.identifierPattern}} would be a strict expansion 
> of the collection names allowed by 
> {{PackageUtils.validateCollections}}. Any other solution (such as [this 
> more naive 
> example|https://github.com/apache/solr/blame/998fffdccf51a0560589e2cb413e9da127a5f26e/solr/core/src/java/org/apache/solr/packagemanager/PackageUtils.java#L271])
>  could similarly mitigate much of the potential risk by only expanding the 
> set of allowed collection names.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
