Re: [PR] Whitelisting Onelake API & Workspace PL FQDNs [arrow-rs-object-store]

2026-02-05 Thread via GitHub


SmritiAgrawal04 closed pull request #552: Whitelisting Onelake API & Workspace 
PL FQDNs
URL: https://github.com/apache/arrow-rs-object-store/pull/552


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] Whitelisting Onelake API & Workspace PL FQDNs [arrow-rs-object-store]

2026-01-19 Thread via GitHub


kevinjqliu commented on PR #552:
URL: 
https://github.com/apache/arrow-rs-object-store/pull/552#issuecomment-3769562668

   Hey folks, thanks for taking the time to review this PR. 
   I work on OneLake. @SmritiAgrawal04 and I will sync offline to clean up the PR 
before requesting another round of review.





Re: [PR] Whitelisting Onelake API & Workspace PL FQDNs [arrow-rs-object-store]

2026-01-18 Thread via GitHub


SmritiAgrawal04 commented on code in PR #552:
URL: 
https://github.com/apache/arrow-rs-object-store/pull/552#discussion_r2703289932


##
src/azure/builder.rs:
##
@@ -671,36 +673,86 @@ impl MicrosoftAzureBuilder {
 self.container_name = Some(validate(parsed.username())?);
 self.account_name = Some(validate(a)?);
 self.use_fabric_endpoint = true.into();
+} else if let Some(a) = 
host.strip_suffix(".blob.core.windows.net") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+} else if let Some(a) = 
host.strip_suffix(".blob.fabric.microsoft.com") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+self.use_fabric_endpoint = true.into();
+} else if let Some(a) = 
host.strip_suffix("-api.onelake.fabric.microsoft.com") {

Review Comment:
   Yes, it is a valid URL. It has been added to the 
[documentation](https://learn.microsoft.com/en-us/fabric/onelake/onelake-access-api#additional-onelake-endpoints).






Re: [PR] Whitelisting Onelake API & Workspace PL FQDNs [arrow-rs-object-store]

2026-01-18 Thread via GitHub


SmritiAgrawal04 commented on code in PR #552:
URL: 
https://github.com/apache/arrow-rs-object-store/pull/552#discussion_r2703287763


##
Cargo.toml:
##
@@ -40,6 +40,7 @@ humantime = "2.1"
 itertools = "0.14.0"
 parking_lot = { version = "0.12" }
 percent-encoding = "2.1"
+regex = "1.11.1"

Review Comment:
   Addressed.






Re: [PR] Whitelisting Onelake API & Workspace PL FQDNs [arrow-rs-object-store]

2026-01-18 Thread via GitHub


SmritiAgrawal04 commented on code in PR #552:
URL: 
https://github.com/apache/arrow-rs-object-store/pull/552#discussion_r2703280014


##
src/azure/builder.rs:
##
@@ -664,48 +666,99 @@ impl MicrosoftAzureBuilder {
 // or the convention for the hadoop driver 
abfs[s]://@.dfs.core.windows.net/
 if parsed.username().is_empty() {
 self.container_name = Some(validate(host)?);
+} else if let Some(a) = 
host.strip_suffix(".dfs.core.windows.net") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+} else if let Some(a) = 
host.strip_suffix(".dfs.fabric.microsoft.com") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+self.use_fabric_endpoint = true.into();
+} else if let Some(a) = 
host.strip_suffix(".blob.core.windows.net") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+} else if let Some(a) = 
host.strip_suffix(".blob.fabric.microsoft.com") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+self.use_fabric_endpoint = true.into();
+} else if let Some(a) = 
host.strip_suffix("-api.onelake.fabric.microsoft.com") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+self.use_fabric_endpoint = true.into();
 } else {
-match host.split_once('.') {
-Some((a, "dfs.core.windows.net")) | Some((a, 
"blob.core.windows.net")) => {
-self.account_name = Some(validate(a)?);
-self.container_name = 
Some(validate(parsed.username())?);
-}
-Some((a, "dfs.fabric.microsoft.com")) | Some((a, 
"blob.fabric.microsoft.net")) => {
-self.account_name = Some(validate(a)?);
-self.container_name = 
Some(validate(parsed.username())?);
-self.use_fabric_endpoint = true.into();
-}
-_ => return Err(Error::UrlNotRecognised { url: 
url.into() }.into())
-
-}
+return Err(Error::UrlNotRecognised { url: url.into() 
}.into());
 }
 }
-"https" => match host.split_once('.') {
-Some((a, "dfs.core.windows.net")) | Some((a, 
"blob.core.windows.net")) => {
-self.account_name = Some(validate(a)?);
-let container = 
parsed.path_segments().unwrap().next().expect(
-"iterator always contains at least one string (which 
may be empty)",
-);
-if !container.is_empty() {
-self.container_name = Some(validate(container)?);
+"https" => {
+// Regex to match WS-PL FQDN:
+// "{workspaceid}.z??.(onelake|dfs|blob).fabric.microsoft.com"
+static WS_PL_REGEX: OnceLock<Regex> = OnceLock::new();
+
+let ws_pl_regex = WS_PL_REGEX.get_or_init(|| {
+Regex::new(
+r"(?i)^(?P<workspaceid>[0-9a-f]{32})\.z(?P<xy>[0-9a-f]{2})\.(onelake|dfs|blob)\.fabric\.microsoft\.com$"

Review Comment:
   1. Workspace private link (WS-PL) FQDNs follow a different pattern from the 
FQDNs that were whitelisted earlier. WS-PL hostnames have a specific, structured 
format with invariants that we want to validate, not merely parse. If you check 
the FQDN syntax 
[here](https://learn.microsoft.com/en-us/fabric/security/security-workspace-level-private-links-overview#connecting-to-workspaces), 
you will see that a simple split('.') can tell "there are N labels" and extract 
strings, but it does not naturally enforce that:
   
   * wsid is exactly 32 hex chars,
   * xy is exactly 2 hex chars,
   * and xy == wsid[0..2].
   
   2. Added test cases for invalid WS-PL FQDNs.
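
The three invariants listed above can also be checked without a regex. The following is a minimal sketch using only the standard library. `parse_ws_pl_host` is a hypothetical helper, not the code in this PR, and the accepted service labels (`onelake`, `dfs`, `blob`) are taken from the regex under review rather than confirmed against Microsoft documentation.

```rust
/// Hypothetical helper (not part of object_store): validates a workspace
/// private link host and returns (workspace id, service label) on success.
fn parse_ws_pl_host(host: &str) -> Option<(String, String)> {
    // Host names are case-insensitive, so normalise before checking.
    let host = host.to_ascii_lowercase();
    let rest = host.strip_suffix(".fabric.microsoft.com")?;

    // Expect exactly three labels: {wsid}.z{xy}.{service}
    let mut labels = rest.split('.');
    let wsid = labels.next()?;
    let zxy = labels.next()?;
    let service = labels.next()?;
    if labels.next().is_some() {
        return None;
    }

    let is_hex = |s: &str| s.bytes().all(|b| b.is_ascii_hexdigit());
    let xy = zxy.strip_prefix('z')?;

    let valid = wsid.len() == 32
        && is_hex(wsid)
        && xy.len() == 2
        && is_hex(xy)
        // The two characters after 'z' must repeat the start of the id.
        && wsid.starts_with(xy)
        // Assumption: service labels copied from the regex in this PR.
        && matches!(service, "onelake" | "dfs" | "blob");

    valid.then(|| (wsid.to_string(), service.to_string()))
}
```

With a 32-character id that starts with `ab`, a host of the form `{id}.zab.dfs.fabric.microsoft.com` passes, while the same id with a `zcd` label is rejected because the label does not match the id prefix.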






Re: [PR] Whitelisting Onelake API & Workspace PL FQDNs [arrow-rs-object-store]

2026-01-17 Thread via GitHub


alamb commented on PR #552:
URL: 
https://github.com/apache/arrow-rs-object-store/pull/552#issuecomment-3764055824

   > I have addressed all comments. Regarding findings #1 and #3, we plan to add 
these endpoints to the public documentation; the PR for that is already out. 
Could you please approve these changes in the meantime? Thanks
   
   Some comments remain unaddressed. In terms of documentation, please include 
it in this PR.





Re: [PR] Whitelisting Onelake API & Workspace PL FQDNs [arrow-rs-object-store]

2026-01-17 Thread via GitHub


alamb commented on code in PR #552:
URL: 
https://github.com/apache/arrow-rs-object-store/pull/552#discussion_r2701236996


##
Cargo.toml:
##
@@ -40,6 +40,7 @@ humantime = "2.1"
 itertools = "0.14.0"
 parking_lot = { version = "0.12" }
 percent-encoding = "2.1"
+regex = "1.11.1"

Review Comment:
   Since this is only used for azure, in order to keep the dependency chain 
minimal, can we please make this an optional dependency, following for example 
`base64`? 
   
   And then activate like this?
   ```
   azure = ["cloud", "httparse"]
   ```



##
src/azure/builder.rs:
##
@@ -671,36 +673,86 @@ impl MicrosoftAzureBuilder {
 self.container_name = Some(validate(parsed.username())?);
 self.account_name = Some(validate(a)?);
 self.use_fabric_endpoint = true.into();
+} else if let Some(a) = 
host.strip_suffix(".blob.core.windows.net") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+} else if let Some(a) = 
host.strip_suffix(".blob.fabric.microsoft.com") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+self.use_fabric_endpoint = true.into();
+} else if let Some(a) = 
host.strip_suffix("-api.onelake.fabric.microsoft.com") {

Review Comment:
   This comment is still outstanding, I think. Is this a valid URL?



##
src/azure/builder.rs:
##
@@ -1211,6 +1264,38 @@ mod tests {
 assert_eq!(builder.container_name.as_deref(), Some("container"));
 assert!(builder.use_fabric_endpoint.get().unwrap());
 
+let mut builder = MicrosoftAzureBuilder::new();
+builder
+.parse_url("https://Ab00.zAb.dfs.fabric.microsoft.com/")
+.unwrap();
+assert_eq!(builder.account_name, 
Some("ab00.zab".to_string()));

Review Comment:
   Is the account name really supposed to have the 
`ab00` in it? It seems confusing that the container 
name is also `ab00`.



##
src/azure/builder.rs:
##
@@ -664,48 +666,99 @@ impl MicrosoftAzureBuilder {
 // or the convention for the hadoop driver 
abfs[s]://@.dfs.core.windows.net/
 if parsed.username().is_empty() {
 self.container_name = Some(validate(host)?);
+} else if let Some(a) = 
host.strip_suffix(".dfs.core.windows.net") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+} else if let Some(a) = 
host.strip_suffix(".dfs.fabric.microsoft.com") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+self.use_fabric_endpoint = true.into();
+} else if let Some(a) = 
host.strip_suffix(".blob.core.windows.net") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+} else if let Some(a) = 
host.strip_suffix(".blob.fabric.microsoft.com") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+self.use_fabric_endpoint = true.into();
+} else if let Some(a) = 
host.strip_suffix("-api.onelake.fabric.microsoft.com") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+self.use_fabric_endpoint = true.into();
 } else {
-match host.split_once('.') {
-Some((a, "dfs.core.windows.net")) | Some((a, 
"blob.core.windows.net")) => {
-self.account_name = Some(validate(a)?);
-self.container_name = 
Some(validate(parsed.username())?);
-}
-Some((a, "dfs.fabric.microsoft.com")) | Some((a, 
"blob.fabric.microsoft.net")) => {
-self.account_name = Some(validate(a)?);
-self.container_name = 
Some(validate(parsed.username())?);
-self.use_fabric_endpoint = true.into();
-}
-_ => return Err(Error::UrlNotRecognised { url: 
url.into() }.into())
-
-}
+return Err(Error::UrlNotRecognised { url: url.into() 
}.into());
 }
 }
-"https" => match host.split_once('.') {
-Some((a, "dfs.core.windows.

Re: [PR] Whitelisting Onelake API & Workspace PL FQDNs [arrow-rs-object-store]

2026-01-15 Thread via GitHub


SmritiAgrawal04 commented on PR #552:
URL: 
https://github.com/apache/arrow-rs-object-store/pull/552#issuecomment-3758297354

   > Thanks @SmritiAgrawal04 -- i left some more questions
   > 
   > BTW I asked codex for a review of this PR like this
   > 
   > ```
   > please review this PR (checked out in this directory): 
https://github.com/apache/arrow-rs-object-store/pull/552
   > 
   >   Specifically ensure that the format of the URLs supported matches 
publicly published microsoft urls and provide documentation references
   > ```
   > 
   > And this was the output which I found helpful
   > 
   > 
───
   > 
   > • Findings
   > 
   > * High: -api.onelake.fabric.microsoft.com is accepted for both abfs(s) and 
https, but I can’t find this FQDN in
   >   the public OneLake URL docs. The published endpoints are 
onelake.dfs.fabric.microsoft.com and workspace-scoped
   >   https://{workspaceId}.z{xy}.dfs.fabric.microsoft.com/... (see 
https://learn.microsoft.com/en-us/fabric/onelake/
   >   onelake-access-api). If -api.onelake is internal or not publicly 
documented, this should be gated or removed.
   > * Medium: The WS‑PL regex no longer verifies that z{xy} matches the first 
two hex chars of the workspace id (src/
   >   azure/builder.rs:692-708). The published WS‑PL format is https://
   >   {workspaceId}.z{first2}.dfs.fabric.microsoft.com (same doc). Without the 
check, non‑published FQDNs are
   >   accepted.
   > * Medium: WS‑PL regex allows .onelake.fabric.microsoft.com 
(src/azure/builder.rs:692-699). The public docs list
   >   dfs (and blob endpoints for blob access), but I don’t see onelake as a 
WS‑PL subdomain. Please confirm with a
   >   Microsoft reference or remove to match published formats.
   > * Low: WS‑PL regex only allows lowercase hex ([0-9a-f]{32}) 
(src/azure/builder.rs:697). If Microsoft treats
   >   workspace IDs as case‑insensitive GUIDs in URLs, this will reject valid 
uppercase forms; consider documenting
   >   that only lowercase is supported or make the regex case‑insensitive.
   > 
   > Doc references
   > 
   > * OneLake access API URL formats: 
https://learn.microsoft.com/en-us/fabric/onelake/onelake-access-api
   > * Private Link / workspace-specific OneLake endpoints (if this PR targets 
WS‑PL): please confirm the exact doc
   >   section that defines the workspaceId.z{xy}.dfs.fabric.microsoft.com 
pattern and whether any onelake WS‑PL host
   >   is documented (I could not find it in public docs).
   > 
   > Questions / assumptions
   > 
   > * Is *-api.onelake.fabric.microsoft.com a publicly documented endpoint? If 
yes, can you point to the Microsoft
   >   doc so we can cite it in code/tests?
   > * Should WS‑PL accept only dfs/blob subdomains, or is 
.onelake.fabric.microsoft.com explicitly published?
   
   Hi @alamb,
   
   I have addressed all comments. Regarding findings #1 and #3, we plan to add 
these endpoints to the public documentation; the PR for that is already out. 
Could you please approve these changes in the meantime? Thanks
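
For reference, the suffix checks that the findings above refer to can be sketched as a small table lookup. This is an illustration only: `HostInfo` and `classify_host` are hypothetical names, the suffix list is copied from the diff in this thread, and the real builder additionally validates the account and container names.

```rust
/// Hypothetical summary of what a recognised host yields.
#[derive(Debug, PartialEq)]
struct HostInfo<'a> {
    account: &'a str,
    use_fabric_endpoint: bool,
}

/// Hypothetical helper: maps a host to its account prefix and fabric flag.
fn classify_host(host: &str) -> Option<HostInfo<'_>> {
    // (suffix, is_fabric) pairs as they appear in the diff under review.
    const SUFFIXES: [(&str, bool); 5] = [
        (".dfs.core.windows.net", false),
        (".dfs.fabric.microsoft.com", true),
        (".blob.core.windows.net", false),
        (".blob.fabric.microsoft.com", true),
        ("-api.onelake.fabric.microsoft.com", true),
    ];

    // The first matching suffix wins; anything else is unrecognised.
    SUFFIXES.iter().find_map(|&(suffix, fabric)| {
        host.strip_suffix(suffix).map(|account| HostInfo {
            account,
            use_fabric_endpoint: fabric,
        })
    })
}
```

Because the match is purely on the suffix, any prefix is accepted, which is why the review asks for each suffix to be backed by a published endpoint format.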





Re: [PR] Whitelisting Onelake API & Workspace PL FQDNs [arrow-rs-object-store]

2026-01-14 Thread via GitHub


tustvold commented on PR #552:
URL: 
https://github.com/apache/arrow-rs-object-store/pull/552#issuecomment-3751629051

   FYI https://github.com/apache/arrow-rs-object-store/pull/604 may be related.





Re: [PR] Whitelisting Onelake API & Workspace PL FQDNs [arrow-rs-object-store]

2026-01-13 Thread via GitHub


SmritiAgrawal04 commented on code in PR #552:
URL: 
https://github.com/apache/arrow-rs-object-store/pull/552#discussion_r2685583360


##
src/azure/builder.rs:
##
@@ -671,36 +673,86 @@ impl MicrosoftAzureBuilder {
 self.container_name = Some(validate(parsed.username())?);
 self.account_name = Some(validate(a)?);
 self.use_fabric_endpoint = true.into();
+} else if let Some(a) = 
host.strip_suffix(".blob.core.windows.net") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+} else if let Some(a) = 
host.strip_suffix(".blob.fabric.microsoft.com") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+self.use_fabric_endpoint = true.into();
+} else if let Some(a) = 
host.strip_suffix("-api.onelake.fabric.microsoft.com") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+self.use_fabric_endpoint = true.into();
 } else {
 return Err(Error::UrlNotRecognised { url: url.into() 
}.into());
 }
 }
-"https" => match host.split_once('.') {
-Some((a, "dfs.core.windows.net")) | Some((a, 
"blob.core.windows.net")) => {
-self.account_name = Some(validate(a)?);
-let container = 
parsed.path_segments().unwrap().next().expect(
-"iterator always contains at least one string (which 
may be empty)",
-);
-if !container.is_empty() {
-self.container_name = Some(validate(container)?);
-}
+"https" => {
+// Regex to match WS-PL FQDN:
+// "{workspaceid}.z??.(onelake|dfs|blob).fabric.microsoft.com"

Review Comment:
   Added for the WS-PL DFS/Blob endpoints. We are waiting for the PM to confirm 
the ABFSS and WS-PL onelake domains.






Re: [PR] Whitelisting Onelake API & Workspace PL FQDNs [arrow-rs-object-store]

2026-01-13 Thread via GitHub


SmritiAgrawal04 commented on PR #552:
URL: 
https://github.com/apache/arrow-rs-object-store/pull/552#issuecomment-3742964508

   > Thanks @SmritiAgrawal04 -- i left some more questions
   > 
   > BTW I asked codex for a review of this PR like this
   > 
   > ```
   > please review this PR (checked out in this directory): 
https://github.com/apache/arrow-rs-object-store/pull/552
   > 
   >   Specifically ensure that the format of the URLs supported matches 
publicly published microsoft urls and provide documentation references
   > ```
   > 
   > And this was the output which I found helpful
   > 
   > 
───
   > 
   > • Findings
   > 
   > * High: -api.onelake.fabric.microsoft.com is accepted for both abfs(s) and 
https, but I can’t find this FQDN in
   >   the public OneLake URL docs. The published endpoints are 
onelake.dfs.fabric.microsoft.com and workspace-scoped
   >   https://{workspaceId}.z{xy}.dfs.fabric.microsoft.com/... (see 
https://learn.microsoft.com/en-us/fabric/onelake/
   >   onelake-access-api). If -api.onelake is internal or not publicly 
documented, this should be gated or removed.
   > * Medium: The WS‑PL regex no longer verifies that z{xy} matches the first 
two hex chars of the workspace id (src/
   >   azure/builder.rs:692-708). The published WS‑PL format is https://
   >   {workspaceId}.z{first2}.dfs.fabric.microsoft.com (same doc). Without the 
check, non‑published FQDNs are
   >   accepted.
   > * Medium: WS‑PL regex allows .onelake.fabric.microsoft.com 
(src/azure/builder.rs:692-699). The public docs list
   >   dfs (and blob endpoints for blob access), but I don’t see onelake as a 
WS‑PL subdomain. Please confirm with a
   >   Microsoft reference or remove to match published formats.
   > * Low: WS‑PL regex only allows lowercase hex ([0-9a-f]{32}) 
(src/azure/builder.rs:697). If Microsoft treats
   >   workspace IDs as case‑insensitive GUIDs in URLs, this will reject valid 
uppercase forms; consider documenting
   >   that only lowercase is supported or make the regex case‑insensitive.
   > 
   > Doc references
   > 
   > * OneLake access API URL formats: 
https://learn.microsoft.com/en-us/fabric/onelake/onelake-access-api
   > * Private Link / workspace-specific OneLake endpoints (if this PR targets 
WS‑PL): please confirm the exact doc
   >   section that defines the workspaceId.z{xy}.dfs.fabric.microsoft.com 
pattern and whether any onelake WS‑PL host
   >   is documented (I could not find it in public docs).
   > 
   > Questions / assumptions
   > 
   > * Is *-api.onelake.fabric.microsoft.com a publicly documented endpoint? If 
yes, can you point to the Microsoft
   >   doc so we can cite it in code/tests?
   > * Should WS‑PL accept only dfs/blob subdomains, or is 
.onelake.fabric.microsoft.com explicitly published?
   
   
   
   Hi @





Re: [PR] Whitelisting Onelake API & Workspace PL FQDNs [arrow-rs-object-store]

2026-01-13 Thread via GitHub


SmritiAgrawal04 closed pull request #552: Whitelisting Onelake API & Workspace 
PL FQDNs
URL: https://github.com/apache/arrow-rs-object-store/pull/552





Re: [PR] Whitelisting Onelake API & Workspace PL FQDNs [arrow-rs-object-store]

2026-01-12 Thread via GitHub


alamb commented on code in PR #552:
URL: 
https://github.com/apache/arrow-rs-object-store/pull/552#discussion_r2683053095


##
src/azure/builder.rs:
##
@@ -671,36 +673,86 @@ impl MicrosoftAzureBuilder {
 self.container_name = Some(validate(parsed.username())?);
 self.account_name = Some(validate(a)?);
 self.use_fabric_endpoint = true.into();
+} else if let Some(a) = 
host.strip_suffix(".blob.core.windows.net") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+} else if let Some(a) = 
host.strip_suffix(".blob.fabric.microsoft.com") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+self.use_fabric_endpoint = true.into();
+} else if let Some(a) = 
host.strip_suffix("-api.onelake.fabric.microsoft.com") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+self.use_fabric_endpoint = true.into();
 } else {
 return Err(Error::UrlNotRecognised { url: url.into() 
}.into());
 }
 }
-"https" => match host.split_once('.') {
-Some((a, "dfs.core.windows.net")) | Some((a, 
"blob.core.windows.net")) => {
-self.account_name = Some(validate(a)?);
-let container = 
parsed.path_segments().unwrap().next().expect(
-"iterator always contains at least one string (which 
may be empty)",
-);
-if !container.is_empty() {
-self.container_name = Some(validate(container)?);
-}
+"https" => {
+// Regex to match WS-PL FQDN:
+// "{workspaceid}.z??.(onelake|dfs|blob).fabric.microsoft.com"

Review Comment:
   Can you please also add an example URL for each of the APIs you are adding 
support for?



##
src/azure/builder.rs:
##
@@ -671,36 +673,86 @@ impl MicrosoftAzureBuilder {
 self.container_name = Some(validate(parsed.username())?);
 self.account_name = Some(validate(a)?);
 self.use_fabric_endpoint = true.into();
+} else if let Some(a) = 
host.strip_suffix(".blob.core.windows.net") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+} else if let Some(a) = 
host.strip_suffix(".blob.fabric.microsoft.com") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+self.use_fabric_endpoint = true.into();
+} else if let Some(a) = 
host.strip_suffix("-api.onelake.fabric.microsoft.com") {

Review Comment:
 - Is *-api.onelake.fabric.microsoft.com a publicly documented endpoint? If 
yes, can you point to the Microsoft doc so we can cite it in code/tests?
   
   I don't see it in 
https://learn.microsoft.com/en-us/fabric/onelake/onelake-access-api






Re: [PR] Whitelisting Onelake API & Workspace PL FQDNs [arrow-rs-object-store]

2026-01-12 Thread via GitHub


alamb commented on code in PR #552:
URL: 
https://github.com/apache/arrow-rs-object-store/pull/552#discussion_r2683050830


##
src/azure/builder.rs:
##
@@ -671,36 +673,86 @@ impl MicrosoftAzureBuilder {
 self.container_name = Some(validate(parsed.username())?);
 self.account_name = Some(validate(a)?);
 self.use_fabric_endpoint = true.into();
+} else if let Some(a) = 
host.strip_suffix(".blob.core.windows.net") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+} else if let Some(a) = 
host.strip_suffix(".blob.fabric.microsoft.com") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+self.use_fabric_endpoint = true.into();
+} else if let Some(a) = 
host.strip_suffix("-api.onelake.fabric.microsoft.com") {
+self.container_name = Some(validate(parsed.username())?);
+self.account_name = Some(validate(a)?);
+self.use_fabric_endpoint = true.into();
 } else {
 return Err(Error::UrlNotRecognised { url: url.into() 
}.into());
 }
 }
-"https" => match host.split_once('.') {
-Some((a, "dfs.core.windows.net")) | Some((a, 
"blob.core.windows.net")) => {
-self.account_name = Some(validate(a)?);
-let container = 
parsed.path_segments().unwrap().next().expect(
-"iterator always contains at least one string (which 
may be empty)",
-);
-if !container.is_empty() {
-self.container_name = Some(validate(container)?);
-}
+"https" => {
+// Regex to match WS-PL FQDN:
+// "{workspaceid}.z??.(onelake|dfs|blob).fabric.microsoft.com"
+static WS_PL_REGEX: OnceLock<Regex> = OnceLock::new();
+let ws_pl_regex = WS_PL_REGEX.get_or_init(|| {
+Regex::new(
+r"^(?P<workspaceid>[0-9a-f]{32})\.z(?P<xy>[0-9a-f]{2})\.(onelake|dfs|blob)\.fabric\.microsoft\.com$"
+).unwrap()
+});
+
+// WS-PL Fabric endpoint
+if let Some(captures) = ws_pl_regex.captures(host) {
+let workspaceid = 
captures.name("workspaceid").unwrap().as_str();
+let xy = captures.name("xy").unwrap().as_str();
+
+self.account_name = Some(format!("{workspaceid}.z{xy}"));
+self.container_name = Some(validate(workspaceid)?);
+self.use_fabric_endpoint = true.into();
+return Ok(());
 }
-Some((a, "dfs.fabric.microsoft.com")) | Some((a, 
"blob.fabric.microsoft.com")) => {
-self.account_name = Some(validate(a)?);
-// Attempt to infer the container name from the URL

Review Comment:
   why remove this comment? It seems helpful
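
For context, the removed comment documents the step that infers the container from the first path segment of an `https` URL. A rough sketch of that step, assuming a hypothetical `infer_container` helper that works on the raw path string rather than on a parsed `Url`:

```rust
/// Hypothetical helper: returns the first non-empty path segment, if any.
fn infer_container(path: &str) -> Option<&str> {
    // Drop the leading '/', then take the first segment. `split` always
    // yields at least one item, which may be the empty string.
    path.strip_prefix('/')
        .unwrap_or(path)
        .split('/')
        .next()
        .filter(|segment| !segment.is_empty())
}
```

An empty first segment means the URL names only the account, so the container has to be supplied separately.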






Re: [PR] Whitelisting Onelake API & Workspace PL FQDNs [arrow-rs-object-store]

2026-01-11 Thread via GitHub


SmritiAgrawal04 commented on code in PR #552:
URL: 
https://github.com/apache/arrow-rs-object-store/pull/552#discussion_r2680933572


##
src/azure/builder.rs:
##
@@ -1184,10 +1253,34 @@ mod tests {
 
 let mut builder = MicrosoftAzureBuilder::new();
 builder
-.parse_url("https://account.blob.fabric.microsoft.com/container")
+.parse_url("https://account.blob.fabric.microsoft.com/")

Review Comment:
   That was changed by mistake. Reverted.






[PR] Whitelisting Onelake API & Workspace PL FQDNs [arrow-rs-object-store]

2026-01-11 Thread via GitHub


SmritiAgrawal04 opened a new pull request, #552:
URL: https://github.com/apache/arrow-rs-object-store/pull/552

   # Which issue does this PR close?
   
   
   
   Closes #.
   
   # Rationale for this change

   
   
   # What changes are included in this PR?
   
   
   
   # Are there any user-facing changes?
   
   
   
   
   
   





Re: [PR] Whitelisting Onelake API & Workspace PL FQDNs [arrow-rs-object-store]

2026-01-11 Thread via GitHub


SmritiAgrawal04 closed pull request #552: Whitelisting Onelake API & Workspace 
PL FQDNs
URL: https://github.com/apache/arrow-rs-object-store/pull/552





Re: [PR] Whitelisting Onelake API & Workspace PL FQDNs [arrow-rs-object-store]

2026-01-09 Thread via GitHub


crepererum commented on code in PR #552:
URL: 
https://github.com/apache/arrow-rs-object-store/pull/552#discussion_r2676594518


##
src/azure/builder.rs:
##
@@ -1184,10 +1253,34 @@ mod tests {
 
 let mut builder = MicrosoftAzureBuilder::new();
 builder
-.parse_url("https://account.blob.fabric.microsoft.com/container")
+.parse_url("https://account.blob.fabric.microsoft.com/")

Review Comment:
   why did this test case change?






Re: [PR] Whitelisting Onelake API & Workspace PL FQDNs [arrow-rs-object-store]

2026-01-08 Thread via GitHub


SmritiAgrawal04 commented on PR #552:
URL: 
https://github.com/apache/arrow-rs-object-store/pull/552#issuecomment-3727207748

   Hi @alamb & @crepererum,
   
   I added the unit tests as suggested. Could you please review the PR?





Re: [PR] Whitelisting Onelake API & Workspace PL FQDNs [arrow-rs-object-store]

2025-12-21 Thread via GitHub


kashyap-kunal commented on code in PR #552:
URL: 
https://github.com/apache/arrow-rs-object-store/pull/552#discussion_r2638720561


##
src/azure/builder.rs:
##
@@ -675,32 +679,79 @@ impl MicrosoftAzureBuilder {
 return Err(Error::UrlNotRecognised { url: url.into() 
}.into());
 }
 }
-"https" => match host.split_once('.') {
-Some((a, "dfs.core.windows.net")) | Some((a, 
"blob.core.windows.net")) => {
-self.account_name = Some(validate(a)?);
-let container = 
parsed.path_segments().unwrap().next().expect(
+"https" => {
+const DFS_FABRIC_SUFFIX: &str = "dfs.fabric.microsoft.com";
+const BLOB_FABRIC_SUFFIX: &str = "blob.fabric.microsoft.com";
+const DFS_AZURE_SUFFIX: &str = "dfs.core.windows.net";
+const BLOB_AZURE_SUFFIX: &str = "blob.core.windows.net";
+
+// Regex to match WS-PL FQDN: 
"{workspaceid}.z??.dfs.fabric.microsoft.com"
+// workspaceid = 32 hex chars, z?? = z + first two chars of 
workspaceid
+lazy_static::lazy_static! {
+static ref WS_PL_REGEX: Regex = 
Regex::new(r"^(?P<workspaceid>[0-9a-f]{32})\.z(?P<xy>[0-9a-f]{2})\.(dfs|blob)\.fabric\.microsoft\.com$").unwrap();

Review Comment:
   Let's add support for .onelake.fabric.microsoft.com also



##
src/azure/builder.rs:
##
@@ -675,32 +679,79 @@ impl MicrosoftAzureBuilder {
 return Err(Error::UrlNotRecognised { url: url.into() 
}.into());
 }
 }
-"https" => match host.split_once('.') {
-Some((a, "dfs.core.windows.net")) | Some((a, 
"blob.core.windows.net")) => {
-self.account_name = Some(validate(a)?);
-let container = 
parsed.path_segments().unwrap().next().expect(
+"https" => {
+const DFS_FABRIC_SUFFIX: &str = "dfs.fabric.microsoft.com";
+const BLOB_FABRIC_SUFFIX: &str = "blob.fabric.microsoft.com";
+const DFS_AZURE_SUFFIX: &str = "dfs.core.windows.net";
+const BLOB_AZURE_SUFFIX: &str = "blob.core.windows.net";
+
+// Regex to match WS-PL FQDN: 
"{workspaceid}.z??.dfs.fabric.microsoft.com"
+// workspaceid = 32 hex chars, z?? = z + first two chars of 
workspaceid
+lazy_static::lazy_static! {
+static ref WS_PL_REGEX: Regex = 
Regex::new(r"^(?P<workspaceid>[0-9a-f]{32})\.z(?P<xy>[0-9a-f]{2})\.(dfs|blob)\.fabric\.microsoft\.com$").unwrap();
+}
+
+if let Some(captures) = WS_PL_REGEX.captures(host) {
+let workspaceid = 
captures.name("workspaceid").unwrap().as_str();
+let xy = captures.name("xy").unwrap().as_str();
+
+// Validate z?? matches first 2 chars of workspaceid
+if &workspaceid[0..2] != xy {

Review Comment:
   remove this validation



##
src/azure/builder.rs:
##
@@ -675,32 +679,79 @@ impl MicrosoftAzureBuilder {
 return Err(Error::UrlNotRecognised { url: url.into() 
}.into());
 }
 }
-"https" => match host.split_once('.') {
-Some((a, "dfs.core.windows.net")) | Some((a, 
"blob.core.windows.net")) => {
-self.account_name = Some(validate(a)?);
-let container = 
parsed.path_segments().unwrap().next().expect(
+"https" => {
+const DFS_FABRIC_SUFFIX: &str = "dfs.fabric.microsoft.com";
+const BLOB_FABRIC_SUFFIX: &str = "blob.fabric.microsoft.com";
+const DFS_AZURE_SUFFIX: &str = "dfs.core.windows.net";
+const BLOB_AZURE_SUFFIX: &str = "blob.core.windows.net";
+
+// Regex to match WS-PL FQDN: 
"{workspaceid}.z??.dfs.fabric.microsoft.com"
+// workspaceid = 32 hex chars, z?? = z + first two chars of 
workspaceid
+lazy_static::lazy_static! {
+static ref WS_PL_REGEX: Regex = 
Regex::new(r"^(?P<workspaceid>[0-9a-f]{32})\.z(?P<xy>[0-9a-f]{2})\.(dfs|blob)\.fabric\.microsoft\.com$").unwrap();
+}
+
+if let Some(captures) = WS_PL_REGEX.captures(host) {
+let workspaceid = 
captures.name("workspaceid").unwrap().as_str();
+let xy = captures.name("xy").unwrap().as_str();
+
+// Validate z?? matches first 2 chars of workspaceid
+if &workspaceid[0..2] != xy {
+return Err(Error::UrlNotRecognised { url: url.into() 
}.into());
+}
+
+self.account_name = Some(validate(workspaceid)?);
+self.use_fabric_endpoint = true;
+
+let container = parsed
+.path_segments()
+  

[PR] Whitelisting Onelake API & Workspace PL FQDNs [arrow-rs-object-store]

2025-11-25 Thread via GitHub


SmritiAgrawal04 opened a new pull request, #552:
URL: https://github.com/apache/arrow-rs-object-store/pull/552

   # Which issue does this PR close?
   
   
   
   Closes #.
   
   # Rationale for this change

   
   
   # What changes are included in this PR?
   
   
   
   # Are there any user-facing changes?
   
   
   
   
   
   

