Hi Kou,

I retrieved the client secret value and the other values from the Azure portal. They are located in various places within a given account (as everyone is likely aware). Are you asking what the code that generates the configurations looks like? If so, I'll repeat those:
>     configureStatus = azureOptions.ConfigureClientSecretCredential( tenant_id, client_id, client_secret );
>
> Then I see the unauthorized error, thus:
>
>     adls_read
>     Parquet file read commencing...
>     configureStatus = OK
>     1
>     Parquet read error: GetToken(): error response: 401 Unauthorized
>
> And if I use the managed identity configuration, thus:
>
>     configureStatus = azureOptions.ConfigureManagedIdentityCredential( client_id );

Where those values were hard-coded in a small, stand-alone test program, thus:

    arrow::fs::AzureOptions azureOptions;
    arrow::Status configureStatus = arrow::Status::OK();
    azureOptions.account_name = "ecmtest";
    std::string client_id = "snip";
    std::string client_secret = "snip";
    std::string tenant_id = "snip";
    std::string account_key = "snip";

With the actual values being snipped for security reasons. Again, I took them from the Azure portal for our testing account.

As mentioned, if I attempt the client secret configuration and then attempt to use the filesystem object that results from that attempt, I get a 401 error, thus:

>     Parquet read error: GetToken(): error response: 401 Unauthorized

And if I try again to use that object, I am told the following:

    ERROR: Authentication failure connecting to Azure services.
    ERROR: To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code FZH3GDOAY to authenticate.

So this reminds me of the process that one does manually in Azure where you have to navigate to a URL to get a token, then use that token. Is that what is occurring here? I thought that this would be handled "under the covers" so that whatever token is returned and needed would be retrieved and stored within the filesystem object, and thus be available whenever needed. From my perspective, it would be a black box. Instead, I see the 401 error, which is unexpected. I had expected all of this to "just work" but it hasn't, with the exception of the account key configuration.
So I must determine where I am going wrong here, because our customers will want to authenticate via the client secret configuration, at least most of the time.

Also, have you heard of the term "Device Code Flow"? Apparently that is an Azure phenomenon, but I was unfamiliar with it until a co-worker mentioned that most customers will do this when using our product. Is device code flow somehow supported in the Azure filesystem object in the Arrow library?

The managed identity configuration simply hangs when I attempt to configure and use that authentication approach. I also thought that this would "just work".

Note that I am not using any environment variables, as I am not allowed to do so. Therefore, I must specify the values manually via the Arrow Azure filesystem object's API.

Any feedback is appreciated.

Thanks!
Jerry

-----Original Message-----
From: Sutou Kouhei <k...@clear-code.com>
Sent: Thursday, July 11, 2024 2:56 AM
To: user@arrow.apache.org
Subject: Re: Using the new Azure filesystem object (C++)

EXTERNAL

Hi,

Could you share how did you generate values for the client secret configuration and the managed identity configuration? I'll try them.

Thanks,
--
kou

In <dm3pr05mb1054325d88f8a46fd0b169c92f3...@dm3pr05mb10543.namprd05.prod.outlook.com>
  "RE: Using the new Azure filesystem object (C++)" on Thu, 11 Jul 2024 06:37:42 +0000,
  "Jerry Adair via user" <user@arrow.apache.org> wrote:

> Hi Kou!
>
> Well, I thought it was strange too. I was not aware that if data lake
> storage is available then AzureFS will use it automatically. Thank you for
> that information, it helps. With that in mind, I commented out both of those
> lines and just let the default values be assigned (which occurs in azurefs.h).
>
> With that modification, if I attempt an account key configuration, thus:
>
>     configureStatus = azureOptions.ConfigureAccountKeyCredential( account_key );
>
> Then it works! I can read the Parquet file via the methods in the Parquet
> library!
>
> However if I use the client secret configuration, thus:
>
>     configureStatus = azureOptions.ConfigureClientSecretCredential( tenant_id, client_id, client_secret );
>
> Then I see the unauthorized error, thus:
>
>     adls_read
>     Parquet file read commencing...
>     configureStatus = OK
>     1
>     Parquet read error: GetToken(): error response: 401 Unauthorized
>
> And if I use the managed identity configuration, thus:
>
>     configureStatus = azureOptions.ConfigureManagedIdentityCredential( client_id );
>
> Then I see the hang, thus:
>
>     adls_read
>     Parquet file read commencing...
>     configureStatus = OK
>     1
>     ^C
>
> So I dunno about those configuration attempts. I have double-checked the
> values via the Azure portal that we use and those values are correct. So
> perhaps there is some other type of limitation that is being imposed here?
> I'd like to offer the user different means of authenticating to get their
> credentials, ergo they could use client secret or account key or managed
> identity, etc. However at the moment only account key is working. I'll
> continue to see what I can figure out. If you've seen this type of
> phenomenon in the past and recognize the error that is at-play, I'd
> appreciate any feedback.
>
> Thanks!
> Jerry
>
>
> -----Original Message-----
> From: Sutou Kouhei <k...@clear-code.com>
> Sent: Wednesday, July 10, 2024 4:34 PM
> To: user@arrow.apache.org
> Subject: Re: Using the new Azure filesystem object (C++)
>
> EXTERNAL
>
> Hi,
>
>>     azureOptions.blob_storage_authority = ".dfs.core.windows.net";
>>     // If I don't do this, then the blob.core.windows.net is used;
>>     // I want dfs not blob, so... not certain why that happens either
>
> This is strange. In general, you should not do this.
> AzureFS uses both of blob storage API and data lake storage API. If data lake
> storage API is available, AzureFS uses it automatically. So you should not
> change blob_storage_authority.
>
> If you don't have this line, what was happen?
>
>
> Thanks,
> --
> kou
>
> In <dm3pr05mb1054334eeaeae4a95805de322f3...@dm3pr05mb10543.namprd05.prod.outlook.com>
>   "Using the new Azure filesystem object (C++)" on Wed, 10 Jul 2024 16:58:52 +0000,
>   "Jerry Adair via user" <user@arrow.apache.org> wrote:
>
>> Hi-
>>
>> I am attempting to use the new Azure filesystem object in C++.
>> Arrow/Parquet version 16.0.0. I already have code that works for GCS and
>> AWS/S3. I have been waiting for quite a while to see the new Azure
>> filesystem object released. Now that it has in this version (16.0.0) I have
>> been trying to use it. Without success. I presumed that it would work in
>> the same manner in which the GCS and S3/AWS filesystem objects work. You
>> create the object, then you can use it in the same manner that you used the
>> other filesystem objects. Note that I am not using Arrow methods to
>> read/write the data but rather the Parquet methods. This works for local,
>> GCS and S3/AWS. However I cannot open a file on Azure. It seems like no
>> matter which authentication method I try to use, it doesn't work. And I get
>> different results depending on which auth approach I take (client secret
>> versus account key, etc.). Here is a code summary of what I am trying to do:
>>
>>     arrow::fs::AzureOptions azureOptions;
>>     arrow::Status configureStatus = arrow::Status::OK();
>>
>>     // exact values obfuscated
>>     azureOptions.account_name = "mytest";
>>     azureOptions.dfs_storage_authority = ".dfs.core.windows.net";
>>     azureOptions.blob_storage_authority = ".dfs.core.windows.net";
>>     // If I don't do this, then the blob.core.windows.net is used;
>>     // I want dfs not blob, so...
>>     // not certain why that happens either
>>     std::string client_id = "3f061894-blah";
>>     std::string client_secret = "2c796e9eblah";
>>     std::string tenant_id = "b1c14d5c-blah";
>>     //std::string account_key = "flMhWgNts+i/blah==";
>>
>>
>>     //configureStatus = azureOptions.ConfigureAccountKeyCredential( account_key );
>>     configureStatus = azureOptions.ConfigureClientSecretCredential( tenant_id, client_id, client_secret );
>>     //configureStatus = azureOptions.ConfigureManagedIdentityCredential( client_id );
>>     if( false == configureStatus.ok() )
>>     {
>>         // Uh-oh, throw
>>
>>     }
>>
>>     std::shared_ptr<arrow::fs::AzureFileSystem> azureFileSystem;
>>     arrow::Result<std::shared_ptr<arrow::fs::AzureFileSystem>> azureFileSystemResult =
>>         arrow::fs::AzureFileSystem::Make( azureOptions );
>>     if( true == azureFileSystemResult.ok() )
>>     {
>>         azureFileSystem = azureFileSystemResult.ValueOrDie();
>>
>>     }
>>     else
>>     {
>>         // Uh-oh, throw
>>
>>     }
>>
>>     //const std::string path( "parquet/ParquetFiles/plain.parquet" );
>>     const std::string path( "parquet/ParquetFiles/plain.parquet" );
>>     std::shared_ptr<arrow::io::RandomAccessFile> arrowFile;
>>     std::cout << "1\n";
>>     arrow::Result<std::shared_ptr<arrow::io::RandomAccessFile>> openResult =
>>         azureFileSystem->OpenInputFile( path );
>>     std::cout << "2\n";
>>
>> And that is where things run off the rails. At this point, all I want to do
>> is open the input file, create a Parquet file reader like so:
>>
>>     std::unique_ptr<parquet::ParquetFileReader> parquet_reader =
>>         parquet::ParquetFileReader::Open( arrowFile );
>>
>> Then go about my business of reading/writing Parquet data as per normal.
>> Ergo, just as I do for the other filesystem objects. But the
>> OpenInputFile() method fails for the Azure use case scenario. If I attempt
>> the account key configuration, then the error I see is:
>>
>>     adls_read
>>     Parquet file read commencing...
>>     1
>>     Parquet read error: map::at
>>
>> Where the "1" is just a marker to show how far I got in the process of
>> reading a pre-existing Parquet file from the Azure server. Ergo, a low-brow
>> means of debugging. The cout is shown above. I don't get to "2", obviously.
>>
>> When attempting the client secret credential auth, I see the following
>> failure:
>>
>>     adls_read
>>     Parquet file read commencing...
>>     1
>>     Parquet read error: GetToken(): error response: 401 Unauthorized
>>
>> Then when attempting the Managed Identity auth configuration, I get the
>> following:
>>
>>     adls_read
>>     Parquet file read commencing...
>>     1
>>     ^C
>>
>> Where the process just hangs and I have to interrupt out of it. Note that I
>> didn't have this level of difficulty when I implemented our support for GCS
>> and S3/AWS. Those were relatively straightforward. Azure however has been
>> more difficult; this should just work. I mean, you create the filesystem
>> object, then you are supposed to be able to use it in the same manner that
>> you use any other Arrow filesystem object. However I can't open a file and
>> I suspect it is due to some type of handshaking issue with Azure. Azure has
>> all of these moving parts; tenant ID, application/client ID, client secret,
>> object ID (which we don't use in this case) and the list goes on. Finally,
>> I saw this in the azurefs.h header at line 102:
>>
>>     // TODO(GH-38598): Add support for more auth methods.
>>     // std::string connection_string;
>>     // std::string sas_token;
>>
>> But it seemed clear to me that this was referring to other auth methods than
>> those that have been implemented thus far (ergo client secret, account key,
>> etc.). Am I correct?
>>
>> So my questions are:
>>
>> 1. Any ideas where I am going wrong here?
>> 2. Has anyone else used the Azure filesystem object?
>> 3. Has it worked for you?
>> 4. If so, what was your approach?
>>
>> Note that I did peruse the azurefs_test.cc for examples. I did see various
>> approaches.
>> One involved invoking the MakeDataLakeServiceClient() method.
>> It wasn't clear if I needed to do that or not, but then I saw that this is
>> done during the private implementation of the AzureFileSystem's Make()
>> method, thus:
>>
>>     static Result<std::unique_ptr<AzureFileSystem::Impl>> Make(AzureOptions options,
>>                                                                 io::IOContext io_context) {
>>       auto self = std::unique_ptr<AzureFileSystem::Impl>(
>>           new AzureFileSystem::Impl(std::move(options), std::move(io_context)));
>>       ARROW_ASSIGN_OR_RAISE(self->blob_service_client_,
>>                             self->options_.MakeBlobServiceClient());
>>       ARROW_ASSIGN_OR_RAISE(self->datalake_service_client_,
>>                             self->options_.MakeDataLakeServiceClient());
>>       return self;
>>     }
>>
>> So it seemed like I wouldn't need to do it separately.
>>
>> Anyway, I need to get this working ASAP, so I am open to feedback. I'll
>> continue plugging away.
>>
>> Thanks!
>> Jerry
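
A note on the "Parquet read error: map::at" message quoted above from the account key attempt. With GNU libstdc++, std::map::at throws std::out_of_range when the key is missing, and the exception's what() string is exactly "map::at" (other standard libraries word it differently). That suggests, without proving, that some code on the read path looked up a key that was not present in a std::map; the thread does not establish where. The following is only a minimal sketch of how such a message arises. The helper describe_lookup is hypothetical and is not part of Arrow or Parquet.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper, for illustration only; this is not Arrow code.
// It looks up `key` and reports either the value or the exception text.
std::string describe_lookup(const std::map<std::string, std::string>& table,
                            const std::string& key) {
  try {
    // std::map::at() throws std::out_of_range when the key is absent.
    return "found: " + table.at(key);
  } catch (const std::out_of_range& e) {
    // what() is implementation-defined; libstdc++ reports "map::at".
    return std::string("lookup error: ") + e.what();
  }
}
```

Calling describe_lookup(table, "missing") on a map that lacks that key returns a string that starts with "lookup error: ", followed by the standard library's own wording. Catching the exception near the failing call, or running under a debugger with a breakpoint on exception throw, is one way to find which lookup is involved.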