Hi,

Could you share how you generated the values for the client secret
configuration and the managed identity configuration? I'll try them.
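In case it helps to isolate things, one way to check the client secret values
outside Arrow is to request a token for the storage scope directly with the
Azure Identity SDK (the same library AzureFS uses underneath). This is only a
minimal, untested sketch; the tenant/client/secret strings are placeholders,
so substitute the same values you pass to ConfigureClientSecretCredential():

    #include <azure/core/context.hpp>
    #include <azure/core/credentials/credentials.hpp>
    #include <azure/identity/client_secret_credential.hpp>

    #include <iostream>
    #include <memory>

    int main() {
      // Placeholder values -- use the same ones passed to AzureOptions.
      auto credential = std::make_shared<Azure::Identity::ClientSecretCredential>(
          "tenant-id", "client-id", "client-secret");

      // Request a token for the Azure Storage resource (the scope the
      // storage clients use).
      Azure::Core::Credentials::TokenRequestContext request;
      request.Scopes = {"https://storage.azure.com/.default"};

      try {
        auto token = credential->GetToken(request, Azure::Core::Context());
        std::cout << "Token acquired (" << token.Token.size() << " characters)\n";
      } catch (const Azure::Core::Credentials::AuthenticationException& e) {
        std::cerr << "GetToken failed: " << e.what() << "\n";
      }
      return 0;
    }

If this also reports 401, the problem is likely in the app registration values
themselves (for example a wrong or expired client secret) rather than in Arrow.
If it succeeds, the next thing I would check is whether the service principal
has a data-plane role such as "Storage Blob Data Reader" on the storage account.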
Thanks,
--
kou

In <dm3pr05mb1054325d88f8a46fd0b169c92f3...@dm3pr05mb10543.namprd05.prod.outlook.com>
  "RE: Using the new Azure filesystem object (C++)" on Thu, 11 Jul 2024 06:37:42 +0000,
  "Jerry Adair via user" <[email protected]> wrote:

> Hi Kou!
>
> Well, I thought it was strange too. I was not aware that if data lake
> storage is available then AzureFS will use it automatically. Thank you for
> that information, it helps. With that in mind, I commented out both of those
> lines and just let the default values be assigned (which occurs in azurefs.h).
>
> With that modification, if I attempt an account key configuration, thus:
>
>     configureStatus = azureOptions.ConfigureAccountKeyCredential( account_key );
>
> Then it works! I can read the Parquet file via the methods in the Parquet
> library!
>
> However, if I use the client secret configuration, thus:
>
>     configureStatus = azureOptions.ConfigureClientSecretCredential( tenant_id, client_id, client_secret );
>
> Then I see the unauthorized error, thus:
>
>     adls_read
>     Parquet file read commencing...
>     configureStatus = OK
>     1
>     Parquet read error: GetToken(): error response: 401 Unauthorized
>
> And if I use the managed identity configuration, thus:
>
>     configureStatus = azureOptions.ConfigureManagedIdentityCredential( client_id );
>
> Then I see the hang, thus:
>
>     adls_read
>     Parquet file read commencing...
>     configureStatus = OK
>     1
>     ^C
>
> So I dunno about those configuration attempts. I have double-checked the
> values via the Azure portal that we use, and those values are correct. So
> perhaps there is some other type of limitation being imposed here?
> I'd like to offer the user different means of authenticating to get their
> credentials, so they could use client secret or account key or managed
> identity, etc. However, at the moment only account key is working. I'll
> continue to see what I can figure out. If you've seen this type of
> phenomenon in the past and recognize the error at play, I'd appreciate
> any feedback.
>
> Thanks!
> Jerry
>
>
> -----Original Message-----
> From: Sutou Kouhei <[email protected]>
> Sent: Wednesday, July 10, 2024 4:34 PM
> To: [email protected]
> Subject: Re: Using the new Azure filesystem object (C++)
>
> EXTERNAL
>
> Hi,
>
>> azureOptions.blob_storage_authority = ".dfs.core.windows.net";  // If I don't do this, then
>>                                                                 // blob.core.windows.net is used;
>>                                                                 // I want dfs not blob, so... not
>>                                                                 // certain why that happens either
>
> This is strange. In general, you should not do this.
> AzureFS uses both the blob storage API and the data lake storage API. If the
> data lake storage API is available, AzureFS uses it automatically. So you
> should not change blob_storage_authority.
>
> If you don't have this line, what happens?
>
>
> Thanks,
> --
> kou
>
> In <dm3pr05mb1054334eeaeae4a95805de322f3...@dm3pr05mb10543.namprd05.prod.outlook.com>
>   "Using the new Azure filesystem object (C++)" on Wed, 10 Jul 2024 16:58:52 +0000,
>   "Jerry Adair via user" <[email protected]> wrote:
>
>> Hi-
>>
>> I am attempting to use the new Azure filesystem object in C++, with
>> Arrow/Parquet version 16.0.0. I already have code that works for GCS and
>> AWS/S3. I have been waiting for quite a while to see the new Azure
>> filesystem object released. Now that it has been, in this version (16.0.0),
>> I have been trying to use it, without success. I presumed that it would
>> work in the same manner in which the GCS and S3/AWS filesystem objects work.
>> You create the object, then you can use it in the same manner that you used
>> the other filesystem objects. Note that I am not using Arrow methods to
>> read/write the data but rather the Parquet methods. This works for local,
>> GCS and S3/AWS. However I cannot open a file on Azure. It seems like no
>> matter which authentication method I try to use, it doesn't work. And I get
>> different results depending on which auth approach I take (client secret
>> versus account key, etc.). Here is a code summary of what I am trying to do:
>>
>>     arrow::fs::AzureOptions azureOptions;
>>     arrow::Status configureStatus = arrow::Status::OK();
>>
>>     // exact values obfuscated
>>     azureOptions.account_name = "mytest";
>>     azureOptions.dfs_storage_authority = ".dfs.core.windows.net";
>>     azureOptions.blob_storage_authority = ".dfs.core.windows.net";  // If I don't do this, then
>>                                                                     // blob.core.windows.net is used;
>>                                                                     // I want dfs not blob, so... not
>>                                                                     // certain why that happens either
>>     std::string client_id = "3f061894-blah";
>>     std::string client_secret = "2c796e9eblah";
>>     std::string tenant_id = "b1c14d5c-blah";
>>     //std::string account_key = "flMhWgNts+i/blah==";
>>
>>     //configureStatus = azureOptions.ConfigureAccountKeyCredential( account_key );
>>     configureStatus = azureOptions.ConfigureClientSecretCredential( tenant_id, client_id, client_secret );
>>     //configureStatus = azureOptions.ConfigureManagedIdentityCredential( client_id );
>>     if( false == configureStatus.ok() )
>>     {
>>         // Uh-oh, throw
>>     }
>>
>>     std::shared_ptr<arrow::fs::AzureFileSystem> azureFileSystem;
>>     arrow::Result<std::shared_ptr<arrow::fs::AzureFileSystem>> azureFileSystemResult =
>>         arrow::fs::AzureFileSystem::Make( azureOptions );
>>     if( true == azureFileSystemResult.ok() )
>>     {
>>         azureFileSystem = azureFileSystemResult.ValueOrDie();
>>     }
>>     else
>>     {
>>         // Uh-oh, throw
>>     }
>>
>>     //const std::string path( "parquet/ParquetFiles/plain.parquet" );
>>     const std::string path( "parquet/ParquetFiles/plain.parquet" );
>>     std::shared_ptr<arrow::io::RandomAccessFile> arrowFile;
>>     std::cout << "1\n";
>>     arrow::Result<std::shared_ptr<arrow::io::RandomAccessFile>> openResult =
>>         azureFileSystem->OpenInputFile( path );
>>     std::cout << "2\n";
>>
>> And that is where things run off the rails. At this point, all I want to do
>> is open the input file, create a Parquet file reader like so:
>>
>>     std::unique_ptr<parquet::ParquetFileReader> parquet_reader =
>>         parquet::ParquetFileReader::Open( arrowFile );
>>
>> Then go about my business of reading/writing Parquet data as per normal.
>> Ergo, just as I do for the other filesystem objects. But the
>> OpenInputFile() method fails for the Azure use case scenario. If I attempt
>> the account key configuration, then the error I see is:
>>
>>     adls_read
>>     Parquet file read commencing...
>>     1
>>     Parquet read error: map::at
>>
>> Where the "1" is just a marker to show how far I got in the process of
>> reading a pre-existing Parquet file from the Azure server. Ergo, a low-brow
>> means of debugging. The cout is shown above. I don't get to "2", obviously.
>>
>> When attempting the client secret credential auth, I see the following
>> failure:
>>
>>     adls_read
>>     Parquet file read commencing...
>>     1
>>     Parquet read error: GetToken(): error response: 401 Unauthorized
>>
>> Then when attempting the Managed Identity auth configuration, I get the
>> following:
>>
>>     adls_read
>>     Parquet file read commencing...
>>     1
>>     ^C
>>
>> Where the process just hangs and I have to interrupt out of it. Note that I
>> didn't have this level of difficulty when I implemented our support for GCS
>> and S3/AWS. Those were relatively straightforward. Azure, however, has been
>> more difficult; this should just work. I mean, you create the filesystem
>> object, then you are supposed to be able to use it in the same manner that
>> you use any other Arrow filesystem object. However, I can't open a file, and
>> I suspect it is due to some type of handshaking issue with Azure. Azure has
>> all of these moving parts: tenant ID, application/client ID, client secret,
>> object ID (which we don't use in this case), and the list goes on. Finally,
>> I saw this in the azurefs.h header at line 102:
>>
>>     // TODO(GH-38598): Add support for more auth methods.
>>     // std::string connection_string;
>>     // std::string sas_token;
>>
>> But it seemed clear to me that this was referring to auth methods other than
>> those that have been implemented thus far (ergo client secret, account key,
>> etc.). Am I correct?
>>
>> So my questions are:
>>
>> 1. Any ideas where I am going wrong here?
>> 2. Has anyone else used the Azure filesystem object?
>> 3. Has it worked for you?
>> 4. If so, what was your approach?
>>
>> Note that I did peruse azurefs_test.cc for examples. I did see various
>> approaches. One involved invoking the MakeDataLakeServiceClient() method.
>> It wasn't clear if I needed to do that or not, but then I saw that this is
>> done in the private implementation of the AzureFileSystem's Make() method,
>> thus:
>>
>>     static Result<std::unique_ptr<AzureFileSystem::Impl>> Make(AzureOptions options,
>>                                                                io::IOContext io_context) {
>>       auto self = std::unique_ptr<AzureFileSystem::Impl>(
>>           new AzureFileSystem::Impl(std::move(options), std::move(io_context)));
>>       ARROW_ASSIGN_OR_RAISE(self->blob_service_client_, self->options_.MakeBlobServiceClient());
>>       ARROW_ASSIGN_OR_RAISE(self->datalake_service_client_,
>>                             self->options_.MakeDataLakeServiceClient());
>>       return self;
>>     }
>>
>> So it seemed like I wouldn't need to do it separately.
>>
>> Anyway, I need to get this working ASAP, so I am open to feedback. I'll
>> continue plugging away.
>>
>> Thanks!
>> Jerry
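One more note on the managed identity hang described above: a managed identity
credential obtains its token from the Azure instance metadata endpoint (or the
App Service equivalent), so it only works when the process is running on Azure
compute that actually has that identity assigned. Run from a laptop or an
on-premises machine, the metadata request can block and retry for a long time,
which would look like the hang you see. A minimal, untested sketch (placeholder
client id) to check this outside Arrow:

    #include <azure/core/context.hpp>
    #include <azure/core/credentials/credentials.hpp>
    #include <azure/identity/managed_identity_credential.hpp>

    #include <iostream>
    #include <memory>

    int main() {
      // Placeholder: the client id of the user-assigned managed identity.
      auto credential =
          std::make_shared<Azure::Identity::ManagedIdentityCredential>("client-id");

      Azure::Core::Credentials::TokenRequestContext request;
      request.Scopes = {"https://storage.azure.com/.default"};

      try {
        auto token = credential->GetToken(request, Azure::Core::Context());
        std::cout << "Managed identity token acquired\n";
      } catch (const std::exception& e) {
        // Off Azure (or without the identity assigned) this typically fails
        // or takes a long time instead of returning a token.
        std::cerr << "GetToken failed: " << e.what() << "\n";
      }
      return 0;
    }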
