Hi,

That's good to know, and thanks for sharing how to do it. I'm trying this now,
but I haven't found how to generate a client secret on portal.azure.com yet. ;)
Thanks,
--
kou

In <dm3pr05mb10543198c7ac80f64ad9320caf3...@dm3pr05mb10543.namprd05.prod.outlook.com>
  "RE: Using the new Azure filesystem object (C++)" on Thu, 25 Jul 2024 06:04:55 +0000,
  "Jerry Adair via user" <[email protected]> wrote:

> Hi Kou,
>
> Thank you for the help. Well, after enough digging, I figured it out. The
> answer was and is that the code in the library works as expected. And as I
> suspected, the issue was permissions related and lay on the Azure side.
> Specifically, to enable the client secret method of authentication to work,
> you must create a Storage Blob Data Contributor role assignment for the
> storage account that you want to access. Once I created this role assignment,
> I was able to run the sample, standalone program that uses the Arrow C++
> library to access Parquet data on an ADLS server.
>
> Thanks again!
> Jerry
>
>
> -----Original Message-----
> From: Sutou Kouhei <[email protected]>
> Sent: Wednesday, July 24, 2024 3:42 AM
> To: [email protected]
> Subject: Re: Using the new Azure filesystem object (C++)
>
> EXTERNAL
>
> Hi,
>
> Sorry for not responding to this. I haven't had enough time to try this yet...
> I hope that I can try this tomorrow...
>
> (If anyone can help with this, please do.)
>
> Thanks,
> --
> kou
>
> In
> <dm3pr05mb10543135ae1d24029ca133277f3...@dm3pr05mb10543.namprd05.prod.outlook.com>
> "RE: Using the new Azure filesystem object (C++)" on Wed, 24 Jul 2024
> 05:08:52 +0000,
> "Jerry Adair via user" <[email protected]> wrote:
>
>> Hi Kou,
>>
>> Alright, I have made it past the 401 error, which means that the recipient
>> doesn't know who you are. I did this by creating a new storage account
>> within our tenant in the Azure portal. Because I was the owner of the new
>> account, I could create a client secret for it. I also learned that you
>> need the value of that client secret and not the secret ID when invoking the
>> ConfigureClientSecretCredential() method within the AzureOptions object.
>>
>> However, I now encounter a 403 error code:
>>
>> Parquet read error: Unable to retrieve information for the file named
>> parquet/ParquetTestData/plain.parquet on the Azure server. Status =
>> IOError: GetProperties for
>> 'https://ecmtest4.blob.core.windows.net/parquet/ParquetTestData/plain.parquet'
>> failed. GetFileInfo is unable to determine whether the path exists. Azure
>> Error: [] 403 This request is not authorized to perform this operation using
>> this permission.
>>
>> The 403 error code means that the recipient knows who you are but you don't
>> have permissions to complete the task that you are attempting. So now I am
>> down to a permissions issue, or so it would seem. Therefore I have been
>> experimenting within the Azure portal, enabling all types of permissions and
>> such to get this to work. However none of that experimentation has resulted
>> in a successful access of the resource on the Azure server (ADLS).
>>
>> Do you have any feedback on this? What type of permission setting would
>> enable access? What is preventing my test program from accessing the
>> resource?
>>
>> Thanks,
>> Jerry
>>
>>
>> -----Original Message-----
>> From: Sutou Kouhei <[email protected]>
>> Sent: Thursday, July 11, 2024 2:56 AM
>> To: [email protected]
>> Subject: Re: Using the new Azure filesystem object (C++)
>>
>> EXTERNAL
>>
>> Hi,
>>
>> Could you share how you generated the values for the client secret
>> configuration and the managed identity configuration?
>> I'll try them.
>>
>> Thanks,
>> --
>> kou
>>
>> In
>> <dm3pr05mb1054325d88f8a46fd0b169c92f3...@dm3pr05mb10543.namprd05.prod.outlook.com>
>> "RE: Using the new Azure filesystem object (C++)" on Thu, 11 Jul 2024
>> 06:37:42 +0000,
>> "Jerry Adair via user" <[email protected]> wrote:
>>
>>> Hi Kou!
>>>
>>> Well, I thought it was strange too. I was not aware that if data lake
>>> storage is available then AzureFS will use it automatically. Thank you for
>>> that information, it helps. With that in mind, I commented out both of
>>> those lines and just let the default values be assigned (which occurs in
>>> azurefs.h).
>>>
>>> With that modification, if I attempt an account key configuration, thus:
>>>
>>>     configureStatus = azureOptions.ConfigureAccountKeyCredential( account_key );
>>>
>>> Then it works! I can read the Parquet file via the methods in the Parquet
>>> library!
>>>
>>> However if I use the client secret configuration, thus:
>>>
>>>     configureStatus = azureOptions.ConfigureClientSecretCredential(
>>>         tenant_id, client_id, client_secret );
>>>
>>> Then I see the unauthorized error, thus:
>>>
>>>     adls_read
>>>     Parquet file read commencing...
>>>     configureStatus = OK
>>>     1
>>>     Parquet read error: GetToken(): error response: 401 Unauthorized
>>>
>>> And if I use the managed identity configuration, thus:
>>>
>>>     configureStatus = azureOptions.ConfigureManagedIdentityCredential( client_id );
>>>
>>> Then I see the hang, thus:
>>>
>>>     adls_read
>>>     Parquet file read commencing...
>>>     configureStatus = OK
>>>     1
>>>     ^C
>>>
>>> So I dunno about those configuration attempts. I have double-checked the
>>> values via the Azure portal that we use and those values are correct. So
>>> perhaps there is some other type of limitation that is being imposed here?
>>> I'd like to offer the user different means of authenticating to get their
>>> credentials, ergo they could use client secret or account key or managed
>>> identity, etc.
>>> However at the moment only account key is working. I'll
>>> continue to see what I can figure out. If you've seen this type of
>>> phenomenon in the past and recognize the error that is at play, I'd
>>> appreciate any feedback.
>>>
>>> Thanks!
>>> Jerry
>>>
>>>
>>> -----Original Message-----
>>> From: Sutou Kouhei <[email protected]>
>>> Sent: Wednesday, July 10, 2024 4:34 PM
>>> To: [email protected]
>>> Subject: Re: Using the new Azure filesystem object (C++)
>>>
>>> EXTERNAL
>>>
>>> Hi,
>>>
>>>>     azureOptions.blob_storage_authority = ".dfs.core.windows.net";
>>>>         // If I don't do this, then the blob.core.windows.net is used;
>>>>         // I want dfs not blob, so... not certain why that happens either
>>>
>>> This is strange. In general, you should not do this.
>>> AzureFS uses both the blob storage API and the data lake storage API. If the
>>> data lake storage API is available, AzureFS uses it automatically. So you
>>> should not change blob_storage_authority.
>>>
>>> If you don't have this line, what happens?
>>>
>>>
>>> Thanks,
>>> --
>>> kou
>>>
>>> In
>>> <dm3pr05mb1054334eeaeae4a95805de322f3...@dm3pr05mb10543.namprd05.prod.outlook.com>
>>> "Using the new Azure filesystem object (C++)" on Wed, 10 Jul 2024
>>> 16:58:52 +0000,
>>> "Jerry Adair via user" <[email protected]> wrote:
>>>
>>>> Hi-
>>>>
>>>> I am attempting to use the new Azure filesystem object in C++.
>>>> Arrow/Parquet version 16.0.0. I already have code that works for GCS and
>>>> AWS/S3. I have been waiting for quite a while to see the new Azure
>>>> filesystem object released. Now that it has been, in this version (16.0.0),
>>>> I have been trying to use it, without success. I presumed that it would
>>>> work in the same manner in which the GCS and S3/AWS filesystem objects
>>>> work. You create the object, then you can use it in the same manner that
>>>> you used the other filesystem objects.
>>>> Note that I am not using Arrow
>>>> methods to read/write the data but rather the Parquet methods. This works
>>>> for local, GCS and S3/AWS. However I cannot open a file on Azure. It
>>>> seems like no matter which authentication method I try to use, it doesn't
>>>> work. And I get different results depending on which auth approach I take
>>>> (client secret versus account key, etc.). Here is a code summary of what
>>>> I am trying to do:
>>>>
>>>>     arrow::fs::AzureOptions azureOptions;
>>>>     arrow::Status configureStatus = arrow::Status::OK();
>>>>
>>>>     // exact values obfuscated
>>>>     azureOptions.account_name = "mytest";
>>>>     azureOptions.dfs_storage_authority = ".dfs.core.windows.net";
>>>>     // If I don't do this, then the blob.core.windows.net is used;
>>>>     // I want dfs not blob, so... not certain why that happens either
>>>>     azureOptions.blob_storage_authority = ".dfs.core.windows.net";
>>>>     std::string client_id = "3f061894-blah";
>>>>     std::string client_secret = "2c796e9eblah";
>>>>     std::string tenant_id = "b1c14d5c-blah";
>>>>     //std::string account_key = "flMhWgNts+i/blah==";
>>>>
>>>>     //configureStatus = azureOptions.ConfigureAccountKeyCredential( account_key );
>>>>     configureStatus = azureOptions.ConfigureClientSecretCredential(
>>>>         tenant_id, client_id, client_secret );
>>>>     //configureStatus = azureOptions.ConfigureManagedIdentityCredential( client_id );
>>>>     if( false == configureStatus.ok() )
>>>>     {
>>>>         // Uh-oh, throw
>>>>     }
>>>>
>>>>     std::shared_ptr<arrow::fs::AzureFileSystem> azureFileSystem;
>>>>     arrow::Result<std::shared_ptr<arrow::fs::AzureFileSystem>>
>>>>         azureFileSystemResult = arrow::fs::AzureFileSystem::Make( azureOptions );
>>>>     if( true == azureFileSystemResult.ok() )
>>>>     {
>>>>         azureFileSystem = azureFileSystemResult.ValueOrDie();
>>>>     }
>>>>     else
>>>>     {
>>>>         // Uh-oh, throw
>>>>     }
>>>>
>>>>     //const std::string path( "parquet/ParquetFiles/plain.parquet" );
>>>>     const std::string path( "parquet/ParquetFiles/plain.parquet" );
>>>>     std::shared_ptr<arrow::io::RandomAccessFile> arrowFile;
>>>>     std::cout << "1\n";
>>>>     arrow::Result<std::shared_ptr<arrow::io::RandomAccessFile>>
>>>>         openResult = azureFileSystem->OpenInputFile( path );
>>>>     std::cout << "2\n";
>>>>
>>>> And that is where things run off the rails. At this point, all I want to
>>>> do is open the input file, create a Parquet file reader like so:
>>>>
>>>>     std::unique_ptr<parquet::ParquetFileReader> parquet_reader
>>>>         = parquet::ParquetFileReader::Open( arrowFile );
>>>>
>>>> Then go about my business of reading/writing Parquet data as per normal.
>>>> Ergo, just as I do for the other filesystem objects. But the
>>>> OpenInputFile() method fails for the Azure use case scenario. If I
>>>> attempt the account key configuration, then the error I see is:
>>>>
>>>>     adls_read
>>>>     Parquet file read commencing...
>>>>     1
>>>>     Parquet read error: map::at
>>>>
>>>> Where the "1" is just a marker to show how far I got in the process of
>>>> reading a pre-existing Parquet file from the Azure server. Ergo, a
>>>> low-brow means of debugging. The cout is shown above. I don't get to
>>>> "2", obviously.
>>>>
>>>> When attempting the client secret credential auth, I see the following
>>>> failure:
>>>>
>>>>     adls_read
>>>>     Parquet file read commencing...
>>>>     1
>>>>     Parquet read error: GetToken(): error response: 401 Unauthorized
>>>>
>>>> Then when attempting the Managed Identity auth configuration, I get the
>>>> following:
>>>>
>>>>     adls_read
>>>>     Parquet file read commencing...
>>>>     1
>>>>     ^C
>>>>
>>>> Where the process just hangs and I have to interrupt out of it. Note that
>>>> I didn't have this level of difficulty when I implemented our support for
>>>> GCS and S3/AWS. Those were relatively straightforward. Azure however has
>>>> been more difficult; this should just work.
>>>> I mean, you create the
>>>> filesystem object, then you are supposed to be able to use it in the same
>>>> manner that you use any other Arrow filesystem object. However I can't
>>>> open a file and I suspect it is due to some type of handshaking issue with
>>>> Azure. Azure has all of these moving parts; tenant ID, application/client
>>>> ID, client secret, object ID (which we don't use in this case) and the
>>>> list goes on. Finally, I saw this in the azurefs.h header at line 102:
>>>>
>>>>     // TODO(GH-38598): Add support for more auth methods.
>>>>     // std::string connection_string;
>>>>     // std::string sas_token;
>>>>
>>>> But it seemed clear to me that this was referring to other auth methods
>>>> than those that have been implemented thus far (ergo client secret,
>>>> account key, etc.). Am I correct?
>>>>
>>>> So my questions are:
>>>>
>>>> 1. Any ideas where I am going wrong here?
>>>> 2. Has anyone else used the Azure filesystem object?
>>>> 3. Has it worked for you?
>>>> 4. If so, what was your approach?
>>>>
>>>> Note that I did peruse the azurefs_test.cc for examples. I did see
>>>> various approaches. One involved invoking the MakeDataLakeServiceClient()
>>>> method. It wasn't clear if I needed to do that or not, but then I saw
>>>> that this is done during the private implementation of the
>>>> AzureFileSystem's Make() method, thus:
>>>>
>>>>     static Result<std::unique_ptr<AzureFileSystem::Impl>> Make(AzureOptions options,
>>>>                                                                io::IOContext io_context) {
>>>>       auto self = std::unique_ptr<AzureFileSystem::Impl>(
>>>>           new AzureFileSystem::Impl(std::move(options), std::move(io_context)));
>>>>       ARROW_ASSIGN_OR_RAISE(self->blob_service_client_,
>>>>                             self->options_.MakeBlobServiceClient());
>>>>       ARROW_ASSIGN_OR_RAISE(self->datalake_service_client_,
>>>>                             self->options_.MakeDataLakeServiceClient());
>>>>       return self;
>>>>     }
>>>>
>>>> So it seemed like I wouldn't need to do it separately.
>>>>
>>>> Anyway, I need to get this working ASAP, so I am open to feedback. I'll
>>>> continue plugging away.
>>>>
>>>> Thanks!
>>>> Jerry
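
For reference, the 401 versus 403 distinction that came up in this thread can
be sketched as a small C++ helper. This is only an illustrative sketch:
DiagnoseAzureStatus is a hypothetical name, not part of Arrow or the Azure
SDK, and the hints simply restate what was reported above (the 401 from
GetToken() pointed at the credential values, the 403 from GetProperties
pointed at a missing role assignment on the storage account).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (NOT an Arrow or Azure SDK function): maps the two HTTP
// status codes discussed in this thread to a first troubleshooting step.
std::string DiagnoseAzureStatus(int http_status) {
  switch (http_status) {
    case 401:
      // The service could not establish who the caller is.
      return "authentication failed: check the tenant ID, the client ID, and "
             "that the client secret value (not the secret ID) is passed";
    case 403:
      // The caller is known but lacks permission for the operation.
      return "authenticated but not authorized: check the role assignments "
             "on the storage account, e.g. Storage Blob Data Contributor";
    default:
      return "not covered by this sketch";
  }
}
```

In a program like the one quoted above, the status code would have to be
taken from the error text of the failed arrow::Status by hand; the sketch
does not assume any Arrow API for extracting it.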
