Hi,

That's good to know and thanks for sharing how to do it. I'm
trying this now but I haven't found how to generate a client
secret on portal.azure.com yet. ;)


Thanks,
-- 
kou

In 
 
<dm3pr05mb10543198c7ac80f64ad9320caf3...@dm3pr05mb10543.namprd05.prod.outlook.com>
  "RE: Using the new Azure filesystem object (C++)" on Thu, 25 Jul 2024 
06:04:55 +0000,
  "Jerry Adair via user" <[email protected]> wrote:

> Hi Kou,
> 
> Thank you for the help.  Well, after enough digging, I figured it out.  The 
> answer was and is that the code in the library works as expected.  And as I 
> suspected, the issue was permissions related and lay on the Azure side.  
> Specifically, to enable the client secret method of authentication, you must 
> assign the Storage Blob Data Contributor role on the storage account that 
> you want to access.  Once I assigned this role, I was able to run the 
> sample, standalone program that uses the Arrow C++ library to access Parquet 
> data on an ADLS server.
> 
> Thanks again!
> Jerry
> 
> 
> -----Original Message-----
> From: Sutou Kouhei <[email protected]>
> Sent: Wednesday, July 24, 2024 3:42 AM
> To: [email protected]
> Subject: Re: Using the new Azure filesystem object (C++)
> 
> EXTERNAL
> 
> Hi,
> 
> Sorry for not responding to this. I haven't had enough time to try this 
> yet... I hope that I can try it tomorrow...
> 
> (If anyone can help with this, please do.)
> 
> Thanks,
> --
> kou
> 
> In
>  
> <dm3pr05mb10543135ae1d24029ca133277f3...@dm3pr05mb10543.namprd05.prod.outlook.com>
>   "RE: Using the new Azure filesystem object (C++)" on Wed, 24 Jul 2024 
> 05:08:52 +0000,
>   "Jerry Adair via user" <[email protected]> wrote:
> 
>> Hi Kou,
>>
>> Alright, I have made it past the 401 error, which means that the recipient 
>> doesn't know who you are.  I did this by creating a new storage account 
>> within our tenant in the Azure portal.  Because I was the owner of the new 
>> account, I could create a client secret for it.  I also learned that you 
>> need the value of that client secret and not the secret ID when invoking the 
>> ConfigureClientSecretCredential() method within the AzureOptions object.  
>> However, I now encounter a 403 error code:
>>
>> Parquet read error: Unable to retrieve information for the file named 
>> parquet/ParquetTestData/plain.parquet on the Azure server.  Status = 
>> IOError: GetProperties for 
>> 'https://ecmtest4.blob.core.windows.net/parquet/ParquetTestData/plain.parquet'
>>  failed. GetFileInfo is unable to determine whether the path exists. Azure 
>> Error: [] 403 This request is not authorized to perform this operation using 
>> this permission.
>>
>> The 403 error code means that the recipient knows who you are but you don't 
>> have permission to complete the task that you are attempting.  So now I am 
>> down to a permissions issue, or so it would seem.  Therefore I have been 
>> experimenting within the Azure portal, enabling all types of permissions and 
>> such to get this to work.  However, none of that experimentation has resulted 
>> in successful access to the resource on the Azure server (ADLS).
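[Editor's note: the 401-vs-403 distinction described above can be captured in a small helper. This is a hypothetical function for illustration only, not part of Arrow or the Azure SDK; the role name and advice strings reflect the findings reported later in this thread.]

```cpp
#include <string>

// Hypothetical diagnostic helper (not part of Arrow): summarizes what the
// two HTTP status codes seen in this thread mean for Azure auth debugging.
std::string DiagnoseAuthStatus(int http_status) {
  switch (http_status) {
    case 401:
      // Unauthenticated: the server does not know who you are (e.g. wrong
      // tenant/client ID, or the client secret *ID* was passed instead of
      // the secret *value*).
      return "authentication failed: check credential values";
    case 403:
      // Authenticated but unauthorized: the identity is known but lacks a
      // data-plane role such as Storage Blob Data Contributor.
      return "authorization failed: check role assignments";
    default:
      return "not an auth error";
  }
}
```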
>>
>> Do you have any feedback on this?  What type of permission setting would 
>> enable access?  What is preventing my test program from accessing the 
>> resource?
>>
>> Thanks,
>> Jerry
>>
>>
>> -----Original Message-----
>> From: Sutou Kouhei <[email protected]>
>> Sent: Thursday, July 11, 2024 2:56 AM
>> To: [email protected]
>> Subject: Re: Using the new Azure filesystem object (C++)
>>
>> EXTERNAL
>>
>> Hi,
>>
>> Could you share how you generated the values for the client secret 
>> configuration and the managed identity configuration?
>> I'll try them.
>>
>> Thanks,
>> --
>> kou
>>
>> In
>>  
>> <dm3pr05mb1054325d88f8a46fd0b169c92f3...@dm3pr05mb10543.namprd05.prod.outlook.com>
>>   "RE: Using the new Azure filesystem object (C++)" on Thu, 11 Jul 2024 
>> 06:37:42 +0000,
>>   "Jerry Adair via user" <[email protected]> wrote:
>>
>>> Hi Kou!
>>>
>>> Well, I thought it was strange too.  I was not aware that if data lake 
>>> storage is available then AzureFS will use it automatically.  Thank you for 
>>> that information, it helps.  With that in mind, I commented out both of 
>>> those lines and just let the default values be assigned (which occurs in 
>>> azurefs.h).
>>>
>>> With that modification, if I attempt an account key configuration, thus:
>>>
>>>       configureStatus = azureOptions.ConfigureAccountKeyCredential(
>>> account_key );
>>>
>>> Then it works!  I can read the Parquet file via the methods in the Parquet 
>>> library!
>>>
>>> However if I use the client secret configuration, thus:
>>>
>>>       configureStatus = azureOptions.ConfigureClientSecretCredential(
>>> tenant_id, client_id, client_secret );
>>>
>>> Then I see the unauthorized error, thus:
>>>
>>> adls_read
>>> Parquet file read commencing...
>>> configureStatus = OK
>>> 1
>>> Parquet read error: GetToken(): error response: 401 Unauthorized
>>>
>>> And if I use the managed identity configuration, thus:
>>>
>>>       configureStatus =
>>> azureOptions.ConfigureManagedIdentityCredential( client_id );
>>>
>>> Then I see the hang, thus:
>>>
>>> adls_read
>>> Parquet file read commencing...
>>> configureStatus = OK
>>> 1
>>> ^C
>>>
>>> So I'm not sure about those configuration attempts.  I have double-checked 
>>> the values via the Azure portal that we use and those values are correct.  
>>> So perhaps there is some other type of limitation being imposed here?  
>>> I'd like to offer the user different means of authenticating to get their 
>>> credentials, e.g. they could use client secret or account key or managed 
>>> identity, etc.  However, at the moment only account key is working.  I'll 
>>> continue to see what I can figure out.  If you've seen this type of 
>>> phenomenon in the past and recognize the error that is at play, I'd 
>>> appreciate any feedback.
>>>
>>> Thanks!
>>> Jerry
>>>
>>>
>>> -----Original Message-----
>>> From: Sutou Kouhei <[email protected]>
>>> Sent: Wednesday, July 10, 2024 4:34 PM
>>> To: [email protected]
>>> Subject: Re: Using the new Azure filesystem object (C++)
>>>
>>> EXTERNAL
>>>
>>> Hi,
>>>
>>>>       azureOptions.blob_storage_authority = ".dfs.core.windows.net";
>>>>       // If I don't do this, then blob.core.windows.net is used;
>>>>       // I want dfs not blob, so... not certain why that happens either
>>>
>>> This is strange. In general, you should not do this.
>>> AzureFS uses both the blob storage API and the data lake storage API. If 
>>> the data lake storage API is available, AzureFS uses it automatically. So 
>>> you should not change blob_storage_authority.
>>>
>>> What happens if you don't have this line?
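[Editor's note: the point above is that with default AzureOptions, both service endpoints are derived from the account name using the standard Azure host suffixes, so there is no need to override blob_storage_authority. A minimal sketch of that naming convention, for illustration only (this is not Arrow's internal code, and "mytest" is the obfuscated account name from this thread):]

```cpp
#include <string>

// Illustrative only: with default options, the two Azure service hosts are
// derived from the account name via the standard suffixes, so both the blob
// API and the Data Lake (DFS) API are reachable without any override.
std::string BlobHost(const std::string& account) {
  return account + ".blob.core.windows.net";  // Blob service endpoint
}

std::string DfsHost(const std::string& account) {
  return account + ".dfs.core.windows.net";   // Data Lake (DFS) endpoint
}
```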
>>>
>>>
>>> Thanks,
>>> --
>>> kou
>>>
>>> In
>>>  
>>> <dm3pr05mb1054334eeaeae4a95805de322f3...@dm3pr05mb10543.namprd05.prod.outlook.com>
>>>   "Using the new Azure filesystem object (C++)" on Wed, 10 Jul 2024 
>>> 16:58:52 +0000,
>>>   "Jerry Adair via user" <[email protected]> wrote:
>>>
>>>> Hi-
>>>>
>>>> I am attempting to use the new Azure filesystem object in C++, with 
>>>> Arrow/Parquet version 16.0.0.  I already have code that works for GCS and 
>>>> AWS/S3.  I have been waiting for quite a while to see the new Azure 
>>>> filesystem object released.  Now that it has been, in this version 
>>>> (16.0.0), I have been trying to use it, without success.  I presumed that 
>>>> it would work in the same manner as the GCS and S3/AWS filesystem 
>>>> objects: you create the object, then you can use it in the same manner 
>>>> that you used the other filesystem objects.  Note that I am not using 
>>>> Arrow methods to read/write the data but rather the Parquet methods.  
>>>> This works for local, GCS and S3/AWS.  However, I cannot open a file on 
>>>> Azure.  It seems like no matter which authentication method I try, it 
>>>> doesn't work, and I get different results depending on which auth 
>>>> approach I take (client secret versus account key, etc.).  Here is a code 
>>>> summary of what I am trying to do:
>>>>
>>>>       arrow::fs::AzureOptions   azureOptions;
>>>>       arrow::Status             configureStatus = arrow::Status::OK();
>>>>
>>>>       // exact values obfuscated
>>>>       azureOptions.account_name = "mytest";
>>>>       azureOptions.dfs_storage_authority = ".dfs.core.windows.net";
>>>>       azureOptions.blob_storage_authority = ".dfs.core.windows.net";
>>>>       // If I don't do this, then blob.core.windows.net is used;
>>>>       // I want dfs not blob, so... not certain why that happens either
>>>>       std::string  client_id     = "3f061894-blah";
>>>>       std::string  client_secret = "2c796e9eblah";
>>>>       std::string  tenant_id     = "b1c14d5c-blah";
>>>>       //std::string  account_key = "flMhWgNts+i/blah==";
>>>>
>>>>       //configureStatus = azureOptions.ConfigureAccountKeyCredential( account_key );
>>>>       configureStatus = azureOptions.ConfigureClientSecretCredential( tenant_id, client_id, client_secret );
>>>>       //configureStatus = azureOptions.ConfigureManagedIdentityCredential( client_id );
>>>>       if( false == configureStatus.ok() )
>>>>       {
>>>>          // Uh-oh, throw
>>>>       }
>>>>
>>>>       std::shared_ptr<arrow::fs::AzureFileSystem>  azureFileSystem;
>>>>       arrow::Result<std::shared_ptr<arrow::fs::AzureFileSystem>>  azureFileSystemResult =
>>>>          arrow::fs::AzureFileSystem::Make( azureOptions );
>>>>       if( true == azureFileSystemResult.ok() )
>>>>       {
>>>>          azureFileSystem = azureFileSystemResult.ValueOrDie();
>>>>       }
>>>>       else
>>>>       {
>>>>          // Uh-oh, throw
>>>>       }
>>>>
>>>>       const std::string path( "parquet/ParquetFiles/plain.parquet" );
>>>>       std::shared_ptr<arrow::io::RandomAccessFile> arrowFile;
>>>>       std::cout << "1\n";
>>>>       arrow::Result<std::shared_ptr<arrow::io::RandomAccessFile>> openResult =
>>>>          azureFileSystem->OpenInputFile( path );
>>>>       std::cout << "2\n";
>>>>
>>>> And that is where things run off the rails.  At this point, all I want to 
>>>> do is open the input file, create a Parquet file reader like so:
>>>>
>>>>          std::unique_ptr<parquet::ParquetFileReader> parquet_reader =
>>>>             parquet::ParquetFileReader::Open( arrowFile );
>>>>
>>>> Then go about my business of reading/writing Parquet data as per normal, 
>>>> i.e., just as I do for the other filesystem objects.  But the 
>>>> OpenInputFile() method fails in the Azure case.  If I attempt the account 
>>>> key configuration, then the error I see is:
>>>>
>>>> adls_read
>>>> Parquet file read commencing...
>>>> 1
>>>> Parquet read error: map::at
>>>>
>>>> Where the "1" is just a marker to show how far I got in the process of 
>>>> reading a pre-existing Parquet file from the Azure server, i.e., a 
>>>> low-brow means of debugging.  The cout is shown above.  I don't get to 
>>>> "2", obviously.
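[Editor's note: the opaque "map::at" error above is almost certainly the what() text of a std::out_of_range thrown by std::map::at on a missing key somewhere inside Arrow; exactly which lookup fails is not revealed by the message. A small demonstration of the mechanism, for illustration only:]

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Demonstration: std::map::at throws std::out_of_range when the key is
// absent, and on common standard libraries its what() text mentions
// "map::at" -- which is all that surfaced in the Parquet read error above.
std::string LookupOrExplain(const std::map<std::string, std::string>& m,
                            const std::string& key) {
  try {
    return m.at(key);  // throws std::out_of_range if key is absent
  } catch (const std::out_of_range& e) {
    return std::string("lookup failed: ") + e.what();
  }
}
```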
>>>>
>>>> When attempting the client secret credential auth, I see the following 
>>>> failure:
>>>>
>>>> adls_read
>>>> Parquet file read commencing...
>>>> 1
>>>> Parquet read error: GetToken(): error response: 401 Unauthorized
>>>>
>>>> Then when attempting the Managed Identity auth configuration, I get the 
>>>> following:
>>>>
>>>> adls_read
>>>> Parquet file read commencing...
>>>> 1
>>>> ^C
>>>>
>>>> Where the process just hangs and I have to interrupt out of it.  Note that 
>>>> I didn't have this level of difficulty when I implemented our support for 
>>>> GCS and S3/AWS; those were relatively straightforward.  Azure, however, 
>>>> has been more difficult; this should just work.  I mean, you create the 
>>>> filesystem object, then you are supposed to be able to use it in the same 
>>>> manner that you use any other Arrow filesystem object.  However, I can't 
>>>> open a file, and I suspect it is due to some type of handshaking issue 
>>>> with Azure.  Azure has all of these moving parts: tenant ID, 
>>>> application/client ID, client secret, object ID (which we don't use in 
>>>> this case), and the list goes on.  Finally, I saw this in the azurefs.h 
>>>> header at line 102:
>>>>
>>>>   // TODO(GH-38598): Add support for more auth methods.
>>>>   // std::string connection_string;
>>>>   // std::string sas_token;
>>>>
>>>> But it seemed clear to me that this was referring to other auth methods 
>>>> than those that have been implemented thus far (ergo client secret, 
>>>> account key, etc.).  Am I correct?
>>>>
>>>> So my questions are:
>>>>
>>>>   1.  Any ideas where I am going wrong here?
>>>>   2.  Has anyone else used the Azure filesystem object?
>>>>   3.  Has it worked for you?
>>>>   4.  If so, what was your approach?
>>>>
>>>> Note that I did peruse the azurefs_test.cc for examples.  I did see 
>>>> various approaches.  One involved invoking the MakeDataLakeServiceClient() 
>>>> method.  It wasn't clear if I needed to do that or not, but then I saw 
>>>> that this is done during the private implementation of the 
>>>> AzureFileSystem's Make() method, thus:
>>>>
>>>>   static Result<std::unique_ptr<AzureFileSystem::Impl>> Make(AzureOptions options,
>>>>                                                              io::IOContext io_context) {
>>>>     auto self = std::unique_ptr<AzureFileSystem::Impl>(
>>>>         new AzureFileSystem::Impl(std::move(options), std::move(io_context)));
>>>>     ARROW_ASSIGN_OR_RAISE(self->blob_service_client_,
>>>>                           self->options_.MakeBlobServiceClient());
>>>>     ARROW_ASSIGN_OR_RAISE(self->datalake_service_client_,
>>>>                           self->options_.MakeDataLakeServiceClient());
>>>>     return self;
>>>>   }
>>>>
>>>> So it seemed like I wouldn't need to do it separately.
>>>>
>>>> Anyway, I need to get this working ASAP, so I am open to feedback.  I'll 
>>>> continue plugging away.
>>>>
>>>> Thanks!
>>>> Jerry
