Hi Kou,

Alright, I have made it past the 401 error, which means that the server 
doesn't know who you are (authentication failed).  I did this by creating a new 
storage account within our tenant in the Azure portal.  Because I was the owner 
of the new account, I could create a client secret for it.  I also learned that 
you need the value of that client secret, not the secret ID, when invoking the 
ConfigureClientSecretCredential() method of the AzureOptions object.  
However, I now encounter a 403 error code:

Parquet read error: Unable to retrieve information for the file named 
parquet/ParquetTestData/plain.parquet on the Azure server.  Status = IOError: 
GetProperties for 
'https://ecmtest4.blob.core.windows.net/parquet/ParquetTestData/plain.parquet' 
failed. GetFileInfo is unable to determine whether the path exists. Azure 
Error: [] 403 This request is not authorized to perform this operation using 
this permission.

The 403 error code means that the server knows who you are but you don't 
have permission to complete the task that you are attempting.  So now I am 
down to a permissions issue, or so it would seem.  Therefore I have been 
experimenting within the Azure portal, enabling various permissions to get 
this to work.  However, none of that experimentation has resulted in 
successful access to the resource on the Azure server (ADLS).
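
To make the 401/403 distinction concrete, here is a minimal sketch (not part 
of Arrow) of a helper that sorts messages like the ones in this thread into 
authentication versus authorization failures.  The names AuthFailure and 
ClassifyAuthFailure are hypothetical, and the sketch assumes the HTTP status 
appears in the message as a space-delimited token, as it does in the two 
errors quoted in this thread.

```cpp
#include <string>

// Hypothetical classification of an auth-related failure.
enum class AuthFailure {
  kNone,            // no 401/403 token found in the message
  kAuthentication,  // 401: the server does not know who the caller is
  kAuthorization    // 403: the caller is known but lacks permission
};

// Classifies an error message by the HTTP status token it contains.
// Assumption: the status code is surrounded by spaces in the message text.
AuthFailure ClassifyAuthFailure(const std::string& message) {
  if (message.find(" 401 ") != std::string::npos) {
    return AuthFailure::kAuthentication;
  }
  if (message.find(" 403 ") != std::string::npos) {
    return AuthFailure::kAuthorization;
  }
  return AuthFailure::kNone;
}
```

With this, the earlier GetToken() message would be classified as an 
authentication failure and the GetProperties message above as an 
authorization failure.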

Do you have any feedback on this?  What type of permission setting would enable 
access?  What is preventing my test program from accessing the resource?
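
On the client secret point, a guard like the following could catch the secret 
ID being passed by mistake before ConfigureClientSecretCredential() is called.  
This is only a sketch: LooksLikeGuid is a hypothetical helper, and it rests on 
the assumption that the secret ID shown in the portal is a GUID while the 
secret value is not.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns true if `s` has the 8-4-4-4-12 hexadecimal GUID layout.
// Assumption: a client secret *ID* has this shape and a secret *value* does not.
bool LooksLikeGuid(const std::string& s) {
  if (s.size() != 36) {
    return false;
  }
  for (std::size_t i = 0; i < s.size(); ++i) {
    const bool dash_position = (i == 8 || i == 13 || i == 18 || i == 23);
    if (dash_position) {
      if (s[i] != '-') {
        return false;
      }
    } else if (!std::isxdigit(static_cast<unsigned char>(s[i]))) {
      return false;
    }
  }
  return true;
}
```

If LooksLikeGuid( client_secret ) returns true, the string is probably the 
secret ID rather than the secret value and is worth double-checking in the 
portal.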

Thanks,
Jerry


-----Original Message-----
From: Sutou Kouhei <[email protected]> 
Sent: Thursday, July 11, 2024 2:56 AM
To: [email protected]
Subject: Re: Using the new Azure filesystem object (C++)

Hi,

Could you share how you generated the values for the client secret 
configuration and the managed identity configuration?
I'll try them.

Thanks,
--
kou

In <dm3pr05mb1054325d88f8a46fd0b169c92f3...@dm3pr05mb10543.namprd05.prod.outlook.com>
  "RE: Using the new Azure filesystem object (C++)" on Thu, 11 Jul 2024 06:37:42 +0000,
  "Jerry Adair via user" <[email protected]> wrote:

> Hi Kou!
>
> Well, I thought it was strange too.  I was not aware that if data lake 
> storage is available then AzureFS will use it automatically.  Thank you for 
> that information, it helps.  With that in mind, I commented out both of those 
> lines and just let the default values be assigned (which occurs in azurefs.h).
>
> With that modification, if I attempt an account key configuration, thus:
>
>       configureStatus = azureOptions.ConfigureAccountKeyCredential( account_key );
>
> Then it works!  I can read the Parquet file via the methods in the Parquet 
> library!
>
> However if I use the client secret configuration, thus:
>
>       configureStatus = azureOptions.ConfigureClientSecretCredential( tenant_id, client_id, client_secret );
>
> Then I see the unauthorized error, thus:
>
> adls_read
> Parquet file read commencing...
> configureStatus = OK
> 1
> Parquet read error: GetToken(): error response: 401 Unauthorized
>
> And if I use the managed identity configuration, thus:
>
>       configureStatus = azureOptions.ConfigureManagedIdentityCredential( client_id );
>
> Then I see the hang, thus:
>
> adls_read
> Parquet file read commencing...
> configureStatus = OK
> 1
> ^C
>
> So I dunno about those configuration attempts.  I have double-checked the 
> values via the Azure portal that we use and those values are correct.  So 
> perhaps there is some other type of limitation that is being imposed here?  
> I'd like to offer the user different means of authenticating to get their 
> credentials, ergo they could use client secret or account key or managed 
> identity, etc.  However at the moment only account key is working.  I'll 
> continue to see what I can figure out.  If you've seen this type of 
> phenomenon in the past and recognize the error that is at play, I'd 
> appreciate any feedback.
>
> Thanks!
> Jerry
>
>
> -----Original Message-----
> From: Sutou Kouhei <[email protected]>
> Sent: Wednesday, July 10, 2024 4:34 PM
> To: [email protected]
> Subject: Re: Using the new Azure filesystem object (C++)
>
> Hi,
>
>>       azureOptions.blob_storage_authority = ".dfs.core.windows.net"; // If I don't do this, then the
>>                                                                      // blob.core.windows.net is used;
>>                                                                      // I want dfs not blob, so... not certain
>>                                                                      // why that happens either
>
> This is strange. In general, you should not do this.
> AzureFS uses both the Blob Storage API and the Data Lake Storage API. If the 
> Data Lake Storage API is available, AzureFS uses it automatically. So you 
> should not change blob_storage_authority.
>
> What happened when you didn't have this line?
>
>
> Thanks,
> --
> kou
>
> In <dm3pr05mb1054334eeaeae4a95805de322f3...@dm3pr05mb10543.namprd05.prod.outlook.com>
>   "Using the new Azure filesystem object (C++)" on Wed, 10 Jul 2024 16:58:52 +0000,
>   "Jerry Adair via user" <[email protected]> wrote:
>
>> Hi-
>>
>> I am attempting to use the new Azure filesystem object in C++.  
>> Arrow/Parquet version 16.0.0.  I already have code that works for GCS and 
>> AWS/S3.  I have been waiting for quite a while to see the new Azure 
>> filesystem object released.  Now that it has been, in this version (16.0.0), I 
>> have been trying to use it, without success.  I presumed that it would work in 
>> the same manner in which the GCS and S3/AWS filesystem objects work.  You 
>> create the object, then you can use it in the same manner that you used the 
>> other filesystem objects.  Note that I am not using Arrow methods to 
>> read/write the data but rather the Parquet methods.  This works for local, 
>> GCS and S3/AWS.  However I cannot open a file on Azure.  It seems like no 
>> matter which authentication method I try to use, it doesn't work.  And I get 
>> different results depending on which auth approach I take (client secret 
>> versus account key, etc.).  Here is a code summary of what I am trying to do:
>>
>>       arrow::fs::AzureOptions   azureOptions;
>>       arrow::Status             configureStatus = arrow::Status::OK();
>>
>>       // exact values obfuscated
>>       azureOptions.account_name = "mytest";
>>       azureOptions.dfs_storage_authority = ".dfs.core.windows.net";
>>       azureOptions.blob_storage_authority = ".dfs.core.windows.net"; // If I don't do this, then the
>>                                                                      // blob.core.windows.net is used;
>>                                                                      // I want dfs not blob, so... not certain
>>                                                                      // why that happens either
>>       std::string  client_id  = "3f061894-blah";
>>       std::string  client_secret  = "2c796e9eblah";
>>       std::string  tenant_id  = "b1c14d5c-blah";
>>       //std::string  account_key  = "flMhWgNts+i/blah==";
>>
>>
>>       //configureStatus = azureOptions.ConfigureAccountKeyCredential( account_key );
>>       configureStatus = azureOptions.ConfigureClientSecretCredential( tenant_id, client_id, client_secret );
>>       //configureStatus = azureOptions.ConfigureManagedIdentityCredential( client_id );
>>       if( false == configureStatus.ok() )
>>       {
>>          // Uh-oh, throw
>>
>>       }
>>
>>       std::shared_ptr<arrow::fs::AzureFileSystem>   azureFileSystem;
>>       arrow::Result<std::shared_ptr<arrow::fs::AzureFileSystem>>   azureFileSystemResult = arrow::fs::AzureFileSystem::Make( azureOptions );
>>       if( true == azureFileSystemResult.ok() )
>>       {
>>          azureFileSystem = azureFileSystemResult.ValueOrDie();
>>
>>       }
>>       else
>>       {
>>          // Uh-oh, throw
>>
>>       }
>>
>>          //const std::string path( "parquet/ParquetFiles/plain.parquet" );
>>          const std::string path( "parquet/ParquetFiles/plain.parquet" );
>>          std::shared_ptr<arrow::io::RandomAccessFile> arrowFile;
>>          std::cout << "1\n";
>>          arrow::Result<std::shared_ptr<arrow::io::RandomAccessFile>> openResult = azureFileSystem->OpenInputFile( path );
>>          std::cout << "2\n";
>>
>> And that is where things run off the rails.  At this point, all I want to do 
>> is open the input file, create a Parquet file reader like so:
>>
>>          std::unique_ptr<parquet::ParquetFileReader> parquet_reader = parquet::ParquetFileReader::Open( arrowFile );
>>
>> Then go about my business of reading/writing Parquet data as per normal.  
>> Ergo, just as I do for the other filesystem objects.  But the 
>> OpenInputFile() method fails for the Azure use case scenario.  If I attempt 
>> the account key configuration, then the error I see is:
>>
>> adls_read
>> Parquet file read commencing...
>> 1
>> Parquet read error: map::at
>>
>> Where the "1" is just a marker to show how far I got in the process of 
>> reading a pre-existing Parquet file from the Azure server.  Ergo, a low-brow 
>> means of debugging.  The cout is shown above.  I don't get to "2", obviously.
>>
>> When attempting the client secret credential auth, I see the following 
>> failure:
>>
>> adls_read
>> Parquet file read commencing...
>> 1
>> Parquet read error: GetToken(): error response: 401 Unauthorized
>>
>> Then when attempting the Managed Identity auth configuration, I get the 
>> following:
>>
>> adls_read
>> Parquet file read commencing...
>> 1
>> ^C
>>
>> Where the process just hangs and I have to interrupt out of it.  Note that I 
>> didn't have this level of difficulty when I implemented our support for GCS 
>> and S3/AWS.  Those were relatively straightforward.  Azure however has been 
>> more difficult;  this should just work.  I mean, you create the filesystem 
>> object, then you are supposed to be able to use it in the same manner that 
>> you use any other Arrow filesystem object.  However I can't open a file and 
>> I suspect it is due to some type of handshaking issue with Azure.  Azure has 
>> all of these moving parts: tenant ID, application/client ID, client secret, 
>> object ID (which we don't use in this case) and the list goes on.  Finally, 
>> I saw this in the azurefs.h header at line 102:
>>
>>   // TODO(GH-38598): Add support for more auth methods.
>>   // std::string connection_string;
>>   // std::string sas_token;
>>
>> But it seemed clear to me that this was referring to other auth methods than 
>> those that have been implemented thus far (ergo client secret, account key, 
>> etc.).  Am I correct?
>>
>> So my questions are:
>>
>>   1.  Any ideas where I am going wrong here?
>>   2.  Has anyone else used the Azure filesystem object?
>>   3.  Has it worked for you?
>>   4.  If so, what was your approach?
>>
>> Note that I did peruse the azurefs_test.cc for examples.  I did see various 
>> approaches.  One involved invoking the MakeDataLakeServiceClient() method.  
>> It wasn't clear if I needed to do that or not, but then I saw that this is 
>> done during the private implementation of the AzureFileSystem's Make() 
>> method, thus:
>>
>>   static Result<std::unique_ptr<AzureFileSystem::Impl>> Make(AzureOptions options,
>>                                                              io::IOContext io_context) {
>>     auto self = std::unique_ptr<AzureFileSystem::Impl>(
>>         new AzureFileSystem::Impl(std::move(options), std::move(io_context)));
>>     ARROW_ASSIGN_OR_RAISE(self->blob_service_client_,
>>                           self->options_.MakeBlobServiceClient());
>>     ARROW_ASSIGN_OR_RAISE(self->datalake_service_client_,
>>                           self->options_.MakeDataLakeServiceClient());
>>     return self;
>>   }
>>
>> So it seemed like I wouldn't need to do it separately.
>>
>> Anyway, I need to get this working ASAP, so I am open to feedback.  I'll 
>> continue plugging away.
>>
>> Thanks!
>> Jerry
