Hi,

Could you share how you generated the values for the client
secret configuration and the managed identity configuration?
I'll try them.

Thanks,
-- 
kou

In <dm3pr05mb1054325d88f8a46fd0b169c92f3...@dm3pr05mb10543.namprd05.prod.outlook.com>
  "RE: Using the new Azure filesystem object (C++)" on Thu, 11 Jul 2024 06:37:42 +0000,
  "Jerry Adair via user" <[email protected]> wrote:

> Hi Kou!
> 
> Well, I thought it was strange too.  I was not aware that AzureFS will use
> data lake storage automatically when it is available.  Thank you for that
> information; it helps.  With that in mind, I commented out both of those
> lines and just let the default values be assigned (which occurs in azurefs.h).
> 
> With that modification, if I attempt an account key configuration, thus:
> 
>       configureStatus = azureOptions.ConfigureAccountKeyCredential( account_key );
> 
> Then it works!  I can read the Parquet file via the methods in the Parquet 
> library!
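> 
> For reference, here is the working path in condensed form (just a sketch;
> the account name, key, and path are the obfuscated placeholders from my
> earlier mail, and the error handling is elided):
> 
>       std::string account_key = "flMhWgNts+i/blah==";   // obfuscated
>       arrow::fs::AzureOptions azureOptions;
>       azureOptions.account_name = "mytest";             // obfuscated
>       // Storage authorities left at their defaults; AzureFS detects and
>       // uses data lake storage automatically when it is available.
>       arrow::Status configureStatus =
>          azureOptions.ConfigureAccountKeyCredential( account_key );
>       auto azureFileSystem =
>          arrow::fs::AzureFileSystem::Make( azureOptions ).ValueOrDie();
>       auto arrowFile =
>          azureFileSystem->OpenInputFile( "parquet/ParquetFiles/plain.parquet" ).ValueOrDie();
>       std::unique_ptr<parquet::ParquetFileReader> parquet_reader =
>          parquet::ParquetFileReader::Open( arrowFile );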
> 
> However if I use the client secret configuration, thus:
> 
>       configureStatus = azureOptions.ConfigureClientSecretCredential( tenant_id, client_id, client_secret );
> 
> Then I see the unauthorized error, thus:
> 
> adls_read
> Parquet file read commencing...
> configureStatus = OK
> 1
> Parquet read error: GetToken(): error response: 401 Unauthorized
> 
> And if I use the managed identity configuration, thus:
> 
>       configureStatus = azureOptions.ConfigureManagedIdentityCredential( client_id );
> 
> Then I see the hang, thus:
> 
> adls_read
> Parquet file read commencing...
> configureStatus = OK
> 1
> ^C
> 
> So I am not sure what is wrong with those configuration attempts.  I have
> double-checked the values via the Azure portal that we use, and they are
> correct.  So perhaps there is some other type of limitation being imposed
> here?  I'd like to offer the user different means of authenticating to get
> their credentials, ergo they could use client secret, account key, managed
> identity, etc.  However, at the moment only account key is working.  I'll
> continue to see what I can figure out.  If you've seen this type of
> phenomenon in the past and recognize the error at play, I'd appreciate any
> feedback.
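> 
> For reference, the selection logic I have in mind is roughly the following
> (just a sketch; tenant_id, client_id, client_secret, and account_key are
> the same obfuscated values as in the code below):
> 
>       enum class AzureAuth { kAccountKey, kClientSecret, kManagedIdentity };
> 
>       arrow::Status ConfigureAuth( arrow::fs::AzureOptions&  options,
>                                    AzureAuth                 auth )
>       {
>          switch( auth )
>          {
>             case AzureAuth::kAccountKey:
>                return options.ConfigureAccountKeyCredential( account_key );
>             case AzureAuth::kClientSecret:
>                return options.ConfigureClientSecretCredential( tenant_id, client_id, client_secret );
>             case AzureAuth::kManagedIdentity:
>                return options.ConfigureManagedIdentityCredential( client_id );
>          }
>          return arrow::Status::Invalid( "unknown auth method" );
>       }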
> 
> Thanks!
> Jerry
> 
> 
> -----Original Message-----
> From: Sutou Kouhei <[email protected]> 
> Sent: Wednesday, July 10, 2024 4:34 PM
> To: [email protected]
> Subject: Re: Using the new Azure filesystem object (C++)
> 
> Hi,
> 
>>       azureOptions.blob_storage_authority = ".dfs.core.windows.net"; // If I don't do this, then the
>>                                                                      // blob.core.windows.net is used;
>>                                                                      // I want dfs not blob, so... not certain
>>                                                                      // why that happens either
> 
> This is strange. In general, you should not do this.
> AzureFS uses both the blob storage API and the data lake storage API. If
> the data lake storage API is available, AzureFS uses it automatically. So
> you should not change blob_storage_authority.
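> 
> Something like the following should be enough (an untested sketch;
> "mytest" and account_key are placeholders from your code):
> 
>   arrow::fs::AzureOptions options;
>   options.account_name = "mytest";
>   // No dfs_storage_authority/blob_storage_authority overrides: with the
>   // defaults, AzureFS uses the data lake storage API automatically when
>   // it is available.
>   arrow::Status st = options.ConfigureAccountKeyCredential(account_key);
>   auto fs_result = arrow::fs::AzureFileSystem::Make(options);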
> 
> If you remove this line, what happens?
> 
> 
> Thanks,
> --
> kou
> 
> In <dm3pr05mb1054334eeaeae4a95805de322f3...@dm3pr05mb10543.namprd05.prod.outlook.com>
>   "Using the new Azure filesystem object (C++)" on Wed, 10 Jul 2024 16:58:52 +0000,
>   "Jerry Adair via user" <[email protected]> wrote:
> 
>> Hi-
>>
>> I am attempting to use the new Azure filesystem object in C++
>> (Arrow/Parquet version 16.0.0).  I already have code that works for GCS and
>> AWS/S3.  I have been waiting for quite a while to see the new Azure
>> filesystem object released.  Now that it has been released in this version
>> (16.0.0), I have been trying to use it, without success.  I presumed that it
>> would work in the same manner as the GCS and S3/AWS filesystem objects: you
>> create the object, then you can use it in the same manner that you used the
>> other filesystem objects.  Note that I am not using Arrow methods to
>> read/write the data but rather the Parquet methods.  This works for local,
>> GCS and S3/AWS.  However, I cannot open a file on Azure.  It seems like no
>> matter which authentication method I try, it doesn't work.  And I get
>> different results depending on which auth approach I take (client secret
>> versus account key, etc.).  Here is a code summary of what I am trying to do:
>>
>>       arrow::fs::AzureOptions   azureOptions;
>>       arrow::Status             configureStatus = arrow::Status::OK();
>>
>>       // exact values obfuscated
>>       azureOptions.account_name = "mytest";
>>       azureOptions.dfs_storage_authority  = ".dfs.core.windows.net";
>>       azureOptions.blob_storage_authority = ".dfs.core.windows.net"; // If I don't do this, then the
>>                                                                      // blob.core.windows.net is used;
>>                                                                      // I want dfs not blob, so... not certain
>>                                                                      // why that happens either
>>       std::string  client_id     = "3f061894-blah";
>>       std::string  client_secret = "2c796e9eblah";
>>       std::string  tenant_id     = "b1c14d5c-blah";
>>       //std::string  account_key = "flMhWgNts+i/blah==";
>>
>>       //configureStatus = azureOptions.ConfigureAccountKeyCredential( account_key );
>>       configureStatus = azureOptions.ConfigureClientSecretCredential( tenant_id, client_id, client_secret );
>>       //configureStatus = azureOptions.ConfigureManagedIdentityCredential( client_id );
>>       if( false == configureStatus.ok() )
>>       {
>>          // Uh-oh, throw
>>       }
>>
>>       std::shared_ptr<arrow::fs::AzureFileSystem>   azureFileSystem;
>>       arrow::Result<std::shared_ptr<arrow::fs::AzureFileSystem>>
>>          azureFileSystemResult = arrow::fs::AzureFileSystem::Make( azureOptions );
>>       if( true == azureFileSystemResult.ok() )
>>       {
>>          azureFileSystem = azureFileSystemResult.ValueOrDie();
>>       }
>>       else
>>       {
>>          // Uh-oh, throw
>>       }
>>
>>       const std::string path( "parquet/ParquetFiles/plain.parquet" );
>>       std::shared_ptr<arrow::io::RandomAccessFile> arrowFile;
>>       std::cout << "1\n";
>>       arrow::Result<std::shared_ptr<arrow::io::RandomAccessFile>>
>>          openResult = azureFileSystem->OpenInputFile( path );
>>       std::cout << "2\n";
>>
>> And that is where things run off the rails.  At this point, all I want to do
>> is open the input file and create a Parquet file reader, like so:
>>
>>          std::unique_ptr<parquet::ParquetFileReader> parquet_reader =
>>             parquet::ParquetFileReader::Open( arrowFile );
>>
>> Then go about my business of reading/writing Parquet data as per normal,
>> ergo just as I do for the other filesystem objects.  But the OpenInputFile()
>> method fails for the Azure use case.  If I attempt the account key
>> configuration, then the error I see is:
>>
>> adls_read
>> Parquet file read commencing...
>> 1
>> Parquet read error: map::at
>>
>> Where the "1" is just a marker to show how far I got in the process of 
>> reading a pre-existing Parquet file from the Azure server.  Ergo, a low-brow 
>> means of debugging.  The cout is shown above.  I don't get to "2", obviously.
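>>
>> A more informative check than the cout markers would be to inspect the
>> Result's status directly, e.g.:
>>
>>          if( false == openResult.ok() )
>>          {
>>             std::cerr << "OpenInputFile failed: "
>>                       << openResult.status().ToString() << "\n";
>>          }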
>>
>> When attempting the client secret credential auth, I see the following 
>> failure:
>>
>> adls_read
>> Parquet file read commencing...
>> 1
>> Parquet read error: GetToken(): error response: 401 Unauthorized
>>
>> Then when attempting the Managed Identity auth configuration, I get the 
>> following:
>>
>> adls_read
>> Parquet file read commencing...
>> 1
>> ^C
>>
>> Where the process just hangs and I have to interrupt out of it.  Note that I
>> didn't have this level of difficulty when I implemented our support for GCS
>> and S3/AWS;  those were relatively straightforward.  Azure, however, has been
>> more difficult, and this should just work.  I mean, you create the filesystem
>> object, and then you are supposed to be able to use it in the same manner
>> that you use any other Arrow filesystem object.  However, I can't open a
>> file, and I suspect it is due to some type of handshaking issue with Azure.
>> Azure has all of these moving parts: tenant ID, application/client ID,
>> client secret, object ID (which we don't use in this case), and the list
>> goes on.  Finally, I saw this in the azurefs.h header at line 102:
>>
>>   // TODO(GH-38598): Add support for more auth methods.
>>   // std::string connection_string;
>>   // std::string sas_token;
>>
>> But it seemed clear to me that this was referring to auth methods other than
>> those that have been implemented thus far (ergo client secret, account key,
>> etc.).  Am I correct?
>>
>> So my questions are:
>>
>>   1.  Any ideas where I am going wrong here?
>>   2.  Has anyone else used the Azure filesystem object?
>>   3.  Has it worked for you?
>>   4.  If so, what was your approach?
>>
>> Note that I did peruse azurefs_test.cc for examples.  I did see various
>> approaches.  One involved invoking the MakeDataLakeServiceClient() method.
>> It wasn't clear whether I needed to do that or not, but then I saw that this
>> is done in the private implementation of AzureFileSystem's Make() method,
>> thus:
>>
>>   static Result<std::unique_ptr<AzureFileSystem::Impl>> Make(AzureOptions options,
>>                                                               io::IOContext io_context) {
>>     auto self = std::unique_ptr<AzureFileSystem::Impl>(
>>         new AzureFileSystem::Impl(std::move(options), std::move(io_context)));
>>     ARROW_ASSIGN_OR_RAISE(self->blob_service_client_,
>>                           self->options_.MakeBlobServiceClient());
>>     ARROW_ASSIGN_OR_RAISE(self->datalake_service_client_,
>>                           self->options_.MakeDataLakeServiceClient());
>>     return self;
>>   }
>>
>> So it seemed like I wouldn't need to do it separately.
>>
>> Anyway, I need to get this working ASAP, so I am open to feedback.  I'll 
>> continue plugging away.
>>
>> Thanks!
>> Jerry
