Re: [PR] Move Parquet encryption tests into the arrow_reader integration tests [arrow-rs]
alamb merged PR #7279: URL: https://github.com/apache/arrow-rs/pull/7279 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Parquet encryption tests into the arrow_reader integration tests [arrow-rs]
alamb commented on PR #7279: URL: https://github.com/apache/arrow-rs/pull/7279#issuecomment-2720964935 Since these are tests only changes I am going to merge it in to avoid conflicts and get us ready for the next round -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Parquet encryption tests into the arrow_reader integration tests [arrow-rs]
alamb commented on code in PR #7279: URL: https://github.com/apache/arrow-rs/pull/7279#discussion_r199376 ## parquet/tests/arrow_reader/encryption_util.rs: ## @@ -15,67 +15,17 @@ // specific language governing permissions and limitations // under the License. -use crate::arrow::arrow_reader::{ -ArrowReaderMetadata, ArrowReaderOptions, ParquetRecordBatchReaderBuilder, -}; -use crate::arrow::ParquetRecordBatchStreamBuilder; -use crate::encryption::decrypt::FileDecryptionProperties; -use crate::errors::ParquetError; -use crate::file::metadata::FileMetaData; use arrow_array::cast::AsArray; use arrow_array::{types, RecordBatch}; -use futures::TryStreamExt; -use std::fs::File; +use parquet::file::metadata::ParquetMetaData; -pub(crate) fn verify_encryption_test_file_read( -file: File, -decryption_properties: FileDecryptionProperties, -) { -let options = ArrowReaderOptions::default() -.with_file_decryption_properties(decryption_properties.clone()); -let metadata = ArrowReaderMetadata::load(&file, options.clone()).unwrap(); -let file_metadata = metadata.metadata.file_metadata(); - -let builder = ParquetRecordBatchReaderBuilder::try_new_with_options(file, options).unwrap(); -let record_reader = builder.build().unwrap(); -let record_batches = record_reader -.map(|x| x.unwrap()) -.collect::>(); - -verify_encryption_test_data(record_batches, file_metadata.clone(), metadata); -} - -pub(crate) async fn verify_encryption_test_file_read_async( -file: &mut tokio::fs::File, -decryption_properties: FileDecryptionProperties, -) -> Result<(), ParquetError> { -let options = ArrowReaderOptions::new().with_file_decryption_properties(decryption_properties); - -let metadata = ArrowReaderMetadata::load_async(file, options.clone()).await?; -let arrow_reader_metadata = ArrowReaderMetadata::load_async(file, options).await?; -let file_metadata = metadata.metadata.file_metadata(); - -let record_reader = ParquetRecordBatchStreamBuilder::new_with_metadata( -file.try_clone().await?, -arrow_reader_metadata.clone(), -) -.build()?; -let record_batches = record_reader.try_collect::>().await?; - -verify_encryption_test_data(record_batches, file_metadata.clone(), metadata); -Ok(()) -} - -/// Tests reading an encrypted file from the parquet-testing repository -fn verify_encryption_test_data( -record_batches: Vec, -file_metadata: FileMetaData, -metadata: ArrowReaderMetadata, -) { +/// Verifies data read from an encrypted file from the parquet-testing repository +pub fn verify_encryption_test_data(record_batches: Vec, metadata: &ParquetMetaData) { +let file_metadata = metadata.file_metadata(); assert_eq!(file_metadata.num_rows(), 50); assert_eq!(file_metadata.schema_descr().num_columns(), 8); -metadata.metadata.row_groups().iter().for_each(|rg| { +metadata.row_groups().iter().for_each(|rg| { Review Comment: that is a nice change -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Parquet encryption tests into the arrow_reader integration tests [arrow-rs]
alamb commented on PR #7279: URL: https://github.com/apache/arrow-rs/pull/7279#issuecomment-2720965135 Thanks agian! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Parquet encryption tests into the arrow_reader integration tests [arrow-rs]
adamreeve commented on code in PR #7279: URL: https://github.com/apache/arrow-rs/pull/7279#discussion_r1992593092 ## parquet/tests/arrow_reader/mod.rs: ## @@ -38,6 +38,15 @@ use tempfile::NamedTempFile; mod bad_data; #[cfg(feature = "crc")] mod checksum; +#[cfg(feature = "encryption")] +mod encryption; +#[cfg(all(feature = "encryption", feature = "async"))] +mod encryption_async; +mod encryption_common; Review Comment: Yeah `encryption_agnostic` sounds better, I couldn't think of a good name for this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Parquet encryption tests into the arrow_reader integration tests [arrow-rs]
rok commented on code in PR #7279: URL: https://github.com/apache/arrow-rs/pull/7279#discussion_r1992544852 ## parquet/tests/arrow_reader/mod.rs: ## @@ -38,6 +38,15 @@ use tempfile::NamedTempFile; mod bad_data; #[cfg(feature = "crc")] mod checksum; +#[cfg(feature = "encryption")] +mod encryption; +#[cfg(all(feature = "encryption", feature = "async"))] +mod encryption_async; +mod encryption_common; Review Comment: Nit: how about `read_encrypted_with_plaintext_footer` or `encryption_agnostic`? `encryption_common` slightly implies some sort of shared utility and made me wonder why it's not behind `#[cfg(feature = "encryption")]`. Not a big deal, feel free to ignore. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org