Re: [PR] Move Parquet encryption tests into the arrow_reader integration tests [arrow-rs]

2025-03-13 Thread via GitHub


alamb merged PR #7279:
URL: https://github.com/apache/arrow-rs/pull/7279


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Move Parquet encryption tests into the arrow_reader integration tests [arrow-rs]

2025-03-13 Thread via GitHub


alamb commented on PR #7279:
URL: https://github.com/apache/arrow-rs/pull/7279#issuecomment-2720964935

   Since these are tests only changes I am going to merge it in to avoid 
conflicts and get us ready for the next round


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Move Parquet encryption tests into the arrow_reader integration tests [arrow-rs]

2025-03-13 Thread via GitHub


alamb commented on code in PR #7279:
URL: https://github.com/apache/arrow-rs/pull/7279#discussion_r199376


##
parquet/tests/arrow_reader/encryption_util.rs:
##
@@ -15,67 +15,17 @@
 // specific language governing permissions and limitations
 // under the License.
 
-use crate::arrow::arrow_reader::{
-ArrowReaderMetadata, ArrowReaderOptions, ParquetRecordBatchReaderBuilder,
-};
-use crate::arrow::ParquetRecordBatchStreamBuilder;
-use crate::encryption::decrypt::FileDecryptionProperties;
-use crate::errors::ParquetError;
-use crate::file::metadata::FileMetaData;
 use arrow_array::cast::AsArray;
 use arrow_array::{types, RecordBatch};
-use futures::TryStreamExt;
-use std::fs::File;
+use parquet::file::metadata::ParquetMetaData;
 
-pub(crate) fn verify_encryption_test_file_read(
-file: File,
-decryption_properties: FileDecryptionProperties,
-) {
-let options = ArrowReaderOptions::default()
-.with_file_decryption_properties(decryption_properties.clone());
-let metadata = ArrowReaderMetadata::load(&file, options.clone()).unwrap();
-let file_metadata = metadata.metadata.file_metadata();
-
-let builder = ParquetRecordBatchReaderBuilder::try_new_with_options(file, 
options).unwrap();
-let record_reader = builder.build().unwrap();
-let record_batches = record_reader
-.map(|x| x.unwrap())
-.collect::>();
-
-verify_encryption_test_data(record_batches, file_metadata.clone(), 
metadata);
-}
-
-pub(crate) async fn verify_encryption_test_file_read_async(
-file: &mut tokio::fs::File,
-decryption_properties: FileDecryptionProperties,
-) -> Result<(), ParquetError> {
-let options = 
ArrowReaderOptions::new().with_file_decryption_properties(decryption_properties);
-
-let metadata = ArrowReaderMetadata::load_async(file, 
options.clone()).await?;
-let arrow_reader_metadata = ArrowReaderMetadata::load_async(file, 
options).await?;
-let file_metadata = metadata.metadata.file_metadata();
-
-let record_reader = ParquetRecordBatchStreamBuilder::new_with_metadata(
-file.try_clone().await?,
-arrow_reader_metadata.clone(),
-)
-.build()?;
-let record_batches = record_reader.try_collect::>().await?;
-
-verify_encryption_test_data(record_batches, file_metadata.clone(), 
metadata);
-Ok(())
-}
-
-/// Tests reading an encrypted file from the parquet-testing repository
-fn verify_encryption_test_data(
-record_batches: Vec,
-file_metadata: FileMetaData,
-metadata: ArrowReaderMetadata,
-) {
+/// Verifies data read from an encrypted file from the parquet-testing 
repository
+pub fn verify_encryption_test_data(record_batches: Vec, metadata: 
&ParquetMetaData) {
+let file_metadata = metadata.file_metadata();
 assert_eq!(file_metadata.num_rows(), 50);
 assert_eq!(file_metadata.schema_descr().num_columns(), 8);
 
-metadata.metadata.row_groups().iter().for_each(|rg| {
+metadata.row_groups().iter().for_each(|rg| {

Review Comment:
   that is a nice change



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Move Parquet encryption tests into the arrow_reader integration tests [arrow-rs]

2025-03-13 Thread via GitHub


alamb commented on PR #7279:
URL: https://github.com/apache/arrow-rs/pull/7279#issuecomment-2720965135

   Thanks agian!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Move Parquet encryption tests into the arrow_reader integration tests [arrow-rs]

2025-03-12 Thread via GitHub


adamreeve commented on code in PR #7279:
URL: https://github.com/apache/arrow-rs/pull/7279#discussion_r1992593092


##
parquet/tests/arrow_reader/mod.rs:
##
@@ -38,6 +38,15 @@ use tempfile::NamedTempFile;
 mod bad_data;
 #[cfg(feature = "crc")]
 mod checksum;
+#[cfg(feature = "encryption")]
+mod encryption;
+#[cfg(all(feature = "encryption", feature = "async"))]
+mod encryption_async;
+mod encryption_common;

Review Comment:
   Yeah `encryption_agnostic` sounds better, I couldn't think of a good name 
for this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Move Parquet encryption tests into the arrow_reader integration tests [arrow-rs]

2025-03-12 Thread via GitHub


rok commented on code in PR #7279:
URL: https://github.com/apache/arrow-rs/pull/7279#discussion_r1992544852


##
parquet/tests/arrow_reader/mod.rs:
##
@@ -38,6 +38,15 @@ use tempfile::NamedTempFile;
 mod bad_data;
 #[cfg(feature = "crc")]
 mod checksum;
+#[cfg(feature = "encryption")]
+mod encryption;
+#[cfg(all(feature = "encryption", feature = "async"))]
+mod encryption_async;
+mod encryption_common;

Review Comment:
   Nit: how about `read_encrypted_with_plaintext_footer` or 
`encryption_agnostic`? `encryption_common` slightly implies some sort of shared 
utility and made me wonder why it's not behind `#[cfg(feature = 
"encryption")]`. Not a big deal, feel free to ignore.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org