Re: [PR] feat: create builder for disk manager [datafusion]

2025-05-29 Thread via GitHub


xudong963 merged PR #16191:
URL: https://github.com/apache/datafusion/pull/16191


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: create builder for disk manager [datafusion]

2025-05-28 Thread via GitHub


zhuqi-lucas commented on code in PR #16191:
URL: https://github.com/apache/datafusion/pull/16191#discussion_r2113090658


##
datafusion/execution/src/disk_manager.rs:
##
@@ -32,7 +32,95 @@ use crate::memory_pool::human_readable_size;
 
 const DEFAULT_MAX_TEMP_DIRECTORY_SIZE: u64 = 100 * 1024 * 1024 * 1024; // 100GB
 
+/// Builder pattern for the [DiskManager] structure
+#[derive(Clone, Debug)]
+pub struct DiskManagerBuilder {
+/// The storage mode of the disk manager
+mode: DiskManagerMode,
+/// The maximum amount of data (in bytes) stored inside the temporary directories.
+/// Default to 100GB
+max_temp_directory_size: u64,
+}
+
+impl Default for DiskManagerBuilder {
+fn default() -> Self {
+Self {
+mode: DiskManagerMode::OsTmpDirectory,
+max_temp_directory_size: DEFAULT_MAX_TEMP_DIRECTORY_SIZE,
+}
+}
+}
+
+impl DiskManagerBuilder {
+pub fn set_mode(&mut self, mode: DiskManagerMode) {
+self.mode = mode;
+}
+
+pub fn with_mode(mut self, mode: DiskManagerMode) -> Self {
+self.set_mode(mode);
+self
+}
+
+pub fn set_max_temp_directory_size(&mut self, value: u64) {
+self.max_temp_directory_size = value;
+}
+
+pub fn with_max_temp_directory_size(mut self, value: u64) -> Self {
+self.set_max_temp_directory_size(value);
+self
+}
+
+/// Create a DiskManager given the builder
+pub fn build(self) -> Result<DiskManager> {
+match self.mode {
+DiskManagerMode::OsTmpDirectory => Ok(DiskManager {
+local_dirs: Mutex::new(Some(vec![])),
+max_temp_directory_size: self.max_temp_directory_size,
+used_disk_space: Arc::new(AtomicU64::new(0)),
+}),
+DiskManagerMode::Directories(conf_dirs) => {
+let local_dirs = create_local_dirs(conf_dirs)?;
+debug!(
+"Created local dirs {local_dirs:?} as DataFusion working directory"
+);
+Ok(DiskManager {
+local_dirs: Mutex::new(Some(local_dirs)),

Review Comment:
   Thanks, got it!






Re: [PR] feat: create builder for disk manager [datafusion]

2025-05-28 Thread via GitHub


alamb commented on code in PR #16191:
URL: https://github.com/apache/datafusion/pull/16191#discussion_r2112173381


##
datafusion/execution/src/disk_manager.rs:
##
@@ -91,6 +177,11 @@ pub struct DiskManager {
 }
 
 impl DiskManager {
+/// Creates a builder for [DiskManager]
+pub fn builder() -> DiskManagerBuilder {
+DiskManagerBuilder::default()
+}
+
 /// Create a DiskManager given the configuration
 pub fn try_new(config: DiskManagerConfig) -> Result<Arc<Self>> {

Review Comment:
   thank you






Re: [PR] feat: create builder for disk manager [datafusion]

2025-05-28 Thread via GitHub


jdrouet commented on code in PR #16191:
URL: https://github.com/apache/datafusion/pull/16191#discussion_r2111797754


##
datafusion/execution/src/disk_manager.rs:
##
@@ -32,7 +32,95 @@ use crate::memory_pool::human_readable_size;
 
 const DEFAULT_MAX_TEMP_DIRECTORY_SIZE: u64 = 100 * 1024 * 1024 * 1024; // 100GB
 
+/// Builder pattern for the [DiskManager] structure
+#[derive(Clone, Debug)]
+pub struct DiskManagerBuilder {
+/// The storage mode of the disk manager
+mode: DiskManagerMode,
+/// The maximum amount of data (in bytes) stored inside the temporary directories.
+/// Default to 100GB
+max_temp_directory_size: u64,
+}
+
+impl Default for DiskManagerBuilder {
+fn default() -> Self {
+Self {
+mode: DiskManagerMode::OsTmpDirectory,
+max_temp_directory_size: DEFAULT_MAX_TEMP_DIRECTORY_SIZE,
+}
+}
+}
+
+impl DiskManagerBuilder {
+pub fn set_mode(&mut self, mode: DiskManagerMode) {
+self.mode = mode;
+}
+
+pub fn with_mode(mut self, mode: DiskManagerMode) -> Self {
+self.set_mode(mode);
+self
+}
+
+pub fn set_max_temp_directory_size(&mut self, value: u64) {
+self.max_temp_directory_size = value;
+}
+
+pub fn with_max_temp_directory_size(mut self, value: u64) -> Self {
+self.set_max_temp_directory_size(value);
+self
+}
+
+/// Create a DiskManager given the builder
+pub fn build(self) -> Result<DiskManager> {
+match self.mode {
+DiskManagerMode::OsTmpDirectory => Ok(DiskManager {
+local_dirs: Mutex::new(Some(vec![])),
+max_temp_directory_size: self.max_temp_directory_size,
+used_disk_space: Arc::new(AtomicU64::new(0)),
+}),
+DiskManagerMode::Directories(conf_dirs) => {
+let local_dirs = create_local_dirs(conf_dirs)?;
+debug!(
+"Created local dirs {local_dirs:?} as DataFusion working directory"
+);
+Ok(DiskManager {
+local_dirs: Mutex::new(Some(local_dirs)),

Review Comment:
   Actually, I just moved the existing logic into a builder, so right now we
   have the same behavior as before, meaning that each dir has the same limit.
   For more details, you'd have to go back to the `current_file_disk_usage`
   computation.
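
For readers following along: the diff builds `DiskManager` with a single `max_temp_directory_size` and a single `used_disk_space: Arc<AtomicU64>`, whichever mode is selected. The sketch below shows one way a shared counter like that could be checked against one limit. It is not the DataFusion implementation; `SpillAccounting`, `try_reserve`, and `used` are hypothetical names, and the real logic lives in the `current_file_disk_usage` computation mentioned above.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the two accounting fields shown in the diff.
pub struct SpillAccounting {
    max_temp_directory_size: u64,
    used_disk_space: Arc<AtomicU64>,
}

impl SpillAccounting {
    pub fn new(max_temp_directory_size: u64) -> Self {
        Self {
            max_temp_directory_size,
            used_disk_space: Arc::new(AtomicU64::new(0)),
        }
    }

    /// Try to account for `bytes` more spilled data.
    /// On failure the counter is left unchanged.
    pub fn try_reserve(&self, bytes: u64) -> Result<(), String> {
        let mut current = self.used_disk_space.load(Ordering::Relaxed);
        loop {
            let new_total = current
                .checked_add(bytes)
                .ok_or_else(|| "disk usage counter overflow".to_string())?;
            if new_total > self.max_temp_directory_size {
                return Err(format!(
                    "limit of {} bytes exceeded (requested total: {})",
                    self.max_temp_directory_size, new_total
                ));
            }
            // Retry if another thread updated the counter in the meantime.
            match self.used_disk_space.compare_exchange_weak(
                current,
                new_total,
                Ordering::Relaxed,
                Ordering::Relaxed,
            ) {
                Ok(_) => return Ok(()),
                Err(observed) => current = observed,
            }
        }
    }

    /// Bytes currently accounted for.
    pub fn used(&self) -> u64 {
        self.used_disk_space.load(Ordering::Relaxed)
    }
}
```

In this sketch the counter is not tied to any particular directory, so the limit applies to whatever is written through the one accounting object.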






Re: [PR] feat: create builder for disk manager [datafusion]

2025-05-28 Thread via GitHub


jdrouet commented on code in PR #16191:
URL: https://github.com/apache/datafusion/pull/16191#discussion_r2111306335


##
datafusion/execution/src/disk_manager.rs:
##
@@ -32,6 +32,92 @@ use crate::memory_pool::human_readable_size;
 
 const DEFAULT_MAX_TEMP_DIRECTORY_SIZE: u64 = 100 * 1024 * 1024 * 1024; // 100GB
 
+/// Builder pattern for the [DiskManager] structure
+#[derive(Clone, Debug)]
+pub struct DiskManagerBuilder {
+/// The storage mode of the disk manager
+mode: DiskManagerMode,
+/// The maximum amount of data (in bytes) stored inside the temporary directories.
+/// Default to 100GB
+max_temp_directory_size: u64,
+}
+
+impl Default for DiskManagerBuilder {
+fn default() -> Self {
+Self {
+mode: DiskManagerMode::OsTmpDirectory,
+max_temp_directory_size: DEFAULT_MAX_TEMP_DIRECTORY_SIZE,
+}
+}
+}
+
+impl DiskManagerBuilder {
+pub fn set_mode(&mut self, mode: DiskManagerMode) {
+self.mode = mode;
+}
+
+pub fn with_mode(mut self, mode: DiskManagerMode) -> Self {
+self.set_mode(mode);
+self
+}
+
+pub fn set_max_temp_directory_size(&mut self, value: u64) {
+self.max_temp_directory_size = value;
+}
+
+pub fn with_max_temp_directory_size(mut self, value: u64) -> Self {
+self.set_max_temp_directory_size(value);
+self
+}
+
+/// Create a DiskManager given the builder
+pub fn build(self) -> Result<DiskManager> {
+match self.mode {
+DiskManagerMode::OsTmpDirectory => Ok(DiskManager {
+local_dirs: Mutex::new(Some(vec![])),
+max_temp_directory_size: self.max_temp_directory_size,
+used_disk_space: Arc::new(AtomicU64::new(0)),
+}),
+DiskManagerMode::Directories(conf_dirs) => {
+let local_dirs = create_local_dirs(conf_dirs)?;
+debug!(
+"Created local dirs {local_dirs:?} as DataFusion working directory"
+);
+Ok(DiskManager {
+local_dirs: Mutex::new(Some(local_dirs)),
+max_temp_directory_size: self.max_temp_directory_size,
+used_disk_space: Arc::new(AtomicU64::new(0)),
+})
+}
+DiskManagerMode::Disabled => Ok(DiskManager {
+local_dirs: Mutex::new(None),
+max_temp_directory_size: self.max_temp_directory_size,
+used_disk_space: Arc::new(AtomicU64::new(0)),
+}),
+}
+}
+}
+
+#[derive(Clone, Debug)]
+pub enum DiskManagerMode {
+/// Create a new [DiskManager] that creates temporary files within
+/// a temporary directory chosen by the OS
+OsTmpDirectory,
+
+/// Create a new [DiskManager] that creates temporary files within
+/// the specified directories

Review Comment:
   Addressed in https://github.com/apache/datafusion/pull/16191/commits/fa4552a010c71526115e5e08dd1165da3d400351






Re: [PR] feat: create builder for disk manager [datafusion]

2025-05-28 Thread via GitHub


zhuqi-lucas commented on code in PR #16191:
URL: https://github.com/apache/datafusion/pull/16191#discussion_r2111412349


##
datafusion/execution/src/disk_manager.rs:
##
@@ -32,7 +32,95 @@ use crate::memory_pool::human_readable_size;
 
 const DEFAULT_MAX_TEMP_DIRECTORY_SIZE: u64 = 100 * 1024 * 1024 * 1024; // 100GB
 
+/// Builder pattern for the [DiskManager] structure
+#[derive(Clone, Debug)]
+pub struct DiskManagerBuilder {
+/// The storage mode of the disk manager
+mode: DiskManagerMode,
+/// The maximum amount of data (in bytes) stored inside the temporary directories.
+/// Default to 100GB
+max_temp_directory_size: u64,
+}
+
+impl Default for DiskManagerBuilder {
+fn default() -> Self {
+Self {
+mode: DiskManagerMode::OsTmpDirectory,
+max_temp_directory_size: DEFAULT_MAX_TEMP_DIRECTORY_SIZE,
+}
+}
+}
+
+impl DiskManagerBuilder {
+pub fn set_mode(&mut self, mode: DiskManagerMode) {
+self.mode = mode;
+}
+
+pub fn with_mode(mut self, mode: DiskManagerMode) -> Self {
+self.set_mode(mode);
+self
+}
+
+pub fn set_max_temp_directory_size(&mut self, value: u64) {
+self.max_temp_directory_size = value;
+}
+
+pub fn with_max_temp_directory_size(mut self, value: u64) -> Self {
+self.set_max_temp_directory_size(value);
+self
+}
+
+/// Create a DiskManager given the builder
+pub fn build(self) -> Result<DiskManager> {
+match self.mode {
+DiskManagerMode::OsTmpDirectory => Ok(DiskManager {
+local_dirs: Mutex::new(Some(vec![])),
+max_temp_directory_size: self.max_temp_directory_size,
+used_disk_space: Arc::new(AtomicU64::new(0)),
+}),
+DiskManagerMode::Directories(conf_dirs) => {
+let local_dirs = create_local_dirs(conf_dirs)?;
+debug!(
+"Created local dirs {local_dirs:?} as DataFusion working directory"
+);
+Ok(DiskManager {
+local_dirs: Mutex::new(Some(local_dirs)),

Review Comment:
   Thank you @jdrouet for the work, LGTM. One minor question: do we have a
   per-directory max limit config when multiple dirs are configured?






Re: [PR] feat: create builder for disk manager [datafusion]

2025-05-28 Thread via GitHub


jdrouet commented on code in PR #16191:
URL: https://github.com/apache/datafusion/pull/16191#discussion_r2111307101


##
datafusion/execution/src/disk_manager.rs:
##
@@ -91,6 +177,11 @@ pub struct DiskManager {
 }
 
 impl DiskManager {
+/// Creates a builder for [DiskManager]
+pub fn builder() -> DiskManagerBuilder {
+DiskManagerBuilder::default()
+}
+
 /// Create a DiskManager given the configuration
 pub fn try_new(config: DiskManagerConfig) -> Result<Arc<Self>> {

Review Comment:
   Addressed in https://github.com/apache/datafusion/pull/16191/commits/9baa5c2e8cbd8dbdc903a258716034104ba23d33






Re: [PR] feat: create builder for disk manager [datafusion]

2025-05-27 Thread via GitHub


2010YOUY01 commented on code in PR #16191:
URL: https://github.com/apache/datafusion/pull/16191#discussion_r2110920091


##
datafusion/execution/src/disk_manager.rs:
##
@@ -32,6 +32,92 @@ use crate::memory_pool::human_readable_size;
 
 const DEFAULT_MAX_TEMP_DIRECTORY_SIZE: u64 = 100 * 1024 * 1024 * 1024; // 100GB
 
+/// Builder pattern for the [DiskManager] structure
+#[derive(Clone, Debug)]
+pub struct DiskManagerBuilder {
+/// The storage mode of the disk manager
+mode: DiskManagerMode,
+/// The maximum amount of data (in bytes) stored inside the temporary directories.
+/// Default to 100GB
+max_temp_directory_size: u64,
+}
+
+impl Default for DiskManagerBuilder {
+fn default() -> Self {
+Self {
+mode: DiskManagerMode::OsTmpDirectory,
+max_temp_directory_size: DEFAULT_MAX_TEMP_DIRECTORY_SIZE,
+}
+}
+}
+
+impl DiskManagerBuilder {
+pub fn set_mode(&mut self, mode: DiskManagerMode) {
+self.mode = mode;
+}
+
+pub fn with_mode(mut self, mode: DiskManagerMode) -> Self {
+self.set_mode(mode);
+self
+}
+
+pub fn set_max_temp_directory_size(&mut self, value: u64) {
+self.max_temp_directory_size = value;
+}
+
+pub fn with_max_temp_directory_size(mut self, value: u64) -> Self {
+self.set_max_temp_directory_size(value);
+self
+}
+
+/// Create a DiskManager given the builder
+pub fn build(self) -> Result<DiskManager> {
+match self.mode {
+DiskManagerMode::OsTmpDirectory => Ok(DiskManager {
+local_dirs: Mutex::new(Some(vec![])),
+max_temp_directory_size: self.max_temp_directory_size,
+used_disk_space: Arc::new(AtomicU64::new(0)),
+}),
+DiskManagerMode::Directories(conf_dirs) => {
+let local_dirs = create_local_dirs(conf_dirs)?;
+debug!(
+"Created local dirs {local_dirs:?} as DataFusion working directory"
+);
+Ok(DiskManager {
+local_dirs: Mutex::new(Some(local_dirs)),
+max_temp_directory_size: self.max_temp_directory_size,
+used_disk_space: Arc::new(AtomicU64::new(0)),
+})
+}
+DiskManagerMode::Disabled => Ok(DiskManager {
+local_dirs: Mutex::new(None),
+max_temp_directory_size: self.max_temp_directory_size,
+used_disk_space: Arc::new(AtomicU64::new(0)),
+}),
+}
+}
+}
+
+#[derive(Clone, Debug)]
+pub enum DiskManagerMode {
+/// Create a new [DiskManager] that creates temporary files within
+/// a temporary directory chosen by the OS
+OsTmpDirectory,
+
+/// Create a new [DiskManager] that creates temporary files within
+/// the specified directories

Review Comment:
   ```suggestion
   /// the specified directories. One of the directories will be chosen
   /// at random for each temporary file created.
   ```
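
The behavior described in the suggested doc comment (one directory chosen at random per temporary file) could look roughly like the sketch below. The selection code itself is not quoted in this thread, so everything here is an assumption: `pick_dir`, `pick_random_dir`, and `random_u64` are hypothetical helpers, and the standard library's `RandomState` is used only as a dependency-free source of randomness.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::path::{Path, PathBuf};

/// Dependency-free pseudo-random value (hypothetical helper): each
/// `RandomState` is randomly keyed, so hashing nothing still varies.
fn random_u64() -> u64 {
    RandomState::new().build_hasher().finish()
}

/// Map a random value onto one of the configured directories.
/// Returns `None` when no directory is configured.
pub fn pick_dir(dirs: &[PathBuf], random: u64) -> Option<&Path> {
    if dirs.is_empty() {
        return None;
    }
    let idx = (random % dirs.len() as u64) as usize;
    Some(dirs[idx].as_path())
}

/// Choose a directory for a new temporary file.
pub fn pick_random_dir(dirs: &[PathBuf]) -> Option<&Path> {
    pick_dir(dirs, random_u64())
}
```

Splitting the index computation (`pick_dir`) from the randomness source keeps the selection logic deterministic and easy to test.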






Re: [PR] feat: create builder for disk manager [datafusion]

2025-05-27 Thread via GitHub


alamb commented on code in PR #16191:
URL: https://github.com/apache/datafusion/pull/16191#discussion_r2109919623


##
datafusion/execution/src/disk_manager.rs:
##
@@ -91,6 +177,11 @@ pub struct DiskManager {
 }
 
 impl DiskManager {
+/// Creates a builder for [DiskManager]
+pub fn builder() -> DiskManagerBuilder {
+DiskManagerBuilder::default()
+}
+
 /// Create a DiskManager given the configuration
 pub fn try_new(config: DiskManagerConfig) -> Result<Arc<Self>> {

Review Comment:
   What do you think about deprecating `DiskManagerConfig` and
   `DiskManager::try_new` so we have a path to a single way to configure
   DiskManagers?
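
As a rough illustration of that deprecation path, the sketch below keeps an old-style constructor as a thin, deprecated wrapper around the builder. The types are simplified stand-ins for the ones in the diff, not DataFusion's real definitions: this `try_new` takes a mode directly instead of a `DiskManagerConfig`, the error type is a plain `String`, no directories are created, and the deprecation note text is made up.

```rust
use std::path::PathBuf;

const DEFAULT_MAX_TEMP_DIRECTORY_SIZE: u64 = 100 * 1024 * 1024 * 1024; // 100GB

/// Simplified stand-in for the enum in the diff.
#[derive(Clone, Debug, PartialEq)]
pub enum DiskManagerMode {
    OsTmpDirectory,
    Directories(Vec<PathBuf>),
    Disabled,
}

#[derive(Clone, Debug)]
pub struct DiskManagerBuilder {
    mode: DiskManagerMode,
    max_temp_directory_size: u64,
}

impl Default for DiskManagerBuilder {
    fn default() -> Self {
        Self {
            mode: DiskManagerMode::OsTmpDirectory,
            max_temp_directory_size: DEFAULT_MAX_TEMP_DIRECTORY_SIZE,
        }
    }
}

impl DiskManagerBuilder {
    pub fn with_mode(mut self, mode: DiskManagerMode) -> Self {
        self.mode = mode;
        self
    }

    pub fn with_max_temp_directory_size(mut self, value: u64) -> Self {
        self.max_temp_directory_size = value;
        self
    }

    /// The real `build` can fail while creating directories; this sketch
    /// only copies the settings, so it always succeeds.
    pub fn build(self) -> Result<DiskManager, String> {
        Ok(DiskManager {
            mode: self.mode,
            max_temp_directory_size: self.max_temp_directory_size,
        })
    }
}

/// Simplified stand-in: it only records the chosen settings.
#[derive(Debug)]
pub struct DiskManager {
    pub mode: DiskManagerMode,
    pub max_temp_directory_size: u64,
}

impl DiskManager {
    pub fn builder() -> DiskManagerBuilder {
        DiskManagerBuilder::default()
    }

    /// Old entry point kept for compatibility; it delegates to the builder.
    #[deprecated(note = "use DiskManager::builder() instead")]
    pub fn try_new(mode: DiskManagerMode) -> Result<DiskManager, String> {
        Self::builder().with_mode(mode).build()
    }
}
```

Callers of the old constructor get a compiler warning pointing at the builder, while both paths share one construction routine.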






Re: [PR] feat: create builder for disk manager [datafusion]

2025-05-26 Thread via GitHub


jdrouet commented on PR #16191:
URL: https://github.com/apache/datafusion/pull/16191#issuecomment-2909600032

   Correct me if I'm wrong, but the failing test doesn't seem related.





Re: [PR] feat: create builder for disk manager [datafusion]

2025-05-07 Thread via GitHub


jdrouet closed pull request #15975: feat: create builder for disk manager
URL: https://github.com/apache/datafusion/pull/15975

