[jira] [Updated] (HBASE-29295) Optimize in-memory representation of mapreduce's TableSnapshotInputFormat's split objects

2025-05-08 Thread Ujjawal Kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-29295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ujjawal Kumar updated HBASE-29295:
--
Description: 
Similar to HBASE-24859, we have seen client-side memory consumption increase 
sharply when an MR job reads via a snapshot.

The cause is the same as in HBASE-24859: the scan is embedded within each 
TableSnapshotInputFormat split object, which can explode the client's memory 
usage for tables with a large number of regions. 

The solution would be the same as in the other Jira: don't store the scan 
object in the split; instead, read it from the conf while initializing the 
record reader. 

  was:
Similar to HBASE-24859, we have seen that while performing reads via snapshot 
in an MR job the memory consumption increases a lot on the client side.

It happens due to the same reason mentioned in the HBASE-24859, scan is 
embedded within TableSnapshotInputFormat's split object and can explode 
client's memory usage for tables with large no of reasons. 

The solution would be same as the other Jira - Don't store the scan object in 
the split, instead read it via the conf while initializing the record reader 


> Optimize in-memory representation of mapreduce's TableSnapshotInputFormat's 
> split objects
> -
>
>                 Key: HBASE-29295
>                 URL: https://issues.apache.org/jira/browse/HBASE-29295
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce, snapshots
>    Affects Versions: 2.5.10
>            Reporter: Ujjawal Kumar
>            Assignee: Ujjawal Kumar
>            Priority: Major
>         Attachments: Screenshot 2025-05-08 at 6.26.15 PM.png
>
>
> Similar to HBASE-24859, we have seen client-side memory consumption increase 
> sharply when an MR job reads via a snapshot.
> The cause is the same as in HBASE-24859: the scan is embedded within each 
> TableSnapshotInputFormat split object, which can explode the client's memory 
> usage for tables with a large number of regions. 
> The solution would be the same as in the other Jira: don't store the scan 
> object in the split; instead, read it from the conf while initializing the 
> record reader. 
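
The proposal quoted above can be illustrated with a minimal, self-contained 
sketch. It is not the HBase implementation: the class names, the "sketch.scan" 
key, and the plain Map standing in for the job configuration are all 
hypothetical, chosen only to show why embedding the scan in every split scales 
with the region count while a conf lookup does not.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitMemorySketch {
    // Hypothetical configuration key; not an actual HBase constant.
    public static final String SCAN_KEY = "sketch.scan";

    // Before: every split carries its own copy of the serialized scan.
    public static final class HeavySplit {
        public final String regionName;
        public final byte[] serializedScan;

        public HeavySplit(String regionName, byte[] serializedScan) {
            this.regionName = regionName;
            // Each split owns a separate copy, as a deserialized split would.
            this.serializedScan = serializedScan.clone();
        }
    }

    // After: the split only identifies the region it covers.
    public static final class LightSplit {
        public final String regionName;

        public LightSplit(String regionName) {
            this.regionName = regionName;
        }
    }

    // Stand-in for record reader initialization: the scan is looked up in
    // the shared configuration instead of being read from the split.
    public static String initReader(LightSplit split, Map<String, String> conf) {
        String scan = conf.get(SCAN_KEY);
        if (scan == null) {
            throw new IllegalStateException("scan missing from conf");
        }
        return split.regionName + " <- " + scan;
    }

    // Total scan bytes held on the client across all heavy splits.
    public static long scanBytesHeldBySplits(List<HeavySplit> splits) {
        long total = 0;
        for (HeavySplit s : splits) {
            total += s.serializedScan.length;
        }
        return total;
    }

    public static void main(String[] args) {
        int regions = 10_000;          // assumed region count for illustration
        byte[] scan = new byte[2048];  // assumed serialized scan size

        List<HeavySplit> heavy = new ArrayList<>();
        List<LightSplit> light = new ArrayList<>();
        for (int i = 0; i < regions; i++) {
            heavy.add(new HeavySplit("region-" + i, scan));
            light.add(new LightSplit("region-" + i));
        }

        Map<String, String> conf = new HashMap<>();
        conf.put(SCAN_KEY, "serialized-scan"); // stored once, shared by all readers

        System.out.println("scan bytes embedded in splits: "
            + scanBytesHeldBySplits(heavy));
        System.out.println("reader init: " + initReader(light.get(0), conf));
    }
}
```

In this sketch the bytes held by the heavy splits grow linearly with the 
number of regions, whereas the light splits keep a single scan entry in the 
conf regardless of how many splits exist.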



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-29295) Optimize in-memory representation of mapreduce's TableSnapshotInputFormat's split objects

2025-05-08 Thread Ujjawal Kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-29295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ujjawal Kumar updated HBASE-29295:
--
Affects Version/s: 2.5.10

> Optimize in-memory representation of mapreduce's TableSnapshotInputFormat's 
> split objects
> -
>
>                 Key: HBASE-29295
>                 URL: https://issues.apache.org/jira/browse/HBASE-29295
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce, snapshots
>    Affects Versions: 2.5.10
>            Reporter: Ujjawal Kumar
>            Assignee: Ujjawal Kumar
>            Priority: Major
>         Attachments: Screenshot 2025-05-08 at 6.26.15 PM.png
>
>
> Similar to HBASE-24859, we have seen client-side memory consumption increase 
> sharply when an MR job reads via a snapshot.
> The cause is the same as in HBASE-24859: the scan is embedded within each 
> TableSnapshotInputFormat split object, which can explode the client's memory 
> usage for tables with a large number of regions. 
> The solution would be the same as in the other Jira: don't store the scan 
> object in the split; instead, read it from the conf while initializing the 
> record reader. 





[jira] [Updated] (HBASE-29295) Optimize in-memory representation of mapreduce's TableSnapshotInputFormat's split objects

2025-05-08 Thread Ujjawal Kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-29295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ujjawal Kumar updated HBASE-29295:
--
Attachment: Screenshot 2025-05-08 at 6.26.15 PM.png

> Optimize in-memory representation of mapreduce's TableSnapshotInputFormat's 
> split objects
> -
>
>                 Key: HBASE-29295
>                 URL: https://issues.apache.org/jira/browse/HBASE-29295
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce, snapshots
>            Reporter: Ujjawal Kumar
>            Priority: Major
>         Attachments: Screenshot 2025-05-08 at 6.26.15 PM.png
>
>
> Similar to HBASE-24859, we have seen client-side memory consumption increase 
> sharply when an MR job reads via a snapshot.
> The cause is the same as in HBASE-24859: the scan is embedded within each 
> TableSnapshotInputFormat split object, which can explode the client's memory 
> usage for tables with a large number of regions. 
> The solution would be the same as in the other Jira: don't store the scan 
> object in the split; instead, read it from the conf while initializing the 
> record reader. 


