Hi Josh,

Thanks for your response.
Could you shed some light on how to tweak the concurrency on the YARN side? Is it accomplished through settings in yarn-site.xml or mapred-site.xml? I was also not able to identify any property that can be passed to the HBase Export tool to increase the number of mappers per region. Any suggestions on this would be very helpful.

Here is a portion of the logs from the HBase Export tool; perhaps you can see something in it that needs to be fixed. (After the logs I have also included a sketch of the invocation I am planning to try next.)

----------------------------
2021-10-04 04:22:03,822 INFO [main] mapreduce.RegionSizeCalculator: Calculating region sizes for table "tsdb".
2021-10-04 04:22:04,410 INFO [main] zookeeper.ReadOnlyZKClient: Close zookeeper connection 0x0b25b095 to altiplano-zookeeper:2181
2021-10-04 04:22:04,413 INFO [ReadOnlyZKClient-altiplano-zookeeper:2181@0x0b25b095] zookeeper.ZooKeeper: Session: 0x100000683990011 closed
2021-10-04 04:22:04,413 INFO [ReadOnlyZKClient-altiplano-zookeeper:2181@0x0b25b095-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x100000683990011
2021-10-04 04:22:04,463 INFO [main] mapreduce.JobSubmitter: number of splits:2
2021-10-04 04:22:04,471 INFO [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2021-10-04 04:22:04,620 INFO [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1633098453918_0001
2021-10-04 04:22:05,399 INFO [main] impl.YarnClientImpl: Submitted application application_1633098453918_0001
2021-10-04 04:22:05,427 INFO [main] mapreduce.Job: The url to track the job: http://k8s-infra-altiplano-hdfs-yarn-rm:8088/proxy/application_1633098453918_0001/
2021-10-04 04:22:05,427 INFO [main] mapreduce.Job: Running job: job_1633098453918_0001
2021-10-04 04:22:12,535 INFO [main] mapreduce.Job: Job job_1633098453918_0001 running in uber mode : false
2021-10-04 04:22:12,536 INFO [main] mapreduce.Job:  map 0% reduce 0%
2021-10-04 04:35:42,713 INFO [main] mapreduce.Job:  map 50% reduce 0%
2021-10-04 04:46:26,053 INFO [main] mapreduce.Job:  map 100% reduce 0%
2021-10-04 04:46:27,103 INFO [main] mapreduce.Job: Job job_1633098453918_0001 completed successfully
2021-10-04 04:46:27,216 INFO [main] mapreduce.Job:
        slots (ms)=0
        Total time spent by all map tasks (ms)=2900104
        Total vcore-milliseconds taken by all map tasks=2900104
        Total megabyte-milliseconds taken by all map tasks=2969706496
    Map-Reduce Framework
        Map input records=149787553
        Map output records=149787553
        Input split bytes=454
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=10706
        CPU time spent (ms)=999190
        Physical memory (bytes) snapshot=538783744
        Virtual memory (bytes) snapshot=4150628352
        Total committed heap usage (bytes)=134217728
    HBase Counters
        BYTES_IN_REMOTE_RESULTS=13097220466
        BYTES_IN_RESULTS=13097220466
        MILLIS_BETWEEN_NEXTS=2250916
        NOT_SERVING_REGION_EXCEPTION=0
        NUM_SCANNER_RESTARTS=0
        NUM_SCAN_RESULTS_STALE=0
        REGIONS_SCANNED=2
        REMOTE_RPC_CALLS=1497876
        REMOTE_RPC_RETRIES=0
        ROWS_FILTERED=0
        ROWS_SCANNED=149787553
        RPC_CALLS=1497876
        RPC_RETRIES=0
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=17139600166
------------------------------
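In case it is useful, this is roughly the invocation I am planning to try next, passing the performance-related properties listed in the Export tool's usage text via -D. It is only a sketch: the output path and the scanner-caching value are placeholders (not what we actually use), and the speculative-execution property names may differ between Hadoop versions.

    # Sketch only: hdfs:///backup/tsdb and the caching value are placeholders.
    # Larger scanner caching means fewer RPC round trips per mapper;
    # disabling speculative execution avoids redundant scans of the same region.
    hbase org.apache.hadoop.hbase.mapreduce.Export \
        -D hbase.client.scanner.caching=500 \
        -D mapreduce.map.speculative=false \
        -D mapreduce.reduce.speculative=false \
        -D mapreduce.output.fileoutputformat.compress=true \
        -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
        tsdb hdfs:///backup/tsdb

My understanding is that these help scan and output efficiency but do not change the number of mappers, which is why I am asking about the YARN/MapReduce side.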
Thanks

Jacob Mathews

On 2021/08/19 18:34:31, Josh Elser <[email protected]> wrote:
> Export is a MapReduce job, and HBase will only configure a maximum of
> one Mapper per Region in the table being scanned.
>
> If you have multiple regions for your tsdb table, then it's possible
> that you need to tweak the concurrency on the YARN side such that you
> have multiple Mappers running in parallel?
>
> Sounds like looking at the YARN Application log and UI is your next best
> bet.
>
> On 8/18/21 4:52 AM, Nguyen, Tai Van (EXT - VN) wrote:
> > Hi HBase Team
> >
> > Images can be seen here:
> >
> >   * Export with single regionserver: https://imgur.com/86wSUMV
> >   * Export with two regionservers: https://imgur.com/a/XMovlZx
> >
> > The log shows the times taken:
> >
> > root@solaltiplano-track4-master:~/hbase-exporting/latest# cat hbase_export_compress_default.log | grep export
> > Starting hbase export at Fri Jun 11 12:22:46 UTC 2021
> > tsdb table exported in 6279 seconds
> > tsdb-meta table exported in 6 seconds
> > tsdb-tree table exported in 7 seconds
> > tsdb-uid table exported in 90 seconds
> > Ending hbase export at Fri Jun 11 14:09:08 UTC 2021
> >
> > Thanks,
> > Tai
> >
> > ------------------------------------------------------------------------
> > *From:* Mathews, Jacob 1. (Nokia - IN/Bangalore) <[email protected]>
> > *Sent:* Monday, August 16, 2021 6:47 PM
> > *To:* Nguyen, Tai Van (EXT - VN) <[email protected]>
> > *Subject:* FW: Hbase export is very slow - help needed
> >
> > *From:* Mathews, Jacob 1. (Nokia - IN/Bangalore)
> > *Sent:* Friday, August 6, 2021 12:38 PM
> > *To:* [email protected]
> > *Subject:* Hbase export is very slow - help needed
> >
> > Hi HBase team,
> >
> > We are trying to use the HBase Export tool described here:
> > http://hbase.apache.org/book.html#export
> >
> > But it is happening sequentially, row by row, as seen from the logs.
> >
> > We tried many options of the HBase Export tool, but all were taking a long time.
> >
> > Backup folder contents size:
> >
> > bash-4.2$ du -kh
> > 16K     ./tsdb-tree
> > 16K     ./tsdb-meta
> > 60M     ./tsdb-uid
> > 5.9G    ./tsdb
> > 6.0G    .
> >
> > It took around 104 minutes for 6 GB of compressed data.
> >
> > Is there a way we can parallelise this and improve the export time?
> >
> > Below are the charts from HBase.
> >
> > Export with single regionserver:
> >
> > Export with two regionservers:
> >
> > Scaling the HBase RegionServer also did not help; the export still
> > happens sequentially.
> >
> > Thanks
> >
> > Jacob Mathews
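P.S. The job output above shows "number of splits:2" and REGIONS_SCANNED=2, so I am guessing the tsdb table currently has only two regions, which, given that Export uses at most one mapper per region, would cap the job at two mappers no matter how YARN is tuned. If that is right, would pre-splitting the table into more regions be a reasonable way to get more parallelism? Something along these lines is what I have in mind (only a sketch, not tested against our cluster; for a tsdb table the split points probably need more thought than a blind midpoint split):

    # Sketch only: check the current region count, then ask HBase to split
    # each region of 'tsdb' at its midpoint so more mappers can be created.
    hbase shell <<'EOF'
    list_regions 'tsdb'
    split 'tsdb'
    EOF

Once the new regions are online (and balanced across the region servers), I would expect a re-run of Export to report more than 2 input splits.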
