Hi Josh,

Thanks for your response.
Could you shed some light on how to tweak the concurrency on the YARN side? Is it accomplished through settings in yarn-site.xml or mapred-site.xml? I was also not able to identify any property that can be passed to the HBase Export tool to increase the number of mappers per region. Any suggestions on this would be very helpful.

Here is a portion of the logs from the HBase Export tool; perhaps you can see something in it that needs to be fixed. (After the logs I have also included a sketch of the invocation I am planning to try next.)

----------------------------
2021-10-04 04:22:03,822 INFO [main] mapreduce.RegionSizeCalculator: Calculating region sizes for table "tsdb".
2021-10-04 04:22:04,410 INFO [main] zookeeper.ReadOnlyZKClient: Close zookeeper connection 0x0b25b095 to altiplano-zookeeper:2181
2021-10-04 04:22:04,413 INFO [ReadOnlyZKClient-altiplano-zookeeper:2181@0x0b25b095] zookeeper.ZooKeeper: Session: 0x100000683990011 closed
2021-10-04 04:22:04,413 INFO [ReadOnlyZKClient-altiplano-zookeeper:2181@0x0b25b095-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x100000683990011
2021-10-04 04:22:04,463 INFO [main] mapreduce.JobSubmitter: number of splits:2
2021-10-04 04:22:04,471 INFO [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2021-10-04 04:22:04,620 INFO [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1633098453918_0001
2021-10-04 04:22:05,399 INFO [main] impl.YarnClientImpl: Submitted application application_1633098453918_0001
2021-10-04 04:22:05,427 INFO [main] mapreduce.Job: The url to track the job: http://k8s-infra-altiplano-hdfs-yarn-rm:8088/proxy/application_1633098453918_0001/
2021-10-04 04:22:05,427 INFO [main] mapreduce.Job: Running job: job_1633098453918_0001
2021-10-04 04:22:12,535 INFO [main] mapreduce.Job: Job job_1633098453918_0001 running in uber mode : false
2021-10-04 04:22:12,536 INFO [main] mapreduce.Job:  map 0% reduce 0%
2021-10-04 04:35:42,713 INFO [main] mapreduce.Job:  map 50% reduce 0%
2021-10-04 04:46:26,053 INFO [main] mapreduce.Job:  map 100% reduce 0%
2021-10-04 04:46:27,103 INFO [main] mapreduce.Job: Job job_1633098453918_0001 completed successfully
2021-10-04 04:46:27,216 INFO [main] mapreduce.Job:
        slots (ms)=0
        Total time spent by all map tasks (ms)=2900104
        Total vcore-milliseconds taken by all map tasks=2900104
        Total megabyte-milliseconds taken by all map tasks=2969706496
    Map-Reduce Framework
        Map input records=149787553
        Map output records=149787553
        Input split bytes=454
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=10706
        CPU time spent (ms)=999190
        Physical memory (bytes) snapshot=538783744
        Virtual memory (bytes) snapshot=4150628352
        Total committed heap usage (bytes)=134217728
    HBase Counters
        BYTES_IN_REMOTE_RESULTS=13097220466
        BYTES_IN_RESULTS=13097220466
        MILLIS_BETWEEN_NEXTS=2250916
        NOT_SERVING_REGION_EXCEPTION=0
        NUM_SCANNER_RESTARTS=0
        NUM_SCAN_RESULTS_STALE=0
        REGIONS_SCANNED=2
        REMOTE_RPC_CALLS=1497876
        REMOTE_RPC_RETRIES=0
        ROWS_FILTERED=0
        ROWS_SCANNED=149787553
        RPC_CALLS=1497876
        RPC_RETRIES=0
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=17139600166
------------------------------
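In case it is useful, this is roughly the invocation I am planning to try next, passing the performance-related properties listed in the Export tool's usage text via -D. It is only a sketch: the output path and the scanner-caching value are placeholders (not what we actually use), and the speculative-execution property names may differ between Hadoop versions.

    # Sketch only: hdfs:///backup/tsdb and the caching value are placeholders.
    # Larger scanner caching means fewer RPC round trips per mapper;
    # disabling speculative execution avoids redundant scans of the same region.
    hbase org.apache.hadoop.hbase.mapreduce.Export \
        -D hbase.client.scanner.caching=500 \
        -D mapreduce.map.speculative=false \
        -D mapreduce.reduce.speculative=false \
        -D mapreduce.output.fileoutputformat.compress=true \
        -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
        tsdb hdfs:///backup/tsdb

My understanding is that these help scan and output efficiency but do not change the number of mappers, which is why I am asking about the YARN/MapReduce side.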
Thanks

Jacob Mathews

On 2021/08/19 18:34:31, Josh Elser <[email protected]> wrote:
> Export is a MapReduce job, and HBase will only configure a maximum of
> one Mapper per Region in the table being scanned.
>
> If you have multiple regions for your tsdb table, then it's possible
> that you need to tweak the concurrency on the YARN side such that you
> have multiple Mappers running in parallel?
>
> Sounds like looking at the YARN Application log and UI is your next best
> bet.
>
> On 8/18/21 4:52 AM, Nguyen, Tai Van (EXT - VN) wrote:
> > Hi HBase Team
> >
> > Images can be seen here:
> >
> >   * Export with single regionserver: https://imgur.com/86wSUMV
> >   * Export with two regionservers: https://imgur.com/a/XMovlZx
> >
> > The log shows the times taken:
> >
> > root@solaltiplano-track4-master:~/hbase-exporting/latest# cat hbase_export_compress_default.log | grep export
> > Starting hbase export at Fri Jun 11 12:22:46 UTC 2021
> > tsdb table exported in 6279 seconds
> > tsdb-meta table exported in 6 seconds
> > tsdb-tree table exported in 7 seconds
> > tsdb-uid table exported in 90 seconds
> > Ending hbase export at Fri Jun 11 14:09:08 UTC 2021
> >
> > Thanks,
> > Tai
> >
> > ------------------------------------------------------------------------
> > *From:* Mathews, Jacob 1. (Nokia - IN/Bangalore) <[email protected]>
> > *Sent:* Monday, August 16, 2021 6:47 PM
> > *To:* Nguyen, Tai Van (EXT - VN) <[email protected]>
> > *Subject:* FW: Hbase export is very slow - help needed
> >
> > *From:* Mathews, Jacob 1. (Nokia - IN/Bangalore)
> > *Sent:* Friday, August 6, 2021 12:38 PM
> > *To:* [email protected]
> > *Subject:* Hbase export is very slow - help needed
> >
> > Hi HBase team,
> >
> > We are trying to use the HBase Export tool described here:
> > http://hbase.apache.org/book.html#export
> >
> > But it is happening sequentially, row by row, as seen from the logs.
> >
> > We tried many options of the HBase Export tool, but all were taking a long time.
> >
> > Backup folder contents size:
> >
> > bash-4.2$ du -kh
> > 16K     ./tsdb-tree
> > 16K     ./tsdb-meta
> > 60M     ./tsdb-uid
> > 5.9G    ./tsdb
> > 6.0G    .
> >
> > It took around 104 minutes for 6 GB of compressed data.
> >
> > Is there a way we can parallelise this and improve the export time?
> >
> > Below are the charts from HBase.
> >
> > Export with single regionserver:
> >
> > Export with two regionservers:
> >
> > Scaling the HBase RegionServer also did not help; the export still
> > happens sequentially.
> >
> > Thanks
> >
> > Jacob Mathews
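P.S. The job output above shows "number of splits:2" and REGIONS_SCANNED=2, so I am guessing the tsdb table currently has only two regions, which, given that Export uses at most one mapper per region, would cap the job at two mappers no matter how YARN is tuned. If that is right, would pre-splitting the table into more regions be a reasonable way to get more parallelism? Something along these lines is what I have in mind (only a sketch, not tested against our cluster; for a tsdb table the split points probably need more thought than a blind midpoint split):

    # Sketch only: check the current region count, then ask HBase to split
    # each region of 'tsdb' at its midpoint so more mappers can be created.
    hbase shell <<'EOF'
    list_regions 'tsdb'
    split 'tsdb'
    EOF

Once the new regions are online (and balanced across the region servers), I would expect a re-run of Export to report more than 2 input splits.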
