Thanks, appreciated!

-S

From: Christopher <[email protected]>
Sent: Friday, September 10, 2021 9:15 AM
To: accumulo-user <[email protected]>
Subject: [External] Re: [EXTERNAL EMAIL] - Re: accumulo and hdfs data locality

One correction to what Mike said. The last location column doesn't store where 
it was last hosted. The current location column does that. Rather, the last 
location column stores where it was hosted when it last wrote to HDFS. The goal 
is what Mike said: it tries to provide a mechanism for preserving locality 
during reboots. However, it may not work very well, especially since the last 
written file may be very small, and only a tiny fraction of the tablet's 
overall data.

On Fri, Sep 10, 2021, 09:03 Michael Wall 
<[email protected]> wrote:
If a tablet moves, the data files in HDFS do not go with it.  However, during 
the next compaction one copy of the rfile should be written locally.

Note, the metadata has a last column for each tablet, to record where the tablet 
was last hosted.  On startup, Accumulo will try to assign a tablet to that last 
location if possible, to hopefully take advantage of the locality.

To really take advantage of the data locality, you should configure Short 
Circuit reads in HDFS. See 
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html
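
For reference, a minimal hdfs-site.xml sketch of the two settings that page 
describes. The socket path below is only an example value (an assumption, not 
something from this thread); it has to match a path your DataNode can create, 
and the same settings need to be visible to both the DataNode and the client 
(here, the tserver). Short-circuit reads also rely on the native libhadoop 
library being available.

```xml
<configuration>
  <!-- Let a client read local block files directly instead of going through the DataNode -->
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <!-- UNIX domain socket shared by the DataNode and local clients; example path only -->
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
</configuration>
```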

On Fri, Sep 10, 2021 at 8:47 AM Shailesh Ligade 
<[email protected]> wrote:
Thank you,

Is there a way to maintain that data locality? I mean, over time, with table 
splitting, HDFS rebalancing, etc., we may not have data locality…

Thanks again

-S

From: Christopher <[email protected]>
Sent: Friday, September 10, 2021 8:40 AM
To: accumulo-user <[email protected]>
Subject: [EXTERNAL EMAIL] - Re: accumulo and hdfs data locality

Data locality and simplified deployments are the only reasons I can think of. 
Accumulo doesn't do anything particularly special for data locality, but 
typically, an HDFS client (like Accumulo servers) will (or can be configured 
to) write one copy of any new blocks locally, which should permit efficient 
reads later. This works well with Accumulo's hosting behavior, where each 
tablet is hosted on a single server solely responsible for its reads and writes.

On Fri, Sep 10, 2021, 07:22 Shailesh Ligade 
<[email protected]> wrote:
Hello, I am using Hadoop 3.3 and Accumulo 1.10. Does Accumulo take advantage of 
Hadoop data locality? What are the other benefits of having the tserver and 
datanode processes on the same instance?

-S

