Thanks a lot.

2016-07-20 15:20 GMT+08:00 Zheng, Kai <kai.zh...@intel.com>:
> There was some related discussion about this in
> https://issues.apache.org/jira/browse/HDFS-7343, where file and block
> temperature will be measured across the cluster and used to determine
> when and how to set storage policies accordingly.
>
> The effort was suspended for a while because the contributors were
> working on the HDFS erasure coding feature. It will be revived soon,
> and feedback is welcome!
>
> Regards,
> Kai
>
> *From:* kevin [mailto:kiss.kevin...@gmail.com]
> *Sent:* Wednesday, July 20, 2016 2:48 PM
> *To:* Rakesh Radhakrishnan <rake...@apache.org>
> *Cc:* user.hadoop <user@hadoop.apache.org>
> *Subject:* Re: About Archival Storage
>
> Ok, thanks. It means that I need to decide which data is hot and which
> is cold, then change its storage policy and tell the 'Mover' tool to
> move it.
>
> 2016-07-20 14:29 GMT+08:00 Rakesh Radhakrishnan <rake...@apache.org>:
>
> Based on the storage policy, data is moved from hot storage to cold
> storage. The storage policy defines the number of replicas to be
> located on each storage type. It is possible to change the storage
> policy on a directory (for example, HOT to COLD) and then invoke the
> 'Mover' tool on that directory to make the policy effective. One can
> set/change the storage policy via the HDFS command "hdfs
> storagepolicies -setStoragePolicy -path <path> -policy <policy>".
> After setting the new policy, you need to run the tool; it then
> identifies the replicas to be moved based on the storage policy
> information and schedules the movement between source and destination
> datanodes to satisfy the policy. Internally, the tool compares the
> 'storage type' of a block in order to fulfill the 'storage policy'
> requirement.
>
> You can refer to
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
> to learn more about storage types, storage policies and HDFS commands.
> Hope this helps.
>
> Rakesh
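> As a minimal sketch of that workflow (the /archive path and the COLD
> policy are illustrative; only the commands themselves come from the
> documentation linked above):
>
>     # mark the directory as cold and verify the policy took effect
>     hdfs storagepolicies -setStoragePolicy -path /archive -policy COLD
>     hdfs storagepolicies -getStoragePolicy -path /archive
>
>     # migrate existing replicas to match the new policy
>     hdfs mover -p /archive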
> On Wed, Jul 20, 2016 at 10:30 AM, kevin <kiss.kevin...@gmail.com> wrote:
>
> Thanks again. By "automatically" I mean: does the HDFS mover know when
> hot data has become cold, so that I don't need to tell it exactly which
> files/dirs need to be moved? Of course I should tell it which
> files/dirs to monitor.
>
> 2016-07-20 12:35 GMT+08:00 Rakesh Radhakrishnan <rake...@apache.org>:
>
> >>> I have another question: does hdfs mover (a new data migration
> tool) know when to move data from hot to cold automatically?
>
> While running, the tool reads its arguments and gets the list of HDFS
> files/dirs to migrate. It then periodically scans these files in HDFS
> to check whether the block placement satisfies the storage policy; if
> not, it moves replicas to a different storage type in order to fulfill
> the storage policy requirement. This cycle continues until it hits an
> error or there are no more blocks to move. Could you please tell me
> what you mean by "automatically"? FYI, HDFS-10285 proposes introducing
> a daemon thread in the Namenode to track the storage movements
> requested by clients through APIs. This daemon thread, named
> StoragePolicySatisfier (SPS), serves something similar to the
> ReplicationMonitor. If interested, you can read the proposal at
> https://goo.gl/NA5EY0; feedback is welcome.
>
> The sleep time between each cycle is ('dfs.heartbeat.interval' * 2000)
> + ('dfs.namenode.replication.interval' * 1000) milliseconds. Assuming
> the stock defaults of 3 seconds for both properties, that works out to
> (3 * 2000) + (3 * 1000) = 9000 milliseconds, i.e. 9 seconds between
> scans.
>
> >>> Does it use an algorithm like LRU or LFU?
>
> It simply iterates over the files/dirs in the order they were given to
> the tool as arguments. AFAIK, it just maintains the order specified by
> the user.
>
> Regards,
> Rakesh
>
> On Wed, Jul 20, 2016 at 7:05 AM, kevin <kiss.kevin...@gmail.com> wrote:
>
> Thanks a lot Rakesh.
>
> I have another question: does hdfs mover (a new data migration tool)
> know when to move data from hot to cold automatically? Does it use an
> algorithm like LRU or LFU?
>
> 2016-07-19 19:55 GMT+08:00 Rakesh Radhakrishnan <rake...@apache.org>:
>
> >>>> Does that mean I should config dfs.replication with 1? If more
> than one, should I not use the Lazy_Persist policy?
>
> The idea of the Lazy_Persist policy is that, while writing blocks, one
> replica is placed in memory first and then lazily persisted to DISK.
> It doesn't mean that you are not allowed to configure dfs.replication
> > 1. If 'dfs.replication' is configured > 1, then the first replica
> will be placed in RAM_DISK and all the other (n-1) replicas will be
> written to DISK. Those (n-1) replicas carry the overhead of pipeline
> replication over the network and the DISK write latency on the hot
> write path, so you will not get better performance results.
>
> IIUC, for getting the memory-latency benefits it is recommended to use
> replication=1. That way, applications should be able to perform
> single-replica writes to a local DN with low latency. HDFS will store
> the block data in memory and lazily save it to disk, avoiding the disk
> write latency on the hot path. By writing to local memory we can also
> avoid checksum computation on the hot path.
>
> Regards,
> Rakesh
>
> On Tue, Jul 19, 2016 at 3:25 PM, kevin <kiss.kevin...@gmail.com> wrote:
>
> I don't quite understand: "Note that the Lazy_Persist policy is useful
> only for single replica blocks. For blocks with more than one replica,
> all the replicas will be written to DISK since writing only one of the
> replicas to RAM_DISK does not improve the overall performance."
>
> Does that mean I should config dfs.replication with 1? If more than
> one, should I not use the Lazy_Persist policy?
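> To try out the Lazy_Persist policy discussed above, something like the
> following should work; the /tmp/scratch path and the tmpfs mount point
> are illustrative, and it assumes a RAM_DISK storage tier has already
> been configured on the datanodes:
>
>     # datanode prerequisite (hdfs-site.xml): expose a tmpfs mount as
>     # RAM_DISK, e.g.
>     #   dfs.datanode.data.dir = [RAM_DISK]/mnt/dn-tmpfs,[DISK]/data/dn
>
>     # apply the policy, then write with a single replica to get the
>     # memory-latency benefit described above
>     hdfs storagepolicies -setStoragePolicy -path /tmp/scratch -policy LAZY_PERSIST
>     hdfs dfs -D dfs.replication=1 -put localfile /tmp/scratch/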