Thanks a lot.

2016-07-20 15:20 GMT+08:00 Zheng, Kai <kai.zh...@intel.com>:
> There was some related discussion about this in
> https://issues.apache.org/jira/browse/HDFS-7343, where file and block
> temperature will be measured across the cluster and used to determine
> when and how to set storage policies accordingly.
>
> The effort was suspended for a while because the contributors were
> working on the HDFS erasure coding feature. It will be revived soon,
> and feedback is welcome!
>
> Regards,
> Kai
>
> *From:* kevin [mailto:kiss.kevin...@gmail.com]
> *Sent:* Wednesday, July 20, 2016 2:48 PM
> *To:* Rakesh Radhakrishnan <rake...@apache.org>
> *Cc:* user.hadoop <user@hadoop.apache.org>
> *Subject:* Re: About Archival Storage
>
> Ok, thanks. It means that I need to decide which data is hot and which
> is cold, then change its storage policy and tell the 'Mover' tool to
> move it.
>
> 2016-07-20 14:29 GMT+08:00 Rakesh Radhakrishnan <rake...@apache.org>:
>
> Based on the storage policy, data is moved from hot storage to cold
> storage. The storage policy defines the number of replicas to be
> located on each storage type. It is possible to change the storage
> policy on a directory (for example, HOT to COLD) and then invoke the
> 'Mover' tool on that directory to make the policy effective. One can
> set/change the storage policy via the HDFS command "hdfs
> storagepolicies -setStoragePolicy -path <path> -policy <policy>".
> After setting the new policy, you need to run the tool; it then
> identifies the replicas to be moved based on the storage policy
> information and schedules the movement between source and destination
> datanodes to satisfy the policy. Internally, the tool compares the
> 'storage type' of a block in order to fulfill the 'storage policy'
> requirement.
>
> You can refer to
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
> to learn more about storage types, storage policies and HDFS commands.
> Hope this helps.
>
> Rakesh
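> As a minimal sketch of that workflow (the /archive path and the COLD
> policy are illustrative; only the commands themselves come from the
> documentation linked above):
>
>     # mark the directory as cold and verify the policy took effect
>     hdfs storagepolicies -setStoragePolicy -path /archive -policy COLD
>     hdfs storagepolicies -getStoragePolicy -path /archive
>
>     # migrate existing replicas to match the new policy
>     hdfs mover -p /archive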
> On Wed, Jul 20, 2016 at 10:30 AM, kevin <kiss.kevin...@gmail.com> wrote:
>
> Thanks again. By "automatically" I mean: does the HDFS mover know when
> hot data has become cold, so that I don't need to tell it exactly which
> files/dirs need to be moved? Of course I should tell it which
> files/dirs to monitor.
>
> 2016-07-20 12:35 GMT+08:00 Rakesh Radhakrishnan <rake...@apache.org>:
>
> >>> I have another question: does hdfs mover (a new data migration
> tool) know when to move data from hot to cold automatically?
>
> While running, the tool reads its arguments and gets the list of HDFS
> files/dirs to migrate. It then periodically scans these files in HDFS
> to check whether the block placement satisfies the storage policy; if
> not, it moves replicas to a different storage type in order to fulfill
> the storage policy requirement. This cycle continues until it hits an
> error or there are no more blocks to move. Could you please tell me
> what you mean by "automatically"? FYI, HDFS-10285 proposes introducing
> a daemon thread in the Namenode to track the storage movements
> requested by clients through APIs. This daemon thread, named
> StoragePolicySatisfier (SPS), serves something similar to the
> ReplicationMonitor. If interested, you can read the proposal at
> https://goo.gl/NA5EY0; feedback is welcome.
>
> The sleep time between each cycle is ('dfs.heartbeat.interval' * 2000)
> + ('dfs.namenode.replication.interval' * 1000) milliseconds. Assuming
> the stock defaults of 3 seconds for both properties, that works out to
> (3 * 2000) + (3 * 1000) = 9000 milliseconds, i.e. 9 seconds between
> scans.
>
> >>> Does it use an algorithm like LRU or LFU?
>
> It simply iterates over the files/dirs in the order they were given to
> the tool as arguments. AFAIK, it just maintains the order specified by
> the user.
>
> Regards,
> Rakesh
>
> On Wed, Jul 20, 2016 at 7:05 AM, kevin <kiss.kevin...@gmail.com> wrote:
>
> Thanks a lot Rakesh.
>
> I have another question: does hdfs mover (a new data migration tool)
> know when to move data from hot to cold automatically? Does it use an
> algorithm like LRU or LFU?
>
> 2016-07-19 19:55 GMT+08:00 Rakesh Radhakrishnan <rake...@apache.org>:
>
> >>>> Does that mean I should config dfs.replication with 1? If more
> than one, should I not use the Lazy_Persist policy?
>
> The idea of the Lazy_Persist policy is that, while writing blocks, one
> replica is placed in memory first and then lazily persisted to DISK.
> It doesn't mean that you are not allowed to configure dfs.replication
> > 1. If 'dfs.replication' is configured > 1, then the first replica
> will be placed in RAM_DISK and all the other (n-1) replicas will be
> written to DISK. Those (n-1) replicas carry the overhead of pipeline
> replication over the network and the DISK write latency on the hot
> write path, so you will not get better performance results.
>
> IIUC, for getting the memory-latency benefits it is recommended to use
> replication=1. That way, applications should be able to perform
> single-replica writes to a local DN with low latency. HDFS will store
> the block data in memory and lazily save it to disk, avoiding the disk
> write latency on the hot path. By writing to local memory we can also
> avoid checksum computation on the hot path.
>
> Regards,
> Rakesh
>
> On Tue, Jul 19, 2016 at 3:25 PM, kevin <kiss.kevin...@gmail.com> wrote:
>
> I don't quite understand: "Note that the Lazy_Persist policy is useful
> only for single replica blocks. For blocks with more than one replica,
> all the replicas will be written to DISK since writing only one of the
> replicas to RAM_DISK does not improve the overall performance."
>
> Does that mean I should config dfs.replication with 1? If more than
> one, should I not use the Lazy_Persist policy?
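> To try out the Lazy_Persist policy discussed above, something like the
> following should work; the /tmp/scratch path and the tmpfs mount point
> are illustrative, and it assumes a RAM_DISK storage tier has already
> been configured on the datanodes:
>
>     # datanode prerequisite (hdfs-site.xml): expose a tmpfs mount as
>     # RAM_DISK, e.g.
>     #   dfs.datanode.data.dir = [RAM_DISK]/mnt/dn-tmpfs,[DISK]/data/dn
>
>     # apply the policy, then write with a single replica to get the
>     # memory-latency benefit described above
>     hdfs storagepolicies -setStoragePolicy -path /tmp/scratch -policy LAZY_PERSIST
>     hdfs dfs -D dfs.replication=1 -put localfile /tmp/scratch/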