[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844930#comment-13844930 ]
Vinod Kumar Vavilapalli commented on YARN-1492:
-----------------------------------------------

Technical issue. This should be a YARN JIRA. As YARN handles distributed cache, it makes sense to have this discussion here. I don't follow the common lists much and I almost missed this (it's possible others too missed it because of that).

If/when we create a branch, let's create it with a YARN JIRA number.

I just moved the JIRA to YARN. Let me know if you disagree.

> truly shared cache for jars (jobjar/libjar)
> -------------------------------------------
>
>                 Key: YARN-1492
>                 URL: https://issues.apache.org/jira/browse/YARN-1492
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 2.0.4-alpha
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: shared_cache_design.pdf, shared_cache_design_v2.pdf,
>                      shared_cache_design_v3.pdf, shared_cache_design_v4.pdf
>
>
> Currently there is the distributed cache that enables you to cache jars and
> files so that attempts from the same job can reuse them. However, sharing is
> limited with the distributed cache because it is normally on a per-job basis.
> On a large cluster, sometimes copying of jobjars and libjars becomes so
> prevalent that it consumes a large portion of the network bandwidth, not to
> speak of defeating the purpose of "bringing compute to where data is". This
> is wasteful because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss feasibility of introducing a truly shared
> cache so that multiple jobs from multiple users can share and cache jars.
> This JIRA is to open the discussion.