[ 
https://issues.apache.org/jira/browse/GOBBLIN-109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Tiwari updated GOBBLIN-109:
------------------------------------
    Fix Version/s: 0.12

> Remove need for current.jst
> ---------------------------
>
>                 Key: GOBBLIN-109
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-109
>             Project: Apache Gobblin
>          Issue Type: Task
>            Reporter: Joel Baranick
>              Labels: Framework:StateManagement, enhancement
>             Fix For: 0.12
>
>
> Fix for #882
>  
> *Github Url* : https://github.com/linkedin/gobblin/pull/965 
> *Github Reporter* : [~jbaranick] 
> *Github Assignee* : [~jbaranick] 
> *Github Created At* : 2016-05-05T22:04:54Z 
> *Github Updated At* : 2017-04-22T18:44:42Z 
> h3. Comments 
> ----
> [~jbaranick] wrote on 2016-05-05T22:06:22Z : @sahilTakiar @zliu41: Can you 
> review this?
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-217293611 
> ----
> *coveralls* wrote on 2016-05-05T22:21:08Z : [![Coverage 
> Status](https://coveralls.io/builds/6068577/badge)](https://coveralls.io/builds/6068577)
> Coverage increased (+0.5%) to 45.026% when pulling 
> **193dddb831475f931999a9aca54c5c00e2d082d3 on kadaan:FixFor882** into 
> **41963701538ae90ed8042c8d34a2ed7211a9af42 on linkedin:master**.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-217296762 
> ----
> *zliu41* wrote on 2016-05-06T16:07:44Z : @kadaan could you please give a 
> brief description of your approach? It seems you are still using 
> `current.jst`, which is a different approach than #882.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-217485933 
> ----
> [~jbaranick] wrote on 2016-05-06T16:35:48Z : `current.jst` is not used.  
> There is a compromise here so that users of the API aren't broken.  New 
> callers can call `getCurrent` or `getAllCurrent` to get the latest state.  If 
> they want a specific state they can continue to call `get` or `getAll`.  If 
> `current` or `current.jst` is requested when calling `get` or `getAll` it 
> will  return the latest state just like `getCurrent` and `getAllCurrent`.  A 
> precondition will ensure that users of the API are not able to write a file 
> named `current` or `current.jst`.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-217492590 
> ----
> *coveralls* wrote on 2016-05-06T16:57:15Z : [![Coverage 
> Status](https://coveralls.io/builds/6078675/badge)](https://coveralls.io/builds/6078675)
> Coverage increased (+0.1%) to 45.096% when pulling 
> **e5f0498095f1f6bcbf25e3ed0316ffd772275d73 on kadaan:FixFor882** into 
> **588d8c77fe3c84c752fd410f916868419c178465 on linkedin:master**.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-217497736 
> ----
> *coveralls* wrote on 2016-05-12T07:47:06Z : [![Coverage 
> Status](https://coveralls.io/builds/6149251/badge)](https://coveralls.io/builds/6149251)
> Coverage increased (+0.1%) to 46.765% when pulling 
> **67e66222cd441d903b4197d380df8041cce2cc9d on kadaan:FixFor882** into 
> **5cd9d969f73456e46847c9d9e7ef33ad5376617c on linkedin:master**.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-218684160 
> ----
> [~jbaranick] wrote on 2016-05-12T20:52:47Z : @zliu41 @sahilTakiar Can you 
> guys finish this review?
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-218882376 
> ----
> [~jbaranick] wrote on 2016-06-01T15:05:13Z : @zliu41 @sahilTakiar Can you 
> guys finish this review?
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-223022088 
> ----
> *coveralls* wrote on 2016-06-01T15:33:05Z : [![Coverage 
> Status](https://coveralls.io/builds/6419146/badge)](https://coveralls.io/builds/6419146)
> Coverage increased (+0.07%) to 46.308% when pulling 
> **668262a91516d9919a1cd30c141b058514890c8e on kadaan:FixFor882** into 
> **fe7dc7c35eebc3a4faee9987ecccaae3511118c5 on linkedin:master**.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-223031389 
> ----
> *zliu41* wrote on 2016-06-01T19:09:53Z : @pcadabam @ibuenros @ydai1124 can 
> you review this PR? Thanks
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-223094308 
> ----
> *coveralls* wrote on 2016-06-01T20:12:06Z : [![Coverage 
> Status](https://coveralls.io/builds/6423539/badge)](https://coveralls.io/builds/6423539)
> Coverage increased (+0.2%) to 46.398% when pulling 
> **668262a91516d9919a1cd30c141b058514890c8e on kadaan:FixFor882** into 
> **fe7dc7c35eebc3a4faee9987ecccaae3511118c5 on linkedin:master**.
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-223110224 
> ----
> [~jbaranick] wrote on 2016-06-21T20:49:48Z : @pcadabam @ibuenros @ydai1124 
> Were any of you able to review this?
>  
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-227567417 
> ----
> [~jbaranick] wrote on 2017-01-17T16:37:42Z : @abti @pcadabam @ibuenros 
> @ydai1124 Can any of you check this out now? 
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-273222674 
> ----
> [~jbaranick] wrote on 2017-01-24T17:45:02Z : @ydai1124 Can you please review 
> again? 
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-274879786 
> ----
> [~jbaranick] wrote on 2017-02-06T20:23:17Z : @htran1 In response to your 
> general comment:
> > General comment is that StateStore should not know about 'current'...
> > The StateStore should exposed APIs used at the DatasetStateStore level to 
> > access the last modified state.
> The StateStore does expose an API to access current: `getCurrent` and 
> `getAllCurrent`.  Additionally, the `createAlias` API has been removed.  In 
> order to support a seamless upgrade, StateStore, also supports the prior 
> `get*` calls being called with a filename like `current`.  In this case it 
> will perform the same as the `getCurrent`/`getAllCurrent` calls.  The logic 
> behind this is that I cannot be sure that there are no dependencies from 
> third-party libraries on the existing `get*` calls.  By making the change in 
> this way, we can get rid of the `current` aliased files, while still ensuring 
> existing code will work. We can publish in release note that callers need to 
> upgrade to the latest API.
> > ... DatasetStateStore should not query directly.
> If desired, the logic of 
> `MysqlDatasetStateStore.getLatestDatasetStatesByUrns` can be pushed down into 
> `MysqlStateStore`, if you feel strongly about that. 
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-277801997 
> ----
> [~jbaranick] wrote on 2017-02-06T22:43:21Z : @htran1 Can you review this 
> again? 
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-277838696 
> ----
> [~jbaranick] wrote on 2017-02-07T00:38:05Z : @htran1 I don't think that test 
> failure is due to my PR.  It seems to be a problem with line 197 in 
> `GobblinClusterKillTest`: ` _clusterWorkers[0].disconnectHelixManager();`. 
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-277861767 
> ----
> [~jbaranick] wrote on 2017-02-09T17:01:39Z : @htran1 Can you review this 
> again? 
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-278704409 
> ----
> [~jbaranick] wrote on 2017-02-14T06:44:03Z : @htran1 Can you please review 
> this again? 
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-279621932 
> ----
> [~jbaranick] wrote on 2017-02-16T17:32:22Z : @htran1 Any more feedback? 
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-280400680 
> ----
> [~jbaranick] wrote on 2017-02-16T23:00:49Z : @ibuenros Any feedback? 
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-280491562 
> ----
> [~jbaranick] wrote on 2017-02-26T06:33:06Z : @htran1 I understand what you 
> are saying, but I believe that the current.jst is the cause for many 
> problems. It was discussed a year ago and recommended to be removed. One big 
> issue is that depending on the backing FS, the update of the current file 
> contents is eventual.  This can lead to incorrect state being returned.
> Regarding `#1` above, do you have any numbers that show what the performance 
> is for listing the state files?  Also, what state store implementation are 
> you using.  If it is the FS version, what backing FS is it?
> For `#2` I would recommend that we have a different mechanism for moving to a 
> prior state.  This is a good feature, but rewriting the current file seems to 
> be the wrong was to go about this.
> As for `#3`, it seems that the need to write a current pointer or not is 
> totally dependent on the filesystem which is used to store the state.  Why 
> not get rid of the concept of current outside of the state store and allow 
> implementations to use it or not.  At least that way, I can make an 
> implementation of FsStateStore that doesn't use current and the builtin 
> version can. 
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-282536560 
> ----
> [~hutran] wrote on 2017-02-26T08:58:35Z : For 1, we are using FS state store 
> on HDFS. I'm less concerned about this case than case 3 on NFS. As long as we 
> retain only a small number of jst files per dataset it will probably be okay 
> since it will probably add at most a few seconds to the startup of a job.
> For 2, I'll have to check with the team since this is the current method of 
> temporarily moving state back.
> 3 is a concern for us since the NFS system we are on is less scalable than 
> HDFS. I'm seeing delete of small files giving only 50 ops/s. Listing should 
> be faster, but with many concurrent jobs starting it may saturate our NFS 
> filer. We only have a single master node in this mode that does job planning, 
> so saving CPU cycles is more essential.
> The base state store interface doesn't know about current.jst (at least 
> before your changes). We have other uses of StateStore that stores non .jst 
> files.  The current.jst is a DatasetStateStore concept. So maybe it makes 
> more sense to add the functionality you need to a new variation of 
> FsDatasetStateStore that doesn't make use of the current file. I think you 
> can even add it as a config option to the current FsDatasetStateStore since 
> it should only be a few blocks that are different. In either case, the mysql 
> and zk dataset state stores won't need to be touched. This allows us to 
> continue using current.jst while users of S3 can use the new approach. 
>  
> *Github Url* : 
> https://github.com/linkedin/gobblin/pull/965#issuecomment-282542372



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to