[ 
https://issues.apache.org/jira/browse/SYSTEMML-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2324:
--------------------------------
    Description: We also need to implement synchronization between the workers 
and the parameter server in order to support more parameter update strategies; 
e.g., the stale-synchronous strategy needs a hyperparameter "staleness" to 
define the waiting interval. The idea is to maintain, on the server, a vector 
clock that records every worker's clock. Each time a worker finishes an 
iteration, it sends a signal to the server to increment its clock and then 
sends a second request asking whether it should wait. When the server 
receives this request, it determines whether the worker may continue 
according to the configured strategy. We can thus define BSP as 
"staleness==0" and SSP as "staleness==N". For ASP, there is no need to 
calculate the gap between the fastest and the slowest worker.  (was: 
We also need to implement synchronization between the workers and the 
parameter server in order to support more parameter update strategies; e.g., 
the stale-synchronous strategy needs a hyperparameter "staleness" to define 
the waiting interval. The idea is to maintain, on the server, a vector clock 
that records every worker's clock. Each time a worker finishes an iteration, 
it waits for a signal from the server, i.e., it sends a request to calculate 
the staleness according to the vector clock. When the server receives the 
gradients from a worker, it increments the vector clock entry for that 
worker. We could thus define BSP as "staleness==0", ASP as "staleness==-1" 
and SSP as "staleness==N".)

> Synchronization
> ---------------
>
>                 Key: SYSTEMML-2324
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2324
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: LI Guobao
>            Assignee: LI Guobao
>            Priority: Major
>
> We also need to implement synchronization between the workers and the 
> parameter server in order to support more parameter update strategies; 
> e.g., the stale-synchronous strategy needs a hyperparameter "staleness" to 
> define the waiting interval. The idea is to maintain, on the server, a 
> vector clock that records every worker's clock. Each time a worker finishes 
> an iteration, it sends a signal to the server to increment its clock and 
> then sends a second request asking whether it should wait. When the server 
> receives this request, it determines whether the worker may continue 
> according to the configured strategy. We can thus define BSP as 
> "staleness==0" and SSP as "staleness==N". For ASP, there is no need to 
> calculate the gap between the fastest and the slowest worker.
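
The clock scheme described above could be sketched roughly as follows. This is 
only an illustration and not SystemML code: the class name ClockServer, the 
methods tick and should_wait, and the use of staleness=None to select ASP are 
all assumptions made for the example. The waiting rule (a worker waits while 
it is more than "staleness" iterations ahead of the slowest worker) is one 
plausible reading of the description.

```python
from typing import Dict, Iterable, Optional


class ClockServer:
    """Hypothetical server-side vector clock (illustrative names only)."""

    def __init__(self, worker_ids: Iterable[str], staleness: Optional[int]):
        # Assumed encoding: staleness=0 -> BSP, staleness=N -> SSP,
        # staleness=None -> ASP (no gap is ever computed).
        self.clocks: Dict[str, int] = {w: 0 for w in worker_ids}
        self.staleness = staleness

    def tick(self, worker_id: str) -> int:
        """First request: the worker finished an iteration."""
        self.clocks[worker_id] += 1
        return self.clocks[worker_id]

    def should_wait(self, worker_id: str) -> bool:
        """Second request: must the worker wait before continuing?"""
        if self.staleness is None:
            return False  # ASP: workers never block
        # Gap between this worker and the slowest worker, in iterations.
        gap = self.clocks[worker_id] - min(self.clocks.values())
        return gap > self.staleness


if __name__ == "__main__":
    server = ClockServer(["w0", "w1"], staleness=0)  # BSP
    server.tick("w0")
    print(server.should_wait("w0"))  # w0 is ahead of w1
    server.tick("w1")
    print(server.should_wait("w0"))  # both workers are at the same clock
```

In this sketch a blocked worker would have to ask again later; a real 
implementation would more likely have the server hold the request until the 
slowest worker catches up.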



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
