Re: CI Update

2019-12-06 Thread Pedro Larroy
Hi all. CI is back to normal after Jake's commit: https://github.com/apache/incubator-mxnet/pull/16968. Please merge from master. It would be great if someone could look into the TVM build issues described above. On Tue, Dec 3, 2019 at 11:11 AM Pedro Larroy wrote: > Some PRs were experiencing

Re: CI Update

2019-12-03 Thread Pedro Larroy
Some PRs have been experiencing build timeouts. I have diagnosed the cause as saturation of the EFS volume holding the compilation cache. Once CI is back online, this problem is very likely to be solved and you should not see any more build timeouts. On Tue, Dec 3, 2019 at 10:18 AM

Re: CI Update

2019-12-03 Thread Pedro Larroy
Also, please note that the stage that builds TVM compiles serially and takes a long time, which hurts CI turnaround time: https://github.com/apache/incubator-mxnet/issues/16962 Pedro On Tue, Dec 3, 2019 at 9:49 AM Pedro Larroy wrote: > Hi MXNet community. We

Re: CI Update

2019-12-03 Thread Pedro Larroy
Hi MXNet community. We are in the process of updating the base AMIs for CI with an updated CUDA driver to fix the CI blockage. We need help from the community to diagnose some of the build errors that don't seem related to the infrastructure. I have observed this build failure with TVM

CI Update

2019-12-02 Thread Pedro Larroy
A small update about CI, which is currently blocked. There seems to be an NVIDIA driver compatibility problem between the base AMI running on the GPU instances and the NVIDIA Docker images that we use for building and testing. We are working on a fix by updating the base images, as it doesn't seem to be