Something got checked in and CI is nuked. I tried reverting my last commit,
and that didn’t fix it, so apparently it wasn’t that.
Anyone have any ideas? It is super-broken: unit tests are failing like crazy
and GPU builds are hanging on shutdown. No successful builds today at all.
I've seen this before. Try rebasing and force pushing.
On Thu, Mar 29, 2018 at 3:51 PM, Indhu wrote:
> Hi,
>
> Looks like PR #10039 build failed because of git errors. Here is the error
> log:
> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-
>
Hi,
Looks like PR #10039 build failed because of git errors. Here is the error
log:
http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/PR-10039/4/console.
Does someone know what could be happening here?
Build error:
Adding as 3rdparty/dlpack~7c28089749287f42ea8f41abd1358e6dbac54187
Thank you, Chris!
What's interesting here (e.g. at [1]) is that all tests are actually
finishing, but the process does not terminate. I have experienced such
behaviour in my past C# and Java projects. In these
cases, it was related to threads being created as
Thanks for looking into this! Did this happen across jobs in general, or
could it be pinned down to a single configuration? We have
never had hangs like this, so this definitely seems related to a recent
change.
-Marco
On Thu, Mar 29, 2018 at 7:26 PM, kellen sunderland <
I killed several builds which were > 11 hours old -- all stuck on this
python3 GPU hang problem.
Debugging this a bit with Chris. I haven't looked at it closely but it
seems like there might be a genuine hang here between
CuDNNConvolutionOp::SelectAlgo and a customop lambda invoke. What
do you guys think?
Stack is here:
For our current POC:
b. Add mpi.kvstore as a new kvstore type in the Python layer. It depends on a
new mxnet submodule, mpi_collectives, which is a C++ library that itself
depends on mxnet.
mpi_collectives doesn’t need to be a single C++ library. Its source code can
be
You can check mpi.kvstore API Spec in our design doc:
For example, we add pushpull and broadcast interfaces and disable the original
push and pull in the new kvstore.
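
To make the interface change above concrete, here is a minimal sketch in Python. It is not the API from the design doc: the class name SketchMPIKVStore, the method signatures, and the sum-reduce semantics of pushpull are all assumptions made for illustration, and the "workers" are simulated in a single process rather than through MPI.

```python
class SketchMPIKVStore:
    """Hypothetical stand-in for the proposed kvstore (not the real API)."""

    def __init__(self):
        # key -> current shared value (a plain list of numbers in this sketch)
        self._store = {}

    def broadcast(self, key, value):
        # Assumed semantics: the root's value becomes the value every
        # worker sees for this key.
        self._store[key] = list(value)
        return list(self._store[key])

    def pushpull(self, key, values_per_worker):
        # Assumed semantics: element-wise sum of every worker's
        # contribution, with the reduced result returned to all workers
        # (allreduce-like). Workers are simulated as a list of lists.
        reduced = [sum(column) for column in zip(*values_per_worker)]
        self._store[key] = reduced
        return list(reduced)

    def push(self, key, value):
        # The original push is disabled in the new kvstore.
        raise NotImplementedError("push is disabled; use pushpull")

    def pull(self, key):
        # The original pull is disabled in the new kvstore.
        raise NotImplementedError("pull is disabled; use pushpull")
```

In this sketch a caller would initialise a key with broadcast and then exchange gradients with pushpull, while any leftover call to push or pull fails loudly instead of silently doing nothing.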
From: Ye, Zhouhai
Sent: Tuesday, March 27, 2018 11:18 AM
To: 'Nan Zhu' ; dev@mxnet.incubator.apache.org
Cc: Li, Mu
Hi,
Nan Zhu
As we described in our design doc, there are two possible code structures
(implementations); we currently implement the second in our POC:
a. Implement mpi.kvstore at the same level as the current kvstores (CPP
src/kvstore), adhering to the original kvstore factory pattern.
b. Add
Actually, the current design structure is very similar to kvstore_nccl, as
shown in the attached picture.
I have also put the updated proposal into a Google Doc, where it is easier to
add comments and make modifications.