Hi,
I am trying to deploy the sample model to MAAS. I executed the steps below,
but in the end I am unable to deploy the model. I have shared the logs I see
for each step, but I cannot find any issue in them. Could someone with
expertise in MAAS go through the logs and help me troubleshoot and resolve
the issue?
To start the MAAS service: $METRON_HOME/bin/maas_service.sh -zq <ZookeeperNode>:2181
After running the above command, I can see the application in the YARN
Resource Manager, and the latest message in its logs is "19/01/04 07:17:36 INFO
service.ApplicationMaster: Ready to accept requests.."
I then ran the command below to deploy my model:
$METRON_HOME/bin/maas_deploy.sh -zq <ZookeeperNode>:2181 -lmp /home/metron/mock_dga -hmp /user/metron/models -mo ADD -m 512 -n dga -v 1.0 -ni 1
Below are the logs I see in AppMaster.stderr:
***************Deploy model logs start*******
19/01/04 07:20:23 INFO service.ApplicationMaster: [ADD]: Received
request for model dga:1.0x1 containers of size 512M at path /user/metron/models
19/01/04 07:20:25 INFO impl.AMRMClientImpl: Received new token for :
<YarnNode>:45454
19/01/04 07:20:25 INFO callback.ContainerRequestListener: Got response from RM
for container ask, allocatedCnt=1
19/01/04 07:20:25 INFO service.ApplicationMaster: Found container id of
8796093022210
19/01/04 07:20:25 INFO callback.ContainerRequestListener: Launching shell
command on a new container.,
containerId=container_e08_1545053754589_0089_01_000002,
containerNode=<YarnNode>:45454, containerNodeURI=<YarnNode>:8042,
containerResourceMemory=1024, containerResourceVirtualCores=1
19/01/04 07:20:25 INFO callback.LaunchContainer: Setting up container launch
container for containerid=container_e08_1545053754589_0089_01_000002
19/01/04 07:20:25 INFO callback.LaunchContainer: Local Directory Contents
19/01/04 07:20:25 INFO callback.LaunchContainer: 6 - tmp
19/01/04 07:20:25 INFO callback.LaunchContainer: 74 - container_tokens
19/01/04 07:20:25 INFO callback.LaunchContainer: 12 - .container_tokens.crc
19/01/04 07:20:25 INFO callback.LaunchContainer: 3930 - launch_container.sh
19/01/04 07:20:25 INFO callback.LaunchContainer: 40 - .launch_container.sh.crc
19/01/04 07:20:25 INFO callback.LaunchContainer: 655 -
default_container_executor_session.sh
19/01/04 07:20:25 INFO callback.LaunchContainer: 16 -
.default_container_executor_session.sh.crc
19/01/04 07:20:25 INFO callback.LaunchContainer: 709 -
default_container_executor.sh
19/01/04 07:20:25 INFO callback.LaunchContainer: 16 -
.default_container_executor.sh.crc
19/01/04 07:20:25 INFO callback.LaunchContainer: 20508144 - AppMaster.jar
19/01/04 07:20:25 INFO callback.LaunchContainer: Localizing /user/metron/models
19/01/04 07:20:25 INFO callback.LaunchContainer: Model payload:
/user/metron/models
19/01/04 07:20:25 INFO callback.LaunchContainer: AppJAR Location:
hdfs://<HDFSNODE>:8020/user/metron/MaaS/application_1545053754589_0089/AppMaster.jar
19/01/04 07:20:25 INFO callback.LaunchContainer: Localized dga.py ->
LocatedFileStatus{path=hdfs://<HDFSNODE>:8020/user/metron/models/dga/dga.py;
isDirectory=false; length=754; replication=3; blocksize=134217728;
modification_time=1544799906052; access_time=1544799905761; owner=metron;
group=metron; permission=rw-r--r--; isSymlink=false}
19/01/04 07:20:25 INFO callback.LaunchContainer: Localized rest.sh ->
LocatedFileStatus{path=hdfs://<HDFSNODE>:8020/user/metron/models/dga/rest.sh;
isDirectory=false; length=27; replication=3; blocksize=134217728;
modification_time=1544799906093; access_time=1544799906058; owner=metron;
group=metron; permission=rw-r--r--; isSymlink=false}
19/01/04 07:20:25 INFO callback.LaunchContainer: Localized dga.py ->
LocatedFileStatus{path=hdfs://<HDFSNODE>:8020/user/metron/models/dga.py;
isDirectory=false; length=754; replication=3; blocksize=134217728;
modification_time=1546604423149; access_time=1546604422374; owner=metron;
group=metron; permission=rw-r--r--; isSymlink=false}
19/01/04 07:20:25 INFO callback.LaunchContainer: Localized rest.sh ->
LocatedFileStatus{path=hdfs://<HDFSNODE>:8020/user/metron/models/rest.sh;
isDirectory=false; length=27; replication=3; blocksize=134217728;
modification_time=1544797916505; access_time=1544797916294; owner=metron;
group=metron; permission=rwxr-xr-x; isSymlink=false}
19/01/04 07:20:25 INFO callback.LaunchContainer: Localized run.sh ->
LocatedFileStatus{path=hdfs://<HDFSNODE>:8020/user/metron/models/run.sh;
isDirectory=false; length=26; replication=3; blocksize=134217728;
modification_time=1546604423282; access_time=1546604423214; owner=metron;
group=metron; permission=rw-r--r--; isSymlink=false}
19/01/04 07:20:25 INFO callback.LaunchContainer: AppMaster.jar localized:
scheme: "hdfs" host: "<HDFSNODE>" port: 8020 file:
"/user/metron/MaaS/application_1545053754589_0089/AppMaster.jar"
19/01/04 07:20:25 INFO callback.LaunchContainer: run.sh localized: scheme:
"hdfs" host: "<HDFSNODE>" port: 8020 file: "/user/metron/models/run.sh"
19/01/04 07:20:25 INFO callback.LaunchContainer: dga.py localized: scheme:
"hdfs" host: "<HDFSNODE>" port: 8020 file: "/user/metron/models/dga.py"
19/01/04 07:20:25 INFO callback.LaunchContainer: rest.sh localized: scheme:
"hdfs" host: "<HDFSNODE>" port: 8020 file: "/user/metron/models/rest.sh"
19/01/04 07:20:25 INFO callback.LaunchContainer: Executing container command:
{{JAVA_HOME}}/bin/java org.apache.metron.maas.service.runner.Runner -ci
8796093022210 -zq <ZookeeperNode>:2181 -zr /metron/maas/config -s run.sh -n dga
-hn <YarnNode> -v 1.0 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
19/01/04 07:20:25 INFO impl.NMClientAsyncImpl: Processing Event EventType:
START_CONTAINER for Container container_e08_1545053754589_0089_01_000002
19/01/04 07:20:25 INFO impl.ContainerManagementProtocolProxy: Opening proxy :
<YarnNode>:45454
19/01/04 07:20:25 INFO impl.NMClientAsyncImpl: Processing Event EventType:
QUERY_CONTAINER for Container container_e08_1545053754589_0089_01_000002
19/01/04 07:20:25 INFO impl.ContainerManagementProtocolProxy: Opening proxy :
<YarnNode>:45454
****************************************
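To make the "Localized ..." entries above easier to check, here is a minimal sketch that pulls out each localized file with its HDFS path and permission string, and flags .sh files that do not have the owner execute bit set. The helper names (localized_files, not_executable) are my own and not part of Metron, and I have not confirmed whether a missing execute bit has anything to do with the problem. The sample text is copied from the log above, shortened to two entries.

```python
import re

# Hypothetical helper, not part of Metron: summarise the "Localized" entries
# from AppMaster.stderr. Log lines may be hard-wrapped, so all whitespace is
# collapsed to single spaces before matching.
ENTRY = re.compile(
    r"Localized (?P<name>\S+) -> LocatedFileStatus\{path=(?P<path>[^;]+);"
    r".*?permission=(?P<perm>[rwx-]{9});"
)


def localized_files(log_text):
    """Return (name, path, permission) tuples for each localized file."""
    flat = " ".join(log_text.split())
    return [(m["name"], m["path"], m["perm"]) for m in ENTRY.finditer(flat)]


def not_executable(entries):
    """Return paths of .sh files whose owner execute bit is not set."""
    return [path for name, path, perm in entries
            if name.endswith(".sh") and perm[2] != "x"]


sample = """
19/01/04 07:20:25 INFO callback.LaunchContainer: Localized rest.sh ->
LocatedFileStatus{path=hdfs://<HDFSNODE>:8020/user/metron/models/rest.sh;
isDirectory=false; length=27; replication=3; blocksize=134217728;
modification_time=1544797916505; access_time=1544797916294; owner=metron;
group=metron; permission=rwxr-xr-x; isSymlink=false}
19/01/04 07:20:25 INFO callback.LaunchContainer: Localized run.sh ->
LocatedFileStatus{path=hdfs://<HDFSNODE>:8020/user/metron/models/run.sh;
isDirectory=false; length=26; replication=3; blocksize=134217728;
modification_time=1546604423282; access_time=1546604423214; owner=metron;
group=metron; permission=rw-r--r--; isSymlink=false}
"""

entries = localized_files(sample)
for name, path, perm in entries:
    print(perm, path)
print("scripts without owner execute bit:", not_executable(entries))
```

In the two sample entries, rest.sh is rwxr-xr-x while run.sh is rw-r--r--. I do not know whether MAAS requires the scripts to be executable in HDFS, so this is only something to rule out.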
Now, when I run the command below to list the deployed/registered models, I
do not get the expected result:
$METRON_HOME/bin/maas_deploy.sh -zq <ZookeeperNode>:2181 -mo LIST
*********************Log I see in AppMaster.stderr ****************
19/01/04 07:21:57 ERROR service.ApplicationMaster: Received a null request...
************************************************************
********************In console I see below*****************
19/01/04 07:25:08 INFO zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir
19/01/04 07:25:08 INFO zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
19/01/04 07:25:08 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
19/01/04 07:25:08 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
19/01/04 07:25:08 INFO zookeeper.ZooKeeper: Client
environment:os.version=3.10.0-862.2.3.el7.x86_64
19/01/04 07:25:08 INFO zookeeper.ZooKeeper: Client environment:user.name=metron
19/01/04 07:25:08 INFO zookeeper.ZooKeeper: Client
environment:user.home=/home/metron
19/01/04 07:25:08 INFO zookeeper.ZooKeeper: Client
environment:user.dir=/home/metron
19/01/04 07:25:08 INFO zookeeper.ZooKeeper: Initiating client connection,
connectString=<ZookeeperNode>:2181 sessionTimeout=60000
watcher=org.apache.curator.ConnectionState@7cd2d3b6
19/01/04 07:25:09 INFO zookeeper.ClientCnxn: Opening socket connection to
server <ZookeeperNode>/<ZookeeperNode>:2181. Will not attempt to authenticate
using SASL (unknown error)
19/01/04 07:25:09 INFO zookeeper.ClientCnxn: Socket connection established to
<ZookeeperNode>/<ZookeeperNode>:2181, initiating session
19/01/04 07:25:09 INFO zookeeper.ClientCnxn: Session establishment complete on
server <ZookeeperNode>/<ZookeeperNode>:2181, sessionid = 0x167b7a3ae3cc404,
negotiated timeout = 60000
19/01/04 07:25:09 INFO state.ConnectionStateManager: State change: CONNECTED
19/01/04 07:25:09 INFO zookeeper.ZooKeeper: Session: 0x167b7a3ae3cc404 closed
19/01/04 07:25:09 INFO zookeeper.ClientCnxn: EventThread shut down
**************************************
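Since almost every line above is INFO, a small filter helps to surface anything logged at a higher level. This is a minimal sketch that assumes the "YY/MM/DD HH:MM:SS LEVEL logger: message" layout seen in these logs. The function name non_info_lines is my own, and the sample text is copied from the logs above.

```python
import re

# Assumed line layout, taken from the logs above:
#   "YY/MM/DD HH:MM:SS LEVEL logger: message"
LINE = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} (?P<level>[A-Z]+) ")


def non_info_lines(log_text, levels=("WARN", "ERROR", "FATAL")):
    """Return the log lines whose level is one of `levels`.

    Only the first physical line of each entry is returned; wrapped
    continuation lines carry no timestamp and are skipped.
    """
    hits = []
    for line in log_text.splitlines():
        m = LINE.match(line)
        if m and m["level"] in levels:
            hits.append(line)
    return hits


sample = """\
19/01/04 07:20:25 INFO impl.ContainerManagementProtocolProxy: Opening proxy :
<YarnNode>:45454
19/01/04 07:21:57 ERROR service.ApplicationMaster: Received a null request...
19/01/04 07:25:09 INFO state.ConnectionStateManager: State change: CONNECTED
"""

for line in non_info_lines(sample):
    print(line)
```

On the sample this leaves only the "Received a null request..." line from AppMaster.stderr, which is the one entry above that is not logged at INFO level.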
I cannot see any error message that explains why the model fails to register
in MAAS. Please look at the logs and help me troubleshoot further to resolve
the issue.
Thanks,
Anil.