Jorge Pizarro created SPARK-22177:
-------------------------------------

             Summary: Error running ml_ops.sh(SPOT): Can not create a Path from an empty string
                 Key: SPARK-22177
                 URL: https://issues.apache.org/jira/browse/SPARK-22177
             Project: Spark
          Issue Type: Question
          Components: ML, Spark Submit, YARN
    Affects Versions: 2.2.0
         Environment: CentOS 7 1708
                      Hadoop 2.6.0
                      Scala 2.11.8
                      SPOT 1.0
            Reporter: Jorge Pizarro
            Priority: Minor

Running "./ml_ops.sh 20170922 dns 1e-4" fails with the error below. Complete output:

[soluser@master spot-ml]$ bash -x ./ml_ops.sh 20170922 dns 1e-4
+ FDATE=20170922
+ DSOURCE=dns
+ YR=2017
+ MH=09
+ DY=22
+ [[ 8 != \8 ]]
+ [[ -z dns ]]
+ source /etc/spot.conf
++ UINODE=master
++ MLNODE=master
++ GWNODE=master
++ DBNAME=spotdb
++ HUSER=/user/soluser
++ NAME_NODE=master
++ WEB_PORT=50070
++ DNS_PATH=/user/soluser/dns/hive/y=2017/m=09/d=22/
++ PROXY_PATH=/user/soluser/dns/hive/y=2017/m=09/d=22/
++ FLOW_PATH=/user/soluser/dns/hive/y=2017/m=09/d=22/
++ HPATH=/user/soluser/dns/scored_results/20170922
++ IMPALA_DEM=master
++ IMPALA_PORT=21050
++ LUSER=/home/soluser
++ LPATH=/home/soluser/ml/dns/20170922
++ RPATH=/home/soluser/ipython/user/20170922
++ LIPATH=/home/soluser/ingest
++ USER_DOMAIN=neosecure
++ SPK_EXEC=1
++ SPK_EXEC_MEM=1g
++ SPK_DRIVER_MEM=1g
++ SPK_DRIVER_MAX_RESULTS=200m
++ SPK_EXEC_CORES=2
++ SPK_DRIVER_MEM_OVERHEAD=100m
++ SPK_EXEC_MEM_OVERHEAD=100m
++ SPK_AUTO_BRDCST_JOIN_THR=10485760
++ LDA_OPTIMIZER=em
++ LDA_ALPHA=1.02
++ LDA_BETA=1.001
++ PRECISION=64
++ TOL=1e-6
++ TOPIC_COUNT=20
++ DUPFACTOR=1000
+ '[' -n 1e-4 ']'
+ TOL=1e-4
+ '[' -n '' ']'
+ MAXRESULTS=-1
+ '[' dns == flow ']'
+ '[' dns == dns ']'
+ RAWDATA_PATH=/user/soluser/dns/hive/y=2017/m=09/d=22/
+ '[' '!' -z neosecure ']'
+ USER_DOMAIN_CMD='--userdomain neosecure'
+ FEEDBACK_PATH=/user/soluser/dns/scored_results/20170922/feedback/ml_feedback.csv
+ HDFS_SCORED_CONNECTS=/user/soluser/dns/scored_results/20170922/scores
+ hdfs dfs -rm -R -f /user/soluser/dns/scored_results/20170922/scores
+ spark-submit --class org.apache.spot.SuspiciousConnects --master yarn --deploy-mode cluster --driver-memory 1g --conf spark.driver.maxResultSize=200m --conf spark.driver.maxPermSize=512m --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.maxExecutors=1 --conf spark.executor.cores=2 --conf spark.executor.memory=1g --conf spark.sql.autoBroadcastJoinThreshold=10485760 --conf 'spark.executor.extraJavaOptions=-XX:MaxPermSize=512M -XX:PermSize=512M' --conf spark.kryoserializer.buffer.max=512m --conf spark.yarn.am.waitTime=100s --conf spark.yarn.am.memoryOverhead=100m --conf spark.yarn.executor.memoryOverhead=100m target/scala-2.11/spot-ml-assembly-1.1.jar --analysis dns --input /user/soluser/dns/hive/y=2017/m=09/d=22/ --dupfactor 1000 --feedback /user/soluser/dns/scored_results/20170922/feedback/ml_feedback.csv --ldatopiccount 20 --scored /user/soluser/dns/scored_results/20170922/scores --threshold 1e-4 --maxresults -1 --ldamaxiterations 20 --ldaalpha 1.02 --ldabeta 1.001 --ldaoptimizer em --precision 64 --userdomain neosecure
17/09/29 13:51:56 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/09/29 13:51:56 INFO yarn.Client: Requesting a new application from cluster with 0 NodeManagers
17/09/29 13:51:56 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
17/09/29 13:51:56 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
17/09/29 13:51:56 INFO yarn.Client: Setting up container launch context for our AM
17/09/29 13:51:56 INFO yarn.Client: Setting up the launch environment for our AM container
17/09/29 13:51:56 INFO yarn.Client: Preparing resources for our AM container
17/09/29 13:51:57 INFO yarn.Client: Deleted staging directory hdfs://master:9000/user/soluser/.sparkStaging/application_1506636890912_0058
Exception in thread "main" java.lang.IllegalArgumentException: Can not create a Path from an empty string
        at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126)
        at org.apache.hadoop.fs.Path.<init>(Path.java:134)
        at org.apache.hadoop.fs.Path.<init>(Path.java:93)
        at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:337)
        at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:458)
        at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:497)
        at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:814)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:169)
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1091)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)
        at org.apache.spark.deploy.yarn.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

real 0m3.610s
user 0m5.122s
sys 0m0.369s

[soluser@master spot-ml]$ cat /etc/spot.conf
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#node configuration
UINODE='master'
MLNODE='master'
GWNODE='master'
DBNAME='spotdb'

#hdfs - base user and data source config
HUSER='/user/soluser'
NAME_NODE='master'
WEB_PORT=50070
DNS_PATH=${HUSER}/${DSOURCE}/hive/y=${YR}/m=${MH}/d=${DY}/
PROXY_PATH=${HUSER}/${DSOURCE}/hive/y=${YR}/m=${MH}/d=${DY}/
FLOW_PATH=${HUSER}/${DSOURCE}/hive/y=${YR}/m=${MH}/d=${DY}/
HPATH=${HUSER}/${DSOURCE}/scored_results/${FDATE}

#impala config
IMPALA_DEM=master
IMPALA_PORT=21050

#local fs base user and data source config
LUSER='/home/soluser'
LPATH=${LUSER}/ml/${DSOURCE}/${FDATE}
RPATH=${LUSER}/ipython/user/${FDATE}
LIPATH=${LUSER}/ingest

#dns suspicious connects config
USER_DOMAIN='neosecure'
SPK_EXEC='1'
SPK_EXEC_MEM='1g'
SPK_DRIVER_MEM='1g'
SPK_DRIVER_MAX_RESULTS='200m'
SPK_EXEC_CORES='2'
SPK_DRIVER_MEM_OVERHEAD='100m'
SPK_EXEC_MEM_OVERHEAD='100m'
SPK_AUTO_BRDCST_JOIN_THR='10485760'
LDA_OPTIMIZER='em'
LDA_ALPHA='1.02'
LDA_BETA='1.001'
PRECISION='64'
TOL='1e-6'
TOPIC_COUNT=20
DUPFACTOR=1000

[soluser@master spot-ml]$ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/09/29 13:52:57 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://192.168.40.158:4040
Spark context available as 'sc' (master = spark://master:7077, app id = app-20170929135251-0000).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :quit

[soluser@master spot-ml]$ java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

[soluser@master spot-ml]$ hdfs version
Hadoop 2.6.0-cdh5.12.1
Subversion http://github.com/cloudera/hadoop -r 520d8b072e666e9f21d645ca6a5219fc37535a52
Compiled by jenkins on 2017-08-24T16:34Z
Compiled with protoc 2.5.0
From source with checksum de51bf9693ab9426379a1cd28142cea0
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.12.1.jar
[soluser@master spot-ml]$

Thanks in advance,
Jorge Pizarro