It's tough to say what is going on here. Who generates the .metadata file? What are the contents of that directory on HDFS? (It looks like the MapReduce staging directory.) Tez mode for MR jobs works in most cases, but it is not completely compatible with all MR jobs.
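To narrow it down, it would help to see whether Kite ever wrote anything under that temp path, e.g.:

  hdfs dfs -ls -R /tmp/default/.temp/

If you mainly need the import to go through, one workaround worth trying (a suggestion only; I have not verified it against your setup) is to push just this Sqoop job back onto classic MapReduce with a generic -D override, instead of changing mapred-site.xml globally:

  sqoop import -D mapreduce.framework.name=yarn \
    --connect jdbc:mysql://node1:3306/sqoop --username root --password 123456 \
    --table devidents --hive-import --hive-table galinqewra \
    --create-hive-table -m 1 --as-parquetfile

That should at least keep the Parquet import working while the Tez incompatibility is sorted out.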
On Mon, Feb 6, 2017 at 3:33 AM, Артем Великородный <[email protected]> wrote:
> I tried to import some data to Hive as Parquet through Sqoop using this
> command:
>
> sqoop import --connect jdbc:mysql://node1:3306/sqoop --username root
> --password 123456 --table devidents --hive-import --hive-table galinqewra
> --create-hive-table -m 1 --as-parquetfile
>
> In mapred-site.xml I set mapreduce.framework.name to yarn-tez,
> and in hive-site.xml I set hive.execution.engine to tez.
>
> It fails with this exception:
>
> 17/02/03 01:07:45 INFO client.TezClient: Submitting DAG to YARN,
> applicationId=application_1486051443218_0001, dagName=codegen_devidents.jar
> 17/02/03 01:07:46 INFO impl.YarnClientImpl: Submitted application
> application_1486051443218_0001
> 17/02/03 01:07:46 INFO client.TezClient: The url to track the Tez AM:
> http://node1:8088/proxy/application_1486051443218_0001/
> 17/02/03 01:07:59 INFO mapreduce.Job: The url to track the job:
> http://node1:8088/proxy/application_1486051443218_0001/
> 17/02/03 01:07:59 INFO mapreduce.Job: Running job: job_1486051443218_0001
> 17/02/03 01:08:00 INFO mapreduce.Job: Job job_1486051443218_0001 running in
> uber mode : false
> 17/02/03 01:08:00 INFO mapreduce.Job: map 0% reduce 0%
> 17/02/03 01:08:27 INFO mapreduce.Job: Job job_1486051443218_0001 failed with
> state FAILED due to: Vertex failed, vertexName=initialmap,
> vertexId=vertex_1486051443218_0001_1_00, diagnostics=[Task failed,
> taskId=task_1486051443218_0001_1_00_000000, diagnostics=[TaskAttempt 0
> failed, info=[Error: Error while running task ( failure ) :
> attempt_1486051443218_0001_1_00_000000_0:org.kitesdk.data.DatasetNotFoundException:
> Descriptor location does not exist:
> hdfs:/tmp/default/.temp/job_14860514432180_0001/mr/job_14860514432180_0001/.metadata
> at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.checkExists(FileSystemMetadataProvider.java:562)
> at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.find(FileSystemMetadataProvider.java:605)
> at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.load(FileSystemMetadataProvider.java:114)
> at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:197)
> at org.kitesdk.data.spi.AbstractDatasetRepository.load(AbstractDatasetRepository.java:40)
> at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadJobDataset(DatasetKeyOutputFormat.java:591)
> at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateTaskAttemptDataset(DatasetKeyOutputFormat.java:602)
> at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateTaskAttemptView(DatasetKeyOutputFormat.java:615)
> at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.getRecordWriter(DatasetKeyOutputFormat.java:448)
> at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:399)
> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable._callInternal(LogicalIOProcessorRuntimeTask.java:533)
> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:516)
> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:501)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1
> killedTasks:0, Vertex vertex_1486051443218_0001_1_00 [initialmap]
> killed/failed due to:OWN_TASK_FAILURE]. DAG did not succeed due to
> VERTEX_FAILURE. failedVertices:1 killedVertices:0
> 17/02/03 01:08:27 INFO mapreduce.Job: Counters: 0
> 17/02/03 01:08:27 WARN mapreduce.Counters: Group FileSystemCounters is
> deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
> 17/02/03 01:08:27 INFO mapreduce.ImportJobBase: Transferred 0 bytes in
> 63.4853 seconds (0 bytes/sec)
> 17/02/03 01:08:27 WARN mapreduce.Counters: Group
> org.apache.hadoop.mapred.Task$Counter is deprecated. Use
> org.apache.hadoop.mapreduce.TaskCounter instead
> 17/02/03 01:08:27 INFO mapreduce.ImportJobBase: Retrieved 0 records.
> 17/02/03 01:08:27 ERROR tool.ImportTool: Error during import: Import job
> failed!
>
> The Hive table is created, but there is no data in it.
>
> If I run the job in plain MapReduce mode it completes successfully.
> It also passes if I run it without '--as-parquetfile'.
>
> Any suggestions?
