Hey Vaghawan Ojha, thanks for your comment. The path in the error is not the source CSV file; the CSV is in S3. The flow is as follows: CsvBulkLoadTool is supposed to create the HFiles in a /tmp HDFS directory and then, in a second phase, load them into HBase. Those are the files that come up as missing in this error.
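To make the two phases explicit, this is roughly the shape of invocation I mean. It is only a sketch: the table name, bucket path and output directory below are placeholders, and I am assuming the optional -o/--output flag of CsvBulkLoadTool, which points the temporary HFiles at an explicit HDFS directory instead of the default temp location:

    # Phase 1: the MapReduce job writes HFiles under the --output directory in HDFS.
    # Phase 2: the tool loads those HFiles into the HBase table (the completebulkload step).
    HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/usr/lib/hbase/conf \
    hadoop jar /usr/lib/phoenix/phoenix-4.11.0-HBase-1.3-client.jar \
        org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        --table MY_TABLE \
        -d '|' \
        --input s3://my-bucket/my-file.csv \
        --output /tmp/my-bulkload-hfiles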
Vaghawan Ojha wrote:
> Hi, are you sure you are pointing to the right path and file? Because the error says
> Caused by: java.io.FileNotFoundException: File does not exist: hdfs://*
> Please make sure the csv file is there.
>
> On Sunday, November 26, 2017, idosenesh <ido.ad.se@> wrote:
>
>> I'm trying to bulk load into Phoenix using the CsvBulkLoadTool.
>> I'm running on an Amazon EMR cluster with 3 i3.2xlarge core nodes and default
>> phoenix/hbase/emr configurations.
>>
>> I've successfully run the job 3 times (i.e. successfully inserted three CSV files
>> of about 250G each), but the 4th run yields the following error:
>>
>> 2017-11-23 21:53:07,962 FATAL [IPC Server handler 7 on 39803] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1511332372804_0016_m_002760_1 - exited :
>> java.lang.IllegalArgumentException: Can't read partitions file
>>     at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
>>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
>>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:711)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>> Caused by: java.io.FileNotFoundException: File does not exist: hdfs://***************:8020/mnt/var/lib/hadoop/tmp/partitions_66f309d7-fe46-440a-99bb-fd8f3b40099e
>>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
>>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
>>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1830)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1853)
>>     at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:301)
>>     at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:88)
>>
>> My HDFS utilization is not high:
>>
>> [hadoop@******** /]$ hdfs dfsadmin -report
>> Configured Capacity: 5679504728064 (5.17 TB)
>> Present Capacity: 5673831846248 (5.16 TB)
>> DFS Remaining: 5333336719720 (4.85 TB)
>> DFS Used: 340495126528 (317.11 GB)
>> DFS Used%: 6.00%
>> Under replicated blocks: 0
>> Blocks with corrupt replicas: 0
>> Missing blocks: 0
>> Missing blocks (with replication factor 1): 0
>>
>> I'm running the following command:
>>
>> HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/usr/lib/hbase/conf hadoop jar /usr/lib/phoenix/phoenix-4.11.0-HBase-1.3-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool -Dfs.permissions.umask-mode=000 --table KEYWORDS_COMBINED_SALTED -d '|' --ignore-errors --input s3://path/to/my/bucket/file.csv
>>
>> The data in this last table is structurally the same as what was inserted before.
>>
>> Any ideas?
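In case it is useful for anyone hitting the same thing: what the task fails to find is the TotalOrderPartitioner partitions file, not the CSV, and the trace shows it being looked up through hdfs://...:8020. A quick way to see where such a file actually lives is something like the following (illustrative only; <namenode> is a placeholder and the directory is the one from my stack trace):

    # Is the partitions file present in HDFS where the task looks for it?
    hdfs dfs -ls hdfs://<namenode>:8020/mnt/var/lib/hadoop/tmp/ | grep partitions_
    # Or does it only exist on the local disk of the node that prepared the job?
    ls -l /mnt/var/lib/hadoop/tmp/ | grep partitions_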
