[ https://issues.apache.org/jira/browse/DRILL-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511704#comment-14511704 ]
Krystal commented on DRILL-1345:
--------------------------------

David - is this issue resolved? Can I close it?

> Drill can write to Amazon S3 storage buckets but not read from them
> -------------------------------------------------------------------
>
>                 Key: DRILL-1345
>                 URL: https://issues.apache.org/jira/browse/DRILL-1345
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet, Storage - Text & CSV
>    Affects Versions: 0.4.0, 0.5.0
>         Environment: CentOS 6.3 on Amazon Web Services virtual instance
>            Reporter: David Tucker
>            Assignee: David Tucker
>            Priority: Critical
>             Fix For: Future
>
>
> After configuring the storage plug-in for Amazon S3, Drill commands will
> correctly create Parquet or CSV files in the S3 bucket. However, attempting
> to read those files results in a software hang.
>
> To reproduce the issue:
>
> 1. Confirm Hadoop access to the bucket from the shell with
>    'hadoop fs -ls s3://<bucket>/'
>    The likely cause of a failure here is incorrect user authentication
>    settings in core-site.xml. You'll need appropriate AWS authentication
>    keys for the following properties:
>      fs.s3.awsAccessKeyId
>      fs.s3.awsSecretAccessKey
>      fs.s3n.awsAccessKeyId
>      fs.s3n.awsSecretAccessKey
>
> 2. Configure the S3 storage plug-in (a clone of the default DFS plug-in
>    with a single change: the connection string should be "s3://<bucket>").
>    This CANNOT BE DONE until actual connectivity to the bucket is verified
>    (a separate issue with storage plug-in configuration, which MUST connect
>    to the target connection string or it fails).
>
> 3. Simple queries to create tables in the S3 bucket will work:
>
>    alter session set `store.format`='parquet';
>    create table `my-s3`.`/employee1` as select * from cp.`employee.json`;
>
>    alter session set `store.format`='csv';
>    create table `my-s3`.`/employee2` as select * from cp.`employee.json`;
>
> 4. Confirm the existence of the files in the S3 bucket, and the readability
>    of their contents, with "hadoop fs" commands.
>
> 5. Attempts to read the same tables will hang:
>
>    select * from `my-s3`.`/employee1`;
>
>    "jstack -F <drillbit_pid>" indicates there is a deadlock of some kind.
>
> NOTE: The jets3t class enabling S3 data access from the MapR Hadoop 4.0.1
> client was incompatible with Drill 0.4 and 0.5. I had to leave the jets3t
> library excluded (via hadoop-excludes.txt) and copy in older jets3t support
> from MapR 3.0.3.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
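For reference, the four authentication properties named in the report would be set in core-site.xml roughly as follows. This is a sketch only: the property names come from the report above, while the values shown are placeholders, not real credentials.

```xml
<!-- core-site.xml: AWS credentials for the s3:// and s3n:// filesystems.
     Replace the placeholder values with your own AWS keys. -->
<configuration>
  <property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>YOUR_SECRET_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>YOUR_SECRET_ACCESS_KEY</value>
  </property>
</configuration>
```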