James, I am sorry, I am not sure I follow that. Could you please give an example?
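As an illustration of the suggestion quoted below (separating the value into the Bucket and Prefix properties), here is a minimal Python sketch. The helper name split_s3_path is hypothetical and is not part of NiFi or the AWS SDK. Appending a trailing slash to the prefix is an assumption based on the "unstructured/" form asked about in the thread. The printed command only shows how the same split would look with the AWS CLI (aws s3 ls) for comparison.

```python
def split_s3_path(path):
    """Split 'bucket/some/prefix' into (bucket, prefix).

    Hypothetical helper for illustration only. The prefix is returned
    with a trailing slash (an assumption), or as an empty string when
    the path names only a bucket.
    """
    cleaned = path.strip()
    # Tolerate an optional s3:// scheme in front of the bucket name.
    if cleaned.startswith("s3://"):
        cleaned = cleaned[len("s3://"):]
    cleaned = cleaned.strip("/")
    # Everything before the first slash is the bucket; the rest is the prefix.
    bucket, _, rest = cleaned.partition("/")
    prefix = rest + "/" if rest else ""
    return bucket, prefix


bucket, prefix = split_s3_path("part-d-prescription-drug/unstructured")
print("Bucket:", bucket)   # value for the ListS3 Bucket property
print("Prefix:", prefix)   # value for the ListS3 Prefix property
# Equivalent listing with the AWS CLI, for comparison from the same machine:
print("aws s3 ls s3://{}{}".format(bucket + "/", prefix))
```

With the value from the screenshot, this yields the bucket "part-d-prescription-drug" and the prefix "unstructured/", which would go into the two separate ListS3 properties rather than into Bucket alone.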

From: James Wing [mailto:[email protected]]
Sent: Wednesday, December 13, 2017 12:55 PM
To: [email protected]
Subject: Re: ListS3 Processor Error

For ListS3, you will want to separate those in the Bucket and Prefix properties.

On Dec 13, 2017, at 9:34 AM, Aruna Sankaralingam <[email protected]> wrote:

James,
“part-d-prescription-drug” is the main folder in S3 and “unstructured” is the sub folder inside the main folder.

From: James Wing [mailto:[email protected]]
Sent: Wednesday, December 13, 2017 1:34 AM
To: [email protected]
Subject: Re: ListS3 Processor Error

Are you able to list the bucket with the AWS CLI (aws s3 ls)? It can be helpful to compare performance between NiFi and the AWS CLI, especially if you are able to do so from the same machine, with the same permissions, and as similar bucket and prefix settings as you can manage.

In the screenshot above, the bucket is shown as "part-d-prescription-drug/unstructured", which looks unusual to me. Is the bucket "part-d-prescription-drug" and the prefix "unstructured/"?

Thanks,
James

On Tue, Dec 12, 2017 at 7:34 AM, Aruna Sankaralingam <[email protected]> wrote:

Joe,
No, I don’t have anything in between AWS and NiFi.
NiFi is installed in one of the EC2 instance in AWS – N.Virginia Region
S3 is also in N.Virginia Region

From: Joe Witt [mailto:[email protected]]
Sent: Monday, December 11, 2017 1:28 PM
To: [email protected]
Subject: Re: ListS3 Processor Error

The XML response is truncated for some reason as implied by the following. Do you have any devices/software/systems/proxies in between your NiFi and the amazon service? Are you able to manually issue the request and get the response you expect?

2017-12-11 18:01:02,875 ERROR [Timer-Driven Process Thread-6] org.apache.nifi.processors.aws.s3.ListS3 ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process session due to com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler: {}
com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
    at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:156)
    at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:298)
    at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:70)
    at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:59)
    at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
    at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
    at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1444)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1151)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:964)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:676)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:650)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:633)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:601)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:583)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:447)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4137)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4079)
    at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:819)
    at org.apache.nifi.processors.aws.s3.ListS3$S3ObjectBucketLister.listVersions(ListS3.java:314)
    at org.apache.nifi.processors.aws.s3.ListS3.onTrigger(ListS3.java:208)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1119)
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:128)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.xml.sax.SAXParseException: Premature end of file.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1472)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1014)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:841)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:142)
    ... 32 common frames omitted

On Mon, Dec 11, 2017 at 1:07 PM, Aruna Sankaralingam <[email protected]> wrote:

Attached my nifi-app.log. Could you please let me know what went wrong?

From: Joe Witt [mailto:[email protected]]
Sent: Friday, December 08, 2017 4:04 PM
To: [email protected]
Subject: Re: ListS3 Processor Error

Here is an example I found for another processor
https://mail-archives.apache.org/mod_mbox/nifi-dev/201509.mbox/%3CCAFddr26AEVqnoQ=mWr7DSNDFVrr9NuYy9GCcXg=4fyycqab...@mail.gmail.com%3E

Thanks

On Fri, Dec 8, 2017 at 4:02 PM, Aruna Sankaralingam <[email protected]> wrote:

Joe,
Could you please let me know how to turn on the debug logging?

From: Joe Witt [mailto:[email protected]]
Sent: Friday, December 08, 2017 3:59 PM
To: [email protected]
Subject: Re: ListS3 Processor Error

What version of NiFi? Looks like either a classpath/classloader issue OR the amazon client library cannot parse the response it is getting back...

The logs/nifi-app.log should have the full stack trace. If not you can turn on debug logging for that processor and perhaps then it will.

Thanks

On Fri, Dec 8, 2017 at 3:56 PM, Aruna Sankaralingam <[email protected]> wrote:

I am trying to get a pdf file from S3 and load to Elastic Search. The ListS3 processor is giving me this error. Could someone please let me know where I am going wrong?

20:52:25 UTC ERROR 37d7226e-0160-1000-6049-d4c489cd32f3 ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process session due to com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler

20:52:25 UTC WARNING 37d7226e-0160-1000-6049-d4c489cd32f3 ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] Processor Administratively Yielded for 1 sec due to processing failure

20:52:26 UTC ERROR 37d7226e-0160-1000-6049-d4c489cd32f3 ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process due to com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler; rolling back session: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler

20:52:26 UTC ERROR 37d7226e-0160-1000-6049-d4c489cd32f3 ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process session due to com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler

20:52:26 UTC WARNING 37d7226e-0160-1000-6049-d4c489cd32f3 ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] Processor Administratively Yielded for 1 sec due to processing failure

<image001.png>
