Thank you so much, Matt. You have answered my question.

Thanks
Aruna

From: Matt Burgess [mailto:[email protected]]
Sent: Monday, December 11, 2017 7:26 PM
To: [email protected]
Subject: Re: ListS3 Processor Error

Aruna,

The index and type in Elasticsearch are kinds of partitioning that help users organize data, and they definitely help in indexing and searching it. Types are not always required, but an index is. Imagine you are trying to store a bunch of tweets from a Twitter feed (or firehose) into Elasticsearch. You could call the index "twitter" and use the type "tweet" for each tweet that you store in the twitter index. Now say you also want to put Twitter user information into that index. You can reuse "twitter" as the index but then specify "user" as the type. Now you can search the entire index for information in tweets and user data, or you can additionally search by type, perhaps searching only the documents with the user type.

In the REST API, the index and type are specified in the request path, such as GET /twitter/tweet/1 or GET /twitter/user/2, or something like that. The Elasticsearch processors use the index and type information to determine the right call to make to Elasticsearch.

You can certainly choose "pdf" as the type if you like, although depending on the sort of queries you'll be running, you may want to pick an index that incorporates any kind of data you'll be keeping together, and a type that is more domain-specific (such as "customer" if it is a PDF full of customer data).

Please let me know if that answers your question. I can provide more information if need be.

Regards,
Matt

On Mon, Dec 11, 2017 at 4:18 PM, Joe Witt <[email protected]> wrote:

For that we'll need someone familiar with that processor/Elastic to chime in :)

Thanks

On Mon, Dec 11, 2017 at 4:16 PM, Aruna Sankaralingam <[email protected]> wrote:

Oops, I overlooked the question on version that you asked. My apologies. I am using NiFi v1.4.
I moved the pdf file to another folder in the same S3 bucket and NiFi was able to pick it up. Initially it was in S3 > part-d-prescription-drug/unstructured. I moved it to S3 > Nifi-Pecos-files. I still don’t know what was wrong with the old location, but for now I am using the one that works.

I am trying to put this pdf file in Elasticsearch. I am not sure what I should give for “Index” and “Type”. Should the type be “PDF”?

Thanks
Aruna

From: Joe Witt [mailto:[email protected]]
Sent: Monday, December 11, 2017 3:32 PM
To: [email protected]
Subject: Re: ListS3 Processor Error

Aruna,

We'll need to know more about your config/env to help, I think. I am not aware of any normal usage situation that should result in truncated responses. It is possible it is a coding bug we can resolve, but I think we'll need more details. Did you see the questions in my last reply?

Thanks

On Mon, Dec 11, 2017 at 2:50 PM, Aruna Sankaralingam <[email protected]> wrote:

Could someone please let me know what is wrong with the configuration that it is failing?

From: Aruna Sankaralingam [mailto:[email protected]]
Sent: Monday, December 11, 2017 1:07 PM
To: [email protected]
Subject: RE: ListS3 Processor Error

Attached my nifi-app.log. Could you please let me know what went wrong?

From: Joe Witt [mailto:[email protected]]
Sent: Friday, December 08, 2017 4:04 PM
To: [email protected]
Subject: Re: ListS3 Processor Error

Here is an example I found for another processor:
https://mail-archives.apache.org/mod_mbox/nifi-dev/201509.mbox/%3CCAFddr26AEVqnoQ=mWr7DSNDFVrr9NuYy9GCcXg=4fyycqab...@mail.gmail.com%3E

Thanks

On Fri, Dec 8, 2017 at 4:02 PM, Aruna Sankaralingam <[email protected]> wrote:

Joe,

Could you please let me know how to turn on the debug logging?

From: Joe Witt [mailto:[email protected]]
Sent: Friday, December 08, 2017 3:59 PM
To: [email protected]
Subject: Re: ListS3 Processor Error

What version of NiFi?

Looks like either a classpath/classloader issue OR the amazon client library cannot parse the response it is getting back...

The logs/nifi-app.log should have the full stack trace. If not, you can turn on debug logging for that processor and perhaps then it will.

Thanks

On Fri, Dec 8, 2017 at 3:56 PM, Aruna Sankaralingam <[email protected]> wrote:

I am trying to get a pdf file from S3 and load it to Elasticsearch. The ListS3 processor is giving me this error. Could someone please let me know where I am going wrong?

20:52:25 UTC ERROR 37d7226e-0160-1000-6049-d4c489cd32f3
ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process session due to com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler

20:52:25 UTC WARNING 37d7226e-0160-1000-6049-d4c489cd32f3
ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] Processor Administratively Yielded for 1 sec due to processing failure

20:52:26 UTC ERROR 37d7226e-0160-1000-6049-d4c489cd32f3
ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process due to com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler; rolling back session: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler

20:52:26 UTC ERROR 37d7226e-0160-1000-6049-d4c489cd32f3
ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process session due to com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler

20:52:26 UTC WARNING 37d7226e-0160-1000-6049-d4c489cd32f3
ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] Processor Administratively Yielded for 1 sec due to processing failure
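
Editor's note: Matt's description of how the index and type appear in the REST path can be sketched roughly as below. This is only an illustration, not how the NiFi Elasticsearch processors are implemented. The helper names document_path and search_path are hypothetical, the _search paths are an assumption about the type-aware REST layout of Elasticsearch versions from that period, and no request is actually sent.

```python
from typing import Optional


def document_path(index: str, doc_type: str, doc_id: int) -> str:
    """Build the path for one document, e.g. /twitter/tweet/1 (hypothetical helper)."""
    return f"/{index}/{doc_type}/{doc_id}"


def search_path(index: str, doc_type: Optional[str] = None) -> str:
    """Build a search path for the whole index, or for a single type within it.

    Assumes a type-aware Elasticsearch REST layout (hypothetical helper).
    """
    if doc_type is None:
        # No type given: search every document in the index.
        return f"/{index}/_search"
    # Type given: restrict the search to documents of that type.
    return f"/{index}/{doc_type}/_search"


if __name__ == "__main__":
    # The two document lookups from Matt's example.
    print("GET", document_path("twitter", "tweet", 1))
    print("GET", document_path("twitter", "user", 2))
    # Search tweets and users together, then users only.
    print("GET", search_path("twitter"))
    print("GET", search_path("twitter", "user"))
```

With this layout, a PDF of customer data stored under a hypothetical index "documents" with type "customer" would be addressed as /documents/customer/<id>.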
