I have a couple of simple questions that I would like to clear up:
#1: For a TransportClient and a cluster of two hosts: do I have to add both
hosts to the client, or is it enough to add just one of them and let the
yml file(s) take care of the clustering?
.addTransportAddress(new InetSocketTransportAddres
When you do a search, it will return your full _source document by default.
If you supplied a value for the text field at index time, then the text
field is included in the returned _source. If you supplied some other field
at index time, then that field will also be returned in the _source. The
Sorry for the confusion: I do want to index PDFs, but my concern is
retrieving the image file when its OCR text is searched. I must be missing
something.
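Because a search returns the full _source by default, a hit on the OCR "text" field also carries whatever was stored in "content". A minimal sketch of the retrieval side, assuming the image was indexed as a Base64 string under "content" (the approach discussed in this thread) and that the hit's source has already been parsed into a Map (with the 1.x-era Java client that is typically what SearchHit.getSource() returns; check your client version). The class and method names are hypothetical.

```java
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

class ImageHitSketch {

    /**
     * Hypothetical helper: given the _source of one search hit, parsed into a map,
     * returns the original image bytes stored Base64-encoded under "content".
     */
    static byte[] imageBytesFromSource(Map<String, Object> source) {
        Object content = source.get("content");
        if (!(content instanceof String)) {
            throw new IllegalArgumentException("hit has no Base64 \"content\" field");
        }
        return Base64.getDecoder().decode((String) content);
    }

    public static void main(String[] args) {
        // Stand-in for a hit's _source: the same fields that were sent at index time.
        Map<String, Object> source = new HashMap<>();
        source.put("text", "invoice 2013 total due");
        source.put("content", Base64.getEncoder().encodeToString(new byte[] {1, 2, 3}));

        byte[] image = imageBytesFromSource(source);
        System.out.println(image.length + " bytes recovered");
    }
}
```

The query itself only needs to target "text"; "content" simply comes back with the rest of the _source and is decoded on the client.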
As shown below, I provide two fields, "text" and "content". In your
second post, you say I don't need the "content" field for images? S
Binh, thanks. With your help I think I am closer to the answer. With the
sample mapping you provided, I should be able to provide the Base64
contents of the image file as the "content" field and the OCR text as the
"text" field. So, when the OCR text is searched, I can return the "content"
field, which is
Oh, the attachment part is for your PDFs. If you don't need to index PDFs,
then just remove that part:
PutMappingResponse putMappingResponse = new PutMappingRequestBuilder(client.admin().indices())
        .setIndices(INDEX_NAME)
        .setType(DOCUMENT_TYPE)
        .setSource(
        XCont
Thanks. It sounds like you are treating it as an attachment. In your
example, what is "fileContents" in .field("content", fileContents)?
How do I get the file contents of an image? I know that in the case of a
PDF, this is the content text of the PDF.
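For an image, a reasonable choice for "fileContents" is the file's raw bytes encoded as Base64, the usual way to carry binary data inside JSON. The sketch below assumes that approach and the field names used in this thread ("text" for the OCR output, "content" for the image); the class name and buildImageDocument are hypothetical. The resulting Map could then be handed to an index request (for example via setSource on an IndexRequestBuilder in the 1.x-era Java client; check your version).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

class ImageDocumentSketch {

    /**
     * Hypothetical helper: builds the source document for one scanned image.
     * The OCR text goes into the searchable "text" field; the raw image bytes
     * go into "content" as a Base64 string.
     */
    static Map<String, Object> buildImageDocument(Path imageFile, String ocrText)
            throws IOException {
        byte[] imageBytes = Files.readAllBytes(imageFile);              // raw image bytes
        String base64 = Base64.getEncoder().encodeToString(imageBytes); // JSON-safe text form

        Map<String, Object> source = new LinkedHashMap<>();
        source.put("text", ocrText);   // what users search against
        source.put("content", base64); // what gets returned and decoded after a hit
        return source;
    }

    public static void main(String[] args) throws IOException {
        // Tiny stand-in "image": the first four bytes of a PNG header.
        Path tmp = Files.createTempFile("scan", ".png");
        Files.write(tmp, new byte[] {(byte) 0x89, 0x50, 0x4E, 0x47});

        Map<String, Object> doc = buildImageDocument(tmp, "invoice 2013 total due");
        System.out.println(doc.get("text"));
        System.out.println(doc.get("content"));
        Files.delete(tmp);
    }
}
```

Keeping the OCR output in its own field means the text is what gets searched, while the Base64 string only needs to be stored and returned.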
Correct, I don't want to index the image binary,
You can certainly add a new field and then just put the OCR text into
that new field. For example:
Mapping:
PutMappingResponse putMappingResponse = new PutMappingRequestBuilder(client.admin().indices())
        .setIndices(INDEX_NAME)
        .setType(DOCUMENT_TYPE)
        .setSource(
I index PDFs using Apache with the following mapping:
.field("type", "attachment")
.field("fields")
.startObject()
    .startObject("file")
        .field("store", "yes")
    .endObject()
I want to index photos. I am able to extract text from them using OCR, but
I am confused about how to index the text. Do I treat