Indexing Binary vs text

2014-03-27 Thread IronMan2014
I have a couple of simple questions that I would like to clear up: #1: For transportClient & a cluster of two hosts: Do I have to add both hosts to the client, or is it enough to add just one of them and the yml(s) will take care of the clustering? .addTransportAddress(new InetSocketTransportAddres

Re: indexing binary

2014-02-27 Thread Binh Ly
When you do a search, it will return your full _source document by default. If you supplied a value for the text field at index time, then the text field is included in the returned _source. If you supply some other field at index time, then that field will also be returned from the _source. The

Re: indexing binary

2014-02-27 Thread ZenMaster80
Sorry for the confusion - I do want PDFs, but I am concerned with the retrieval of the image file when its OCR text is searched. I must be missing something. As shown below, I provide two fields, "text" and "content". In your second post you say I don't need the "content" field for images? S

Re: indexing binary

2014-02-27 Thread ZenMaster80
Binh, thanks. With your help I think I am closer to the answer. With the sample mapping you provided, I should be able to provide the Base64 contents of the image file as the "contents" field, and the OCR text as the "text" field. So, when the OCR text is searched, I can return the "content" which is
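A minimal sketch of the idea in this message, assuming Java 8 or later (for java.util.Base64): the image bytes are Base64-encoded into a "content" field and the OCR output goes into a "text" field. The class name ImageDocument and its helper methods are hypothetical, and the JSON is built by hand only to keep the example self-contained; in the thread itself the document is built with the Elasticsearch client's builder.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

// Hypothetical helper, not part of any Elasticsearch API.
public class ImageDocument {

    // Base64-encode the raw bytes of the image file.
    static String encodeContents(byte[] fileBytes) {
        return Base64.getEncoder().encodeToString(fileBytes);
    }

    // Escape the characters that JSON does not allow inside a string literal.
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Build a document with the OCR text in "text" and the encoded image in "content".
    static String toJson(String ocrText, byte[] fileBytes) {
        return "{\"text\":\"" + escapeJson(ocrText) + "\","
             + "\"content\":\"" + encodeContents(fileBytes) + "\"}";
    }

    public static void main(String[] args) throws Exception {
        // Read the image from the path given on the command line,
        // or fall back to a few sample bytes so the sketch runs as-is.
        byte[] bytes = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : new byte[] {1, 2, 3};
        System.out.println(toJson("text extracted by OCR", bytes));
    }
}
```

A search that matches the "text" field then returns the whole _source, so the client can Base64-decode "content" to recover the image. Storing large images this way makes every hit heavier; whether that is acceptable depends on the image sizes involved.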

Re: indexing binary

2014-02-27 Thread Binh Ly
Oh, the attachment part is for your PDF. If you don't need to index PDFs then just remove that part: PutMappingResponse putMappingResponse = new PutMappingRequestBuilder( client.admin().indices()). setIndices(INDEX_NAME).setType(DOCUMENT_TYPE).setSource( XCont

Re: indexing binary

2014-02-27 Thread ZenMaster80
Thanks, it sounds like you are treating it as an attachment. In your example, what is the "fileContents" in .field("content", fileContents)? How do I get the file contents of an image? I know that in the case of the PDF, this is the text content of the PDF. Correct, I don't want to index the image binary,

Re: indexing binary

2014-02-27 Thread Binh Ly
You certainly can add a new field, and then just put the OCR text into that new field. So for example: Mapping: PutMappingResponse putMappingResponse = new PutMappingRequestBuilder( client.admin().indices()).setIndices(INDEX_NAME).setType(DOCUMENT_TYPE).setSource(

indexing binary

2014-02-26 Thread ZenMaster80
I index PDFs using Apache with the following mapping. .field( "type", "attachment" ) .field("fields") .startObject() .startObject("file") .field("store", "yes") .endObject() I also want to index photos; I am able to extract text using OCR. I am confused about how to index the text, though: do I treat