Hi

I am new to Elasticsearch and am trying out the attachment plugin. I'm a 
bit confused about how to handle the metadata from the attachments.

I have created a simple mapping as an example. I explicitly store the 'title' 
field; the other fields are not stored by default.

PUT /test/file/_mapping
{
  "file" : {
      "properties": {
          "content" : { 
               "type" : "attachment",
                "fields" : {
                   
                    "title" : {
                        "index": "analyzed", 
                        "store" : "yes"
                    },

                    "content_type" : {
                        "store" : "no"
                    }
                }
          }
      }
  }
}

This is the mapping as returned by Elasticsearch:

{
   "test": {
      "mappings": {
         "file": {
            "properties": {
               "content": {
                  "type": "attachment",
                  "path": "full",
                  "fields": {
                     "content": {
                        "type": "string"
                     },
                     "author": {
                        "type": "string"
                     },
                     "title": {
                        "type": "string",
                        "store": true
                     },
                     "name": {
                        "type": "string"
                     },
                     "date": {
                        "type": "date",
                        "format": "dateOptionalTime"
                     },
                     "keywords": {
                        "type": "string"
                     },
                     "content_type": {
                        "type": "string"
                     },
                     "content_length": {
                        "type": "integer"
                     }
                  }
               }
            }
         }
      }
   }
}


Example query:

GET /test/file/_search
{
 "fields": [
    "*", "content.content_type" 
 ],  
    "query": {
        "match": {       
           "content.content_type": "xhtml test document"
        }
    }
} 

Response:
{
   "took": 13,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.26574233,
      "hits": [
         {
            "_index": "test",
            "_type": "file",
            "_id": "3SmwWJe6TtiP0nheD6pFCg",
            "_score": 0.26574233,
            "fields": {
               "content.content_type": [
                  "...PCEtLQogTGljZW5zZWQgdG8gdGhlI..."
               ],
               "content.title": [
                  "XHTML test document"
               ]
            }
         }
      ]
   }
}


So I am able to query on the "content_type" field, but in the response I 
get the base64 representation of the attachment instead of 
"application/xhtml+xml".
Do I really need to store each metadata field for my attachment? I was 
under the impression that Elasticsearch would extract the field from the 
_source at runtime (or would this cause too much overhead?)
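
To double-check that the value returned for "content.content_type" really is 
the attachment itself, the visible part of the base64 string can be decoded. 
This is only a minimal Python sketch: it assumes the fragment shown between 
the ellipses starts on a 4-character base64 boundary, and it trims the 
trailing character so the length is a multiple of 4.

```python
import base64

# Visible part of the "content.content_type" value from the response,
# trimmed to a multiple of 4 characters so it decodes without padding.
# Assumption: the fragment starts on a 4-character base64 boundary.
fragment = "PCEtLQogTGljZW5zZWQgdG8gdGhl"

# Decode the fragment and show it with escapes visible.
decoded = base64.b64decode(fragment).decode("ascii")
print(repr(decoded))
```

The decoded text looks like the start of a comment inside the document, 
not a MIME type, which matches what I see in the response.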

Thx,
Tom
