Indexing new document and check version

2015-04-23 Thread Tomáš Jurák
Hi, when I create a new document and call prepareIndex it overwrites an existing one (which was created just a nanosecond earlier by another thread). Is it possible to use ES versions to control this behavior like when doing updates? The problem is that when the document exists I can receive its versio

Bulk indexing creates a lot of disk read OPS

2015-04-23 Thread Eran
Hello, I've created an index I use for logging. This means there are mostly writes, and some searches once in a while. In the phase of the first loading, I'm using several clients to concurrently index documents using the bulk API. At first, indexing takes 200 ms for a bulk of 5000 documents. A

Re: Combining several documents in a terms filter

2015-04-23 Thread Bruno Miranda
Is there an upper limit of bool filters? On Friday, April 17, 2015 at 10:18:38 PM UTC-7, vineeth mohan wrote: > > Hello Daniel , > > Feel free to use should clause in bool filter > > . > Here you can give mul

Re: FreeBSD 10.1 install elasticsearch plugin fails

2015-04-23 Thread David Pilato
Ok but this mapper attachment version does not exist. Just out of curiosity, could you give a link which shows that? > The elasticsearch version in FreeBSD is 1.43, match mapper attachment 2.43. > -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs > On 24 Apr 2015 at 05:49, Pccom F

Re: FreeBSD 10.1 install elasticsearch plugin fails

2015-04-23 Thread Pccom Frank
I am using FreeBSD; there is no such thing as bin/plugin, only elasticsearch-plugin functioning as bin/plugin, I guess. It won't follow the official doc. The elasticsearch version in FreeBSD is 1.43, match mapper attachment 2.43. On Apr 23, 2015 2:39 AM, "David Pilato" wrote: > The command you wri

Re: Elasticsearch crashed after start

2015-04-23 Thread Mark Walkom
What do the logs show? On 24 April 2015 at 12:03, Ann Yablunovskaya wrote: > Hi! > > I don't understand what happened. > OS CentOS 7.1 > I have an ES cluster with two servers. > They have the same configuration. > > I tried to configure Shield and Marvel but my second ES instance has > suddenly cras

Geo Mapping from Twitter

2015-04-23 Thread Sree
Hi all, "coordinates" : { "type" : "Point", "coordinates" : [ 100.41404641, 5.37384675 ] }, This is the Geo coordinates from Twitter. I tried it with "coordinates": {"properties": { "coordinates": {"type": "geo_point", "lat_lon": true, "geoha

Re: term and string

2015-04-23 Thread Jason Wee
Yep, that helps, thanks Doug! :-) On Thu, Apr 23, 2015 at 10:56 PM, Doug Turnbull wrote: > A term in a purely technical sense is an entry in the inverted index. > Technically it is a very low-level entity. > > For example, if you tokenized and analyzed doc1: "Dougie Turnbull" using the > English

Elasticsearch crashed after start

2015-04-23 Thread Ann Yablunovskaya
Hi! I don't understand what happened. OS CentOS 7.1 I have an ES cluster with two servers. They have the same configuration. I tried to configure Shield and Marvel but my second ES instance has suddenly crashed. [root@server bin]# ./elasticsearch -v Version: 1.5.1, Build: 5e38401/2015-04-09T13:41:

Re: Issue when MatchPhasePrefix and Sort

2015-04-23 Thread Jason Wee
Hmmm, why would a field be that large? At 594mb? Jason On Fri, Apr 24, 2015 at 2:11 AM, TB wrote: > When executing a search with MatchPhasePrefix on a property which is a large > string, > the search fails with an error: > "Data too large, data for [Field] would be larger than limit of > [623326003/

Re: Evaluating Moving to Discourse - Feedback Wanted

2015-04-23 Thread Jack Park
I've long wondered how it is that the ElasticSearch tribe could not build something within its own community, a platform that uses ElasticSearch to provide the UX being discussed here (a term occasionally used for that is "eat your own dog food"). Just thinkin... On Thu, Apr 23, 2015 at 8:18 AM,

Re: murmur3 field type and doc_values

2015-04-23 Thread Drew
I'm answering my own question here since I came across this issue on GitHub. https://github.com/elastic/elasticsearch/issues/10465 So the correct field options for a murmur3 field are: "index" : "no", "doc_values" : true - Drew On Wednesday, March 25, 2015 at 4:28:25 PM UTC-7, Drew wrote: > > H
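The field options quoted above can be written out as a mapping fragment. A minimal sketch in Python, assuming a hypothetical field name (`body_hash`); only the `murmur3` type and the `"index": "no"`, `"doc_values": true` options come from the message, and the surrounding `properties` wrapper is the usual mapping shape rather than something the thread shows:

```python
import json


def murmur3_field_mapping(field_name):
    """Build a mapping fragment for a murmur3 field.

    The field name is hypothetical; the two options are the ones
    quoted in the message ("index": "no", "doc_values": true).
    """
    return {
        "properties": {
            field_name: {
                "type": "murmur3",
                "index": "no",       # do not index the hash value itself
                "doc_values": True,  # keep the hash in doc values
            }
        }
    }


mapping = murmur3_field_mapping("body_hash")
print(json.dumps(mapping, indent=2))
```

The printed JSON is what would go into the body of a put-mapping request for the relevant type.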

Count, Aggregation, Frequency: Bit Stumped

2015-04-23 Thread rolandino
Hi, Wonder if anyone can help? I have the following query: { "from": 0, "sort": [ "_score" ], "fields": [ "id", "title", "text" ], "query": { "query_string": { "fields": [ "title", "text" ], "query": "(\"green socks\" OR \"red soc

Updating/upserting using Cascading/Scalding with a pre-defined _id field

2015-04-23 Thread Andres Perez
Hi. I've been creating Scalding Taps to read/write to elasticsearch (using cascading EsTaps underneath), and everything has been going fine when writing to documents using insert operations (es.write.operation=ES_OPERATION_INDEX). I am writing to a preexisting resource that I created through a

ES heavy write & index creation

2015-04-23 Thread Cristian Makoto Sandiga
Hi, recently I was searching for a real-time analytics tool, and I found ES. Basically we want to show aggregate information for multiple customers, and this information has to be shown in real time. The information expires after one month. Basically we want to use aggregation for analytics.

Unassigned Primary and Replica Shard

2015-04-23 Thread ariel
Hello, I have a 4 machine cluster (2 data nodes and 2 master nodes) and some time last night I had a failover from one master to the other. At this time a number of things happened: 1. Original master came back up but did not join the cluster 2. Shards for about 5 indexes are unassigned

Re: SHIELD terms lookup filter : AuthorizationException BUG

2015-04-23 Thread Jay Modi
Hi Bert, I don't know of a workaround to accomplish this in a single query right now. We have been discussing how to fix this issue in depth over the past few days and have ideas on how to move forward but no timeline on it being resolved. Regarding support contracts and fixes, I'm going to de

Issue when MatchPhasePrefix and Sort

2015-04-23 Thread TB
When executing a search with MatchPhasePrefix on a property which is a large string, the search fails with an error: "Data too large, data for [Field] would be larger than limit of [623326003/594.4mb]] ElasticsearchException[java.lang.OutOfMemoryError: Java heap space] Need help.

Re: Elasticsearch ingest performance

2015-04-23 Thread Kimbro Staken
I think much of your problem at this stage is YCSB. It doesn't sound like it's even close to pushing the limits of your cluster. If you just want synthetic data makelogs[1] can be used to generate data that is a little more real world looking and it uses bulk requests. A single running instance of

Re: Elasticsearch ingest performance

2015-04-23 Thread Dave Erickson
No idea how many shards you need. Try 10, 15, 20, 25 and see how the numbers coming from YCSB and system stats change. Wow, that github repo hasn't been touched in 3 years. Elasticsearch and the java client for Elasticsearch have probably changed a bit since then, so be careful about what you rea

Re: WordCloud in Elasticsearch

2015-04-23 Thread Alfredo Serafini
Hi Jeff IMHO a wordcloud visualization is simple to construct over facets, so if you have aggregations which count how many documents you have for every term, this is probably the simplest way to construct it. If you want to use the term vectors it's important to understand what you want to

Re: Evaluating Moving to Discourse - Feedback Wanted

2015-04-23 Thread Alfredo Serafini
Hi I agree with Jorge 100%: adopting a license which gives people the freedom to construct datasets would be great. As I already suggested to Leslie too, I think one of the most interesting things could indeed be releasing the dataset of the conversations, and constructing some examples over it :-

Re: term and string

2015-04-23 Thread Doug Turnbull
A *term* in a purely technical sense is an entry in the inverted index. Technically it is a very low-level entity. For example, if you tokenized and analyzed doc1: "Dougie Turnbull" using the English analyzer (which stems words to root forms, lowercases, etc), you'd get an inverted index that look
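To illustrate what "an entry in the inverted index" means, here is a toy sketch in Python. It is an assumption-laden simplification: it only lowercases and splits on whitespace, so it does not reproduce the stemming the English analyzer performs, and the second document is invented purely for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids containing it.

    Analysis here is just lowercasing plus whitespace splitting, a
    stand-in for a real analyzer (no stemming, no stop words).
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# doc1 is the example from the message; doc2 is hypothetical.
docs = {"doc1": "Dougie Turnbull", "doc2": "Doug Turnbull"}
index = build_inverted_index(docs)
for term in sorted(index):
    print(term, "->", index[term])
```

Each key of `index` is a term in the low-level sense described above, whereas "string" is the type of the field in the mapping before analysis.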

Re: Elasticsearch ingest performance

2015-04-23 Thread Milind Shah
Thank you all for your inputs. I am working with Brian on this exercise as well. Let me try answering some of the questions. CPU usage: There are only 2 cores of CPU in use. When I monitor the disk usage, I notice that for every 2 minutes or so, the disk usage goes close to 100% for about 10 se

term and string

2015-04-23 Thread Jason Wee
Can anybody explain what the difference is between term and string in the elasticsearch context? When we index using default mapping (http://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html), the default type is string. But when we query, we use the word term (htt

one node gets no data, without errors in the logs

2015-04-23 Thread Vadim Kisselmann
Hi folks, we use elasticsearch 0.19.11 for one of our old clusters in AWS (I cannot upgrade). The load was rising, so I added two new nodes to the existing 3 node cluster with the same config setup (rolled out with chef, identical). All nodes are up, cluster is green. Node 5 gets some shards, but

Sort Results by category and occurrences in other indexes

2015-04-23 Thread James Radke
Good morning, I am new to ElasticSearch, and have the basics of querying, sorting, etc. working with some test data. I have a concept that I would like to solve, but am not really sure how to do it, so I thought I would post here and see if you all can help me figure this out. I have the f

Adding a synchronisation key to elasticsearch.

2015-04-23 Thread James
Hi, I have a system where I grab lots of XML files throughout the day. I keep a "synced.xml" and a load of datestamped files too. Synced.xml is the latest file which has been synced with elasticsearch. I run a program which works out the difference between the latest timestamped file and "sy

WordCloud in Elasticsearch

2015-04-23 Thread Jeff Fogarty
I'm looking to create a wordcloud in Jupyter (IPython Notebook) using either Python or JavaScript. I have a collection of Presidential speeches from the millercenter.org loaded into ES. I'm able to execute a termvector query which returns the below; term term_freq ttf doc_freq I

Re: Users data flow

2015-04-23 Thread Zaid Amir
Thanks for the reply, yes I know I exaggerated a bit there :) but to be honest finding a starting point for my cluster is driving me nuts. And how many shards/index is just not clear. Now I know each shard is a Lucene instance and having many instances running on a single node is a bad practice

Re: Elasticsearch ingest performance

2015-04-23 Thread dave
Just some thoughts. Yeah, with 16 cores per machine and 10 machines having 5 shards per index is probably too low. What are your system metrics telling you? Are the CPUs idle? What does the CPU I/O wait look like? Are you doing single index operations or batch index operations with YCSB? A

Re: Users data flow

2015-04-23 Thread christian . dahlqvist
Hi, If I have calculated correctly, that corresponds to about 238TB of raw data. If this is the size of JSON documents being indexed in Elasticsearch, you will definitely need more than 2 nodes. The good thing about using aliases the way David describes is that you will not need to put all use

Re: bulk index request dataloss

2015-04-23 Thread joergpra...@gmail.com
With the JDBC plugin, you should slightly increase the requests per bulk request ("maxbulkactions") in order to keep your concurrent bulk requests low enough to get handled by ES. The ES bulk thread pool default setting is ok. Please avoid a change. Jörg On Thu, Apr 23, 2015 at 12:20 PM, mzrth_

Re: bulk index request dataloss

2015-04-23 Thread David Pilato
It would mean that you are going to accumulate up to 1000 requests of 2500 docs at a time in memory. That could be a lot. You need to monitor that. That’s a lot of objects that might be GCed at some point. If your bulk request is rejected, why not try to slow down the injection rate instead of f

Re: bulk index request dataloss

2015-04-23 Thread mzrth_7810
Turns out it was because the bulk thread pool queue size was too small; any new requests were being rejected. Is it common to set threadpool.bulk.queue_size to something like 1000? On Tuesday, 7 April 2015 11:10:33 UTC+1, Jörg Prante wrote: > > Do you evaluate the bulk request responses? > > J

Re: Users data flow

2015-04-23 Thread Zaid Amir
Thanks for the explanation, it is clear now. Now for the other part of my question. Let's assume that I am expecting this index to hold data for 1000 users. Each user will have 500,000 documents and each document will be 512KB. Now, these documents are pure text files. And let's say that my query

Re: maxDocs different between primary and replica shards

2015-04-23 Thread chris
Thanks for your reply Christian. This helped a lot and I eventually learned that this is called the "bouncing results" problem: http://www.elastic.co/guide/en/elasticsearch/guide/current/_search_options.html#_preference I don't quite understand why it isn't lower cost to make each shard identi

Re: Elasticsearch ingest performance

2015-04-23 Thread Michael McCandless
You can try the ideas here too: https://www.elastic.co/blog/performance-considerations-elasticsearch-indexing Mike McCandless On Wed, Apr 22, 2015 at 8:00 PM, Kimbro Staken wrote: > Hello Brian, > > Many things will affect the rate of ingest, the biggest one is making sure > the load gets sprea

Re: Users data flow

2015-04-23 Thread David Pilato
Aliases help to avoid developer bugs! Basically, imagine you forgot to apply the filter in one of your queries… Your user will see everything. Also, aliases might help you to secure your access to users' data. If you are using Nginx or Shield, you can say that this user A has only access to loca

Re: Field Data Cache Size and Eviction

2015-04-23 Thread Jason Wee
A bit late since the OP posted this, not sure if it is still relevant but anyway... >> Under what circumstances will an ES node evict entries from its field data cache? We're also deleting documents from the index, can this have an impact? What other things should I be looking at to find a corr

Re: Users data flow

2015-04-23 Thread Zaid Amir
So then what is the benefit of using aliases as opposed to using one index and filtered queries? From what I've read, aliases and routing can give a boost in queries since the index knows on which shards the documents are located, but now you are saying that it does not matter since users' data

Re: 30 billion unique documents (and counting)

2015-04-23 Thread Alexandre Heimburger
Regarding 1 index per month, what about search performance when searching across 2 or 3 years? On Thursday, 23 April 2015 03:20:33 UTC+2, Kimbro Staken wrote: > > Running ES at scale is all about balance and sizing right. Like the 3 > bears, not too big and not too small, just right. Big boxes will

more_like_this POST/JSON equivalent of GET

2015-04-23 Thread Aleem Bawany
Hello, my first day with ElasticSearch and it's wonderful so far. I have the following query which works just fine: curl -XGET ' http://localhost:9200/demo/news/1177421/_mlt?mlt_fields=title,content&fields=id&min_doc_freq=1&pretty=true ' But the following JSON version returns incorrect results:

Re: Users data flow

2015-04-23 Thread Zaid Amir
So then what is the benefit of using aliases as opposed to using one index and filtered queries?

Re: Grouping/extracting results uploaded to Elasticsearch

2015-04-23 Thread KT SSP
When you say "split the fields out", do you mean take "__AAACC__" and have it post multiple entries, one for each 60 second snapshot. If so, then that is what is already being done, perhaps I should have explained that better. Each character was meant to represent a single entry

Re: Users data flow

2015-04-23 Thread David Pilato
You don’t need one shard per user unless each user has a very big amount of data. Using routing is good as all documents for a given user will go to the same shard. But also documents from other users will go to that shard. That’s not an issue. Use filters to filter your user data based on their

Re: Cannot read from Elasticsearch using Spark SQL

2015-04-23 Thread michele crudele
Thanks Costin. On Monday, 20 April 2015 12:38:30 UTC+2, Costin Leau wrote: > > Beta3 works with Spark SQL 1.0 and 1.1 > Spark SQL 1.2 was released after that and broke binary backwards > compatibility however this has been fixed in master/dev > version [1] > Note that Spark SQL 1.3

Users data flow

2015-04-23 Thread Zaid Amir
Hi, I am trying to figure out the best way to design my ES cluster. Currently my search service is subscription based and each user can only search his own data. So looking around I found several examples about users data flow and the way of using aliases and it's all straightforward. One th

Re: Master node refuse to accept its role

2015-04-23 Thread David Pilato
So could you then add the other settings you defined one by one and report here which setting is causing your initial issue? David > On 23 Apr 2015 at 09:17, Zaid Amir wrote: > > I tried your configuration, and although the master node was able to start, > the nodes are now unable to see each

Re: Master node refuse to accept its role

2015-04-23 Thread Zaid Amir
I tried your configuration, and although the master node was able to start, the nodes are now unable to see each other. So the data node is now reporting that it cannot see any master nodes and it's failing to start. On Wednesday, April 22, 2015 at 4:01:12 PM UTC+3, David Pilato wrote: > > I just ra