Re: UNASSIGNED indexes

2015-04-07 Thread Aaron Mefford
I have had experience with such but _not_ without data loss. The reality is that some data loss has already occurred. I am not aware of any ES solution that will allow you to retrieve what data remains, without further data loss, and restore the index to green status. I have seen reference to so

Re: Is there a way to do scan with limit?

2015-04-03 Thread Aaron Mefford
Unless I am mistaken, that is already the job of the scan query: reducing the load caused by sorting a query. There would be no reduction from limiting the result set. As I understand it, the scan query identifies which shards have results, and then just starts serving the first results first without

Re: ES/Lucene eating up entire memory!

2015-04-03 Thread Aaron Mefford
:57 AM UTC-6, Yogesh wrote: > > Thanks Aaron. Your post was very informative. > Can you recommend any blogposts, articles etc. where I could read more on > this topic? > > Thanks again for your help. > > On Tuesday, March 31, 2015 at 9:57:58 PM UTC+5:30, Aaron Mefford wrote

Re: alias not indexing all documents

2015-04-03 Thread Aaron Mefford
Can you share what you are trying to accomplish with the now()? There may be an alternative approach. It may make sense to cron a modification to the alias, such that the now value is periodically updated. On Thursday, April 2, 2015 at 1:03:59 PM UTC-6, Stefanie wrote: > > Hi, > > I have found wh

Re: UNASSIGNED indexes

2015-04-03 Thread Aaron Mefford
Noticed this happening on a cluster this week which had reached 85%, the full disk watermark. On Thursday, April 2, 2015 at 3:29:18 PM UTC-6, Mark Walkom wrote: > > Take a look in your ES logs, it should have something of use. > > You can also try dropping the replicas to 0 for the indices that a

Re: elasticsearch does not working with number match

2015-04-03 Thread Aaron Mefford
You are likely doing string matching because the data was ingested as a string. Take a look at the following; they should clear things up for you. http://www.elastic.co/guide/en/elasticsearch/guide/master/mapping-intro.html http://www.elastic.co/guide/en/elasticsearch/reference/1.x/mapping.html h

Re: Is there a way to do scan with limit?

2015-04-03 Thread Aaron Mefford
Is there a reason not to control the limit in your code? On Thursday, April 2, 2015 at 6:31:07 PM UTC-6, Chen Wang wrote: > > I want to for example, fast get 1m out of 5m records. > I am currently using: > > SearchResponse scrollResp = this.client > .prepareSearch(esQuery.indices) > > .addField
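The suggestion above, controlling the limit in the calling code, can be sketched as follows. This is a minimal illustration in Python rather than the Java client used in the quoted message. `fetch_page` is a hypothetical callable standing in for "get the next scroll batch", and `make_fake_scroll` is an invented stand-in for a real cluster; neither is part of any Elasticsearch API. The numbers are scaled down from the 1m-of-5m case.

```python
from itertools import islice
from typing import Callable, Iterator, List


def iter_hits(fetch_page: Callable[[], List[int]]) -> Iterator[int]:
    """Yield hits one by one, pulling a new page only when needed."""
    while True:
        page = fetch_page()
        if not page:  # an empty page means the scroll is exhausted
            return
        yield from page


def take(fetch_page: Callable[[], List[int]], limit: int) -> List[int]:
    """Stop consuming the scroll as soon as `limit` hits are collected."""
    return list(islice(iter_hits(fetch_page), limit))


def make_fake_scroll(total: int, page_size: int) -> Callable[[], List[int]]:
    """Hypothetical stand-in for a scroll: serves `total` hits in pages."""
    state = {"next": 0}

    def fetch_page() -> List[int]:
        start = state["next"]
        end = min(start + page_size, total)
        state["next"] = end
        return list(range(start, end))

    return fetch_page


hits = take(make_fake_scroll(total=50, page_size=7), limit=20)
print(len(hits))
```

Because the generator is lazy, no further pages are requested once the limit is reached; at most one partially used page is fetched beyond what was needed.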

Re: CPU 100% utilization

2015-04-03 Thread Aaron Mefford
That's a lot of threads for 1G of memory. On Friday, April 3, 2015 at 4:55:20 AM UTC-6, cyrilforce wrote: > > Hi, > > Having to test the performance of the ES with some load testing it > reaches 100% CPU utilization and following is hot threads : > > ES configuration > --

Re: Prefix Query Result Problem.

2015-04-03 Thread Aaron Mefford
Have you tried Countries.Name.Untouched? On Friday, April 3, 2015 at 9:06:17 AM UTC-6, James Crone wrote: > > I have try analyzer on specific index field by creating multifield. And it > looks like: > "Countries" : { >"Properties" : { > "Name" : { > "

Re: what are the research papers that ES relies on?

2015-03-31 Thread Aaron Mefford
y awesome. > > Thanks > > On Tuesday, 31 March 2015 at 00:42:45 UTC+3, Aaron Mefford wrote: > >> I understand that if you do not have sufficient storage space, then you >> cannot manage a replica on every node. However, you are not limited to the >> size of a "

Re: elasticsearch high cpu usage every hourly

2015-03-31 Thread Aaron Mefford
From what I can see in your graphs I noticed two things. You seem to have a spike in search requests at that time, a spike in http traffic, and a cache eviction right at the beginning of it. Are you certain you don't have an external user with a cron job that runs at the top of the hour? P

Re: ES/Lucene eating up entire memory!

2015-03-31 Thread Aaron Mefford
You need to read up a bit on how memory is allocated in Linux. On an ElasticSearch or database server (this seems to be both), you want that free column to be 0. All available free memory should be used to cache files. In your snapshot you have 35GB of file cache listed under the cached headi

Re: Assigning, or just Deleting shards.

2015-03-30 Thread Aaron Mefford
In a single node cluster set the replica count to 0. The shards will clean themselves up automatically. Don't forget to do regular backups as you have no redundancy. On Friday, March 27, 2015 at 1:15:38 PM UTC-6, avery...@insecure-it.com wrote: > > I have an elasticsearch host (single) that h
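As a rough illustration of setting the replica count to 0, the sketch below builds (but does not send) an update-settings request. The base URL `http://localhost:9200` and the index name `my_index` are assumptions for the example, not values from the thread; the request body uses the `index.number_of_replicas` setting.

```python
import json
import urllib.request


def build_zero_replica_request(base_url: str, index: str) -> urllib.request.Request:
    """Build a PUT <index>/_settings request that sets replicas to 0."""
    body = json.dumps({"index": {"number_of_replicas": 0}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed host and index name, for illustration only.
req = build_zero_replica_request("http://localhost:9200", "my_index")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen(req)` would send it; that step is left out here so the sketch runs without a cluster.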

Re: what are the research papers that ES relies on?

2015-03-30 Thread Aaron Mefford
magical ways that ES uses rather than lucene has its own. > > On Monday, 30 March 2015 at 18:55:49 UTC+3, Aaron Mefford wrote: >> >> "Automagic" routing happens already on hashing the document id. It >> sounds like you may have a situation where your documen

Re: what are the research papers that ES relies on?

2015-03-30 Thread Aaron Mefford
"Automagic" routing happens already on hashing the document id. It sounds like you may have a situation where your document id is creating a hot spot. This being the case what you want is not automagic routing but more control over the routing or a better document id. There is the ability to

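The hot-spot effect described in the message above can be sketched with a toy model of hash-based routing. This is only an illustration: `zlib.crc32` is a stand-in hash chosen for the example, not the hash function Elasticsearch uses, and `shard_for` is a hypothetical helper.

```python
import zlib
from collections import Counter


def shard_for(routing_value: str, num_shards: int) -> int:
    """Toy routing: hash the routing value and take it modulo the shard count."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


NUM_SHARDS = 5

# Distinct ids: documents are spread across the shards by the hash.
spread = Counter(shard_for(f"doc-{i}", NUM_SHARDS) for i in range(10_000))

# One shared routing value: every document lands on the same shard.
hot = Counter(shard_for("customer-42", NUM_SHARDS) for _ in range(10_000))

print(sorted(spread.items()))
print(sorted(hot.items()))
```

When many documents share one routing value, they all map to a single shard, which is the kind of hot spot that more control over routing, or a better document id, is meant to avoid.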
Re: ES&Lucene 32GB heap myth or fact?

2015-03-27 Thread Aaron Mefford
I think part of what you may be missing is the intent that ElasticSearch be scaled out rather than up. There are other issues that occur when you scale up instead of out, the first of which is that losing a single node of your cluster can be disastrous. It is also generally far more expensive

Re: Re[4]: Elasticsearch - node client does not connect to cluster

2015-03-17 Thread Aaron Mefford
t to understand how to make node client work. > > > Tuesday, 17 March 2015, 11:26 -06:00, from Aaron Mefford >: > > This is what I use in my code, not sure how correct it is given the > abysmal state of the Java API documentation. > > import or

Re: Re[2]: Elasticsearch - node client does not connect to cluster

2015-03-17 Thread Aaron Mefford
This is what I use in my code, not sure how correct it is given the abysmal state of the Java API documentation. import org.elasticsearch.common.settings.Settings; import org.elasticsearch.common.settings.ImmutableSettings; import org.elasticsearch.client.Client; import org.elasticsearch.clien

Re: Snapshot Scaling Problems

2015-03-16 Thread Aaron Mefford
some script? > > > On Friday, March 13, 2015 at 11:52:59 AM UTC-5, Aaron Mefford wrote: >> >> Yes it was m1.smalls that I first noticed the EBS throttling on. Things >> work well in bursts, but sustained EBS does not work well. It will work >> substantially better

Re: Analyzers and JSON

2015-03-13 Thread Aaron Mefford
Well.. I think I may see your issue. I decoded this string: L2hvbWUvYWhhcm1vbi90ZXN0L0EgUGx1cyAtIE1lZGlhIFBsYW4gU3VtbWFyeS54bHM= It is: /home/aharmon/test/A Plus - Media Plan Summary.xls Another is: /home/aharmon/test/A Plus - Summary by Venue.pdf I think you misunderstand the purpose or how
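The decoding step mentioned above can be reproduced with Python's standard library. The encoded string is the one quoted in the message; the variable names are just for illustration.

```python
import base64

# Base64 string quoted in the message above.
encoded = "L2hvbWUvYWhhcm1vbi90ZXN0L0EgUGx1cyAtIE1lZGlhIFBsYW4gU3VtbWFyeS54bHM="

# Decode to bytes, then interpret the bytes as UTF-8 text.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)
```

The result is a file path, not file contents, which matches what the message reports finding.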

Re: Analyzers and JSON

2015-03-13 Thread Aaron Mefford
pposed to use Tika to index the content of documents but it doesn't > seem to be working correctly. I base64 encode the documents but it comes > back as null when I decode it. > On Friday, March 13, 2015 at 11:38:38 AM UTC-5, Aaron Mefford wrote: >> >> Not certain wha

Re: Analyzers and JSON

2015-03-13 Thread Aaron Mefford
ments but it doesn't > seem to be working correctly. I base64 encode the documents but it comes > back as null when I decode it. > On Friday, March 13, 2015 at 11:38:38 AM UTC-5, Aaron Mefford wrote: >> >> Not certain what you are referring to so I expect not. I have used t

Re: best practice for rebuilding an index using aliases

2015-03-13 Thread Aaron Mefford
Weird, that was the post I made yesterday morning that just now hit the list after vanishing. On Thu, Mar 12, 2015 at 10:21 AM, wrote: > I switched to using aliases about a year ago and I love it. I am able to > rebuild in the background and make a clean cutover once the process > completes. > >

Re: Snapshot Scaling Problems

2015-03-13 Thread Aaron Mefford
Yes it was m1.smalls that I first noticed the EBS throttling on. Things work well in bursts, but sustained EBS does not work well. It will work substantially better in an m3.medium and if you are using the new EBS SSD volumes. On Thu, Mar 12, 2015 at 10:30 PM, Andy Nemzek wrote: > Thank you gu

Re: Kibana with Hadoop directly?

2015-03-13 Thread Aaron Mefford
To second what Costin said, it is trivial to load the data into ElasticSearch, then you have the data ready to use the full power of Kibana. ElasticSearch is very quick to set up and scales well. Trying to go the route of rolling your own setup to generate graphs with D3 will certainly be harder,

Re: Analyzers and JSON

2015-03-13 Thread Aaron Mefford
> Do you have experience with the mapper attachment? > > On Friday, March 13, 2015 at 11:15:18 AM UTC-5, Aaron Mefford wrote: >> >> You're going to have the same issue with SOLR, putting the contents into >> XML which is even heavier than JSON. >> >> I wish that I

Re: Analyzers and JSON

2015-03-13 Thread Aaron Mefford
t would be a > good solution to my problem. > > thanks, > Austin > > On Thursday, March 12, 2015 at 4:04:29 PM UTC-5, Aaron Mefford wrote: >> >> Take a look at Apache Tika http://tika.apache.org/

Re: Analyzers and JSON

2015-03-12 Thread Aaron Mefford
Take a look at Apache Tika http://tika.apache.org/. It will allow you to extract the contents of the documents for indexing; this is outside the scope of ElasticSearch indexing. A good tool to make these files downloadable is also out of scope, but I'll answer to what is in scope. You need

Re: If I have ELK stack running on EC2. How can I make the ES as a cluster?

2014-08-03 Thread Aaron Mefford
I don't know that ES has any intelligence to support varied node sizes, so I would say yes, they should be the same size. I've not looked into this so I may be wrong. Also, I use multiple EBS volumes in a software RAID to increase non-provisioned IOPS. This is not necessary if you use PIOPS. Aaron Sent

Re: If I have ELK stack running on EC2. How can I make the ES as a cluster?

2014-07-28 Thread Aaron Mefford
There are some new options in the latest builds of ElasticSearch as I understand it that replace the old S3 Gateway. However, neither the S3 Gateway nor those others are requirements for setting up ElasticSearch on EC2. They are only disaster recovery options that will help you to get back u

Match All query performance

2014-07-06 Thread Aaron Mefford
Is there any reason that match all queries would be impacted significantly by index size? It seems that in the absence of any sort, query or other mechanism requiring scoring it should just be a matter of fetching the first document from a shard. In practice that does not seem to be the case.