Re: Text Categorization in ES

2014-02-26 Thread Dawid Weiss
Searching for Laptop will automatically give results for Dell, Sony, HP, Lenovo, Samsung... as well. Since Lingo3G is used for clustering the documents, it will store the references for the above terms as well. There is no way to get a clear, intuitive classification like this from an unsupervised

Re: Elasticsearch Reverse Suggester Problem

2014-02-26 Thread Garry Welding
Hi Nik, thanks for the suggestion. That's why I'm using the pre and post filters as I want to match the suffix of upshchair because I understand how Lucene stores terms. As such I have set up a new property called name_reverse that stores the product name as reversed tokens. I'm then trying to

Re: Elasticsearch Reverse Suggester Problem

2014-02-26 Thread Garry Welding
However, I did give it a try removing the pre filter, but it didn't change the results. On Tuesday, February 25, 2014 8:00:14 PM UTC, Nikolas Everett wrote: I believe the job of the reverse filter is to efficiently provide suggestions that share a suffix with the provided term rather than a

Re: Text Categorization in ES

2014-02-26 Thread prashy
So it means that all the classification has to be done beforehand, on the basis of a user-defined scenario, and that this is not supported automatically by either Carrot or Lingo3G in the way that features such as the word-delimiter or hunspell filters are. -- View this message in context:

Re: Text Categorization in ES

2014-02-26 Thread Hannes Korte
On 26.02.2014 08:28, prashy wrote: To be specific I want a query like: Searching for Laptop will automatically give results for Dell, Sony, HP, Lenovo, Samsung... as well. I'm not sure I got that correctly. Besides the text classification we talked about, this sentence could also mean that you

Invalid Version Format, but version same

2014-02-26 Thread Jorj Ives
Hello, I'm trying to bring a second node into my cluster. I've set it up as unicast and the two nodes are trying to communicate; however, they claim they're running incompatible versions. Node One: { "ok" : true, "status" : 200, "name" : "ES Server Node", "version" : { "number" : "0.90.11",

Index existing data and integrate elasticsearch with PHP

2014-02-26 Thread Santosh
Dear All, I have gone through the link - http://www.elasticsearch.org/guide/en/elasticsearch/client/php-api/current/_quickstart.html to understand the setup. Can someone point me to the documentation where I can 1. Index existing data from MySQL - I can index the data using curl but trying to

ES Response Time

2014-02-26 Thread prashy
Hi All, Is there any way to measure the end-to-end response time for a query in Elasticsearch? That is, the time from when the query is executed until the result is shown in the ES UI. And what does the took parameter mean in the response output?

Re: Relation Between Heap Size and Total Data Size

2014-02-26 Thread Dan Fairs
So, I am wondering: is there any relationship between heap size and total data size? Is there any formula to determine heap size based on data size? You might want to check that you're not running out of file handles: http://www.elasticsearch.org/tutorials/too-many-open-files/ Cheers,

Re: ES Response Time

2014-02-26 Thread Zachary Tong
The `took` parameter is the number of milliseconds that the query took to execute on the Elasticsearch server. It's basically the time required to parse the query, broadcast it to the shards and collect the results. It doesn't include network time going to and from Elasticsearch itself (since
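The difference between `took` and the end-to-end time described above can be sketched roughly as follows. This is a minimal illustration, not an official client recipe: `timed_search`, `run_query` and `fake_query` are hypothetical names, and `fake_query` merely stands in for a real HTTP call that returns the parsed JSON response.

```python
import time


def timed_search(run_query):
    """Run a search callable and split the wall-clock time into two parts.

    run_query is a hypothetical callable that sends the request and returns
    the parsed JSON response as a dict containing a "took" field.
    """
    start = time.perf_counter()
    response = run_query()
    end_to_end_ms = (time.perf_counter() - start) * 1000.0
    took_ms = response["took"]  # server-side execution time in milliseconds
    return {
        "took_ms": took_ms,
        "end_to_end_ms": end_to_end_ms,
        # Everything the server did not count: network, serialization, client work.
        "overhead_ms": end_to_end_ms - took_ms,
    }


def fake_query():
    """Stand-in for a real request (assumption): sleeps, then returns a canned response."""
    time.sleep(0.02)
    return {"took": 5, "hits": {"total": 0, "hits": []}}


if __name__ == "__main__":
    print(timed_search(fake_query))
```

Measuring on the client side like this is the only way to see the network and rendering cost, since `took` is computed on the server.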

Re: DateRange aggregation semantics - include_lower/include_upper?

2014-02-26 Thread mooky
I think it's necessary to be able to specify an *include_lower*/*include_upper* option like with filters. On Tuesday, 25 February 2014 14:54:24 UTC, Binh Ly wrote: Yes, you are correct. The from is inclusive, and the to is exclusive. -- You received this message because you are subscribed
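The half-open semantics confirmed in this thread (from inclusive, to exclusive) can be illustrated with a small sketch. The function `in_bucket` is a hypothetical helper written for illustration only; it mimics the boundary behaviour described here and is not Elasticsearch code.

```python
from datetime import datetime


def in_bucket(value, start=None, end=None):
    """Return True if value falls in the half-open interval [start, end).

    start is inclusive and end is exclusive; None means unbounded on that side.
    """
    if start is not None and value < start:
        return False
    if end is not None and value >= end:
        return False
    return True


jan = datetime(2014, 1, 1)
feb = datetime(2014, 2, 1)

if __name__ == "__main__":
    print(in_bucket(jan, jan, feb))  # lower bound belongs to the bucket
    print(in_bucket(feb, jan, feb))  # upper bound belongs to the next bucket
```

One consequence of this convention is that adjacent ranges such as January and February never count the same document twice, which is presumably why the exclusive upper bound is the default.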

Re: Compute TF/IDF across indexes

2014-02-26 Thread Binh Ly
I tried this and indeed it works, so thanks Ivan for the tip!

How to visualize statistics on time series data in Kibana

2014-02-26 Thread Dave Snigier
Howdy everyone, I have events with the following structure in ES: { "_index": "logstash-2014.02.25", "_type": "symantecav-logs", "_id": "_5Hig6lPTUi2p-palnuplA", "_score": null, "_source": { "message": [

elasticsearch cache configuration

2014-02-26 Thread Hediye Delkhosh
Hello. I've installed Elasticsearch on a web server and it uses most of the memory. I searched and eventually found the cache module and its configuration, but I don't know how to configure Elasticsearch. Where should I insert the configuration below? index.cache.field.max_size: 5

Re: Free (cloud) hosting Elasticsearch provider

2014-02-26 Thread Mattias Nordberg
https://facetflow.com/ - We offer hosted Elasticsearch with 500MB and 5,000 documents for free. On Wednesday, September 4, 2013 8:49:21 AM UTC+2, Charles Moulliard wrote: Hi, Is there a free (cloud or not) Elasticsearch hosting provider? Regards, Charles

Re: Histogram of high-cardinality aggregate

2014-02-26 Thread Binh Ly
Unfortunately, I don't believe you can do a sub-aggregation on a single-value metric at the moment. For now, you'll probably have to index the actual (min) values and then aggregate on them.

Re: How to join 2 indexes at query time

2014-02-26 Thread Binh Ly
Unfortunately, ES is not like SQL in this respect. You'll need to denormalize somewhat because ES is more document-oriented. You'd probably need to either denormalize offer_id into categorytype, or category into offertype to get all the data you want returned in 1 query.
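As a rough sketch of the second option (copying category data into each offer before indexing), something like the following could be run on the client side. Only `offer_id` comes from the thread; the field names `category_id`, `category_name` and `title`, and the function `denormalize_offers`, are assumptions made for illustration.

```python
def denormalize_offers(offers, categories):
    """Embed the matching category into each offer so one query returns both.

    offers and categories are lists of plain dicts; category_id is an assumed
    join key, not a field confirmed by the thread.
    """
    by_id = {c["category_id"]: c for c in categories}
    result = []
    for offer in offers:
        doc = dict(offer)  # copy so the input is left untouched
        category = by_id.get(offer.get("category_id"))
        if category is not None:
            doc["category"] = {"category_name": category["category_name"]}
        result.append(doc)
    return result


offers = [{"offer_id": 1, "title": "10% off", "category_id": 7}]
categories = [{"category_id": 7, "category_name": "Electronics"}]

if __name__ == "__main__":
    print(denormalize_offers(offers, categories))
```

The trade-off is the usual one for denormalized data: reads need a single query, but a change to a category means reindexing every offer that embeds it.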

Re: Invalid Version Format, but version same

2014-02-26 Thread Binh Ly
Just a guess: can you also check that you have exactly the same Java version across all nodes?

Re: How to visualize statistics on time series data in Kibana

2014-02-26 Thread Binh Ly
When you add a Histogram panel, look at the Chart Value setting. There are options for max and mean in there, and then in the Value Field you can specify scan duration (or connect duration) - I'm not 100% sure whether the spaces in your field name will cause it to fail, but if they do, you'll probably need to

Include special Symbol

2014-02-26 Thread Nick Chang
Hello, I have a field that includes special symbols. Ex: user : (Google)a...@gmail.com I want to count its values: POST /datas/_search { "facets": { "terms": { "terms": { "field": "user", "size": 10 } } } } But the result is not right. "term": "gmail.com", "count":

Re: Include special Symbol

2014-02-26 Thread Binh Ly
You'll likely need that field to be unanalyzed (i.e. tell ES not to cut it up in the index). One way is to predefine that field in your mapping as: "user": { "type": "string", "index": "not_analyzed" } More details here:
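To show where that fragment would sit, here is a minimal sketch that builds a complete mapping body as JSON. It assumes the string/not_analyzed syntax of the Elasticsearch versions discussed in this archive (0.90 and 1.0); the type name `my_type` and the helper `not_analyzed_mapping` are hypothetical, and the body would still have to be sent to the put mapping API before any documents are indexed.

```python
import json


def not_analyzed_mapping(type_name, field_name):
    """Build a mapping body that stores field_name as a single unanalyzed term."""
    return {
        type_name: {
            "properties": {
                field_name: {"type": "string", "index": "not_analyzed"}
            }
        }
    }


# "my_type" is a placeholder type name; "user" is the field from the thread.
body = json.dumps(not_analyzed_mapping("my_type", "user"), indent=2)

if __name__ == "__main__":
    print(body)
```

With the field left unanalyzed, a terms facet would count whole values such as the full address instead of fragments like gmail.com.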

What's wrong with my match query in Java API?

2014-02-26 Thread Daniel Guo
I want to do a match query, and the query works fine in REST: curl -XGET 'localhost:9200/search/video_search/_search?pretty' -d @query.json query.json: { "query": { "match": { "tvName": { "query": "决战华岩寺", "operator": "or",

Re: Include special Symbol

2014-02-26 Thread Nick Chang
Hello Binh, I use the elasticsearch-river-mongo plugin. How do I modify this index? Thanks for your reply. On Wednesday, February 26, 2014 11:03:29 PM UTC+8, Binh Ly wrote: You'll likely need that field to be unanalyzed (i.e. tell ES not to cut it up in the index). One way is to predefine that field in your mapping

Kibana empty after upgrade to ES 1.0.1

2014-02-26 Thread Terry Healy
I just upgraded all my ES systems to 1.0.1 and they seem to be working fine - except for Kibana 3. I had installed Kibana 3 Milestone pre-5 (8512132). Previously I was using it just with _all enabled. Now when I attempt to use a filter for * and a time filter for the past hour or so, it lists

[Hadoop] New tutorial submitted

2014-02-26 Thread Yann Barraud
Hi, I'm glad to announce that I have published a new tutorial on integrating ES with the Hortonworks sandbox. https://github.com/hortonworks/hadoop-tutorials/pull/9 Cheers, Yann

Re: elasticsearch cache configuration

2014-02-26 Thread Zachary Tong
If you simply want to decrease the amount of memory that Elasticsearch is using, you need to change your heap size (via the ES_HEAP_SIZE environment variable). That controls the total memory allocated to Elasticsearch. Echoing what Binh said... try not to change the field-data settings unless you

Re: What's wrong with my match query in Java API?

2014-02-26 Thread Binh Ly
Hmmm, not sure. I tried this and it worked for me: q = QueryBuilders.matchQuery("tvName", "决战华岩寺") .minimumShouldMatch("1") .operator(MatchQueryBuilder.Operator.OR); Perhaps you can give a complete example of an index, 1 document, and the actual full Java query to duplicate

Re: Queue capacity and EsRejectedExecutionException leads to loss of data

2014-02-26 Thread Thomas
Thanks David, So this is a RabbitMQ river issue. Is there a need to open a separate issue? (I have never done the procedure, will look into this one.) Thomas On Wednesday, 26 February 2014 15:48:55 UTC+2, Thomas wrote: Hi, We have installed the RabbitMQ river plugin to pull data from our Queue and

Re: Queue capacity and EsRejectedExecutionException leads to loss of data

2014-02-26 Thread David Pilato
I think that adding a comment into the existing issue would be fine. -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr On 26 February 2014 at 17:00:12, Thomas (thomas.bo...@gmail.com) wrote: Thanks David, So this is a rabbitMQRiver issue, is there a need

Single thread with high CPU usage

2014-02-26 Thread Magnus Hyllander
I have an ES 0.90.11 cluster with three nodes (d0, d1, d2), with 4 cores and 7GB memory, running Ubuntu and JDK 7u45. The ES instances are all master+data, configured with 3.5GB heap size. They are pretty much running a vanilla configuration. Logstash is currently storing on average 200 logs

Re: Single thread with high CPU usage

2014-02-26 Thread Nikolas Everett
Check to see how much GC you are doing when it spikes. If it is high, try to clear the cache: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-clearcache.html I'd try clearing each cache one at a time to see which one helps. If that is the problem you can configure

Re: EsRejectedExecutionException when searching date based indices.

2014-02-26 Thread Alex Clark
That is correct, I was mixing the terms nodes and shards (sorry about that). I'm running the test on a single node (machine). I've chosen 20 shards so we could eventually go to a 20 server cluster without re-indexing. It's unlikely we'll ever need to go that high but we never know and given

BigDecimal support

2014-02-26 Thread mooky
In the financial services space, we almost never use float/double in our domain - we always use BigDecimal. In Elasticsearch, I would like to be able to index/store BigDecimal in a lossless manner (i.e. what I get back from _source has the same precision, etc. as what I put in). When I have had to

Re: Removing elasticsearch logs

2014-02-26 Thread computer engineer
thanks. Will look into that. Do you perhaps know where the directory is that stores all these messages or log files On Wednesday, February 26, 2014 8:10:09 AM UTC-5, Binh Ly wrote: There is currently discussion around this, but in the meantime, try this to see if it helps:

Re: EsRejectedExecutionException when searching date based indices.

2014-02-26 Thread joergpra...@gmail.com
I think you have a misconception about shard over-allocation and re-indexing, so you should read https://groups.google.com/d/msg/elasticsearch/49q-_AgQCp8/MRol0t9asEcJ where kimchy explains how over-allocation of shards works. If you have time-series indexes, you do not need 20 shards per day, just

Re: BigDecimal support

2014-02-26 Thread joergpra...@gmail.com
ES accepts BigDecimal input. You can specify scale and rounding mode to format the BigDecimal. https://github.com/jprante/elasticsearch/commit/8ef8cd149b867e3e45bc3055dfd6da80e4e9c7ec Internally, BigDecimal is automatically converted to a JSON string if the number does not fit into double
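The general idea of keeping a decimal lossless by carrying it as a JSON string can be sketched with Python's `decimal` module. This only illustrates the principle and says nothing about how the linked commit is implemented; `encode_amount`, `decode_amount` and the `amount` field are hypothetical names.

```python
import json
from decimal import Decimal


def encode_amount(amount):
    """Serialize a Decimal as a JSON string so its digits and scale survive."""
    return json.dumps({"amount": str(amount)})


def decode_amount(document):
    """Read the string back into a Decimal with the original scale."""
    return Decimal(json.loads(document)["amount"])


original = Decimal("19.990")
restored = decode_amount(encode_amount(original))

# Going through a binary float drops the trailing zero (the scale changes).
via_float = Decimal(str(float(original)))

if __name__ == "__main__":
    print(restored, via_float)
```

The string round trip keeps the three decimal places, whereas the float round trip keeps the numeric value but not the precision that was put in, which is the loss the original question is concerned about.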

Migration from 0.90.10 to 1.0.1

2014-02-26 Thread Mariano Battistessa
Hello, I have installed version 0.90.10 of Elasticsearch. I have large amounts of indexed information. How can I migrate the information to Elasticsearch 1.0.1?

Re: Kibana empty after upgrade to ES 1.0.1

2014-02-26 Thread Terry Healy
I had to modify my URL back to point to Glassfish: http://192.168.4.254:8080/#/dashboard/file/guided.json On Wednesday, February 26, 2014 10:26:02 AM UTC-5, Terry Healy wrote: I just upgraded all my ES systems to 1.0.1 and they seem to be working fine - except for Kibana 3. I had installed

Re: Elasticsearch 1.0.0 is now GA

2014-02-26 Thread Tony Su
It's not working for me with or without any quotes. If I'm not just doing some kind of incredible User error, I'm not talking about the User inserting quotes (or not)... I'm talking about the underlying Java code which accepts the input. Although I can't think of how this would have anything

Re: Elasticsearch 1.0.0 is now GA

2014-02-26 Thread Tony Su
Cool, I'll try that next. Thx, Tony On Tuesday, February 25, 2014 7:56:34 AM UTC-8, InquiringMind wrote: I always start Elasticsearch from within my own wrapper script, es.sh. Inside this wrapper script is the following incantation: NODE_OPT=-Des.node.name=$(uname

Re: Put mapping documentation -- What options are available? Specifically, how to store a property but without indexing it?

2014-02-26 Thread Daniel Winterstein
Sorry Ivan! I'm not having much luck on this thread. Daniel Sent from my phone. Please excuse the brevity. On 26 Feb 2014 01:58, Ivan Brusic i...@brusic.com wrote: Luke? :) On Tue, Feb 25, 2014 at 1:09 PM, Daniel Winterstein daniel.winterst...@gmail.com wrote: Dear Hariharan, Alex, Luke,

Re: How to visualize statistics on time series data in Kibana

2014-02-26 Thread David Snigier Jr.
That did the trick! I was able to keep the spaces in the field name, but did need to cast the field to a float in logstash for the metric to work. Really loving how quickly valuable data hidden in the logs can be drawn out and visualized with logstash+elasticsearch+kibana. Props to y'all for

Too many nodes started up on some data nodes - best approach to fix?

2014-02-26 Thread Josh Harrison
I restarted my cluster the other day, but something odd stuck, resulting in 15/16 data nodes starting up an extra ES instance in the same cluster. This ended badly as there were two nodes with identical display names, the system locked up, etc. When restarting again, to my horror, we were

Re: Interesting question on Transaction Log record mutability

2014-02-26 Thread Yuri Panchenko
Thanks Binh, but I don't think you got the full gist of my question. I want to avoid reindexing the same document too many times. What I would like to do is to turn off indexing/refreshing and even transaction log flushing between the batched partial updates. If I

Re: How to visualize statistics on time series data in Kibana

2014-02-26 Thread Binh Ly
Oh yeah forgot about the datatype - that's good that you caught that. Good to hear!

Re: How to generate ES index in the hadoop

2014-02-26 Thread drew dahlke
Hi Costin, We're very interested in offline processing as well. To draw a parallel to HBase, you could write a hadoop job that writes out to a table over the thrift API. However if you're going to load in many terabytes of data, there's the option to write out directly to the HTable file format

How to absorb a lot of incremental partial updates efficiently on a denormalized record?

2014-02-26 Thread Yuri Panchenko
Guys, I'm evaluating a denormalized data structure in ES that basically looks like a Customer record with a lot of transactions with dollar amounts and dates. It roughly looks like this: { "id": 123, "name": "Gavin", ... "transactions": { "txn_uid_1" : { "date" : "02-19-2013", "amount" : "$19.99" },

Re: Migration from 0.90.10 to 1.0.1

2014-02-26 Thread Mark Walkom
What OS are you on, are you using the packaged version or the standalone (zip)? Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 27 February 2014 04:31, Mariano Battistessa mariano.ba...@gmail.com wrote: Hello, I

Re: Interesting question on Transaction Log record mutability

2014-02-26 Thread Binh Ly
Thanks, I think I understand better now. I deleted my previous post so that I can clarify better. The transaction log is just a backup mechanism for durability. When you index a document, it eventually goes into a segment (in memory). When you update it, the old doc is marked as deleted and
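The update behaviour described here (the old copy is marked as deleted rather than changed in place) can be pictured with a toy model. This is a deliberately simplified sketch and not how Lucene or Elasticsearch are implemented; the class `ToyIndex` and all of its attributes are invented for illustration, and it assumes the standard delete-then-add behaviour in which the new version is appended as a fresh document.

```python
class ToyIndex:
    """Toy model: append-only documents, deletion marks, and a transaction log."""

    def __init__(self):
        self.docs = []        # append-only list of (doc_id, source) pairs
        self.deleted = set()  # positions in self.docs that are marked deleted
        self.translog = []    # operations recorded for durability until a flush

    def index(self, doc_id, source):
        """Index or update a document: log it, mark old copies deleted, append."""
        self.translog.append(("index", doc_id, source))
        for position, (existing_id, _) in enumerate(self.docs):
            if existing_id == doc_id:
                self.deleted.add(position)
        self.docs.append((doc_id, source))

    def get(self, doc_id):
        """Return the live version of a document, or None if there is none."""
        for position in range(len(self.docs) - 1, -1, -1):
            if position not in self.deleted and self.docs[position][0] == doc_id:
                return self.docs[position][1]
        return None

    def flush(self):
        """Pretend the documents were committed to disk, so the log can be cleared."""
        self.translog.clear()


if __name__ == "__main__":
    idx = ToyIndex()
    idx.index("123", {"name": "Gavin", "txns": 1})
    idx.index("123", {"name": "Gavin", "txns": 2})
    print(idx.get("123"), len(idx.docs), len(idx.deleted))
```

In this model every partial update costs a full new copy of the document, which is why many small updates to one large denormalized record can be expensive.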

Re: Exact phrase match - city names example

2014-02-26 Thread thale jacobs
I am having a similar problem too. Here is how I set up the test index: Create the index: curl -s -XPUT 'localhost:9200/test' -d '{ "mappings": { "properties": { "name": { "street": { "type": "string",

Re: How to generate ES index in the hadoop

2014-02-26 Thread Costin Leau
On 2/26/2014 11:26 PM, drew dahlke wrote: Hi Costin, We're very interested in offline processing as well. To draw a parallel to HBase, you could write a hadoop job that writes out to a table over the thrift API. However if you're going to load in many terabytes of data, there's the option to

Re: Interesting question on Transaction Log record mutability

2014-02-26 Thread Yuri Panchenko
Thanks for the explanation!! I thought that if a record is contained in the transaction log, it would not be part of a segment. But as soon as we flush the transaction log, it re-indexes the changes into the segment and then commits to disk. But it sounds like a record can be both in the

indexing binary

2014-02-26 Thread ZenMaster80
I index PDFs using apache with the following mapping. .field("type", "attachment") .field("fields") .startObject() .startObject("file") .field("store", "yes") .endObject() I want to index photos, I am able to extract text using OCR. I am confused how to index the text though, do I treat it like any

Bug in Java PutMapping API?

2014-02-26 Thread Andre Encarnacao
I am trying to use the Java API (v1.0.0) to create a custom mapping for my index but have run into a problem. More specifically, some of my mapping fields (for example, format and index) are not being stored as part of the mapping in Elasticsearch. In fact, the only field that is being stored

APT repositories available?

2014-02-26 Thread Trey Hyde
I'm trying to integrate the apt repositories into our setup according to http://www.elasticsearch.org/blog/apt-and-yum-repositories/ and http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-repositories.html A few days ago, apt would also give 403 for those repos. At

Re: APT repositories available?

2014-02-26 Thread Mark Walkom
I think you have it the wrong way round. It should be: deb http://packages.elasticsearch.org/elasticsearch/1.0/debian stable main and not: deb http://packages.elasticsearch.org/elasticsearch/1.0/debian main stable Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web:

Re: How to join 2 indexes at query time

2014-02-26 Thread Jayesh Bhoyar
Hi Binh, Thanks for the answer. Is there any way to do it if I index this data into the same index under different types? Gist: https://gist.github.com/jsbonline2006/9243973 I have 1 index: productindex/ Type: offertype productindex/ Type: categorytype Now as per my index data: My input will be

Re: How to join 2 indexes at query time

2014-02-26 Thread Matt Weber
How about using parent/child functionality? https://gist.github.com/mattweber/96f3515fc4453a5cb0db Thanks, Matt Weber On Wed, Feb 26, 2014 at 7:45 PM, Jayesh Bhoyar jsbonline2...@gmail.com wrote: Hi Binh, Thanks for the answer. Is there any case if I index this data into same index with

Re: Include special Symbol

2014-02-26 Thread Nick Chang
Hello, I have already solved this problem. Your suggestion was right. Thanks, Nick On Wednesday, February 26, 2014 11:03:29 PM UTC+8, Binh Ly wrote: You'll likely need that field to be unanalyzed (i.e. tell ES not to cut it up in the index). One way is to predefine that field in your mapping as: "user": { "type": "string",

Re: Help Understanding custom_filters_score Error

2014-02-26 Thread James Martin
Hi Chris, I'm in the same boat; looking to combine an or filter (so 1 or the other filter matches) with a custom_filters_score in order to boost results that meet certain criteria. Did you have any luck solving this? On Friday, 3 May 2013 15:26:52 UTC+10, Chris wrote: Of course, posting

Re: Relation Between Heap Size and Total Data Size

2014-02-26 Thread Umutcan
So, I am wondering: is there any relationship between heap size and total data size? Is there any formula to determine heap size based on data size? You might want to check that you're not running out of file handles: http://www.elasticsearch.org/tutorials/too-many-open-files/ Thanks