Re: How to clone an index with all its documents

2014-11-23 Thread Ajay Divakaran
I've created a simple script that does this. Here is the GitHub URL: https://github.com/ajaydivakaran/es-index-cloner On Friday, November 21, 2014 9:23:31 PM UTC+5:30, Ori P wrote: I'm looking for a way to clone an entire index with all its documents into a new index with a different name.

Json documents with large number of fields

2014-11-23 Thread Ajay Divakaran
Hi all, I have around 20K JSON documents, each with around 350 fields, to be pushed into ES (0.90.13). All fields are mapped as multi-valued strings. Currently I have a single shard and 0 replicas. My question is whether ES (Lucene) is capable of handling documents this large. Or does it

Specifying search fields in search request

2014-11-23 Thread Ajay Divakaran
Hi all, I'm using ES version 0.90.7. My requirement is to fetch all the fields of the JSON documents that I search/filter for. Is there a performance difference between specifying all the field names in the search request and not specifying the 'fields' parameter at all? By not specifying the fields

Re: Treatment of special characters in elasticsearch

2014-11-23 Thread joergpra...@gmail.com
Then you have an error in your program. I tried it and it works. There is no special character handling, except for the lowercase filter. See: https://gist.github.com/jprante/4e5cd1ac2220ef8c39ca Jörg On Sun, Nov 23, 2014 at 4:27 AM, prachi...@gmail.com wrote: I am using Java Transport client. Where do we

Re: Json documents with large number of fields

2014-11-23 Thread joergpra...@gmail.com
This is not large. I have 5500 fields in ~100m docs; most fields are not analyzed / not indexed. Sure, this works perfectly. Jörg On Sun, Nov 23, 2014 at 10:19 AM, Ajay Divakaran ajay.divakara...@gmail.com wrote: Hi all, I have around 20K JSON documents that have around 350 fields to be

Re: Specifying search fields in search request

2014-11-23 Thread joergpra...@gmail.com
Each field given in the search request adds to the cost of the search, because the field must be searched, weighted, scored, and ranked. I recommend using the _all field, or combining fields into fewer fields with copy_to, and searching as few fields as possible. Jörg On Sun, Nov 23, 2014 at 10:56

Re: Specifying search fields in search request

2014-11-23 Thread Ajay Divakaran
Jorg, thanks for your reply. Just to clarify: the 'fields' I was referring to are the fields to retrieve, not the fields to search against. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send

Memory leak related issue with Threadlocals in elasticsearch

2014-11-23 Thread prachicsa
I have a web service where, every time a button is pressed in the UI, it connects to Elasticsearch and fires a query. This is the code that is executed every time. The issue is that after a while, intermittently, the UI hangs. private static final String CONFIG_CLUSTER_NAME = "cluster.name";

Re: Memory leak related issue with Threadlocals in elasticsearch

2014-11-23 Thread joergpra...@gmail.com
You are not instantiating TransportClient correctly as a singleton. From the code snippet it is not entirely clear, but it seems that you create a new TransportClient for each query, which is not correct. Jörg On Sun, Nov 23, 2014 at 4:36 PM, prachi...@gmail.com wrote: I have a web service, where every time a
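A minimal sketch of the singleton Jörg describes, using the initialization-on-demand holder idiom. To keep it compilable without the Elasticsearch jar, `SearchClient` is a hypothetical stand-in for the real TransportClient, and `ClientHolder`/`get()` are made-up names; the point is only that every query reuses one client instead of building a new one.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ClientHolder {
    // Hypothetical stand-in for TransportClient so the sketch compiles without the
    // Elasticsearch jar. A real client opens connections and thread pools in its
    // constructor, which is why building one per query exhausts resources.
    static final class SearchClient {
        static final AtomicInteger INSTANCES = new AtomicInteger();

        SearchClient() {
            INSTANCES.incrementAndGet();
        }

        String search(String query) {
            return "results for " + query;
        }
    }

    private ClientHolder() {}

    // Initialization-on-demand holder idiom: the JVM initializes Lazy once, on the
    // first call to get(), and class initialization is thread-safe.
    private static final class Lazy {
        static final SearchClient INSTANCE = new SearchClient();
    }

    static SearchClient get() {
        return Lazy.INSTANCE;
    }

    public static void main(String[] args) {
        // Simulate 1000 button presses, each issuing a query through the shared client.
        for (int i = 0; i < 1000; i++) {
            get().search("query " + i);
        }
        // Prints "clients created: 1".
        System.out.println("clients created: " + SearchClient.INSTANCES.get());
    }
}
```

With the real client, the holder would build the TransportClient once (cluster name, transport addresses), and something tied to the application lifecycle, for example a shutdown hook, would call its close() method when the web service stops.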

Re: Strange shard counts

2014-11-23 Thread joergpra...@gmail.com
If you have a single-node cluster, the 5 replica shards are unassigned, since a replica is never allocated to the same node as its primary. Jörg On Sat, Nov 22, 2014 at 8:38 PM, Jingzhao Ou jingzhao...@gmail.com wrote: Hi, I checked the index stats by visiting http://localhost:9200/raw/_stats?pretty { "_shards" : { "total" : 10, "successful" : 5, "failed" : 0
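The counts in that stats output follow from simple arithmetic. A minimal sketch, assuming the ES 1.x defaults of 5 primary shards with 1 replica each, and ignoring every other allocation limit (disk watermarks, total_shards_per_node and so on); the class and method names are made up for illustration:

```java
final class ShardMath {
    private ShardMath() {}

    // Every primary has `replicas` extra copies, so the total number of shard copies is:
    static int totalShards(int primaries, int replicas) {
        return primaries * (1 + replicas);
    }

    // Two copies of the same shard never share a node, so each shard can have at most
    // `nodes` assigned copies. Other allocation limits are deliberately ignored here.
    static int assignableShards(int primaries, int replicas, int nodes) {
        return primaries * Math.min(1 + replicas, nodes);
    }

    public static void main(String[] args) {
        // Defaults on a single node: "10 total, 5 assigned".
        System.out.println(totalShards(5, 1) + " total, " + assignableShards(5, 1, 1) + " assigned");
        // number_of_shards: 1, number_of_replicas: 0 on a single node: "1 total, 1 assigned".
        System.out.println(totalShards(1, 0) + " total, " + assignableShards(1, 0, 1) + " assigned");
    }
}
```

Adding a second node would let the 5 replicas be assigned, which is why the "missing" shards are not a sign of a problem on a development machine.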

Re: Json documents with large number of fields

2014-11-23 Thread Ajay Divakaran
Jorg, thanks for your reply. Does memory spike when you retrieve these documents in bulk as the result of a query/filter? On Sunday, November 23, 2014 7:43:49 PM UTC+5:30, Jörg Prante wrote: This is not large. I have 5500 fields in ~100m docs, most fields are not analyzed / not

Re: Elastic Search as cache

2014-11-23 Thread Jingzhao Ou
Hi Jörg, doc_values is an awesome feature even for my normal use cases! I have lots of numeric data that does not need a string analyzer, so I can move this data to be managed by the OS file system. For my memcached case, I set "_source": { "enabled": false } on the whole index and enable doc_values

Re: Strange shard counts

2014-11-23 Thread Jingzhao Ou
Hi Jörg, yes, I get it now. On my development PC, I changed elasticsearch.yml to set index.number_of_shards: 1 and index.number_of_replicas: 0. The stats output is now: "_shards" : { "total" : 1, "successful" : 1, "failed" : 0 }. Thanks a lot for your help! Jingzhao On

Re: Strange shard counts

2014-11-23 Thread Ajay Divakaran
The 5 that were not assigned: is that because you set a limit on the max shards per node? On Sunday, November 23, 2014 11:24:34 PM UTC+5:30, Jingzhao Ou wrote: Hi, Jörg, Yes, I got that now. On my development PC, I changed things in elasticsearch.yml to index.number_of_shards: 1

Re: Strange shard counts

2014-11-23 Thread Jingzhao Ou
Hi Ajay, "The 5 that were not assigned: is that because you set a limit on the max shards per node?" I don't see any limit on the max shards per node. I did a Google search and cannot find such a parameter; I can only see memory limits. Mind sharing more details? Thanks a lot for your

Re: Strange shard counts

2014-11-23 Thread Ajay Divakaran
See the section "Total shards per node": http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-allocation.html On Monday, November 24, 2014 12:15:54 AM UTC+5:30, Jingzhao Ou wrote: Hi, Ajay, The 5 that were not assigned is that because you set a limit on the

Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-23 Thread Konstantin Erman
The advice to increase indices.recovery.concurrent_streams sounds suspiciously specific to me :-) What made you so confident that it is the bottleneck for recovery in most cases? And how should cluster.routing.allocation.node_concurrent_recoveries be set? On Sunday, November 23, 2014 6:27:40 AM

terms filter with the value to match in uppercase is not possible?

2014-11-23 Thread dashaus
Hi! I have a document like this: { "type": "film", "countries": ["US", "ES"] } I index it in Elasticsearch, then I run the following search: GET _search?search_type=dfs_query_and_fetch { "query": { "filtered": { "query": { "term": {"type": "film"} }, "filter": { "query": {

Re: Json documents with large number of fields

2014-11-23 Thread joergpra...@gmail.com
No. Use a filtered query to avoid the memory spikes caused by a post filter. Jörg On Sun, Nov 23, 2014 at 6:22 PM, Ajay Divakaran ajay.divakara...@gmail.com wrote: Jorg, Thanks for your reply. Does the memory spike when you retrieve these documents in bulk as a result of a query/filter result? On

Re: Elastic Search as cache

2014-11-23 Thread joergpra...@gmail.com
Can you rephrase your question? I do not understand. Jörg On Sun, Nov 23, 2014 at 6:43 PM, Jingzhao Ou jingzhao...@gmail.com wrote: My remaining question is: how can I verify the field data is needed on disk?

Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-23 Thread joergpra...@gmail.com
By default, index recovery is limited to 3 concurrent streams and 20MB/sec. This is very slow on my machines. YMMV. Jörg On Sun, Nov 23, 2014 at 9:01 PM, Konstantin Erman kon...@gmail.com wrote: Advice to increase indices.recovery.concurrent_streams sounds suspiciously specific
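To see why a 20MB/sec cap feels slow, a back-of-the-envelope estimate helps. A minimal sketch, assuming the cap applies to the recovering node as a whole (as the indices.recovery.max_bytes_per_sec setting did in ES 1.x) and ignoring network, disk, and per-file overhead, so real recoveries take longer; the 50 GB node size and the 100 MB/s raised limit are hypothetical numbers:

```java
final class RecoveryEstimate {
    private RecoveryEstimate() {}

    // Lower bound on the time to copy `gigabytes` of shard data onto a node whose
    // recovery traffic is throttled to `mbPerSec` megabytes per second.
    static double minSeconds(double gigabytes, double mbPerSec) {
        return gigabytes * 1024.0 / mbPerSec;
    }

    public static void main(String[] args) {
        double gb = 50.0; // hypothetical amount of data on the rebooted node
        for (double rate : new double[] {20.0, 100.0}) {
            double minutes = minSeconds(gb, rate) / 60.0;
            System.out.printf("%.0f GB at %.0f MB/s: at least %.1f minutes%n", gb, rate, minutes);
        }
    }
}
```

At 20 MB/s, 50 GB needs at least 2560 seconds (about 43 minutes) of pure transfer, so raising the byte-rate cap usually matters as much as adding streams.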

scan query that returns document values only is heavily accessing the *.FDT file

2014-11-23 Thread Tzahi jakubovitz
Hi all, I have a test index with 43 million documents. There is a string document value for each document (a value of about 5-10 characters). The mapping is: { "myindex" : { "mappings" : { "num_type" : { "_type" : { "store" : true },

Re: Elastic Search as cache

2014-11-23 Thread Jingzhao Ou
Sorry about that. I meant to ask: with doc_values set to true, how can I verify that the field data is held in the file system cache instead of the JVM heap? Thanks, Jingzhao On Sunday, November 23, 2014 2:39:32 PM UTC-8, Jörg Prante wrote: Can you rephrase your question, I do not understand

Re: terms filter with the value to match in uppercase is not possible?

2014-11-23 Thread Ivan Brusic
A term query does not analyze the search terms, so if your countries field uses the default analyzer, there will be no match, because the standard analyzer lowercases the indexed terms. Either set your field to not_analyzed or use another query, such as match. -- Ivan On Sat, Nov 22, 2014 at 4:35
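Ivan's explanation can be simulated without a cluster. A minimal sketch, in which `analyze` is a rough, hypothetical stand-in for the standard analyzer (split on non-alphanumerics, then lowercase; the real tokenizer follows Unicode word-break rules), showing why a term lookup for "US" misses while an analyzed, match-style lookup hits:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class TermVsMatch {
    private TermVsMatch() {}

    // Rough stand-in for the standard analyzer: split on anything that is not a
    // letter or digit, then lowercase each token.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // term query/filter: the search value is compared to the indexed tokens as-is.
    static boolean termMatches(List<String> indexedTokens, String value) {
        return indexedTokens.contains(value);
    }

    // match query: the search text is analyzed the same way before comparing.
    static boolean matchMatches(List<String> indexedTokens, String text) {
        for (String t : analyze(text)) {
            if (indexedTokens.contains(t)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The "countries" values ["US", "ES"] end up in the index as [us, es].
        List<String> indexed = new ArrayList<>();
        for (String v : new String[] {"US", "ES"}) {
            indexed.addAll(analyze(v));
        }
        System.out.println("indexed tokens: " + indexed);                      // [us, es]
        System.out.println("term \"US\":  " + termMatches(indexed, "US"));     // false
        System.out.println("term \"us\":  " + termMatches(indexed, "us"));     // true
        System.out.println("match \"US\": " + matchMatches(indexed, "US"));    // true
    }
}
```

With a not_analyzed mapping the indexed token would stay "US", so the original term filter would match as written.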

Re: Odd behavior of bulk loading speed - good riddle?

2014-11-23 Thread Christopher Ambler
Nobody? No ideas why bulk upserts slow down over time? Loading 9 million documents starts off at 2000+ per second and, by hour three, is down to 300 per second. The whole job takes the better part of 8 hours, with this linear slowdown. Nobody has an idea? I'm drawing a blank, myself!

Re: Odd behavior of bulk loading speed - good riddle?

2014-11-23 Thread Mark Walkom
FYI, it is still the weekend in parts of the world, and we all enjoy our time off :) How many nodes do you have? What is your heap size? Are you monitoring your system and ES, and if so, what does it tell you? Have you tried increasing the bulk count? On 24 November 2014 at 16:48, Christopher Ambler