Re: elasticsearch-knapsack plugin for update settings throwing exception in main with Client

2015-05-29 Thread joergpra...@gmail.com
You can not run the Knapsack plugin at transport client side. It must run at server side in a node being part of the cluster. Jörg On Fri, May 29, 2015 at 11:07 AM, Muddadi Hemaanusha hemaanusha.bu...@gmail.com wrote: Hi All, Am using elasticsearch-knapsack plugin for update settings and

Re: Searching on exponent numbers

2015-05-22 Thread joergpra...@gmail.com
This is a long unresolved issue. One solution would be adding BigDecimal support. See for example https://github.com/elastic/elasticsearch/pull/5683 Jörg On Fri, May 22, 2015 at 8:20 AM, Craig Berry craig.adrian.be...@gmail.com wrote: Hi there, I want to be able to provide a text search

Re: Forums Are Now Live at http://discuss.elastic.co

2015-05-11 Thread joergpra...@gmail.com
and pastebins/gists shouldn't be considered against the limit. We ask people to use gist all the time and github issue or code links are a good thing to use as well. On May 4, 2015 5:40 PM, joergpra...@gmail.com joergpra...@gmail.com wrote: Thanks Shaunak, I appreciate that. I think

Re: Memory usage of the machine with ES is continuously increasing

2015-05-07 Thread joergpra...@gmail.com
On my systems, dentry use is ~18MB while ES 1.5.2 is under heavy duty (RHEL 6.6, Java 8u45, on-premise server). I think you should double check if the effect you see is caused by ES or by your JVM/Arch Linux/EC2/whatever. Jörg On Mon, May 4, 2015 at 12:47 PM, Pradeep Reddy

Re: Forums Are Now Live at http://discuss.elastic.co

2015-05-04 Thread joergpra...@gmail.com
It does not work. I can not post messages with links. After I try to post a new topic such as - snip To all of you who want to sneak at the features planned for ES 2.0, this issue collects some of it https://github.com/elastic/elasticsearch/issues/9970 Best, Jörg snip I

Re: too many open files problems and suggestions on cluster configuration

2015-05-01 Thread joergpra...@gmail.com
The number of open files does not depend on the number of documents. A shard does not come for free. Each shard can take around 150 open file descriptors (sockets, segment files) and up to 400-500 if actively being indexed. Take care of the number of shards, if you have 5 shards per index, and 2000
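To make the per-shard figures above concrete, here is a rough estimate in Java. The 150 and 500 descriptors per shard come from the message; the index count of 100 is a hypothetical input chosen only for illustration, and replicas are not counted, so treat this as a sketch rather than a sizing rule.

```java
// Rough open-file estimate from the per-shard figures quoted in the message.
// The index count passed in is a hypothetical input; replicas are ignored.
final class OpenFileEstimate {
    static final int IDLE_FDS_PER_SHARD = 150;     // "around 150" per shard
    static final int INDEXING_FDS_PER_SHARD = 500; // upper end of "400-500"

    // Total number of primary shards for the given number of indices.
    static long shards(int indices, int shardsPerIndex) {
        return (long) indices * shardsPerIndex;
    }

    // Descriptors needed when no shard is being indexed.
    static long idleDescriptors(int indices, int shardsPerIndex) {
        return shards(indices, shardsPerIndex) * IDLE_FDS_PER_SHARD;
    }

    // Descriptors needed if every shard is actively being indexed.
    static long indexingDescriptors(int indices, int shardsPerIndex) {
        return shards(indices, shardsPerIndex) * INDEXING_FDS_PER_SHARD;
    }

    public static void main(String[] args) {
        int indices = 100;      // hypothetical
        int shardsPerIndex = 5; // as in the message
        System.out.println("shards:   " + shards(indices, shardsPerIndex));
        System.out.println("idle:     " + idleDescriptors(indices, shardsPerIndex));
        System.out.println("indexing: " + indexingDescriptors(indices, shardsPerIndex));
    }
}
```

The result is a cluster-wide total; what matters for the per-process file limit is how many of those shards end up on each node.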

Re: More memory or more CPU cores help better performance?

2015-04-30 Thread joergpra...@gmail.com
As said, it depends. When bulk-indexing documents, for example, my multi-threaded workload is network-bound. It can easily be made CPU-bound by pre-processing documents in single thread mode. Certain queries are CPU-bound, others not. If I retrieve millions of documents in a row, decompression

Re: JDBC River missing documents??

2015-04-30 Thread joergpra...@gmail.com
wow, thanks for sharing! Best, Jörg On Thu, Apr 30, 2015 at 10:43 PM, GWired garrettcjohn...@gmail.com wrote: The below will build a table in SQL to store Refresh times. The first time it runs it will put in an entry and going backwards in time until all records are retrieved. Once

Re: JDBC River missing documents??

2015-04-29 Thread joergpra...@gmail.com
Nice work, can you share the recipe with the community? I could post it on the JDBC plugin wiki Jörg On Wed, Apr 29, 2015 at 1:56 PM, GWired garrettcjohn...@gmail.com wrote: My theory is that i was overloading my ES VM's on initial loads or when doing large loads. My cpu would jump to 99%

Re: More memory or more CPU cores help better performance?

2015-04-29 Thread joergpra...@gmail.com
First you need to find out if your workload is CPU-bound or if it is network-bound. If CPU-bound, go for the virtual machine with best CPU equipment. If network bound, go for the virtual machine that offers best network connectivity. It is very hard to get precise numbers for performance

Re: Convert bulk request to json document and publish that document to ES as a separate task

2015-04-28 Thread joergpra...@gmail.com
You are using the binary stream protocol of ES in the writeTo() method which is not appropriate for writing to files. Once you added requests to a bulk request, you can not get your content back as JSON. A better approach is to use an XContentBuilder with an OutputStream, and add the content to

Re: Get certain fields in bulk API response

2015-04-27 Thread joergpra...@gmail.com
You can send a term query after a bulk response, or you can implement your own bulk action, which returns the custom ID instead of _id. Jörg On Mon, Apr 27, 2015 at 9:55 AM, Jakko Sikkar jakko.sik...@gmail.com wrote: Hi, I have a ES mapping with ES unique identifier (_id) and custom

Re: ES load test ended up with out of memory error after enabling the clustering

2015-04-25 Thread joergpra...@gmail.com
? Thanks Manjula On Fri, Apr 17, 2015 at 2:51 AM, joergpra...@gmail.com joergpra...@gmail.com wrote: I have thousands of concurrent indexing/queries running per second on non-virtualized servers. 4G heap is ok, it is more than enough, there should be other reasons for OOM I am sure

Re: JDBC River missing documents??

2015-04-23 Thread joergpra...@gmail.com
There are log messages at ES cluster side, you should look there why bulk indexing failed. Jörg On Thu, Apr 23, 2015 at 5:45 AM, GWired garrettcjohn...@gmail.com wrote: Found this in the logs: [2015-04-22 22:01:25,063][ERROR][river.jdbc.BulkNodeClient] bulk [15] failed with 945 failed

Re: bulk index request dataloss

2015-04-23 Thread joergpra...@gmail.com
With the JDBC plugin, you should slightly increase the requests per bulk request (maxbulkactions) in order to keep your concurrent bulk requests low enough to get handled by ES. The ES bulk thread pool default setting is ok. Please avoid a change. Jörg On Thu, Apr 23, 2015 at 12:20 PM,
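The relation between actions per bulk request and the number of bulk requests can be sketched with simple arithmetic. `maxbulkactions` is the setting named above; the document counts and batch sizes used here are hypothetical.

```java
// How many bulk requests are needed for a given batch size.
// Larger batches mean fewer requests competing for the bulk thread pool.
final class BulkSizing {
    // Ceiling division: the last request may be only partly filled.
    static long bulkRequests(long totalDocs, int actionsPerBulk) {
        if (actionsPerBulk <= 0) {
            throw new IllegalArgumentException("actionsPerBulk must be positive");
        }
        return (totalDocs + actionsPerBulk - 1) / actionsPerBulk;
    }

    public static void main(String[] args) {
        long totalDocs = 1_000_000L; // hypothetical
        System.out.println(bulkRequests(totalDocs, 1000)); // smaller batches
        System.out.println(bulkRequests(totalDocs, 5000)); // larger batches
    }
}
```

With the same number of documents, a five times larger batch size means one fifth of the requests hitting the cluster.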

Re: FIQL for abstraction of Query Syntax

2015-04-22 Thread joergpra...@gmail.com
I implemented CQL for Elasticsearch https://github.com/xbib/elasticsearch-plugin-sru I do not recommend it for the general case because CQL is inferior to the power and expressiveness of Elasticsearch DSL. If you have an audience that prefers old school boolean search and does not want ES-specific

Re: upgrade java for elasticsearch node

2015-04-22 Thread joergpra...@gmail.com
Please note, Java 7 has reached end of life, and will no longer receive updates https://www.java.com/en/download/faq/java_7.xml I recommend Java 8. ES is sensitive to JVM changes (hash codes for hash maps are computed differently in Java 8) but this exposes only in rare cases. I am not sure

Re: jdbcRiver rebuilding after restart.

2015-04-20 Thread joergpra...@gmail.com
The column strategy is a community effort, it can manipulate SQL statement where clauses with timestamp filter. I do not have enough knowledge about column strategy. You are correct, at node restart, a river does not know from where to restart. There is no method to resolve this within river

Re: jdbcRiver rebuilding after restart.

2015-04-19 Thread joergpra...@gmail.com
It is up to the SQL statement to control the rows that are fetched when the JDBC river restarts. Note that rivers are deprecated. One of the reasons rivers are deprecated is the undefined state if a node restarts. The JDBC river simply re-runs the SQL statement. Use the JDBC plugin in

Re: Concurrent searches over same dataset degrades performance

2015-04-17 Thread joergpra...@gmail.com
Sorry I overlooked it, you use getTookInMillis() Maybe the extra time is spent because you use a range filter which is not cached? Jörg On Fri, Apr 17, 2015 at 3:02 PM, joergpra...@gmail.com joergpra...@gmail.com wrote: What time do you measure? The ES query time, or the network latency

Re: Concurrent searches over same dataset degrades performance

2015-04-17 Thread joergpra...@gmail.com
What time do you measure? The ES query time, or the network latency? Jörg On Fri, Apr 17, 2015 at 2:25 PM, Vishal Mahajan vishal...@gmail.com wrote: Hi, I was trying Filtered query (default search type) to fetch first 8k out of approx 170k matched records. I noticed that on an average query

Re: Concurrent searches over same dataset degrades performance

2015-04-17 Thread joergpra...@gmail.com
what you mean by round robin in concurrent searches. Regards, Vishal On Apr 17, 2015 7:26 PM, joergpra...@gmail.com joergpra...@gmail.com wrote: Do you round-robin the four concurrent searches over the cluster nodes? Jörg On Fri, Apr 17, 2015 at 3:38 PM, Vishal Mahajan vishal...@gmail.com

Re: Concurrent searches over same dataset degrades performance

2015-04-17 Thread joergpra...@gmail.com
Do you round-robin the four concurrent searches over the cluster nodes? Jörg On Fri, Apr 17, 2015 at 3:38 PM, Vishal Mahajan vishal...@gmail.com wrote: I doubt that's the cause as it should also affect sequential searches. Regards, Vishal On Friday, April 17, 2015 at 6:34:24 PM UTC+5:30,

Re: Update River Settings MYSQL JDBC

2015-04-17 Thread joergpra...@gmail.com
You must delete the river instance userentriessdatariver, and create a new one. Jörg On Fri, Apr 17, 2015 at 12:51 PM, James Crone arafay...@gmail.com wrote: Hi.. I am new in elastic search and using https://github.com/jprante/elasticsearch-jdbc and my river setting is: PUT

Re: ES load test ended up with out of memory error after enabling the clustering

2015-04-16 Thread joergpra...@gmail.com
Did you assign different heap sizes? Please use same heap size for all data nodes. Do not limit cache to 30%, this is very small. Let ES use the default settings. Jörg On Thu, Apr 16, 2015 at 5:43 PM, Manjula Piyumal manjulapiyu...@gmail.com wrote: Hi all, I am trying to run load test with

Re: How many fields is too many?

2015-04-16 Thread joergpra...@gmail.com
The time required for update depends on the peculiarities of the update operations, the massive scripting overhead, the refresh operation, and the segment merge activities that are related. The number of fields does not matter. My application has 5000 fields. I avoid updates at all costs. A new

Re: Storing/searching IPs

2015-04-16 Thread joergpra...@gmail.com
It is possible to write a plugin with IP/subnet as a new field type. Jörg On Thu, Apr 16, 2015 at 9:34 PM, Attila Nagy nagy.att...@gmail.com wrote: Hi, I would like to store IP addresses and subnets (one or more per document) and I would like to search for them with exact or inclusion (does

Re: Evaluating Moving to Discourse - Feedback Wanted

2015-04-15 Thread joergpra...@gmail.com
I know I can not influence the decision for Discourse, so here are just my 2 ¢. The move should also consider that users who register with the new forum should have the right to export their own contributions to download them similar to Google takeaway function for Gmail / G+ account. Also, it

Re: Cross-DC clusters - specific dangers

2015-04-14 Thread joergpra...@gmail.com
Split-Brain risk is not related to latency, it can happen on any network which is dynamic. The main issue is latency, yes. This is a killer. If latency is too high, real-time systems can be seen as unusable from a user perspective. The second issue is network bandwidth. LAN traffic is a magnitude

Re: refresh_interval:10s is better than refresh_interval:-1?

2015-04-14 Thread joergpra...@gmail.com
May I ask, when you seek for better indexing performance, what your current performance is? How many nodes ( = hardware machines) do you have? Jörg On Tue, Apr 14, 2015 at 1:36 PM, Hajime placeofnomemor...@gmail.com wrote: Possibly it is IO bound but I don't seem too many io wait on Cpu or

Re: true embedded mode

2015-04-13 Thread joergpra...@gmail.com
All requests are serialized and deserialized at shard level, it is the only method of creating executable Lucene queries. There is no client-server mode at shard level. There would be no huge performance gain from passing requests in and out directly, there is not much to win, because sooner or later you

Re: how to lower the significance of a certain phrase

2015-04-12 Thread joergpra...@gmail.com
You can not penalize terms, you can only reward terms. The trick is to reward important terms and so all other (unwanted and unknown) terms get penalized. One method is to analyze sentences for grammar (part-of-speech tagging) and reward nouns or other keywords with boosting values, and use an

Re: Elastic 1.5.1 + postgresql

2015-04-09 Thread joergpra...@gmail.com
You can still use the JDBC plugin. It is not only a river, but also a standalone module, similar to Logstash. Jörg On Thu, Apr 9, 2015 at 10:07 PM, Fabio Ebner fabio.eb...@lumera.com.br wrote: It's possible to connect the elastic 1.5.1 with my postgresql?? in 1.3.1 I do this with river

Re: river jdbc plugin install for windows-not working

2015-04-08 Thread joergpra...@gmail.com
Please note, JDBC plugin is not only a river any more, it can also be used as a standalone tool like Logstash. Jörg On Wed, Apr 8, 2015 at 10:58 AM, James Green james.mk.gr...@gmail.com wrote: As discussed elsewhere please avoid Rivers as they are deprecated for removal. On 6 April 2015 at

Re: Connection String for Oracle Database present in managed cloud

2015-04-07 Thread joergpra...@gmail.com
Can you please ask your Oracle DB provider for the JDBC URL and the network environment setup? This is for Elasticsearch related questions. Jörg On Tue, Apr 7, 2015 at 9:26 AM, Sanu Vimal sanuvi...@gmail.com wrote: Hi All, I have the oracle database in the managed cloud. I have not got any

Re: bulk index request dataloss

2015-04-07 Thread joergpra...@gmail.com
Do you evaluate the bulk request responses? Jörg On Tue, Apr 7, 2015 at 11:16 AM, mzrth_7810 afrazmam...@gmail.com wrote: Hey everyone, I've been trying to maximise my indexing rate. I'm indexing around a million documents, using 4 threads. Each thread is indexing at 2500 documents per

Re: Connect Mysql to elastic search

2015-04-06 Thread joergpra...@gmail.com
The JDBC plugin can not find the JDBC driver jar. Put a driver jar into the plugins/jdbc folder, and check for permissions. Do not add all types of mysql connectors - this will not work. Just put exactly one driver in there. Jörg On Mon, Apr 6, 2015 at 5:34 AM, Sanu Vimal sanuvi...@gmail.com

Re: river jdbc plugin install for windows-not working

2015-04-06 Thread joergpra...@gmail.com
If you have installed the JDBC plugin with the plugin tool as the Elasticsearch user, it should have created the plugins/jdbc folder. MySQL 5.0 reached end of life in December 2011. MySQL JDBC 5.0.8 is over seven years old. I do not think it makes much sense to try old versions. Please update and

Re: Exception with posting jdbc river config

2015-04-06 Thread joergpra...@gmail.com
What is your problem with the JDBC plugin exactly? Can you post the error message? Jörg On Mon, Apr 6, 2015 at 6:19 AM, Sanu Vimal sanuvi...@gmail.com wrote: Hi Jorg, Though converting to binary it still doesn't parse. Do you have any documentation for jdbc river in windows? The linux one was very

Re: With 1.5.0, facet date_histograms min/max now return Infinity or -Infinity instead of numeric values?

2015-04-06 Thread joergpra...@gmail.com
In the facet entries, you will receive the default values of min/max if total_count is 0, and the defaults are java.lang.Double.POSITIVE_INFINITY and java.lang.Double.NEGATIVE_INFINITY. That is, ES never updates min/max while processing values, because there are no values. I would recommend to
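The behaviour described above can be reproduced with a small accumulator. This is not the ES facet code, only a sketch (class and field names are hypothetical) of why min/max stay at their initial infinities when no value is ever processed.

```java
// Minimal min/max accumulator. With zero values added, min and max keep
// their initial values, which is what shows up as Infinity / -Infinity.
final class MinMaxStats {
    long totalCount = 0;
    double min = Double.POSITIVE_INFINITY; // any real value is smaller
    double max = Double.NEGATIVE_INFINITY; // any real value is larger

    void add(double value) {
        totalCount++;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    public static void main(String[] args) {
        MinMaxStats empty = new MinMaxStats();
        System.out.println(empty.totalCount + " " + empty.min + " " + empty.max);
    }
}
```

A client that reads these values can check the count first and ignore min/max when it is 0.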

Re: chunk-wise incremental data feed into ES jdbc feeder/river

2015-04-05 Thread joergpra...@gmail.com
JDBC plugin supports MySQL streaming mode out of the box: https://github.com/jprante/elasticsearch-river-jdbc/issues/520#issuecomment-89789655 If it does not work, I'm available for help to find alternatives. I would love to know more about the observed DB timeout. The MySQL timeouts can be

Re: red shard status - why please?

2015-04-04 Thread joergpra...@gmail.com
Please check the logs, you should see error messages. Jörg On Sun, Apr 5, 2015 at 12:03 AM, Dan Langille dan.langi...@gmail.com wrote: On Saturday, April 4, 2015 at 5:24:47 PM UTC-4, Jörg Prante wrote: 1.4.2 was released last December, so I doubt you have created it in July or August.

Re: red shard status - why please?

2015-04-04 Thread joergpra...@gmail.com
1.4.2 was released last December, so I doubt you have created it in July or August. Jörg On Sat, Apr 4, 2015 at 11:10 PM, Dan Langille dan.langi...@gmail.com wrote: I'm seeing this: { cluster_name : elasticsearch, status : red, timed_out : false, number_of_nodes : 4,

Re: Elastic Search river date pattern

2015-04-03 Thread joergpra...@gmail.com
Do you run MySQL and ES in different timezones? Jörg On Fri, Apr 3, 2015 at 2:47 PM, phani.nadimi...@goktree.com wrote: Hi All, I have important scenario to share with you regarding mysql river. I created index it contains date field no format was specified. the following is the

Re: Stop words returning results?

2015-04-02 Thread joergpra...@gmail.com
_all has its own analyzer, if you do not set it, it will be the standard analyzer by default. Jörg On Thu, Apr 2, 2015 at 11:04 AM, Rupert Smith rupertlssm...@googlemail.com wrote: Hi, I think I need to understand how the _all field works when it comes to analysis. I want to query against

Re: transport client threadpool and down node

2015-04-02 Thread joergpra...@gmail.com
You can set client.transport.sniff to true, then nodes are detected automatically http://www.elastic.co/guide/en/elasticsearch/client/java-api/current/client.html#transport-client Jörg On Thu, Apr 2, 2015 at 12:25 PM, Jason Wee peich...@gmail.com wrote: Hello, elasticsearch java transport

Re: Stop words returning results?

2015-04-02 Thread joergpra...@gmail.com
It is technically possible to combine analyzers for a single field, see the combo analyzer https://github.com/yakaz/elasticsearch-analysis-combo/ Jörg On Thu, Apr 2, 2015 at 11:12 AM, Rupert Smith rupertlssm...@googlemail.com wrote: Ok thanks. Some of the fields are not_analyzed, and

Re: transport client threadpool and down node

2015-04-02 Thread joergpra...@gmail.com
client side.. so does this sniff parameter added will also work when the downed node come back online? jason On Thu, Apr 2, 2015 at 7:25 PM, joergpra...@gmail.com joergpra...@gmail.com wrote: You can set client.transport.sniff to true, then nodes are detected automatically http

Re: Wrong load distribution

2015-04-01 Thread joergpra...@gmail.com
You should set k*n shards for your indexes to avoid balancing troubles, with k being a constant integer and n being the number of nodes. Also keep an eye on dynamic mapping. If you steadily add thousands of new field names over time, you put load on the master node. Jörg On Wed, Apr 1, 2015
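A small sketch of the k*n rule, assuming shards are spread as evenly as possible and ignoring replicas. The node and shard counts are illustrative inputs; the helper names are hypothetical.

```java
// Shows why a shard count that is not a multiple of the node count
// leaves some nodes with more shards (and more load) than others.
final class ShardBalance {
    // Shards on the busiest node under an even-as-possible spread.
    static int maxPerNode(int shards, int nodes) {
        return (shards + nodes - 1) / nodes;
    }

    // Shards on the least busy node under an even-as-possible spread.
    static int minPerNode(int shards, int nodes) {
        return shards / nodes;
    }

    // True when every node holds the same number of shards (shards = k*n).
    static boolean balanced(int shards, int nodes) {
        return shards % nodes == 0;
    }

    public static void main(String[] args) {
        int nodes = 4;
        for (int shards : new int[] {5, 8}) {
            System.out.println(shards + " shards on " + nodes + " nodes: min "
                    + minPerNode(shards, nodes) + ", max " + maxPerNode(shards, nodes)
                    + ", balanced " + balanced(shards, nodes));
        }
    }
}
```

With 5 shards on 4 nodes one node carries twice as many shards as the others, while 8 shards (k = 2) give every node the same share.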

Re: Security Suggestion In Elasticsearch

2015-03-31 Thread joergpra...@gmail.com
The rule is not new. Do not expose Elasticsearch to the public internet, just like Postgresql and Gearman. Jörg On Tue, Mar 31, 2015 at 8:45 AM, Shohedul Hasan sha...@qianalysis.com wrote: Hi, I am trying to deploy my ES server in Digital ocean. But Digital ocean had some hacker attack as i

[ANN] JDBC plugin with feeder mode as an alternative to the deprecated Elasticsearch River API

2015-03-31 Thread joergpra...@gmail.com
Hi, if you use the JDBC river plugin and you are concerned about the deprecation of the river API, I wrote a step-by-step guide how to start the JDBC plugin in a feeder mode. The feeder mode is a standalone JVM which connects to an ES cluster using Java TransportClient under the hood. You can

Re: Wrong load distribution

2015-03-31 Thread joergpra...@gmail.com
Do you have your shards equally distributed over the 4 nodes? Or do you use the default of 5 shards? Jörg On Tue, Mar 31, 2015 at 5:28 PM, Loïc Wenkin loic.wen...@gmail.com wrote: Hi all, I meet a load distribution problem today and I browsed the Internet to find out someone having the same

Re: what are the research papers that ES relies on?

2015-03-30 Thread joergpra...@gmail.com
Elasticsearch is open source, so reading (and using and modifying) the algorithms is possible. There is also a lot of introductory material available online, and I recommend Elasticsearch - The definitive guide if you want paperwork. If you create an index, ES creates shards for this index (by

Re: which is the fastest client that could handle most requests per second? (any benchmarks?)

2015-03-30 Thread joergpra...@gmail.com
I think you mean Node.js. The Java node client does not work on 1 thread. Use Java API, it is the generic interface to ES. Also note, the setup of the cluster determines the client performance. There is not much you can do at client side if your cluster is small and slow. Jörg On Mon, Mar 30,

Re: Can 2 ElasticSearch plugins interact?

2015-03-30 Thread joergpra...@gmail.com
You can combine JDBC plugin and attachment mapper if your database can convert the blob to base64 string. Jörg On Mon, Mar 30, 2015 at 9:07 PM, Kiran Koirala kkoir...@manageforce.com wrote: I have a situation where we need to index a contents of a PDF file in Elastic Search. This can be

Re: Can 2 ElasticSearch plugins interact?

2015-03-30 Thread joergpra...@gmail.com
, joergpra...@gmail.com joergpra...@gmail.com wrote: You can combine JDBC plugin and attachment mapper if your database can convert the blob to base64 string. Jörg On Mon, Mar 30, 2015 at 9:07 PM, Kiran Koirala kkoir...@manageforce.com wrote: I have a situation where we need to index

Re: Unexpected high CPU / IO loading

2015-03-29 Thread joergpra...@gmail.com
1. If you are not sure about merging, you should look for other reasons for high load. Identify the processes with high activity. Check if your storage I/O system can keep up. 2. You can not turn off merging. With indices.store.throttle.type: none you disable throttling. 3. Optimizing by manual

Re: How to delete index permanently Elastic Search

2015-03-28 Thread joergpra...@gmail.com
Please read the guideline at https://github.com/jprante/elasticsearch-river-jdbc#parameters-inside-of-the-jdbc-block You cannot update database tables with data from ES through the JDBC plugin. It can only transport a tabular data stream from RDBMS to ES JSON. For more complex requirements, you should build an

Re: How to delete index permanently Elastic Search

2015-03-28 Thread joergpra...@gmail.com
You should use _id column name in the SQL statement to control update of documents by their ID. No need to delete the river, or the index. Jörg On Sat, Mar 28, 2015 at 10:38 AM, Abdul Rafay arafay...@gmail.com wrote: Thank you. I understand :) On Saturday, March 28, 2015 at 2:36:40 PM

Re: ActionRequest support for BulkProcessor in the Java API

2015-03-27 Thread joergpra...@gmail.com
The BulkProcessor is a helper class for managing write requests where large chunks of documents are combined into a single write request which saves a lot of network acknowledging ping/pong. If you send queries, there is a small write request, after which large response chunks are read, so there

Re: ESLucene 32GB heap myth or fact?

2015-03-27 Thread joergpra...@gmail.com
The statement "It wastes memory, reduces CPU performance, and makes the GC struggle with large heaps." reads like there is a catastrophe waiting and is a bit overstated. It may waste memory usable by the JVM heap, true. But it does not reduce CPU performance - OOP with LP64 is exercising memory and

Re: ES Choking on Seemingly Valid JSON

2015-03-27 Thread joergpra...@gmail.com
You have mixed plain strings and json objects for field json in your data. This is not allowed. Jörg On Fri, Mar 27, 2015 at 10:10 PM, David Kleiner david.klei...@gmail.com wrote: Error, with some parameters modified to protect the innocent. the JSON here does pass the lint validator, is it

Re: ESLucene 32GB heap myth or fact?

2015-03-26 Thread joergpra...@gmail.com
I will not doubt your numbers. The difference may depend on the application workload, how many heap objects are created. ES is optimized to use very large heap objects to decrease GC overhead. So I agree the difference for ES may be closer to 0.5 GB / 1 GB and not 8 GB. Jörg On Thu, Mar 26,

Re: ESLucene 32GB heap myth or fact?

2015-03-26 Thread joergpra...@gmail.com
There is no trouble at all, only a surprise effect to those who do not understand the effect of compressed OOPs. Compressed OOPs solve a memory space efficiency problem but work silently. The challenge is, large object pointers waste some of the CPU memory bandwidth when the JVM must access objects on
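As a back-of-the-envelope check of where the 32 GB boundary in the subject line comes from: a compressed reference is a 32-bit value scaled by the object alignment. The sketch below assumes HotSpot's default alignment of 8 bytes, which is my assumption and not stated in the message.

```java
// Largest heap addressable with compressed object pointers:
// number of distinct reference values times the object alignment.
final class CompressedOopsLimit {
    static long maxHeapBytes(int referenceBits, int objectAlignmentBytes) {
        return (1L << referenceBits) * objectAlignmentBytes;
    }

    public static void main(String[] args) {
        long bytes = maxHeapBytes(32, 8); // 32-bit references, 8-byte alignment (assumed default)
        System.out.println(bytes / (1024L * 1024L * 1024L) + " GB");
    }
}
```

Above that size the JVM has to fall back to full 64-bit references, which is the point where the pointer overhead discussed in this thread starts.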

Re: PUT gzipped data into elasticsearch

2015-03-25 Thread joergpra...@gmail.com
...@gmail.com joergpra...@gmail.com wrote: Logstash has both Java and HTTP output, but I assume you want to use HTTP. Set http.compression parameter to true in ES configuration, then you can use gzip-compressed HTTP traffic using Accept-Encoding header. http://www.elastic.co/guide/en
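For the compression step itself, plain java.util.zip is enough. This sketch only shows how a JSON payload could be gzip-compressed and restored on the client side; how the bytes and headers are wired into an HTTP request is not shown, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Gzip round trip for a text payload, using only the JDK.
final class GzipPayload {
    // Compress UTF-8 text into gzip bytes.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decompress gzip bytes back into UTF-8 text.
    static String gunzip(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

For documents in the megabyte range the saving depends on how repetitive the JSON is, so it is worth measuring with real data.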

Re: PUT gzipped data into elasticsearch

2015-03-25 Thread joergpra...@gmail.com
was almost there, but was missing either the id of document or at before the binary file. Thanks. Marcel On Wed, Mar 25, 2015 at 2:15 PM, joergpra...@gmail.com joergpra...@gmail.com wrote: Yes, I mean pushing compressed data. You have several wrong assumptions: use PUT instead of POST, mark file

Re: PUT gzipped data into elasticsearch

2015-03-24 Thread joergpra...@gmail.com
Are you using Java API? Jörg On Tue, Mar 24, 2015 at 11:59 AM, Marcel Matus matusmar...@gmail.com wrote: Hi, some of our data are big ones (1 - 10 MB), and if there are millions of those, it causes us trouble in our internal network. We would like to compress these data in generation time,

Re: Bulk UDP deprecated?

2015-03-24 Thread joergpra...@gmail.com
See https://github.com/elastic/elasticsearch/pull/7595 This feature is rarely used. Removing it will help reduce the moving parts of Elasticsearch and focus on the core. If there is demand, I can jump in and move the bulk UDP code to a community-supported plugin for ES 2.0 For syslog, I have

Re: PUT gzipped data into elasticsearch

2015-03-24 Thread joergpra...@gmail.com
... Marcel On Tue, Mar 24, 2015 at 1:50 PM, joergpra...@gmail.com joergpra...@gmail.com wrote: Are you using Java API? Jörg On Tue, Mar 24, 2015 at 11:59 AM, Marcel Matus matusmar...@gmail.com wrote: Hi, some of our data are big ones (1 - 10 MB), and if there are millions of those

Re: Queue capacity

2015-03-23 Thread joergpra...@gmail.com
Is this version 1.4.4? Can you create a thread dump with tools like jstack? If many threads are in the state BLOCKING, this would be interesting. Jörg On Mon, Mar 23, 2015 at 11:47 AM, Sharmi Banerjee bonny.rocko...@gmail.com wrote: I'm also facing the same issue. I have copied 20 index

Re: Limit large number of threads

2015-03-23 Thread joergpra...@gmail.com
ES uses several threadpools. Some are fixed sized, some are scalable, and the reference is the JVM available core count, i.e. Runtime.getRuntime().availableProcessors(), which can be overridden by a processors directive:

Re: PayloadTermQuery in ElasticSearch

2015-03-20 Thread joergpra...@gmail.com
Thanks for the hint that similarity class should be in the ES lib folder. I will try this to see if that enables my plugin code to have per-field custom similarity. Payloads are a broad subject. For example, in my plugin, payload filters are missing. Let's assume you use UIMA or some NLP tagging.

Re: filter bitsets

2015-03-20 Thread joergpra...@gmail.com
Caching filters are implemented in ES, not in Lucene. E.g. org.elasticsearch.common.lucene.search.CachedFilter is a class that implements cached filters on the basis of the Lucene filter class. The format is not only bitsets. The Lucene filter instance is cached, no matter if it is doc sets or bit

Re: Limit large number of threads

2015-03-20 Thread joergpra...@gmail.com
If thread counts go out of bounds, it may be a lockup somewhere. What version of ES do you use? Jörg On Fri, Mar 20, 2015 at 2:08 PM, Abid Hussain huss...@novacom.mygbiz.com wrote: Thanks for clarification. Still I wonder why such a huge amount of thread is created and if this can lead to

Re: Limit large number of threads

2015-03-20 Thread joergpra...@gmail.com
Hm, I doubt it is ok if a 1.4.0 node has 195 threads in state BLOCKED: Thread 19374: (state = BLOCKED) - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise) - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=175 (Compiled frame)

Re: Limit large number of threads

2015-03-20 Thread joergpra...@gmail.com
I think you should check a thread dump created by tools like jstack if you have a high JVM thread count in state BLOCKED. This might be a pointer that something unusual is going on, but I'm not sure. Jörg On Fri, Mar 20, 2015 at 4:41 PM, Abid Hussain huss...@novacom.mygbiz.com wrote: We're

Re: filter bitsets

2015-03-19 Thread joergpra...@gmail.com
There are several concepts: - filter operation (bool, range/geo/script) - filter composition (composable or not, composable means bitsets are used) - filter caching (ES stores filter results or not, if not cached, ES must walk doc-by-doc to apply filter) #1 says you should take care what kind of
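What "composable" means for bitset-based filters can be illustrated with java.util.BitSet: one bit per document, and composition is a bitwise operation. This is a generic sketch with hypothetical names, not the ES or Lucene filter classes.

```java
import java.util.BitSet;

// One bit per document id: a set bit means the document passes the filter.
final class FilterBitsets {
    // Documents matching both filters.
    static BitSet and(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone(); // leave the (possibly cached) inputs untouched
        result.and(b);
        return result;
    }

    // Documents matching at least one filter.
    static BitSet or(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b);
        return result;
    }

    public static void main(String[] args) {
        BitSet first = new BitSet();
        first.set(0); first.set(2); first.set(3);
        BitSet second = new BitSet();
        second.set(2); second.set(3); second.set(5);
        System.out.println(and(first, second)); // prints {2, 3}
        System.out.println(or(first, second));  // prints {0, 2, 3, 5}
    }
}
```

A filter that cannot be expressed as such a bitset has to be evaluated document by document, which is the case described above for script-like filters.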

Re: Java API TransportClient Threadpool

2015-03-18 Thread joergpra...@gmail.com
There is a connection pool. Netty connections are pooled, they can connect to multiple nodes at the same time. All requests are submitted asynchronously. It means, submitting and receiving may happen on different threads. They do not block. Jörg On Wed, Mar 18, 2015 at 8:47 AM, Abid Hussain

Re: issue with singleton analyzer in single JVM multi-index setup

2015-03-18 Thread joergpra...@gmail.com
Is it possible to examine the code of your plugin? Generally speaking, analyzers are instantiated per index creation for each thread. In org.elasticsearch.index.analysis.AnalysisModule, you can see how analyzer providers and factories are prepared for injection by the help of the ES injection

Re: issue with singleton analyzer in single JVM multi-index setup

2015-03-18 Thread joergpra...@gmail.com
Do you use an analyzer provider? Example public class RussianLemmatizingTwitterAnalyzerProvider extends AbstractIndexAnalyzerProviderRussianLemmatizingTwitterAnalyzer { private final MorphAnalyzer morphAnalyzer; ... @Inject public

Re: Indexing and Searching XML documents

2015-03-18 Thread joergpra...@gmail.com
I do not understand what you mean by "Solr handles XML input and output automatically". You have to set up Solr schema and configuration to process your XML documents. My plugin does not convert XML to JSON. It makes Elasticsearch understand XML natively by using a streaming parser that processes

Re: issue with singleton analyzer in single JVM multi-index setup

2015-03-18 Thread joergpra...@gmail.com
In the get() method of the provider, I would better try to always return a new analyzer instance. The configuration and setup of the analyzer could be refactored to the provider. Jörg On Wed, Mar 18, 2015 at 8:12 PM, Dmitry Kan dmitry@gmail.com wrote: Yes, I use an analyzer provider. Here
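The idea of keeping the configuration in the provider and handing out a new analyzer on every get() can be sketched without any ES classes. Both types below are hypothetical stand-ins, not the ES provider API or the plugin's real code.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for an analyzer that must not be shared between indices.
final class StatefulAnalyzer {
    final String dictionaryPath; // read-only configuration passed in by the provider

    StatefulAnalyzer(String dictionaryPath) {
        this.dictionaryPath = dictionaryPath;
    }
}

// Hypothetical provider: owns the configuration, creates a fresh analyzer per call.
final class AnalyzerProviderSketch {
    private final Supplier<StatefulAnalyzer> factory;

    AnalyzerProviderSketch(String dictionaryPath) {
        // Setup happens once here; instances are created lazily in get().
        this.factory = () -> new StatefulAnalyzer(dictionaryPath);
    }

    // Always returns a new instance instead of a shared singleton.
    StatefulAnalyzer get() {
        return factory.get();
    }
}
```

Expensive, immutable resources such as a loaded dictionary can still be shared through the provider; only the stateful analyzer object is created anew.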

Re: PayloadTermQuery in ElasticSearch

2015-03-17 Thread joergpra...@gmail.com
The concrete implementation depends on what you store in the payload (e.g. scores) Jörg On Tue, Mar 17, 2015 at 7:01 AM, Devaraja Swami devarajasw...@gmail.com wrote: I need to use PayloadTermQuery from Lucene. Does anyone know how I can use this in ElasticSearch? I am using ES 1.4.4, with

Re: Indexing and Searching XML documents

2015-03-17 Thread joergpra...@gmail.com
It strongly depends on the method how you want to convert XML to JSON and vice versa. Maybe this plugin can give you some hints about Jackson XML regarding parsing and formatting https://github.com/jprante/elasticsearch-xml Do not expect XML schema, validation, or XSL stylesheet, this is not

Re: PayloadTermQuery in ElasticSearch

2015-03-17 Thread joergpra...@gmail.com
in Lucene. On Tue, Mar 17, 2015 at 2:16 AM, joergpra...@gmail.com joergpra...@gmail.com wrote: The concrete implementation depends on what you store in the payload (e.g. scores) Jörg On Tue, Mar 17, 2015 at 7:01 AM, Devaraja Swami devarajasw...@gmail.com wrote: I need to use

Re: Is bulk index sending to data nodes better or non-data nodes?

2015-03-16 Thread joergpra...@gmail.com
Which article is that? It does not matter, you can send search and bulk requests to all nodes. ES will do the routing and automatically forward the requests to the nodes where they can be executed. Jörg On Mon, Mar 16, 2015 at 4:40 PM, chenlin rao rao.chen...@gmail.com wrote: Hello, anyone.

Re: PerThreadIDAndVersionLookup - thread safety

2015-03-16 Thread joergpra...@gmail.com
Sorry for being unclear, the TermsEnum array is one (the most important) of the arrays for iteration, the other arrays are also not thread safe - you can view all the private class variables as a thread-private cache. NumericDocValues is the key component for retrieving the version. Jörg On Mon,

Re: Elasticsearch high heap usage

2015-03-16 Thread joergpra...@gmail.com
This is not high. The JVM always uses the whole heap to avoid garbage collection as much as possible. In ES, a threshold is set to 75% before CMS garbage collection kicks in. Jörg On Mon, Mar 16, 2015 at 4:39 AM, chris85l...@googlemail.com wrote: Hello, We have a 2 node elasticsearch cluster
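
To put a number on "not high", here is a rough sketch that computes the used fraction of the heap and compares it with the 75% figure mentioned above. The class and method names are made up for this example, and reading the values through `Runtime` is only one assumed way of getting them.

```java
// Sketch only: names are hypothetical, and the 0.75 constant simply mirrors
// the 75% threshold mentioned in the message above.
final class HeapUsageSketch {
    static final double CMS_THRESHOLD = 0.75;

    // Fraction of the maximum heap that is currently in use.
    static double usedFraction(long usedBytes, long maxBytes) {
        return (double) usedBytes / maxBytes;
    }

    // True once usage has reached the threshold at which collection would start.
    static boolean aboveThreshold(long usedBytes, long maxBytes) {
        return usedFraction(usedBytes, maxBytes) >= CMS_THRESHOLD;
    }

    // Same calculation for the JVM this code runs in.
    static double currentUsedFraction() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return usedFraction(used, rt.maxMemory());
    }
}
```

A heap that sits somewhere below 75% and drops after a collection is therefore the expected picture, not a sign of a leak.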

Re: Aggregation / Sort and CircuitBreakingException

2015-03-15 Thread joergpra...@gmail.com
Have you considered doc values? http://www.elastic.co/guide/en/elasticsearch/guide/current/doc-values.html Jörg On Sun, Mar 15, 2015 at 11:11 PM, Lindsey Poole lpo...@gmail.com wrote: Hey guys, I have a question about the mechanics of aggregation and sorting w.r.t. the fielddata cache. I

Re: PerThreadIDAndVersionLookup - thread safety

2015-03-15 Thread joergpra...@gmail.com
It is not thread safe because of the TermsEnum array, which can not be shared between threads. By not sharing, a thread can reuse the array, which avoids expensive reinitialization. The utility class was introduced at https://github.com/elastic/elasticsearch/issues/6212 and from what I
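
The general pattern can be sketched with a plain `ThreadLocal`. This is not the Elasticsearch code; `PerThreadScratchSketch` and the `int[]` buffer are hypothetical stand-ins for the lookup object and its arrays. Each thread gets its own array and reuses it on later calls, so nothing is shared and nothing has to be reinitialized.

```java
// Sketch of a thread-private cache; names and the buffer type are made up
// for this example and do not come from the Elasticsearch source.
final class PerThreadScratchSketch {
    // One array per thread, created lazily on that thread's first access.
    private static final ThreadLocal<int[]> SCRATCH =
            ThreadLocal.withInitial(() -> new int[16]);

    // Returns the calling thread's private array; repeated calls reuse it.
    static int[] scratch() {
        return SCRATCH.get();
    }

    // Helper for demonstration: fetches the array that a different thread sees.
    static int[] scratchSeenByAnotherThread() {
        int[][] holder = new int[1][];
        Thread t = new Thread(() -> holder[0] = scratch());
        t.start();
        try {
            t.join(); // join() makes the write to holder visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return holder[0];
    }
}
```

The calling thread always gets the same array back, while a second thread gets a different one.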

Re: Aggregation / Sort and CircuitBreakingException

2015-03-15 Thread joergpra...@gmail.com
...@gmail.com joergpra...@gmail.com wrote: Have you considered doc values? http://www.elastic.co/guide/en/elasticsearch/guide/current/doc-values.html Jörg On Sun, Mar 15, 2015 at 11:11 PM, Lindsey Poole lpo...@gmail.com wrote: Hey guys, I have a question about the mechanics

Re: Field names with the same name across types having different index/type in Elasticsearch

2015-03-14 Thread joergpra...@gmail.com
If you have thousands of tenants with thousands of potentially overlapping mappings that should operate independently, the hardware sizing of a cluster is a challenge, yes. OTOH you can play tricks at your search/index front end API if you can hide ES internals from the customers, e.g. prefixing

Re: Is there limitation how many indices could I create in ES cluster? and performance?

2015-03-14 Thread joergpra...@gmail.com
You may use a single index with enough shards for users and use routing for accessing the shard where a user ID has the docs indexed. See also shard overallocation http://www.elastic.co/guide/en/elasticsearch/guide/current/overallocation.html and
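
The idea behind routing can be sketched as a hash of the routing value taken modulo the number of shards. This is an illustration only: Elasticsearch uses its own hash function internally, and `String.hashCode()` as well as the names `RoutingSketch` and `shardFor` are assumptions made for this example.

```java
// Illustration only: String.hashCode() is a stand-in for the hash function
// that Elasticsearch applies to the routing value.
final class RoutingSketch {
    // Maps a routing value (for example a user ID) to a shard number in
    // the range [0, numberOfShards).
    static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(routing.hashCode(), numberOfShards);
    }
}
```

Because the same user ID always lands on the same shard, index and search requests that carry that ID as routing value only have to touch one shard.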

Re: What configuration is available to control MemoryMapDirectory

2015-03-14 Thread joergpra...@gmail.com
You may try to limit direct memory at the JVM level by using -XX:MaxDirectMemorySize (default is unlimited). See also ES_DIRECT_SIZE in http://www.elastic.co/guide/en/elasticsearch/reference/current/setup-service.html#_linux I recommend at least 2GB Jörg On Sat, Mar 14, 2015 at 1:03 AM, Lindsey Poole
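
To check whether the flag actually reached the JVM, one option is to look at the JVM input arguments. The sketch below assumes the standard `java.lang.management` API; the class and method names are made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

// Sketch only: looks for -XX:MaxDirectMemorySize among JVM arguments.
final class DirectMemoryFlagSketch {
    static final String FLAG = "-XX:MaxDirectMemorySize=";

    // Returns the value given after the flag (e.g. "2g"), if the flag is present.
    static Optional<String> findLimit(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(arg -> arg.startsWith(FLAG))
                .map(arg -> arg.substring(FLAG.length()))
                .findFirst();
    }

    // Same lookup against the arguments of the JVM this code runs in.
    static Optional<String> currentLimit() {
        return findLimit(ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

An empty result from `currentLimit()` means the flag was not passed on the command line of that JVM.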

Re: What configuration is available to control MemoryMapDirectory

2015-03-14 Thread joergpra...@gmail.com
I'm out - no experience with EC2. I avoid foreign servers at all cost. Maybe the 120G RAM is affected by swap/memory overcommit. Do not forget to check memlock and memory ballooning. Chances are slim that you can control host settings as a guest in a virtual server environment. Jörg On Sat, Mar 14,

Re: whitespace tokenizer not working as I'd expect

2015-03-13 Thread joergpra...@gmail.com
From which source did you assume that %20 is a white space? The mapping char filter understands \u notation (which is not documented in ES). With curl, on bash, you have to escape the \u notation with a double backslash like this: . => \\u0020 Here is a working example
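
The difference between the two notations can be shown in a few lines of plain Java, independent of Elasticsearch. The class and method names are made up for this sketch. The unicode escape for code point 0020 is the space character itself, while %20 is just three ordinary characters until something URL-decodes it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Sketch only: contrasts a unicode escape with a percent-encoded space.
final class SpaceNotationSketch {
    // The unicode escape for code point 0020 denotes the space character.
    static boolean unicodeEscapeIsSpace() {
        return "\u0020".equals(" ");
    }

    // "%20" consists of '%', '2' and '0'; none of them is whitespace.
    static boolean percentTwentyContainsWhitespace() {
        return "%20".chars().anyMatch(Character::isWhitespace);
    }

    // Only an explicit URL-decoding step turns "%20" into a space.
    static String urlDecode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

A tokenizer or char filter that is not told to URL-decode its input therefore never sees a space in %20.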

Re: multi-core support for elasticsearch

2015-03-13 Thread joergpra...@gmail.com
How do you observe there is only one core? Elasticsearch uses many threads by default and as many cores as possible. Jörg On Fri, Mar 13, 2015 at 12:40 PM, Alexander Petrovsky askju...@gmail.com wrote: Hi! I have the same problem on my single elasticsearch instance. # dpkg -l | grep elas

Re: char_filter for German

2015-03-12 Thread joergpra...@gmail.com
Yes, please upgrade Elasticsearch to use the official German normalizer. I added it to the decompound plugin for convenience, it may be removed at any later time. Jörg On Wed, Mar 11, 2015 at 9:54 PM, Krešimir Slugan kresimir.slu...@gmail.com wrote: Thanks! I assume that german_normalize is

Re: char_filter for German

2015-03-11 Thread joergpra...@gmail.com
Use german_normalization. german_normalize is the same filter; I implemented it in my plugin https://github.com/jprante/elasticsearch-analysis-german/blob/master/src/main/java/org/xbib/elasticsearch/index/analysis/german/GermanAnalysisBinderProcessor.java when it was not available in ES core. Jörg

Re: Plugin: Getting LocalNode

2015-03-06 Thread joergpra...@gmail.com
Use something like this for the node name: public class MyService extends AbstractLifecycleComponent<MyService> { @Inject public MyService(Settings settings, Node node) { super(settings); String name = node.settings().get("name"); ... } and for node IDs: public class MyService extends
