You cannot run the Knapsack plugin on the transport client side. It must run
on the server side, in a node that is part of the cluster.
Jörg
On Fri, May 29, 2015 at 11:07 AM, Muddadi Hemaanusha
hemaanusha.bu...@gmail.com wrote:
Hi All,
I am using the elasticsearch-knapsack plugin to update settings and
This is a long unresolved issue.
One solution would be adding BigDecimal support. See for example
https://github.com/elastic/elasticsearch/pull/5683
Jörg
On Fri, May 22, 2015 at 8:20 AM, Craig Berry craig.adrian.be...@gmail.com
wrote:
Hi there,
I want to be able to provide a text search
and pastebins/gists shouldn't be counted against
the limit. We ask people to use gists all the time, and GitHub issue or code
links are a good thing to use as well.
On May 4, 2015 5:40 PM, joergpra...@gmail.com
joergpra...@gmail.com wrote:
Thanks Shaunak,
I appreciate that. I think
On my systems, dentry use is ~18MB while ES 1.5.2 is under heavy duty (RHEL
6.6, Java 8u45, on-premise server).
I think you should double check if the effect you see is caused by ES or by
your JVM/Arch Linux/EC2/whatever.
Jörg
On Mon, May 4, 2015 at 12:47 PM, Pradeep Reddy
It does not work. I cannot post messages with links.
After I try to post a new topic such as
- snip
To all of you who want to sneak a peek at the features planned for ES 2.0, this
issue collects some of them:
https://github.com/elastic/elasticsearch/issues/9970
Best,
Jörg
snip
I
The number of open files does not depend on the number of documents.
A shard does not come for free. Each shard can take around 150 open file
descriptors (sockets, segment files), and up to 400-500 if it is actively
being indexed.
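As a back-of-the-envelope sketch using the rough figures above (estimates from this thread, not measured values):

```java
// Rough per-cluster file descriptor estimate, using the approximate
// figure of ~150 FDs per idle shard. All numbers are illustrative.
public class ShardFdEstimate {
    public static void main(String[] args) {
        int indices = 2000;        // hypothetical index count
        int shardsPerIndex = 5;    // the default
        int fdPerIdleShard = 150;  // rough estimate from this thread
        long totalFds = (long) indices * shardsPerIndex * fdPerIdleShard;
        System.out.println(totalFds + " descriptors"); // 1500000 descriptors
    }
}
```

With numbers like these, the OS open-file limit (ulimit -n) is exhausted long before the document count matters.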
Take care of the number of shards: if you have 5 shards per index, and 2000
As said, it depends.
When bulk-indexing documents, for example, my multi-threaded workload is
network-bound. It can easily be made CPU-bound by pre-processing documents
in single-threaded mode. Certain queries are CPU-bound, others are not. If I
retrieve millions of documents in a row, decompression
wow, thanks for sharing!
Best,
Jörg
On Thu, Apr 30, 2015 at 10:43 PM, GWired garrettcjohn...@gmail.com wrote:
The below will build a table in SQL to store refresh times. The first
time it runs, it will put in an entry and go backwards in time until all
records are retrieved. Once
Nice work, can you share the recipe with the community?
I could post it on the JDBC plugin wiki
Jörg
On Wed, Apr 29, 2015 at 1:56 PM, GWired garrettcjohn...@gmail.com wrote:
My theory is that i was overloading my ES VM's on initial loads or when
doing large loads.
My cpu would jump to 99%
First you need to find out if your workload is CPU-bound or if it is
network-bound.
If CPU-bound, go for the virtual machine with best CPU equipment.
If network bound, go for the virtual machine that offers best network
connectivity.
It is very hard to get precise numbers for performance
You are using the binary stream protocol of ES in the writeTo() method
which is not appropriate for writing to files.
Once you have added requests to a bulk request, you cannot get your content
back as JSON.
A better approach is to use an XContentBuilder with an OutputStream, and
add the content to
You can send a term query after a bulk response, or you can implement your
own bulk action, which returns the custom ID instead of _id.
Jörg
On Mon, Apr 27, 2015 at 9:55 AM, Jakko Sikkar jakko.sik...@gmail.com
wrote:
Hi,
I have a ES mapping with ES unique identifier (_id) and custom
?
Thanks
Manjula
On Fri, Apr 17, 2015 at 2:51 AM, joergpra...@gmail.com
joergpra...@gmail.com wrote:
I have thousands of concurrent indexing/queries running per second on
non-virtualized servers.
A 4G heap is OK, it is more than enough; there must be other reasons for the
OOM, I am sure.
There are log messages on the ES cluster side; you should look there to see
why bulk indexing failed.
Jörg
On Thu, Apr 23, 2015 at 5:45 AM, GWired garrettcjohn...@gmail.com wrote:
Found this in the logs:
[2015-04-22 22:01:25,063][ERROR][river.jdbc.BulkNodeClient] bulk [15]
failed with 945 failed
With the JDBC plugin, you should slightly increase the number of requests per
bulk request (maxbulkactions) in order to keep the number of concurrent bulk
requests low enough to be handled by ES.
The ES bulk thread pool default setting is ok. Please avoid a change.
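As a hedged sketch, the relevant knobs in a JDBC river/feeder definition might look like this (parameter names as documented in the plugin README; URL, credentials, SQL, and values are illustrative):

```json
{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "jdbc:mysql://localhost:3306/test",
    "user" : "dbuser",
    "password" : "dbpass",
    "sql" : "select * from orders",
    "max_bulk_actions" : 10000,
    "max_concurrent_bulk_requests" : 2
  }
}
```

Larger bulk actions with fewer concurrent requests keep the server-side bulk queue from overflowing.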
Jörg
On Thu, Apr 23, 2015 at 12:20 PM,
I implemented CQL for Elasticsearch
https://github.com/xbib/elasticsearch-plugin-sru
I do not recommend it for the general case, because CQL is inferior to the
power and expressiveness of the Elasticsearch DSL. If you have an audience
that prefers old-school boolean search and does not want ES-specific
Please note, Java 7 has reached end of life, and will no longer receive
updates
https://www.java.com/en/download/faq/java_7.xml
I recommend Java 8.
ES is sensitive to JVM changes (hash codes for hash maps are computed
differently in Java 8), but this shows up only in rare cases.
I am not sure
The column strategy is a community effort; it can manipulate SQL statement
WHERE clauses with a timestamp filter.
I do not have enough knowledge about column strategy.
You are correct: at node restart, a river does not know where to
restart from. There is no method to resolve this within the river.
It is up to the SQL statement to control the rows that are fetched when the
JDBC river restarts.
Note that rivers are deprecated. One of the reasons rivers were
obsoleted is their undefined state when a node restarts. The JDBC river
simply re-runs the SQL statement.
Use the JDBC plugin in
Sorry, I overlooked it: you use getTookInMillis().
Maybe the extra time is spent because you use a range filter, which is not
cached?
Jörg
On Fri, Apr 17, 2015 at 3:02 PM, joergpra...@gmail.com
joergpra...@gmail.com wrote:
What time do you measure? The ES query time, or the network latency
What time do you measure? The ES query time, or the network latency?
Jörg
On Fri, Apr 17, 2015 at 2:25 PM, Vishal Mahajan vishal...@gmail.com wrote:
Hi,
I was trying Filtered query (default search type) to fetch first 8k out of
approx 170k matched records. I noticed that on an average query
what you mean by round robin in
concurrent searches.
Regards,
Vishal
On Apr 17, 2015 7:26 PM, joergpra...@gmail.com joergpra...@gmail.com
wrote:
Do you round-robin the four concurrent searches over the cluster nodes?
Jörg
On Fri, Apr 17, 2015 at 3:38 PM, Vishal Mahajan vishal...@gmail.com
Do you round-robin the four concurrent searches over the cluster nodes?
Jörg
On Fri, Apr 17, 2015 at 3:38 PM, Vishal Mahajan vishal...@gmail.com wrote:
I doubt that's the cause as it should also affect sequential searches.
Regards,
Vishal
On Friday, April 17, 2015 at 6:34:24 PM UTC+5:30,
You must delete the river instance userentriessdatariver, and create a new
one.
Jörg
On Fri, Apr 17, 2015 at 12:51 PM, James Crone arafay...@gmail.com wrote:
Hi..
I am new to Elasticsearch and I am using
https://github.com/jprante/elasticsearch-jdbc and my river setting is:
PUT
Did you assign different heap sizes? Please use the same heap size for all
data nodes. Do not limit the cache to 30%; this is very small. Let ES use the
default settings.
Jörg
On Thu, Apr 16, 2015 at 5:43 PM, Manjula Piyumal manjulapiyu...@gmail.com
wrote:
Hi all,
I am trying to run load test with
The time required for update depends on the peculiarities of the update
operations, the massive scripting overhead, the refresh operation, and the
segment merge activities that are related.
The number of fields does not matter.
My application has 5000 fields. I avoid updates at all costs. A new
It is possible to write a plugin with IP/subnet as a new field type.
Jörg
On Thu, Apr 16, 2015 at 9:34 PM, Attila Nagy nagy.att...@gmail.com wrote:
Hi,
I would like to store IP addresses and subnets (one or more per document)
and I would like to search for them with exact or inclusion (does
I know I can not influence the decision for Discourse, so here are just my
2 ¢.
The move should also consider that users who register with the new forum
should have the right to export their own contributions and download them,
similar to the Google Takeout function for Gmail / G+ accounts.
Also, it
Split-brain risk is not related to latency; it can happen on any network
that is dynamic.
The main issue is latency, yes. This is a killer. If latency is too high,
real-time systems can be seen as unusable from a user perspective.
The second issue is network bandwidth. LAN traffic is a magnitude
May I ask, when you seek better indexing performance, what your current
performance is? How many nodes (= hardware machines) do you have?
Jörg
On Tue, Apr 14, 2015 at 1:36 PM, Hajime placeofnomemor...@gmail.com wrote:
Possibly it is I/O bound, but I don't see too many I/O waits on the CPU, or
All requests are serialized and deserialized at the shard level; it is the
only method of creating executable Lucene queries. There is no client-server
mode at the shard level. There would be no huge performance gain from passing
data directly in and out; there is not much to win, because sooner or later you
You cannot penalize terms, you can only reward terms. The trick is to
reward important terms so that all other (unwanted and unknown) terms get
penalized. One method is to analyze sentences for grammar (part-of-speech
tagging) and reward nouns or other keywords with boosting values, and use
an
You can still use the JDBC plugin. It is not only a river, but also a
standalone module, similar to Logstash.
Jörg
On Thu, Apr 9, 2015 at 10:07 PM, Fabio Ebner fabio.eb...@lumera.com.br
wrote:
Is it possible to connect Elasticsearch 1.5.1 to my PostgreSQL? In 1.3.1 I
did this with a river
Please note, JDBC plugin is not only a river any more, it can also be used
as a standalone tool like Logstash.
Jörg
On Wed, Apr 8, 2015 at 10:58 AM, James Green james.mk.gr...@gmail.com
wrote:
As discussed elsewhere please avoid Rivers as they are deprecated for
removal.
On 6 April 2015 at
Can you please ask your Oracle DB provider for the JDBC URL and the network
environment setup?
This list is for Elasticsearch-related questions.
Jörg
On Tue, Apr 7, 2015 at 9:26 AM, Sanu Vimal sanuvi...@gmail.com wrote:
Hi All,
I have the oracle database in the managed cloud. I have not got any
Do you evaluate the bulk request responses?
Jörg
On Tue, Apr 7, 2015 at 11:16 AM, mzrth_7810 afrazmam...@gmail.com wrote:
Hey everyone,
I've been trying to maximise my indexing rate. I'm indexing around a
million documents, using 4 threads. Each thread is indexing at 2500
documents per
The JDBC plugin cannot find the JDBC driver jar.
Put a driver jar into the plugins/jdbc folder, and check the permissions.
Do not add all types of MySQL connectors - this will not work. Just put
exactly one driver in there.
Jörg
On Mon, Apr 6, 2015 at 5:34 AM, Sanu Vimal sanuvi...@gmail.com
If you have installed the JDBC plugin by the plugin tool with the
Elasticsearch user, it should have created the plugins/jdbc folder.
MySQL 5.0 reached end of life in December 2011. MySQL JDBC 5.0.8 is over
seven years old. I do not think it makes much sense to try old versions.
Please update and
What is your problem with JDBC plugin exactly? Can you post the error
message?
Jörg
On Mon, Apr 6, 2015 at 6:19 AM, Sanu Vimal sanuvi...@gmail.com wrote:
Hi Jorg,
Though converting to binary, it still doesn't parse. Do you have any
documentation for the JDBC river on Windows? The Linux one was very
In the facet entries, you will receive the default values of min/max if
total_count is 0, and the defaults are java.lang.Double.POSITIVE_INFINITY
and java.lang.Double.NEGATIVE_INFINITY. That is, ES never updates min/max
while processing values, because there are no values.
I would recommend to
JDBC plugin supports MySQL streaming mode out of the box:
https://github.com/jprante/elasticsearch-river-jdbc/issues/520#issuecomment-89789655
If it does not work, I'm available for help to find alternatives.
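For reference, a hedged sketch of how streaming would be switched on in the JDBC block. Per the plugin README, a fetchsize of "min" translates to Integer.MIN_VALUE, which is what triggers MySQL's row-streaming mode; treat the exact parameter value as an assumption to verify against the linked issue:

```json
{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "jdbc:mysql://localhost:3306/test",
    "sql" : "select * from bigtable",
    "fetchsize" : "min"
  }
}
```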
I would love to know more about the observed DB timeout. The MySQL timeouts
can be
Please check the logs, you should see error messages.
Jörg
On Sun, Apr 5, 2015 at 12:03 AM, Dan Langille dan.langi...@gmail.com
wrote:
On Saturday, April 4, 2015 at 5:24:47 PM UTC-4, Jörg Prante wrote:
1.4.2 was released last December, so I doubt you have created it in July
or August.
1.4.2 was released last December, so I doubt you have created it in July or
August.
Jörg
On Sat, Apr 4, 2015 at 11:10 PM, Dan Langille dan.langi...@gmail.com
wrote:
I'm seeing this:
{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 4,
Do you run MySQL and ES in different timezones?
Jörg
On Fri, Apr 3, 2015 at 2:47 PM, phani.nadimi...@goktree.com wrote:
Hi All,
I have important scenario to share with you regarding mysql river.
I created index it contains date field no format was specified.
the following is the
_all has its own analyzer, if you do not set it, it will be the standard
analyzer by default.
Jörg
On Thu, Apr 2, 2015 at 11:04 AM, Rupert Smith rupertlssm...@googlemail.com
wrote:
Hi,
I think I need to understand how the _all field works when it comes to
analysis. I want to query against
You can set client.transport.sniff to true, then nodes are detected
automatically
http://www.elastic.co/guide/en/elasticsearch/client/java-api/current/client.html#transport-client
Jörg
On Thu, Apr 2, 2015 at 12:25 PM, Jason Wee peich...@gmail.com wrote:
Hello, elasticsearch java transport
It is technically possible to combine analyzers for a single field, see the
combo analyzer https://github.com/yakaz/elasticsearch-analysis-combo/
Jörg
On Thu, Apr 2, 2015 at 11:12 AM, Rupert Smith rupertlssm...@googlemail.com
wrote:
Ok thanks.
Some of the fields are not_analyzed, and
client side... so will this sniff parameter also work when
a downed node comes back online?
jason
On Thu, Apr 2, 2015 at 7:25 PM, joergpra...@gmail.com
joergpra...@gmail.com wrote:
You can set client.transport.sniff to true, then nodes are detected
automatically
http
You should set k*n shards for your indexes to avoid balancing troubles,
with k being a constant integer and n being the number of nodes.
Also keep an eye on dynamic mapping: if you add thousands of new
field names steadily over time, you exercise the master node.
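For example, with n = 4 data nodes and k = 2, an index created with 8 shards distributes evenly (a sketch; the value is illustrative):

```yaml
# elasticsearch.yml, or per-index at creation time
index.number_of_shards: 8   # k*n = 2*4, divides evenly over 4 nodes
```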
Jörg
On Wed, Apr 1, 2015
The rule is not new. Do not expose Elasticsearch to the public internet,
just like PostgreSQL and Gearman.
Jörg
On Tue, Mar 31, 2015 at 8:45 AM, Shohedul Hasan sha...@qianalysis.com
wrote:
Hi,
I am trying to deploy my ES server in Digital ocean. But Digital ocean
had some hacker attack as i
Hi,
if you use the JDBC river plugin and you are concerned about the
deprecation of the river API, I wrote a step-by-step guide on how to start
the JDBC plugin in feeder mode.
The feeder mode is a standalone JVM which connects to an ES cluster using
Java TransportClient under the hood.
You can
Do you have your shards equally distributed over the 4 nodes? Or do you use
the default of 5 shards?
Jörg
On Tue, Mar 31, 2015 at 5:28 PM, Loïc Wenkin loic.wen...@gmail.com wrote:
Hi all,
I met a load distribution problem today and I browsed the Internet to
find someone having the same
Elasticsearch is open source, so reading (and using and modifying) the
algorithms is possible. There is also a lot of introductory material
available online, and I recommend Elasticsearch - The Definitive Guide if
you want it on paper.
If you create an index, ES creates shards for this index (by
I think you mean Node.js. The Java node client does not work on 1 thread.
Use Java API, it is the generic interface to ES. Also note, the setup of
the cluster determines the client performance. There is not much you can do
at client side if your cluster is small and slow.
Jörg
On Mon, Mar 30,
You can combine JDBC plugin and attachment mapper if your database can
convert the blob to base64 string.
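A hedged sketch of what the SQL side could look like, assuming MySQL 5.6+'s TO_BASE64() function and a hypothetical docs table; the target field name depends on how you mapped the attachment field:

```sql
-- hypothetical table/columns; TO_BASE64 turns the BLOB into the
-- base64 string that the attachment mapper expects
select id as _id, to_base64(pdf_blob) as "content" from docs
```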
Jörg
On Mon, Mar 30, 2015 at 9:07 PM, Kiran Koirala kkoir...@manageforce.com
wrote:
I have a situation where we need to index a contents of a PDF file in
Elastic Search. This can be
, joergpra...@gmail.com
joergpra...@gmail.com wrote:
You can combine JDBC plugin and attachment mapper if your database can
convert the blob to base64 string.
Jörg
On Mon, Mar 30, 2015 at 9:07 PM, Kiran Koirala kkoir...@manageforce.com
wrote:
I have a situation where we need to index
1. If you are not sure about merging, you should look for other reasons for
the high load. Identify the processes with high activity. Check if your
storage I/O system can keep up.
2. You cannot turn off merging. With indices.store.throttle.type: none you
disable throttling.
3. Optimizing by manual
Please read guideline at
https://github.com/jprante/elasticsearch-river-jdbc#parameters-inside-of-the-jdbc-block
You cannot update database tables with ES data via the JDBC plugin. It can
only transport a tabular data stream from the RDBMS to ES JSON. For more
complex requirements, you should build an
You should use _id column name in the SQL statement to control update of
documents by their ID.
No need to delete the river, or the index.
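For example (hypothetical table and column names), aliasing a column to _id makes the plugin use it as the document ID, so a re-run updates documents in place instead of creating duplicates:

```sql
select id as _id, name, modified_at from products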
Jörg
On Sat, Mar 28, 2015 at 10:38 AM, Abdul Rafay arafay...@gmail.com wrote:
Thank you. I understand :)
On Saturday, March 28, 2015 at 2:36:40 PM
The BulkProcessor is a helper class for managing write requests, where large
chunks of documents are combined into a single write request, which saves a
lot of network acknowledgement round-trips.
If you send queries, there is a small write request, after which large
response chunks are read, so there
The statement "It wastes memory, reduces CPU performance, and makes the GC
struggle with large heaps" reads like there is a catastrophe waiting, and
is a bit overstated. It may waste memory usable by the JVM heap, true. But
it does not reduce CPU performance - OOP with LP64 is exercising memory and
You have mixed plain strings and JSON objects for the field "json" in your
data. This is not allowed.
Jörg
On Fri, Mar 27, 2015 at 10:10 PM, David Kleiner david.klei...@gmail.com
wrote:
Error, with some parameters modified to protect the innocent.
the JSON here does pass the lint validator, is it
I will not doubt your numbers.
The difference may depend on the application workload, how many heap
objects are created. ES is optimized to use very large heap objects to
decrease GC overhead. So I agree the difference for ES may be closer to
0.5 GB / 1 GB and not 8 GB.
Jörg
On Thu, Mar 26,
There is no trouble at all, only a surprise effect for those who do not
understand the effect of compressed OOPs.
Compressed OOPs solve a memory space efficiency problem, but work silently.
The challenge is that large object pointers waste some of the CPU memory
bandwidth when the JVM must access objects on
...@gmail.com
joergpra...@gmail.com wrote:
Logstash has both Java and HTTP output, but I assume you want to use HTTP.
Set http.compression parameter to true in ES configuration, then you can
use gzip-compressed HTTP traffic using Accept-Encoding header.
http://www.elastic.co/guide/en
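A hedged sketch of pushing gzip-compressed bulk data, assuming http.compression: true on the ES side and a local node (the endpoint and file name are illustrative):

```sh
gzip -c bulk.json > bulk.json.gz
curl -XPOST 'localhost:9200/_bulk' \
     -H 'Content-Encoding: gzip' \
     --data-binary @bulk.json.gz
```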
was almost there, but was
missing either the id of the document or the @ before the binary file.
Thanks.
Marcel
On Wed, Mar 25, 2015 at 2:15 PM, joergpra...@gmail.com
joergpra...@gmail.com wrote:
Yes, I mean pushing compressed data.
You have several wrong assumptions: use PUT instead of POST, mark file
Are you using Java API?
Jörg
On Tue, Mar 24, 2015 at 11:59 AM, Marcel Matus matusmar...@gmail.com
wrote:
Hi,
some of our data items are big (1 - 10 MB), and if there are millions of
those, it causes us trouble in our internal network.
We would like to compress these data in generation time,
See https://github.com/elastic/elasticsearch/pull/7595
This feature is rarely used. Removing it will help reduce the moving parts
of Elasticsearch and focus on the core.
If there is demand, I can jump in and move the bulk UDP code to a
community-supported plugin for ES 2.0
For syslog, I have
...
Marcel
On Tue, Mar 24, 2015 at 1:50 PM, joergpra...@gmail.com
joergpra...@gmail.com wrote:
Are you using Java API?
Jörg
On Tue, Mar 24, 2015 at 11:59 AM, Marcel Matus matusmar...@gmail.com
wrote:
Hi,
some of our data items are big (1 - 10 MB), and if there are millions of
those
Is this version 1.4.4?
Can you create a thread dump with tools like jstack?
If many threads are in the state BLOCKED, that would be interesting.
Jörg
On Mon, Mar 23, 2015 at 11:47 AM, Sharmi Banerjee bonny.rocko...@gmail.com
wrote:
I'm also facing the same issue.
I have copied 20 index
ES uses several thread pools. Some are fixed-size, some are scalable, and
the reference is the JVM's available core count, i.e.
Runtime.getRuntime().availableProcessors(), which can be overridden by a
processors directive:
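For example (the value is illustrative), in elasticsearch.yml:

```yaml
# size thread pools as if there were 8 cores,
# regardless of what availableProcessors() reports
processors: 8
```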
Thanks for the hint that similarity class should be in the ES lib folder. I
will try this to see if that enables my plugin code to have per-field
custom similarity.
Payloads are a broad subject. For example, in my plugin, payload filters
are missing. Let's assume you use UIMA or some NLP tagging.
Caching filters are implemented in ES, not in Lucene. E.g.
org.elasticsearch.common.lucene.search.CachedFilter is a class that
implements cached filters on the basis of the Lucene Filter class.
The format is not only bitsets. The Lucene filter instance is cached, no
matter if it is doc sets or bit
If thread counts go out of bounds, it may be a lockup somewhere. What
version of ES do you use?
Jörg
On Fri, Mar 20, 2015 at 2:08 PM, Abid Hussain huss...@novacom.mygbiz.com
wrote:
Thanks for clarification.
Still, I wonder why such a huge number of threads is created, and if this
can lead to
Hm, I doubt it is ok if a 1.4.0 node has 195 threads in state BLOCKED:
Thread 19374: (state = BLOCKED)
- sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information
may be imprecise)
- java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14,
line=175 (Compiled frame)
I think you should check a thread dump created by tools like jstack if you
have a high JVM thread count in state BLOCKED. This might be a pointer that
something unusual is going on, but I'm not sure.
Jörg
On Fri, Mar 20, 2015 at 4:41 PM, Abid Hussain huss...@novacom.mygbiz.com
wrote:
We're
There are several concepts:
- filter operation (bool, range/geo/script)
- filter composition (composable or not, composable means bitsets are used)
- filter caching (ES stores filter results or not, if not cached, ES must
walk doc-by-doc to apply filter)
#1 says you should take care what kind of
There is a connection pool. Netty connections are pooled, they can connect
to multiple nodes at the same time.
All requests are submitted asynchronously. It means, submitting and
receiving may happen on different threads. They do not block.
Jörg
On Wed, Mar 18, 2015 at 8:47 AM, Abid Hussain
Is it possible to examine the code of your plugin?
Generally speaking, analyzers are instantiated per index creation for each
thread.
In org.elasticsearch.index.analysis.AnalysisModule, you can see how
analyzer providers and factories are prepared for injection by the help of
the ES injection
Do you use an analyzer provider?
Example
public class RussianLemmatizingTwitterAnalyzerProvider extends
AbstractIndexAnalyzerProvider<RussianLemmatizingTwitterAnalyzer> {
private final MorphAnalyzer morphAnalyzer;
...
@Inject
public
I do not understand what you mean by "Solr handles XML input and output
automatically". You have to set up a Solr schema and configuration to process
your XML documents.
My plugin does not convert XML to JSON. It makes Elasticsearch understand
XML natively by using a streaming parser that processes
In the get() method of the provider, I would rather always return a
new analyzer instance.
The configuration and setup of the analyzer could be refactored into the
provider.
Jörg
On Wed, Mar 18, 2015 at 8:12 PM, Dmitry Kan dmitry@gmail.com wrote:
Yes, I use an analyzer provider. Here
The concrete implementation depends on what you store in the payload (e.g.
scores)
Jörg
On Tue, Mar 17, 2015 at 7:01 AM, Devaraja Swami devarajasw...@gmail.com
wrote:
I need to use PayloadTermQuery from Lucene.
Does anyone know how I can use this in ElasticSearch?
I am using ES 1.4.4, with
It strongly depends on the method how you want to convert XML to JSON and
vice versa.
Maybe this plugin can give you some hints about Jackson XML regarding
parsing and formatting
https://github.com/jprante/elasticsearch-xml
Do not expect XML schema, validation, or XSL stylesheet, this is not
in Lucene.
On Tue, Mar 17, 2015 at 2:16 AM, joergpra...@gmail.com
joergpra...@gmail.com wrote:
The concrete implementation depends on what you store in the payload
(e.g. scores)
Jörg
On Tue, Mar 17, 2015 at 7:01 AM, Devaraja Swami devarajasw...@gmail.com
wrote:
I need to use
Which article is that?
It does not matter, you can send search and bulk requests to all nodes. ES
will do the routing and automatically forward the requests to the nodes
where they can be executed.
Jörg
On Mon, Mar 16, 2015 at 4:40 PM, chenlin rao rao.chen...@gmail.com wrote:
Hello, anyone.
Sorry for being unclear: the TermsEnum array is one (the most important) of
the arrays used for iteration; the other arrays are also not thread-safe. You
can view all the private class variables as a thread-private cache.
NumericDocValues is the key component for retrieving the version.
Jörg
On Mon,
This is not high. The JVM always uses the whole heap to avoid garbage
collection as much as possible. In ES, a threshold is set to 75% before CMS
garbage collection kicks in.
Jörg
On Mon, Mar 16, 2015 at 4:39 AM, chris85l...@googlemail.com wrote:
Hello,
We have a 2 node elasticsearch cluster
Have you considered doc values?
http://www.elastic.co/guide/en/elasticsearch/guide/current/doc-values.html
Jörg
On Sun, Mar 15, 2015 at 11:11 PM, Lindsey Poole lpo...@gmail.com wrote:
Hey guys,
I have a question about the mechanics of aggregation and sorting w.r.t.
the fielddata cache. I
It is not thread-safe because of the TermsEnum array, which cannot be
shared between threads. By not sharing, a thread can reuse the array, which
avoids expensive reinitialization.
The utility class was introduced at
https://github.com/elastic/elasticsearch/issues/6212
and from what I
...@gmail.com
joergpra...@gmail.com wrote:
Have you considered doc values?
http://www.elastic.co/guide/en/elasticsearch/guide/current/doc-values.html
Jörg
On Sun, Mar 15, 2015 at 11:11 PM, Lindsey Poole lpo...@gmail.com wrote:
Hey guys,
I have a question about the mechanics
If you have thousands of tenants with thousands of potentially overlapping
mappings that should operate independently, the hardware sizing of a
cluster is a challenge, yes.
OTOH you can play tricks at your search/index front end API if you can hide
ES internals from the customers, e.g. prefixing
You may use a single index with enough shards for users and use routing for
accessing the shard where a user ID has the docs indexed. See also shard
overallocation
http://www.elastic.co/guide/en/elasticsearch/guide/current/overallocation.html
and
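A hedged sketch of routing by user ID (index, type, and field names are made up); the same routing value on index and search keeps all of a user's documents on, and reads from, a single shard:

```sh
# index a document into the shard selected by routing value "user42"
curl -XPUT 'localhost:9200/users/doc/1?routing=user42' \
     -d '{"user":"user42","msg":"hello"}'
# search only that shard by passing the same routing value
curl 'localhost:9200/users/doc/_search?routing=user42' \
     -d '{"query":{"term":{"user":"user42"}}}'
```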
You may try to limit direct memory at the JVM level by
using -XX:MaxDirectMemorySize (the default is unlimited). See also
ES_DIRECT_SIZE in
http://www.elastic.co/guide/en/elasticsearch/reference/current/setup-service.html#_linux
I recommend at least 2GB
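In practice that could look like this (the ES_DIRECT_SIZE variable is read by the service wrapper per the linked page; the value is illustrative):

```sh
# either via the service wrapper variable...
export ES_DIRECT_SIZE=2g
# ...or by passing the JVM flag directly
export ES_JAVA_OPTS="-XX:MaxDirectMemorySize=2g"
```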
Jörg
On Sat, Mar 14, 2015 at 1:03 AM, Lindsey Poole
I'm out: no experience with EC2. I avoid foreign servers at all costs.
Maybe the 120G RAM is affected by swap/memory overcommit. Do not forget to
check memlock and memory ballooning. There is little chance you can control
host settings as a guest in a virtual server environment.
Jörg
On Sat, Mar 14,
From which source did you assume that %20 is a white space?
The mapping char filter understands the \u notation (which is not
documented in ES).
With curl, on bash, you have to escape the \u notation with a double
backslash, like this:
. => \\u0020
Here is a working example
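A hedged sketch of such a mapping char filter (index and filter names are made up); the doubled backslash survives bash's single quotes, so ES receives \u0020:

```sh
curl -XPUT 'localhost:9200/test' -d '{
  "settings": {
    "analysis": {
      "char_filter": {
        "dot_to_space": {
          "type": "mapping",
          "mappings": [ ". => \\u0020" ]
        }
      }
    }
  }
}'
```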
How do you observe that there is only one core?
Elasticsearch uses many threads by default, and as many cores as possible.
Jörg
On Fri, Mar 13, 2015 at 12:40 PM, Alexander Petrovsky askju...@gmail.com
wrote:
Hi!
I have the same problem on my single elasticsearch instance.
# dpkg -l | grep elas
Yes, please upgrade Elasticsearch to use the official German normalizer.
I added it to the decompound plugin for convenience; it may be removed at
any later time.
Jörg
On Wed, Mar 11, 2015 at 9:54 PM, Krešimir Slugan kresimir.slu...@gmail.com
wrote:
Thanks!
I assume that german_normalize is
Use german_normalization.
german_normalize is the same filter I implemented in my plugin
https://github.com/jprante/elasticsearch-analysis-german/blob/master/src/main/java/org/xbib/elasticsearch/index/analysis/german/GermanAnalysisBinderProcessor.java
when it was not available in ES core.
Jörg
Use something like this for the node name:
public class MyService extends AbstractLifeCycleComponent<MyService> {
    @Inject
    public MyService(Settings settings, Node node) {
        super(settings);
        String name = node.settings().get("name");
        ...
    }
and for node IDs
public class MyService extends