There seems to be a problem when indexing MySQL data in ES. These are
the ES logs:
[][DEBUG][NodeClient ] after bulk [18650] [succeeded=93255]
[failed=0] [5ms]
[][DEBUG][NodeClient ] before bulk [18654] of 5 items, 2407
bytes, 1 outstanding bulk requests
Hello,
I need to index 100,000 documents of 1 MB.
This is my Elasticsearch index configuration:
index: {
  type: "doc",
  bulk_size: 100,
  number_of_shards: 5,
  number_of_replicas: 2
}
I need to know what effect each parameter has.
Hi Guys,
Can someone guide me: how do I create the index using my own field as the
document ID, not the _id column?
Looking for some help from your end.
Regards
Dharmendra
Hi.
I am trying to find a way to express a character-count filter in a
query string, for instance: I need to find all documents whose subject
field holds fewer than 20 characters.
How would I do that in a query string?
/David
Yes, that is correct.
Martijn
On 21 May 2014 02:34, Mark Dodwell m...@mkdynamic.co.uk wrote:
Many thanks, that is a super clear answer.
So, until that issue is addressed, am I correct in thinking I should do
this when percolating an existing document:
```
curl
Hey guys,
In order to meet the German laws for logging, I got the order to store the
elasticsearch indices in a revision-/audit-proof way (indices cannot be
edited/changed after storage).
Are there any best practices or tips for doing such a thing? (Maybe any
plugins?)
Thanks for your
Yeah, it looks like this would do the job, thanks for the response.
On Thursday, 22 May 2014 10:40:19 UTC+2, Mark Walkom wrote:
You can set indexes to read-only -
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html
Is that what you're after?
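For example, a minimal sketch using the update-settings API (index name hypothetical):
curl -XPUT 'http://localhost:9200/logs-2014.05.22/_settings' -d '{
  "index": { "blocks.read_only": true }
}'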
I'm trying to construct the following SQL query in Elasticsearch:
SELECT companyId, COUNT(*) c FROM visits GROUP BY companyId ORDER BY c DESC
LIMIT 2
I came up with the following JSON body for the query:
{
  facets: {
    company: {
      filter: {
        term: {
          entityType:
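For what it's worth, that GROUP BY / ORDER BY count DESC / LIMIT shape also maps directly onto a terms aggregation (available since ES 1.0), which orders buckets by document count descending by default. A sketch, assuming companyId is a not_analyzed field:
curl -XGET 'http://localhost:9200/visits/_search?pretty' -d '{
  "size": 0,
  "aggs": {
    "company": {
      "terms": { "field": "companyId", "size": 2 }
    }
  }
}'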
Keep us up to date with your project, I'm sure there would be interest
from others on a similar setup.
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
On 22 May 2014 18:46, horst knete baduncl...@hotmail.de wrote:
You have to add a facility to your middleware that can trace all authorized
operations to your index (access, read, write, modify, delete) and you must
write this to an append-only logfile with timestamps.
If there is interest I could write such a plugin (assuming it can run in a
trusted
Hi all:
Now I am trying to index my logs using the elasticsearch Python API,
but I only get about 600 records/s indexing speed.
But on the same ES cluster, with the same data, logstash (redis -> logstash
-> elasticsearch) can index data at about 3,000 records/s.
Any advice on how
Hi Jörg,
thanks for your offer.
I will contact you if there's a need for such a plugin in our company.
Also I will keep you up to date if there are breaking changes in our project.
On Thursday, 22 May 2014 10:55:44 UTC+2, Jörg Prante wrote:
If you use the column name _id, you can control the ID of the ES document
created from your SQL. If you do not use _id, a random doc ID is generated.
See the README at https://github.com/jprante/elasticsearch-river-jdbc
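A minimal sketch of a river definition using this convention (connection details and table are hypothetical):
curl -XPUT 'http://localhost:9200/_river/my_jdbc_river/_meta' -d '{
  "type": "jdbc",
  "jdbc": {
    "url": "jdbc:mysql://localhost:3306/shop",
    "user": "user",
    "password": "pass",
    "sql": "select id as _id, name, price from products"
  }
}'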
Jörg
On Thu, May 22, 2014 at 11:43 AM, Tanguy Bernard
Calling prepareGetSnapshots(...) for a snapshot that does not exist throws
SnapshotMissingException. I would instead expect it to return a response
with a list of zero snapshots (getSnapshots()) or at least isExist=false.
Is there any other way one can check the existence of a
Hi guys,
Kind of stuck with a fresh installation of an ElasticSearch cluster.
Everything is installed and file descriptor limits are set, yet when I run
curl -XGET 'http://10.0.8.62:9200/_nodes?os=true&process=true&pretty=true' > stats.txt
I get
process : {
refresh_interval : 1000,
What OS and how did you install it?
(Running as root is a really bad idea by the way!)
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
On 22 May 2014 20:19, Shawn Ritchie xritc...@gmail.com wrote:
So this issue only occurs on server restart. If I restart the
elasticsearch service it loads the correct number of file descriptors.
Regards
Shawn
On Thu, May 22, 2014 at 12:19 PM, Shawn Ritchie xritc...@gmail.com wrote:
CentOS 6.5 and Java 1.7u55
On Thu, May 22, 2014 at 12:28 PM, Shawn Ritchie xritc...@gmail.com wrote:
Did you use the RPMs? Where are you setting the ulimit?
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
On 22 May 2014 20:30, Shawn Ritchie xritc...@gmail.com wrote:
No, I did not use the RPM; I used the .tar for installation, and my ulimit
settings are in
/etc/security/limits.conf
* - nofile 65535
/etc/sysctl.conf
fs.file-max = 512000
On Thu, May 22, 2014 at 12:37 PM, Mark Walkom ma...@campaignmonitor.com wrote:
Hi,
I want to get the average value of the MEMORY field from my ES documents.
Below is the query I'm using for that. Here I'm getting the aggregation
along with the hits JSON as well. Is there any way we can get the
aggregation result only? Please suggest.
POST /virtualmachines/_search
{
query : {
Hello
We have a cluster of 3 nodes running Ubuntu 12.04.4 LTS 64-bit and
elasticsearch v1.1.1.
It had been running flawlessly, but since last week some of the nodes
restart randomly and the cluster goes to red state, then yellow, then green,
and it happens again in a loop (sometimes it even
How are you running the service: upstart, init or something else?
ES shouldn't just restart on its own; this could be something else like
the kernel's OOM killer.
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
On 22
elasticsearch nodes are launched through /etc/init.d/elasticsearch
On Thu, May 22, 2014 at 2:13 PM, Mark Walkom ma...@campaignmonitor.com wrote:
Like Mark said, check the OOM killer. It should log to syslog. It is evil.
Nik
On Thu, May 22, 2014 at 2:14 PM, Jorge Ferrando jorfe...@gmail.com wrote:
You could use a script filter:
filtered : {
  query : {
    ...
  },
  filter : {
    script : {
      script : "doc['subject'].value.length() < 20"
    }
  }
}
Dan
On Thursday, May 22, 2014 8:45:41 AM UTC+1, David Nielsen wrote:
I've been checking syslog on all of the nodes and I found no mention of
oom, process killed, out of memory or anything similar...
Just in case, I ran these commands on the 3 nodes and the problem persists:
echo 0 > /proc/sys/vm/oom-kill
echo 1 > /proc/sys/vm/overcommit_memory
echo 100
Hello,
I found some information which is not complete:
There is no “correct” number of actions to perform in a single bulk call.
You should experiment with different settings to find the optimum size for
your particular workload.
Every time you index a document elasticsearch will decide
Well, yes, I know that one; is this really the only/best way to do it?
My application forwards an input field directly to a query string; the
user needs to be able to query something like this:
tags:h1 AND subject:length<20
On Thursday, May 22, 2014 2:30:30 PM UTC+2, Dan Tuffery wrote:
Hi,
what method are you using in your Python script? Have you looked at
the bulk and streaming_bulk helpers in elasticsearch-py?
http://elasticsearch-py.readthedocs.org/en/master/helpers.html
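The speed gap is most likely per-document indexing vs. bulk; the helpers batch your documents into _bulk requests for you. A hedged sketch of the underlying wire format they generate, with hypothetical index/type names — each action line and source line is one JSON object, and the body must end with a newline:
curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary '{"index":{"_index":"logs","_type":"log"}}
{"message":"first event"}
{"index":{"_index":"logs","_type":"log"}}
{"message":"second event"}
'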
Hope this helps,
Honza
On Thu, May 22, 2014 at 11:09 AM, 潘飞 cnwe...@gmail.com wrote:
Is it possible to use this feature with a lookup on multiple documents
(multiple IDs) to supply the terms?
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-terms-filter.html#_terms_lookup_mechanism
I tried this
terms: {
user: {
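For reference, the documented form takes a single id; as far as I know, looking up terms from multiple source documents would mean combining several terms filters under a bool filter. A sketch of the single-document lookup (index/type/field names hypothetical):
curl -XGET 'http://localhost:9200/tweets/_search' -d '{
  "query": {
    "filtered": {
      "filter": {
        "terms": {
          "user": {
            "index": "users",
            "type": "user",
            "id": "2",
            "path": "followers"
          }
        }
      }
    }
  }
}'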
Hi Guys,
I'm working on an online shop. Currently we are storing the cart's contents
in a MySQL database so we can very easily access the amount of a certain
product and determine the reserved quantity.
This is very important, as the amount in users' carts is reserved so
other users may not buy
How is it possible that the count for term 2 is 3 in the first response,
but 2 in the second response?
From the docs:
The size parameter defines how many top terms should be returned out of the
overall terms list. By default, the node coordinating the search process
will ask each shard to
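The behaviour behind this: each shard returns only its own top size terms, so the merged counts are approximate and can differ between requests. If your version supports it, shard_size asks each shard for more candidate terms and narrows the error at some memory cost. A hedged sketch with hypothetical index and field names:
curl -XGET 'http://localhost:9200/myindex/_search' -d '{
  "size": 0,
  "aggs": {
    "top_terms": {
      "terms": { "field": "term", "size": 10, "shard_size": 100 }
    }
  }
}'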
Hi,
if anyone could comment on my code I would be very grateful. I'd like to
know whether my way of setting up the index is as it is intended to be.
Thanks!
Hi,
It looks like you have two tables - one that uses the JSONSerDe from Cloudera
and another one using es-hadoop.
You configured your es-hadoop table to consider the input as JSON, however
it does not receive the proper format (as the exception indicates).
See this [1] section of the
Hi! This is a sample setup, close to what I am working with
https://gist.github.com/anonymous/6e1457321a8ad78c6af8
As you can see, I am trying to remove the hyphens from all words, so that
words like hand-made are indexed as handmade. The goal is to make a
search for handmade find all
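A hedged sketch of one way to do this: a mapping char_filter that deletes hyphens before tokenization, so hand-made becomes the single token handmade (index and analyzer names hypothetical):
curl -XPUT 'http://localhost:9200/products' -d '{
  "settings": {
    "analysis": {
      "char_filter": {
        "strip_hyphens": { "type": "mapping", "mappings": ["-=>"] }
      },
      "analyzer": {
        "dehyphenated": {
          "tokenizer": "standard",
          "char_filter": ["strip_hyphens"],
          "filter": ["lowercase"]
        }
      }
    }
  }
}'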
While doing some tests, I thought I uncovered a bug in the
cluster-health/wait-for-yellow request. No matter what settings I tried,
the request would always return immediately with no timeout. I then
realized that the request is actually something like wait for AT LEAST
yellow state. In other
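To double-check the semantics yourself, a sketch of the call: it returns as soon as the cluster is at least yellow, and the timeout only applies while the state is still below that:
curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=30s'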
I have five nodes: two master nodes, one balancer node, one workhorse
node, and one coordinator node.
I am shipping events from logstash through redis to elasticsearch.
At the moment, my cluster is RED. The shards are created but no index is
created. I used to get an index like logstash.2014-05-22,
You can set the size to 0.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html
You will still get back the search metadata though.
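A sketch combining that with an avg aggregation like the one in the question (field name taken from the question):
curl -XPOST 'http://localhost:9200/virtualmachines/_search' -d '{
  "size": 0,
  "aggs": {
    "avg_memory": { "avg": { "field": "MEMORY" } }
  }
}'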
--
Ivan
On Thu, May 22, 2014 at 4:46 AM, Subhadip Bagui i.ba...@gmail.com wrote:
Martijn took a swing at it just now. He eliminated any scoring-based
slowdown, like so (constant_score_filter)…
curl -s -XGET 'http://127.0.0.1:9200/dxr_test/line/_search?pretty' -d '{
query: {
filtered: {
query: {
match_all: {}
On Wed, May 21, 2014 at 6:01 PM, Erik Rose grinche...@gmail.com wrote:
I'm trying to move Mozilla's source code search engine (dxr.mozilla.org)
from a custom-written SQLite trigram index to ES. In the current production
incarnation, we support fast regex (and, by extension, wildcard) searches
Releases for some reason never get promoted on the mailing list, so here
goes:
http://www.elasticsearch.org/blog/elasticsearch-1-2-0-released/
The main reason why I posted about the release was because I tested out
cross-version cluster compatibility with 1.1.1 and 1.2.0 nodes and
everything
Leading wildcards are really expensive. Maybe you can try creating a copy
of your content field that reverses the tokens using reverse token filter
[1]. By doing this you turn those expensive leading wildcards into
trailing wildcards which should give you better performance. I think your
query
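A hedged sketch of creating such a reversed copy, using the built-in reverse token filter (index and analyzer names hypothetical); you would index the field twice, once normally and once with this analyzer, and rewrite leading-wildcard queries against the reversed field:
curl -XPUT 'http://localhost:9200/dxr_test_reversed' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "reversed_tokens": {
          "tokenizer": "standard",
          "filter": ["lowercase", "reverse"]
        }
      }
    }
  }
}'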
Leading wildcards are really expensive. Maybe you can try creating a copy
of your content field that reverses the tokens using reverse token filter
[1].
Good advice, typically, but notice I have wildcards on either side.
Reversing just makes the trailing wildcard expensive. :-)
Aye, and then you can use edit distance on single words (fuzzy query) to
cope with fast typers
On May 22, 2014 8:22 PM, Robert Muir robert.m...@elasticsearch.com
wrote:
This is definitely a great approach for a database, but it won't work
exactly the same way for an inverted index because the data structure
is totally different.
Ah, I was afraid of that. I hoped, due to the field being unanalyzed (and
the documentation's noted restriction that wildcard
Alright, try this on for size. :-)
Since the built-in regex-ish filters want to be all clever and index-based,
why not use the JS script plugin, which is happy to run as a
post-processing phase?
curl -s -XGET 'http://127.0.0.1:9200/dxr_test/line/_search?pretty' -d '{
query: {
Hello,
I'm trying to produce the distribution of documents that match vs. don't
match a query, and get the cardinality of a field for both sets. The
idea is Users who did vs. Users who did not. In reality I'm actually
running another aggregation under did not (otherwise I'd just subtract
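A hedged sketch of the shape I mean, with hypothetical field names: a filter aggregation for the "did" set carrying a cardinality sub-aggregation (note cardinality is itself approximate, being HyperLogLog-based):
curl -XGET 'http://localhost:9200/events/_search' -d '{
  "size": 0,
  "aggs": {
    "did": {
      "filter": { "term": { "action": "purchase" } },
      "aggs": { "users": { "cardinality": { "field": "user_id" } } }
    },
    "all_users": { "cardinality": { "field": "user_id" } }
  }
}'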
It does create an index, it says so in the log - [logstash-2014.05.22]
creating index - it's just not assigning things.
You've set routing.allocation.awareness.attribute, but have you set the
node value, ie node.rack?
See
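For reference, a sketch of the two settings that need to line up, assuming the awareness attribute is named rack; each node declares its own value in elasticsearch.yml:
# elasticsearch.yml on each node; the value differs per rack
node.rack: rack_one
cluster.routing.allocation.awareness.attributes: rack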
ES is eventually consistent, so it may not make sense if your latency
requirements are very strict.
If you can introduce a delay then it should work.
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
On 22 May 2014
Plugin developers should watch out for changes in classes, e.g.
XContentRestResponse (useful for REST actions) has gone, and there are some
internal API changes in IndexShard methods, also new deprecations
(IndicesStatusAction is now RecoveryAction) - maybe more I did not
recognize yet in my
Thanks for your reply. I set the node.rack to rack_one on all the nodes as
a test. In ElasticHQ, on the right it shows no indices. It is empty. In my
master, I see that the nodes are identifying with rack_one (all of them).
Any other clues?
Thanks
Brian
On Thursday, May 22, 2014 5:10:25 PM
Hurray!
However they are still using the new versioned-path release scheme, so
if you want 1.2 you will need to update your sources to
http://packages.elasticsearch.org/elasticsearch/1.2/$OS
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web:
Went back and read the page again. So I made one master, workhorse, and
balancer with rackid of rack_two for testing. One master shows rackid of
rack_one. All nodes were restarted. The shards are still unassigned. Also, the
indices in ElasticHQ are empty.
distinct_count
On Thu, May 22, 2014 at 10:34 PM, Phil Price philpr...@gmail.com wrote:
I would expect (aggregations.has_thing.distinct_count.value +
aggregations.does_not_have_thing.distinct_count.value) to be close to
aggregations.total_distinct_count.value, but in reality it's pretty far off
That is not easy, and the reason is that Elasticsearch and Solr work in
quite different ways, e.g. when it comes to computing facets/aggregations:
Solr first computes top hits, and if facets are required, it will load the
doc IDs of document matches into a bit set that will be used in a
subsequent
On Thu, May 22, 2014 at 3:54 PM, Matthias Feist matf...@gmail.com wrote:
What do you think: Is it wise to implement such a system in elasticsearch?
I'm mostly worried about the time between the add to cart (inserting a
document) and being able to access the total value due to the flushing
Doh! You are correct, my bad. I assumed the filter was an exclusive
per-user property, but in fact it is not.
Thanks for getting back to me
Cheers
Phil
On Thursday, May 22, 2014 4:36:02 PM UTC-7, Adrien Grand wrote:
Although I would agree that being able to detect it automatically could
make things simpler, I think that the fact that it is explicit is more
flexible. For example, it can make sense to copy field values into the root
document[1]. This can help speed-up some queries that don't need to know
about
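If this is about nested documents, a hedged sketch using include_in_root, which copies nested field values up into the root document at index time (index and field names hypothetical):
curl -XPUT 'http://localhost:9200/orders' -d '{
  "mappings": {
    "order": {
      "properties": {
        "items": {
          "type": "nested",
          "include_in_root": true,
          "properties": {
            "sku": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}'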
scan is mainly useful as a way to export data from the index. In the
context of a user interface, I think scroll would make more sense[1]. On a
side note, paging improved significantly for scroll requests in
Elasticsearch 1.2 (in terms of both speed and memory usage).
[1]
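A sketch of plain scrolling (no search_type=scan, so hits come back sorted for a UI); index name and page size are hypothetical:
# open a scroll context; the response includes hits plus a _scroll_id
curl -XGET 'http://localhost:9200/myindex/_search?scroll=1m' -d '{
  "size": 50,
  "query": { "match_all": {} }
}'
# pass the _scroll_id from the previous response as the body to get the next page
curl -XGET 'http://localhost:9200/_search/scroll?scroll=1m' -d 'SCROLL_ID_HERE'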
For those who would come to this thread through a search engine, Dan found
the root cause of this issue
https://github.com/elasticsearch/elasticsearch/issues/6268
On Wed, May 21, 2014 at 8:03 PM, Daniel Low dang...@gmail.com wrote:
Hello,
Has there been any updates to this? We are using
Thanks for the response Adrien. I'm excited to upgrade to 1.2, but it seems
strange to me that people refer to scan vs. scroll (you're not the first)
as scan is simply a search_type that - AFAIK - can be used for any type of
search (scroll or otherwise).
It just seems strange that setting the
Hey!
I'm using Elasticsearch 1.1.1 on Ubuntu on Java 7:
java version 1.7.0_55
OpenJDK Runtime Environment (IcedTea 2.4.7) (7u55-2.4.7-1ubuntu1)
OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)
It's working perfectly. But, when I try to upgrade to 1.2.0, elasticsearch
won't start:
I would like to have a river in reverse. Every time a document is inserted
or modified I would like to push that into another destination like a
database. Ideally this would be async or maybe even in batches.
Has anybody done anything like this before?
Some relevant comments:
https://github.com/elasticsearch/elasticsearch/issues/1242
--
Ivan
On Thu, May 22, 2014 at 8:45 PM, Tim Uckun timuc...@gmail.com wrote:
Hi Team,
We are experiencing an issue with high non-heap memory usage and a high
thread count. Mostly we see the GC process running. We have been watching
threads using bigdesk for the past two days. Threads are reaching a peak.
We are not sure why it is getting this high. Two days back we ran
Is it possible to connect with the TransportClient to an ElasticSearch
cluster via a socks proxy? If yes, how?