Hi Friends,
I have indexed Wiki in elasticsearch, but the index is approximately 3 months old.
Now I want to update the wiki pages in elasticsearch without losing the existing
data.
Hi,
I'm running elasticsearch on Windows using the command prompt. It was
started as below.
D:\elasticsearch\elasticsearch-1.0.1\bin\elasticsearch
[2014-04-08 13:39:00,199][WARN ][bootstrap] jvm uses the
client vm, make sure to run `java` with the server vm for best performance
If you use the Java client you probably also have to tell it to not sniff
out the other nodes in the cluster.
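With the 1.x Java API that would look something like the untested sketch below (cluster name, host and port are placeholders):

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

// Keep the client talking only to the node(s) you configure;
// do not discover (sniff) the other nodes in the cluster.
Settings settings = ImmutableSettings.settingsBuilder()
    .put("cluster.name", "elasticsearch")    // placeholder cluster name
    .put("client.transport.sniff", false)    // disable sniffing
    .build();
TransportClient client = new TransportClient(settings)
    .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));

(client.transport.sniff is false by default; it is shown here only to make the setting explicit.)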
Hey,
it is possible not to specify an output; then every input becomes the
output. However, this does not allow you to create a unified output, which, I
think, is the whole purpose. There is currently no way to find out which of the
inputs actually matched.
Trying to have very good outputs like,
The boost you define in the 'multi_match' query is not being shown in the
explain results, so it is not being applied to the score. It should be
displayed in the weight, i.e.
description: weight(DISPLAY_NAME^8:happy in 33593) [PerFieldSimilarity],
result of:
The 'phrase_prefix' type is the
Hey,
can you try to use date formats instead of the named identifiers like
date_time and see if it works? Also, can you check the exception in the
elasticsearch logs?
--Alex
On Wed, Apr 9, 2014 at 4:22 AM, Tim Uckun timuc...@gmail.com wrote:
I have a search like this
{
  "size":
This looks wrong,
Suggest.Suggestion.Entry.Option option2 =
suggestResponse.getSuggest().getSuggestion(completion).getEntries().get(0).getOptions().get(0);
it should be:
Suggest.Suggestion.Entry.Option option2 =
Hi,
I'm trying to use a singleton instance of the client for creating multiple
indices. Below is the code for the same. But every time, the instance comes
back as null and a new instance is created. Please let me know what
I'm doing wrong.
singleton instance:
public class ESClientSingleton {
Hey guys,
I have a request index, in which the documents contain the request_time, the ip
and other data which are not relevant right now.
I need to get the visitors over time. Getting the visits over time is easy
with a histogram aggregation, as is getting the unique visitors with a terms
aggregation,
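(On ES 1.1+, one way to sketch this from the Java API is a date_histogram with a cardinality sub-aggregation; note this counts uniques per bucket, not across buckets, and "logs", "request_time" and "ip" are assumed names:)

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogram;

// given an existing Client named client
SearchResponse resp = client.prepareSearch("logs")
    .setSize(0)    // aggregations only, no hits
    .addAggregation(AggregationBuilders.dateHistogram("visits_over_time")
        .field("request_time")
        .interval(DateHistogram.Interval.DAY)
        .subAggregation(AggregationBuilders.cardinality("unique_visitors")
            .field("ip")))    // approximate distinct ips per day
    .execute().actionGet();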
I recently had a problem with an index and after searching the net I decided to
give checkIndex a try. I found the class in the right jar but I haven't been
able to get it to check an index. For example when I run
checkIndex -verbose ...heat-analyzer/7/index
I get:
ERROR: could not read any
Hello!
I am using the JDBC river plugin (latest version, named
elasticsearch-river-jdbc-2.2.1.jar, on ES 0.90.5) over some very large
views, so I wait for the bulk requests to finish, count the total
number of indexed documents to see if everything is all right, and delete the river.
Hi,
I am securing an ES installation and am trying to restrict the number of returned
documents by configuration (for all ES queries). It's possible to use
filtered aliases
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-aliases.html),
but I cannot find a working
hello,
is there a possibility to get Hadoop metrics
(http://www.elasticsearch.org/blog/elasticsearch-apache-hadoop-1-3-m3/)
monitored in real time through the Marvel plugin or through a Kibana dashboard?
And in that case, which index should be queried?
Phil
thanks a lot
Hi,
I'm using Kibana 3.0 with ElasticSearch 1.0.1. It seems that sometimes
Kibana adds *semicolons* to the body of an xml message. This causes the xml
to be invalid when we copy/parse them for further analysis.
An example of a fragment of what we see in Kibana:
soap:Envelope
Hi Michael,
Running checkindex from the command line can prove to be tricky, since
checkindex is not aware of things like multi-data-path directories that you
might be using with Elasticsearch. Moreover, you will need to add the
Elasticsearch jars to the classpath since Elasticsearch customizes
Hey,
can you create a complete example, including mapping, indexing and
searching, so one can reproduce locally?
--Alex
On Wed, Apr 9, 2014 at 12:39 PM, Viktor Nordling
viktor.nordl...@gmail.com wrote:
So here's my use case: say that you have 50 points, with _different_ radii.
A new point
Hi David,
My json file contains data like this:
{ "field1" : "value1" }
{ "field1" : "value2" }
{ "field1" : "value3" }
Is it possible to index this with fsriver?
My river/index creation is as below:
curl -XPUT 'localhost:9200/_river/security/_meta' -d '{
"type": "fs",
"fs": {
"url":
Any other thoughts on this? Would 1500 segments per shard be significantly
impacting performance? Have you guys noticed this behavior elsewhere?
Thanks.
On Monday, April 7, 2014 8:56:38 AM UTC-4, Elliott Bradshaw wrote:
Adrian,
I ran the following command:
curl -XPUT
Not at the moment, no. However, we plan to add this capability for real-time
monitoring to Marvel.
On 4/9/14 12:55 PM, Phil gib wrote:
hello,
is there a possibility to get Hadoop metrics
(http://www.elasticsearch.org/blog/elasticsearch-apache-hadoop-1-3-m3/ )
monitored in realtime
Hi Elliott,
1500 segments per shard is certainly way too much, and it is not normal
that optimize doesn't manage to reduce the number of segments.
- Is there anything suspicious in the logs?
- Have you customized the merge policy or scheduler?[1]
- Does the issue still reproduce if you restart
Hi,
I have configured a single node ES with logstash 1.4.0 (8GB memory) with
the following configuration:
- index.number_of_shards: 7
- number_of_replicas: 0
- refresh_interval: -1
- translog.flush_threshold_ops: 10
-
Try putting filtered inside query, for example:
{
  "query": {
    "filtered": {
      "filter": {
        "match_all": {}
      }
    }
  }
}
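(The same query from the Java API would be roughly the untested sketch below; the index name is a placeholder:)

import org.elasticsearch.action.search.SearchResponse;
import static org.elasticsearch.index.query.FilterBuilders.matchAllFilter;
import static org.elasticsearch.index.query.QueryBuilders.filteredQuery;
import static org.elasticsearch.index.query.QueryBuilders.matchAllQuery;

// given an existing Client named client
SearchResponse response = client.prepareSearch("myindex")    // placeholder index
    .setQuery(filteredQuery(matchAllQuery(), matchAllFilter()))
    .execute().actionGet();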
Hi Alex,
Thanks for your response. Yes, I agree, autocomplete is to find a unified
name, but it is also extremely handy to know what the matched word was. You
can technically infer this by adding the doc id within the payload, and thus
look it up afterwards, but this is long-winded. One main
Greetings,
I have a question about Shard balancing.
I have a 5 Node Cluster with a particular Index that has 4 Shards, 1 Replica
So 8 Shards total.
And when I look at the Index it is allocated as follows; (Note: P=primary,
R=replica)
Node 1: P0
Node 2: P1, P2, P3
Node 3:
Node 4: R0
Node 5: R1,
Hi Adrien,
I did customize my merge policy, although I did so only because I was so
surprised by the number of segments left over after the load. I'm pretty
sure the optimize problem was happening before I made this change, but
either way here are my settings:
"index" : {
  "merge" : {
    "policy" : {
I am unable to delete certain indexes in my cluster - using it for logging,
new indexes per day:
curl -XDELETE http://172.16.1.100:9200/2014_03_27
{"error":"ProcessClusterEventTimeoutException[failed to process cluster
event (acquire index lock) within 30s]","status":503}
curl -XDELETE
From the Java API, you should be able to do something like:
client.prepareIndex().setTimestamp(blah)
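A slightly fuller untested sketch (index, type and field names are placeholders; the type's mapping must enable the _timestamp field for it to be stored):

// given an existing Client named client
client.prepareIndex("myindex", "mytype", "1")    // placeholder coordinates
    .setTimestamp("2014-04-09T12:00:00")         // externally supplied timestamp
    .setSource("field1", "value1")               // key/value pairs
    .execute().actionGet();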
This may have some useful information:
http://www.elasticsearch.org/webinars/elasticsearch-pre-flight-checklist/
Scenario:
Log files parsed by logstash generated 13 million results in elasticsearch.
After a server restart, the shards are not allocated anymore.
The Roy Russo dashboard says there's a problem with the swap space.
I've tried setting mlockall to true, but then I found out that it only
works on
Shooting in the dark here, but here it goes:
1. Do you have anything else running on the system? For example, AVs are
known to cause slowdowns for such services, and other I/O- or memory-heavy
services could cause thrashing or just a general slowdown
2. What JVM version are you running this with?
The number of documents is not relevant to the search time.
Important factors for search time are the type of query, shard size, the
number of unique terms (the dictionary size), the number of segments,
network latency, disk drive latency, ...
Maybe you mean equal distribution of docs with same
No. They are not valid json files.
You need to provide one json file per document.
--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr
On 9 April 2014 at 14:36:31, Prasanth R (prasanth.sunr...@gmail.com) wrote:
Hi david,
My json file contains data like
I assume this error is triggered because your http client closed the
connection before reading the response fully. It is not related to JDBC
river.
Jörg
On Wed, Apr 9, 2014 at 11:48 AM, André Morais ano...@gmail.com wrote:
Hello!
I am using the JDBC river plugin (latest version with
Hi Adrien,
I kept the logs up over the last optimize call, and I did see an
exception. I Ctrl-C'd a curl optimize call before making another one, but
I don't think that that caused this exception. The error is essentially as
follows:
netty - Caught exception while handling client http
Can you clarify what you mean by "added to the same index" and "to the same
document"? Maybe you can give an example of what you want to achieve.
Jörg
On Wed, Apr 9, 2014 at 1:46 AM, Srinivasan Ramaswamy ursva...@gmail.com wrote:
I am using elasticsearch to index documents. I have a few tables in
The exception is just a side effect because you pressed ctrl-c and the
response could not be transmitted back, it does not point to the problem.
You should use
http://localhost:9200/index/_optimize?max_num_segments=1
instead of
http://localhost:9200/index/_optimize
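(For reference, the same call from the Java API might look like this sketch; the index name is a placeholder:)

// given an existing Client named client
client.admin().indices().prepareOptimize("myindex")    // placeholder index
    .setMaxNumSegments(1)    // merge down to a single segment
    .execute().actionGet();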
Jörg
I tried executing this example from a previous post which a user was able to
run:
https://gist.github.com/surajtamang/3616612
However, my output was:
https://gist.github.com/rahurkar/10231685
with no output for facets.
Did this functionality break or change from what's described in the
Thanks Jorg. That makes sense. I am actually using max_num_segments=1,
just forgot to add it...
On Wed, Apr 9, 2014 at 11:20 AM, joergpra...@gmail.com wrote:
The exception is just a side effect because you pressed ctrl-c and the
response could not be transmitted back,
I would like to use the Phrase Suggester
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-phrase.html).
I've got a problem. When typing "johni depp", it returns several results
in this order:
john depp
johnny depp
joann depp
johnn depp
How can I sort the
You can limit the off-heap space used by setting ES_DIRECT_SIZE.
--
Ivan
On Tue, Apr 8, 2014 at 1:31 PM, Yitzhak Kesselman ikessel...@gmail.com wrote:
Hi,
I have experienced the same behavior when I tried to load a large amount of
data... If you clear the file system cache
Unfortunately not at the moment. The bucket not only determines the date
range but also all the documents that the metrics are computed on. The only
thing I can think of is you'll probably need to define the individual
ranges/filters yourself and build multiple filter buckets in your query
I'd be curious to see an example of such a query! :)
Would SQLstream be applicable here?
On Monday, April 7, 2014 9:27:58 AM UTC-7, Chris Holt wrote:
Hi, I'm trying to set up a real time streaming dashboard for logs, which
would collect logs using fluentd or similar, and all I would want to do is
extract running statistics from the data, e.g.
Thank you again Ivan (and sorry for the silence, I was away these last few
days).
I made the jar with maven; the problem that I have now is a compilation
failure due to the @Override annotation in NormRemovalSimilarity.java (method
does not override or implement a method from a supertype).
Here's an example. If I use aggregations to search for the top 10 most
frequent messages:
POST _search
{
  "query": {
    "match": {
      "loglevel": "error"
    }
  },
  "aggs": {
    "freqent_msgs": {
      "terms": {
        "field": "message.raw",
        "size": 10
      }
    }
  }
}
I end up with a list
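(For anyone doing the same from the Java API, a rough, untested sketch; field and aggregation names follow the JSON above:)

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;

// given an existing Client named client
SearchResponse resp = client.prepareSearch()
    .setQuery(QueryBuilders.matchQuery("loglevel", "error"))
    .addAggregation(AggregationBuilders.terms("freqent_msgs")
        .field("message.raw")
        .size(10))
    .execute().actionGet();

// walk the top-10 buckets
Terms msgs = resp.getAggregations().get("freqent_msgs");
for (Terms.Bucket bucket : msgs.getBuckets()) {
    System.out.println(bucket.getKey() + ": " + bucket.getDocCount());
}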
How much data are we talking about? Is it feasible to shovel new data to
ES periodically, so changes are made to a data store and are only pushed to
ES once or twice a day?
Personally I'd prefer having non-stable results instead of having to deal
with this. The only place that you really want
Hi Patrick,
This issue is my fault. When I used my custom similarity, I was using
Elasticsearch 0.90.2 (which uses Lucene 4.3.1). It looks like this method
was changed in Lucene 4.4 to use a long instead of a byte:
Just a thought, I'm wondering if you can just build a bool query with
multiple should clauses and just boost each clause. And then I would
imagine 1 bool clause will do the proximity query (span) like you describe
above, and then another clause will do the multi_match. Maybe?
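Something along these lines, perhaps (untested; fields, text and boost values are invented, and a match_phrase with slop stands in for the span clause):

import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

// one should clause per strategy, each with its own boost
QueryBuilder query = QueryBuilders.boolQuery()
    .should(QueryBuilders.multiMatchQuery("black dress", "title", "description")
        .boost(3))        // plain relevance clause
    .should(QueryBuilders.matchPhraseQuery("title", "black dress")
        .slop(2)          // simple proximity clause
        .boost(10));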
Sure. I am trying to index a bunch of products (unique product) and each
product can have multiple tags (product sold by a merchant). I am planning
to add tags as a nested document in the index. The product and tags
information are stored in the productdb database. And there are search_tags (in
the
Sure, a bool filter with multiple must term filters should do it.
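(Roughly, from the Java API; the index, fields and values below are invented for illustration:)

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.FilterBuilders;
import org.elasticsearch.index.query.QueryBuilders;

// given an existing Client named client
SearchResponse resp = client.prepareSearch("products")    // placeholder index
    .setQuery(QueryBuilders.filteredQuery(
        QueryBuilders.matchAllQuery(),
        FilterBuilders.boolFilter()
            .must(FilterBuilders.termFilter("color", "red"))
            .must(FilterBuilders.termFilter("size", "xl"))))
    .execute().actionGet();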
The script could iterate and do something like this, for example:
{
  "script": "for ($a : ctx._source.association) { if ($a.code == 546) $a.imagepath = 'zzz'; }"
}
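(From the Java API, an update request carrying that script might look like the untested sketch below; index, type and id are placeholders, and the script assumes the default MVEL scripting of the 1.x line:)

// given an existing Client named client
client.prepareUpdate("myindex", "mytype", "1")    // placeholder coordinates
    .setScript("for ($a : ctx._source.association) { if ($a.code == 546) $a.imagepath = 'zzz'; }")
    .execute().actionGet();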
Lukas, thank you, I will pass along :)
Hello,
I have an event timestamp in a hive table which I'm breaking into
evnt_time (timestamp) and evnt_date (string) columns. The idea is to use
evnt_time as the timestamp in kibana and be able to search by date and
stack one day on top of another in a histogram plot in kibana. Eventually
Answers inline.
Regarding the slow I/O: when I analyzed the creation of the Lucene index
files, I saw that they are created without any special flags (such as no
buffering or write-through). This means that we're paying costs twice:
when we write the file, we're going to cache data in Windows'
I've encountered a weird problem that I hope someone has seen before, and
can provide a resolution for.
I'm running Elasticsearch 1.0 downloaded as an RPM and installed on a
cluster of two RedHat nodes. Five shards with one replica each is
configured.
Unfortunately, I could not reproduce the
Hi,
Es-Hadoop doesn't perform strict mapping; it uses the Hive mapping to infer the JSON types used by the document, which
in turn are interpreted by Elasticsearch, which by default will try to figure out the type of the data sent, in your
case a timestamp.
The solution to this is to create the
The nodes all have that value in their elasticsearch.yml files, and have
been restarted since then.
I'm getting the error trying to update indexes to store their shards only
on those nodes with the appropriate node.storage value. I was under the
impression that was a basic way to do shard
Thanks,
feel free to let me know if there is a better place to report these...
Lukas
On Wed, Apr 9, 2014 at 8:30 PM, Binh Ly binhly...@yahoo.com wrote:
Lukas, thank you, I will pass along :)
Attached the index rate (using bigdesk):
https://lh5.googleusercontent.com/-Jve-j75qB9o/U0WgK5ZMvMI/AFo/5_WZuCryeRw/s1600/bigdesk.png
The indexing requests per second is around 2K and the Indexing time per
second is around 3K
On Wednesday, April 9, 2014 9:36:12 PM UTC+3, Yitzhak
Hi,
I am currently exploring the option of using scripts with aggregations and
I noticed that for some reason scripts for terms aggregations are executed
much slower than for other aggregations, even if the script doesn't access
any fields yet. This also happens for native Java scripts. I'm
If you are able to put everything into one document you might try the span
near query with ordering.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-span-near-query.html
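(A Java API sketch of such a query; the field and terms are placeholders:)

import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

// match "quick" followed by "fox" within 10 positions, in that order
QueryBuilder query = QueryBuilders.spanNearQuery()
    .clause(QueryBuilders.spanTermQuery("body", "quick"))
    .clause(QueryBuilders.spanTermQuery("body", "fox"))
    .slop(10)
    .inOrder(true);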
Stein Kåre
- http://isitdown.no
On Friday, January 10, 2014 7:47:17 PM UTC+1, André wrote:
Trying to understand how different index types influence the performance of
a given index. For instance, if I have a users index with carPayment and
busPayment types: in terms of performance and how it's physically stored, does
it make any difference if data is stored in different types vs just one
The terms aggregation relies on the fact that field data produces unique
values in order to run efficiently. When you provide a script, by default
there will be a wrapper that will take care of deduplicating them in order
to make sure the result would be the same as if the data was stored in the
We're attempting to create a new Elasticsearch cluster for indexing URLs, but
have run into a memory leak when turning replication on for our indices.
The current setup is: 5 x m2.2xlarge, 4 TB mounted on EBS per node (not
Provisioned IOPs).
We create one index per day, and will keep the
Hi,
I have included logstash in my stack and started to play with it. I'm sure
it can do the trick I was looking for, and much more.
Thank you ...
[waiting for your blog post :)]
Pascal.
On Mon, Apr 7, 2014 at 9:38 AM, Alexander Reelsen a...@spinscale.de wrote:
Hey,
I don't know about your
Nothing sticks out, but I would synchronize the call so that different
threads do not end up creating separate instances. Enforce the singleton pattern
by making the static variable private and having only one return in your
method. My code looks something like
public class ClientFactory {
private static
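A completed sketch of that pattern (my own completion, with a synchronized accessor and placeholder connection settings) might look like:

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class ClientFactory {
    private static Client client;    // private, so callers must go through getClient()

    private ClientFactory() {}       // no instances of the factory itself

    public static synchronized Client getClient() {
        if (client == null) {
            // created at most once, even with concurrent callers
            client = new TransportClient(ImmutableSettings.settingsBuilder()
                    .put("cluster.name", "elasticsearch")    // placeholder
                    .build())
                .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));
        }
        return client;
    }
}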
It seems to me that a multimatch query with type phrase_prefix does not
support fuzziness ... but a multimatch query with type match_phrase_prefix
does.
I like the results from phrase_prefix better than match_phrase_prefix, is
there any way to add fuzziness?
Slop works for phrase_prefix, but
That is just unfortunate. Sneaky. :)
Fortunately, in your case, you ultimately want to override the tf() method,
which is not marked final. If you want to still play around with the
similarity I created, you can always subclass TFIDFSimilarity and implement
the methods yourself (you can just copy
No you don't Binh
On Wednesday, 9 April 2014 17:57:17 UTC+1, Binh Ly wrote:
I'd be curious to see an example of such a query! :)
I know, it's hideous.
They typically consist of an entity NEARed with a large number of
negative (user-specified) terms. So something like [bad person near50
(long or list near5 negative or terms ...)], which we translate into a big
horrible near_span query. When we can detect a language then
Hi
I am porting an existing search system over to Elasticsearch. We have
a few custom requirements for search. I played around with the ES RESTful APIs
and now I am trying out the Java client for Elasticsearch. I am
wondering how I should go about implementing all the custom logic,
My cluster has gotten into an odd state today. I have a regular job that
deletes indices after X days. The job executed an index deletion this
morning. When it did this the cluster went into a 'red state' claiming that
there were 10 unassigned shards (5 shards + 1 replica). After some
On Wednesday, April 9, 2014 7:49:13 PM UTC+12, Alexander Reelsen wrote:
Hey,
can you try to use date formats instead of the named identifiers like
date_time and see if it works? Also, can you check the exception in the
elasticsearch logs?
I tried the -MM-dd .. it gives an
I am trying to remove the TTL setting when doing a document update
(doc_as_upsert) via the java api. I do the following:
UpdateRequestBuilder b = new UpdateRequestBuilder(client);
b.setDocAsUpsert(true);
b.setDoc(dataMap);
...
UpdateRequest request = b.request();
request.doc().ttl(null);
I want to do something like this.
select date_trunc('month', time_stamp), sum(distinct_count) from (
  select date_trunc('week', time_stamp) as time_stamp, count(distinct
  field_name) as distinct_count
  from blah
  group by date_trunc('week', time_stamp)
)
group by
Thank you so very much, this answer solved a lot of problems, YAY..!! Binh
On Wednesday, April 9, 2014 11:23:57 AM UTC-7, Binh Ly wrote:
The script could iterate and do something like this, for example:
{
  "script": "for ($a : ctx._source.association) { if ($a.code == 546) $a.imagepath = 'zzz'; }"
}
Hi all - I hate to cross-post an issue, but I can't figure out why my ES
instance isn't picking up the MongoDB
plugin: https://github.com/richardwilly98/elasticsearch-river-mongodb/issues/249
I'm not very familiar with Java, so it'd be hard for me to debug the actual
code. I'm trying whatever
Pankaj,
were you able to solve this issue? I am also stuck with a similar need.
If you were able to solve it, please share the details.
regards
On Tuesday, February 18, 2014 12:50:45 PM UTC+5:30, pankaj ghadge wrote:
Hi,
Is there any other way around this or not? Please let us know.
Fantastic, that's exactly what I was looking for, thank you!
On Wednesday, April 9, 2014 3:12:42 AM UTC+10, Ivan Brusic wrote:
You should be able to use filtered queries instead, where the filter is
your facet filter: