Hi all:
I want to use ESTap to dump my Hadoop data to ES, but I came across a
problem: when I export data to a new index, I
set es.index.auto.create=yes, so the index and the mapping type get
created, but the field type is always string. I want to set
one field to long; how
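One common workaround (a sketch; the index, type, and field names below are hypothetical, not from the original post) is to create the index with an explicit mapping before running the export, so dynamic mapping never gets a chance to guess "string":

```python
import json

# Hypothetical mapping body: declare the numeric field as "long" up front.
mapping = {
    "mappings": {
        "my_type": {
            "properties": {
                "my_field": {"type": "long"}
            }
        }
    }
}

# PUT this body to http://<host>:9200/my_index before exporting from Hadoop;
# documents indexed afterwards then use the declared type.
print(json.dumps(mapping))
```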
Your user ran out of thread/process space. This is reported as an OOM in Java.
You can check the nproc entry in /etc/security/limits.conf for the maximum
settings and compare this with the process table.
The OS settings regarding threads are usually fine and should not be
modified. Check if you have
Thank you Tony and Mark,
At the moment I have no more information about the virtualization because
it's our customer's system; maybe later I can provide more information
regarding that.
Other processes are our java applications which use ES to index and search
data.
From htop/top I can see that almost
Hello,
I have the same problem if I send a query like this:
query: {
  query_string: {
    query: "someText",
    fields: ["field1", "field2", "field3"]
  }
The explain output will always show *description: max of:*
_explanation: {
value: 20,
description:
Jay, first of all, good that this prevents the server from going into an
infinite loop! Can you maybe build elasticsearch from the 0.90 branch (mvn
clean -DskipTests package) and deploy the artifact that is in
target/release/ - we added some safeguards to prevent this; can you give
it a go if
I am using JSON-LD, which boils down to something like this
{
...
"_source": {
  "@context": { "rel": ... },
  "@id": 476,
  "@type": ,
  "description": "Product description",
  "a8": "100 mm",
  "a12": "250 g",
  "categories": [ 8,
When doing a query string search, the edit distance can be set by doing the
following: quikc~*2*.
But how can the edit distance be set in the fuzzy_like_this query? It also
uses the Levenshtein distance, right?
Is it the min_similarity parameter? If so, what does a similarity of 0.5
mean?
Thanks
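For reference, min_similarity relates to plain Levenshtein edit distance roughly as similarity ≈ 1 - distance / term_length, so 0.5 on a 5-letter term tolerates about 2 edits. A minimal sketch of the distance itself (illustration only, not Lucene's implementation):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# "quikc" vs "quick": the swapped letters cost two substitutions.
print(levenshtein("quikc", "quick"))  # 2
```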
How does Elasticsearch calculate the score for the following, and what does
this score mean?
hits: [
  {
    "_shard": 3,
    "_node": "k8BXmkARRsaaYlUJTRpIqQ",
    "_index": "phone",
    "_type": "iphone",
    "_id": 2,
    *"_score": 0.2712221,*
    "_source": {
      "title":
Hi,
On 11/02/2014 6:40 AM, Jong Min Kim wrote:
I was searching for info about ES with HDFS. What I see is that using ES with Hadoop
does not mean using HDFS as the main storage
for ES.
You can use HDFS as the main storage for ES if you mount it as a local filesystem (typically via NFS). However, your
Hello Friends,
When I use min_score in my query, sorting stops working.
Also, min_score displays only 10 results?
Below is my code:
$result = $es->search(array(
    'query' => array(
        'dis_max' => array(
            'queries' => array(
                0 => array(
                    'field' => array(
                        'title' => $search
                    )
                )
            )
        )
    ),
Hi,
I've noticed a very disturbing Elasticsearch behaviour ...
My environment is:
1 Logstash (1.3.2) (+ Redis to store some data) + 1 Elasticsearch (0.90.10)
+ Kibana
which processes about 7,000,000 records per day.
Everything worked fine on our test environment, until we ran some tests
for a
Hello,
I am new to Elasticsearch and I am trying to import a CSV file.
Can anyone help?
An example CSV file:
name comment
me hello
JSON file:
{
  "type": "csv",
  "csv_file": {
    "folder": "~/test",
    "filename_mask": ".*\\.csv$",
    "poll": "5m",
    "fields": [
Simon, prior to this post I decided to try a custom build from the 0.90.11
tag with the following commits cherry picked onto my branch:
ad1097f1ba109d6cb235ba541251ba63abb27c16
b4ec18814b3eeb35d948c01abec3e04745f57458
93e9d2146e77f6c0523875b93c768ab7f81cfe04
Hey,
maybe it is possible to exclude the segment statistics (if you do not need
them) and not run into that performance problem as a quick hack...
--Alex
On Mon, Feb 10, 2014 at 6:54 PM, joergpra...@gmail.com
joergpra...@gmail.com wrote:
Interesting fact is that your stacktraces point to a
Is the @ important?
When I use the command you sent, I get:
# curl -XPOST http://localhost:9200/_river/my_csv_river/_meta --data-binary
@insertrivercsv.json
Warning: Couldn't read data from file insertrivercsv.json, this makes an
Warning: empty POST.
{"error":"MapperParsingException[failed to
Read the curl man page for sending a file with @
Jörg
--
You received this message because you are subscribed to the Google Groups
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email
to elasticsearch+unsubscr...@googlegroups.com.
To view this
I have found the answer and I wish to share it.
I just need to pass 'type' : 'phrase' in the match query, or just use the
'match_phrase' query.
Hey,
you can use the nodes statistics to find out which thread pool contains all
those rejected tasks, before trying to tune. If it is the search thread
pool, you can try to increase the size or the queue, alternatively add
another node to scale out your search load or try to improve your
Hey,
never thought about such a use-case, but it sounds useful. Feel free to
create an issue, and even better, a pull request to add that functionality
to DistanceUnit
--Alex
On Tue, Feb 11, 2014 at 12:54 AM, Raffaele Sena raff...@gmail.com wrote:
One nautical mile is one minute of arc
Alternatively, if you want to preserve email addresses and web URLs, you can
use the uax_url_email tokenizer; then term and match queries should work
without any problems.
Hey,
please see
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-highlighting.html
You can just add a 'highlight' field to your JSON request, which specifies
the fields to highlight. Also, don't use explain in production, but only
for debugging purposes.
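A sketch of such a request body (the query and field names here are hypothetical placeholders, not from the original thread):

```python
import json

# A match query plus a "highlight" section naming the fields to highlight.
body = {
    "query": {"match": {"title": "elasticsearch"}},
    "highlight": {
        "fields": {
            "title": {}   # empty object = default highlighter settings
        }
    }
}

# POST this body to the index's _search endpoint.
print(json.dumps(body))
```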
Hey,
with recent elasticsearch versions (including newer 0.90), you can see if
the bootstrap.mlockall setting is really applied in the nodes info. So make
sure that setting it was really successful.
curl -XGET 'http://localhost:9200/_nodes' and search for mlockall, which
must be set to true.
--Alex
Hi Jörg,
Thank you for your answer. Lots of new stuff in there though which will
require some studying to understand :) !
JSON-LD seems like an excellent addition to JSON which could actually mean
some competition for graph databases?!
I've tried to setup the following simple 2 dispenser, 2
Hey,
the description about tf/idf similarity in the lucene javadocs might help
here:
https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
--Alex
On Tue, Feb 11, 2014 at 11:51 AM, Navneet Mathpal
navneetmathpa...@gmail.com wrote:
hi ,
I want
Works great - thanks for the quick turnaround!
Regards,
Al.
Original message
From: Boaz Leskes
To: elasticsearch@googlegroups.com
Cc:
Date: 04/02/2014 16:59:00
Subject: Re: Marvel and basic_auth
Hey Al,
We just released Marvel 1.0.2, which contains support for basic auth for the
Hey,
can you provide a minimal example using curl, so people can reproduce it
using commandline tools and do not need a programming language and its
environment? See http://www.elasticsearch.org/help
A quick test did not reveal any suspicious behaviour to me, but maybe
you are doing something
Hey,
you can use the range facet, as it also supports dates, and provide your own
custom date ranges there, maybe?
See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-range-facet.html
--Alex
On Tue, Feb 11, 2014 at 2:36 PM, mooky nick.minute...@gmail.com
I'd probably just simplify and eliminate all the nested stuff. So for
example, if your document is like this:
{
  "name": "Shirt1",
  "color": ["Red"],
  "size": ["XL", "S", "M"]
}
It's easy to execute queries like this:
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
Hi, I've just published a new library to work with Elasticsearch in Node.
Check it out: https://www.npmjs.org/package/baio-es
Suggestions and bug reports are appreciated.
Thanks.
Hi all,
I have been analysing Elasticsearch results with the explain:true option, and I
am not able to understand what technique has been applied to calculate
idf. I went through the Lucene scoring formula, i.e.
idf(t) = 1 + log(numDocs / (docFreq + 1))
It does not match my results.
Following is
Hey thanks for clarifying! I actually ended up setting it up as 1x
master-only, 2x master-eligible data-nodes, realizing that I would need 3
eligible masters while putting it all together.
On the heap problems, could you be more specific about what you are
referring to, or maybe point me towards
I'm not sure, and I am trying hard to understand your use case.
I assume you want a single query that can filter on attributes of both
entity 1 and attributes of related entities 2 and 3.
As you have noticed, this is not possible in a single query unless you had
bubbled up the relevant
I just realized: the second node is just Logstash. I just updated; now they
are a single node ...
On Wednesday, 5 February 2014 at 17:20:05 UTC+1, David Patiashvili wrote:
Where can I find the other node?
On Wednesday, 5 February 2014 at 17:17:51 UTC+1, Tony Su wrote:
Hi David,
Your latest
Hi all,
I get a MapperParsingException "failed to parse" in 0.90.10:
[2014-02-11 16:05:09,402][DEBUG][action.bulk ] [Thunderbolt]
[logstash-2014.02.11][4] failed to execute bulk item (index) index
{[logstash-2014.02.11][suricata][deuCC2bkRvehNSA62tuuHw],
It's the semantic web.
For inference, see http://www.w3.org/standards/semanticweb/inference
Materialization is the pre-computation and storage of inferred triples
http://www.w3.org/wiki/LargeTripleStores
In fact, I use JSON-LD, which is convenient for both storing triples and
loading them for
Those commits look good to me! I'd be super curious what you see in the
logs especially coming from this:
logger.warn("Searcher was released twice", new
ElasticSearchIllegalStateException("Double release"));
Does the bulk index contain a delete-by-query or similar?
simon
On Tuesday, February 11,
Alex,
I created issue https://github.com/elasticsearch/elasticsearch/issues/5085
I don't use GitHub that much, and I kinda muffed the issue, so I'll let
someone else add the one enumeration to wherever it should best go:
NAUTICALMILES(1852.0, "nm", "nmi"),
Thanks!
Brian
On Tuesday,
Hi Alex,
thank you.
I've run the command, and it shows that mlockall is set to true!
On Tuesday, February 11, 2014 2:33:49 PM UTC+1, Alexander Reelsen wrote:
Hey,
with recent elasticsearch versions (including newer 0.90), you can see if
bootstrap.mlockall setting is really applied in
Sorry to resurrect a dead thread, but I figured it out:
https://github.com/elasticsearch/elasticsearch/issues/5086
High level:
1. Hit ~1 million documents with a script score.
2. Do something like (doc['foo'].empty ? 0 : doc['foo'].value) * doc['bar'].
The .empty is the key here.
3. If most of
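A Python analog of that guard (a hedged sketch only; the real thing is a script-score expression, not Python): treat a missing field as 0 before multiplying, which is what the `.empty` check achieves.

```python
# Mimics (doc['foo'].empty ? 0 : doc['foo'].value) * doc['bar'].value:
# a document without "foo" scores 0 instead of raising an error.
def script_score(doc: dict) -> float:
    foo = doc.get("foo")          # None when the field is absent (.empty)
    return (0 if foo is None else foo) * doc["bar"]

print(script_score({"bar": 2.0}))              # missing foo -> 0.0
print(script_score({"foo": 3.0, "bar": 2.0}))  # 6.0
```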
Heya,
We just released elasticsearch-transport-thrift plugin 1.8.0 for elasticsearch
0.90.10 (and ) and 2.0.0.RC2 for elasticsearch 1.0.0.RC1 (and ):
https://github.com/elasticsearch/elasticsearch-transport-thrift
Issue fixed in both branches:
Hi,
this might be a rookie problem since I'm very new to Elasticsearch.
I'm trying to put JSON documents into Elasticsearch with a field lang.
However, if lang is set to "it", Elasticsearch doesn't seem to recognize
the field, since it's only returned when I filter for missing fields.
The problem
Three master nodes are enough, for as many data nodes as you wish to add.
You can search this mailing list for discussions where kimchy explained the
dedicated master nodes, and how it fits for split-brain situations
For example
https://groups.google.com/forum/#!topic/elasticsearch/dxjpMd4vNXQ
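With three master-eligible nodes, the usual split-brain guard is setting discovery.zen.minimum_master_nodes to a majority. A quick sketch of the arithmetic (illustration only):

```python
def quorum(master_eligible: int) -> int:
    """Majority of master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible // 2 + 1

# Three master-eligible nodes -> minimum_master_nodes: 2
print(quorum(3))  # 2
```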
Great catch. Which Elasticsearch version and which JDK?
Thankfully my documents are uniform, so I have been able to skip isEmpty
checks.
--
Ivan
On Tue, Feb 11, 2014 at 7:52 AM, Nikolas Everett nik9...@gmail.com wrote:
Sorry to resurrect a dead thread, but I figured it out:
Very rookie problem. :)
The default (aka standard) analyzer uses a stopword filter, and "it" is a
stopword. Try configuring your field with a custom analyzer which does not
use stopwords, or with a custom set of stopwords.
Cheers,
Ivan
On Tue, Feb 11, 2014 at 7:57 AM, felix.kof...@gameforge.de wrote:
Actually, in your case your search terms probably do not need to be
analyzed at all, since you are not executing full-text searches on that
field. Try setting the field to not_analyzed and use a term query (which
does not analyze search terms). Better yet, use a term filter, since
filters are
Try setting use_dis_max to false in your query.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_multi_field_2
Cheers,
Ivan
On Tue, Feb 11, 2014 at 1:15 AM, Alexander Ott
alexander.ott...@gmail.comwrote:
Hello,
I have the same
This will be addressed better in the future. For now, you can split/rewrite
your multi_match query into a bool query with multiple should clauses where
each clause is targeting each field in your multi_match (or query_string)
query. This still won't be quite a sum, but at least it will combine
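A sketch of that rewrite (the field names and query text are placeholders): turn the multi-field query into a bool query with one should clause per field, so matching clauses combine instead of taking only the best field's score.

```python
import json

fields = ["field1", "field2", "field3"]
text = "someText"

# One match clause per field inside a bool/should.
query = {
    "query": {
        "bool": {
            "should": [{"match": {f: text}} for f in fields],
            "minimum_should_match": 1,
        }
    }
}
print(json.dumps(query, indent=2))
```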
Working on a pull request... I've created a fork off of master and cloned
it to my laptop. (First time using git and GitHub in this way...)
Brian
Update:
Whereas my previous attempts to optimize for recovery failed miserably, the
gateway.recover_after_nodes setting in elasticsearch.yml worked... to a
point.
I noticed
- No ES node was responsive at all after nodes were brought online until the
quorum was met.
- It can take a long time for
Create a branch for your changes. Submit a PR from the branch and not
master. Make sure to update DistanceUnitTests.java as well. The trickiest
part is getting the Elasticsearch team to notice your PR. :) They must be
super busy with the 1.0 release.
Lots of tutorials online:
As of the time of this posting:
elasticsearch-0.90.9-1
jdk-1.7.0_51
ES_HEAP_SIZE=12g
ES_DIRECT_SIZE=20g
index.number_of_replicas: 1
Shards:
number_of_nodes : 2,
number_of_data_nodes : 2,
active_primary_shards : 30,
active_shards : 60,
And rather than a block of text, here are the
You write "ES will usually crash" - but how does it crash? Are there
messages in the log?
Do not use Java 7u51; it may cause trouble. 7u25 is known to be stable.
Why do you only use a 12G heap if you have 64G RAM on a node? Why do you
limit your resources with ES_DIRECT_SIZE? Why do you use 5 shards
Also be aware that the log should be a natural log, i.e. the base is e
instead of 10. So for example, pulling the first IDF from your results:
value: 5.88784
description: idf(docFreq=2, maxDocs=398)
idf = 1 + ln(398 / (2 + 1)) = 5.8878397166163280134321081764042
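That arithmetic checks out; a quick sketch verifying the formula against the explain output quoted above:

```python
import math

def idf(doc_freq: int, max_docs: int) -> float:
    """Lucene TF/IDF idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

# idf(docFreq=2, maxDocs=398) from the explain output.
print(idf(2, 398))  # ~5.88784
```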
Tony,
What you are seeing with the shard recovery is normal - but doesn't mean it
couldn't use more improvement in the future. For now you can throttle the
recovery using a combination of settings (but cannot 100% avoid it).
Just FYI, there is a reason hashing cannot be done (for now) and this
Hi all,
I am having an issue with the Cluster. After loading 30+ million records
into the system, over the weekend one of the servers (out of 6) ran out of
disk space and ever since, I cannot seem to get it back online. Any help
would be appreciated. If anyone has any suggestions at all, any
I have my mappings as such
{
  "attachment": {
    "properties": {
      "file": {
        "type": "attachment",
        "fields": {
          "title": {
            "store": "yes"
          },
          "author": {
What is your current mapping? Use the GetMapping API.
The file field is an inner object, but you do not have one defined in your
mapping. Very likely you already have indexed a document with the file
field as another type.
--
Ivan
On Tue, Feb 11, 2014 at 7:12 AM, Stefan Sabolowitsch
Hi,
Twitter typeahead is an autocomplete library which is broadly used to
implement such features on websites. It allows fetching remote suggestions via
AJAX, which is the way I use it. The AJAX query has the value the user
searched for; in the case below I typed "j" in the search bar. The typeahead
Hi Ivan,
thanks for your answer; I use Logstash as the indexer.
This is my current mapping:
{
  "template": "logstash-*",
  "settings": {
    "index.refresh_interval": "5s",
    "analysis": {
      "analyzer": {
        "default": {
          "type": "standard",
          "stopwords": "_none_"
        }
      }
    }
  },
That is your template. Use the Get Mapping API to find out what actually is
in effect.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-get-mapping.html
On Tue, Feb 11, 2014 at 12:17 PM, Stefan Sabolowitsch
sabolowitsc...@in-trier.de wrote:
Hi Ivan,
thanks for
Oops, obvious answer. :) I see questions about incorrect TFIDF scores and
my mind automatically goes to DFS scoring (which is actually about TF, not
IDF).
--
Ivan
On Tue, Feb 11, 2014 at 10:22 AM, Binh Ly b...@hibalo.com wrote:
Also be aware that the log should be a natural log, i.e. the
Is there any way we can define our own aggregation functions beyond the
provided metric and bucket aggregations?
Thanks!
Justin
If all your other nodes contain enough replicas of all your indexes (i.e.
you have lost no data), then you can safely take down the bad node, wipe
out whatever data is in the data directory (assuming it is local to the
node) and then join it back to the cluster. If the bad node actually
You can simply add more fields on the same level as file. For example:
mappings: {
  "doc": {
    "properties": {
      "file": {
        "type": "attachment",
        "fields": {
          "file": { "store": "yes" }
        }
      },
      "meta1": {
        "type": "string",
On Tuesday, February 11, 2014 12:44:02 PM UTC-8, Binh Ly wrote:
If all your other nodes contain enough replicas of all your indexes (i.e.
you have lost no data), then you can safely take down the bad node, wipe
out whatever data is in the data directory (assuming it is local to the
node)
I've read some blogs and some email groups where users have indicated they
have had data loss. In some cases the user is able to recover using the source.
I am wondering what the common reasons are for this happening due to an ES
software issue, assuming there are 2+ replicas and multiple nodes available?
An analyzer plugin is the right thing. Adding the recognized/extracted
terms needs access to ES mapping service. There are a few plugins out there
which work in this manner, for example, the attachment mapper plugin.
Or the lang-detect plugin, it adds the recognized language(s) as a keyword
code
Thanks. I guess I should be asking how I would set that metadata during
indexing; is there a feature in ES that can perform this for me? I saw a way to
have custom analyzers and was trying those out to see if I can use that to
set the metadata for me if there is something in that document.
On Tuesday,
Great, thanks Jörg!
I'll start fiddling around with the langdetect plugin to see if I can get
it going with our library.
On Tue, Feb 11, 2014 at 1:18 PM, joergpra...@gmail.com
joergpra...@gmail.com wrote:
An analyzer plugin is the right thing. Adding the recognized/extracted
terms needs
Thanks for the feedback Mark / Binh,
I am not sure if it is a single node that is causing the problem. Querying
_cluster/health/indexdata?level=shards gives me the response below. Is
deleting the data from the bad node consistent when the shards are in the
state below?
{
Chris,
You'll probably need to find out which node contains whichever shards
you think are bad. If you do something like this, you can get a detailed
breakdown of which indexes have which shards on which nodes, and their
corresponding shard states:
curl
Hi Binh,
That command did not seem to work. I am running version 0.90.6; is that
supported in this version?
$ curl http://server:9200/_cluster/state/routing_table?pretty
{
  "error": "IndexMissingException[[_cluster] missing]",
  "status": 404
}
Thanks,
Forgot to mention, Marvel only works with ES 0.90.9 and later. Just FYI.
I am trying to figure if/how it is possible to craft a specific query using
nested objects:
For example, given a simple author with nested books mapping:
{
  "author": {
    "properties": {
      "name": { "type": "string" },
      "books": {
        "type": "nested",
        "properties": {
I have some documents with ~30 fields, most of which I just want to analyze
with the defaults; on a couple I want to use snowball or other custom
analyzers.
The recommended way to do this seems to be using the index_name property to
alias a custom _all field, such as:
curl -XPOST
Cool.
Thx all.
Tony
On Tuesday, February 11, 2014 10:38:18 AM UTC-8, Binh Ly wrote:
Tony,
What you are seeing with the shard recovery is normal - but doesn't mean
it couldn't use more improvement in the future. For now you can throttle
the recovery using a combination of settings (but
Nice work!
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
On 12 February 2014 11:40, Tony Su tonysu...@gmail.com wrote:
On my openSUSE wiki, I have created and updated a couple pages...
*For Beginners learning
I'll definitely post back with what we find in the logs with the updated
version.
The bulk indexes do not contain a delete-by-query; we're just converting
some Java objects to an XContentBuilder and sending them to be indexed like
so:
BulkRequestBuilder bulkRequest = client.prepareBulk();
for
Hi,
what is the queryWeight and fieldWeight, and how is it calculated? I am
getting the following results.
Thanks
Navneet Mathpal
Hi,
what are the queryWeight and fieldWeight values which we get as output of
the explain API, and how are they calculated?
Thanks
Navneet Mathpal
Thanks for replying Binh,
Yep, the first 25 matches have a closer distance (lat/lon values), but they
are not relevant matches.
For example:
When I search for "palexpo" from San Francisco (37.77519600, -122.41920400),
it gives me:
1. PLUGZ -- 37.80664300 -122.41628300 -- 2.181 Miles from san francisco
2.
Thanks Binh, your answer solved my problem.
Thanks Ivan to you too. I have 5 shards; idf is getting calculated on
the maxDocs present in each shard. Doesn't that lead to misleading idf?
On Tuesday, 11 February 2014 20:10:52 UTC+5:30, sunayana choudhary wrote:
Hi all,
I have been
Thanks for replying, Alex.
You can see my code below so that you can see what I am using.
-- Creating the index first:
curl -X PUT 'http://localhost:9200/adminvenue/?pretty=true' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "venue_analyzer": {
I have 2 types of documents, as below:
- *poi*: 4 million documents; it has a point (lon, lat) field
- *region*: it has a polygon field
I'd like to search for POI documents in a certain region using the geo_shape
filter.
So I made the query below:
{
  "query": {
    "filtered": {
      "query": {