It sounds like you are using nested documents. I guess that's where it comes from.
--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On 6 Sept 2014, at 07:49, Ron Sher wrote:
Hi,
I've started to use aggregations and it works very fast and cool. But it
seems to get the wrong doc_count as opposed to the total hits.
For example, here's a typical search I use (for it I got total hits 111
and doc_count 1348):
{
"query": {
"filtered": {
"query": {
"m
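If nested documents are indeed in play, the usual explanation is that the aggregation's doc_count counts the matching nested sub-documents while total hits counts root documents. A reverse_nested sub-aggregation (available since ES 1.2) can report counts in terms of parent documents; this is only a sketch, and the index and field names are hypothetical:

```shell
# Hedged sketch: if "tags" is a nested field, each matching nested doc
# contributes to doc_count, while hits count root documents. The
# reverse_nested sub-aggregation reports parent-level counts instead.
curl -XGET 'http://localhost:9200/myindex/_search' -d '{
  "aggs": {
    "tags": {
      "nested": { "path": "tags" },
      "aggs": {
        "tag_names": {
          "terms": { "field": "tags.name" },
          "aggs": {
            "parents": { "reverse_nested": {} }
          }
        }
      }
    }
  }
}'
```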
This is what I observed. I set es_heap_size from 2000m to 3000m (on a 6gb
machine) and restarted the ES service, and Marvel still shows the machine as
having 2gb. I restarted the machine itself and still the same.
I finally got Marvel to show the right data when I uninstalled and
reinstalled the service.
I am playing around with snapshot/restore and have a local 1.3.2 cluster
running on Mac OS X with 894MB of index data.
I have registered a backup repository like so (straight from the docs):
curl -XPUT 'http://localhost:9200/_snapshot/my_backup' -d '{
"type": "fs",
"settings": {
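For reference, a complete registration call for an fs repository might look like the sketch below, followed by taking a first snapshot. The location path is hypothetical and must be writable by every node in the cluster:

```shell
# Hedged sketch of a full fs repository registration (ES 1.3.x)
curl -XPUT 'http://localhost:9200/_snapshot/my_backup' -d '{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/my_backup",
    "compress": true
  }
}'

# Then take a snapshot and block until it completes:
curl -XPUT 'http://localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true'
```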
Thanks Vineet,
Well, I wanted to search across the types, i.e. US and ES, but only return
one document, not 2.
The problem with the approach you suggested is that search is then
limited to documents with isDuplicate=true/false
On Friday, September 5, 2014 2:46:20 AM UTC-5, vineeth mohan wrote:
>
Hi,
Epic fail:
[2014-09-05 22:14:00,043][DEBUG][action.admin.cluster.node.hotthreads]
[Bloodsport] failed to execute on node [fLwUGA_eSfmLApJ7SyIncA]
org.elasticsearch.ElasticsearchException: failed to detect hot threads
at
org.elasticsearch.action.admin.cluster.node.hotthreads.Transpor
Apparently I have left a serious bug in my searches. We upgraded from 0.9
to 1.0.1 some time ago.
I changed all the multi_fields to the new format, where I put "type":
"string" instead of "multi_field", as documented here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_
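For comparison, a sketch of the post-1.0 mapping format, where sub-fields live under "fields" instead of a "multi_field" type; the index, type, and field names here are hypothetical:

```shell
# Hedged sketch: multi-field in the post-1.0 format. The analyzed "title"
# field gets a not_analyzed sub-field addressable as "title.raw".
curl -XPUT 'http://localhost:9200/myindex/_mapping/mytype' -d '{
  "mytype": {
    "properties": {
      "title": {
        "type": "string",
        "fields": {
          "raw": { "type": "string", "index": "not_analyzed" }
        }
      }
    }
  }
}'
```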
Christoffer,
How much JVM heap are you giving ES, and what is the size of the sets?
According to this:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html
it looks like in 1.4 you will be able to control the circuit breaker more
via config. How
Jorg,
Thanks. I have actually used the term list plugin (thanks) for some quick
prototypes / experiments.
I actually meant that I am not familiar with Solr; Lucene I do have some
familiarity with. In this case I really want to be able to send the
analysed text on to some post processing ei
I am getting the following intermittent failure on random different tests
(I presume during the teardown) when the build is running on TeamCity.
I can't seem to reproduce it locally. I get a failure in about 1 in 10-20 test
runs.
It's not clear to me why I am getting the failure. Anyone have any
sugges
Hi,
I would like to find the most efficient approach to performing manual
joins. A little background:
Our documents are updated quite frequently and they are rather large
(the reason why we don't nest them). In addition, one document may be related
to two or more other documents (may be on differe
>
>
>
> Appreciate your explanation, and as per your suggestion range filter gives
> correct results.
> I am still confused with the usage of exists filter.
>
> As per my understanding, the implementation of the exists filter was changed in
> v1.3 to increase speed, but why does it deviate from its ex
Hi Alex,
how exactly could this work?
For example, we are using the pattern "Quotedstring" to extract up to 4
IPs from the X-Forwarded-For header of our Apache logs.
When we then try using this one in the geoip filter, the filter seems to
miss the IP.
example:
grok {
type => http_log
pa
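One hedged way to make this work is to capture only the first (client) IP of X-Forwarded-For into its own field and hand that to geoip, since geoip expects a single address. The pattern and field names below are assumptions, written in the newer match syntax rather than the old type/pattern form:

```
filter {
  grok {
    # Capture the first IP into client_ip; the remaining
    # comma-separated IPs are matched but not kept.
    match => [ "message", "%{IP:client_ip}(?:,\s*%{IP})*" ]
  }
  geoip {
    # Point geoip at the single extracted address.
    source => "client_ip"
  }
}
```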
Ah, that makes total sense. Thanks David!
On Fri, Sep 5, 2014 at 10:15 AM, David Pilato wrote:
> Got it…
>
> You must not repeat the repo name in the JSON doc. It should be:
>
> {
> "type":"s3",
> "settings":{
> "region":"us-east",
> "bucket":"my-bucket"
> }
Got it…
You must not repeat the repo name in the JSON doc. It should be:
{
"type":"s3",
"settings":{
"region":"us-east",
"bucket":"my-bucket"
}
}
--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr
On 5 September 2014 at 14:
Cool! Lots of things have changed since then, though! :)
--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On 5 Sept 2014, at 15:41, kazoompa wrote:
Fantastic, Thanks David.
BTW, I have to thank you for your video presentation in French, it really
helped me a lot to understand t
According to
http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
filters inside an and-filter don't use a BitSet but only check the
conditions inside the iterated documents.
When I look at the explanation of
GET megacorp/employee/_search?explain=1
{
"query": {
"filt
Fantastic, Thanks David.
BTW, I have to thank you for your video presentation in French; it really
helped me a lot to understand the basics of ES two years ago.
Cheers.
On Thursday, September 4, 2014 5:09:35 PM UTC-4, David Pilato wrote:
>
> You could try
> http://www.elasticsearch.org/guide/e
Hello Darren,
If it's the term frequency of a word that you are looking for, you can use
script fields:
{
"fields": [
"text"
],
"query": {
"term": {
"text": "god"
}
},
"script_fields": {
"tf": {
"script": "_index['text']['god'].tf()"
}
}
}
SCRIPTING -
ht
This question is about querying documents from the whole index using a
"filtered" query (and not about filtering further down some query results).
According
to http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
the "and" filter is a very bad choice to query from all docum
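Under that advice, combining bitset-friendly filters with a bool filter rather than an and filter usually performs better when querying the whole index. A sketch; the index and field names are hypothetical:

```shell
# Hedged sketch: bool filter lets ES combine cached bitset filters
# efficiently, unlike the document-by-document "and" filter.
curl -XGET 'http://localhost:9200/myindex/_search' -d '{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "bool": {
          "must": [
            { "term":  { "status": "active" } },
            { "range": { "created": { "gte": "2014-01-01" } } }
          ]
        }
      }
    }
  }
}'
```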
Hello Darren,
What do you mean by the number of hits?
Is it the number of occurrences of a term in a document?
Thanks
Vineeth
On Fri, Sep 5, 2014 at 6:32 PM, Darren Trzynka
wrote:
> In our current application, it is important to know the number of times
> hits were found within a doc
Hi Jorg,
Anton is right; we removed the plugin and double-checked that ES is taking up
the bulk of our time. We do see that the number of evictions is high
filter_cache: {
memory_size_in_bytes: 10508060
evictions: 0
}
id_cache: {
memory_size_in_bytes: 276840500
}
fielddata: {
memory_size_in_bytes: 41618
In our current application, it is important to know the number of times
hits were found within a document for a given search. We are considering
using elasticsearch but this is one area I have yet to find a solution for
with elasticsearch. The only thing I have found remotely possible is
gett
Got it, thanks
On Friday, 5 September 2014 11:44:42 UTC+3, Thomas wrote:
>
> Hi,
>
> I have been performing indexing operations in my elasticsearch cluster for
> some time now. Suddenly, I have been facing some latency while indexing and
> I'm trying to find the reason for it.
>
> Details:
>
> I
Active in this context means currently indexing documents.
On Sep 5, 2014 8:17 AM, "Thomas" wrote:
> Hi,
>
> I wanted to clarify something from the blog post you mentioned. You
> specify that based on calculations we should "give at most ~512 MB
> indexing buffer per active shard...". What i
It looks like there's a classpath issue (notice the HiveUtils error there).
Most likely because you have two versions
of es-hadoop in your classpath (2.1.0.Beta1 and 1.3.0.M1).
Use only one - I suggest 2.1.0.Beta1.
Cheers,
On 9/5/14 3:39 PM, Mohit Kumar Yadav wrote:
hi folks,
I facing followi
hi folks,
I'm facing the following error while loading data into Elasticsearch using a
Hive query.
ERROR:-
14/08/30 02:05:04 INFO log.PerfLogger:
14/08/30 02:05:04 INFO ql.Driver: Starting command: CREATE EXTERNAL TABLE
eslogs (time STRING, extension STRING, clientip STRING, request STRING,
response INT, age
Hi,
I wanted to clarify something from the blog post you mentioned. You specify
that based on calculations we should "give at most ~512 MB indexing
buffer per active shard...". What I wanted to ask is: what do we mean by
the term active? Do you mean the primary only or not?
Thank you agai
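For context, the indexing buffer in 1.x is a node-level setting in elasticsearch.yml, shared among the shards actively receiving documents; the value below is only illustrative:

```
# elasticsearch.yml (node-level, static in 1.x)
# Share of the heap reserved for in-memory indexing buffers,
# divided among the actively indexing shards on this node.
indices.memory.index_buffer_size: 30%
```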
Thanks David. In my earlier message I typed the command I was running to create
the repository incorrectly. I am actually running
curl *-XPUT* 'localhost:9200/_snapshot/products-v0.2.1' -d '{
"products-v0.2.1": { "type": "s3", "settings": { "region": "us-east",
"bucket": "my-bucket" } } }'
So I am using the PUT meth
> Why is Elasticsearch allowed to get into this state? Is it poor
configuration on our part or a bug in the software?
It is the JVM in a low-memory condition. No Java code can execute when the
stack and heap are full and free memory is down to a few bytes. ES works hard
to overcome these situations.
>
*Background*
We have a three node cluster comprised of *prd-elastic-x*, *prd-elastic-y*
and *prd-elastic-z*. Each box is an EC2 m2.xlarge, with 17.1 GB of RAM.
Elasticsearch is run with the following java memory configuration:
java -server -Djava.net.preferIPv4Stack=true
-Des.config=/usr/local/e
Thx Michael,
I will read the post in detail and let you know for any findings
Thomas.
On Friday, 5 September 2014 11:44:42 UTC+3, Thomas wrote:
>
> Hi,
>
> I have been performing indexing operations in my elasticsearch cluster for
> some time now. Suddenly, I have been facing some latency while
Maybe index throttling is happening (ES would say so in the logs) because
your merging is falling behind? Do you throttle IO for merges (it's
throttled at a paltry 20 MB/sec by default)? What do hot threads report?
How about top/iostat?
We just got a blog post out about improving indexing thr
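If merge IO throttling is indeed the bottleneck, the limit can be raised dynamically via the cluster settings API; the 100mb value below is only an example, not a recommendation:

```shell
# Hedged sketch: raise the store-level merge throttle (ES 1.x,
# default 20mb) cluster-wide and persistently.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "persistent": {
    "indices.store.throttle.max_bytes_per_sec": "100mb"
  }
}'
```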
You must handle exceptions very carefully in plugins. You should log errors
to the log, skip/disable the plugin operation, and that's it.
Jörg
On Fri, Sep 5, 2014 at 3:14 AM, Srinivasan Ramaswamy
wrote:
> Hi Joerg,
>
> I tried the data loading part as a separate module and it works, but i
> ha
What version of ES have you been using? AFAIK, in later versions you can
control the percentage of heap space to utilize with the update settings API.
Try to increase it a bit and see what happens; the default is 60%, so increase
it for example to 70%:
http://www.elasticsearch.org/guide/en/elasticsearch/re
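A sketch of that update-settings call for 1.x, using the fielddata breaker setting name from the referenced docs and the 70% value suggested above:

```shell
# Hedged sketch: raise the fielddata circuit-breaker limit
# from the 60% default to 70% of the heap.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "persistent": {
    "indices.breaker.fielddata.limit": "70%"
  }
}'
```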
Hi,
I have been performing indexing operations in my elasticsearch cluster for
some time now. Suddenly, I have been facing some latency while indexing and
I'm trying to find the reason for it.
Details:
I have a custom process which is uploading every interval a number of logs
with bulk API.
Hi all,
is there a way to create a nested field from a plugin such as the
attachment mapper?
The parseContext.externalValue(...), and fieldMapper.parse(parseContext)
technique creates only normal Lucene fields as far as I can see.
Thanks,
Jakub
--
You received this message because you ar
Hi, Jörg. Thanks for the reply. As I said before, we are using this on multiple
instances and met this problem on only one particular instance. I removed this
plugin and checked again; this didn't help. Here are the hot threads from this
run. I would really appreciate it if you could suggest next steps for what we
can look at.
Thanks.
I manually deleted the indices through the following command and now it
works:
curl -XDELETE 'http://localhost:9200/index_name'
The significant terms aggregation is a really great feature that allows for
some really interesting data analysis. We quite often experience out of
memory errors, "CircuitBreakingException: Data too large, data would be
larger than limit"
Which is not hard to understand, due to the amount of dat
Hello Anand,
I don't see any direct way to do this from the query.
The way I have in mind goes like this:
1. Identify duplicates while indexing, and mark the duplicate feed as a
duplicate. A field named "isDuplicate": "true/false" would be best.
2. While doing search, filter out al
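Step 2 above might be sketched like this, using the suggested field name; note this sketch assumes every document carries the isDuplicate field, and the index name is hypothetical:

```shell
# Hedged sketch: filter out documents flagged as duplicates at index time.
curl -XGET 'http://localhost:9200/myindex/_search' -d '{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "term": { "isDuplicate": "false" }
      }
    }
  }
}'
```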