Dear All,
I have about 20GB of documents, and I want to index all of the document
content using the attachment plugin. My question is: what will the size of
the index be? Will it also be around 20GB?
Thank you
Hey,
judging from the exception, this looks like an unstable network connection.
Are you using persistent HTTP connections? And the nodes can ping each other
without problems, I guess?
--Alex
On Thu, Jun 19, 2014 at 12:12 AM, alekjouhar...@gmail.com wrote:
Hello all,
So here's the issue, our
You are correct that Elasticsearch comes with developer settings -
that is exactly what a packaged ES is meant for.
If you find issues when configuring and setting up ES for critical use, it
would be nice to post your issues so others can also find help, and
maybe share their
Hey,
not all parent documents (and not the data), just their ids. Still, this can
accumulate, which is why you should monitor the size of that
data structure (exposed in the nodes stats).
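For example (a hedged sketch; the exact key names can differ between versions):

curl -XGET 'localhost:9200/_nodes/stats/indices?pretty'

and look at the id_cache memory size reported per node.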
Hope that helps.
--Alex
On Thu, Jun 19, 2014 at 6:03 AM, Drew Kutcharian d...@venarc.com
Information
My note_source contains pictures (.jpg, .png, ...) in base64, as well as text.
For my mapping I have used:
type = string
analyzer = reuters (the name of my analyzer)
Any idea?
On Thursday, June 19, 2014 at 5:57:46 PM UTC+2, Tanguy Bernard wrote:
Hello
I have an issue when I index a
Hey,
the exception you showed can possibly happen when you remove an alias.
However, you mentioned a NullPointerException in your first post, which is not
contained in the stacktrace, so it seems that one is still missing.
Also, please retry with a newer version of Elasticsearch.
--Alex
On
Hey,
I am not a hundred percent sure what you mean here. The post_filter setting?
There are two possibilities: either use search_type=count or use a
filtered query in the count API. See
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-count.html
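For example, with search_type=count (the index name and filter here are made up):

curl -XGET 'localhost:9200/myindex/_search?search_type=count' -d '{
  "query" : {
    "filtered" : {
      "query" : { "match_all" : {} },
      "filter" : { "term" : { "status" : "active" } }
    }
  }
}'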
Hey,
can you provide more information about the OOM exception? Also, you should
use the nodes stats API to monitor your system, so you can more easily
spot where this memory consumption stems from. Also, are you just indexing,
or doing searches/queries/gets as well?
--Alex
On Thu, Jun 19,
Hey,
a client node with a full 10gb heap, where garbage collection does not free
anything, means those objects are still in use (which clearly explains THAT
the OOM happens, but not WHY). Do you have huge searches going on, spanning
a lot of shards with deep pagination (all the time)? Do you have some
Hi Andrej,
Thank you for using the puppet module :-)
The 'port' and 'discovery minimum' settings are both configuration settings
for the elasticsearch.yml file.
You can set those in the 'config' option variable, for example:
elasticsearch::instance { 'instancename':
  config => { 'http.port' =>
Hi guys,
Just wondering: what is the most efficient way of executing a query that
takes time (parent/child documents) and returns a large number of entries, and
then storing the result in random, evenly divided blocks on HDFS? E.g., the
query will return 100 million records and I want every random 1 million
Thanks for the help.
I am able to see the correct results now, but could you please suggest how
to write the following query in Java?
curl -X POST localhost:9200/hotels/_suggest -d '
{
  "hotels" : {
    "text" : "m",
    "completion" : {
      "field" : "name_suggest"
    }
  }
}'
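Something like this should work with the 1.x Java API (a sketch; it assumes
you already have a TransportClient named client):

import org.elasticsearch.action.suggest.SuggestResponse;
import org.elasticsearch.search.suggest.completion.CompletionSuggestionBuilder;

// "hotels" is the suggestion name from the curl body above
SuggestResponse response = client.prepareSuggest("hotels")
    .addSuggestion(new CompletionSuggestionBuilder("hotels")
        .text("m")                  // the prefix to complete
        .field("name_suggest"))     // the completion field from your mapping
    .execute().actionGet();

// the options are under response.getSuggest().getSuggestion("hotels")
System.out.println(response);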
I'm using elasticsearch as the database for a service. It would make things
easier. For example, I could just return the _source field when other apps
query my service. Related to that, on the JavaScript client side I am
inserting the _id field into the _source JSON object as id and
Hi
For performance improvement I'm trying to combine
elasticsearch/logstash/kibana with Hadoop (CDH4). Unfortunately, I'm
familiar only with HDFS, where I store logs. In my opinion, the combination
of elasticsearch and hadoop should use HDFS as storage, and transparent
hadoop map/reduce
Hey,
can you be more precise and create a fully fledged example (generating the
repository, executing the snapshot on cluster one, executing the restore on
cluster two, etc.) and include the concrete error message, in order to find
out what 'the process breaks' means here? Also provide info about
Hello,
Let's say you have the indexed text "t1 t2 t3" with shingles. The token
positions are also indexed, so you get: "t1" (at pos 1), "t1 t2" (pos
1), "t2" (pos 2), "t2 t3" (pos 2) and "t3" (pos 3).
So if you are searching with a match_phrase for "t1 t2 t3" (even if
not tokenized as shingles) it will
java.lang.IllegalStateException: this writer hit an OutOfMemoryError;
cannot complete merge
at
org.apache.lucene.index.IndexWriter.commitMerge(IndexWriter.java:3546)
at
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4272)
at
Hello,
It wouldn't surprise me if both Black Mamba and Slapstick were hitting
100%; they have more shards and have to handle more requests than the
other nodes. But in your case it's only one node.
First, are your HTTP requests evenly spread over the 4 nodes? You could also
check that all your
Sorry, I wasn't clear enough. I mean the Java client's CountRequest.source()
argument content, { "filter": ... } in particular.
Does that mean you're applying the reuters analyzer on your base64
encoded pictures?
I guess it generates a really huge number of tokens for each entry
because of your nGram filter (with a max at 250).
Cédric Hourcade
c...@wal.fr
On Fri, Jun 20, 2014 at 9:09 AM, Tanguy Bernard
Hello,
https://stackoverflow.com/questions/24323480/elasticsearch-queries-always-return-all-the-datas-stored-in-the-index#
I'm trying to index and query an index stored in ES 1.2. I both create and
populate the index with the Java API using the TransportClient. I have
the following
Hi all,
I just joined the mailing list, so sorry if this topic has been discussed before.
I would like to set the query size to infinite (or no limit).
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html
This page explains what the parameters do, but
Hey Alexandre,
This is correct. You are searching for a carte which contains an adherent.
Elasticsearch gives you a carte object as the answer, and elasticsearch gives
you back exactly what you have indexed.
That being said, I think you could look at the parent/child feature for that
use case.
Or
Hello,
thanks for your response
When I add another carte:
put /tp/carte/20450813
{
  "dateEdition" : "2014-06-01T22:00:00.000Z",
  "adherents" : [
    {
      "birthday" : "1963-03-22T23:00:00.000Z",
      "firstname" : "FLORENCE",
Searching for DOE gives you that answer?
If so, it's not normal IMHO. You should try to reproduce it with a full SENSE
script recreation so we can replay it and help you from there.
See http://www.elasticsearch.org/help/ for information.
About parent child, you could read this:
You don't want to do that!
If your need is to extract (download) 1 000 000 000 records, you need to use
the scan & scroll API:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html#scan-scroll
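A hedged sketch (index name made up):

curl -XGET 'localhost:9200/myindex/_search?search_type=scan&scroll=1m' -d '{
  "query" : { "match_all" : {} },
  "size" : 1000
}'

Then keep feeding the returned _scroll_id to the scroll endpoint until no more
hits come back:

curl -XGET 'localhost:9200/_search/scroll?scroll=1m' -d '<the _scroll_id from the previous response>'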
--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet |
Yes, I am applying reuters on my documents (composed of text and pictures).
My goal is to search the text of the document with any word or
part of a word.
Yes, the problem is my nGram filter.
How do I solve this problem? Decrease the nGram max? Change the analyzer to
another one that
Yes.
My request for doe always returns that answer.
On Friday, June 20, 2014 at 11:24:33 AM UTC+2, David Pilato wrote:
Searching for DOE gives you that answer?
If so, it's not normal IMHO. You should try to reproduce it with a full
SENSE script recreation so we can replay it and help you from
It looks like you are doing a GET rather than a POST; if so, your query
content is ignored.
Cédric Hourcade
c...@wal.fr
On Fri, Jun 20, 2014 at 11:26 AM, Alexandre Touret alexan...@touret.info
wrote:
Yes
My request for doe always returns that answer.
On Friday, June 20, 2014 at 11:24:33
That's right
Thanks for your help :)
Regards
On Friday, June 20, 2014 at 11:28:26 AM UTC+2, Cédric Hourcade wrote:
It looks like you are doing a GET rather than a POST; if so, your query
content is ignored.
Cédric Hourcade
c...@wal.fr
On Fri, Jun 20, 2014 at 11:26 AM,
No. GET works for running searches.
It could be an issue if you are using an OLD SENSE version and not Marvel.
--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr
On June 20, 2014 at 11:28:23 AM, Cédric Hourcade (c...@wal.fr) wrote:
It looks like you are doing
I set max_gram=20. It's better, but at the end I see this many times:
[2014-06-20 11:42:14,201][WARN ][monitor.jvm ] [ik-test2]
[gc][young][528][263] duration [2s], collections [1]/[2.1s], total
[2s]/[43.9s], memory [536mb]-[580.2mb]/[1015.6mb], all_pools {[young]
I just upgraded to ES 1.2.1 and the latest release of Marvel.
I have the same behaviour.
On Friday, June 20, 2014 at 11:34:59 AM UTC+2, David Pilato wrote:
No. GET works for running searches.
It could be an issue if you are using an OLD SENSE version and not Marvel.
--
David Pilato |
Right... that makes sense :)
I'll give it a try, thank you!
Nuno
On Friday, 20 June 2014 10:26:07 UTC+1, David Pilato wrote:
You don't want to do that!
If your need is to extract (download) 1 000 000 000 records, you need to
use the scan & scroll API:
The user copies/pastes the content of an HTML page, and I index this
information. I take the entire document, images included. I can't change this
behavior.
I set max_gram=20. It's better, but at the end I see this many times:
[2014-06-20 11:42:14,201][WARN ][monitor.jvm ] [ik-test2]
Ah yes sorry you are right, I am using some old tools :)
Cédric Hourcade
c...@wal.fr
On Fri, Jun 20, 2014 at 11:49 AM, Alexandre Touret alexan...@touret.info
wrote:
I just upgraded to ES 1.2.1 and the latest release of Marvel.
I have the same behaviour.
On Friday, June 20, 2014 at 11:34:59
Hello Hourcade, thanks for your response.
Does that mean different values should be set for index_analyzer and
search_analyzer? (e.g. index_analyzer: "shingle" and
search_analyzer: "standard")
What if I want to re-use the same shingle analyzer for both index and
search? Will the match_phrase "t1 t2"
If it fails on the primary shard, then a failure is returned. If it worked, and
a replica failed, then that replica is deemed a failed replica and will get
allocated somewhere else in the cluster. Maybe an example of where a failure on
all shards happens would help here?
On Jun 18, 2014, at 11:45,
On Fri, Jun 20, 2014 at 7:08 AM, Shay Banon kim...@gmail.com wrote:
If it fails on the primary shard, then a failure is returned. If it
worked, and a replica failed, then that replica is deemed a failed replica,
and will get allocated somewhere else in the cluster. Maybe an example of
where a
Yes, you can use two different analyzers. In your case, what you can do is:
- for the indexing, you apply a shingle filter.
- for the query, you also apply a shingle filter, but this time you
disable the unigrams (output_unigrams: false), so it will only
generate the shingles, in your case: "t1
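A minimal sketch of such a query-side analyzer (all names here are made up):

"settings" : {
  "analysis" : {
    "filter" : {
      "my_shingles_only" : { "type" : "shingle", "output_unigrams" : false }
    },
    "analyzer" : {
      "shingle_search" : {
        "tokenizer" : "standard",
        "filter" : ["lowercase", "my_shingles_only"]
      }
    }
  }
}

Then point search_analyzer at "shingle_search" on the field, and keep your
existing index_analyzer.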
Ahh, I see. If it's related to searches, then yes, the search response includes
details about the total shards that the search was executed on, the successful
shards, and the failed shards. They are important to check in order to
understand if one gets partial results.
In the REST API, if there is a total
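For illustration, the _shards section of a search response looks like this
(values made up):

"_shards" : {
  "total" : 10,
  "successful" : 9,
  "failed" : 1
}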
I got it! Thank you!
On Friday, June 20, 2014 at 8:00:36 PM UTC+8, Cédric Hourcade wrote:
Yes, you can use two different analyzers. In your case, what you can do is:
- for the indexing, you apply a shingle filter.
- for the query, you also apply a shingle filter, but this time you
disable the unigrams
If your base64 blobs are long, they are going to be split into a lot
of tokens by the standard tokenizer.
These tokens are often going to be a lot longer than standard words,
so your nGram filter will generate even more tokens, a lot more than
with standard text. That may be your problem.
Mike - the above sounds like it happened due to machines sending too many
indexing requests and merging being unable to keep pace. The usual suspects
would be not enough CPU or disk bandwidth.
This doesn't sound related to the memory constraints posted in the original
issue of this thread. Do you see
Hi Alexander,
Here is the stack trace for the NullPointerException -
[23:24:38,929][DEBUG][action.bulk ] [Rasputin, Mikhail]
[17f85dcb67b64a13bfef2be74595087e][0], node[a-eZTR9XRiWq-o0QmsM2aA], [P],
s[STARTED]: Failed to execute
Thank you, Cédric Hourcade!
On Friday, June 20, 2014 at 3:32:29 PM UTC+2, Cédric Hourcade wrote:
If your base64 blobs are long, they are going to be split into a lot
of tokens by the standard tokenizer.
These tokens are often going to be a lot longer than standard words,
so your nGram
A couple of times during my development workflow I have started the ES
script a second time. This results in a red status (I use ElasticHQ) and a
non-working cluster, so I'm forced to regenerate all indexes (with all test
data) again. That takes noticeable time.
At the moment I use this script
I am not sure highlighting will work, as I suspect it will encounter the same
obstacle; see:
https://github.com/elasticsearch/elasticsearch/issues/5245
As for suggestion #2, this will break our current schema and will require a
significant model change (we store the data in MongoDB as well) -
Hello :)
I have some log data indexed in ES, and while trying to visualize it in
Kibana I'm getting strange behavior related to dates. I have a Terms panel
with the following settings:
Terms mode: terms
Field: date
Length 10
Order: count
For some reason, the date column in the panel is showing up as a
Hi,
Is it possible to get elasticsearch to return the number of terms matched
per result in a query? I know these are evaluated, as they make up the score,
but there doesn't seem to be a way to get a simple count.
For example with: :query => {:in => {:user_ids => [user_ids...],
Use start-stop-daemon or adapt /etc/init.d/elasticsearch to set up a pidfile
guarding the ES instance. Or just run it this way:
pgrep -f elasticsearch || ./start_es.sh
On Friday, June 20, 2014 3:21:08 PM UTC+1, Andrew Gaydenko wrote:
A couple of times during my development workflow I have
You can either use the startup scripts that come with the package when you
install via apt/yum [1] or use the service wrapper [2].
[1]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-repositories.html
[2] https://github.com/elasticsearch/elasticsearch-servicewrapper
On Friday, June 20, 2014 6:49:04 PM UTC+4, Maciej Dziardziel wrote:
Use start-stop-daemon or adapt /etc/init.d/elasticsearch to set up a pidfile
guarding the ES instance. Or just run it this way:
pgrep -f elasticsearch || ./start_es.sh
Aha, thanks! In my case pgrep is the most appropriate.
Heya,
We are pleased to announce the release of the Elasticsearch Thrift transport
plugin, version 2.2.0.
The Thrift transport plugin allows you to use the REST interface over Thrift
on top of HTTP.
https://github.com/elasticsearch/elasticsearch-transport-thrift/
Release Notes -
Thomas,
Thanks for your insights and experiences. As I am someone who has explored
and used ES for over a year but is relatively new to the ELK stack, your
data points are extremely valuable. Let me offer some of my own views.
Re: double the storage. I strongly recommend ELK users to disable
I'm seeing multi-fields of type boolean silently being reduced to a normal
boolean field in 1.2.1 which wasn't the behavior in 0.90.9.
See https://gist.github.com/Omega359/0c2a93690b4db30693a1 for an example of
this.
Is this expected? To me it seems like it should work - the boolean field
Function_score is the way to go IMHO.
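Something along these lines, as a hedged sketch (the field names are guesses
based on your description):

curl -XPOST 'localhost:9200/myindex/_search' -d '{
  "query" : {
    "function_score" : {
      "query" : { "match_all" : {} },
      "functions" : [
        { "filter" : { "term" : { "thumbs_up" : true } }, "boost_factor" : 2 },
        { "filter" : { "term" : { "thumbs_down" : true } }, "boost_factor" : 0.5 }
      ]
    }
  }
}'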
Best
--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On June 20, 2014 at 7:50 PM, hugo lassiege hlassi...@gmail.com wrote:
Hi,
I'm looking for help :) This is maybe trivial, but I can't find a good
solution.
I have some
Hi,
I'm looking for help :) This is maybe trivial, but I can't find a good
solution.
I have some documents, and those documents have two boolean properties,
basically thumbs up and thumbs down, showing whether the administrator
approves those documents or not.
I'm trying to boost a document if it is
I have the following structure on my ElasticSearch:
{
  "_index" : "3_exposureindex",
  "_type" : "exposuresearch",
  "_id" : "12738",
  "_version" : 4,
  "_score" : 1,
  "_source" : {
    "Name" : "test2_update",
    "Description" : "",
    "CreateUserId" : 8
I forgot to mention that I have asked the same question on StackOverflow:
http://stackoverflow.com/questions/24333655/getting-complete-value-from-elasticsearch-query
On Friday, June 20, 2014 11:52:49 AM UTC-7, Vinay Pandey wrote:
I have the following structure on my ElasticSearch:
{
I can easily query for documents that are missing a particular term field;
however, I'd like to free up that space and remove those documents. I've
tried this with no luck:
DELETE /my_index/pages/_search
{
  "filter" : {
    "missing" : {
      "field" : "sentences",
This just got answered:
You should be able to specify _source in the fields
Example:
{
  "fields" : [
    "_parent",
    "_source"
  ],
  "query" : {
    "terms" : {
      "Id" : [
        12738
      ]
    }
  }
}
On Friday, June 20, 2014 11:52:49 AM UTC-7, Vinay Pandey wrote:
I have the following
Hi Team,
I am new to elasticsearch and am learning about the search/query APIs.
I have a requirement to fetch data from ES. My data is as below (assume
a table format):
Prop-Name   Type    Use
Place1      Sale    Office
Place2      Lease   Office
Hi,
The issue was not with the Hector API; it has been fixed by using WITH
COMPACT STORAGE when creating column families in Cassandra.
I have posted it here:
http://stackoverflow.com/questions/21089453/cassandra-column-name-trailing-with-blank-characters
I do not use delete-by-query, but have you tried using a fully formed query
and not just a filter? Perhaps an implicit match_all query is not being
set. Try using a filtered query with a match_all query and your filter.
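Also note that delete by query goes to the _query endpoint, not _search. A
hedged sketch combining both suggestions:

curl -XDELETE 'localhost:9200/my_index/pages/_query' -d '{
  "query" : {
    "filtered" : {
      "query" : { "match_all" : {} },
      "filter" : { "missing" : { "field" : "sentences" } }
    }
  }
}'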
Patrick,
Here's my template, along with where the _all field is disabled. You may
wish to add this setting to your own template, and then also add the index
setting to ignore malformed data (if someone's log entry occasionally slips
in null or no-data instead of the usual numeric value):
{
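A hedged sketch of a template with those two settings (the index pattern and
exact layout here are assumptions):

{
  "template" : "logstash-*",
  "settings" : {
    "index.mapping.ignore_malformed" : true
  },
  "mappings" : {
    "_default_" : {
      "_all" : { "enabled" : false }
    }
  }
}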
Hi Costin,
Thanks for the tip. I replaced the old version of jackson and it works now
:).
Cheers
Shankar
On Sunday, June 15, 2014 3:09:27 AM UTC-6, Costin Leau wrote:
What version of MapR are you using? MapR uses an old version of jackson
which es-hadoop should detect and use an
Guys,
it's been more than a week that I've been struggling with this issue;
if possible, please give it a look and try to help :-(
I have a config file that I'm running Logstash with, which is supposed to
fetch the log file I specified in it and stream it to Elasticsearch. The
problem is that it worked
Hi,
My writes to ES from MapR fail because automatic date detection
is enabled. Is there a way to disable date detection from the external
Hive table properties?
Please guide me regarding this.
heya bruce
that looks like a bug - please open an issue
clint
On 20 June 2014 19:41, Bruce Ritchie bruce.ritc...@gmail.com wrote:
I'm seeing multi-fields of type boolean silently being reduced to a normal
boolean field in 1.2.1 which wasn't the behavior in 0.90.9. See
You'll have better luck sending this to the Logstash mailing list :)
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
On 21 June 2014 08:02, Eitan Vesely eitan...@gmail.com wrote:
Guys,
it's been more than a week that I've
Alternatively, if you model this with parent-child, then you can use
min_children/max_children, which is available in the next release:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-filter.html#_min_max_children_2
clint
On 20 June 2014 17:15, Mike
I wasn't aware that the elasticsearch_http output wasn't recommended?
When I spoke to a few of the ELK devs a few months ago, they indicated that
there was minimal performance difference, with the greater benefit of not
being locked to specific LS+ES versioning.
Regards,
Mark Walkom
Infrastructure
And in your config file, set:
node.max_local_storage_nodes: 1
that way you won't start two nodes on a single instance
On 20 June 2014 16:54, Andrew Gaydenko andrew.gayde...@gmail.com wrote:
On Friday, June 20, 2014 6:49:04 PM UTC+4, Maciej Dziardziel wrote:
use start-stop-daemon or
You seriously don't want ngrams of length 3..250 - that's ENORMOUS.
Typically, set min/max to 3 or 4, and that's it.
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_ngrams_for_partial_matching.html#_ngrams_for_partial_matching
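For example, a hedged sketch of a saner filter (the filter name is made up):

"filter" : {
  "partial_words" : {
    "type" : "nGram",
    "min_gram" : 3,
    "max_gram" : 4
  }
}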
On 20 June 2014 16:05, Tanguy Bernard
I have a simple document schema on which I am trying to run the following
query :
curl -XPOST 'localhost:9200/indexName/topn/_search?pretty' -d '{
  "aggregations" : {
    "applid" : {
      "terms" : {
        "field" : "applid",
        "size" : 3,
        "order" : {
          "ttbyt_sum" : "desc"
        }
On Saturday, June 21, 2014 2:33:28 AM UTC+4, Clinton Gormley wrote:
And in your config file, set:
node.max_local_storage_nodes: 1
that way you won't start two nodes on a single instance
Great, thanks!
Mark,
I've read one post (can't remember where) saying that the Node client was
preferred, but I have also read that the HTTP interface has minimal overhead.
So yes, I am currently using logstash with the HTTP interface, and it works
fine.
I also performed some experiments with clustering (not much,
Eitan,
My recommendation is to use the stdin input in logstash and avoid its file
input. Then, for testing you pipe the file into your logstash instance. But
in production, you should run the GNU version of *tail -F* (uppercase F
option) to correctly follow all forms of rotated logs, and the
I just posted this question on Stackoverflow:
I have been setting up an Elasticsearch cluster in Azure, using Ubuntu
VMs, following the tutorial on the plugin page (elasticsearch-cloud-azure)
on GitHub. I've managed to configure everything and I have elasticsearch
running, but I have 3
Hi,
can you please provide input on how to update an existing field type in a
mapping? Below is the requirement.
I have created contractIndex and its type is contract. In it I have the
fields contractid as long and contractnumber as long, but I want to change
the contractnumber type to string.
You must create each VM under the same cloud service.
azure vm create azure-elasticsearch-cluster
Cloud service name is azure-elasticsearch-cluster
--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On June 21, 2014 at 3:54 AM, Pedro Alonso pedro@gmail.com wrote:
I just
You can't.
You basically need to reindex.
That said, you can try to use a multifield, which adds a string representation
of the same field. But old values (old docs) won't have this new field populated.
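A hedged sketch of such a multifield mapping (the sub-field name "as_string"
is made up):

{
  "contract" : {
    "properties" : {
      "contractnumber" : {
        "type" : "long",
        "fields" : {
          "as_string" : { "type" : "string", "index" : "not_analyzed" }
        }
      }
    }
  }
}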
HTH
--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On June 21, 2014 at
I just discovered that these strange update_mapping log lines come from a
completely unrelated thing, so please take this post as invalid and accept
my apologies.
On Thursday, June 19, 2014 1:21:32 PM UTC-4, JoeZ99 wrote:
This is a somewhat bizarre question. I really hope somebody jumps in,
because
Thanks Alex. What do you mean by "not all parent documents (and not the data),
just their ids"? What decides which parent document ids get loaded? Also, are
the ids that get loaded per query, or do they stay around longer? I ask
because in our use case we're going to keep adding more and more