Hi everyone,
I'm using elasticsearch for my webshop products to have a fast
search/navigation and have approximately 5000 products (85000 documents
indexed). I'm using elasticsearch as a service and it works fine, but it's
eating my CPU. I think my use of elasticsearch is very minimal and still my
Ivan,
thank you for your help and good explanations.
--
You received this message because you are subscribed to the Google Groups
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email
to elasticsearch+unsubscr...@googlegroups.com.
To view this
I wrap up my Elasticsearch apps in a Wildfly web app with JAX-RS API, using
the native java transport protocol.
You do not need to use HTTP with Java app logs. Just write a log4j appender
or something like this and connect this to a bulk indexer.
Jörg
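For readers who want to see what "connect this to a bulk indexer" amounts to, here is a minimal sketch in Python rather than Java. It only builds the newline-delimited body that the bulk API expects (an action line followed by a source line per document). The index name `applogs`, the type `logevent`, and the event fields are made up for illustration, and the suggestion above uses the native Java transport client, not this HTTP-style body.

```python
import json


def to_bulk_body(index, doc_type, events):
    """Serialize log events into a newline-delimited bulk request body."""
    lines = []
    for event in events:
        # Action line: asks Elasticsearch to index the document on the next line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the log event itself.
        lines.append(json.dumps(event))
    # The bulk body has to end with a trailing newline.
    return "\n".join(lines) + "\n"


# Hypothetical log events, as an appender might buffer them.
events = [
    {"level": "INFO", "message": "application started"},
    {"level": "ERROR", "message": "connection refused"},
]
body = to_bulk_body("applogs", "logevent", events)
print(body)
```

An appender would typically buffer events and flush one such body per batch instead of sending one request per log line.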
Use Case
- Hotels (~some thousand) with Offers (~some million)
- Hotels and offers each have a lot of data where filters are applied
- Sorting needs to be done by minimum offer price or some other ratings
- Need to retrieve the cheapest offer per hotel (while matching all
hotel
In my application I want to index user profiles and events. Each event is
performed by a specific user at a specific time. I would like to be able to
look at a specific event and see statistics on the users that have
performed them. To make this possible I have done some initial experiments
There’s also an edge-case with shard allocation during cluster restarts that
can result in the loss of data if a shard is being re-allocated.
I saw this behaviour in 0.90.1, recently upgraded to 0.90.10 and haven’t had a
failure case like this yet. My use case is logstash style daily indexes
Hi Francois,
currently ES just supports the WGS84 projection, which is also the base for
the GPS system. In the future we may allow custom projections. May I ask
for your use-case?
cheers,
Florian
On Wednesday, February 12, 2014 8:08:18 PM UTC+9, Francois Brunet wrote:
Which
It was just to be sure about the projection system.
I have no need to store another projection for the moment. WGS84 is the
best choice :)
Thanks a lot for your answer.
On Wednesday, February 12, 2014 13:34:32 UTC+1, Florian Schilling wrote:
Hi Francois,
currently ES just supports the WGS84
I can't think of a way this can be done at the moment (unless of course the
categories are finite and you can build a massive query using combinations
of them). However, you can always precompute the distinct category count
per author prior to indexing and then include it as an extra field in
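A rough sketch of the precompute step, assuming each document carries an `author` and a single `category` value. Those field names and the added `author_category_count` field are hypothetical, chosen only for the example.

```python
from collections import defaultdict


def add_category_counts(docs):
    """Return copies of docs with the author's distinct category count added."""
    categories_by_author = defaultdict(set)
    for doc in docs:
        categories_by_author[doc["author"]].add(doc["category"])
    # Attach the per-author count to every document before indexing.
    return [
        dict(doc, author_category_count=len(categories_by_author[doc["author"]]))
        for doc in docs
    ]


docs = [
    {"author": "ann", "category": "search"},
    {"author": "ann", "category": "hadoop"},
    {"author": "ann", "category": "search"},
    {"author": "bob", "category": "kibana"},
]
enriched = add_category_counts(docs)
print(enriched)
```

The count can then be filtered or sorted on like any other numeric field, at the cost of reindexing an author's documents whenever that author's category set changes.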
Hey,
you may have used an analyzer which removed stopwords? Try with another one
and see if that works... side note: The default analyzer of elasticsearch
0.90 removes stopwords, the default one of 1.x does not, so take care of
the version you are trying this with.
--Alex
On Tue, Feb 11, 2014
Is there any way to optimize a query in Elasticsearch? I am using the query
below. It takes `15-20s` on average and is sometimes a little faster, at `4-5s`.
My server configuration: CentOS 6.3, 8 cores, 16GB RAM
{
  "fields": [
    "_id",
    "aff_id",
    "post_uri",
    "blog_cat",
Hello All,
I want to create a per-application dashboard in Kibana, like this:
*1) Dashboard for domain example1.com:* It should show php, css, js, and
jpg/png hit count for domain example1.com
*2) Dashboard for domain example2.com:* It should show php, css, js, and
jpg/png hit count for domain
I'm not sure your mapping actually does what you think/expect it to do.
Actually, I don't believe you can combine multiple analyzed-already tokens
from different fields into 1 field at all. Your best bet for correctness is
probably just to leave all the multi-fields alone and then run queries
Btw, I found a different way to achieve my primary goal using a different
percolate query :
curl -XPUT 'http://127.0.0.1:9200/_percolator/tester/test1' -d '{
  "query": {
    "query_string": {
      "fields": ["uri_field"],
      "query": "*test*"
    }
  }
}'
But it would be better if I could use simple query strings with
Oh I see, sounds like you want to sort by relevance and then have the
distance factored into the relevance score also. You might want to take a
look at the function_score query:
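As a rough idea of the shape such a request can take (this is a sketch, not the reference the message goes on to give), the Python snippet below builds a function_score body that combines a text match with a gauss decay on distance. The field names `name` and `location`, the coordinates, and the `2km` scale are all assumptions for illustration.

```python
import json


def relevance_with_distance(text, lat, lon, scale="2km"):
    """Build a search body whose score mixes text relevance and distance."""
    return {
        "query": {
            "function_score": {
                # Text relevance comes from an ordinary query.
                "query": {"match": {"name": text}},
                "functions": [
                    # Decay function: the score shrinks as documents get
                    # farther from the origin ("lat,lon" string format).
                    {"gauss": {"location": {"origin": f"{lat},{lon}", "scale": scale}}}
                ],
                # Multiply the query score by the decay value.
                "boost_mode": "multiply",
            }
        }
    }


body = relevance_with_distance("pizza", 52.52, 13.40)
print(json.dumps(body, indent=2))
```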
May I ask what is your ES_MAX_MEM setting? Is it possible that it is set
too low - like the default 1G?
It will be the original field value from the source document. And then it
is analyzed by whatever analyzer you assign to the _all field.
FYI, ES has very frequent releases to fix bugs discovered by the community.
If you find a data loss problem in your current install (and assuming it is
indeed an ES problem), please try the latest build and see if it fixes it.
Chances are it has already been discovered and fixed in the latest
Hello,
I would like to collect some stats on the entries being written when
running a MapReduce job using the elasticsearch-hadoop library. I am using
the default Mapper.class with a batch of entries in JSON files as input to
MR, e.g.
job.setInputFormatClass(TextInputFormat.class);
A couple of suggestions:
1) You probably want range condition to go down to the filter part also (so
bool it with the query_string filter)
2) The term (url.cat=sports) query can potentially move down to the filter
section too (so bool it with the query_string filter)
3) The query_string/query
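A sketch of what the first two suggestions could look like, written as a Python dict in the 0.90/1.x-era `filtered` query syntax. The original query is not shown here, so the `date` field and the example values are assumptions; only `url.cat` comes from the suggestion itself.

```python
import json


def filtered_search(text, category, date_from, date_to):
    """Keep scoring in the query part; move exact conditions into a bool filter."""
    return {
        "query": {
            "filtered": {
                # Only the free-text part needs relevance scoring.
                "query": {"query_string": {"query": text}},
                # Term and range conditions do not score, so they can be filters.
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"url.cat": category}},
                            {"range": {"date": {"gte": date_from, "lte": date_to}}},
                        ]
                    }
                },
            }
        }
    }


body = filtered_search("football", "sports", "2014-01-01", "2014-02-12")
print(json.dumps(body, indent=2))
```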
I'm using 3Gb. Is that sufficient?
We can introduce such counters. What exactly are you interested in?
The default counters in Hadoop provide information on the amount of data
read/written.
Do you want to extract the information directly in Hadoop as opposed to ES
proper?
On 12/02/2014 5:13 PM, Abhijit Bose wrote:
Hello,
I
The number of categories is finite and relatively low count. As you are
suggesting, querying for all combinations is an option as well as
precomputing. I wanted to see if there was a way to do it efficiently at
query time.
Thanks,
Colin
On Wednesday, February 12, 2014 8:36:44 AM UTC-5, Binh
Never mind,
Root cause was a network issue which got identified.
Setting keepalive in ES kernel params solved the problem.
Cheers,
New patch for elasticsearch-zookeeper supports
0.90.11: https://github.com/sonian/elasticsearch-zookeeper/pull/21
My group recently chose to use zookeeper to manage elasticsearch clusters.
I started using sonian/elasticsearch-zookeeper plugin, but ran into
compatibility problems (supports
I would like to be able to search all fields for a certain string and get
all distinct matching key value pairs as a result. It should also be
possible to add filters/queries to constrain the results. The original data
consists of millions of documents and a few thousand possible keys so I
Don't miss out: *Spring* *Data* *Elasticsearch* 1.0.0.M1 released
https://spring.io/blog/2014/02/11/spring-data-elasticsearch-1-0-m1-released
On Friday, 22 March 2013 15:56:48 UTC, Mohsin Husen wrote:
Following support is added
1)
Appreciated, but keep in mind large installations can’t just constantly
upgrade. And if ES is being used in critical infrastructure upgrading may mean
many hours of recertification work with auditors and assessors. The project is
still relatively young, but "just upgrade" isn't always
I'm glad I was able to steer you in the right direction. I flubbed a PR
recently since I have not used git consistently in the past few years, so I
am glad someone else can learn from my mistakes. Your PR seemed to have
gained some attention! :)
Ivan
On Tue, Feb 11, 2014 at 1:17 PM,
Pankaj:
You should be able to pin a query on each dashboard to filter down to only
the log events that you're interested in. So for example in your first
case, you can pin a filter (query) like:
vhost.raw:example1.com
And everything in your current dashboard will narrow down to example1.com
Hi all, I have the following parent/child/grandchild documents on my ES:
DOC. TYPE: /users/user (parent is none)
_source: {
  "prefix": "TST_LUIZ",
  "id": "c0338fde-981c-478a-bcbe-2548ab967dce",
  "cookieids": [
    "b6d3c8b4-a075-49a3-9493-f1bf0f56312e"
  ]
}
DOC. TYPE:
This is to capture the time taken by ES to process the items in that batch
of records. Yes the total size written in bytes will already be in a MR
counter.
On Feb 12, 2014 8:30 AM, Costin Leau costin.l...@gmail.com wrote:
We can introduce such counters. What exactly are you interested in?
The
Hey
I updated to 1.0.0, but I'm getting a strange issue:
As from
http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/search.html
I tried to run
SearchResponse response = client.prepareSearch("index1", "index2")
        .setTypes("type1", "type2")
I'd try to up it to 8GB, assuming of course you still have a lot (like
close to 8GB) free. If that still doesn't work, once you get to the high
CPU state, try to run this and it'll tell you which threads in ES are doing
what with the CPU:
curl localhost:9200/_nodes/hot_threads?pretty
The documentation has not been updated for version 1.0 [1]. The method
should be now called setPostFilter. Better yet, you should look into
filtered queries [2].
[1] https://github.com/elasticsearch/elasticsearch/pull/4461
[2]
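To make the two options concrete, here is a small Python sketch of the request bodies involved. It assumes the 1.0 REST naming, where the top-level filter is called `post_filter`, and the older `filtered` query; the field names `title` and `status` are invented for the example.

```python
import json

text_query = {"match": {"title": "elasticsearch"}}
status_filter = {"term": {"status": "published"}}

# Post filter: the query runs first and the filter is applied to the hits
# afterwards, so facets/aggregations still see the unfiltered result set.
post_filter_body = {
    "query": text_query,
    "post_filter": status_filter,
}

# Filtered query: the filter narrows the candidate documents as part of
# the query itself, so everything downstream sees only matching documents.
filtered_query_body = {
    "query": {
        "filtered": {
            "query": text_query,
            "filter": status_filter,
        }
    }
}

print(json.dumps(post_filter_body, indent=2))
print(json.dumps(filtered_query_body, indent=2))
```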
Ah great, thanks.
Is this stackoverflow answer incorrect then, or still correct?
http://stackoverflow.com/questions/14595988/queries-vs-filters
Namely, which is more efficient: a query, a post filter, or a filter on a
query?
Eg, which one is the best?
1)
The answer is still correct. What the git commit that I referenced
essentially accomplished was to remove the ambiguity between the different
filters. The filter that is part of a filtered query can be thought of as a
prefilter. Here is the breakdown of what happens in your three cases:
1)
IMO evaluating this issue starts with applying the CAP Theorem which in
summary states that networked clusters with multiple nodes can offer only 2
of the following 3 desirable objectives
Consistency
Availability
Partition tolerance (data distributed across nodes).
ES clearly does the
So what you are saying is, there is no way to aggregate together into one
place all the tokens generated by one document?
I mostly wanted to do this so that an end user doesn't have to understand
what fields are in the document, or lucene query syntax to get the results
they are looking for.
Maybe try something like this (assuming frequency is available for all
children)?
GET /users/browser_TST_LUIZ/_search
{
  "min_score": 6,
  "query": {
    "has_child": {
      "type": "browser_TST_LUIZ_fr",
      "score_type": "sum",
      "query": {
        "custom_score": {
          "query": {
Hello,
Would like to ask if there is any update/change to using an Elasticsearch
logo since this Q was originally asked a year ago.
BTW - I notice that the current Elasticsearch website doesn't even display
a logo...
Tony
On Sunday, March 25, 2012 4:48:27 AM UTC-7, kimchy wrote:
Go
I'm sure it isn't the case for everyone that is having data/shard problems,
but I had some real trouble doing a full cluster restart on an 18 node
cluster. Kinda nightmarish, actually, shards failing all over the place,
lost data because of lost shards, etc.
I finally realized that the
Great, thank you, that makes it very clear. That explanation should be
added to the query/filter/postfilter docs!
On Wed, Feb 12, 2014 at 9:46 PM, Ivan Brusic i...@brusic.com wrote:
The answer is still correct. What the git commit that I referenced
essentially accomplished was to remove the
Thanks for sharing this info. It's really helpful. In any case data loss
shouldn't be acceptable to anyone, especially index corruption with no way
to recover at all. I think one shouldn't confuse consistency with data loss
as suggested in this thread. It's also good to hear that most of the bugs
Josh,
Your experience of recovering in only about 10 minutes is very interesting,
because my little 5-node cluster (15GB data, 3500 indices) takes about an hour
to recover, and I know the bottleneck is the disk subsystem I'm currently on.
I am curious:
- What is the total size of the data in your
I use replica shard level 1, and always use latest ES version. I never had
data loss, and that is also due to the fact I have access to dedicated real
servers in our DC just a few meters away, and there are no servers at cloud
server farms with unknown and unstable network environment.
I do
This particular cluster is 16 data nodes with SSD RAIDs connected to each
other and the two master nodes with infiniband.
Under 100 indexes and usually 3 shards per index with 1 replica. Overall
data volume is in the 1TB range.
I haven't tweaked the shard allocation settings from default.
-Josh
On Wed, Feb 12, 2014 at 1:58 PM, joergpra...@gmail.com
joergpra...@gmail.com wrote:
I use replica shard level 1, and always use latest ES version. I never had
data loss, and that is also due to the fact I have access to dedicated real
servers in our DC just a few meters away, and there are no
I want to scale ES between data centers for use with logstash and tribe
node(s) sound like a great solution to this. One problem is that I'm not
sure how aliases work across clusters.
Here's the general idea: two clusters and an alias index on the tribe
cluster. Can you do this? Thanks for
Thanks for your help!
{
_index: myindex,
_type: knowledge,
_id: 1288,
_version: 1,
exists: true,
_source: {
id: 1288,
question: Est-ce que je dois installer des applications ou des
bibliotheques de KOMPLETE 9 a nouveau si celles-ci sont déja sur mon
ordinateur,
answer:
I didn't see anything on the list, but there's a blog post about 1.0.0
hitting general availability!
http://www.elasticsearch.org/blog/1-0-0-released/
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
Hi,
JDBC river plugin 1.0.0.1 for Elasticsearch 1.0.0 has been released.
https://github.com/jprante/elasticsearch-river-jdbc
Changes:
- compiled against Elasticsearch 1.0.0
- refactored some classes for preparing the move to a more robust data
gathering plugin
- improved JSON building when
Hi,
The method setFilter on this page no longer exists:
http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/search.html
Any tips on what that has changed to?
Thanks,
Ben
Hello
Can I push to the same index, but with a different type, in elasticsearch?
Thanks
On Tuesday, February 11, 2014 12:56:15 PM UTC+8, David Pilato wrote:
Which process pulls your data from MongoDB and MySQL and pushes it to
elasticsearch?
If you can't do that at index time, then I guess you will need to manage
However, it looks like the repos are still on RC2:
markw@na0-esd-a-001:~$ sudo apt-get dist-upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
The following packages will be upgraded:
elasticsearch
1 upgraded, 0 newly
Thanks! I take it that's just a name change for clarity and not a
functional change...
Thanks,
Ben
On Wed, Feb 12, 2014 at 7:25 PM, Kevin Wang kevin807...@gmail.com wrote:
It has been changed to setPostFilter(...)
On Thursday, February 13, 2014 1:39:11 PM UTC+11, Ben McCann wrote:
Hi,
Mark,
I believe this has to do with the way the packages have been named. I see
the same issue with rpm packages.
Rpm (and I believe dpkg) will consider 1.0.0.RC2 to be newer than 1.0.0.
[root@dlpuppet01 rpmbuild]# rpmdev-vercmp 1.0.0-1 1.0.0.RC2-1
0:1.0.0.RC2-1 is newer
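The ordering above follows from how segment-based version comparison works. Below is a simplified Python sketch of that idea, not the actual rpm implementation: it ignores epochs, releases, and special characters such as `~`, and the function names are my own.

```python
import re


def split_segments(version):
    """Split a version into all-digit or all-letter segments, dropping separators."""
    return re.findall(r"\d+|[a-zA-Z]+", version)


def simple_vercmp(a, b):
    """Return 1 if a is newer, -1 if b is newer, 0 if they compare equal."""
    seg_a, seg_b = split_segments(a), split_segments(b)
    for x, y in zip(seg_a, seg_b):
        if x.isdigit() and y.isdigit():
            # Numeric segments compare as integers.
            if int(x) != int(y):
                return 1 if int(x) > int(y) else -1
        elif x.isdigit() != y.isdigit():
            # A numeric segment is treated as newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        elif x != y:
            # Two alphabetic segments compare as strings.
            return 1 if x > y else -1
    if len(seg_a) == len(seg_b):
        return 0
    # All shared segments are equal: the version with segments left over wins.
    return 1 if len(seg_a) > len(seg_b) else -1


print(simple_vercmp("1.0.0", "1.0.0.RC2"))
```

Because `1.0.0` and `1.0.0.RC2` agree on their first three segments, the extra `RC` and `2` segments make the release candidate sort as the newer package under this scheme.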
On Wed, Feb 12, 2014 at 1:58 PM, joergpra...@gmail.com
joergpra...@gmail.com wrote:
For my requirements, downtime of 15 min is acceptable.
I can only wish! I run an ecommerce site, so my requirement is no downtime.
Ever.
--
Ivan
I don't think you can.
--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On Feb 13, 2014, at 03:54, Nick Chang nick.ch...@kland.com.tw wrote:
Hello
Can I push to the same index, but with a different type, in elasticsearch?
Thanks
David Pilato於
Hi Binh,
Thanks for the reply.
But if I want the php, css, js, and jpg counts for example1.com, then I
think I need queries like the ones below:
1) vhost:example1.com AND *.php for php count.
2) vhost:example1.com AND *.css for css count.
3) vhost:example1.com AND *.js for js count.
I think it