Is it possible to create an aggregation where I can do a sum on the results
of a sub-bucket?
I'm working on Twitter data. In this data I have a bunch of retweets of
different users.
Say that user A has 10 tweets that are retweeted a hundred times in my
dataset. I want to find the maximum
Quick question about the ES twitter river at
https://github.com/elasticsearch/elasticsearch-river-twitter
The Twitter streaming API allows you to filter, and you apparently get up
to 1% of the stream total, with our search queries. So, if I were filtering
for coffee, I'd get coffee tweets that
I have heard that ideally, you want to have a similar number of documents
per shard for optimal search times, is that correct?
I have data volumes that are just all over the place, from 100k to tens of
millions in a week.
I'm thinking about a river plugin that could:
Take a mapping object as a
I'm trying to build a basic understanding of how indexing and searching
works, hopefully someone can either point me to good resources or explain!
I'm trying to figure out what having multiple coordinator nodes as
defined in the elasticsearch.yml would do, and what having multiple search
load
web: www.campaignmonitor.com
On 22 March 2014 08:25, Josh Harrison hij...@gmail.com wrote:
I'm trying to build a basic understanding of how indexing and searching
works, hopefully someone can either point me to good resources or explain!
I'm trying to figure out what having
It doesn't look like the elasticsearch-py API covers the river use case.
When I've run into things like this I've always just run a manual curl
request, or if I need to do it from within a script I just do a basic
command with requests, ala
Say I have clusters A and B. Cluster A is consuming data using an ActiveMQ
river. I would like to stream data to cluster B as well. Do I just create a
secondary outbound AMQ channel and subscribe cluster B to it, or is there a
decent way to have a live copy of data going two places at once?
--
Analytics
Solr Elasticsearch Support * http://sematext.com/
On Wednesday, March 12, 2014 2:55:58 PM UTC-4, Josh Harrison wrote:
Say I have clusters A and B. Cluster A is consuming data using an
ActiveMQ river. I would like to stream data to cluster B as well. Do I just
create a secondary
I restarted my cluster the other day, but something odd stuck, resulting in
15/16 data nodes starting up an extra ES instance in the same cluster. This
ended badly as there were two nodes with identical display names, the
system locked up, etc.
When restarting again, to my horror, we were
I need to be able to pull 100s of thousands to millions of random documents
from my indexes. Normally, to pull data this large I'd do a scan query, but
they don't support sorting, so the suggestions I've seen online for
randomizing your results don't work (such as those discussed here:
.
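One client-side workaround for the sorting limitation, offered only as a sketch and not as anything proposed in this thread: stream every hit from the scan and keep a uniform random subset with reservoir sampling (Algorithm R). The `hits` argument is an assumption; it stands for any iterable of documents, such as the results yielded while scrolling through a scan query.

```python
import random


def reservoir_sample(hits, k, seed=None):
    """Return up to k items chosen uniformly at random from an iterable.

    Works in a single pass without knowing the total number of hits,
    so it can consume a scan/scroll stream directly (Algorithm R).
    """
    rng = random.Random(seed)
    sample = []
    for i, hit in enumerate(hits):
        if i < k:
            # Fill the reservoir with the first k hits.
            sample.append(hit)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = hit
    return sample
```

The trade-off is that every document still has to be pulled from the cluster once, so this only helps when a full scan is acceptable and memory for `k` documents is available.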
On Wed, Feb 19, 2014 at 9:04 PM, Josh Harrison hij...@gmail.com wrote:
I need to be able to pull 100s of thousands to millions of random
documents from my indexes. Normally, to pull data this large I'd do a scan
query, but they don't support sorting, so the suggestions I've seen
I've got indexes storing the same kind of data split into weekly chunks -
there has been some fairly substantial variation in data volume.
I've got a mapping change I need to make across all the back data, and I'm
thinking it might make sense to try to rebalance the documents per shard so
that
I'm sure it isn't the case for everyone that is having data/shard problems,
but I had some real trouble doing a full cluster restart on an 18 node
cluster. Kinda nightmarish, actually, shards failing all over the place,
lost data because of lost shards, etc.
I finally realized that the
This particular cluster is 16 data nodes with SSD RAIDs connected to each
other and the two master nodes with InfiniBand.
Under 100 indexes and usually 3 shards per index with 1 replica. Overall
data volume is in the 1TB range.
I haven't tweaked the shard allocation settings from default.
-Josh
Great, thanks Jörg!
I'll start fiddling around with the langdetect plugin to see if I can get
it going with our library.
On Tue, Feb 11, 2014 at 1:18 PM, joergpra...@gmail.com wrote:
An analyzer plugin is the right thing. Adding the recognized/extracted
terms needs
Are there any decent ES specific stress testing tools out there that would
allow me to test what kinds of simultaneous load my cluster can handle with
concurrent users making queries? Searched around a bit and didn't see
anything.
Figured I'd ask before I come up with a test approach of my own!
In our case, we're just interested in query stress testing. We've got a web
app that queries our indexes that are organized based on weeks of the year,
with a bunch of aliases making it so specific portions of the data can be
reached easily. Questions about scaling the app have come up. In our
like I said, it'd be in Python since that's my language of choice. So it
wouldn't be as optimal a testing platform as a native Java app, I guess,
but still useful as a proof of concept.
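As a rough illustration of what such a Python proof of concept could look like (a sketch under stated assumptions, not the tool discussed in the thread): a thread pool fires queries concurrently and the latencies are summarized. `query_fn` is a hypothetical callable that sends one query to the cluster, and `queries` is assumed to be a list of query bodies.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(query_fn, query):
    """Run one query and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    query_fn(query)
    return time.perf_counter() - start


def run_load_test(query_fn, queries, concurrency=8):
    """Send every query through a thread pool and summarize latencies.

    query_fn is a hypothetical callable that issues a single search;
    concurrency is the number of simulated simultaneous users.
    """
    if not queries:
        raise ValueError("queries must not be empty")
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(lambda q: timed_call(query_fn, q), queries))
    return {
        "count": len(latencies),
        "min": latencies[0],
        "median": latencies[len(latencies) // 2],
        "max": latencies[-1],
    }
```

Passing the query function in as a parameter keeps the harness independent of any particular client library, so the same loop can be pointed at a real search call or at a stub while developing the test.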
On Thursday, January 30, 2014 4:41:06 PM UTC-8, Josh Harrison wrote:
In our case, we're just interested
Thanks Jörg, Mark and Nikolas, some great information here. The 6x6
configuration was something of a worst case example, the farthest we'd
probably stretch it would be 3 nodes per host on 16-18 hosts, which should
be a little more reasonable. Hopefully we'll be able to do a support
contract
I've got fields that have a few hundred thousand+ unique values that I'd
like to be able to facet on. Is there some way of essentially streaming the
exhaustive list of facet results, like I can search hits?
--
You received this message because you are subscribed to the Google Groups
While ES is still in a pre-deployment stage at my job, there is growing
interest in it. For various reasons, a monster cluster holding everyone's
stuff is simply not possible. Individual projects require complete control
over their data and the culture and security requirements here are such
The subject says it all pretty much, is it possible to turn off the
reporting of version data in response to GET http://localhost:9200?
Thanks,
Josh
that you can disable
returning the version field.
--
Ivan
On Thu, Dec 19, 2013 at 12:27 PM, Josh Harrison hij...@gmail.com wrote:
The subject says it all pretty much, is it possible to turn off the
reporting of version data in response to GET http://localhost:9200?
Thanks,
Josh