Storing and analyzing user agent strings, general approach

2014-06-26 Thread Mark Dodwell
I want to store a bunch of documents in elasticsearch (which represent a hit to a website) including the user agent of the client that made the original HTTP request. Since user agent strings have a lot of variance, and the useful parts need parsing out (OS, browser, version etc.) I would like

Re: Storing and analyzing user agent strings, general approach

2014-06-26 Thread Patrick Proniewski
Hi, You should give http://logstash.net/docs/1.4.2/filters/useragent a try before anything else. Here is the relevant part of logstash.conf I'm using: filter { if [type] == apache { if [user-agent] != - and [user-agent] != { useragent {
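
Logstash's useragent filter handles this with a full regex database. To illustrate what "parsing out" OS, browser and version means, here is a crude Python sketch. The rule list is a hypothetical simplification, nowhere near complete, and not how the filter is implemented.

```python
import re

# Hypothetical, heavily simplified rules. Order matters: Chrome UAs also
# contain "Safari/", Android UAs contain "Linux", iOS UAs contain "Mac OS X".
BROWSER_RULES = [
    ("Chrome", re.compile(r"Chrome/(\d+)")),
    ("Firefox", re.compile(r"Firefox/(\d+)")),
    ("Safari", re.compile(r"Version/(\d+).*Safari/")),
    ("IE", re.compile(r"MSIE (\d+)")),
]
OS_RULES = [
    ("Windows", re.compile(r"Windows NT")),
    ("Android", re.compile(r"Android")),
    ("iOS", re.compile(r"iPhone|iPad")),
    ("Mac OS X", re.compile(r"Mac OS X")),
    ("Linux", re.compile(r"Linux")),
]


def parse_user_agent(ua):
    """Return browser family, major version and OS family ('Other' if unknown)."""
    result = {"browser": "Other", "major": None, "os": "Other"}
    for name, pattern in BROWSER_RULES:
        match = pattern.search(ua)
        if match:
            result["browser"] = name
            result["major"] = int(match.group(1))
            break  # first matching rule wins
    for name, pattern in OS_RULES:
        if pattern.search(ua):
            result["os"] = name
            break
    return result
```

One option, which is an assumption about your needs, is to index the parsed parts as separate not_analyzed fields next to the raw string so you can aggregate on them.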

Re: No terms generated for trigram analyzer

2014-06-26 Thread Andreas Falk
Ok, thanks again for the help! On Wednesday, June 25, 2014 3:37:00 PM UTC+2, Cédric Hourcade wrote: In fact they are in the _all field, but not analyzed with your trigrams analyzer. Cédric Hourcade c...@wal.fr On Wed, Jun 25, 2014 at 3:12 PM, Andreas Falk adde...@gmail.com
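
Cédric's point is that trigram terms only exist in a field whose analyzer actually produces them. The Python sketch below is a rough model of what such an analyzer emits. It assumes a hypothetical chain of a whitespace tokenizer, a lowercase filter and an nGram filter with min_gram = max_gram = 3. Real tokenizers also honour settings such as token_chars.

```python
def char_trigrams(token):
    """Overlapping 3-character grams of one token, in order (duplicates kept)."""
    return [token[i:i + 3] for i in range(len(token) - 2)]


def analyze(text):
    """Hypothetical chain: whitespace tokenizer -> lowercase -> 3-gram filter."""
    terms = []
    for token in text.lower().split():
        terms.extend(char_trigrams(token))
    return terms
```

Tokens shorter than three characters yield no grams in this sketch, so it is worth checking input length when a field looks empty.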

Response time of Java percolate API is unstable

2014-06-26 Thread Seungjin Lee
Hi, We're now running performance tests and seeing some unexpected results. We use the Java percolate API: client.preparePercolate().setIndices(index).setDocumentType(projectName).setSource(log).execute().actionGet(); LOGGER.info(duration + "ms for percolation, es time " + response.getTookInMillis() + " ms for log
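
One way to narrow down unstable response times is to compare the client-measured duration with the server-reported `took` (getTookInMillis()) across many calls. Below is a Python sketch using nearest-rank percentiles. The sample format is an assumption. If `took` stays flat while wall-clock time spikes, the variance likely lies outside the percolation itself (network, client threads, GC), but that is a hypothesis to check, not a diagnosis.

```python
import math


def nearest_rank(sorted_values, p):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    k = max(0, math.ceil(p / 100 * len(sorted_values)) - 1)
    return sorted_values[k]


def latency_report(samples, percentiles=(50, 99)):
    """samples: (client_ms, took_ms) pairs, one per percolate call.

    client_ms: wall-clock time measured around execute().actionGet().
    took_ms: the time ES reports for the request itself.
    """
    series = {
        "client": sorted(c for c, _ in samples),
        "took": sorted(t for _, t in samples),
        "overhead": sorted(c - t for c, t in samples),  # time spent outside ES
    }
    return {name: {f"p{p}": nearest_rank(vals, p) for p in percentiles}
            for name, vals in series.items()}
```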

Sorry for the spam, folks

2014-06-26 Thread Leslie Hawthorn
Hello everyone, Sorry for the recent spam. User banned, and I'm heading off to clean up the online archives now. Cheers, LH -- Leslie Hawthorn Community Manager http://elasticsearch.com Other Places to Find Me: Freenode: lh Twitter: @lhawthorn Skype: mebelh Voice: +31 20 794 7300

Re: Rivers are reimporting data at each ElasticSearch restart

2014-06-26 Thread Stéphane Seng
Thanks for your quick reply. I need some clarification about what you meant by "delete the river", "delete the _river index" and "this state is useful for flow control". From what I have understood from your reply, and supposing that I have imported data into a documents river using the JDBC

Realtime search + fast indexing

2014-06-26 Thread Nico Krijnen
Hi, We have recently migrated our application from 'bare Lucene + Zoie for realtime search' to Elasticsearch. Elasticsearch is awesome and, besides scalability, it gives us lots of additional features. The one thing we really miss, though, is realtime search. Search is the core of our

Re: Elastic-search as our primary database.

2014-06-26 Thread Swen Thümmler
On Wednesday, June 25, 2014 11:50:42 UTC+2, rayman wrote: We are thinking of using Elastic-search as our primary database, but I am concerned about a few things: 1. If we need to modify a document type (let's say, add a new field) we will need to re-index all raw data, therefore we need to keep

Update to include _all

2014-06-26 Thread Shawn Ritchie
Hi, Our current mapping does not support the _all field. Is there a way to update the mapping to include it, or is reindexing required? Regards Shawn -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop

Re: Rivers are reimporting data at each ElasticSearch restart

2014-06-26 Thread joergpra...@gmail.com
Yes, removing a river is DELETE _river/rivername and deleting river index is DELETE _river The JDBC river state keeps track of some timestamps, counters, and the last row of SQL statement. Yes, in case of a node switchover, where the river instance is restarted on another node, the new node could

Re: ingest performance degrades sharply along with the documents having more fileds

2014-06-26 Thread Maco Ma
Added the Solr benchmark as well. Columns: Number of different meta data fields | ES with _all/codec bloom filter disabled | ES (ingestion and query concurrently) | Solr | Solr (ingestion and query concurrently). Scenario 0: 1000 fields; 13 secs -> 769 docs/sec; CPU: 23.68%; iowait: 0.01%; Heap: 1.31G; Index Size: 248K; Ingestion

Re: Realtime search + fast indexing

2014-06-26 Thread joergpra...@gmail.com
Zoie is not for distributed search. If you want to analyze the LinkedIn developments for this area with Lucene, you should look at Sensei. There was also a BalancedSegmentMergePolicy donated to Lucene 2.x from the Zoie project, https://issues.apache.org/jira/browse/LUCENE-1924, but there was not

Re: search point in polygon

2014-06-26 Thread Peter Johnson
Did you get this working in the end Maarten? I have the same problem with the way 'intersects' works and Jilles's solution doesn't work for me; possibly due to the 'tree_levels' accuracy for quad tree. As a kind of workaround, I was thinking that you could draw 2 'envelope' geo_shape
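
As a fallback while the geo_shape 'intersects' behaviour is sorted out, you could run an exact point-in-polygon test client-side over candidate hits returned by a coarser query, such as the envelope idea above. This swaps in a different technique: plain even-odd ray casting on planar lon/lat coordinates. It is not what Elasticsearch does internally, and it ignores the antimeridian and the Earth's curvature.

```python
def point_in_polygon(lon, lat, ring):
    """Even-odd ray casting on planar coordinates.

    ring: list of (lon, lat) vertices; may be open or closed (first == last).
    Points exactly on an edge may land on either side.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # ray to the right crosses this edge
    return inside
```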

Re: performance of multi_match

2014-06-26 Thread Christoph Lingg
Hm, I encounter strange scoring results that I do not understand. I tracked down the scoring and it seems like the 'queryWeight' is missing sometimes. That's what explain gives me for one document: { value: 8.252264, description: weight(collector_1.default.raw:salzburg^18.0 in 11412869)

Anyone have elasticsearch-1.2.0-noarch.rpm?

2014-06-26 Thread Антон Кикоть
I need it ASAP; could somebody please share it? river-mongodb doesn't work with 1.2.1.

Re: Anyone have elasticsearch-1.2.0-noarch.rpm?

2014-06-26 Thread Mark Walkom
There's a critical bug with 1.2.0 which is why it was removed. See http://www.elasticsearch.org/blog/elasticsearch-1-2-1-released/ Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 26 June 2014 21:38, Антон Кикоть

Re: Anyone have elasticsearch-1.2.0-noarch.rpm?

2014-06-26 Thread Антон Кикоть
That's cool, man, but I have a production server with Elasticsearch (v 1.2.1) and I need elasticsearch-river-mongodb, and when I try to set up the river on the new ES version it doesn't work: {1.2.1}: Initialization Failed ... - ExecutionError[java.lang.NoClassDefFoundError:

Re: Getting OOME's (Out of Memory Exceptions) to stop

2014-06-26 Thread Robin Clarke
Very few GC messages in the logs, and none around the OOM instances... Cheers, -Robin- On Wednesday, 25 June 2014 16:55:03 UTC+2, Michael Hart wrote: What does your GC Old Count and GC Old Duration look like? Do you have warnings in the logs about long GC's? I've got similar issues and

Re: Bulk API : inconsistent responses with update actions

2014-06-26 Thread David Pilato
It looks like a bug to me. I think you should open an issue and add all those details. Best -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr On 26 June 2014 at 13:53:03, Tanguy Moal (tanguy.m...@gmail.com) wrote: Dear group, I'm experiencing an issue

Re: Anyone have elasticsearch-1.2.0-noarch.rpm?

2014-06-26 Thread David Pilato
It sounds like it has not been updated for ES 1.2: https://github.com/richardwilly98/elasticsearch-river-mongodb/pull/283 -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr On 26 June 2014 at 13:51:25, Антон Кикоть (antony.ki...@gmail.com) wrote: it is

Re: Elasticsearch river for postgres

2014-06-26 Thread Jorge von Rudno
Hi David and Jörg, Many thanks for your help. I finally found the problem: it was the version of the Postgres driver. Kind regards!!! Jorge von Rudno 2014-06-25 18:01 GMT+02:00 joergpra...@gmail.com joergpra...@gmail.com: You did not specify an index for the JDBC river to index to,

Re: puppet-elasticsearch options

2014-06-26 Thread Andrej Rosenheinrich
Hi Richard, thanks for your answer, it certainly helped! Still, I am puzzled by a few effects and questions: 1.) I am a bit confused by your class/instance idea. I can do something pretty simple like class { 'elasticsearch': version => '0.90.7' } and it will install elasticsearch in the

Re: Anyone have elasticsearch-1.2.0-noarch.rpm?

2014-06-26 Thread Антон Кикоть
Hi, I tried to do the same as in the link https://github.com/richardwilly98/elasticsearch-river-mongodb/pull/283 but nothing changed =((

Re: Elastic-search as our primary database.

2014-06-26 Thread Stephane Bastian
Just for what it's worth: we have been using ES as our primary datastore for almost 2 years. So far so good. I think that the blog post you are referring to is *very* interesting, *but* at the same time, think about how many SQL databases out there are not even backed up in production... are they

Re: unable to link C library. native methods (mlockall) will be disabled

2014-06-26 Thread Antonio Augusto Santos
I've run into this problem because my /tmp is mounted as noexec. Worked around this using these java opts: -Djna.tmpdir=/usr/share/elasticsearch/tmp -Djava.io.tmpdir=/usr/share/elasticsearch/tmp Cheers On Wednesday, June 25, 2014 7:28:17 PM UTC-3, dup90011 wrote: Centos 6(64) Tried to

Update dataset import to use bulk api now all the documents are nested under a doc field.

2014-06-26 Thread Jef Statham
I switched my import of a CSV file from single inserts to a bulk insert, but I'm not sure why all my documents are now nested in a doc field instead of being inserted as values. There is no 'doc' field in the dataset, so I'm not sure where that value is coming from, i.e.: hits: [ {
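
In the bulk format, the `{"doc": ...}` wrapper belongs to update actions. Under an index action, the line after the action line is stored verbatim as the document source. So if the import wraps each row as `{"doc": row}` under an index action, you would get exactly a top-level doc field. That is an assumption, since the snippet doesn't show the import code. Below is a minimal Python sketch of building an index-action bulk body from parsed CSV rows. The index/type names and `id_field` are hypothetical.

```python
import json


def bulk_index_body(index, doc_type, rows, id_field=None):
    """Build a newline-delimited bulk body using 'index' actions.

    Each row becomes two lines: the action/metadata line, then the row
    itself as the source (NOT wrapped in {"doc": ...}, which is update syntax).
    """
    lines = []
    for row in rows:
        meta = {"_index": index, "_type": doc_type}
        if id_field is not None:
            meta["_id"] = row[id_field]
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(row))
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline
```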

Re: Cross-index parent/child relationship

2014-06-26 Thread Matt Weber
See PR #3278. Hopefully it will get merged into one of the next releases. https://github.com/elasticsearch/elasticsearch/pull/3278 Thanks, Matt Weber On Thu, Jun 26, 2014 at 12:10 AM, Thomas thomas.bo...@gmail.com wrote: Hi, Unfortunately this is not supported by elasticsearch, the

Re: Query on Id field of nested documents fails.

2014-06-26 Thread dazraf
Hi, Could anyone help, please? We're kind of stuck right now, trying to get to a point where we can demonstrate ES working for our use cases to get management blessing. thanks Fuzz. On Tuesday, 24 June 2014 14:43:01 UTC+1, dazraf wrote: Hi, Very grateful for any help with the following (rather

Re: performance of multi_match

2014-06-26 Thread Christoph Lingg
Other unexpected results arise due to different queryNorms: for the first result I get a query norm of { value: 0.0059806756, description: queryNorm }, for some other documents it's { value: 0.0031318406, description: queryNorm }. The queryNorm is multiplied in to create the score, so it
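
With Lucene's default TF-IDF similarity, which ES 1.x uses, queryNorm = 1 / sqrt(sumOfSquaredWeights), where each term contributes (idf × boost)². Under the default query_then_fetch search type, idf comes from per-shard statistics, so hits from different shards can carry different queryNorms. That is one plausible explanation, and dfs_query_then_fetch exists to use global term statistics instead. The Python sketch below covers a flat disjunction of term queries only. multi_match in dis_max mode combines weights differently.

```python
import math


def idf(num_docs, doc_freq):
    """Lucene DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def query_norm(term_stats, boosts, query_boost=1.0):
    """queryNorm = 1 / sqrt(sumOfSquaredWeights) for a flat disjunction of terms.

    term_stats: {term: (num_docs, doc_freq)} as seen by ONE shard.
    boosts: {term: boost}; missing terms default to 1.0.
    """
    sum_sq = sum((idf(n, df) * boosts.get(term, 1.0)) ** 2
                 for term, (n, df) in term_stats.items())
    return 1.0 / math.sqrt(query_boost ** 2 * sum_sq)
```

Feeding in the same query with two shards' differing doc_freq values gives two different norms, matching the kind of spread seen in the explain output.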

Re: Copy index from production to development instance

2014-06-26 Thread Brian Lamb
Thank you for your suggestion. I tried the stream2es library but I get an OutOfMemoryError when trying to use that. On Friday, June 6, 2014 5:13:19 PM UTC-4, Antonio Augusto Santos wrote: Take a look at stream2es https://github.com/elasticsearch/stream2es On Friday, June 6, 2014 2:13:06 PM

Re: Query on Id field of nested documents fails.

2014-06-26 Thread dazraf
Is this related to this issue? https://github.com/elasticsearch/elasticsearch/issues/3022 On Tuesday, 24 June 2014 14:43:01 UTC+1, dazraf wrote: Hi, Very grateful for any help with the following (rather urgent) issue. Gist: https://gist.github.com/dazraf/55ebb900b3c17583bf58 The script

Using elasticsearch percolate + N indices as a feed system

2014-06-26 Thread Yan Pritzker
Hey all, We are considering building a fan-out feed inbox type system on top of ES for Reverb.com. The way it would work is each user can follow some number of searches. Using the percolator, we would plop new items as they matched searches into individual user feeds. We are going to have the
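
The fan-out flow described above can be modelled in a few lines to reason about it. In this Python sketch, `percolate()` is a hypothetical in-memory stand-in for the ES percolator, and all names are illustrative. A user following two matching searches gets the item in their feed only once.

```python
from collections import defaultdict


class FeedFanout:
    """Hypothetical in-memory model of percolator-driven feed fan-out."""

    def __init__(self):
        self.queries = {}                  # query_id -> predicate(item) -> bool
        self.followers = defaultdict(set)  # query_id -> user ids following it
        self.feeds = defaultdict(list)     # user_id -> item ids, oldest first

    def follow(self, user_id, query_id, predicate):
        self.queries[query_id] = predicate
        self.followers[query_id].add(user_id)

    def percolate(self, item):
        """Stand-in for the percolator: ids of stored queries matching item."""
        return [qid for qid, pred in self.queries.items() if pred(item)]

    def publish(self, item_id, item):
        recipients = set()
        for qid in self.percolate(item):
            recipients |= self.followers[qid]
        for user in sorted(recipients):
            self.feeds[user].append(item_id)  # one entry per user, deduplicated
        return recipients
```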

Average over buckets doc_count

2014-06-26 Thread Demetrius Nunes
Hi there, Is there a way to calculate the average over the doc_count results of a bucket aggregation? For instance, I have this aggregation query: GET channel/Subscription/_search { "size": 1, "aggs": { "SubscriptionsPerUser": { "terms": { "field": "UserId", "min_doc_count": 0,
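
As far as I know, 1.x aggregations cannot take another aggregation's output as input. One workaround is to compute the mean client-side from the terms buckets in the search response. The Python sketch below assumes the standard response shape aggregations.<name>.buckets[].doc_count.

```python
def average_bucket_doc_count(response, agg_name):
    """Mean doc_count over the buckets returned for one terms aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    if not buckets:
        return 0.0
    return sum(b["doc_count"] for b in buckets) / len(buckets)
```

It only averages the buckets actually returned. The terms aggregation returns the top `size` buckets (10 by default), so the result is biased unless `size` covers every user. With min_doc_count: 0, zero-count buckets are included in the mean.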

Elastic+PHP SDK issue - script not working when served by apache but works fine in cmd line (ubuntu on EC2 instance)

2014-06-26 Thread Xavier Sorgel
I'm looking for some pointers on how to debug my issue: my PHP script works fine when run from the command line but somehow crashes when served by Apache (I'm using an EC2 instance that runs Apache and Elasticsearch on Ubuntu). It seems $client = new Elasticsearch\Client() fails when served by

Multi-tenancy strategy: 1 index with 1 shard and 1 replica per client

2014-06-26 Thread Drew Kutcharian
Hey Guys, I'm working on an analytics dashboard project where we collect events into Elasticsearch for clients. Each client could have millions of events per month. We are thinking of using one index with one shard and one replica per client. Looking at Logstash, it seems like Logstash creates

Re: unable to link C library. native methods (mlockall) will be disabled

2014-06-26 Thread dup90011
Hmmm... How would /tmp affect linking the C library? My /tmp is fine. Is there any Elasticsearch support here, or only for the paid version?

Re: Copy index from production to development instance

2014-06-26 Thread Himanshu Agrawal
From the data you have provided I see that your bucket and keys for development and production are different. Point your development elasticsearch instance to the same AWS account and bucket in which you are storing the snapshot. On Jun 26, 2014 9:15 PM, Brian Lamb brian.l...@researchsquare.com

Re: unable to link C library. native methods (mlockall) will be disabled

2014-06-26 Thread Antonio Augusto Santos
Check the *mlockall* section on http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html. The ES devs are around here and they may answer your question, but real support is only available with the paid version. On Thu, Jun 26, 2014 at 4:11 PM, dup90011 regis...@xdbr.com wrote: Hmmm

Re: Elastic+PHP SDK issue - script not working when served by apache but works fine in cmd line (ubuntu on EC2 instance)

2014-06-26 Thread Xavier Sorgel
Oh well, curl was not installed, it seems. I installed curl, restarted Apache and now everything works fine... X On Thursday, June 26, 2014 10:16:21 AM UTC-7, Xavier Sorgel wrote: I'm looking for some pointers on how to debug my issue: my PHP script works fine when run from the command line

Re: Multi-tenancy strategy: 1 index with 1 shard and 1 replica per client

2014-06-26 Thread Andrew Selden
Drew, The Elasticsearch default is to create 5 shards for each index. I would start with this. Typically it is best to actually over-shard, which is to say have more than 1 shard per node per index. There is not really any measurable cost to this and it gives you flexibility in your design as

Re: unable to link C library. native methods (mlockall) will be disabled

2014-06-26 Thread joergpra...@gmail.com
ES runs on RHEL/Centos 6. What exact RHEL/Centos 6 is this? Find out with command cat /etc/redhat-release What Java JVM do you use? What file system do you have ES installed on? It must have permission to execute binaries. Also you can run this little Java program to find the reason: Save

Re: Notifications from Elasticsearch when documents are added.

2014-06-26 Thread Matthew Parrott
Hi! Have there been any further explorations in the area of WAN replication? I have ES clusters in multiple datacenters connected via a high-speed private network. I'm wondering if multi-master replication would be possible in this environment or if we'd need some type of 'shovel' plugin like

[Aggregation] Be able to count number of item in a sub-collection

2014-06-26 Thread Grégoire Pineau
Hello, I have the following data: [ { "id": 1, "collection": [ { "key": "val1" }, { "key": "val2" }, { "key": "val3" } ] }, { "id": 2,
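
Assuming the goal is the number of items per document, one common workaround is to precompute the count at index time into a numeric field. You can then aggregate on that field with, for example, a stats or terms aggregation. The field name collection_count in this Python sketch is hypothetical.

```python
def with_collection_count(doc, field="collection", count_field="collection_count"):
    """Return a copy of doc with the length of doc[field] stored in count_field.

    Documents without the field get a count of 0; the input is not modified.
    """
    enriched = dict(doc)
    enriched[count_field] = len(doc.get(field, []))
    return enriched
```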

Unsure of growth of index when nothing happening (clean install)

2014-06-26 Thread Grant Christensen
Hi all, first time Elasticsearch user. I am using Elastic as part of running SugarCRM 7 on-site, as it is a pre-req. I have a clean Ubuntu 14.04 install with a LAMP stack and Java installed. I have installed Elasticsearch using apt-get based on the repository instructions on the Elastic

Re: Unsure of growth of index when nothing happening (clean install)

2014-06-26 Thread Mark Walkom
Which index is growing? Chances are it is the marvel index(es), which is expected. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 27 June 2014 10:07, Grant Christensen grant.christen...@supercorp.com.au wrote:

RE: Unsure of growth of index when nothing happening (clean install)

2014-06-26 Thread Grant Christensen
Yes, that is correct. Looks like a daily Marvel index is growing. Is this the plugin grabbing stats? Grant Christensen General Manager - Sales and Product More about Supercorp: http://www.supercorp.com.au/ e: grant.christen...@supercorp.com.au

Re: Unsure of growth of index when nothing happening (clean install)

2014-06-26 Thread Mark Walkom
Sure is. There's more info in the docs http://www.elasticsearch.org/guide/en/marvel/current/ Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 27 June 2014 10:11, Grant Christensen grant.christen...@supercorp.com.au
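
Because Marvel writes one index per day, disk usage keeps growing unless old days are removed. The Python sketch below picks out daily indices older than a retention window. It assumes the `.marvel-YYYY.MM.DD` naming, and the retention period is your choice. The returned names could then be removed with a DELETE request per index.

```python
from datetime import date, timedelta


def expired_daily_indices(index_names, prefix, today, keep_days):
    """Names of prefix+'YYYY.MM.DD' indices dated before today - keep_days.

    Names with the prefix but no parseable date suffix are skipped.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            y, m, d = (int(part) for part in name[len(prefix):].split("."))
            day = date(y, m, d)
        except ValueError:
            continue  # not a dated daily index
        if day < cutoff:
            expired.append(name)
    return sorted(expired)
```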

Re: Multi-tenancy strategy: 1 index with 1 shard and 1 replica per client

2014-06-26 Thread Drew Kutcharian
Hi Andrew, Not sure if you read my original question. The question is about having a separate index per customer, since we are going to have 1000 customers but each would have a lot of data. Each shard comes with its own overhead since it's an instance of Lucene. I was going with the 1 shard

Re: Multi-tenancy strategy: 1 index with 1 shard and 1 replica per client

2014-06-26 Thread Mark Walkom
Pretty sure he read it as I'd have offered the same advice :) You cannot change the sharding of an index after creation, you need to completely reindex the data to do so. This may not be a major issue for you but it's something to take into account when you have hundreds or thousands of customers,
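
The shard count is fixed because each document is routed with shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document id. Changing the divisor changes where almost every document belongs. This Python sketch uses CRC32 as a stand-in hash, since Elasticsearch uses its own hash function.

```python
import zlib


def shard_for(routing, num_primary_shards):
    """Pick a shard for a routing value (CRC32 is a stand-in for ES's hash)."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def moved_fraction(ids, old_shards, new_shards):
    """Fraction of ids whose shard changes if the primary count changes."""
    moved = sum(1 for i in ids if shard_for(i, old_shards) != shard_for(i, new_shards))
    return moved / len(ids)
```

Going from 5 to 6 primaries with a well-mixed hash relocates roughly five out of six documents. That is why a full reindex is needed.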

Re: Multi-tenancy strategy: 1 index with 1 shard and 1 replica per client

2014-06-26 Thread Drew Kutcharian
Hi Mark, The problem that we have is that each customer could generate 60-80 million docs/month on average. In addition, when a customer leaves, we would need to delete all their data. So hence it makes sense to have an index per customer (or even multiple indexes per customer). Another issue
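
A quick back-of-the-envelope calculation for the index-per-customer plan helps here, because every primary and every replica is a separate Lucene index the cluster has to host. In this Python sketch, the node count and the monthly-index variant are hypothetical inputs.

```python
import math


def total_shards(customers, indices_per_customer, primaries, replicas):
    """Every primary plus each of its replicas is a separate shard to host."""
    return customers * indices_per_customer * primaries * (1 + replicas)


def shards_per_node(total, nodes):
    """Shards each node carries if they are spread evenly (rounded up)."""
    return math.ceil(total / nodes)
```

With 1000 customers at 1 primary + 1 replica, that is 2000 shards. Monthly indices per customer would reach 24000 after a year.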

Re: Cross-index parent/child relationship

2014-06-26 Thread Drew Kutcharian
Thanks Matt, that feature is exactly what we need. One thing I couldn't figure out: I would be able to pass a routing key so only the relevant shards would be queried, right? On Jun 26, 2014, at 8:14 AM, Matt Weber matt.we...@gmail.com wrote: See PR #3278. Hopefully it will get merged

Re: Cross-index parent/child relationship

2014-06-26 Thread Matt Weber
I have not tested routing but I did put that functionality in so it should work fine. Let me know if you have any issues! Thanks, Matt Weber On Thu, Jun 26, 2014 at 7:20 PM, Drew Kutcharian d...@venarc.com wrote: Thanks Matt, that feature is exactly what we need. One thing I couldn’t

Elasticsearch maven SNAPSHOT repository

2014-06-26 Thread Veerapuram Varadhan
Hi All, I am working on a project with Elasticsearch and require the top_hits aggregation. With Maven Central only having versions up to 1.2.0, I am currently unable to test/develop the module that requires the top_hits aggregation. This is the key feature that made us move to Elasticsearch. If there

Re: Multi-tenancy strategy: 1 index with 1 shard and 1 replica per client

2014-06-26 Thread Mark Walkom
Ahh ok, knowing this extra info is good as it helps us help you :) Logstash doesn't define how many shards to use, at least not that I can see here - https://github.com/elasticsearch/logstash/blob/master/lib/logstash/outputs/elasticsearch/elasticsearch-template.json - or through some quick tests.

Re: Elasticsearch maven SNAPSHOT repository

2014-06-26 Thread David Pilato
You will find it in the Sonatype repo. HTH -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs On 27 June 2014 at 06:49, Veerapuram Varadhan v.varad...@gmail.com wrote: Hi All, I am working on a project with elasticsearch and require the top_hits aggregation. With maven