Re: Can we perform the text search present in the images or pdf files through elasticsearch

2014-04-18 Thread Rafał Kuć
Hello! Please look at the attachment plugin for Elasticsearch: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-attachment-type.html It uses Apache Tika under the hood. The list of supported formats is available here: http://tika.apache.org/0.10/formats.html

Re: Can we perform the text search present in the images or pdf files through elasticsearch

2014-04-18 Thread Prashant Agrawal
Hi, if I am not wrong, you are talking about https://github.com/elasticsearch/elasticsearch-mapper-attachments So with this I can index attachments (say, a PDF file), and they will be stored base64-encoded. So is this plugin

Re: Can we perform the text search present in the images or pdf files through elasticsearch

2014-04-18 Thread Rafał Kuć
Hello! You'll need to send the file contents to Elasticsearch in base64 form, and Elasticsearch will use Tika to extract data from the file. However, in a typical case you would store not the whole binary file (as it can be quite big) but rather a path to the file, so that the
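A minimal sketch of the base64 step described in this reply, in Python. The field name `file` and the helper `build_attachment_doc` are illustrative assumptions, not names from the thread; the resulting JSON would be sent as the document body to an index whose mapping declares that field with the attachment type.

```python
import base64
import json


def build_attachment_doc(raw_bytes, field="file"):
    """Return a JSON document body carrying the file content as base64 text.

    `field` is a hypothetical field name; it has to match whichever field
    the index mapping declares with the attachment type.
    """
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return json.dumps({field: encoded})


# In-memory bytes stand in for a real PDF read from disk.
body = build_attachment_doc(b"%PDF-1.4 sample bytes")
print(body)
```

Only the encoding is shown here; how the body is posted (HTTP client, index and type names) is left open because the thread does not specify it.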

Re: Can we perform the text search present in the images or pdf files through elasticsearch

2014-04-18 Thread Prashant Agrawal
So can I say that the mapper-attachment plugin works as follows: whether I send a text file, a PDF file, or an image file to ES, the plugin will extract the *text content* in all three scenarios and store it in ES, and it will then be available for search as well?

Re: Solr SearchComponent-like functionality?

2014-04-18 Thread Srinivasan Ramaswamy
I would like to influence the ranking with a few fields that are not stored in the index (e.g. click data for keyword-documents). I have used a custom SearchComponent in Solr to implement similar functionality in the past. I am wondering how I can achieve the same in ElasticSearch. I know this

Re: Kibana-auth install under RHEL6 server ?

2014-04-18 Thread Andrea Martines
No one ? :( I keep trying but there's always a tool that does not work :/ -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to

Re: Can we perform the text search present in the images or pdf files through elasticsearch

2014-04-18 Thread Rafał Kuć
Hello! The attachment plugin will use Tika to extract the text from the binary file content that you send in base64. Tika does a good job with text extraction; however, you have to test for yourself whether your files are parsed well enough for your use case.

Re: ELK stack needs tuning

2014-04-18 Thread R. Toma
Hi Jörg, Thank you for pointing me to this article. I needed to read it twice, but I think I understand it now. I believe shard overallocating works for use-cases where you want to store search 'users' or 'products'. Such data allows you to divide all documents into groups to be stored in

[ANN] Elasticsearch AWS cloud plugin 2.1.1 released

2014-04-18 Thread David Pilato
Heya, We are pleased to announce the release of the Elasticsearch AWS cloud plugin, version 2.1.1. The Amazon Web Services (AWS) Cloud plugin allows using the AWS API for the unicast discovery mechanism and adds S3 repositories. https://github.com/elasticsearch/elasticsearch-cloud-aws/ Release

Re: Wildcard query is not working.

2014-04-18 Thread Dan Tuffery
You're setting the size parameter to 0 in your queries, so it won't return anything. Also, you need to have a copy of the URL value in your index that is not analyzed, which you can use for your wildcard query. In your mapping you need to specify that you want to index the URL value verbatim:
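The mapping from the original message is cut off in this archive. As a rough sketch of the two points made in the reply (a non-zero `size`, and an unanalyzed copy of the URL to run the wildcard against), the request bodies could look like the following in Python. The type name `page`, the field names `url` and `raw`, and the wildcard pattern are assumptions for illustration, written in the 1.x-era mapping style.

```python
import json

# Hypothetical type and field names; "url.raw" holds the verbatim
# (not_analyzed) copy of the URL alongside the analyzed "url" field.
mapping = {
    "page": {
        "properties": {
            "url": {
                "type": "string",
                "fields": {
                    "raw": {"type": "string", "index": "not_analyzed"}
                },
            }
        }
    }
}

# size must be greater than 0, otherwise no hits are returned.
search_body = {
    "size": 10,
    "query": {"wildcard": {"url.raw": "*example.com*"}},
}

print(json.dumps(mapping, indent=2))
print(json.dumps(search_body, indent=2))
```

Running the wildcard against the unanalyzed sub-field matters because the analyzed field is split into tokens, so a pattern spanning the whole URL would have nothing to match.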

Re: Is ElasticSearch the Right Tool for This

2014-04-18 Thread Clinton Gormley
Hiya, it's a bit more verbose, but yes, you can do queries like that easily. I've assumed that all of your fields are exact-value not_analyzed string fields, rather than full text fields: GET /_search { "_source": [ "col1", "col2" ], "query": { "filtered": { "filter": { "bool": {

Setting Node ID

2014-04-18 Thread Michael Salmon
I'm planning on trying out multiple nodes on one host and I'd like to be able to control the node id, but as far as I can see this is set in NodeEnvironment to the first unused value. The reason for setting the id is that I would like to include it in the node name which I currently set to

Word count per document

2014-04-18 Thread Aharon Twizer
Hi, I'm new to ElasticSearch. What I want to do is to upload a few hundred documents and then look for words in those documents. The most important part is to get the count of each word per document, e.g. if I look for the word boy, the answer I'll get is that it appears 3 times in

Re: Word count per document

2014-04-18 Thread Itamar Syn-Hershko
Yes, take a look here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-termvectors.html

Re: Word count per document

2014-04-18 Thread Aharon Twizer
Thanks Itamar. But with the Term Vector I'll have to make a separate call for each document (I can have up to 20K documents). I want to be able to make a single call with the word I'm looking for and to get the statistics for each document.

Re: Word count per document

2014-04-18 Thread Itamar Syn-Hershko
You should be able to do this using the aggregations framework: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations.html The idea is that you bucket on document ID, then on terms, and then do a count. But I'm not sure it was designed to handle this scenario,

Elasticsearch on java7u55 ?

2014-04-18 Thread Lukáš Vlček
Hi, is anybody using Oracle Java 1.7.0_55 with Elasticsearch (v0.90.5)? Is it safe and recommended? I found that Robert and Uwe discussed this Java version here: http://lucene.472066.n3.nabble.com/Update-lucene-apache-org-java-recommendations-with-java7u55-td4131353.html I found a couple of failed

Re: Need some help for creating my model

2014-04-18 Thread Stefan Kruse
OK, new try. Is it generally possible to do this with the PHP API? I can't find anything in the docs. Maybe I'm just not seeing it. Regards, Stefan

Re: Elasticsearch on java7u55 ?

2014-04-18 Thread Jason Wee
Will these two links help? https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/SYSTEM_REQUIREMENTS.txt http://people.apache.org/~mikemccand/lucenebench/indexing.html The Lucene performance test is using Java 1.7.0 u40. That's the same version I'm using for Lucene 4.6.0. Jason

Re: logstash 1.4.0 debian package init script not working

2014-04-18 Thread Goofy03
Have you checked the permissions on /opt/logstash, /var/log/logstash and /etc/logstash … are they owned by the same user as in the init script? That solved it for me on Debian, but I can't get events when the Apache log is updated, whereas if I run it as root (from the console) everything works … Oh, and I have added the logstash user to adm

Re: Solr SearchComponent-like functionality?

2014-04-18 Thread Matt Weber
Yes, you can use the Function Score Query [1] in combination with a native script written in Java [2]. With the native script you can basically do whatever you want, but be careful: you can significantly impact your query performance. [1]

Getting phrase count for each document separately.

2014-04-18 Thread Amit
I would like to get a phrase count for every document. I do not wish to run a query for every document; I would rather run one single query. For example if I have the following documents: { name : John, Message : The lion is *very *fast } { name : Ben, Message : The

Getting phrase count for each document separately.

2014-04-18 Thread Amit
I would like to get a phrase count for each document separately. I do not wish to run a query for every document; I would rather run one single query. For example if I have the following documents: { name : John, message : The lion is *very **fast* } { name : Ben,

Re: Elasticsearch on java7u55 ?

2014-04-18 Thread Michael McCandless
1.7u55 should be safe for ElasticSearch; we just put out a blog post about this: http://www.elasticsearch.org/blog/java-1-7u55-safe-use-elasticsearch-lucene/ And I'll fix the nightly Lucene benchmarks to use u55 too! I should NOT have been using u40: it's not safe. Mike

Re: Solr SearchComponent-like functionality?

2014-04-18 Thread Srinivasan Ramaswamy
That's great, thanks for your reply. This looks like a good solution for my requirement! Is this script applied on each shard? I want to apply this function to all the documents so that the top N from each shard is picked by my custom score. Also, can you elaborate a little bit on be

Re: Elasticsearch on java7u55 ?

2014-04-18 Thread Lukáš Vlček
Excellent, thanks Michael.

Error installing ldap river plugin

2014-04-18 Thread Tom Wilson
I'm completely new to elasticsearch and am trying to put together a proof-of-concept using LDAP as a data store. However, I came across a problem right out of the starting gate, attempting to install the ldap river plugin, according to the instructions here:

Re: searching most recent objects

2014-04-18 Thread Phil Greenberg
Oh, awesome, thank you so much for the help, I'll give that a try! On Thursday, April 17, 2014 2:51:23 PM UTC-7, Itamar Syn-Hershko wrote: For recent X just sort on the _timestamp field and specify X as the page size
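The quoted advice (sort on `_timestamp`, use X as the page size) can be sketched as a search body built in Python. The helper name `most_recent_query` is made up for illustration, and the sketch assumes the `_timestamp` field is enabled in the index mapping, which the thread does not confirm.

```python
import json


def most_recent_query(x):
    """Build a search body returning the x most recent documents.

    Assumes _timestamp is enabled in the mapping; sorting descending puts
    the newest documents first, and "size" caps the page at x hits.
    """
    return {
        "size": x,
        "sort": [{"_timestamp": {"order": "desc"}}],
        "query": {"match_all": {}},
    }


print(json.dumps(most_recent_query(25), indent=2))
```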

Cache cleaner in hot threads

2014-04-18 Thread Nikolas Everett
I'm still doing performance work and I keep seeing the CacheCleaner pop up [1]. I don't know how much of an effect it's actually having, but I imagine it's something. It looks like entries in the cache get queued for deletion both by cache clear commands and by readers closing. Would it make

Re: Solr SearchComponent-like functionality?

2014-04-18 Thread Matt Weber
Well, the script runs against all matching documents of the query, so you can do a match_all query [1] to have the logic applied to all your documents. This is going to be expensive though, so try to filter out as many documents as possible before applying the custom scoring. Maybe even perform

Switching back to ConcurrentMergeScheduler

2014-04-18 Thread David Smith
I see that ES switched back to ConcurrentMergeScheduler in 1.1.1 because of the indexing performance impact in 1.1.0. https://github.com/elasticsearch/elasticsearch/issues/5817 We're on 1.1.0 and cannot upgrade to 1.1.1 for the time being. Is there a way to switch it back using the API? I tried the

Re: Function Score Query and Native scripts

2014-04-18 Thread David Smith
Yes, function score query works with native scripts. We use it with them. I'm not sure whether native scripts are automatically cached. On Saturday, April 12, 2014 1:49:32 PM UTC-4, Eric T wrote: Hi, The function score documentation doesn't mention any support for native scripts, does it

Re: Function Score Query and Native scripts

2014-04-18 Thread David Smith
You can use a function score query with a native script in this manner. { "function_score": { "query": { "match_all": { } }, "functions": [ { "filter": { "terms": { "myfield": [ 103, 104, 134, 180 ], "_cache": true } },
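The example in this message is truncated before the script part. As a hedged sketch only (not the author's actual query), a function score body that refers to a native script by name might be assembled like this in Python. The script name `my_native_score`, its `params`, and the helper function are hypothetical; the native script itself would have to be implemented in Java and registered on the nodes separately.

```python
import json


def native_function_score(script_name, params, query=None):
    """Build a function_score body whose score comes from a native script.

    script_name is a hypothetical name under which a Java native script
    is assumed to be registered on the cluster.
    """
    return {
        "query": {
            "function_score": {
                # Narrowing this query limits how many documents get scored.
                "query": query or {"match_all": {}},
                "functions": [
                    {
                        "script_score": {
                            "script": script_name,
                            "lang": "native",
                            "params": params,
                        }
                    }
                ],
            }
        }
    }


body = native_function_score("my_native_score", {"weight": 2})
print(json.dumps(body, indent=2))
```

Because the script runs for every document matched by the inner query, passing a restrictive `query` instead of the `match_all` default is the main lever for keeping the cost down.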

Query and Filter

2014-04-18 Thread Matt Hughes
Trying to compose a query and filter combination to no avail: { "from": 0, "size": 200, "query": { "filtered": { "query": { "query_string": { "fields": [ "_all" ], "query": "\"Test message\"" } },

Re: Filter first then search

2014-04-18 Thread David Smith
I'm also curious to know if there is a way to do the opposite of FilteredQuery... basically QueriedFilter. Filter first and then run a query on the filtered results.

Re: Query and Filter

2014-04-18 Thread Matt Weber
Chances are your appId and processId fields are analyzed, so the analyzer is breaking up the IDs. Update the mapping of these fields so they are not analyzed [1]. Also, you should not use an and filter to combine term filters. Use a boolean filter [2] with must clauses for better performance. Read why at
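The suggestion to combine term filters with a bool filter and must clauses can be sketched as follows in Python. The field names `appId` and `processId` and the `from`/`size` values come from the thread; the helper name and the example ID values are assumptions, and the sketch presumes both fields are mapped as not_analyzed and the documents were reindexed afterwards.

```python
import json


def filtered_bool_query(text, app_id, process_id):
    """Build a filtered query: query_string on _all plus a bool/must filter."""
    return {
        "from": 0,
        "size": 200,
        "query": {
            "filtered": {
                "query": {
                    "query_string": {"fields": ["_all"], "query": text}
                },
                "filter": {
                    "bool": {
                        # Each term filter must match the stored value exactly,
                        # which only works if the field is not analyzed.
                        "must": [
                            {"term": {"appId": app_id}},
                            {"term": {"processId": process_id}},
                        ]
                    }
                },
            }
        },
    }


body = filtered_bool_query('"Test message"', "app-123", "proc-456")
print(json.dumps(body, indent=2))
```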

ANN Elastisch 2.0.0-beta4 is released

2014-04-18 Thread Michael Klishin
Elastisch [1] is a small, feature complete Clojure client for ElasticSearch. Release notes: http://blog.clojurewerkz.org/blog/2014/04/11/elastisch-2-dot-0-0-beta4-is-released/ 1. http://clojureelasticsearch.info

Continuous async replication

2014-04-18 Thread Mohit Anchlia
As I understand there is currently no feature that does async replication between 2 clusters or even within the same cluster, but we have a need to write one. What would be the best way to do it in elasticsearch? I was thinking of leveraging Scroll for this.

Testing for an Empty String

2014-04-18 Thread Paul
Hi, Thanks for everyone's patience while I learn the elasticsearch query DSL. I'm trying to get used to its verbosity. How would I do a query like this, again in SQL parlance: select col1 from mysource where col2 = ''?

Splunk vs. Elastic search performance?

2014-04-18 Thread Frank Flynn
We have a large Splunk instance. We load about 1.25 TB of logs a day. We have about 1,300 loaders (servers that collect and load logs; they may do other things too). As I look at Elasticsearch / Logstash / Kibana, does anyone know of a performance comparison guide? Should I expect to run on

Re: Query and Filter

2014-04-18 Thread Matt Hughes
Thanks for the quick reply! I updated the mappings and confirmed both types read not_analyzed. I also updated the query to use bool/must: { "from": 0, "size": 200, "query": { "filtered": { "query": { "query_string": { "fields": [ "_all"

Testing for an Empty String With the Following

2014-04-18 Thread Paul
Hi, Thanks for everyone's patience while I learn the elasticsearch query DSL. I'm trying to get used to its verbosity. How would I do a query like this, again in SQL parlance: select col1 from mysource where col2 = '' and col3 in ['', 'one', 'two'] and col4 = 'foo'

Re: Splunk vs. Elastic search performance?

2014-04-18 Thread Mark Walkom
That's a lot of data! I don't know of any installations that big but someone else might. What sort of infrastructure are you running splunk on now, what's your current and expected retention? Regards, Mark Walkom

LDAP plugin not populating

2014-04-18 Thread Tom Wilson
I'm trying to set up search of LDAP objects using the ldap river plugin. I managed to install the plugin and set up my new river, but all searches are coming up empty. The elasticsearch stdout says: [2014-04-18 15:00:16,904][INFO ][river.ldap ] [Silver Scorpion] [ldap][hpd] now,

Re: ELK stack needs tuning

2014-04-18 Thread Mark Walkom
If you want unlimited retention you're going to have to keep adding more nodes to the cluster to deal with it. Regards, Mark Walkom

Re: Error installing ldap river plugin

2014-04-18 Thread Tom Wilson
I was able to install the plugin by building it from source locally and specifying the JAR file. -tom

Re: Query and Filter

2014-04-18 Thread Matt Weber
Did you reindex your docs after updating the mapping? Can you post your mapping and original docs?

Re: Query and Filter

2014-04-18 Thread Matt Hughes
Never mind. It was an error on my part; these changes worked. Thanks again!

Re: elasticsearch 1.1.1 initialization failed

2014-04-18 Thread Eric Jain
This issue has been resolved with cloud-aws 2.1.1: https://github.com/elasticsearch/elasticsearch-cloud-aws/issues/74 On Thursday, April 17, 2014 6:32:05 PM UTC-7, Eric Jain wrote: Just tried to upgrade elasticsearch 1.1.0 to 1.1.1 (with the cloud-aws plugin 2.1.0), and am no longer able

Problem of Term Suggester

2014-04-18 Thread le trung Trung
I have a problem with the term suggester and I don't know what is happening. Please help me understand it. I have three documents: [doc1:{content: Anh yêu ta},doc2:{content:Anh yêu ta}, doc3:Anh yêu tí] (content was indexed with vi_annalyzer). I am using the term suggester as: SuggestionBuilder

Re: Splunk vs. Elastic search performance?

2014-04-18 Thread Greg Murnane
I'm running elasticsearch much smaller than this, but with a PowerEdge R900 with 2 X7350 CPUs, and 64 GB of RAM (24GB heap for elasticsearch) I'm able to sustain something like 80GB per day (1/16 your volume). Some of the latest Intel CPUs are about 4 times as powerful as the X7350, so

Re: Splunk vs. Elastic search performance?

2014-04-18 Thread 熊贻青
We have a cluster with 10 nodes, with a 48 GB heap for each ES process. The total indexing rate is about 25,000 docs per second, with about 20 indices actively receiving new data. I'm really curious to compare and evaluate the indexing performance numbers. Thanks!