Hello!
Please look at the attachment plugin for Elasticsearch:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-attachment-type.html
It uses Apache Tika under the hood. The list of supported formats is
available here: http://tika.apache.org/0.10/formats.html
--
Hi,
If I am not wrong you are talking about
https://github.com/elasticsearch/elasticsearch-mapper-attachments
So with this I can index attachments (say, a PDF file) and they will be stored
base64-encoded. So is this plugin
Hello!
You'll need to send the file contents to Elasticsearch in base64 form
and Elasticsearch will use Tika to extract data from the file.
However, in the typical case you would store not the whole data
of the binary file (as it can be quite big), but rather a path to the
file, so that the
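A rough sketch of the base64 indexing part (the index, type, and field names here are made up, and the base64 payload is a placeholder for your actual file bytes):

```json
PUT /docs
{
  "mappings": {
    "doc": {
      "properties": {
        "file": { "type": "attachment" }
      }
    }
  }
}

PUT /docs/doc/1
{
  "file": "<base64-encoded file bytes go here>"
}
```

You base64-encode the raw file on the way in, and Elasticsearch hands it to Tika for text extraction at index time.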
So can I say that the mapper-attachment plugin is made to work like below:
whether I send a text file, a PDF file, or an image file to ES, the plugin
will extract the *text content* in all three scenarios and store it in
ES, where it will then be available for search as well?
--
I would like to influence the ranking with a few fields that are not stored
in the index (e.g. click data for keyword-documents). I have used a custom
SearchComponent in Solr to implement similar functionality in the past. I
am wondering how I can achieve the same in Elasticsearch.
I know this
No one?
:( I keep trying but there's always a tool that does not work :/
--
You received this message because you are subscribed to the Google Groups
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email
to
Hello!
The attachment plugin will use Tika to extract the text from the binary
file content that you send in base64. Tika does a good job with
text extraction; however, you have to test yourself whether your files
are parsed well enough for your use case.
--
Regards,
Rafał Kuć
Performance
Hi Jörg,
Thank you for pointing me to this article. I needed to read it twice, but I
think I understand it now.
I believe shard overallocation works for use cases where you want to store
and search 'users' or 'products'. Such data allows you to divide all
documents into groups to be stored in
Heya,
We are pleased to announce the release of the Elasticsearch AWS cloud plugin,
version 2.1.1.
The Amazon Web Services (AWS) Cloud plugin allows you to use the AWS API for
the unicast discovery mechanism and adds S3 repositories.
https://github.com/elasticsearch/elasticsearch-cloud-aws/
Release
You're setting the size parameter to 0 in your queries, so it won't return
anything. Also, you need to have a copy of the URL value in your index
that is not analyzed, which you can use for your wildcard query. In your
mapping you need to specify that you want to index the URL value verbatim:
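Something along these lines, assuming a string field named url (the index, type, and field names are illustrative); the raw sub-field keeps the verbatim value for wildcard queries:

```json
PUT /myindex/_mapping/mytype
{
  "properties": {
    "url": {
      "type": "string",
      "fields": {
        "raw": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}
```

You would then run your wildcard query against url.raw rather than url.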
Hiya
It's a bit more verbose, but yes you can do queries like that easily. I've
assumed that all of your fields are exact value not_analyzed string
fields, rather than full text fields:
GET /_search
{
_source: [ col1, col2 ],
query: {
filtered: {
filter: {
bool: {
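For completeness, a filled-in version of that shape against the SQL example from this thread (select col1 from mysource where col2 = ? and col3 in (one, two) and col4 = foo); the col2 value is a placeholder:

```json
GET /_search
{
  "_source": [ "col1" ],
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "col2": "somevalue" } },
            { "terms": { "col3": [ "one", "two" ] } },
            { "term": { "col4": "foo" } }
          ]
        }
      }
    }
  }
}
```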
I'm planning on trying out multiple nodes on one host and I'd like to be able
to control the node id but as far as I can see this is set in NodeEnvironment
to the first unused value. The reason for setting the id is so that I would
like to include it in the node name which I currently set to
Hi,
I'm new to ElasticSearch.
What I want to do is to upload a few hundred documents and then look for
words in those documents.
The most important part is to get the count of each word per document.
e.g. If I look for the word boy, the answer I'll get is that it appears 3
times in
Yes, take a look here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-termvectors.html
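For example (index, type, and field names here are made up), a per-document term vector request looks like:

```json
GET /myindex/mytype/1/_termvector
{
  "fields" : [ "content" ],
  "term_statistics" : true,
  "field_statistics" : true
}
```

The response includes the term frequency of each term in that document's field.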
--
Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer Consultant
Author of RavenDB in Action http://manning.com/synhershko/
Thanks Itamar.
But with the Term Vector I'll have to make a separate call for each
document (I can have up to 20K documents).
I want to be able to make a single call with the word I'm looking for and
to get the statistics for each document.
On Friday, April 18, 2014 2:52:53 PM UTC+3, Aharon
You should be able to do this using the aggregations framework:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations.html
The idea is that you bucket on document ID, and then on terms, then do a
count
But I'm not sure it was designed to handle this scenario,
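A sketch of that idea (index and field names are made up, and the same caveat applies: a terms bucket counts matching documents, not occurrences within a document, so this may not give true per-document frequencies):

```json
GET /docs/_search
{
  "size": 0,
  "query": { "match": { "content": "boy" } },
  "aggs": {
    "per_doc": {
      "terms": { "field": "_uid" },
      "aggs": {
        "words": {
          "terms": { "field": "content", "include": "boy" }
        }
      }
    }
  }
}
```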
Hi,
Is anybody using Oracle Java 1.7.0_55 with Elasticsearch (v0.90.5)? Is it
safe and recommended?
I found Robert and Uwe discussed this Java version here:
http://lucene.472066.n3.nabble.com/Update-lucene-apache-org-java-recommendations-with-java7u55-td4131353.html
I found a couple of failed
OK, new try. Is it generally possible to do this with the PHP API? I can't
find anything in the docs; maybe I'm just not seeing it. Regards, Stefan
--
Will these two links help?
https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/SYSTEM_REQUIREMENTS.txt
http://people.apache.org/~mikemccand/lucenebench/indexing.html
The Lucene performance test is using Java 1.7.0 u40; that's the same version
I'm using for Lucene 4.6.0.
jason
On Fri, Apr 18,
Did you check permissions on /opt/logstash, /var/log/logstash and
/etc/logstash … same user as in the init script?
That solved it for me on Debian, but I can't get events when the Apache log
is updated. However, if I run it as root (console way) everything works …
Oh, and I have added the logstash user to adm
Yes, you can use the Function Score Query [1] in combination with a native
script written in Java [2]. With a native script you can basically do
whatever you want, but be aware that you can significantly impact your query
performance if you are not careful.
[1]
I would like to get a phrase count for every document.
I do not wish to run a query for every document; I would rather run one
single query.
For example if i have the following documents:
{
name : John,
Message : The lion is *very *fast
}
{
name : Ben,
Message : The
I would like to get a phrase count for each document separately.
I do not wish to run a query for every document; I would rather run one
single query.
For example if i have the following documents:
{
name : John,
message : The lion is *very **fast*
}
{
name : Ben,
1.7u55 should be safe for ElasticSearch; we just put out a blog post about
this:
http://www.elasticsearch.org/blog/java-1-7u55-safe-use-elasticsearch-lucene/
And I'll fix the nightly Lucene benchmarks to use u55 too! I should NOT
have been using u40: it's not safe.
Mike
That's great, thanks for your reply. This looks like a good solution for my
requirement! Is this script applied in each shard? I want to apply this
function to all the documents so that the Top N picked from each shard is
picked by my custom score.
Also, can you elaborate a little bit on be
Excellent, thanks Michael.
Dne 18.4.2014 18:18 Michael McCandless m...@mikemccandless.com
napsal(a):
1.7u55 should be safe for ElasticSearch; we just put out a blog post about
this:
http://www.elasticsearch.org/blog/java-1-7u55-safe-use-elasticsearch-lucene/
And I'll fix the nightly Lucene
I'm completely new to elasticsearch and am trying to put together a
proof-of-concept using LDAP as a data store.
However, I came across a problem right out of the starting gate, attempting
to install the ldap river plugin, according to the instructions here:
Oh, awesome, thank you so much for the help, I'll give that a try!
On Thursday, April 17, 2014 2:51:23 PM UTC-7, Itamar Syn-Hershko wrote:
For recent X just sort on the _timestamp field and specify X as the page
size
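For example (the index name and page size are made up, and this assumes _timestamp is enabled in your mapping):

```json
GET /myindex/_search
{
  "size": 100,
  "sort": [ { "_timestamp": { "order": "desc" } } ]
}
```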
I'm still doing performance work and I keep seeing the CacheCleaner pop up
[1]. I don't know how much of an effect it's actually having, but I imagine
it's something.
It looks like entries in the cache get queued for deletion both by cache
clear commands and by readers closing. Would it make
Well, the script runs against all matching documents of the query, so you
can do a match_all query [1] to have the logic applied to all your
documents. This is going to be expensive though, so try to filter out as
many documents as possible before applying the custom scoring. Maybe even
perform
I see that ES switched back to ConcurrentMergeScheduler in 1.1.1 due to it
affecting indexing performance in 1.1.0.
https://github.com/elasticsearch/elasticsearch/issues/5817
We're on 1.1.0 and cannot upgrade to 1.1.1 for the time being. Is there a
way to switch it back using the API? I tried the
Yes, function score query works with native scripts. We use it with them.
I'm not sure whether native scripts are automatically cached.
On Saturday, April 12, 2014 1:49:32 PM UTC-4, Eric T wrote:
Hi,
The function score documentation doesn't mention any support for native
scripts, does it
You can use a function score query with a native script in this manner.
{
function_score : {
query : {
match_all : { }
},
functions : [ {
filter : {
terms : {
myfield : [ 103, 104, 134, 180 ],
_cache : true
}
},
Trying to compose a query and filter combination to no avail:
{
from:0,
size:200,
query:{
filtered:{
query:{
query_string:{
fields:[
_all
],
query: "Test message"
}
},
I'm also curious to know if there is a way to do the opposite of
FilteredQuery... basically a QueriedFilter: filter first and then run a query
on the filtered results.
--
Chances are your appId and processId fields are analyzed, so Elasticsearch is
breaking up the IDs. Update your mapping of these fields so they are not
analyzed [1]. Also, you should not use an and filter to combine term
filters. Use a bool filter [2] with must clauses for better performance. Read
why at
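A sketch of the suggested filter (the index name and field values are placeholders; the field names come from the question above):

```json
GET /myindex/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "appId": "someAppId" } },
            { "term": { "processId": "someProcessId" } }
          ]
        }
      }
    }
  }
}
```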
Elastisch [1] is a small, feature complete Clojure client for ElasticSearch.
Release notes:
http://blog.clojurewerkz.org/blog/2014/04/11/elastisch-2-dot-0-0-beta4-is-released/
1. http://clojureelasticsearch.info
--
MK
http://github.com/michaelklishin
http://twitter.com/michaelklishin
As I understand it, there is currently no feature that does async replication
between 2 clusters or even within the same cluster, but we have a need to
write one. What would be the best way to do it in elasticsearch? I was
thinking of leveraging Scroll for this.
--
Hi,
Thanks for everyone's patience while I learn the elasticsearch query DSL.
I'm trying to get used to its verbosity.
How would I do a query like this, again in SQL parlance: select col1 from
mysource where col2 = ?
--
We have a large Splunk instance. We load about 1.25 TB of logs a day. We
have about 1,300 loaders (servers that collect and load logs - they may do
other things too).
As I look at Elasticsearch / Logstash / Kibana does anyone know of a
performance comparison guide? Should I expect to run on
Thanks for the quick reply!
I updated the mappings and confirmed both types read not_analyzed. I also
updated the query to use bool/must:
{
from:0,
size:200,
query:{
filtered:{
query:{
query_string:{
fields:[
_all
Hi,
Thanks for everyone's patience while I learn the elasticsearch query DSL.
I'm trying to get used to its verbosity.
How would I do a query like this, again in SQL parlance: select col1 from
mysource where col2 = ? and col3 in [one, two] and col4 = foo
--
That's a lot of data! I don't know of any installations that big but
someone else might.
What sort of infrastructure are you running splunk on now, what's your
current and expected retention?
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web:
I'm trying to set up search of LDAP objects using the ldap river plugin. I
managed to install the plugin and set up my new river, but all searches are
coming up empty. The elasticsearch stdout says:
[2014-04-18 15:00:16,904][INFO ][river.ldap ] [Silver
Scorpion] [ldap][hpd] now,
If you want unlimited retention you're going to have to keep adding more
nodes to the cluster to deal with it.
Regards,
Mark Walkom
Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com
On 17 April 2014 22:48, R. Toma renzo.t...@gmail.com wrote:
I was able to install the plugin by building it from source locally and
specifying the JAR file.
-tom
On Friday, April 18, 2014 10:50:54 AM UTC-7, Tom Wilson wrote:
I'm completely new to elasticsearch and am trying to put together a
proof-of-concept using LDAP as a data store.
However, I
Did you reindex your docs after updating the mapping? Can you post your
mapping and original docs?
On Friday, April 18, 2014, Matt Hughes hughes.m...@gmail.com wrote:
Thanks for the quick reply!
I updated the mappings and confirmed both types read not_analyzed. I
also updated the query to
Nevermind. It was an error on my part; these changes worked. Thanks again!
On Friday, April 18, 2014 5:51:31 PM UTC-4, Matt Hughes wrote:
Thanks for the quick reply!
I updated the mappings and confirmed both types read not_analyzed. I
also updated the query to use bool/must:
{
This issue has been resolved with cloud-aws 2.1.1:
https://github.com/elasticsearch/elasticsearch-cloud-aws/issues/74
On Thursday, April 17, 2014 6:32:05 PM UTC-7, Eric Jain wrote:
Just tried to upgrade elasticsearch 1.1.0 to 1.1.1 (with the cloud-aws
plugin 2.1.0), and am no longer able
I have a problem with the term suggester. I don't know what is happening;
friends, please help me understand it.
I have 3 documents: [doc1:{content: Anh yêu ta}, doc2:{content: Anh
yêu ta}, doc3:{content: Anh yêu tí}] (content was indexed with vi_analyzer).
I am using the term suggester as: SuggestionBuilder
I'm running elasticsearch much smaller than this, but with a PowerEdge R900
with 2 X7350 CPUs, and 64 GB of RAM (24GB heap for elasticsearch) I'm able
to sustain something like 80GB per day (1/16 your volume). Some of the
latest Intel CPUs are about 4 times as powerful as the X7350, so
We have a cluster with 10 nodes, 48g heap for each ES process. The total
indexing rate is about 25,000 docs per second, with about 20 indices actively
receiving new data. I'm really curious to compare and evaluate these
indexing performance numbers.
Thanks!
--
You received this message because you are