Hi list,
I have an RDD with a field included that contains an ID that I'd like to
become the parent document when I execute saveToEs (all authored in scala).
Something like this...
{
units_sold: 100,
unit_price: 8.99,
revenue: 899,
parentId: maplin\\staging(L28AF) //i.e. it has
I'm not sure whether you have one or multiple questions but it's perfectly fine
to use ES for both storage and search.
You can use HDFS as a snapshot/backup store to further improve the resilience
of your system.
Millions of documents is not an issue for ES
On 1/29/15 4:29 PM, Manoj Singh
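For reference, HDFS can serve as the snapshot/backup store mentioned above via the repository-hdfs plugin; a minimal sketch of the request body (the repository name, namenode URI, and path below are hypothetical examples):

```python
# Sketch: request body for registering an HDFS snapshot repository
# (requires the elasticsearch-repository-hdfs plugin; the URI and
# path below are hypothetical examples).
repo_body = {
    "type": "hdfs",
    "settings": {
        "uri": "hdfs://namenode:8020",     # hypothetical namenode address
        "path": "/backups/elasticsearch",  # hypothetical HDFS path
    },
}
# This body would be sent as: PUT /_snapshot/hdfs_backup
```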
Hi all,
i need to write a search query which collects documents from 3 types in my
index.
basically i would use a multi_match like
{
multi_match : {
query:SearchQuery,
fields: [ account.name, group.title, post.content ]
}
}
but the result of this query needs to be filtered
Hi all,
This is my first post as I'm relatively new to ElasticSearch, Logstash,
Kibana etc. and I'm really enjoying the challenge of learning it all and
applying it!
I'm reasonably familiar with basic aggregations now, but I'm trying to
produce a particular report from an index and I would
Hi:
The online documentation for Elasticsearch on YARN (version 2.1.0-Beta)
indicates that ... Elasticsearch on YARN is a separate, stand-alone,
self-contained CLI (command-line interface)...
Does this mean that this instance of Elasticsearch will only be accessible
via CLI ? (curl commands
Hi all,
I'm running a standalone node (by using *node.local: true* in
elasticsearch.yml) and want to connect to this node via TransportClient,
which fails.
Connecting to the node via Sense succeeds.
I didn't change the cluster.name property in elasticsearch.yml.
My Code is:
Client client =
Hi Mark,
Right now: 28 GB across two indices, 5 shards and 1 replica per index, on 3 AWS
large servers.
Frequently 1-10 million records or more get imported. During this time all
ES nodes hit a CPU usage of over 75%. We want to break the index down and
add routing at some point.
Refresh is
Shay tweeted about this matter:
https://twitter.com/kimchy/status/560124652472008704
Shay Banon (@kimchy): @m_hughes yes, it affects performance,
Well... this is hardly a satisfactory answer. Of course I expect a slowdown,
because encryption has a cost. But how much, and what data does Shield
encrypt (e.g. only the initial authentication step, or every bit of
communication)? For example, I would not be surprised if Shield does the
simplest
Hi,
Can anyone shed some light on the impact of Shield on performance, assuming
that secured communication is enabled for node to node communication?
When Elasticsearch team says that node-to-node encryption is enabled, does
it mean that every bit of data transported on port 9300 is encrypted?
Hi there
I have two different types of data. For one type, I don't want it to be
tokenized, so I wrote the config file elasticsearch.yml like this:
index.analysis.analyzer.default:
type: custom
tokenizer: keyword
filter: standard
But for the other type of data, I want it to be tokenized by
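Rather than one default analyzer for the whole index in elasticsearch.yml, the analyzer can be set per field in the mapping; a sketch with hypothetical type and field names (string type and built-in analyzer names as in ES 1.x):

```python
# Sketch: per-field analyzers in an index mapping, so one type keeps
# exact (keyword) values while another is tokenized by the standard
# analyzer. Type and field names are hypothetical examples.
mapping_body = {
    "mappings": {
        "exact_type": {
            "properties": {
                "code": {"type": "string", "analyzer": "keyword"}
            }
        },
        "text_type": {
            "properties": {
                "body": {"type": "string", "analyzer": "standard"}
            }
        },
    }
}
```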
Just an idea.
You could try running two ES instances as a cluster on one machine if there
is no other option.
On Wednesday, January 28, 2015 at 2:09:22 PM UTC+1, Oto Iashvili wrote:
Hi
I have a website for classified. For this I'm using elasticsearch,
postgres and rails on a same ubuntu
You should be using the bulk API, that's what it exists for!
On 29 January 2015 at 19:13, webish greg...@yoursports.com wrote:
Hi Mark,
Right now: 28 GB across two indices, 5 shards and 1 replica per index, on 3 AWS
large servers.
Frequently 1-10 million records or more get imported. During
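For context, a bulk request interleaves action-metadata lines and document lines as newline-delimited JSON; a minimal sketch (the index, type, and document fields are hypothetical):

```python
import json

# Sketch: building a bulk-API payload as newline-delimited JSON.
# Index, type, and document fields are hypothetical examples.
docs = [
    {"units_sold": 100, "unit_price": 8.99},
    {"units_sold": 5, "unit_price": 1.50},
]
lines = []
for doc in docs:
    # action metadata line, then the document source line
    lines.append(json.dumps({"index": {"_index": "sales", "_type": "order"}}))
    lines.append(json.dumps(doc))
bulk_body = "\n".join(lines) + "\n"  # a bulk body must end with a newline
# This body would be sent as: POST /_bulk
```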
Can anyone help me with this problem, please!
--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/not-able-to-refine-from-o-p-of-query-in-logstash-tp4069573p4069775.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.
Why not? Could you tell me how to do that, and also explain why it would be
better?
Thanks a lot for your help.
On Thursday, January 29, 2015 at 10:02:00 AM UTC+1, Arie wrote:
Just an idea.
You could try running two ES instances as a cluster on one machine if
there is no other option.
On
Don't tell me nobody here ever made such a simple request?
On Thursday, January 22, 2015 at 11:57:26 AM UTC+1, Aldian wrote:
Hi!
I am using the usual ELK stack with the default template (
http://pastebin.com/DtYiazVr
On Thursday, January 22, 2015 at 11:57 CET,
Aldian aldian...@gmail.com wrote:
I am using the usual ELK stack with the default template
(http://pastebin.com/DtYiazVr). In every log message, the date is
stored in a field named log_date, which the date filter converts into
@timestamp. I
On Thursday, January 29, 2015 at 06:51 CET, ma...@venusgeo.com wrote:
Can anyone please look into this.
This is a volunteer-based mailing list. If you want a 24-hour SLA, there are
paid options for that.
On Wednesday, January 28, 2015 at 5:43:23 AM UTC-8, ma...@venusgeo.com
wrote:
I don't
What about not setting node.local: true?
--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On 29 Jan 2015, at 14:18, Abid Hussain huss...@novacom.mygbiz.com wrote:
After doing some research, it seems to me that it is not possible to connect
to a node configured as
Thanks for the help. As you can see in the original question above, I already
tried setting node.local: true.
This works on server side, but I'm not able to connect to the node via
TransportClient using the Java API.
My requirements are:
* run elasticsearch as single node
* Use Java API to perform
Hi there,
I need to search in multiple fields where I do not know the field names in
advance, so I can't use the multi_match syntax. I found that the _all field
aggregates all fields set to be included in _all in the mapping. Unfortunately
it returns a different result set than multi_match. Here is the complete
Hi,
I was trying the settings to block data writes and also metadata writes to
an index by applying this:
curl -XPUT 'http://testserver:9200/test_index/_settings' -d '{
"index.blocks.read_only" : true
}'
which works fine but now I would like to remove this index and I'm facing
this issue;
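For reference, a read-only block also blocks deletion, so it has to be cleared first; a sketch of the settings body (same endpoint as in the post above, then a DELETE on the index):

```python
import json

# Sketch: settings body that clears the read-only block. It would be
# sent as PUT /test_index/_settings, after which DELETE /test_index works.
unblock_body = json.dumps({"index.blocks.read_only": False})
```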
I am having an issue with queries in Kibana. It seems it is not searching
all the fields. I have to specify id:1 or something similar to actually get
any results. I am trying to figure out what configuration would cause this
to happen? Would it have anything to do with the Elasticsearch
Thank you for the good news! I'm a little swamped currently, but I will
definitely give it a try when I get a minute.
Just to make sure - disable Output cache for the website - where is it in
IIS Management Console?
On Wednesday, January 28, 2015 at 4:38:01 PM UTC-8, Cijo Thomas wrote:
Its
I have been fighting with this for quite some time and finally found the
workaround. Let me know if it helps you!
On Thu, Jan 29, 2015 at 10:12 AM, Konstantin Erman kon...@gmail.com wrote:
Thank you for the good news! I'm a little swamped currently, but I will
definitely give it a try when I get
Hi,
I have one question related to performance of ES with Hadoop.
Our architecture:
1) Use Hadoop to store big data, as we have millions of records.
2) Feed ES from Hadoop via the API.
3) Search works through ES.
Will this architecture have performance issues?
Or should we simply use ES for
Hi David,
We are aware of the scroll API, and are not using it as it will not scale.
That is the very reason I was stressing the fact that there is no
update/delete/create; with multiple queries all bets are off if any of
these things happen.
However, with a steady state (no change in data) I would
Thanks to both David and Jürgen.
I used David's solution, which works well for now, and will keep Jürgen's
proposal in mind for a production installation.
Best regards,
Abid
On Thursday, 29 January 2015 at 09:05:15 UTC+1, Abid Hussain wrote:
Hi all,
I'm running a standalone node (by using
So disable multicast and you are done.
See elasticsearch.yml file comments.
--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On 29 Jan 2015, at 14:56, Abid Hussain huss...@novacom.mygbiz.com wrote:
Sorry, I overlooked the "not" in your post ;-)
Removing *node.local: true* works in the sense that I am then able to connect
to the node via TransportClient.
The reason for using *node.local: true* is that *I want to run several
independent nodes in my network that do not communicate with each other.*
...?
Hello Abid,
you may bind the Elasticsearch network/transport interface to
127.0.0.1, prohibiting any connections from outside the local machine.
This will effectively give you a fully-functional local node with
transport connections enabled locally - not over the network from other
machines.
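A sketch of the corresponding elasticsearch.yml setting for the loopback-only approach described above (setting name as in ES 1.x):

```yaml
# elasticsearch.yml sketch: bind only to loopback so remote machines
# cannot connect, while a TransportClient on the same host still can.
network.host: 127.0.0.1
```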
I'm curious about reaching deeper into the lucene internals with es-hadoop,
in a similar way that the aggregations module works. While aggregations
are amazing, there are cases where they aren't an ideal solution, mainly
due to the inability to shuffle/repartition the data as it moves through
Which es-hadoop/Spark version are you using? Can you post a snippet/gist of how you are calling saveToEs and what the
es-spark configuration looks like (does the RDD contain JSON or rich objects, etc.)?
There are multiple ways to specify the parentId and in master (dev build) this
should work no problem.
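One of the ways es-hadoop can pick up the parent is from a document field via its configuration; a hedged sketch shown as a plain settings map (the setting name es.mapping.parent is from the es-hadoop configuration docs; the index/type resource is a hypothetical example):

```python
# Sketch: es-hadoop/es-spark configuration telling saveToEs to take the
# parent id from the RDD's parentId field. The index/type resource is
# a hypothetical example; check the es-hadoop docs for your version.
es_conf = {
    "es.resource": "sales/order",      # hypothetical index/type
    "es.mapping.parent": "parentId",   # field in each document holding the parent id
}
# In Scala this map would be passed along with the saveToEs call.
```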
What's the best way to change the TTL of all documents already written to
an index? Can I just update the TTL, or do I have to re-index everything?
I was thinking that if I have to update the TTL often maybe I just write a
manual garbage collector and do my own cleanup.
Hi Jorge
The `doc` should be passed in the `body` parameter:
$e->update(
    index => 'myindex',
    type  => 'mytype',
    id    => 'mykey',
    body  => {
        doc => {
            link       => 'http://www.nw-kicoso.com',
            sortierung => 5,
        },
    },
);
On 8 December 2014 at 10:29, Jorge von Rudno
I can't RTFM on this because I can't find the documentation.
It looks like some of our queries are taking about 1 second per shard.
However, the drives still have low utilization, around 10% ... so I'm
trying to figure out how to improve performance. My hunch is that I
Unfortunately I could not replicate your success :-(
Let me show you what I did, in case you notice any difference from
your case.
https://lh6.googleusercontent.com/-HzQRKhGl9ag/VMqfkWnSF8I/Ah0/SsXrJlQ2vW8/s1600/Output_Caching.png
I'm using ES-Hadoop 1.2.0.Beta3 Spark variant with Scala 2.10.4 and Spark
1.1.0 Hadoop 2.4 (but without an actual Hadoop installation - I'm running
on Windows).
I'm working with a Map-based RDD rather than json.
https://gist.github.com/andrassy/273179ed7cb01a38973d is a short example
that
Each shard is queried in parallel.
But if you don't have enough threads to query multiple shards at once, then
it's not the strict definition of parallel as it has to context switch.
On 30 January 2015 at 11:05, Kevin Burton burtona...@gmail.com wrote:
Ha. I appreciate the feedback but this
Can you show your URL rewrite rules ? Also are you using Kibana 4 beta 3 ?
On Thu, Jan 29, 2015 at 1:09 PM, Konstantin Erman kon...@gmail.com wrote:
Unfortunately I could not replicate your success :-(
Let me show you what I did, in case you notice any difference from
your case.
Ha. I appreciate the feedback, but this doesn't answer my question:
does it query them sequentially or in parallel?
Using parallel dispatch can dramatically improve performance, so I'm trying
to track down how this works.
Also, I'm aware that the documentation is there, but I couldn't find
I assume you mean hardware threads? What I want to avoid is a
configuration setting. I want all the shards to execute in parallel. Not
totally concerned about the physical hardware mapping as in practice this
will be a few hundred nanoseconds :-P
On Thursday, January 29, 2015 at 4:09:15 PM
Yes, Kibana 4 beta 3. And I have just one URL rewrite rule (pictured).
Were you getting the same error when it was not working for you?
https://lh3.googleusercontent.com/-oDiu_ncjJlA/VMrEJL-Qj_I/Aic/so2IvrgTQbY/s1600/RewriteRule.png
On Thursday, January 29, 2015 at 3:31:56 PM UTC-8,
I suggest trying master (the dev build - see the docs for more
information[1]). You should not have to use the JSON format. By the way,
one addition in master is that you can use case classes instead of Maps and
es-spark will know how to serialize them.
That plus having the metadata separated from
Hi,
I am ingesting 6 million docs into Elasticsearch. After 2.8 million docs were
ingested, the head plugin shows "unknown" for the size and number of docs for
the index.
Any ideas? Is there any way I can still use this index?
Then each is queried in parallel.
On 30 January 2015 at 11:18, Kevin Burton burtona...@gmail.com wrote:
I assume you mean hardware threads? What I want to avoid is a
configuration setting. I want all the shards to execute in parallel. Not
totally concerned about the physical hardware mapping
Got it going as a service... ugh... the user I was using didn't have rights
to run a service. Had to set it up in services.msc instead of the service
manager.
On Thursday, January 29, 2015 at 10:09:46 PM UTC-5, GWired wrote:
I've been messing with things on Host2 and it will no longer Start as a
Hello ,
You need to use a bool query
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
or a filtered query
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html
for this purpose.
In bool, you can mix and
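A sketch of the filtered form for the multi_match question above, expressed as a request body (the filter field and value are hypothetical; the "filtered" query is the ES 1.x form linked above):

```python
# Sketch: combine the multi_match from the question with a filter using
# the ES 1.x "filtered" query. The status term filter is a hypothetical
# example of "the result of this query needs to be filtered".
search_body = {
    "query": {
        "filtered": {
            "query": {
                "multi_match": {
                    "query": "SearchQuery",
                    "fields": ["account.name", "group.title", "post.content"],
                }
            },
            "filter": {"term": {"status": "active"}},
        }
    }
}
```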
Hi All,
I need to implement best bets using Elasticsearch, wherein a few results
will be ranked and displayed at the top depending on the keyword searched by
the user.
Please let me know if such an implementation is possible with Elasticsearch.
If yes, any link/white paper/information on this would
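One common way to sketch "best bets" is to boost documents whose curated keywords match the user's query; a hedged example, not an official recipe (the best_bets field, content field, and boost value are all hypothetical):

```python
# Sketch: boost curated "best bet" documents above organic matches.
# The best_bets/content field names and the boost value are hypothetical.
user_query = "shoes"
search_body = {
    "query": {
        "bool": {
            "must": [{"match": {"content": user_query}}],
            "should": [
                # documents tagged with this keyword get a large score boost
                {"term": {"best_bets": {"value": user_query, "boost": 10}}}
            ],
        }
    }
}
```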
Thanks, that did the trick :)
Radim
On Monday, 26 January 2015 at 10:02:24 UTC+1, David Pilato wrote:
Can you use a filter agg?
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filter-aggregation.html
David
On 26 Jan 2015, at 09:46, Radim
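For context, a filter aggregation scopes its sub-aggregations to matching documents; a minimal sketch of the request body with hypothetical field names:

```python
# Sketch: a filter aggregation wrapping a sum sub-aggregation, so the
# sum only covers matching documents. Field names are hypothetical.
agg_body = {
    "aggs": {
        "active_only": {
            "filter": {"term": {"status": "active"}},
            "aggs": {
                "revenue_sum": {"sum": {"field": "revenue"}}
            },
        }
    }
}
```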
Hi Mark,
Thanks for the reply. I will definitely try adding timestamp in the
mapping, as discussed
here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-timestamp-field.html.
It seems that logstash will also generate a default @timestamp, if there is
no
Just set discovery.zen.ping.unicast.hosts: ["host1.mydomain.com",
"host2.mydomain.com"] on both hosts; unless you are changing the port, it
will use the default.
Also, cluster.name needs to be exactly the same on both hosts.
On 30 January 2015 at 14:35, GWired garrettcjohn...@gmail.com wrote:
Got
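Putting the advice above together, a sketch of the relevant elasticsearch.yml lines on both hosts (hostnames from the thread; the cluster name is a hypothetical example):

```yaml
# elasticsearch.yml sketch for both hosts:
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["host1.mydomain.com", "host2.mydomain.com"]
cluster.name: my-cluster   # must be exactly the same on both hosts
```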