Thanks All for your suggestions!
Rgds,
Mark.
On Thu, Feb 11, 2016 at 9:45 AM, Upayavira wrote:
Your biggest issue here is likely to be HTTP connections. Making an HTTP
connection to Solr is far more expensive than the task of adding a single
document to the index. If you are expecting to add 24 billion docs per
day, I'd suggest merging those documents into batches
before sending
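A minimal sketch of the batching idea suggested above, in Python (illustrative only; the batch size and the chunking helper are assumptions, not part of any Solr API). Each emitted batch would then be sent to Solr in one update request instead of one request per document:

```python
from itertools import islice

def batched(docs, batch_size=1000):
    """Yield lists of up to batch_size docs from any iterable stream."""
    it = iter(docs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Example: 2500 incoming docs become three update requests instead of 2500.
sizes = [len(b) for b in batched(range(2500), batch_size=1000)]
print(sizes)  # [1000, 1000, 500]
```

The right batch size is workload-dependent; the point is simply to amortize connection and request overhead across many documents.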
Hi Mark,
Nothing comes for free :) With a doc per action, you will have to handle a
large number of docs. There is a hard limit on the number of docs per shard:
roughly 2 billion (Lucene's per-index ceiling is Integer.MAX_VALUE, a signed
int), so sharding is mandatory. It is most
likely that you will have to have more than one collection. Depending on
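To make the "sharding is mandatory" point concrete, a back-of-envelope check against the 24-billion-docs-per-day figure quoted earlier in the thread (the constant below is Lucene's documented per-index ceiling; everything else is just arithmetic):

```python
import math

# Lucene's hard ceiling on docs per index (i.e., per shard) is Integer.MAX_VALUE.
LUCENE_MAX_DOCS_PER_SHARD = 2_147_483_647

docs_per_day = 24_000_000_000  # figure quoted earlier in the thread
min_shards_per_day = math.ceil(docs_per_day / LUCENE_MAX_DOCS_PER_SHARD)
print(min_shards_per_day)  # 12
```

Twelve shards per day is only the theoretical floor; in practice shards are kept far smaller than the hard limit for query latency and recovery reasons.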
Thanks everyone for your suggestions.
Based on them, I am planning to have one doc per event, with the sessionId
common across a session's events.
So in that case, would indexing each doc as and when it comes be okay? Or do
we still need to batch before indexing to Solr?
Also, with 4M sessions a day and about 6000 docs (events
On Wed, Feb 10, 2016 at 3:38 AM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:
Hi Mark,
Appending session actions just to be able to return more than one
session without retrieving a large number of results is not a good tradeoff.
Like Upayavira suggested, you should consider storing one action per doc
and aggregating at read time, or pushing to Solr once the session ends and
aggregating
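The read-time aggregation suggested above can be sketched as follows (a hypothetical illustration only; the field names `sessionId` and `action` are assumptions standing in for whatever the real schema uses). With one action per doc, a session view is rebuilt by grouping the returned event docs on the shared sessionId:

```python
from collections import defaultdict

def aggregate_by_session(event_docs):
    """Group one-action-per-doc results back into per-session views."""
    sessions = defaultdict(list)
    for doc in event_docs:
        sessions[doc["sessionId"]].append(doc["action"])
    return dict(sessions)

# Example result set, as it might come back from a query on sessionId.
events = [
    {"sessionId": "s1", "action": "login"},
    {"sessionId": "s1", "action": "click"},
    {"sessionId": "s2", "action": "login"},
]
print(aggregate_by_session(events))
# {'s1': ['login', 'click'], 's2': ['login']}
```

Solr itself can do this server-side (e.g. result grouping or JSON facets on the sessionId field), but the client-side sketch shows how little work "aggregate on read" actually is.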
Thanks for your replies and suggestions!
Why do I store all events related to a session under one doc?
Each session can have about 500 total entries (events) corresponding to it.
So when I try to retrieve a session's info, it can come back with around 500
records. If it is this compounded one doc per session
So as I understand your use case, it's effectively logging actions within a
user session, so why do you have to do the update in NRT? Why not just log
all the user session events (with some unique key, and ensuring the session
Id is in the document somewhere), then when you want to do the query, you
j
Bear in mind that Lucene is optimised for high-read, low-write workloads.
That is, it puts in a lot of effort at write time to make reading
efficient. It sounds like you are going to be doing far more writing
than reading, and I wonder whether you are necessarily choosing the
right tool for the job.
Ho
Hi,
Thanks for all your suggestions. I took some time to make the details
more accurate. Please find what I have gathered:
My data being indexed is something like this.
I am basically capturing all data related to a user session.
Inside a session I have categorized my actions like actionA, a
Oops... at 100 qps for a single node you would need 120 nodes to get to 12K
qps and 800 nodes to get 80K qps, but that is just an extremely rough
ballpark estimate, not some precise and firm number. And that's if all the
queries can be evenly distributed throughout the cluster and don't require
fan-out
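The "extremely rough ballpark" above is just division, but it is worth making explicit since the thread turns on it (the 100 qps-per-node figure is the estimate given in the message, not a measured number):

```python
# Rough capacity arithmetic: nodes needed at an assumed ~100 qps per node.
qps_per_node = 100
nodes_off_peak = 12_000 // qps_per_node   # off-peak load from the thread
nodes_peak = 80_000 // qps_per_node       # peak load from the thread
print(nodes_off_peak, nodes_peak)  # 120 800
```

As the message notes, this assumes perfectly even distribution and no cross-shard fan-out; real clusters need headroom beyond this floor.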
So is there any aging or TTL (in database terminology) of older docs?
And do all of your queries need to query all of the older documents all of
the time or is there a clear hierarchy of querying for aged documents, like
past 24-hours vs. past week vs. past year vs. older than a year? Sure, you
ca
Short form: You really have to prototype. Here's the long form:
https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
I've seen between 20M and 200M docs fit on a single piece of hardware,
so you'll absolutely have to shard.
And the other th
Hi Mark,
Can you give us a bit more detail: size of docs, query types, are docs
grouped somehow, are they time-sensitive, will they be updated or rebuilt
every time, etc.?
Thanks,
Emir
On 08.02.2016 16:56, Mark Robinson wrote:
Hi,
We have a requirement where we would need to index around 2 B
Also, consider whether the indexing of 2 billion docs will be NRT or
offline (during off-hours etc.). For more accurate sizing you may also
want to index, say, 10 million documents, which should give you an idea of
your index size, and then use that for extrapolation to come up with memory
requirements
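The extrapolation suggested above can be sketched as simple linear scaling (the sample index size below is an invented placeholder, not a measurement; real growth is not perfectly linear because of merge overhead and term-dictionary sharing):

```python
# Hypothetical extrapolation: measure a 10M-doc sample index, scale linearly.
sample_docs = 10_000_000
sample_index_gb = 8            # assumed measured on-disk size of the sample
total_docs = 2_000_000_000     # 2 billion docs per day, from the thread

estimated_gb = sample_index_gb * (total_docs / sample_docs)
print(estimated_gb)  # 1600.0
```

Treat the result as a first cut for capacity planning, then validate against a larger sample before committing to hardware.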
Hi,
We have a requirement where we would need to index around 2 Billion docs in
a day.
The queries against this indexed data set can be around 80K queries per
second during peak time and during non peak hours around 12K queries per
second.
Can Solr handle such huge volumes?
If so, assuming we ha
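A quick rate check on the requirement stated above (numbers from the message; assumes a uniform arrival rate, which peak traffic will exceed):

```python
# 2 billion docs per day implies a sustained ingest rate of ~23K docs/sec.
docs_per_day = 2_000_000_000
docs_per_sec = docs_per_day / 86_400  # seconds in a day
print(round(docs_per_sec))  # 23148
```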
I need some recommendations for a new SOLR project.
We currently have a large (200M docs) production system using Lucene.Net and
what I would call our own .NET implementation of SOLR (built early on when SOLR
was less mature and did not run as well on Windows).
Our current architecture works
Very cool! "The Life Cycle of the IndexSearcher" would also be a great
diagram. The whole dance that happens during a commit is hard to
explain. Also, it would help show why garbage collection can act up
around commits.
Lance
On Sun, Apr 10, 2011 at 2:05 AM, Jan Høydahl wrote:
> Looks really good, but two bits that i think might confuse people are
> the implications that a "Query Parser" then invokes a series of search
> components; and that "analysis" (and the pieces of an analyzer chain)
> are what do lookups in the underlying Lucene index.
>
> the first might just
: of the components as well as the flow of data and queries. The result is
: a conceptual architecture diagram, clearly showing how Solr relates to
: the app-server, how cores relate to a Solr instance, how documents enter
: through an UpdateRequestHandler, through an UpdateChain and Analysis a
Hi,
Thank you for this contribution. Such a diagram could be useful in the
official documentation.
David
On Thu, Apr 7, 2011 at 12:15 PM, Jeffrey Chang wrote:
This is awesome; thank you!
On Thu, Apr 7, 2011 at 6:09 PM, Jan Høydahl wrote:
Hi,
Glad you liked it. You'd like to model the inner architecture of SolrJ as well,
do you? Perhaps that should be a separate diagram.
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
On 6. apr. 2011, at 12.06, Stevo Slavić wrote:
Nice, thank you!
Wish there was something similar, or an extension to this one, depicting
where SolrJ's CommonsHttpSolrServer and EmbeddedSolrServer fit in.
Regards,
Stevo.
On Wed, Apr 6, 2011 at 11:44 AM, Jan Høydahl wrote:
Hi,
At Cominvent we've often had the need to visualize the internal architecture of
Apache Solr in order to explain both the relationships of the components as
well as the flow of data and queries. The result is a conceptual architecture
diagram, clearly showing how Solr relates to the app-server
: B- A backup of the current index would be created
: C- Re-Indexing will happen on Master-core2
: D- When Indexing is done, we'll trigger a swap between Master-core1 and
: core2
...
: But how can I do B, C, and D? I'll do it manually. Wait! I'm not sure my
: boss will pay for that.
: 1/Can I
Does that sound good to you? Or is there a
better and more elegant way to do the trick when indexing and replication
should be beating at a high pace?
Thank you.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-Architecture-discussion-tp825708p860942.html
Sent from the Solr - User mailing list archive at Nabble.com.
optimization only when the replication activity is not so crucial,
in order to avoid degrading search performance.
Thank you very much. That helps a lot.
: 4- trigger swap between core 1 and core2
: 5- At this point Slave index has been renewed ... we can revert back to the
: previous index if there was any issues with the new one.
these steps are largely unnecessary -- within a single SolrCore, Solr
already keeps track of the "current" searcher
Do you have any insights that could help me and other people that might be
interested in that discussion?
Thanks.
for sharing.
Hi all
New to Solr/Lucene. Our current search is done with Verity and we are
looking to move towards open-source products.
Our first application would have less than 500,000 documents indexed at the
outset. Additions/updates to the index would occur at 2,000-3,000 per
minute. We are currently upd