Re: Fast autocomplete for large dataset

2015-08-01 Thread Olivier Austina
Thank you Erick for your replies and the link. Regards Olivier 2015-08-02 3:47 GMT+02:00 Erick Erickson: > Here's some background: > > http://lucidworks.com/blog/solr-suggester/ > > Basically, the limitation is that to build the suggester all docs in > the index need to be read to pull out the

Re: solr multicore vs sharding vs 1 big collection

2015-08-01 Thread Shawn Heisey
On 8/1/2015 6:49 PM, Jay Potharaju wrote: > I currently have a single collection with 40 million documents and index > size of 25 GB. The collection gets updated every n minutes and as a result > the number of deleted documents is constantly growing. The data in the > collection is an amalgamation

Re: Fast autocomplete for large dataset

2015-08-01 Thread Erick Erickson
Here's some background: http://lucidworks.com/blog/solr-suggester/ Basically, the limitation is that to build the suggester all docs in the index need to be read to pull out the stored field and build either the FST or the sidecar Lucene index, which can be a _very_ costly operation (as in minute
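
The snippet above explains that the suggester is rebuilt by reading the stored field out of every document in the index. As a rough illustration of that cost model (plain Python, not Solr's actual FST code): the build is a full pass over the corpus, while lookups afterwards are cheap prefix scans over a sorted structure.

```python
import bisect

def build_suggester(docs, field="title"):
    """The costly step the post describes: a full pass over every doc
    to pull out the stored field. Returns a sorted term list usable
    for fast prefix lookup (a stand-in for the FST / sidecar index)."""
    return sorted((doc[field].lower(), doc.get("weight", 1)) for doc in docs)

def suggest(entries, prefix, n=5):
    """Binary-search to the first entry >= prefix, then scan while it matches."""
    lo = bisect.bisect_left(entries, (prefix,))
    out = []
    for term, weight in entries[lo:]:
        if not term.startswith(prefix) or len(out) == n:
            break
        out.append(term)
    return out

docs = [{"title": "Solr in Action"}, {"title": "Solr suggester"},
        {"title": "Lucene FST"}]
entries = build_suggester(docs)   # O(total docs) -- grows with the index
print(suggest(entries, "solr"))   # -> ['solr in action', 'solr suggester']
```

The point of the sketch: `build_suggester` touches every document, so rebuild time scales with index size even when each individual lookup stays fast.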

Re: solr multicore vs sharding vs 1 big collection

2015-08-01 Thread Erick Erickson
40 million docs isn't really very many by modern standards, although if they're huge documents then that might be an issue. So is this a single shard or multiple shards? If you're really facing performance issues, simply making a new collection with more than one shard (independent of how many rep

solr multicore vs sharding vs 1 big collection

2015-08-01 Thread Jay Potharaju
Hi, I currently have a single collection with 40 million documents and an index size of 25 GB. The collection gets updated every n minutes, and as a result the number of deleted documents is constantly growing. The data in the collection is an amalgamation of more than 1000+ customer records. The numb

Re: Avoid re indexing

2015-08-01 Thread Nagasharath
Yes, shard splitting will only help in managing large clusters and improving query performance. In my case, as the index size is fully grown (no capacity left in the existing shards) across the collection, adding a new shard will help, and for that I have to re-index. > On 01-Aug-2015, at 6:34 p

Re: Avoid re indexing

2015-08-01 Thread Upayavira
Erm, that doesn't seem to make sense. Seems like you are talking about *merging* shards. Say you had two shards, 3m docs each:

shard1: 3m docs
shard2: 3m docs

If you split shard1, you would have:

shard1_0: 1.5m docs
shard1_1: 1.5m docs
shard2: 3m docs

You could, of course, then split shard2. Y
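
The halving described above follows from how a split partitions a shard's hash range: Solr routes each document by hashing its uniqueKey into a shard's range, and SPLITSHARD divides that range into two contiguous sub-ranges. A simplified sketch (small illustrative ranges, not Solr's real 32-bit hash space):

```python
def split_range(lo, hi):
    """Divide a shard's hash range into two contiguous halves, the way
    a shard split partitions its documents between two sub-shards."""
    mid = (lo + hi) // 2
    return (lo, mid), (mid + 1, hi)

def route(doc_hash, ranges):
    """Route a document's hash to whichever (sub-)shard owns that range."""
    for name, (lo, hi) in ranges.items():
        if lo <= doc_hash <= hi:
            return name
    raise ValueError("hash outside all ranges")

# shard1 covered 0..99; after splitting, each sub-shard owns half the range,
# so each ends up with roughly half of shard1's documents (3m -> 1.5m + 1.5m)
r0, r1 = split_range(0, 99)
ranges = {"shard1_0": r0, "shard1_1": r1}
print(route(10, ranges), route(80, ranges))  # -> shard1_0 shard1_1
```

Because hashes are roughly uniform, each sub-shard receives about half the parent's documents; splitting never increases total capacity by itself, it only spreads the same documents over more shards.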

Re: Avoid re indexing

2015-08-01 Thread Nagasharath
If my current shard is holding 3 million documents, will each new sub-shard after splitting also be able to hold 3 million documents? If that is the case, after splitting a shard in two, the sub-shards together should hold 6 million documents. Am I right? > On 01-Aug-2015, at 5:43 pm, Upa

Re: Avoid re indexing

2015-08-01 Thread Upayavira
On Sat, Aug 1, 2015, at 11:29 PM, naga sharathrayapati wrote: > I am using solrj to index documents > > I agree with you regarding the index update, but I should not see any > deleted documents as it is a fresh index. Can we actually identify > those deleted documents? If you post doc
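
A common cause of deleted documents in a "fresh" index is Solr's uniqueKey overwrite semantics: re-adding a document whose id already exists replaces the old copy, and the replaced copy shows up in the deleted-document count until a segment merge expunges it. A toy sketch of that bookkeeping (hypothetical, not Solr's internals):

```python
class ToyIndex:
    """Mimics uniqueKey overwrite semantics: re-adding an existing id
    marks the old copy as deleted rather than raising an error."""
    def __init__(self):
        self.live = {}        # id -> current version of the doc
        self.num_deleted = 0  # replaced copies awaiting a merge

    def add(self, doc):
        if doc["id"] in self.live:
            self.num_deleted += 1  # the previous copy becomes a deleted doc
        self.live[doc["id"]] = doc

idx = ToyIndex()
for d in [{"id": "1", "v": 1}, {"id": "2", "v": 1}, {"id": "1", "v": 2}]:
    idx.add(d)
print(len(idx.live), idx.num_deleted)  # -> 2 1
```

So if the source data contains duplicate ids, a fresh index will legitimately report deleted documents even though nothing was explicitly deleted; checking the feed for duplicate uniqueKey values is the first diagnostic step.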

Re: Avoid re indexing

2015-08-01 Thread naga sharathrayapati
I am using SolrJ to index documents. I agree with you regarding the index update, but I should not see any deleted documents as it is a fresh index. Can we actually identify those deleted documents? If there is no option of adding shards to an existing collection, I do not like the idea of re

Re: Avoid re indexing

2015-08-01 Thread Upayavira
On Sat, Aug 1, 2015, at 10:30 PM, naga sharathrayapati wrote: > I have an exception with one of the documents after indexing 6 mil > documents > out of 10 mil. Is there any way I can avoid re-indexing the 6 mil > documents? How are you indexing your documents? Are you using the DIH? Personally, I

Avoid re indexing

2015-08-01 Thread naga sharathrayapati
I have an exception with one of the documents after indexing 6 mil documents out of 10 mil. Is there any way I can avoid re-indexing the 6 mil documents? I also see that there are a few documents that were deleted (based on the count) while indexing. Is there a way to identify what those documents
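
One common way to avoid re-sending the first 6 million documents after a failure is to index in batches and persist a checkpoint (the last offset or id successfully committed), then resume from it. A hedged sketch of the pattern, where `send_batch` stands in for the real client call (e.g. a SolrJ add-and-commit), not any specific API:

```python
def index_with_checkpoint(docs, send_batch, checkpoint=0, batch_size=3):
    """Index docs in batches, advancing the checkpoint only after each
    batch succeeds, so a crash resumes from the checkpoint instead of
    starting over. In a real run, persist the checkpoint to disk."""
    i = checkpoint
    while i < len(docs):
        batch = docs[i:i + batch_size]
        send_batch(batch)   # may raise; checkpoint stays at the last success
        i += len(batch)
    return i

# simulate resuming after 6 of 10 docs were already indexed
sent = []
docs = list(range(10))
done = index_with_checkpoint(docs, sent.extend, checkpoint=6)
print(done, sent)  # -> 10 [6, 7, 8, 9]
```

The key design point: the checkpoint only advances after a successful commit, so a bad document stops the run at a known position rather than invalidating everything indexed so far.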

Re: Personalized Search Results or Matching Documents to Users

2015-08-01 Thread Mikhail Khludnev
On Sat, Aug 1, 2015 at 9:45 PM, Upayavira wrote: > ticket? > https://issues.apache.org/jira/browse/SOLR-5944 > > On Sat, Aug 1, 2015, at 02:02 PM, Erick Erickson wrote: > > How soon? It's pretty much done AFAIK, but the folks trying to work on > > it have had their priorities re-arranged. > > >

Re: Personalized Search Results or Matching Documents to Users

2015-08-01 Thread Upayavira
ticket? On Sat, Aug 1, 2015, at 02:02 PM, Erick Erickson wrote: > How soon? It's pretty much done AFAIK, but the folks trying to work on > it have had their priorities re-arranged. > > So I really don't have a date. > > Erick > > On Fri, Jul 31, 2015 at 4:59 PM, Upayavira wrote: > > How soon?

Re: Fast autocomplete for large dataset

2015-08-01 Thread Olivier Austina
Thank you Erick. I would like to implement autocomplete for a large dataset. The autocomplete should show the phrase or the question the user wants as the user types. The requirement is that the autocomplete should be fast (not slowed down by the volume of data as the dataset becomes bigger) and easy to

Re: Fast autocomplete for large dataset

2015-08-01 Thread Erick Erickson
Not really. There's no need to use ngrams as the article suggests if the terms component does what you need. Which is why I asked you about what autocomplete means in your context. Which you have not clarified. Have you even looked at terms component? Especially the terms.prefix option? Terms com
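The reason terms.prefix needs no ngrams, as the reply above says, is that it enumerates the already-sorted term dictionary starting at the first term matching the prefix. Conceptually (plain Python over a sorted list, not Lucene's actual term enumeration):

```python
import bisect

def terms_prefix(sorted_terms, prefix, limit=10):
    """Walk a sorted term list from the first term >= prefix, the way
    terms.prefix enumerates the term dictionary: a binary search to the
    start, then a linear scan while terms still match the prefix."""
    start = bisect.bisect_left(sorted_terms, prefix)
    hits = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(hits) == limit:
            break
        hits.append(term)
    return hits

terms = sorted(["solr", "solrcloud", "sharding", "suggester", "shawn"])
print(terms_prefix(terms, "so"))  # -> ['solr', 'solrcloud']
```

Because the term dictionary is already sorted, each lookup costs a binary search plus a short scan, independent of how many documents are in the index; only the number of distinct terms matters.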

Re: Fast autocomplete for large dataset

2015-08-01 Thread Olivier Austina
Thank you Erick for your reply. If I understand correctly, it seems that these approaches use the index to hold terms. As the index grows bigger, can that become a performance issue? Is that right? Please can you check this article to see what I mean?

Re: Fast autocomplete for large dataset

2015-08-01 Thread Erick Erickson
Well, defining what you mean by "autocomplete" would be a start. If it's just a user types some letters and you suggest the next N terms in the list, TermsComponent will fix you right up. If it's more complicated, the AutoSuggest functionality might help. If it's correcting spelling, there's the

Fast autocomplete for large dataset

2015-08-01 Thread Olivier Austina
Hi, I am looking for a fast and easy-to-maintain way to do autocomplete for a large dataset in Solr. I heard about Ternary Search Trees (TST). But I would like to know if there is something I missed, such as best practices or a new Solr feature. Any sugge
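
Since the question mentions ternary search trees: a minimal TST supporting insert and prefix search, as a sketch of the classic structure (independent of Solr, which uses FSTs for its suggesters instead).

```python
class TSTNode:
    __slots__ = ("ch", "left", "mid", "right", "is_word")
    def __init__(self, ch):
        self.ch, self.is_word = ch, False
        self.left = self.mid = self.right = None

class TST:
    """Ternary search tree: each node branches three ways (less / equal /
    greater), giving trie-like prefix search with less memory per node."""
    def __init__(self):
        self.root = None

    def insert(self, word):
        self.root = self._insert(self.root, word, 0)

    def _insert(self, node, word, i):
        ch = word[i]
        if node is None:
            node = TSTNode(ch)
        if ch < node.ch:
            node.left = self._insert(node.left, word, i)
        elif ch > node.ch:
            node.right = self._insert(node.right, word, i)
        elif i + 1 < len(word):
            node.mid = self._insert(node.mid, word, i + 1)
        else:
            node.is_word = True
        return node

    def _find(self, node, word, i):
        if node is None:
            return None
        ch = word[i]
        if ch < node.ch:
            return self._find(node.left, word, i)
        if ch > node.ch:
            return self._find(node.right, word, i)
        if i + 1 == len(word):
            return node
        return self._find(node.mid, word, i + 1)

    def with_prefix(self, prefix):
        """All stored words beginning with `prefix`, in sorted order."""
        node = self._find(self.root, prefix, 0)
        if node is None:
            return []
        out = [prefix] if node.is_word else []
        self._collect(node.mid, prefix, out)
        return out

    def _collect(self, node, prefix, out):
        if node is None:
            return
        self._collect(node.left, prefix, out)
        if node.is_word:
            out.append(prefix + node.ch)
        self._collect(node.mid, prefix + node.ch, out)
        self._collect(node.right, prefix, out)

tst = TST()
for w in ["solr", "solo", "sol", "search"]:
    tst.insert(w)
print(tst.with_prefix("sol"))  # -> ['sol', 'solo', 'solr']
```

A TST keeps lookups proportional to the prefix length plus the number of matches, but like any in-memory structure it must be rebuilt or updated as the dataset changes, which is the same maintenance trade-off the Solr suggesters face.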

Re: Personalized Search Results or Matching Documents to Users

2015-08-01 Thread Erick Erickson
How soon? It's pretty much done AFAIK, but the folks trying to work on it have had their priorities re-arranged. So I really don't have a date. Erick On Fri, Jul 31, 2015 at 4:59 PM, Upayavira wrote: > How soon? And will you be able to use them for querying, or just > faceting/sorting/displayin

Re: Do not match on high frequency terms

2015-08-01 Thread Mikhail Khludnev
It seems like you need to develop a custom query or query parser. Regarding SolrJ: you can try to call http://wiki.apache.org/solr/TermsComponent https://cwiki.apache.org/confluence/display/solr/The+Terms+Component I'm not sure how exactly to call TermsComponent from SolrJ; I just found https://lucene.apa
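
Whatever client is used, a TermsComponent call comes down to a handful of request parameters documented on the wiki pages above. A small sketch that only assembles such a request URL (the host, core, and handler path are placeholder values, and the sketch does not issue the request):

```python
from urllib.parse import urlencode

def terms_request(base_url, field, prefix, limit=10):
    """Build a TermsComponent request URL from the parameters documented
    on the linked wiki pages; base_url and the /terms handler path are
    placeholders for whatever the actual deployment exposes."""
    params = {
        "terms": "true",        # enable the component
        "terms.fl": field,      # field whose terms to enumerate
        "terms.prefix": prefix, # only terms starting with this prefix
        "terms.limit": limit,   # cap the number of terms returned
        "wt": "json",
    }
    return base_url + "/terms?" + urlencode(params)

url = terms_request("http://localhost:8983/solr/collection1", "title", "so")
print(url)
```

From SolrJ the same parameters can be set on a query object pointed at the terms handler; the parameter names are identical either way.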

Re: Join Parent and Child Documents

2015-08-01 Thread Mikhail Khludnev
On Sat, Aug 1, 2015 at 10:51 AM, Vineeth Dasaraju wrote: > Hi, > > I had indexed a nested json object into solr as a parent document with > child documents. Whenever I query for a term in the child document, I am > returned only the child documents. Is it possible to get the parent > document alo

Join Parent and Child Documents

2015-08-01 Thread Vineeth Dasaraju
Hi, I indexed a nested JSON object into Solr as a parent document with child documents. Whenever I query for a term in a child document, only the child documents are returned. Is it possible to get the parent document along with the child documents as part of the results? I have been tryi
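
The usual approach here is the block-join parent query parser, which maps child-side matches to their parents, combined with the [child] document transformer to bring the children back in each result. A sketch that only assembles the query parameters (the field names content_type and comment_text are illustrative, not from this thread):

```python
from urllib.parse import urlencode

def block_join_params(child_clause, parent_filter="content_type:parent"):
    """Parameters for a block-join parent query: match child documents
    with `child_clause`, return their parent documents, and re-attach
    the children via the [child] transformer. Field names are examples."""
    return {
        "q": '{!parent which="%s"}%s' % (parent_filter, child_clause),
        "fl": "*,[child parentFilter=%s]" % parent_filter,
    }

params = block_join_params("comment_text:solr")
url_query = urlencode(params)  # ready to append to a /select request
print(params["q"])  # -> {!parent which="content_type:parent"}comment_text:solr
```

The `which` filter must match all parents and no children, which is why a dedicated marker field on parent documents (here the assumed content_type) is the common pattern.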