Re: if I only need exact search, does frequency/score matter?

2012-12-15 Thread Jie Sun
Thanks for the information. I did come across that discussion. I guess I will try to write a customized Similarity class and disable tf; I hope that is not a totally odd thing to do. I do see .frq files of about 10GB in cores whose .fdt files total 10-30GB. I wish the benchmark will show me

Re: Dedup component

2012-12-15 Thread Jorge Luis Betancourt Gonzalez
Is there any similar approach that I could use in Solr 3.6.1, or should I add this logic to my application? - Original message - From: "Upayavira" To: solr-user@lucene.apache.org Sent: Saturday, December 15, 2012 12:37:11 Subject: Re: Dedup component Nope, it is a Solr 4.0 thing. In ord

RE: Need help with delta import

2012-12-15 Thread umajava
I have changed it to use dih.xx, but still no luck. Even with dataimport or dataimporter, the query fetches the delta records, but they are not committed to Solr. Is there any other reason why this would fail? -- View this message in context: http://lucene.472066.n3.nabble.com/Nee

Suggester not working

2012-12-15 Thread amans19
Hello, I am using SolrCloud 4.0.0 and trying to get the Suggester to work. I have set it up according to the wiki instructions but can't get it to return any suggestions. Here is my setup: *schema.xml* *solrconfig.xml* text_auto suggest org.apache.solr.spelling.sug

Re: Solrcloud and Node.js

2012-12-15 Thread Mark Miller
There is a /zookeeper servlet that the admin UI uses for the Cloud tab. I don't know much about it, I think Ryan wrote it. The other option is to talk to zk directly. I also plan on adding an admin handler for ZooKeeper at some point. - Mark On Dec 15, 2012, at 12:33 PM, Luis Cappa Banda wrot

Re: small QTime but slow results to user

2012-12-15 Thread Otis Gospodnetic
When your index is all cached by OS you won't see disk IO. Smaller heap, smaller caches, more RAM. Otis -- Performance Monitoring - http://sematext.com/spm On Dec 15, 2012 1:11 PM, "S L" wrote: > My virtual machine has 6GB of RAM. Tomcat is currently configured to use > 4GB > of it. The size of

Re: small QTime but slow results to user

2012-12-15 Thread Yonik Seeley
On Sat, Dec 15, 2012 at 1:11 PM, S L wrote: > My virtual machine has 6GB of RAM. Tomcat is currently configured to use 4GB > of it. The size of the index is 5.4GB for 3 million records which averages > out to 1.8KB per record. I can look at trimming the data, having fewer > records in the index to

Re: small QTime but slow results to user

2012-12-15 Thread S L
P.S. Regarding streaming of the data, my Java servlet uses SolrJ and iterates through the results. Right now I'm focused on getting rid of the delay that causes some queries to take 6 or 8 seconds to complete, so I'm not even looking at the performance of the streaming. -- View this message in con

Re: small QTime but slow results to user

2012-12-15 Thread S L
My virtual machine has 6GB of RAM. Tomcat is currently configured to use 4GB of it. The size of the index is 5.4GB for 3 million records which averages out to 1.8KB per record. I can look at trimming the data, having fewer records in the index to make it smaller, or getting more memory for the VM.
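The arithmetic behind these figures can be sketched in Python. The numbers come from the message; the decimal units (1 GB = 10**9 bytes) and the idea that all RAM outside the Tomcat heap is available to the OS page cache are simplifying assumptions made here for illustration.

```python
# Figures quoted in the message. Decimal units are an assumption.
MB = 10**6
GB = 10**9

ram_bytes = 6 * GB        # RAM of the virtual machine
heap_bytes = 4 * GB       # heap configured for Tomcat
index_bytes = 5_400 * MB  # 5.4GB index on disk
records = 3_000_000

# Average index size per record (the message rounds this to 1.8KB).
bytes_per_record = index_bytes / records

# Upper bound on memory left for the OS page cache, ignoring the OS itself
# and any other processes (an optimistic assumption).
page_cache_upper_bound = ram_bytes - heap_bytes

# Fraction of the index that could be cached at best under that assumption.
cacheable_fraction = page_cache_upper_bound / index_bytes

print(f"{bytes_per_record:.0f} bytes per record")
print(f"at most {cacheable_fraction:.0%} of the index fits in the page cache")
```

Under these assumptions well under half of the index can stay cached, which is consistent with the disk activity reported elsewhere in the thread.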

Re: Dedup component

2012-12-15 Thread Upayavira
Nope, it is a Solr 4.0 thing. In order for it to work, you need to store every field, as what it does behind the scenes is retrieve the stored fields, rebuild the document, and then post the whole document back. Upayavira On Sat, Dec 15, 2012, at 04:52 PM, Jorge Luis Betancourt Gonzalez wrote:

Re: small QTime but slow results to user

2012-12-15 Thread Yonik Seeley
On Sat, Dec 15, 2012 at 12:04 PM, S L wrote: > Thanks everyone for the responses. > > I did some more queries and watched disk activity with iostat. Sure enough, > during some of the slow queries the disk was pegged at 100% (or more.) > > The requirement for the app I'm building is to be able to r

Re: Solrcloud and Node.js

2012-12-15 Thread Luis Cappa Banda
Thanks a lot, Per. Now I understand the whole scenario. One last question: I've been searching trying to find some kind of request handler that retrieves cluster status information, but no luck. I know that there exists a JSON called clusterstate.json, but I don't know the way to get it in raw JSON

Re: small QTime but slow results to user

2012-12-15 Thread S L
I just did the experiment of retrieving only the metaDataUrl field. I still sometimes get slow retrieval times. One query took 2.6 seconds of real time to retrieve 80k of data. There were 500 results. QTime was 229. So, I do need to track down where the extra 2+ seconds is going. -- View this me
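The gap described here is the difference between the wall-clock time seen by the client and the QTime reported by Solr. A minimal Python sketch, using the figures from this message (the function name is hypothetical):

```python
def time_outside_qtime_ms(wall_clock_ms, qtime_ms):
    """Return the milliseconds of a request not accounted for by Solr's QTime.

    QTime does not include everything the client waits for, so the remainder
    has to be attributed to other steps such as fetching stored fields,
    writing the response, network transfer, and client-side processing.
    """
    return wall_clock_ms - qtime_ms


# Figures from the message: 2.6 seconds of real time, QTime of 229 ms.
unexplained_ms = time_outside_qtime_ms(2600, 229)
print(unexplained_ms)
```

This only quantifies the gap; it does not say which of those steps is responsible.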

Re: small QTime but slow results to user

2012-12-15 Thread S L
Thanks everyone for the responses. I did some more queries and watched disk activity with iostat. Sure enough, during some of the slow queries the disk was pegged at 100% (or more.) The requirement for the app I'm building is to be able to retrieve 500 results in ideally one second. The index has

Re: Dedup component

2012-12-15 Thread Jorge Luis Betancourt Gonzalez
Is this updatable fields functionality available in Solr 3.6.1? That is the version I'm using right now. - Original message - From: "Upayavira" To: solr-user@lucene.apache.org Sent: Saturday, December 15, 2012 7:56:45 Subject: Re: Dedup component Make the ID field out of the query text so you don't have t

Re: fieldType custom search

2012-12-15 Thread Antoine LE FLOC'H
Otis, can you give more details on this? Sounds interesting to me. What if you are trying to re-order millions of Lucene documents? Did you use grouping first? Antoine. On Thu, Dec 13, 2012 at 8:54 PM, Otis Gospodnetic < otis.gospodne...@gmail.com> wrote: > Hi, > > We've done something

Re: optimun precisionStep for DAY granularity in a TrieDateField

2012-12-15 Thread Jack Krupansky
Maybe we're at the stage of raising the issue of whether the significant extra storage for time of day warrants a storage format that is optimized for day only, call it TrieDay (or TrieDateTimeless.) -- Jack Krupansky -Original Message- From: jmlucjav Sent: Saturday, December 15, 201

Re: Dedup component

2012-12-15 Thread Upayavira
Make the ID field out of the query text so you don't have to use the dedup component, then use the updatable fields functionality in Solr 4.0: $ curl http://localhost:8983/solr/update -H 'Content-type:application/json' -d ' [ {"id": "book1", "copies_i" : { "inc" : 1}, "cat" : {
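A rough Python sketch of the idea: derive the document ID from the query text and build the JSON body for an increment-style update. The `{"inc": 1}` syntax is taken from the curl example in this message; the hashing scheme, the normalisation, and the field name `count_i` are assumptions made for illustration, not something the thread prescribes.

```python
import hashlib
import json


def query_doc_id(query_text):
    """Derive a stable ID from the query text (hypothetical scheme).

    Lower-casing and collapsing whitespace means trivially different
    spellings of the same query map to the same document.
    """
    normalised = " ".join(query_text.lower().split())
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()


def increment_update(query_text, counter_field="count_i"):
    """Build a JSON update body that increments a counter for one query.

    counter_field is a hypothetical field name; it must exist in the schema.
    """
    doc = {"id": query_doc_id(query_text), counter_field: {"inc": 1}}
    return json.dumps([doc])


body = increment_update("Solr  Dedup")
print(body)
```

The resulting string would be posted to the update handler with `Content-type:application/json`, as in the curl command. How the first increment behaves for an ID that does not exist yet should be checked against the Solr version in use.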

Re: Solrcloud and Node.js

2012-12-15 Thread Per Steffensen
Luis Cappa Banda wrote: Do you know if SolrCloud replica shards have 100% the same data as the leader ones every time? Probably when synchronizing with leaders there is a delay, so executing queries against replicas won't be a good idea. As long as the replica is in state "active" it will be 100

Re: Solrcloud and Node.js

2012-12-15 Thread Luis Cappa Banda
Hello, Per. Thanks for your answer! I have worked a lot with SolrJ, and in the last two months also with the new SolrJ 4.0, specifically with ZooKeeper and the CloudSolrServer implementation. I've developed a search engine wrapper that dispatches queries to SolrCloud using a CloudSolrServer pool. Th

Re: optimun precisionStep for DAY granularity in a TrieDateField

2012-12-15 Thread jmlucjav
Without going through such rigorous testing, maybe for my case (interested only in DAY) I could just index trielong values such as 20121010, 20110101, etc. This would take less space than trieDate (I guess), and I would still have a date-looking number (for easier handling). I could even base the
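A minimal Python sketch of the encoding suggested here, turning a calendar day into an integer such as 20121010 and back. The function names are hypothetical, and whether this actually saves space compared with a date field is the poster's guess, not something shown here.

```python
from datetime import date


def day_key(d):
    """Encode a date as an integer of the form YYYYMMDD."""
    return d.year * 10000 + d.month * 100 + d.day


def from_day_key(key):
    """Decode a YYYYMMDD integer back into a date."""
    year, rest = divmod(key, 10000)
    month, day = divmod(rest, 100)
    return date(year, month, day)


print(day_key(date(2012, 10, 10)))
print(from_day_key(20110101))
```

Because the year occupies the most significant digits, these integers sort in chronological order, so numeric range queries over them correspond to date ranges.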

Re: Solrcloud and Node.js

2012-12-15 Thread Per Steffensen
As Mark mentioned, Solr(Cloud) can be accessed through HTTP and can return e.g. JSON, which should be easy to handle in JavaScript. But the client part (SolrJ) of Solr is not just a dumb client interface - it provides a lot of client-side functionality, e.g. some intelligent decision making based o