Re: how to get all the docIds in the search result?

2009-07-23 Thread Avlesh Singh
query.setRows(Integer.MAX_VALUE); Cheers, Avlesh. On Thu, Jul 23, 2009 at 8:15 AM, shb suh...@gmail.com wrote: When I use SolrQuery query = new SolrQuery(); query.set("q", "issn:0002-9505"); query.setRows(10); QueryResponse response = server.query(query);

Re: how to get all the docIds in the search result?

2009-07-23 Thread shb
If I use query.setRows(Integer.MAX_VALUE); the query becomes very slow, because the searcher will fetch the field values from the index for all the returned documents. So if I set query.setRows(10), is there any other way to get all the ids? Thanks. 2009/7/23 Avlesh Singh avl...@gmail.com

Re: how to get all the docIds in the search result?

2009-07-23 Thread Toby Cole
Have you tried limiting the fields that you're requesting to just the ID? Something along the lines of: query.setRows(Integer.MAX_VALUE); query.setFields("id"); Might speed the query up a little. On 23 Jul 2009, at 09:11, shb wrote: Here id is indeed the uniqueKey of a document. I want to get
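For anyone reproducing this without SolrJ, setFields("id") corresponds to the fl=id request parameter on a plain HTTP select call. A minimal sketch of building that URL with only the standard library (the host and core path are assumptions, not from the thread):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Build the raw Solr select URL equivalent to Toby's SolrJ calls:
// restricting the field list to just "id" so Solr does not have to
// load every stored field for every hit.
public class IdOnlyQuery {
    static String buildUrl(String host, String q, int rows) {
        // URL-encode the query; ':' in "issn:0002-9505" becomes %3A
        String enc = URLEncoder.encode(q, StandardCharsets.UTF_8);
        return host + "/select?q=" + enc + "&rows=" + rows + "&fl=id";
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("http://localhost:8983/solr", "issn:0002-9505", 10));
    }
}
```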

Re: how to get all the docIds in the search result?

2009-07-23 Thread shb
I have tried the following code: query.setRows(Integer.MAX_VALUE); query.setFields("id"); When it returns 1,000,000 records, it takes about 22s. This is very slow. Is there any other way? 2009/7/23 Toby Cole toby.c...@semantico.com Have you tried limiting the fields that you're requesting

Index per user - thousands of indices in one Solr instance

2009-07-23 Thread Łukasz Osipiuk
Hi, I am new to Solr and I want to get a quick hint on whether it is suitable for what we want to use it for. We are building an e-mail platform and we want to provide our users with full-text search functionality. We are not willing to use a single index file for all users, as we want to be able to migrate

Re: Index per user - thousands of indices in one Solr instance

2009-07-23 Thread Shalin Shekhar Mangar
On Thu, Jul 23, 2009 at 3:06 PM, Łukasz Osipiuk luk...@osipiuk.net wrote: I am new to Solr and I want to get a quick hint if it is suitable for what we want to use it for. We are building e-mail platform and we want to provide our users with full-text search functionality. We are not

Re: Highlight arbitrary text

2009-07-23 Thread Anders Melchiorsen
On Tue, 21 Jul 2009 14:25:52 +0200, Anders Melchiorsen wrote: On Fri, 17 Jul 2009 16:04:24 +0200, Anders Melchiorsen wrote: However, in the normal highlighter, I am using usePhraseHighlighter and highlightMultiTerm and it seems that there is no way to turn these on in

Re: DataImportHandler / Import from DB : one data set comes in multiple rows

2009-07-23 Thread Glen Newton
Chantal, You might consider LuSql[1]. It has much better performance than Solr DIH. It runs 4-10 times faster on a multicore machine, and can run in 1/20th the heap size Solr needs. It produces a Lucene index. See slides 22-25 in this presentation comparing Solr DIH with LuSql:

Question re SOLR-920 Cache and reuse schema

2009-07-23 Thread Brian Klippel
https://issues.apache.org/jira/browse/SOLR-920 Where would the shared schema.xml be located (same as solr.xml?), and how would dynamic schema play into this? Would each core's dynamic schema still be independent?

Re: Question re SOLR-920 Cache and reuse schema

2009-07-23 Thread Noble Paul നോബിള്‍ नोब्ळ्
shareSchema checks whether the schema.xml from a given file and timestamp is already loaded. If yes, the old object is re-used. All the cores which load the same file will share a single object. On Thu, Jul 23, 2009 at 3:32 PM, Brian Klippel br...@theport.com wrote:

Re: Question re SOLR-920 Cache and reuse schema

2009-07-23 Thread Shalin Shekhar Mangar
On Thu, Jul 23, 2009 at 3:32 PM, Brian Klippel br...@theport.com wrote: https://issues.apache.org/jira/browse/SOLR-920 and how would dynamic schema play into this? Would each core's dynamic schema still be independent? I guess you mean dynamic fields. If so, then yes, you will still be

Re: Index per user - thousands of indices in one Solr instance

2009-07-23 Thread Shalin Shekhar Mangar
On Thu, Jul 23, 2009 at 4:30 PM, Łukasz Osipiuk luk...@osipiuk.net wrote: See https://issues.apache.org/jira/browse/SOLR-1293 We're planning to put up a patch soon. Perhaps we can collaborate? What are your estimates for having these patches ready? We have quite tight deadlines and

Re: DataImportHandler / Import from DB : one data set comes in multiple rows

2009-07-23 Thread Chantal Ackermann
Hi Paul, hi Glen, hi all, thank you for your answers. I have followed Paul's solution (as I received it earlier). (I'll keep your suggestion in mind, though, Glen.) It looks good, except that it's not creating any documents... ;-) It is most probably some misunderstanding on my side, and

Facet

2009-07-23 Thread Nishant Chandra
Hi, I am new to Solr and need help with the following use case: I want to provide faceted browsing. For a given product, there are multiple descriptions (feeds, the description being 100-1500 words) that my application gets. I want to check for the presence of a fixed number of terms or attributes

Re: Facet

2009-07-23 Thread Ninad Raut
Try this out with SolrJ: SolrQuery query = new SolrQuery(); query.setQuery(q); // query.setQueryType("dismax"); query.setFacet(true); query.addFacetField("id"); query.addFacetField("text"); query.setFacetMinCount(2); On Thu, Jul 23, 2009 at 5:12 PM, Nishant Chandra

Re: DataImportHandler / Import from DB : one data set comes in multiple rows

2009-07-23 Thread Noble Paul നോബിള്‍ नोब्ळ्
Is there a uniqueKey in your schema? Are you returning a value corresponding to that key name? Probably you can paste the whole data-config.xml. On Thu, Jul 23, 2009 at 4:59 PM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: Hi Paul, hi Glen, hi all, thank you for your answers. I
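For reference, a minimal data-config.xml sketch of the kind Noble is asking for; the driver, table, and column names here are hypothetical, the point being that the entity must emit a column mapped to the field declared as uniqueKey in schema.xml (assumed here to be "id"):

```xml
<!-- Minimal DIH config sketch; table and column names are invented.
     The <field> mapping for "id" is what makes documents get created:
     every row must yield a value for the schema's uniqueKey. -->
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" user="user" password="pass"/>
  <document>
    <entity name="item" query="select item_id, title from item">
      <field column="item_id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>
```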

Re: DataImportHandler / Import from DB : one data set comes in multiple rows

2009-07-23 Thread Chantal Ackermann
Hi Paul, no, I didn't return the unique key, though there is one defined. I added that to the nextRow() implementation, and I am now returning it as part of the map. But it is still not creating any documents, and now that I can see the ID I have realized that it is always processing the

Re: DataImportHandler / Import from DB : one data set comes in multiple rows

2009-07-23 Thread Otis Gospodnetic
Note that the statement about LuSql (or really any other tool; LuSql is just an example because it was mentioned) is true only if Solr is underutilized, because DIH uses a single thread to talk to Solr (is this correct?) vs. LuSql using multiple (I'm guessing that's the case because of the

Re: how to get all the docIds in the search result?

2009-07-23 Thread Otis Gospodnetic
You could pull the ID directly from the Lucene index; that may be a little faster. You can also use Lucene's TermEnum to get at this. And you should make sure that the id field is the first field in your documents (when you index them). But no matter what you do, this will not be subsecond for

Sort field

2009-07-23 Thread Jörg Agatz
Hello... I have a problem... I want to sort a field. At the moment the field type is text, but I have tested it with string and date as well. The content of the field looks like 22.07.09; it is a date. When I sort, I get: failed to open stream: HTTP request failed! HTTP/1.1 500

Re: Sort field

2009-07-23 Thread Erik Hatcher
On Jul 23, 2009, at 11:03 AM, Jörg Agatz wrote: Hello... I have a problem... I want to sort a field. At the moment the field type is text, but I have tested it with string and date as well. The content of the field looks like 22.07.09; it is a date. When I sort, I get: failed to open stream: HTTP
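The usual fix behind replies like this is to re-index the value into a real Solr date field in ISO 8601 form rather than sorting on a text field. A sketch of that conversion, assuming day.month.year input where the two-digit year means 20xx (an assumption, not stated in the thread):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Normalize values like "22.07.09" into Solr's ISO 8601 date format
// so a date field can sort them correctly.
public class DateNormalizer {
    // "yy" parses two-digit years relative to a base of 2000
    static final DateTimeFormatter IN = DateTimeFormatter.ofPattern("dd.MM.yy");

    static String toSolrDate(String raw) {
        LocalDate d = LocalDate.parse(raw, IN);
        // LocalDate.toString() yields yyyy-MM-dd; append midnight UTC
        return d + "T00:00:00Z";
    }
}
```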

Re: how to get all the docIds in the search result?

2009-07-23 Thread Erik Hatcher
Rather than trying to get all document ids in one call to Solr, consider paging through the results. Set rows=1000 or probably larger, then check numFound and continue making requests to Solr, incrementing the start parameter accordingly, until done. Erik On Jul 23, 2009, at 5:35
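Erik's paging loop can be sketched with the Solr call stubbed out; fetch here is a stand-in for wrapping server.query(...) and is an assumption for illustration, not SolrJ API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Page through results: request "rows" ids at a time, advancing
// "start" until numFound documents have been collected.
public class IdPager {
    // fetch.apply(start, rows) returns one page of ids; in real code
    // it would issue a Solr query with those start/rows parameters.
    static List<String> collectIds(int numFound, int rows,
                                   BiFunction<Integer, Integer, List<String>> fetch) {
        List<String> ids = new ArrayList<>();
        for (int start = 0; start < numFound; start += rows) {
            ids.addAll(fetch.apply(start, rows));
        }
        return ids;
    }
}
```

In practice numFound comes from the first response, so the first page doubles as the probe for the total count.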

Re: DataImportHandler / Import from DB : one data set comes in multiple rows

2009-07-23 Thread Glen Newton
Hi Otis, Yes, you are right: LuSql is heavily optimized for multi-thread/multi-core use. It also performs better on a single core with multiple threads, due to the heavily I/O-bound nature of Lucene indexing. So if the DB is the bottleneck, well, yes, then LuSql and any other tool are not going

Re: excluding certain terms from facet counts when faceting based on indexed terms of a field

2009-07-23 Thread Bill Au
I want to exclude a very small number of terms, which will be different for each query. So I think my best bet is to use localParams. Bill On Wed, Jul 22, 2009 at 4:16 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I am faceting based on the indexed terms of a field by using facet.field.

Re: excluding certain terms from facet counts when faceting based on indexed terms of a field

2009-07-23 Thread Erik Hatcher
Given that it is a small number of terms, it seems like just excluding them from use/visibility on the client would be reasonable. Erik On Jul 23, 2009, at 11:43 AM, Bill Au wrote: I want to exclude a very small number of terms which will be different for each query. So I think my best

Re: excluding certain terms from facet counts when faceting based on indexed terms of a field

2009-07-23 Thread Bill Au
That's actually what we have been doing. I was just wondering if there is any way to move this work from the client back into Solr. Bill On Thu, Jul 23, 2009 at 11:47 AM, Erik Hatcher e...@ehatchersolutions.com wrote: Given that it is a small number of terms, it seems like just excluding them from
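The client-side filtering Bill describes can be sketched as a simple map filter over the returned facet counts; the class and method names here are illustrative only:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Drop a small, per-query set of terms from facet counts before
// display, leaving the remaining terms in their original order.
public class FacetFilter {
    static Map<String, Integer> exclude(Map<String, Integer> counts,
                                        Set<String> excluded) {
        Map<String, Integer> out = new LinkedHashMap<>(counts);
        out.keySet().removeAll(excluded);
        return out;
    }
}
```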

Re: how to get all the docIds in the search result?

2009-07-23 Thread Otis Gospodnetic
And if I may add another thing - if you are using Solr in this fashion, have a look at your caches, esp. document cache. If your queries of this type are repeated, you may benefit from large cache. Or, if they are not, you may completely disable some caches. Otis -- Sematext is hiring:

Re: Storing string field in solr.ExternalFieldFile type

2009-07-23 Thread Jibo John
Thanks for the response, Erik. We have seen that the size of the index has a direct impact on search speed, especially when the index size is in GBs, so we are trying all possible ways to keep the index size as low as we can. We thought the solr.ExternalFileField type would help to keep the index

Re: Storing string field in solr.ExternalFieldFile type

2009-07-23 Thread Otis Gospodnetic
I'm not sure if there is a lot of benefit from storing the literal values in that external file vs. directly in the index. There are a number of things one should look at first, as far as performance is concerned - JVM settings, cache sizes, analysis, etc. For example, I have one index here

index backup works only if there are committed index

2009-07-23 Thread solr jay
Hi, I noticed that the backup request http://master_host:port/solr/replication?command=backup works only if there is committed index data, i.e. core.getDeletionPolicy().getLatestCommit() is not null. Otherwise, no backup is created. It sounds

Re: Solr and UIMA

2009-07-23 Thread Grant Ingersoll
On Jul 21, 2009, at 11:57 AM, JCodina wrote: Hello Grant, there are two ways to implement this: one is payloads, and the other one is multiple tokens at the same position. Each of them can be useful; let me explain the way I think they can be used. Payloads: every token has extra

facet.prefix question

2009-07-23 Thread Licinio Fernández Maurelo
I'm trying to do some filtering on the count list retrieved by Solr when doing a faceting query. I'm wondering how I can use facet.prefix to get something like this: Query: facet.field=foo&facet.prefix=A OR B Response: lst name=facet_fields - lst name=foo int name=A 12560 /int int
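Since facet.prefix accepts only a single prefix, a common workaround (not something facet.prefix itself supports) is to issue one request per prefix and merge the count lists client-side. A sketch of the merge step, with illustrative names:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Merge facet counts from two prefix-restricted requests
// (e.g. facet.prefix=A and facet.prefix=B), summing counts for any
// term that appears in both responses.
public class PrefixMerge {
    static Map<String, Integer> merge(Map<String, Integer> a,
                                      Map<String, Integer> b) {
        Map<String, Integer> out = new LinkedHashMap<>(a);
        b.forEach((term, count) -> out.merge(term, count, Integer::sum));
        return out;
    }
}
```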

Re: how to get all the docIds in the search result?

2009-07-23 Thread Chris Hostetter
: Here id is indeed the uniqueKey of a document. : I want to get all the ids for some other useage. http://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about

Re: index backup works only if there are committed index

2009-07-23 Thread Otis Gospodnetic
Another option is making backups more directly, not using the Solr backup mechanism. Check the green link on http://www.manning.com/hatcher3/ Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original

Re: Storing string field in solr.ExternalFieldFile type

2009-07-23 Thread Jibo John
Thanks for the quick response, Otis. We have been able to achieve a ratio of 2 with different settings. However, considering the huge volume of data that we need to deal with - 600 GB of data per day, which we need to keep in the index for 3 days - we're looking at all possible

Re: Storing string field in solr.ExternalFieldFile type

2009-07-23 Thread Otis Gospodnetic
Jibo, Well, there is always field compression, which lets you trade the index size/disk space for extra CPU time and thus some increase in indexing and search latency. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP,

Re: Solr Cell

2009-07-23 Thread Matt Weber
Found my own answer: use the literal parameter. Should have dug around before asking. Sorry. Thanks, Matt Weber eSr Technologies http://www.esr-technologies.com On Jul 23, 2009, at 2:26 PM, Matt Weber wrote: Is it possible to supply additional metadata along with the binary file when

Re: LocalSolr - order of fields on xml response

2009-07-23 Thread Daniel Cassiano
Hi Ryan, Thanks for the information. Is this expected to be implemented? Regards, -- Daniel Cassiano _ http://www.apontador.com.br/ http://www.maplink.com.br/ On Wed, Jul 22, 2009 at 10:08 PM, Ryan McKinley ryan...@gmail.com wrote: ya... 'expected', but perhaps

JDBC Import not exposing nested entities

2009-07-23 Thread Tagge, Tim
Hi, I'm attempting to set up a simple joined index of some tables with the following structure... EMPLOYEE (employee_id, first_name, last_name, organization_id) and ORGANIZATION (organization_id, organization_name, edr_party_id). When

RE: Exception searching PhoneticFilterFactory field with number

2009-07-23 Thread Robert Petersen
Sure Otis, and in fact I can narrow it down to exactly that query. But with user queries, I don't think it is right for the phonetic filter factory to throw an exception if the user enters a number. What I am saying is, am I going to have to filter the user queries for numerics before using it

RE: Exception searching PhoneticFilterFactory field with number

2009-07-23 Thread Robert Petersen
Hey, I just noticed that this only happens when I enable debug. If debugQuery=true is on the URL, then it goes through the debug component, and that is what throws this exception. It must be getting an empty field object from the phonetic filter factory for numbers, or something similar

RE: Exception searching PhoneticFilterFactory field with number

2009-07-23 Thread Robert Petersen
Actually my first question should be, Is this a known bug or am I doing something wrong? The only one thing I can find on this topic is the following statement on the solr-dev group when discussing adding the maxCodeLength, see point two below: Ryan McKinley updated SOLR-813:

server won't start using configs from Drupal

2009-07-23 Thread david
I've downloaded solr-2009-07-21.tgz and followed the instructions at http://drupal.org/node/343467, including retrieving the solrconfig.xml and schema.xml files from the Drupal apachesolr module. The server seems to start properly with the original solrconfig.xml and schema.xml files. When I

Re: server won't start using configs from Drupal

2009-07-23 Thread Otis Gospodnetic
I think the problem is CharStreamAwareWhitespaceTokenizerFactory, which used to live in Solr (when Drupal schema.xml for Solr was made), but has since moved to Lucene. I'm half guessing. :) Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta,