Re: Experience with indexing billions of documents?

2010-04-13 Thread Thomas Koch
Bradford Stephens: > Hey there, We've actually been tackling this problem at Drawn to Scale. We'd really like to get our hands on LuceHBase to see how it scales. Our faceting still needs to be done in-memory, which is kinda tricky, but it's worth exploring. Hi Bradford, thank you for yo

Ignoring dataDir parameter in core creating process

2010-04-13 Thread Sergei Goorov
When a Solr core is being created (in my case from the SolrJ lib), the dataDir parameter is ignored and the value from solrconfig.xml is used instead. Under the debugger I found a strange place in org.apache.solr.core.CoreContainer (Solr version 1.4): public SolrCore create(CoreDescriptor dcore) throws ParserC

Re: problem with RegexTransformer and delimited data

2010-04-13 Thread Lance Norskog
It would be nice if the RegexTransformer logged that the user does not know how to use the different parameters... On 4/13/10, Gerald wrote: > AWESOME. It may take me some time to understand the regex pattern, but it worked. And many thanks for looking into RegexTransformer.process(). Nice t

RE: Internal Server Error

2010-04-13 Thread Sandhya Agarwal
Thanks all. I realized the issue was caused by my Solr home path. I created a new directory, copied all the config files there, and specified that as my Solr home. However, I failed to notice that solrconfig.xml uses relative paths for all the jars, which were no longer available. Could so

Re: Internal Server Error

2010-04-13 Thread Lance Norskog
The extraction code can use a lot of memory for large documents, so your app may be running out of memory. By default, Tomcat logs for Tomcat itself but not for the webapps it runs. If you configure Tomcat's log4j to log the org.apache.solr classes, it will tell you what is wrong. On Tue, Apr 13, 2010 at 5:22 AM

Re: limit rows by field

2010-04-13 Thread Geert-Jan Brits
I believe you're talking about field collapsing. It's available as a patch, although I'm not sure how well it applies to the current trunk. For more info, check out: http://wiki.apache.org/solr/FieldCollapsing Geert-Jan 2010/4/13 Felix Zimmermann > Hi

Re: Experience with indexing billions of documents?

2010-04-13 Thread Bradford Stephens
Hey there, We've actually been tackling this problem at Drawn to Scale. We'd really like to get our hands on LuceHBase to see how it scales. Our faceting still needs to be done in-memory, which is kinda tricky, but it's worth exploring. On Mon, Apr 12, 2010 at 7:27 AM, Thomas Koch wrote: > Hi,

Re: Schema change with an additional copyField

2010-04-13 Thread Grant Ingersoll
On Apr 13, 2010, at 1:29 PM, Andrea Gazzarini wrote: > Hi, after indexing a lot of data I found that my schema is missing the copyField declaration for my "spell" field :( The question is: do I have to reindex all the documents? Unfortunately, yes, you do, unless your spell field is th

Re: indexversion not updating on master

2010-04-13 Thread Naomi Dushay
Does it matter that my last index update did NOT add any new documents and did NOT delete any existing documents? (For testing, I just re-ran the last update.) - Naomi On Apr 13, 2010, at 11:09 AM, Naomi Dushay wrote: I'm having trouble with replication, and I believe it's because the in

indexversion not updating on master

2010-04-13 Thread Naomi Dushay
I'm having trouble with replication, and I believe it's because the indexversion isn't updating on the master. My solrconfig.xml on master: startup commit optimize solrconfig-slave.xml:solrconfig.xml,schema.xml,stopwords.txt BTW, I am certain tha

RE: problem with RegexTransformer and delimited data

2010-04-13 Thread Gerald
AWESOME. It may take me some time to understand the regex pattern, but it worked. And many thanks for looking into RegexTransformer.process(). Nice to know that splitBy can't be used with regex or replaceWith, etc. Many thanks, Steve. -- View this message in context: http://n3.nabble.com/problem-wit

RE: problem with RegexTransformer and delimited data

2010-04-13 Thread Steven A Rowe
Hi Gerald, Looking at the source for RegexTransformer.process(), which is called for each source row, I can see that there are three mutually exclusive processing cases (warning - (extremely) pseudo code): 1. if (splitBy) then return row.split(splitBy) 2. else if (replaceWith) then return row.r
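
Steve's first two cases can be modelled as a plain if/else-if chain. The Java below is a hypothetical sketch of that precedence only, not the DataImportHandler source: the class and method names are invented, case 2 assumes the truncated "row.r..." is a regex replaceAll, and the remaining case is cut off in the archive, so it is deliberately left unmodelled. It shows why a column with splitBy set never reaches the regex/replaceWith branch.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the first two cases in Steve's pseudocode; NOT the DIH source. */
final class RegexTransformerModel {

    static List<String> process(String value, String splitBy, String regex, String replaceWith) {
        if (splitBy != null) {
            // Case 1: splitBy is checked first, so regex and replaceWith are ignored entirely.
            return Arrays.asList(value.split(splitBy));
        } else if (replaceWith != null && regex != null) {
            // Case 2 (assumed to be a regex replaceAll): rewrite the whole value.
            return List.of(value.replaceAll(regex, replaceWith));
        }
        // The remaining case(s) were cut off in the archived message and are not modelled here.
        throw new UnsupportedOperationException("case not modelled in this sketch");
    }
}
```

For example, process("a|b", "\\|", "(.*?)\\|.*", "$1") returns [a, b] because the split wins and the regex is never applied; with splitBy set to null, the same call returns [a].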

Schema change with an additional copyField

2010-04-13 Thread Andrea Gazzarini
Hi, after indexing a lot of data I found that my schema is missing the copyField declaration for my "spell" field :( The question is: do I have to reindex all the documents? I'm asking because the new field is just a copy of an existing one, and so I was wondering if Solr is able to und

Re: Index replicated, though the snapshot is stale

2010-04-13 Thread Bill Au
Are you sure you are using a new Searcher that is created after the replicated index has been installed? Bill On Tue, Apr 13, 2010 at 1:00 PM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote: > Maybe there's a known bug here? The index is replicated to the index directory in /data, howev

Re: Snapshooter shooting after commit or optimize

2010-04-13 Thread Bill Au
What's the exact command you used to run snappuller and snapinstaller? What do you mean by "I have set it as a option when Solr starts"? Bill On Tue, Apr 13, 2010 at 1:01 PM, william pink wrote: > On Mon, Apr 12, 2010 at 7:02 PM, Bill Au wrote: > The lines you have enclosed are commented

Re: change get to post ??

2010-04-13 Thread stockii
Okay, thx. But it's not relevant yet :D My boss said that the search client doesn't have any sub-categories. So my problem is that I only have the main category where the search starts, and Solr must find the sub-categories and search in them too... thx for the &fq=cat_id:(7994 7995 8375 8465 8843

Re: Snapshooter shooting after commit or optimize

2010-04-13 Thread william pink
On Mon, Apr 12, 2010 at 7:02 PM, Bill Au wrote: > The lines you have enclosed are commented out by the > Bill > On Mon, Apr 12, 2010 at 1:32 PM, william pink wrote: > Hi, > I am running Solr 1.2 (I will be updating in due course). > I am having a few issues with doing the sn

Index replicated, though the snapshot is stale

2010-04-13 Thread Jason Rutherglen
Maybe there's a known bug here? The index is replicated to the index directory in /data; however, what I can search on is the index.20100408042339 snapshot, which has 1 document. Directory: /data index index.20100408042339 index.properties replication.properties Directory: index.20100408042339 dr

Re: change get to post ??

2010-04-13 Thread Ahmet Arslan
> My client uses my autocompletion with a normal HTTP request to Solr, like this: http://XXX/solr/suggestpg/select/?q=harry So, when I want to search in a category with all its children, my request is too long. How can I change from GET to POST?? My request to Solr looks like

Re: change get to post ??

2010-04-13 Thread stockii
What did you mean by "normal queries"? Our autosuggestion is on a separate core ;-) and the performance is really good. Does anyone know the best way to search across many categories? Does Solr have something like a tree component or similar?

Re: change get to post ??

2010-04-13 Thread Michael Kuhlmann
I wouldn't do autosuggestion with normal queries anyway. Because of better performance... :-) I don't use DIH, so I can't tell you what to do there. We import data with a simple PHP script, which was rather easy to write, so we have full control over Solr's data structure. You somehow have to add

Re: problem with RegexTransformer and delimited data

2010-04-13 Thread Gerald
Thanks guys. Unfortunately, neither pattern works. I tried various combos including these: ([^|]*)\|([^|]*) with replaceWith="$1" (.*?)(\|.*) with replaceWith="$1" (.*?)\|.* with and without replaceWith="$1" (.*?)\| with and without replaceWith="$1" As previously mentioned, I have tried many
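
For reference, here is a quick check, in plain java.util.regex outside DataImportHandler, of what the patterns above do when applied with String.replaceAll(pattern, "$1") to a sample value "foo|bar" (the sample value is an assumption, since Gerald's data isn't shown, and replaceAll may not be exactly how the transformer applies replaceWith). The first three reduce the value to its first field; the last one only deletes the delimiter. If a pattern behaves here but not in the transformer, the parameter combination (Steve's note that splitBy takes precedence) is a more likely culprit than the regex itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Applies each of the patterns from the thread to one sample value; the sample is an assumption. */
final class PipePatternCheck {

    static Map<String, String> apply(String value) {
        String[] patterns = {
            "([^|]*)\\|([^|]*)",  // first field, pipe, second field
            "(.*?)(\\|.*)",       // lazy first field, then the rest captured as group 2
            "(.*?)\\|.*",         // lazy first field, then the rest uncaptured
            "(.*?)\\|"            // lazy first field and the pipe only
        };
        Map<String, String> results = new LinkedHashMap<>();
        for (String p : patterns) {
            // Every match is replaced by its first capture group.
            results.put(p, value.replaceAll(p, "$1"));
        }
        return results;
    }

    public static void main(String[] args) {
        apply("foo|bar").forEach((p, r) -> System.out.println(p + "  ->  " + r));
    }
}
```

Running main prints foo for the first three patterns and foobar for the last, because (.*?)\| consumes only "foo|" and leaves "bar" untouched.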

Re: Displaying fieldValueCache stats in Solr 1.4 admin/stats page

2010-04-13 Thread Yonik Seeley
This is an implicit cache (if you don't define it, it will still exist and show up on stats.jsp). Can you be more specific about "FieldValueCache stats are not getting displayed"? If you start the example server, go to the stats page, and search for "fieldValueCache", is it there? Or do you mean t

Re: closest terms, sentence, boosting 'business' keywords instead of field ?

2010-04-13 Thread Grant Ingersoll
On Apr 13, 2010, at 10:02 AM, Abdelhamid ABID wrote: > Do you have an example of what you are trying to do? > For instance, a request like "tomcat servlet" should return a document that has "tomcat is a servlet container" rather than a document that has "to

Re: Error while sorting by geo_distance in Solr 1.4

2010-04-13 Thread Grant Ingersoll
Hi Sandeep, You're probably better off asking on the LocalSolr mailing list (I think there is one) or trying out the Solr trunk, which has much of this functionality incorporated in a more native manner. For docs on that, refer to http://wiki.apache.org/solr/SpatialSearch, but note it is not y

Displaying fieldValueCache stats in Solr 1.4 admin/stats page

2010-04-13 Thread SandeepTagore
Hi All, FieldValueCache stats are not getting displayed on the http://./solr/admin/stats.jsp page. I configured it in solrconfig.xml (Solr 1.4) as Any input on this? Thank you. Regards, Sandeep

Re: closest terms, sentence, boosting 'business' keywords instead of field ?

2010-04-13 Thread Abdelhamid ABID
> Do you have an example of what you are trying to do? > For instance, a request like "tomcat servlet" should return a document that has "tomcat is a servlet container" rather than a document that has "tomcat offers the last specification implementation of the servlet

Re: change get to post ??

2010-04-13 Thread stockii
Heya. Okay, NOW: I import from the database with DIH. Every item has a "cat_id", nothing more. For the "normal" search, using fq and POSTing the search works. But for my autosuggestion it doesn't work, because our app does not use the autosuggestion through our API. Because of better performance...

limit rows by field

2010-04-13 Thread Felix Zimmermann
Hi, for a preview of results, I need to display up to 3 documents per category. Is it possible to limit the number of rows in the Solr response by field value? What I mean is: rows: 9, (sub)rows of "field:cat1": 3, (sub)rows of "field:cat2": 3, (sub)rows of "field:cat3": 3. If not, is there a wor
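
If the FieldCollapsing patch mentioned elsewhere in the thread is not an option, one possible workaround (an assumption, not something proposed on the list) is to request more rows than needed, sorted by relevance, and trim client-side to 3 per category. The sketch below does only the trimming; the Doc type is hypothetical and stands in for whatever result objects the client actually uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PerCategoryLimit {

    /** Hypothetical minimal result document: an id plus the value of the category field. */
    static final class Doc {
        final String id;
        final String category;

        Doc(String id, String category) {
            this.id = id;
            this.category = category;
        }
    }

    /** Keeps at most maxPerCategory docs per category, preserving the incoming (ranked) order. */
    static List<Doc> limitPerCategory(List<Doc> rankedDocs, int maxPerCategory) {
        Map<String, Integer> seen = new HashMap<>();
        List<Doc> kept = new ArrayList<>();
        for (Doc d : rankedDocs) {
            int count = seen.getOrDefault(d.category, 0);
            if (count < maxPerCategory) {
                kept.add(d);
                seen.put(d.category, count + 1);
            }
        }
        return kept;
    }
}
```

The catch is that a category whose documents all rank below the fetched window will be missing from the preview. The alternative of one query per category with rows=3 and fq=field:catN gives exact counts at the cost of more requests.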

Re: change get to post ??

2010-04-13 Thread Michael Kuhlmann
Hi, On 13.04.2010 14:52, stockii wrote: > Some categories have 300 child categories. And that's the reason why you shouldn't add them all to your filter query. > Or, how can I import the category data? Again: How do you do it NOW? -Michael

Re: change get to post ??

2010-04-13 Thread stockii
Heya... childs... ^^ hehe, not my database schema :P We have 2447 categories and it's going to be more and more... Some categories have 300 child categories. What do you think is the best way to solve this problem? The problem is that the app (iPhone app) doesn't know all the sub-categories. So, t

Re: closest terms, sentence, boosting 'business' keywords instead of field ?

2010-04-13 Thread Grant Ingersoll
On Apr 12, 2010, at 7:57 PM, Abdelhamid ABID wrote: > Hi, > - I'm a bit confused about how the analyzer applies filters to the query. I know that they are applied in the order in which they are declared, but still, does the search result include only the final digest of the filter chain, or at each filter ste
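
Conceptually, an analyzer is a tokenizer followed by filters, each consuming the previous one's tokens, so only the tokens that come out of the last filter are indexed or matched. The toy Java model below illustrates that flow; it is not Lucene's TokenStream API, and the whitespace tokenizer and both filters are simplified stand-ins. It also shows why declaration order matters: a case-sensitive stop filter placed before lower-casing misses "Tomcat".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Toy model of an analyzer: a tokenizer, then filters applied in declaration order. */
final class FilterChainModel {

    /** Whitespace tokenizer standing in for the analyzer's real tokenizer. */
    static List<String> tokenize(String text) {
        return new ArrayList<>(Arrays.asList(text.trim().split("\\s+")));
    }

    /** Lower-cases every token it receives. */
    static final UnaryOperator<List<String>> LOWERCASE = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    };

    /** Drops tokens found in the (case-sensitive) stop set. */
    static UnaryOperator<List<String>> stopwords(Set<String> stop) {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) {
                if (!stop.contains(t)) {
                    out.add(t);
                }
            }
            return out;
        };
    }

    /** Each filter sees only the previous filter's output; only the final list is returned. */
    static List<String> analyze(String text, List<UnaryOperator<List<String>>> filters) {
        List<String> tokens = tokenize(text);
        for (UnaryOperator<List<String>> filter : filters) {
            tokens = filter.apply(tokens);
        }
        return tokens;
    }
}
```

With the stop set {"tomcat"}, analyzing "Tomcat servlet container" with lower-casing first yields [servlet, container]; with the stop filter first it yields [tomcat, servlet, container].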

Re: Combining Dismax and payload boosting

2010-04-13 Thread Erik Hatcher
Victoria, An example of specifically what types of queries you'd like to do would be helpful. Using nested queries you can leverage dismax and your custom query parser together, which may be what you're looking for. See this article for details on nested queries:

Re: change get to post ??

2010-04-13 Thread Michael Kuhlmann
You need to change the way your data is imported, or look for an alternative way to build your query. It depends on your data model and your import mechanism. Do you really have hundreds of categories? BTW, "childs" is amusing! ;-) -Michael On 13.04.2010 14:12, stockii wrote: > hi. t

Re: Internal Server Error

2010-04-13 Thread Andrea Gazzarini
Some problem with extraction (Tika, etc.)? My suggestion is: try to extract the document manually... I had a lot of problems with Tika and PDF extraction... Cheers, Andrea On 13/04/2010 13:05, Sandhya Agarwal wrote: Hello, I have the following piece of code: ContentStreamUpdateReques

Re: change get to post ??

2010-04-13 Thread stockii
Hi, thx for the reply =) Okay, I think I'm a little bit stupid. I don't know how I can filter the right categories. I only get the id of one category. I imported every child for each item and the parent_category_id, but there is only one parent for each item and not for each category. So an example fo

[Job] Software Developers With Solr Experience (Cambridge, MA USA)

2010-04-13 Thread Chris Herron
Hi folks, Our company, a funded startup based in Cambridge, is hiring senior developers with Solr experience. See our ad posted on Startuply: http://www.startuply.com/Jobs/Software_Developer_Server__1973_1.aspx Thanks, Chris

Re: change get to post ??

2010-04-13 Thread Michael Kuhlmann
Hi, the problem is not the GET request type; the problem is that you build a far too complicated query. This won't scale well and looks rather weird. Why don't you just add all parent category ids to every document at index time? Then you could simply filter your request with the topmost cat
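
Michael's suggestion amounts to denormalizing the category tree at index time. A minimal sketch, assuming the category table can be loaded into a child-to-parent map: for each item, index its own category plus every ancestor in a multi-valued field (called all_cat_ids here purely for illustration; the class and method names are likewise hypothetical). A search in a top-level category then needs one short filter such as fq=all_cat_ids:1 instead of hundreds of child ids.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CategoryAncestors {

    /**
     * Hypothetical index-time helper: returns the category itself followed by all its ancestors,
     * walking a child -> parent map until a root (a category with no parent) is reached.
     */
    static List<Integer> selfAndAncestors(int categoryId, Map<Integer, Integer> parentOf) {
        List<Integer> chain = new ArrayList<>();
        Set<Integer> visited = new HashSet<>(); // guards against accidental cycles in the data
        Integer current = categoryId;
        while (current != null && visited.add(current)) {
            chain.add(current);
            current = parentOf.get(current); // null once we reach a root category
        }
        return chain;
    }
}
```

With the made-up tree 100 -> 10 -> 1, an item in category 100 would be indexed with all_cat_ids values 100, 10 and 1. Whichever importer is used (DIH or a custom script like Michael's PHP one), the computed list is simply sent as several values of the same field.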

Combining Dismax and payload boosting

2010-04-13 Thread Victoria Kagansky
Hi, We are using payloads for score boosting. For this purpose we've implemented a custom boosting QueryParser and similarity function. We followed http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/. On the other hand, we'd like to use dismax query handling because of it

change get to post ??

2010-04-13 Thread stockii
Hello. My client uses my autocompletion with a normal HTTP request to Solr, like this: http://XXX/solr/suggestpg/select/?q=harry So, when I want to search in a category with all its children, my request is too long. How can I change from GET to POST?? My request to Solr looks like this. in sh
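
Solr's request handling also accepts query parameters sent as an application/x-www-form-urlencoded POST body, which sidesteps servlet-container limits on URL length. A minimal sketch using only the JDK, assuming the same /select handler as above (the class and method names are hypothetical). As Michael notes elsewhere in the thread, indexing the parent categories is the more scalable fix; this only addresses the GET length limit.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical helper: sends Solr query parameters in a POST body instead of the URL. */
final class SolrPostQuery {

    /** Builds an application/x-www-form-urlencoded body from name/value pairs. */
    static String formBody(Map<String, String> params) {
        try {
            StringBuilder body = new StringBuilder();
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
            return body.toString();
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }

    /** POSTs the parameters to the given select URL and returns the HTTP status code. */
    static int post(String selectUrl, Map<String, String> params) throws IOException {
        byte[] payload = formBody(params).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) new URL(selectUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(payload);
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }
}
```

For example, q=harry plus fq=cat_id:(1 2) encodes to q=harry&fq=cat_id%3A%281+2%29; post(...) would be called with the real /solr/suggestpg/select/ URL, whose host is elided in the original message.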

Internal Server Error

2010-04-13 Thread Sandhya Agarwal
Hello, I have the following piece of code : ContentStreamUpdateRequest contentUpdateRequest = new ContentStreamUpdateRequest("/update/extract"); contentUpdateRequest.addFile(new File(contentFileName)); contentUpdateRequest.setParam("extractOnly","true"); NamedList result = solrServerSession.req

Re: Multi-core memory problem

2010-04-13 Thread Victoria Kagansky
Thanks! We enlarged the max heap size and it looks OK so far. On Fri, Apr 9, 2010 at 4:23 AM, Lance Norskog wrote: > Since the facet "cache" is hard-allocated and has no eviction policy, you could do a facet query on each core as part of the warm-up. This way, the facets will not fail. At t

SolRJ 1.4 : java.lang.VerifyError: class org.apache.solr.search.SolrIndexReader overrides final method setNorm.(ILjava/lang/String;B)V

2010-04-13 Thread Damien Huriet
Hi, I'm starting to use Solr 1.4 queried by SolrJ 1.4 (all official releases, downloaded from the main link on the website: http://www.apache.org/dyn/closer.cgi/lucene/solr/, mirror: http://apache.multidist.com/lucene/solr/1.4.0/; I downloaded the .zip file). The servers start OK a

Re: How to make SOLR display empty value attributes also

2010-04-13 Thread SandeepTagore
I guess you can achieve this with DataImportHandler Transformers.

Error while sorting by geo_distance in Solr 1.4

2010-04-13 Thread SandeepTagore
Hi All, I am using Solr 1.4 (10 November 2009 release), Lucene core 2.9.2, LocalSolr 2.0, LocalLucene 2.0, and Tomcat 5.5. I get the following error when I try to sort the results by geo_distance. Here is the stack trace... SEVERE: java.lang.NullPointerException at org.apache.lucene.search.Sort