How to manage resources outside the index?

2010-07-07 Thread Li Li
I used to store the full text in the Lucene index, but I found merging is very slow because merging two segments copies the .fdt files into a new one. So I want to only index the full text, not store it. But when searching I need the full text for applications such as highlighting and viewing the full text. I can

Re: How to manage resources outside the index?

2010-07-07 Thread Rebecca Watson
Hi Li, I looked at doing something similar - where we only index the text but retrieve search results / highlights from files - we ended up giving up because of the amount of customisation required in Solr, mainly because we wanted the distributed search functionality in Solr, which meant making

Re: Adding new elements to index

2010-07-07 Thread Xavier Rodriguez
Thanks for the quick reply! In fact it was a typo; the 200 rows I got were from Postgres. I meant to say that the full-import was omitting the 100 Oracle rows. When I run the full import, I run it as a single job, using the URL command=full-import. I've tried to clear the index both using the

Re: Adding new elements to index

2010-07-07 Thread Govind Kanshi
Just for testing purposes, I would: 1. Use curl to create new docs 2. Use Solrj to go to the individual DBs and collect docs. On Wed, Jul 7, 2010 at 12:45 PM, Xavier Rodriguez xee...@gmail.com wrote: Thanks for the quick reply! In fact it was a typo; the 200 rows I got were from Postgres. I

Huge pages

2010-07-07 Thread Glen Newton
I was wondering if anyone has any experience using huge pages[1] to improve SOLR (or Lucene) performance (especially on 64-bit). Some are reporting major performance gains in large, memory-intensive applications (like EJBs)[2]. Ephemeral but significant performance reductions have also been solved

Re: Adding new elements to index

2010-07-07 Thread Alexey Serba
1) Shouldn't you put your entity elements under the document tag, i.e. <dataConfig> <dataSource ... /> <dataSource ... /> <document name="docs"> <entity ..></entity> <entity ..></entity> </document> </dataConfig> 2) What happens if you try to run full-import with an explicitly specified entity: GET
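
A minimal data-config.xml sketch of the layout Alexey describes, with two data sources and both entities nested under the single document element (the driver, URL, and table names here are illustrative, not from the thread):

    <dataConfig>
      <dataSource name="pg"  driver="org.postgresql.Driver"
                  url="jdbc:postgresql://localhost/appdb" user="solr" password="..."/>
      <dataSource name="ora" driver="oracle.jdbc.OracleDriver"
                  url="jdbc:oracle:thin:@localhost:1521:XE" user="solr" password="..."/>
      <document name="docs">
        <!-- each entity reads from its own data source; all rows land in the same index -->
        <entity name="fromPostgres" dataSource="pg"  query="SELECT * FROM table_a"/>
        <entity name="fromOracle"   dataSource="ora" query="SELECT * FROM table_b"/>
      </document>
    </dataConfig>

Alexey's second suggestion - a full-import restricted to one entity - is done by adding an entity parameter to the command, e.g. command=full-import&entity=fromOracle (using the illustrative name above).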

Re: Modifications to AbstractSubTypeFieldType

2010-07-07 Thread Grant Ingersoll
This looks reasonable. I'll take a look at the patch. Originally, I had intended that it was just for one Field Sub Type, thinking that if we ever wanted multiple sub types, that a new, separate class would be needed, but if this proves to be clean this way, then I see no reason not to

ClassCastException SOLR

2010-07-07 Thread Martin Kysel
Hi, I am trying to make a Lucene module for SKOS-based synonym expansion. When I try to plug the Filter into SOLR, I get a ClassCastException. So I tried to take one of the existing SOLR Filters and FilterFactories, change the package information, package it into a jar and use it as a

Re: index format error because disk full

2010-07-07 Thread osocurious2
I haven't used this myself, but Solr supports a rollback function (http://wiki.apache.org/solr/UpdateXmlMessages#A.22rollback.22). It is supposed to roll back to the state at the previous commit. So you may want to turn off auto-commit on the index you are updating if you want to control what that
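
A sketch of both halves of that advice, assuming the stock solrconfig.xml layout: disable autoCommit so commit points stay under your control, and send the rollback update message when an import goes wrong:

    <!-- solrconfig.xml: comment out autoCommit so commits are explicit -->
    <updateHandler class="solr.DirectUpdateHandler2">
      <!--
      <autoCommit>
        <maxDocs>10000</maxDocs>
        <maxTime>60000</maxTime>
      </autoCommit>
      -->
    </updateHandler>

    <!-- posted to /update to discard everything since the last commit -->
    <rollback/>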

Re: Modifications to AbstractSubTypeFieldType

2010-07-07 Thread Mark Allan
Currently our only requirement is to be able to search on the numerical part of the daterange field, so our field type overrides getRangeQuery and getFieldQuery to consider only the first two subfields. If we wanted to be able to search the name subfield as well, I suppose we could do

year range field, proper data type?

2010-07-07 Thread Jonathan Rochkind
So I will have a solr field that contains years, i.e. 1990, 2010, maybe even 1492, 1209 and 907/0907. I will be doing range limits over this field, i.e. [1950 TO 1975] or what have you. The data represents publication dates of books on a large library's shelves; there will be around 3 million

Per-user result sets

2010-07-07 Thread Jean-Michel Philippon-Nadeau
Hi list, I am wondering if Solr/Lucene can help improve my existing search engine. I would like to have different results for each user - but still have relevant results. Each user would have different score multipliers for each searchable item. Is this possible? Thanks, --

Re: Modifications to AbstractSubTypeFieldType

2010-07-07 Thread Yonik Seeley
On Wed, Jul 7, 2010 at 8:15 AM, Grant Ingersoll gsing...@apache.org wrote: Originally, I had intended that it was just for one Field Sub Type, thinking that if we ever wanted multiple sub types, that a new, separate class would be needed Right - this was my original thinking too.

Handling Updates

2010-07-07 Thread Frank A
I'm still pretty new to SOLR and have a question about handling updates. I currently have a db-config to do a bulk import. I have a single root entity and then some data that comes from other tables. This works fine for an initial bulk load. However, once indexed, is there a way I can tell

Re: Adding new elements to index

2010-07-07 Thread Erick Erickson
Hmmm, let's see your schema definitions please. I'm suspicious because you've implied that you do use a unique key. If it's required, then perhaps your definitions don't select it into the same field name (i.e. you select it as id_carrer in one and id_hidrant in another). So if id_hidrant was defined as your
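
A sketch of Erick's point (the table names are guessed from the column names in the thread): whatever each table calls its key, both entities should select it into the field declared as the schema's uniqueKey, and if ids from the two sources can collide they need to be disambiguated in the SELECT:

    <!-- schema.xml -->
    <uniqueKey>id</uniqueKey>

    <!-- data-config.xml: both entities populate the same "id" field -->
    <entity name="carrers"  query="SELECT id_carrer  AS id, ... FROM carrers"/>
    <entity name="hidrants" query="SELECT id_hidrant AS id, ... FROM hidrants"/>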

Re: Wildcards queries

2010-07-07 Thread Erick Erickson
You need to look carefully at your schema.xml. There are plenty of comments in that file describing what's going on. That's where you set up your analyzers by chaining together various tokenizers and filters. I think you're confused about indexing and storing. Generally it's a bad practice to

stemming the index

2010-07-07 Thread sarfaraz masood
My index contains data in 2 different languages, English and German. Which analyzer and stemmer should be applied to this data before feeding it to the index? -Sarfaraz

Re: year range field, proper data type?

2010-07-07 Thread Erick Erickson
This isn't a very worrisome case. Most of the messages you see on the board about the dangers of dates arise because dates can be stored with many unique values if they include milliseconds. Then, when sorting on date, memory explodes because all the dates are loaded into memory. In your

Re: stemming the index

2010-07-07 Thread Erick Erickson
The short answer is there isn't a single analyzer and stemmer that really works well for mixed-language indexing and searching. Take a look through the mail archive, try searching for multilanguage or multi-language or multiple languages. There's a wealth of info there because this topic has been
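
One common workaround, sketched here rather than taken from the thread: define a separate field type per language, each with its own stemmer, and route each document's text into the field that matches its language:

    <fieldType name="text_en" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
      </analyzer>
    </fieldType>
    <fieldType name="text_de" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="German"/>
      </analyzer>
    </fieldType>
    <!-- hypothetical per-language fields -->
    <field name="body_en" type="text_en" indexed="true" stored="false"/>
    <field name="body_de" type="text_de" indexed="true" stored="false"/>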

terms separated by -

2010-07-07 Thread sarfaraz masood
There are terms in my data like 'one-way', separated by '-'. The problem is that the standard analyzer is treating these as a single term instead of two, but I need them to be stored as two terms in the index. How can I do this? Sarfaraz

Re: stemming the index

2010-07-07 Thread sarfaraz masood
Thanx Erick :-) --- On Thu, 8/7/10, Erick Erickson erickerick...@gmail.com wrote: From: Erick Erickson erickerick...@gmail.com Subject: Re: stemming the index To: solr-user@lucene.apache.org Date: Thursday, 8 July, 2010, 1:33 AM The short answer is there isn't a single analyzer and stemmer that

Re: terms separated by -

2010-07-07 Thread Erick Erickson
Take a look at WordDelimiterFilterFactory. Erick On Wed, Jul 7, 2010 at 4:15 PM, sarfaraz masood sarfarazmasood2...@yahoo.com wrote: There are terms in my data like 'one-way', separated by '-'. The problem is that the standard analyzer is treating these as a single term instead
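
A sketch of a field type using that factory; with these illustrative settings, 'one-way' is indexed as the two tokens 'one' and 'way':

    <fieldType name="text_split" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- splits words on intra-word delimiters such as '-' -->
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>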

Re: Solrj throws RuntimeException - Invalid version or the data is not in javabin format

2010-07-07 Thread Chris Hostetter
: Ubuntu server (see exception below). The same configuration works when : injecting from a Windows client to a Windows server. Interesting ... so you're saying that if you use the exact same SolrJ code, and just change the host:port, it works on Windows? Are you certain that the version of

Re: indexing xml document with literals

2010-07-07 Thread Chris Hostetter
: Does anyone know how to read in data from one or more of the example xml docs : and ALSO store the filename and path from which it came? Solr has no knowledge that your xml docs are actually files ... the XML syntax (<add><doc>...) is just a serialization mechanism for streaming data to solr

Re: Solr Cloud/ Solr integration with zookeeper

2010-07-07 Thread Chris Hostetter
: with multicore. I cannot access: : http://localhost:8983/solr/collection1/admin/zookeeper.jsp Why would you expect that URL to work? You don't have a core named collection1 in the solr.xml you posted... : <cores adminPath="/admin/cores" defaultCoreName="collection1"> : <core name="GPTWPI"

Re: Differences in avgRequestsPerSecond of Solr ..

2010-07-07 Thread Chris Hostetter
: I am fetching the following details programmatically : 1) You didn't tell us how you were fetching those details programmatically ... what URL are you using? 2) The fact that the handlerStart times are different suggests that you are not looking at the same handler (maybe you are looking at

Using hl.regex.pattern to print complete lines

2010-07-07 Thread Peter Spam
Hi, I have a text file broken apart by carriage returns, and I'd like to return only entire lines. So, I'm trying to use this: hl.fragmenter=regex hl.regex.pattern=^.*$ ... but I still get fragments, even if I crank up the hl.regex.slop to 3 or so. I also tried a pattern of
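
For reference, those parameters can also be set as request-handler defaults in solrconfig.xml; the pattern below ("one run of non-newline characters") is only a guess at something that might approximate whole-line fragments, not a tested fix:

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="hl">true</str>
        <str name="hl.fragmenter">regex</str>
        <!-- hypothetical pattern and slop values -->
        <str name="hl.regex.pattern">[^\n]+</str>
        <str name="hl.regex.slop">0.5</str>
        <str name="hl.fragsize">500</str>
      </lst>
    </requestHandler>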

Re: ClassCastException SOLR

2010-07-07 Thread Lance Norskog
TokenFilterFactory is an interface; your factory class has to implement it. If you look at the Lucene factories, they all subclass BaseTokenFilterFactory, which in turn subclasses BaseTokenStreamFactory. That last one does various things for the child factories (I don't know

Re: index format error because disk full

2010-07-07 Thread Lance Norskog
If autocommit does not do an automatic rollback, that is a serious bug. There should be a way to detect that an automatic rollback has happened, but I don't know what it is. Maybe something in the Solr MBeans? On Wed, Jul 7, 2010 at 5:41 AM, osocurious2 ken.fos...@realestate.com wrote: I

Re: Per-user result sets

2010-07-07 Thread Lance Norskog
Yes, for a user's query you would include a different set of boosts as a parameter in the search request. It's easy. You need the user-to-boost-set mapping in your front end, not in Solr. On Wed, Jul 7, 2010 at 8:44 AM, Jean-Michel Philippon-Nadeau j...@jmpnadeau.ca wrote: Hi list, I am wondering

Re: Handling Updates

2010-07-07 Thread Lance Norskog
You can pass variables to the DIH from the URL parameters. This would let you pass a query term into the DIH operation. On Wed, Jul 7, 2010 at 11:53 AM, Frank A fsa...@gmail.com wrote: I'm still pretty new to SOLR and have a question about handling updates.  I currently have a db-config to do a
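
A sketch of what Lance describes, with a hypothetical request parameter named "since": the DIH config references it as ${dataimporter.request.since}, and the caller supplies it on the /dataimport URL:

    <!-- data-config.xml -->
    <entity name="item"
            query="SELECT id, name FROM item WHERE last_modified &gt; '${dataimporter.request.since}'"/>

    <!-- invoked as:
         http://localhost:8983/solr/dataimport?command=full-import&clean=false&since=2010-07-01 -->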

Re: year range field, proper data type?

2010-07-07 Thread Lance Norskog
There is no 'trie string'. If you use a trie type for this problem, sorting will take much less memory. Sorting strings uses memory both per document and per unique term. The Trie types do not use any memory per unique term. So, yes, a Trie Integer is a good choice for this problem. On Wed, Jul
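
A sketch of that choice for the year field (field names and precisionStep are illustrative; values like "0907" would be indexed as the plain integer 907):

    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
               omitNorms="true" positionIncrementGap="0"/>
    <field name="pub_year" type="tint" indexed="true" stored="true"/>

    <!-- range queries then look like: pub_year:[1950 TO 1975] -->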

Re: tomcat solr logs

2010-07-07 Thread Jeff Hammerbacher
Hey Robert, You may want to check out Flume for log file collection: http://github.com/cloudera/flume. We don't currently allow Flume to populate a Solr index, but that would be quite an interesting use case! Later, Jeff On Wed, Jun 30, 2010 at 3:06 PM, Robert Petersen rober...@buy.com wrote:

Re: index format error because disk full

2010-07-07 Thread Li Li
I used SegmentInfos to read the segments_N file and found that the error is that it tries to load deletedDocs but the .del file's size is 0 (because of the disk error). So I used SegmentInfos to set delGen=-1 to ignore deleted docs. But I think there is a bug. The logic of the write may be -- it first writes the