Getting the offset of search keyword in a document

2010-07-23 Thread Ryan Chan
Hello, I am new to Solr/Lucene and I am evaluating whether they suit my needs and could replace our in-house system. Our requirements: 1. I have multiple documents (1M). 2. Each document contains text ranging from a few KB to a few MB. 3. I want to search for a keyword, search through all these documents, and it r
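A minimal client-side sketch, assuming the document's full text has already been retrieved (for example from a stored field): it scans the text for case-insensitive occurrences of the keyword and returns their character offsets. The class name `KeywordOffsets` is hypothetical. For multi-MB documents, Solr's TermVectorComponent (on a field indexed with term offsets) or highlighting would avoid shipping the whole text to the client.

```java
import java.util.ArrayList;
import java.util.List;

class KeywordOffsets {
    // Returns the start offset of each non-overlapping, case-insensitive occurrence of keyword.
    static List<Integer> find(String text, String keyword) {
        List<Integer> offsets = new ArrayList<>();
        int len = keyword.length();
        if (len == 0) return offsets;
        int i = 0;
        while (i + len <= text.length()) {
            if (text.regionMatches(true, i, keyword, 0, len)) {
                offsets.add(i);
                i += len; // skip past this match so occurrences do not overlap
            } else {
                i++;
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(find("Solr and solr and SOLR", "solr")); // [0, 9, 18]
    }
}
```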

Re: Autocommit not happening

2010-07-23 Thread John DeRosa
I'll see you, and raise. My solrconfig.xml wasn't being copied to the server by the deployment script. On Jul 23, 2010, at 3:26 PM, Jay Luker wrote: > For the sake of any future googlers I'll report my own clueless but > thankfully brief struggle with autocommit. > > There are two parts to the

SOLR Memory Usage - Where does it go?

2010-07-23 Thread Stephen Weiss
We have been having problems with SOLR on one project lately. Forgive me for writing a novel here but it's really important that we identify the root cause of this issue. It is becoming unavailable at random intervals, and the problem appears to be memory related. There are basically two

Re: help with a schema design problem

2010-07-23 Thread Chris Hostetter
: > Is there any way in solr to say p_value[someIndex]="pramod" : And p_type[someIndex]="client". : No, I'm 99% sure there is not. it's possible in code, by utilizing positions and FieldMaskingSpanQuery... http://lucene.apache.org/java/2_9_0/api/all/org/apache/lucene/search/spans/FieldMaskingSpan

Re: Autocommit not happening

2010-07-23 Thread Jay Luker
For the sake of any future googlers I'll report my own clueless but thankfully brief struggle with autocommit. There are two parts to the story: Part One is where I realize my config was not contained within my . In Part Two I realized I had typed "" rather than "". --jay On Fri, Jul 23, 2010 a
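Two common reasons for autocommit silently not happening can be checked mechanically: the `<autoCommit>` element sitting outside `<updateHandler>`, and a tag name with the wrong case. A minimal sketch, assuming the Solr 1.4 layout in which `<autoCommit>` is read only from inside `<updateHandler>` and element names are case-sensitive: it parses a solrconfig.xml string with the JDK's DOM parser and reports what it finds. The class `AutoCommitCheck` and its return strings are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class AutoCommitCheck {
    // Returns "ok", "missing", "misspelled: <name>", or "not inside <updateHandler>".
    static String diagnose(String solrconfigXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(solrconfigXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse solrconfig.xml", e);
        }
        // Element names are case-sensitive: this lookup does not find <autocommit>.
        NodeList exact = doc.getElementsByTagName("autoCommit");
        if (exact.getLength() == 0) {
            // Only detects case-only misspellings such as <autocommit> or <AutoCommit>.
            NodeList all = doc.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                String name = all.item(i).getNodeName();
                if (name.equalsIgnoreCase("autoCommit")) return "misspelled: <" + name + ">";
            }
            return "missing";
        }
        Node parent = exact.item(0).getParentNode();
        if (!"updateHandler".equals(parent.getNodeName())) return "not inside <updateHandler>";
        return "ok";
    }

    public static void main(String[] args) {
        String good = "<config><updateHandler class=\"solr.DirectUpdateHandler2\">"
                + "<autoCommit><maxDocs>1000</maxDocs></autoCommit></updateHandler></config>";
        System.out.println(diagnose(good)); // ok
    }
}
```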

Re: Performance issues when querying on large documents

2010-07-23 Thread Alexey Serba
Do you use highlighting? ( http://wiki.apache.org/solr/HighlightingParameters ) Try to disable it and compare performance. On Fri, Jul 23, 2010 at 10:52 PM, ahammad wrote: > > Hello, > > I have an index with lots of different types of documents. One of those > types basically contains extracts o

Re: commit is taking very very long time

2010-07-23 Thread Mark Miller
On 7/23/10 5:59 PM, Alexey Serba wrote: > Another option is to set optimize=false in DIH call ( it's true by > default ). Ouch - that should really be changed then. - Mark

Re: 2 solr dataImport requests on a single core at the same time

2010-07-23 Thread Alexey Serba
> having multiple Request Handlers will not degrade the performance IMO you shouldn't worry unless you have hundreds of them

Re: commit is taking very very long time

2010-07-23 Thread Alexey Serba
> I am not sure why some commits take a very long time. Hmm... Because it merges index segments... How large is your index? > Also is there a way to reduce the time it takes? You can disable commit in DIH call and use autoCommit instead. It's kind of a hack because you postpone commit operation and ma
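A minimal sketch of the request Alexey describes, assuming the DataImportHandler is registered at `/dataimport` on a local Solr (host, port and path are placeholders): `command=full-import` with `commit=false` leaves committing to autoCommit, and `optimize=false` skips the optimize step that, as noted in the thread, DIH performs by default. The class `DihRequest` is hypothetical.

```java
import java.net.URI;

class DihRequest {
    // Placeholder base URL; adjust host, port, core and handler path to your installation.
    static final String BASE = "http://localhost:8983/solr/dataimport";

    // Builds a full-import request with explicit commit and optimize flags,
    // so the handler's defaults never apply silently.
    static URI fullImport(boolean commit, boolean optimize) {
        return URI.create(BASE + "?command=full-import"
                + "&commit=" + commit
                + "&optimize=" + optimize);
    }

    public static void main(String[] args) {
        System.out.println(fullImport(false, false));
    }
}
```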

Re: help with a schema design problem

2010-07-23 Thread Geert-Jan Brits
Multiple rows in the OP's example are combined to form 1 solr-document (e.g.: rows 1 and 2 both have documentid=1). Because of this combining, it would match p_value from row1 with p_type from row2 (or vice versa). 2010/7/23 Nagelberg, Kallin > > > > When i search > > > > p_value:"Pramod" AND p_type:"

RE: Novice seeking help to change filters to search without diacritics

2010-07-23 Thread HSingh
Hi Steve, This is extremely helpful! What is the best way to also preserve/append the diacritics in the index in case someone searches using them? I deeply appreciate your help! -- View this message in context: http://lucene.472066.n3.nabble.com/Novice-seeking-help-to-change-filters-to-search

Re: filter query on timestamp slowing query???

2010-07-23 Thread Geert-Jan Brits
just wanted to mention another possible route, which might be entirely hypothetical :-) *If* you could query on internal docid (I'm not sure that it's available out-of-the-box, or if you can at all) your original problem, quoted below, could imo be simplified to asking for the last docid inserted

RE: help with a schema design problem

2010-07-23 Thread Nagelberg, Kallin
> > > When i search > > > p_value:"Pramod" AND p_type:"Supplier" > > > > > > it would give me result as document 1. Which is incorrect, since in > > > document > > > 1 Pramod is a Client and not a Supplier. Would it? I would expect it to give you nothing. -Kal -----Original Message----- From:

Re: help with a schema design problem

2010-07-23 Thread Geert-Jan Brits
> Is there any way in solr to say p_value[someIndex]="pramod" And p_type[someIndex]="client". No, I'm 99% sure there is not. > One way would be to define a single field in the schema as p_value_type = "client pramod" i.e. combine the values from both fields and store them in a single field. yep, f

RE: filter query on timestamp slowing query???

2010-07-23 Thread Chris Hostetter
: On top of using trie dates, you might consider separating the timestamp : portion and the type portion of the fq into separate fq parameters -- : that will allow them to be stored in the filter cache separately. So : for instance, if you include "type:x OR type:y" in queries a lot, but : w

Scoring Search for autocomplete

2010-07-23 Thread Frank A
Hi, I have an autocomplete that is currently working with an NGramTokenizer so if I search for "Yo" both "New York" and "Toyota" are valid results. However I'm trying to figure out how to best implement the search so that from a score perspective if the string matches the beginning of an entire fi
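A client-side sketch of the tiered ordering Frank describes, assuming the candidate values have already been fetched by the n-gram query: values that start with the typed text rank first, then values where a later word starts with it, then plain substring matches. `AutocompleteRanker` is hypothetical; one way to express the same idea inside Solr is to also query a prefix-only field and give it a higher boost.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class AutocompleteRanker {
    // 2 = whole value starts with the prefix, 1 = a later word starts with it,
    // 0 = prefix only occurs inside a word, -1 = no match at all.
    static int tier(String value, String prefix) {
        String v = value.toLowerCase(Locale.ROOT);
        String p = prefix.toLowerCase(Locale.ROOT);
        if (v.startsWith(p)) return 2;
        for (String word : v.split("\\s+")) {
            if (word.startsWith(p)) return 1;
        }
        return v.contains(p) ? 0 : -1;
    }

    // Drops non-matches and sorts by tier, best first; List.sort is stable,
    // so ties keep their incoming (e.g. Solr score) order.
    static List<String> rank(List<String> candidates, String prefix) {
        List<String> out = new ArrayList<>();
        for (String c : candidates) {
            if (tier(c, prefix) >= 0) out.add(c);
        }
        out.sort(Comparator.comparingInt((String c) -> tier(c, prefix)).reversed());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(rank(List.of("Toyota", "New York", "Yonkers"), "Yo"));
    }
}
```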

Re: Sort by index order desc?

2010-07-23 Thread Ryan McKinley
Looks like you can sort by _docid_ to get things in index order or reverse index order. ?sort=_docid_ asc thank you solr! On Fri, Jul 23, 2010 at 2:23 PM, Ryan McKinley wrote: > Any pointers on how to sort by reverse index order? > http://search.lucidimagination.com/search/document/4a59ded3966

Re: Securing Solr 1.4 in a glassfish container AS NEW THREAD

2010-07-23 Thread Sharp, Jonathan
Are you using the same instance of CommonsHttpSolrServer for all the requests? I was. I also tried creating a new instance every x requests, also resetting the credentials on the new instances, to see if it would make a difference. Doing that, I get an exception after several instances of

Re: a bug of solr distributed search

2010-07-23 Thread Yonik Seeley
On Fri, Jul 23, 2010 at 2:40 PM, MitchK wrote: > That only works if the docs are exactly the same - they may not be. > Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same, > don't they? Documents aren't supposed to be duplicated across shards... so the presence of multiple

Re: help with a schema design problem

2010-07-23 Thread Pramod Goyal
In my case the document id is the unique key (each row is not a unique document). So a single document has multiple Party Values and Party Types. Hence I need to define both Party Value and Party Type as multi-valued. Is there any way in solr to say p_value[someIndex]="pramod" And p_type[someIn

Performance issues when querying on large documents

2010-07-23 Thread ahammad
Hello, I have an index with lots of different types of documents. One of those types basically contains extracts of PDF docs. Some of those PDFs can have 1000+ pages, so there would be a lot of stuff to search through. I am experiencing really terrible performance when querying. My whole index h

Re: help with a schema design problem

2010-07-23 Thread Geert-Jan Brits
With the use case you specified it should work to just index each "Row" as you described in your initial post as a separate document. This way p_value and p_type both become single-valued and you get a correct combination of p_value and p_type. However, this may not go so well with other use-cases yo

Re: a bug of solr distributed search

2010-07-23 Thread MitchK
That only works if the docs are exactly the same - they may not be. Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same, shouldn't they? -- View this message in context: http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p990563.html Sent from the So

Re: a bug of solr distributed search

2010-07-23 Thread MitchK
... In addition to my previous posting: To keep this in sync we could do two things: Wait for every server, to make sure that everyone uses the same values to compute the score, and then apply them. Or: Let's say that we collect the new values every 15 minutes. To merge and send them over the netwo

Re: Autocommit not happening

2010-07-23 Thread John DeRosa
On Jul 23, 2010, at 9:37 AM, John DeRosa wrote: > Hi! I'm a Solr newbie, and I don't understand why autocommits aren't > happening in my Solr installation. > [snip] "Never mind"... I have discovered my boneheaded mistake. It's so silly, I wish I could retract my question from the archives.

Re: a bug of solr distributed search

2010-07-23 Thread Yonik Seeley
On Fri, Jul 23, 2010 at 2:23 PM, MitchK wrote: > why don't we send the output of TermsComponent of every node in the > cluster to a Hadoop instance? > Since TermsComponent does the map-part of the map-reduce concept, Hadoop > only needs to reduce the stuff. Maybe we don't even need Hadoop for

Re: help with a schema design problem

2010-07-23 Thread Pramod Goyal
I want to do that. But if I understand correctly, in solr it would store the field like this: p_value: "Pramod" "Raj" p_type: "Client" "Supplier" When I search p_value:"Pramod" AND p_type:"Supplier" it would give me document 1 as a result, which is incorrect, since in document 1 Pramod is a Clien
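The cross-matching Pramod describes can be reproduced with a small in-memory model built from the three rows in his original post. In this sketch, matching p_value and p_type as independent multi-valued fields returns documents 1 and 2 for Pramod/Supplier, while matching one combined "type value" string per row (the p_value_type idea from the thread) returns only document 2. It models only exact, case-insensitive matching, not Solr's analysis; the class and record names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class PartyMatch {
    record Party(String value, String type) {}   // one table row
    record Doc(int id, List<Party> parties) {}    // rows grouped by document id

    // Separate multi-valued fields: value and type are checked independently,
    // so the pairing between a value and its type is lost.
    static boolean matchSeparate(Doc d, String value, String type) {
        boolean hasValue = false, hasType = false;
        for (Party p : d.parties()) {
            hasValue |= p.value().equalsIgnoreCase(value);
            hasType |= p.type().equalsIgnoreCase(type);
        }
        return hasValue && hasType;
    }

    // Combined field: each row becomes one string such as "client pramod",
    // so both halves must come from the same row.
    static boolean matchCombined(Doc d, String value, String type) {
        String wanted = (type + " " + value).toLowerCase(Locale.ROOT);
        for (Party p : d.parties()) {
            if ((p.type() + " " + p.value()).toLowerCase(Locale.ROOT).equals(wanted)) return true;
        }
        return false;
    }

    static List<Integer> search(List<Doc> docs, String value, String type, boolean combined) {
        List<Integer> ids = new ArrayList<>();
        for (Doc d : docs) {
            if (combined ? matchCombined(d, value, type) : matchSeparate(d, value, type)) ids.add(d.id());
        }
        return ids;
    }

    static List<Doc> sample() {
        return List.of(
                new Doc(1, List.of(new Party("Pramod", "Client"), new Party("Raj", "Supplier"))),
                new Doc(2, List.of(new Party("Pramod", "Supplier"))));
    }

    public static void main(String[] args) {
        System.out.println(search(sample(), "Pramod", "Supplier", false)); // [1, 2]
        System.out.println(search(sample(), "Pramod", "Supplier", true));  // [2]
    }
}
```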

Re: a bug of solr distributed search

2010-07-23 Thread MitchK
Yonik, why don't we send the output of TermsComponent of every node in the cluster to a Hadoop instance? Since TermsComponent does the map-part of the map-reduce concept, Hadoop only needs to reduce the stuff. Maybe we don't even need Hadoop for this. After reducing, every node in the cluste

Sort by index order desc?

2010-07-23 Thread Ryan McKinley
Any pointers on how to sort by reverse index order? http://search.lucidimagination.com/search/document/4a59ded3966271ca/sort_by_index_order_desc it seems like it should be easy to do with the function query stuff, but I'm not sure what to sort by (unless I add a new field for indexed time) Any p

RE: help with a schema design problem

2010-07-23 Thread Nagelberg, Kallin
I think you just want something like: p_value:"Pramod" AND p_type:"Supplier" no? -Kallin Nagelberg -----Original Message----- From: Pramod Goyal [mailto:pramod.go...@gmail.com] Sent: Friday, July 23, 2010 2:17 PM To: solr-user@lucene.apache.org Subject: help with a schema design problem Hi, L

help with a schema design problem

2010-07-23 Thread Pramod Goyal
Hi, Let's say I have a table with 3 columns: Document id, Party Value and Party Type. In this table I have 3 rows. 1st row: Document id: 1 Party Value: Pramod Party Type: Client. 2nd row: Document id: 1 Party Value: Raj Party Type: Supplier. 3rd row: Document id: 2 Party Value: Pramod Party Type: Supplier

Re: Replacing text fields with numeric fields for speed

2010-07-23 Thread Gora Mohanty
On Fri, 23 Jul 2010 14:33:54 +0200 Peter Karich wrote: > Gora, > > just out of interest: > does Apache Bench send different queries, or queries from the logs, or > always the same query? > If it is always the same query, Solr's cache will kick in > and make the response time super small. Yes,

RE: Spellcheck help

2010-07-23 Thread Dyer, James
In org.apache.solr.spelling.SpellingQueryConverter, find the line (#84): final static String PATTERN = "(?:(?!(" + NMTOKEN + ":|\\d+)))[\\p{L}_\\-0-9]+"; and remove the |\\d+ to make it: final static String PATTERN = "(?:(?!" + NMTOKEN + ":))[\\p{L}_\\-0-9]+"; My testing shows this solves your
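To see what removing `|\\d+` changes, this sketch compiles both versions of the pattern and lists the tokens each extracts from a query. It assumes a simplified, hypothetical stand-in for `NMTOKEN` (the real constant in SpellingQueryConverter is more detailed). With it, both patterns skip the `title:` field name, but only the modified one keeps the number `2010` as a token for the spellchecker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpellPatternDemo {
    // Hypothetical stand-in for SpellingQueryConverter.NMTOKEN: one or more XML name characters.
    static final String NMTOKEN = "[\\p{L}\\p{N}_.\\-]+";

    // Original: the negative lookahead rejects a "field:" prefix AND any run of digits.
    static final Pattern ORIGINAL =
            Pattern.compile("(?:(?!(" + NMTOKEN + ":|\\d+)))[\\p{L}_\\-0-9]+");
    // Suggested change: "|\\d+" removed, so numeric tokens are no longer skipped.
    static final Pattern MODIFIED =
            Pattern.compile("(?:(?!" + NMTOKEN + ":))[\\p{L}_\\-0-9]+");

    static List<String> tokens(Pattern pattern, String query) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(query);
        while (m.find()) out.add(m.group());
        return out;
    }

    public static void main(String[] args) {
        String q = "title:foo 2010 bar";
        System.out.println(tokens(ORIGINAL, q)); // [foo, bar]
        System.out.println(tokens(MODIFIED, q)); // [foo, 2010, bar]
    }
}
```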

RE: Novice seeking help to change filters to search without diacritics

2010-07-23 Thread Steven A Rowe
Hi HSingh, Maybe the mapping file I attached to https://issues.apache.org/jira/browse/SOLR-2013 will help? Steve > -----Original Message----- > From: HSingh [mailto:hsin...@gmail.com] > Sent: Thursday, July 22, 2010 11:30 PM > To: solr-user@lucene.apache.org > Subject: Re: Novice seeking help t

RE: filter query on timestamp slowing query???

2010-07-23 Thread Jonathan Rochkind
> and a typical query would be: > fl=id,type,timestamp,score&start=0&q="Coca+Cola"+pepsi+-"dr+pepper"&fq=timestamp:[2010-07-07T00:00:00Z+TO+NOW]+AND+(type:x+OR+type:y)& > rows=2000 On top of using trie dates, you might consider separating the timestamp portion and the type portion of the fq into
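A sketch of the suggestion, assuming plain URL-encoded request parameters: the date range and the type clause go into two separate `fq` parameters instead of one `AND`ed filter, so each can occupy its own filter-cache entry. The helper names are hypothetical; the query and filter strings come from oferiko's typical query.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SplitFilters {
    // URL-encodes each value and joins the pairs with '&'; repeated keys are kept.
    static String encode(List<Map.Entry<String, String>> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> p : params) {
            if (sb.length() > 0) sb.append('&');
            sb.append(p.getKey()).append('=')
              .append(URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // One fq parameter per filter, so Solr can cache each filter independently.
    static String build(String q, String... filters) {
        List<Map.Entry<String, String>> params = new ArrayList<>();
        params.add(Map.entry("q", q));
        for (String fq : filters) params.add(Map.entry("fq", fq));
        return encode(params);
    }

    public static void main(String[] args) {
        System.out.println(build("\"Coca Cola\" pepsi -\"dr pepper\"",
                "timestamp:[2010-07-07T00:00:00Z TO NOW]",
                "type:x OR type:y"));
    }
}
```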

Allow custom overrides

2010-07-23 Thread Charlie Jackson
I need to implement a search engine that will allow users to override pieces of data and then search against or view that data. For example, a doc that has the following values:
DocId | Fulltext | Meta1 | Meta2 | Meta3
1 | The quick brown fox | foo | foo | foo

Re: Autocommit not happening

2010-07-23 Thread John DeRosa
On Jul 23, 2010, at 9:37 AM, John DeRosa wrote: > Hi! I'm a Solr newbie, and I don't understand why autocommits aren't > happening in my Solr installation. > > My one server running Solr: > > - Ubuntu 10.04 (Lucid Lynx), with all the latest updates. > - Solr 1.4.0 running on Tomcat6 > - Install

Re: filter query on timestamp slowing query???

2010-07-23 Thread oferiko
I'm in the process of indexing my demo data to test that; I'll have more valid data on whether or not it made a difference in a few days. Thanks On 23/07/2010, at 19:42, "Jonathan Rochkind [via Lucene]" <ml-node+990234-2085494904-316...@n3.nabble.com> wrote: > and a typical query would be: >

RE: filter query on timestamp slowing query???

2010-07-23 Thread Jonathan Rochkind
> and a typical query would be: > fl=id,type,timestamp,score&start=0&q="Coca+Cola"+pepsi+-"dr+pepper"&fq=timestamp:[2010-07-07T00:00:00Z+TO+NOW]+AND+(type:x+OR+type:y)& > rows=2000 My understanding is that this is essentially what the Solr 1.4 trie date fields are made for; I'd use them, should s

Autocommit not happening

2010-07-23 Thread John DeRosa
Hi! I'm a Solr newbie, and I don't understand why autocommits aren't happening in my Solr installation. My one server running Solr: - Ubuntu 10.04 (Lucid Lynx), with all the latest updates. - Solr 1.4.0 running on Tomcat6 - Installation was done via "apt-get install solr-common solr-tomcat tomc

Re: Solr on iPad?

2010-07-23 Thread Stephan Schwab
Thanks Mark! I'm subscribing to the cocoa-dev list. On Jul 23, 2010, at 10:17 AM, Mark Allan [via Lucene] wrote: > Hi Stephan, > > On the iPad, as with the iPhone, I'm afraid you're stuck with using > SQLite if you want any form of database in your app. > > I suppose if you wanted to get

Re: Duplicates

2010-07-23 Thread Pavel Minchenkov
I mean two use cases. I can't index only folders, because I have other queries on files. Otherwise I would have to build another index that contains only folders, but then I would have to take care of synchronizing folders in two indexes. Are range, spatial, etc. queries supported on multivalued fields? 2010/7/23 P

Re: Tree Faceting in Solr 1.4

2010-07-23 Thread Eric Grobler
Hi Erik, I must be doing something wrong :-( I took: svn co https://svn.apache.org/repos/asf/lucene/dev/trunk mytest then I copied SOLR-792.patch to folder /mytest/solr then I ran: patch -p1 < SOLR-792.patch but I get "can't find file to patch at input line 5" Is this the correct trunk and pa

Re: Tree Faceting in Solr 1.4

2010-07-23 Thread Geert-Jan Brits
>If I am doing >facet=on & facet.field={!ex=State}State & fq={!tag=State}State:Karnataka >All it gives me is Facets on state excluding only that filter query.. But I >was not able to do same on third level ..Like facet.field= Give me the >counts of cities also in state Karnataka.. >Let me know s

solrj occasional timeout on commit

2010-07-23 Thread Nagelberg, Kallin
Hey, I recently moved a solr app from a testing environment into a production environment, and I'm seeing a brand new error which never occurred during testing. I'm seeing this in the solrJ-based app logs: org.apache.solr.common.SolrException: com.caucho.vfs.SocketTimeoutException: client tim

Re: Solr 3.1 dev

2010-07-23 Thread Yonik Seeley
On Fri, Jul 23, 2010 at 9:33 AM, robert mena wrote: > Hi, > is there any wiki/url of the proposed changes or new features that we should > expect with this new release? You can see what has already gone in by looking at the appropriate CHANGES.txt in subversion. http://svn.apache.org/viewvc/luce

Re: Tree Faceting in Solr 1.4

2010-07-23 Thread Eric Grobler
Hi Erik, Thanks for the fast update :-) I will try it soon. Regards Eric On Fri, Jul 23, 2010 at 2:37 PM, Erik Hatcher wrote: > I've update the SOLR-792 patch to apply to trunk (using the solr/ directory > as the root still, not the higher-level trunk/). > > This one I think is an important one

Re: Tree Faceting in Solr 1.4

2010-07-23 Thread Erik Hatcher
I've updated the SOLR-792 patch to apply to trunk (using the solr/ directory as the root still, not the higher-level trunk/). This one I think is an important one that I'd love to see eventually part of Solr built-in, but the TODOs in TreeFacetComponent ought to be taken care of first, to g

Re: Solr 3.1 dev

2010-07-23 Thread robert mena
Hi, is there any wiki/url of the proposed changes or new features that we should expect with this new release? On Fri, Jul 23, 2010 at 9:20 AM, Yonik Seeley wrote: > On Fri, Jul 23, 2010 at 6:09 AM, Eric Grobler > wrote: > > I have a few questions :-) > > > > a) Will the next release of solr be

Re: Solr 3.1 dev

2010-07-23 Thread Yonik Seeley
On Fri, Jul 23, 2010 at 6:09 AM, Eric Grobler wrote: > I have a few questions :-) > > a) Will the next release of solr be 3.0 (instead of 1.5)? The next release will be 3.1 (matching the next lucene version off of the 3x branch). Trunk is 4.0-dev > b) How stable/mature is the current 3x version?

Re: Replacing text fields with numeric fields for speed

2010-07-23 Thread Peter Karich
Gora, just out of interest: does Apache Bench send different queries, or queries from the logs, or always the same query? If it is always the same query, Solr's cache will kick in and make the response time super small. I would like to find a tool or script where I can send my logfile to solr an
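One way to avoid benchmarking against a warm cache is to replay real queries instead of a single fixed one. A minimal sketch that pulls the parameter string out of request-log lines, assuming lines shaped like `... path=/select params={q=foo&rows=10} hits=3 status=0 QTime=1` (check this against your own logs, including whether values are already URL-encoded). Each extracted string can then be appended to `/select?` and sent by whatever load tool you use. It does not handle parameters that themselves contain `}`, such as local params; `LogQueryExtractor` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogQueryExtractor {
    // Assumed log shape: "... path=/select params={q=foo&rows=10} hits=3 ..."
    private static final Pattern SELECT_PARAMS =
            Pattern.compile("path=/select params=\\{([^}]*)\\}");

    // Returns the raw parameter string of every /select request, in log order.
    static List<String> extract(List<String> logLines) {
        List<String> queries = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = SELECT_PARAMS.matcher(line);
            if (m.find()) queries.add(m.group(1));
        }
        return queries;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "INFO: [] webapp=/solr path=/select params={q=foo&rows=10} hits=3 status=0 QTime=1",
                "INFO: [] webapp=/solr path=/update params={} status=0 QTime=5");
        System.out.println(extract(lines)); // [q=foo&rows=10]
    }
}
```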

Re: Duplicates

2010-07-23 Thread Peter Karich
Pavel, hopefully I now understand your use case :-) but one question: > I need to select always *one* file per folder or > select *only* folders that contain matched files (without files). What do you mean here by 'or'? Do you have 2 use cases or would one of them be sufficient? Because the se

Re: filter query on timestamp slowing query???

2010-07-23 Thread oferiko
I don't specify any sort order, and I do request the score, so it is ordered based on that. My schema consists of these fields: (changing now to tdate) and a typical query would be: fl=id,type,timestamp,score&start=0&q="Coca+Cola"+pepsi+-"dr+pepper"&fq=timestamp:[2010-07-07T00:00:00Z+T

Re: Replacing text fields with numeric fields for speed

2010-07-23 Thread Gora Mohanty
On Fri, 23 Jul 2010 14:44:32 +0530 Gora Mohanty wrote: [...] > From some experiments, I see only a small difference between a > text search on a field, and a numeric search on the corresponding > numeric field. [...] Well, I take that back. Running more rigorous tests with Apache Bench shows a

Re: Delta import processing duration

2010-07-23 Thread Qwerky
I found my problem! It was a bad custom EntityProcessor I wrote. My EntityProcessor wasn't checking for hasNext() on the Iterator from my FileImportDataImportHandler, it was just returning next(). The second bug was that when the Iterator ran out of records it was returning an empty Map (it now r
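The contract Qwerky ran into can be sketched without the DIH classes: `nextRow()` should check `hasNext()` before calling `next()`, and should signal the end of the data by returning `null` rather than an empty map, assuming the DIH convention that only a `null` row ends iteration. `RowSource` is a hypothetical stand-in for a custom EntityProcessor, not the real DIH base class.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class RowSource {
    private final Iterator<Map<String, Object>> rows;

    RowSource(Iterator<Map<String, Object>> rows) {
        this.rows = rows;
    }

    // Returns the next row, or null once the iterator is exhausted: never an empty map,
    // and never a bare next() that would throw NoSuchElementException at the end.
    Map<String, Object> nextRow() {
        return rows.hasNext() ? rows.next() : null;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> data =
                List.of(Map.<String, Object>of("id", 1), Map.<String, Object>of("id", 2));
        RowSource source = new RowSource(data.iterator());
        // A caller loops until it sees null, the way an import loop would.
        for (Map<String, Object> row = source.nextRow(); row != null; row = source.nextRow()) {
            System.out.println(row.get("id"));
        }
    }
}
```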

Solr 3.1 dev

2010-07-23 Thread Eric Grobler
Hi Everyone I have a few questions :-) a) Will the next release of solr be 3.0 (instead of 1.5)? b) How stable/mature is the current 3x version? c) Is LocalSolr implemented? Where can I find a list of new features? d) Is this the correct method to download the latest stable version? svn co htt

Re: Duplicates

2010-07-23 Thread Pavel Minchenkov
Thanks, Peter! I'll try collapsing today. Example (sorry if the table is unformatted):
id | type | prop_1 | | prop_N | folderId
0 | folder | | | |
1 | file | val1 | | valN1 | 0
2 | file | val3 | |
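Without the field collapsing patch, the "one file per folder" case can be approximated on the client. A minimal sketch, assuming results arrive already ranked and carry the folderId from the example table: it keeps the first hit per folder and preserves order. Unlike server-side collapsing (SOLR-236), this cannot fix paging or total counts, because duplicates are removed only after Solr has returned a page. `OnePerFolder` and `Hit` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OnePerFolder {
    record Hit(int id, int folderId) {}

    // Keeps the first (highest-ranked) hit for each folderId, in the original order.
    static List<Hit> collapse(List<Hit> rankedHits) {
        Map<Integer, Hit> firstPerFolder = new LinkedHashMap<>();
        for (Hit h : rankedHits) {
            firstPerFolder.putIfAbsent(h.folderId(), h);
        }
        return new ArrayList<>(firstPerFolder.values());
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit(1, 0), new Hit(2, 0), new Hit(5, 3));
        System.out.println(collapse(hits)); // [Hit[id=1, folderId=0], Hit[id=5, folderId=3]]
    }
}
```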

Re: Solr on iPad?

2010-07-23 Thread Chantal Ackermann
Hi, unfortunately for iPad developers, it seems that it is not possible to use the Spotlight engine through the SDK: http://stackoverflow.com/questions/3133678/spotlight-search-in-the-application Chantal On Fri, 2010-07-23 at 10:16 +0200, Mark Allan wrote: > Hi Stephan, > > On the iPad, as wit

Problem with Pdf, Solr 1.4.1 Cell

2010-07-23 Thread Alessandro Benedetti
Hi all, as I saw in this discussion [1] there were many issues with PDF indexing in Solr 1.4 due to the Tika library (version 0.4). In Solr 1.4.1 the Tika library is the same, so I guess the issues are the same. Could anyone who contributed to the previous thread help me in resolving these issues? I

Replacing text fields with numeric fields for speed

2010-07-23 Thread Gora Mohanty
Hi, One of the things that we were thinking of doing in order to speed up results from Solr search is to convert fixed-text fields (such as values from a drop-down) into numeric fields. The thinking behind this was that searching through numeric values would be faster than searching through text

Re: Duplicates

2010-07-23 Thread Peter Karich
Hi Pavel! The patch can be applied to 1.4. The performance is ok, but in some situations it could be worse than without the patch. For us it works well, but others reported some exceptions (see the patch site: https://issues.apache.org/jira/browse/SOLR-236) > I need only to delete duplicates Co

Re: Solr on iPad?

2010-07-23 Thread Mark Allan
Hi Stephan, On the iPad, as with the iPhone, I'm afraid you're stuck with using SQLite if you want any form of database in your app. I suppose if you wanted to get really ambitious and had a lot of time on your hands you could use Xcode to try and compile one of the open-source C-based DB

Re: Duplicates

2010-07-23 Thread Pavel Minchenkov
Thanks. Does it work with Solr 1.4 (Solr 4.0 is mentioned in the article)? What about performance? I need only to delete duplicates (I don't need the count of duplicates or to select a certain duplicate). 2010/7/23 Peter Karich > Another possibility could be the well known 'field collapse' ;-) > > http://wiki.a

Re: Duplicates

2010-07-23 Thread Peter Karich
Another possibility could be the well known 'field collapse' ;-) http://wiki.apache.org/solr/FieldCollapsing Regards, Peter. > Thanks. > > If I set uniqueKey on the field, can I still save duplicates? > I need to remove duplicates only from search results. The ability to save > duplicates are sho

Re: Tree Faceting in Solr 1.4

2010-07-23 Thread Eric Grobler
Thanks I saw the article, As far as I can tell the trunk archives only go back to the middle of March and the 2 patches are from the beginning of the year. Thus: *These approaches can be tried out easily using a single set of sample data and the Solr example application (assumes current trunk cod

Re: Getting FileNotFoundException with repl command=backup?

2010-07-23 Thread Alexander Rothenberg
Thanks for the info Peter, I think I ran into the same issue some time ago and could not find out why the backup stopped and also got deleted by solr. I decided to stop currently running updates to solr while the backup is running and wrote my own backup handler that simply just copies the index-file
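A minimal sketch of the copy step Alexander describes, assuming updates to Solr are paused for the whole copy (otherwise segment files can change or disappear mid-copy): it recursively copies an index directory with java.nio. `IndexCopy` is hypothetical; the ReplicationHandler's `command=backup` from the thread subject remains the built-in route.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Iterator;
import java.util.stream.Stream;

class IndexCopy {
    // Recursively copies src into dst. Files.walk visits a directory before its
    // contents, so each target directory exists before its files are copied.
    static void copyTree(Path src, Path dst) {
        try (Stream<Path> paths = Files.walk(src)) {
            Iterator<Path> it = paths.iterator();
            while (it.hasNext()) {
                Path p = it.next();
                Path target = dst.resolve(src.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(target);
                } else {
                    Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Usage would be a single call such as `IndexCopy.copyTree(indexDir, backupDir)` once indexing has been stopped, with both paths chosen for your installation.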

Re: Duplicates

2010-07-23 Thread Pavel Minchenkov
Thanks. If I set uniqueKey on the field, can I still save duplicates? I need to remove duplicates only from search results. The ability to save duplicates should remain. 2010/7/23 Erick Erickson > If the field is a single token, just define the uniqueKey on it in your > schema. > > Otherwise, th