Re: Using CSV for indexing ... Remote Streaming disabled

2009-04-16 Thread vivek sar
Any help on this? Could this error be because of something else (not remote streaming issue)? Thanks. On Wed, Apr 15, 2009 at 10:04 AM, vivek sar wrote: > Hi, > >  I'm trying using CSV (Solr 1.4, 03/29) for indexing following wiki > (http://wiki.apache.org/solr/UpdateCSV). I've updated the > sol

Invalid_Date_String on posting XML to the index

2009-04-16 Thread Mark Allan
Hi all, I'm encountering a problem when I try to add records with a date field to the index. The records I'm adding have very little date precision, usually MMDD but some only have year and month, others only have a year. I'm trying to get around this by using a text pattern factory

Re: Invalid_Date_String on posting XML to the index

2009-04-16 Thread Shalin Shekhar Mangar
On Thu, Apr 16, 2009 at 1:20 PM, Mark Allan wrote: > > My thinking is that Solr is trying to add the field directly as '1953' > before doing the text factory stuff and is therefore not in the right format > for indexing. Does that sound like a reasonable assumption and am I missing > something w

Re: Invalid_Date_String on posting XML to the index

2009-04-16 Thread Mark Allan
On 16 Apr 2009, at 9:00 am, Shalin Shekhar Mangar wrote: On Thu, Apr 16, 2009 at 1:20 PM, Mark Allan wrote: My thinking is that Solr is trying to add the field directly as '1953' before doing the text factory stuff and is therefore not in the right format for indexing. Does that sound

Re: Invalid_Date_String on posting XML to the index

2009-04-16 Thread Shalin Shekhar Mangar
On Thu, Apr 16, 2009 at 1:45 PM, Mark Allan wrote: > > Hi, thanks for your prompt reply. I'm a bit confused though - the only way > to do this is a two-step process? > > I have to write code to munge the XML into another document which is > exactly the same except for the format of the Date fiel
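The pre-processing step discussed in this thread could look roughly like the sketch below. Solr's date field expects a full ISO 8601 UTC timestamp such as 1953-01-01T00:00:00Z, which is why a bare '1953' is rejected. The function name to_solr_date, the assumption that the incoming values are compact digit strings (YYYY, YYYYMM or YYYYMMDD), and the choice to pad missing parts with the first month/day are illustrative only and do not come from the thread.

```python
import re
from datetime import date

# Assumed input shapes: "1953", "195307" or "19530704".
_PARTIAL_DATE = re.compile(r"(\d{4})(\d{2})?(\d{2})?")


def to_solr_date(raw: str) -> str:
    """Expand a low-precision date string into Solr's date syntax.

    Missing month/day default to 01 (an arbitrary choice for this sketch).
    """
    match = _PARTIAL_DATE.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"unsupported date value: {raw!r}")
    year = int(match.group(1))
    month = int(match.group(2) or 1)
    day = int(match.group(3) or 1)
    # date() validates the calendar values, e.g. it rejects month 13.
    checked = date(year, month, day)
    return f"{checked.isoformat()}T00:00:00Z"
```

Padding to the start of the period keeps range queries predictable, but it also hides the original precision; if that matters, the raw value may need to be kept in a separate string field.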

OutofMemory on Highlightling

2009-04-16 Thread Gargate, Siddharth
Hi, I am analyzing the memory usage for my Solr setup. I am testing with 500 text documents of 2 MB each. I have defined a field for displaying the teasers and storing 1 MB of text in it. I am testing with just 128 MB maxHeap(I know I should be increasing it but just testing the

Re: Invalid_Date_String on posting XML to the index

2009-04-16 Thread Shalin Shekhar Mangar
On Thu, Apr 16, 2009 at 1:45 PM, Mark Allan wrote: > > Hi, thanks for your prompt reply. I'm a bit confused though - the only way > to do this is a two-step process? > > I have to write code to munge the XML into another document which is > exactly the same except for the format of the Date fiel

Re: using multisearcher

2009-04-16 Thread Brent Palmer
Thanks Hoss. I haven't had time to try it yet, but that is exactly the kind of help I was looking for. Brent Chris Hostetter wrote: : As for the second part, I was thinking of trying to replace the standard : SolrIndexSearcher with one that employs a MultiSearcher. But I'm not very : famili

Stored Document encoding

2009-04-16 Thread AlexxelA
I'm using the DataImportHandler and my database is in "latin1". When I retrieve documents that I have indexed in Solr, they seem to have been converted to UTF-8. Is that normal? Is it possible to store latin1 in Solr? -- View this message in context: http://www.nabble.com/Stored-Document-enc

DataImport, remove doc when marked as deleted

2009-04-16 Thread Ruben Chadien
Hi, I am new to Solr, but have been using Lucene for a while. I am trying to rewrite some old Lucene indexing code using the JDBC DataImport in Solr. My problem: I have entities that can be marked in the db as "deleted". These I don't want to index, and that's no problem when doing a full-imp

Re: solr 1.3 + tomcat 5.5

2009-04-16 Thread andrysha nihuhoid
No there is no such file there. How can i configure more detailed error reporting for this message? 2009/4/15 Shalin Shekhar Mangar : > From the log it seems like there is a solr.xml inside > var/lib/tomcat5/webapps/ which tomcat is trying deploy and failing. Very > strange. You should remove that

Re: OutofMemory on Highlightling

2009-04-16 Thread Otis Gospodnetic
Hi, Have you tried: http://wiki.apache.org/solr/HighlightingParameters#head-2ca22f63cb8d1b2ba3ff0cfc05e85b94898c59cf Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: "Gargate, Siddharth" > To: solr-user@lucene.apache.org > Sent: Thursday

Re: Using CSV for indexing ... Remote Streaming disabled

2009-04-16 Thread Otis Gospodnetic
Hi, Are you absolutely sure you are changing the correct config file? What is the 20090414_1 part in your URL? The name of the core? Be sure to change ITS config (you can get to it from Solr Admin page) and to restart Solr. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Re: truncating indexed docs

2009-04-16 Thread Otis Gospodnetic
Hi, No, you typically truncate (i.e. index first N terms) them while indexing using maxFieldLength setting in solrconfig.xml. You can, however, limit how many characters (or bytes?) to copy when using copyField functionality. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Re: Question on StreamingUpdateSolrServer

2009-04-16 Thread Otis Gospodnetic
Hi, Lots of little things to look at here. You should do lsof as root, and it looks like you aren't doing that. You should double-check Tomcat's maxThreads param in server.xml. You should give Jetty a try. I don't think you said anything about looking at the container's or solr logs and finding

Re: httpclient.ProtocolException using Solrj

2009-04-16 Thread Otis Gospodnetic
I don't think you gain anything on the Solr end of things by using multiple threads if you are already using StreamingUpdateSolrServer. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: vivek sar > To: solr-user@lucene.apache.org > Sent: Th

Re: Field Collapsing Patch

2009-04-16 Thread Otis Gospodnetic
I know of a company that used it, but then determined it was this component that was slowing down their search. They might have modified it some, too, I don't recall now. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Matthew Runo > T

Re: hardware requirements for solr

2009-04-16 Thread Otis Gospodnetic
Roman, This depends on multiple factors - amount of data, type of data/analysis, query rate and query complexity, etc. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Roman Dissertori > To: solr-user@lucene.apache.org > Sent: Wednesday,

Re: Solr Search Error

2009-04-16 Thread vivek sar
Hi, I'm using the Solr 1.4 (03/29 nightly build) and when searching on a large index (40G) I get the same exception as in this thread, HTTP Status 500 - 13724 java.lang.ArrayIndexOutOfBoundsException: 13724 at org.apache.lucene.search.TermScorer.score(TermScorer.java:74) at org.apache.lucene.s

Re: Question on StreamingUpdateSolrServer

2009-04-16 Thread Yonik Seeley
On Wed, Apr 15, 2009 at 7:28 PM, vivek sar wrote:
> lsof at this point usually shows at 1400, but my ulimit is much higher than that.
Could you be hitting a kernel limit?
cat /proc/sys/fs/file-max
cat /proc/sys/fs/file-nr
http://www.netadmintools.com/art295.html
-Yonik http://www.lucidimag
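For reference, /proc/sys/fs/file-nr on Linux holds three numbers: allocated file handles, allocated-but-unused handles, and the system-wide maximum (the same value that file-max reports). A minimal sketch for reading it follows; the helper names are made up for illustration, and it checks only the system-wide kernel limit, not the per-process ulimit.

```python
from typing import NamedTuple


class FileNr(NamedTuple):
    allocated: int  # file handles currently allocated by the kernel
    unused: int     # allocated but not in use
    maximum: int    # system-wide limit (fs.file-max)


def parse_file_nr(text: str) -> FileNr:
    """Parse the whitespace-separated contents of /proc/sys/fs/file-nr."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}")
    allocated, unused, maximum = (int(field) for field in fields)
    return FileNr(allocated, unused, maximum)


def handles_in_use(nr: FileNr) -> int:
    """Handles actually in use system-wide."""
    return nr.allocated - nr.unused


def read_file_nr(path: str = "/proc/sys/fs/file-nr") -> FileNr:
    """Read the live values (Linux only)."""
    with open(path) as handle:
        return parse_file_nr(handle.read())
```

If handles_in_use(read_file_nr()) is close to the maximum field, the kernel limit rather than ulimit would be the likely bottleneck.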

Authentication Error

2009-04-16 Thread Allahbaksh Asadullah
Hi, I have followed the procedure given on this blog to set up Solr. Below is my code. I am trying to index the data, but I am not able to connect to the server and am getting an authentication error. HttpClient client=new HttpClient(); client.getState().setCredentials(new AuthScope("localhost", 80, AuthS

Re: DataImport, remove doc when marked as deleted

2009-04-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
did you try the deletedPkQuery? On Thu, Apr 16, 2009 at 7:49 PM, Ruben Chadien wrote: > Hi > > I am new to Solr, but have been using Lucene for a while. I am trying to > rewrite > some old lucene indexing code using the Jdbc DataImport i Solr, my problem: > > I have Entities that can be marked in

Re: Authentication Error

2009-04-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Apr 16, 2009 at 10:34 PM, Allahbaksh Asadullah wrote: > Hi,I have followed the procedure given on this blog to setup the solr > > Below is my code. I am trying to index the data but I am not able to connect > to server and getting authentication error. > > > HttpClient client=new HttpClien

Re: Authentication Error

2009-04-16 Thread Allahbaksh Asadullah
Thanks Noble.Regards, Allahbaksh 2009/4/16 Noble Paul നോബിള്‍ नोब्ळ् > On Thu, Apr 16, 2009 at 10:34 PM, Allahbaksh Asadullah > wrote: > > Hi,I have followed the procedure given on this blog to setup the solr > > > > Below is my code. I am trying to index the data but I am not able to > connect

Re: Stored Document encoding

2009-04-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
I guess strings are stored by lucene in utf-8 always. BTW As you pass the Object as a String the encoding is lost On Thu, Apr 16, 2009 at 7:37 PM, AlexxelA wrote: > > I'm using the DataImportHandler and my database is in "latin1".  When i > retreive documents that i have indexed in solr they seem

Re: What is QTime a measure of?

2009-04-16 Thread Otis Gospodnetic
Not sure if you got the answer - QTime represents the number of milliseconds it took Solr to execute a search. It does not include the time it takes to send back the response (that depends on its size, network speed...) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
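To see the difference described here, QTime can be compared with the wall-clock time measured by the client. The sketch below assumes the response was requested with wt=json, where QTime sits under responseHeader; the function name and the idea of passing in a client-measured elapsed time are illustrative assumptions.

```python
import json
from typing import Tuple


def qtime_and_overhead(body: str, elapsed_ms: float) -> Tuple[int, float]:
    """Return (QTime, time not covered by QTime), both in milliseconds.

    body       -- raw JSON response text from Solr (wt=json assumed)
    elapsed_ms -- wall-clock time the client measured for the whole request
    """
    header = json.loads(body)["responseHeader"]
    qtime = header["QTime"]
    # Whatever is left over went into writing the response, the network,
    # and client-side parsing. Clamp at zero to absorb timer jitter.
    return qtime, max(elapsed_ms - qtime, 0.0)
```

A large gap between the two numbers usually points at response size or the network rather than at the search itself.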

Re: Phrase Query Issue

2009-04-16 Thread Otis Gospodnetic
Let me second this. People ask for this pretty often. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Erik Hatcher > To: solr-user@lucene.apache.org > Sent: Saturday, April 4, 2009 8:33:46 PM > Subject: Re: Phrase Query Issue > > > On

Re: Spelling Component

2009-04-16 Thread Otis Gospodnetic
Hi, It looks like your spellchecker index did get created (doesn't it get created automatically when Solr starts?), but it looks rather empty. :) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Anoop Atre > To: "solr-user@lucene.apache.o

Garbage Collectors

2009-04-16 Thread David Baker
I have an issue with garbage collection on our solr servers. We have an issue where the old generation never gets cleaned up on one of our servers. This server has a little over 2 million records which are updated every hour or so. I have tried the parallel GC and the concurrent GC. The

Boosting by facets with standard query

2009-04-16 Thread ashokc
I have a query that yields results binned in several facets. How can I boost the results that fall in certain facets over the rest of them that do not belong to those facets? I use the standard query format. Thank you - ashok -- View this message in context: http://www.nabble.com/Boosting-by-fac

Re: Garbage Collectors

2009-04-16 Thread Otis Gospodnetic
Personally, I'd start from scratch: -Xmx -Xms... -server is not even needed any more. If you are not using Java 1.6, I suggest you do. Next, I'd try to investigate why objects are not being cleaned up - this should not be happening in the first place. Is Solr the only webapp running? Ot

Re: NPE creating EmbeddedSolrServer

2009-04-16 Thread Clay Fink
This worked great. Thanks! The only catch is you have to (eventually) call CoreContainer.shutdown(), otherwise the app just hangs. Alexandre Rafalovitch wrote: > > To reply to my own message. > > The following worked starting from scratch (example): > -

Re: Non-linear structure for search and index documents

2009-04-16 Thread Chris Hostetter
: I need index/search words extracted from pdf files with coordinates and page
: number, so I have this structure:
:
: - index the document id
: - a document has many pages
: - a page has many words
: - a word has geometry[w,h,x,y](inside of page)
:
: Is this possible with solr?

Re: Dictionary lookup possibilities

2009-04-16 Thread Chris Hostetter
: For instance, my dictionary holds the following terms:
: 1 - a b c d
: 2 - c d e
: 3 - a b
: 4 - a e f g h
:
: If I put the sentence [a b c d f g h] in as a query, I want to receive
: dictionary items 1 (matching all words a b c d) and 3 (matching words a b)
: as matches
this is a pretty hard

Seattle / PNW Hadoop + Lucene User Group?

2009-04-16 Thread Bradford Stephens
Greetings, Would anybody be willing to join a PNW Hadoop and/or Lucene User Group with me in the Seattle area? I can donate some facilities, etc. -- I also always have topics to speak about :) Cheers, Bradford

Re: special characters in Solr search query.

2009-04-16 Thread Chris Hostetter
: the special characters but the issue is while the document which I am
: going to index contains any of these special characters it is throwing
: query parse exception. Can anyone give pointer over this? Thanks in
your question is kind of vague ... for instance: it seems like you are saying

Re: Garbage Collectors

2009-04-16 Thread David Baker
Otis Gospodnetic wrote: Personally, I'd start from scratch: -Xmx -Xms... -server is not even needed any more. If you are not using Java 1.6, I suggest you do. Next, I'd try to investigate why objects are not being cleaned up - this should not be happening in the first place. Is Solr the

Advice on moving from 1.3 to 1.4-dev or trunk?

2009-04-16 Thread ristretto.rb
Hello, I'm using solr 1.3 with solr.py. We have a basic schema.xml, nothing custom or out of the ordinary. I need the following feature from http://svn.apache.org/repos/asf/lucene/solr/trunk/CHANGES.txt SOLR-911: Add support for multi-select faceting by allowing filters to be tagged
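As a rough illustration of the SOLR-911 feature mentioned here: a filter query is tagged with {!tag=...}, and the facet on the same field excludes that tag with {!ex=...}, so its counts are computed as if the filter were not applied. The sketch below only builds the request parameters; the field name "doctype", the tag name, and the helper name are placeholders, and the selected values are assumed to be simple tokens that need no query escaping.

```python
from typing import List
from urllib.parse import urlencode


def multi_select_params(query: str, field: str, selected: List[str],
                        tag: str = "sel") -> str:
    """Build a query string for multi-select faceting on one field."""
    selected_clause = " OR ".join(f"{field}:{value}" for value in selected)
    params = [
        ("q", query),
        # Tag the filter so the facet below can refer to it.
        ("fq", f"{{!tag={tag}}}{selected_clause}"),
        ("facet", "on"),
        # Exclude the tagged filter when counting this facet.
        ("facet.field", f"{{!ex={tag}}}{field}"),
    ]
    return urlencode(params)
```

The returned string can be appended to the /select URL; urlencode takes care of percent-encoding the braces and colons.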

Re: Solr posts xml

2009-04-16 Thread Chris Hostetter
: I installed Solr on tomcat 6 and whenever I click search it displays the xml
: like I am editing it?
:
: is that normal?
I'm afraid I don't really understand your question ... if you mean you get an XML formatted response when you click the "Search" button on the admin screen, then yes -- tha

Re: Garbage Collectors

2009-04-16 Thread Bryan Talbot
If you're using java 5 or 6 jmap is a useful tool in tracking down memory leaks. http://java.sun.com/javase/6/docs/technotes/tools/share/jmap.html jmap -histo:live will print a histogram of all live objects in the heap. Start at the top and work your way down until you find something susp

Re: Boosting by facets with standard query

2009-04-16 Thread Shalin Shekhar Mangar
On Fri, Apr 17, 2009 at 1:03 AM, ashokc wrote: > > I have a query that yields results binned in several facets. How can I > boost > the results that fall in certain facets over the rest of them that do not > belong to those facets? I use the standard query format. Thank you I'm not sure what yo

Re: Dictionary lookup possibilities

2009-04-16 Thread Shalin Shekhar Mangar
On Fri, Apr 17, 2009 at 3:37 AM, Chris Hostetter wrote: > > this is a pretty hard problem in general ... in my mind i call it the > "longest matching sub-phrase" problem, but i have no idea if it has a real > name. > > the only solution i know of using Lucene is to construct a phrase query > for
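To make the "longest matching sub-phrase" problem concrete, here is a brute-force sketch that runs outside Lucene: it enumerates every contiguous sub-phrase of the query and checks which dictionary entries appear among them. This is a plain in-memory illustration using the example from the original question, not the phrase-query approach the reply goes on to describe, and the function name is invented.

```python
from typing import Dict, List


def matching_entries(dictionary: Dict[int, str], query: str) -> List[int]:
    """Return ids of dictionary entries that occur as a contiguous
    word sequence inside the query."""
    tokens = query.split()
    # Every contiguous sub-phrase of the query: O(n^2) of them.
    sub_phrases = {
        tuple(tokens[start:end])
        for start in range(len(tokens))
        for end in range(start + 1, len(tokens) + 1)
    }
    return sorted(
        entry_id
        for entry_id, phrase in dictionary.items()
        if tuple(phrase.split()) in sub_phrases
    )


terms = {1: "a b c d", 2: "c d e", 3: "a b", 4: "a e f g h"}
print(matching_entries(terms, "a b c d f g h"))  # [1, 3]
```

The quadratic number of sub-phrases is fine for short queries; for long inputs it is the reason this is considered a hard problem in general.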

Faceted Search

2009-04-16 Thread Sajith Weerakoon
Hi all, Can someone of you tell me how to implement a faceted search? Thanks, Regards, Sajith Vimukthi Weerakoon.

The facetd Search

2009-04-16 Thread Sajith Weerakoon
Hi all, I am developing a search tool and it uses solr as the key querying technique. At the moment I have got a very much stable version and I need to enhance the application by introducing a faceted search. I went through the documentation and did some modifications to my code. I could not get a

Re: The facetd Search

2009-04-16 Thread Shalin Shekhar Mangar
On Fri, Apr 17, 2009 at 9:58 AM, Sajith Weerakoon wrote: > Hi all, > > I am developing a search tool and it uses solr as the key querying > technique. At the moment I have got a very much stable version and I need > to > enhance the application by introducing a faceted search. I went through the >

Re: Boosting by facets with standard query

2009-04-16 Thread ashokc
What you indicated here is for a different purpose, is it not? I already do something similar with my 'q'. For example a sample query logged in 'catalina.out' looks like webapp=/search path=/select params={rows=15&start=0&q=(+(content:umts)+OR+(title:umts)^2+OR+(urltext:umts)^2)} when the search

RE: OutofMemory on Highlightling

2009-04-16 Thread Gargate, Siddharth
I tried hl.maxAnalyzedChars=500 but still the same issue. I get OOM for row size 20 only. -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Thursday, April 16, 2009 9:56 PM To: solr-user@lucene.apache.org Subject: Re: OutofMemory on Highlightling Hi,

Re: Boosting by facets with standard query

2009-04-16 Thread Shalin Shekhar Mangar
On Fri, Apr 17, 2009 at 11:32 AM, ashokc wrote: > > What we need is for the white_papers & pdfs to be boosted, but if and only > if such documents are valid results to the search term in question. How > would I write my above 'q' to accomplish that? > Thanks for explaining in detail. Basically,
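One common way to express "boost only if the document already matches" in Lucene query syntax is to make the original query a required clause and add the preferred categories as an optional, boosted clause. The sketch below builds such a string. It is not necessarily what this reply goes on to recommend; the field name "doctype" and the boost value are placeholders, and it assumes the default operator is OR so that the second clause stays optional.

```python
from typing import List


def boosted_query(main_query: str, field: str, values: List[str],
                  boost: int) -> str:
    """Wrap main_query as a required clause and append an optional
    clause that only affects scoring."""
    optional = " OR ".join(f"{field}:{value}" for value in values)
    # "+(...)" must match; the second group only adds to the score.
    return f"+({main_query}) ({optional})^{boost}"


q = boosted_query(
    "content:umts OR title:umts^2 OR urltext:umts^2",
    "doctype",
    ["white_papers", "pdfs"],
    5,
)
print(q)
```

Documents that fail the required clause are never returned, so the boost cannot pull in white papers or PDFs that do not match the search term.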

Re: DataImport, remove doc when marked as deleted

2009-04-16 Thread Ruben Chadien
I have now :-) Thanks , missed that in the Wiki. Ruben On Apr 16, 2009, at 7:10 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote: did you try the deletedPkQuery? On Thu, Apr 16, 2009 at 7:49 PM, Ruben Chadien > wrote: Hi I am new to Solr, but have been using Lucene for a while. I am trying to rewri

CollapseFilter with the latest Solr in trunk

2009-04-16 Thread climbingrose
Hi all, Has anyone tried to use CollapseFilter with the latest version of Solr in trunk? However, it looks like Solr 1.4 doesn't allow calling setFilterList() and setFilter() on one instance of the QueryCommand. I modified the code in QueryCommand to allow this: public QueryCommand setFilterL

Fwd: Advice on moving from 1.3 to 1.4-dev or trunk?

2009-04-16 Thread ristretto.rb
I have built the trunk code as of Revision: 765826 and tried !tag=/!ex= which is what I need to work. And IT WORKS! That's great. Now, is it unwise to release 1.4 into production for this feature (based on my explanation below)? thanks gene -- Forwarded message -- From: ristr