Re: Solr searching performance issues, using large documents

2010-08-05 Thread Lance Norskog
s.  Any > hints? :-)  Thanks! > > -Peter > > On Aug 2, 2010, at 9:01 PM, Lance Norskog wrote: > >> Spanning won't work- you would have to make overlapping mini-documents >> if you want to support this. >> >> I don't know how big the chunks should
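A note on the "overlapping mini-documents" idea above: the following is only a minimal sketch, assuming the large document is split on whitespace and each chunk is indexed as its own Solr document. The function names, the `parent_id` and `body` field names, and the chunk and overlap sizes are all hypothetical; the thread does not settle on a chunk size.

```python
def overlapping_chunks(words, size, overlap):
    """Split a list of words into windows of `size` words.

    Each window shares `overlap` words with the previous one, so a phrase of
    up to `overlap` words that straddles a chunk boundary still appears whole
    in at least one chunk.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def to_mini_documents(doc_id, text, size=1000, overlap=100):
    """Build one dict per chunk; field names are illustrative assumptions."""
    words = text.split()
    return [
        {"id": f"{doc_id}_{n}", "parent_id": doc_id, "body": " ".join(chunk)}
        for n, chunk in enumerate(overlapping_chunks(words, size, overlap))
    ]
```

Each returned dict could then be sent to Solr as a separate document, with `parent_id` used to map a hit back to the original file. A larger overlap catches longer phrases at the cost of a bigger index.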

Re: Support loading queries from external files in QuerySenderListener

2010-08-05 Thread Lance Norskog
committed so it is not available in any release. > > -- > Regards, > Shalin Shekhar Mangar. > -- Lance Norskog goks...@gmail.com

Re: Sharing index files between multiple JVMs and replication

2010-08-05 Thread Lance Norskog
> commented out, all /update* requestHandlers removed, mainIndex locktype of > none, etc. > > And with Solr replication enabled, the Slave seems to hang, or at least report > unusually long time estimates for the current running replication process to > complete. > > > -K

Re: analysis tool vs. reality

2010-08-04 Thread Lance Norskog
010, at 4:43 PM, Justin Lolofie wrote: > >      Hello, > >      I have found the analysis tool in the admin page to be very useful in >      understanding my schema. I've made changes to my schema so that a >      particular case I'm looking at matches properly. I restarted solr, >      deleted the document from the index, and added it again. But still, >      when I do a query, the document does not get returned in the results. > >      Does anyone have any tips for debugging this sort of issue? What is >      different between what I see in analysis tool and new documents added >      to the index? > >      Thanks, >      Justin > -- Lance Norskog goks...@gmail.com

Re: No "group by"? looking for an alternative.

2010-08-04 Thread Lance Norskog
> Mickael. > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/No-group-by-looking-for-an-alternative-tp1022738p1022738.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Lance Norskog goks...@gmail.com

Re: Sharing index files between multiple JVMs and replication

2010-08-03 Thread Lance Norskog
I can fire off a RELOAD script > for > each of my read-only cores. > > -Kelly > > > > > -- Lance Norskog goks...@gmail.com

Re: analysis tool vs. reality

2010-08-03 Thread Lance Norskog
I've made changes to my schema so that a >    particular case I'm looking at matches properly. I restarted solr, >    deleted the document from the index, and added it again. But still, >    when I do a query, the document does not get returned in the results. > >

Re: Solr searching performance issues, using large documents

2010-08-02 Thread Lance Norskog
e spanned separate document chunks? > > Also, what would the optimal size of chunks be? > > Thanks! > > > -Peter > > On Aug 1, 2010, at 7:21 PM, Lance Norskog wrote: > >> Not that I know of. >> >> The DataImportHandler has the ability to create multiple

Re: Multiple solr servers Vs Katta

2010-08-02 Thread Lance Norskog
s each of size around 30-35 GB. All of it is on one > machine and i want to make it searchable. > I can have about 5 solr servers each with 2-3 indexes merged and search on > different shards or use katta. > Please let me know which is the better option. > > Thanks, > karthik

Re: Indexing data on MSSQL failed: Caused by: org.apache.solr.common.SolrException: Error loading class 'com.microsoft.sqlserver.jdbc.SQLServerDriver'

2010-08-02 Thread Lance Norskog
pache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSour > ce.java:128) >        at > org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcD > ataSource.java:363) >        at > org.apache.solr.handler.dataimport.JdbcDataSource.access$300(JdbcData > Source.java:39) >        at > org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.< > init>(JdbcDataSource.java:240) >        ... 11 more > Aug 2, 2010 11:29:25 PM org.apache.solr.update.DirectUpdateHandler2 rollback > INFO: start rollback > Aug 2, 2010 11:29:25 PM org.apache.solr.update.DirectUpdateHandler2 rollback > INFO: end_rollback > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Indexing-data-on-MSSQL-failed-Caused-by-org-apache-solr-common-SolrException-Error-loading-class-com-tp1015137p1017327.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Lance Norskog goks...@gmail.com

Re: Solr searching performance issues, using large documents

2010-08-01 Thread Lance Norskog
or software that implements structured documents. On Sun, Aug 1, 2010 at 2:06 PM, Peter Spam wrote: > Thanks for the pointer, Lance!  Is there an example of this somewhere? > > > -Peter > > On Jul 31, 2010, at 3:13 PM, Lance Norskog wrote: > >> Ah! You're not just high

Re: Solr searching performance issues, using large documents

2010-07-31 Thread Lance Norskog
>>>>> >>>>> body >>>>> >>>>> >>>>> --------- >>>>> >>>>> solrconfig.xml changes: >>>>> >>>>>  2147483647 >>>>>  128 >>>>> >>>>> - >>>>> >>>>> The query: >>>>> >>>>> rowStr = "&rows=10" >>>>> facet = >>>>> "&facet=true&facet.limit=10&facet.field=device&facet.field=ckey&facet.field=version" >>>>> fields = "&fl=id,score,filename,version,device,first2md5,filesize,ckey" >>>>> termvectors = "&tv=true&qt=tvrh&tv.all=true" >>>>> hl = "&hl=true&hl.fl=body&hl.snippets=1&hl.fragsize=400" >>>>> regexv = "(?m)^.*\n.*\n.*$" >>>>> hl_regex = "&hl.regex.pattern=" + CGI::escape(regexv) + >>>>> "&hl.regex.slop=1&hl.fragmenter=regex&hl.regex.maxAnalyzedChars=2147483647&hl.maxAnalyzedChars=2147483647" >>>>> justq = '&q=' + CGI::escape('body:' + fuzzy + p['q'].to_s.gsub(/\\/, >>>>> '').gsub(/([:~!<>="])/,'\1') + fuzzy + minLogSizeStr) >>>>> >>>>> thequery = '/solr/select?timeAllowed=5000&wt=ruby' + (p['fq'].empty? ? '' >>>>> : ('&fq='+p['fq'].to_s) ) + justq + rowStr + facet + fields + termvectors >>>>> + hl + hl_regex >>>>> >>>>> baseurl = '/cgi-bin/search.rb?q=' + CGI::escape(p['q'].to_s) + '&rows=' + >>>>> p['rows'].to_s + '&minLogSize=' + p['minLogSize'].to_s >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> http://karussell.wordpress.com/ >>>> >>> >> > > -- Lance Norskog goks...@gmail.com

Re: Solr searching performance issues, using large documents

2010-07-30 Thread Lance Norskog
>> termvectors = "&tv=true&qt=tvrh&tv.all=true" >> hl = "&hl=true&hl.fl=body&hl.snippets=1&hl.fragsize=400" >> regexv = "(?m)^.*\n.*\n.*$" >> hl_regex = "&hl.regex.pattern=" + CGI::escape(regexv) + >> "&hl.regex.slop=1&hl.fragmenter=regex&hl.regex.maxAnalyzedChars=2147483647&hl.maxAnalyzedChars=2147483647" >> justq = '&q=' + CGI::escape('body:' + fuzzy + p['q'].to_s.gsub(/\\/, >> '').gsub(/([:~!<>="])/,'\1') + fuzzy + minLogSizeStr) >> >> thequery = '/solr/select?timeAllowed=5000&wt=ruby' + (p['fq'].empty? ? '' : >> ('&fq='+p['fq'].to_s) ) + justq + rowStr + facet + fields + termvectors + hl >> + hl_regex >> >> baseurl = '/cgi-bin/search.rb?q=' + CGI::escape(p['q'].to_s) + '&rows=' + >> p['rows'].to_s + '&minLogSize=' + p['minLogSize'].to_s >> >> >> > > > -- > http://karussell.wordpress.com/ > > -- Lance Norskog goks...@gmail.com

Re: DIH : SQL query (sub-entity) is executed although variable is not set (null or empty list)

2010-07-27 Thread Lance Norskog
ich triggers the dataSource.getData() call. > I have overridden the initContext() method setting a go/no go flag that > I am using in the overridden nextRow() to find out whether to delegate > to the superclass or not. > > This way I can also avoid the code that fills the tmp field with an > empty value if there is no value to query on. > > Cheers, > Chantal > > -- Lance Norskog goks...@gmail.com

Re: Indexing Problem: Where's my data?

2010-07-27 Thread Lance Norskog
y for you I suppose. But you may > want to either remove 'name=' or make it match the schema. (and I may be > completely wrong on this, it's been a while since I got DIH going). > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Indexing-Problem-Where-s-my-data-tp1000660p1000843.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Lance Norskog goks...@gmail.com

Re: slave index is bigger than master index

2010-07-27 Thread Lance Norskog
lave nodes, so I >> dont want to effect the live search, while playing with slave nodes' >> indices. >> > > What do you mean here? Optimizing is too CPU expensive? > >> We will be running the indexing on master node today over the night. Lets >> see if it does it again. >> > > Do you mean increase to double size? > -- Lance Norskog goks...@gmail.com

Re: Solr 3.1 and ExtractingRequestHandler resulting in blank content

2010-07-27 Thread Lance Norskog
   inStock:true >     >     >      cat >      manu_exact >      price:[* TO 500] >      price:[500 TO *] >     >   >   > >    textSpell > >     >      default >      name >      ./spellchecker >     >   >   >     >      false >      false >      1 >     >     >      spellcheck >     >   > >   class="org.apache.solr.handler.component.TermVectorComponent"/> >   class="org.apache.solr.handler.component.SearchHandler"> >     >      true >     >     >      tvComponent >     >   >      name="clusteringComponent" >    enable="${solr.clustering.enabled:false}" >    class="org.apache.solr.handler.clustering.ClusteringComponent" > >     >       >      default >       name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm >      20 >     >     >      stc >       name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm >     >   >                    enable="${solr.clustering.enabled:false}" >                  class="solr.SearchHandler"> >     >       true >       default >       true >       >       name >       id >       >       features >       >       true >       >       >       >       false >     >     >      clusteringComponent >     >   > >   class="org.apache.solr.handler.extraction.ExtractingRequestHandler" > startup="lazy"> >     >      text >      true >      ignored_ > >       >      true >      links >      ignored_ >     >   > > >   class="org.apache.solr.handler.component.TermsComponent"/> > >   class="org.apache.solr.handler.component.SearchHandler"> >     >      true >     >     >      termsComponent >     >   >   >     >    string >    elevate.xml >   > >   >   >     >      explicit >     >     >      elevator >     >   >   > > >   class="solr.BinaryUpdateRequestHandler" /> > >   class="solr.DocumentAnalysisRequestHandler" /> >   class="solr.FieldAnalysisRequestHandler" /> >   startup="lazy" /> >   class="org.apache.solr.handler.admin.AdminHandlers" /> >   >     >      standard >      solrpingquery >      all >     >  
 > >   >     >     explicit >     true >     >   >   >   >   default="true"> >     >     100 >     >   > >   >     >       >      70 >       >      0.5 >       >      [-\w ,/\n\"']{20,200} >     >   > >   default="true"> >     >     >     >     >   > >   class="org.apache.solr.highlight.SimpleFragListBuilder" default="true"/> > >   class="org.apache.solr.highlight.SingleFragListBuilder"/> > >   class="org.apache.solr.highlight.MultiColoredScoreOrderFragmentsBuilder" > default="true"/> >   >   >   >    5 >   >   >    solr >   > > > > Test1.txt document: > Asdf > Asdf > Asdf > Adsf > > Upload command: > curl > "http://localhost:8080/solr/update/extract?literal.id=123&uprefix=attr_&fmap.content=attr_content&commit=true"; > -F "myfi...@test1.txt” > > RESULTS from an id:[* TO *] query: > > − > > 0 > 91 > − > > > *,score > on > 0 > id:[* TO *] > > standard > standard > > 10 > 2.2 > > > − > > − > > 1.0 > − > >         > > − > > text/plain > > − > > test1.txt > > − > > 24 > > − > > myfile > > − > > text/plain > > 123 > > > > > Note that the attr_content section of the response is blank.  Any help & > hints would be GREATLY appreciated…=) > > Best, > Dave > -- Lance Norskog goks...@gmail.com

Re: Tika, Solr running under Tomcat 6 on Debian

2010-07-27 Thread Lance Norskog
va:289) >        at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414) > Caused by: java.lang.ClassNotFoundException: > org.apache.solr.handler.extraction.ExtractingRequestHandler >        at java.net.URLClassLoader$1.run(URLClassLoader.java:202) >        at java.security.

Re: Using hl.regex.pattern to print complete lines

2010-07-21 Thread Lance Norskog
> >>>> Peter: i haven't looked at the code, but i expect that the problem is that >>>> the java regex engine isn't being used in a way that makes ^ and $ match >>>> any line boundary -- they are probably only matching the start/end of the >>>> field (and . is probably only matching non-newline characters) >>>> >>>> java regexes support embedded flags (ie: "(?xyz)your regex") so you might >>>> try that (i don't remember what the correct modifier flag is for the >>>> multiline mode off the top of my head) >>>> >>>> -Hoss >>>> >>> >> > > -- Lance Norskog goks...@gmail.com
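A note on the embedded-flag suggestion above: in java.util.regex the multiline flag is `(?m)` and the dot-matches-newline flag is `(?s)`. Python's `re` module uses the same two letters, so the behaviour can be sketched there. This only illustrates the regex semantics; whether the Solr highlighter passes such flags through unchanged is not confirmed in this thread, and the sample text is made up.

```python
import re

# Hypothetical three-line log excerpt.
text = "first line\nERROR second line\nthird line"

# Without flags, ^ and $ anchor only at the ends of the whole string and
# . does not match a newline, so nothing matches.
no_flag = re.findall(r"^.*ERROR.*$", text)

# With (?m), ^ and $ also match at every line boundary, so the single
# line containing the term is returned.
one_line = re.findall(r"(?m)^.*ERROR.*$", text)

# A pattern with explicit \n, like the one quoted elsewhere in this thread
# list, captures the matching line plus one line of context on each side.
three_lines = re.search(r"(?m)^.*\n.*ERROR.*\n.*$", text)
```

Here `no_flag` is empty, `one_line` holds only the middle line, and `three_lines.group(0)` is the full three-line excerpt.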

Re: Count hits per document?

2010-07-21 Thread Lance Norskog
rch for "foo", I get back a list of documents.  Any way to get a > per-document hit count?  Thanks! > > > -Pete > -- Lance Norskog goks...@gmail.com

Re: Dismax query response field number

2010-07-21 Thread Lance Norskog
nt^1.1 title^1.5 >     >     >        text^0.2 content^1.1 title^1.5 >     >     >        recip(price,1,1000,1000)^0.3 >     >     >        2<-1 5<-2 6<90% >     >     100 >     *:* >     >     text features name >     >     0 >     >     name >     regex >     >   > > > -- Lance Norskog goks...@gmail.com

Re: setting up schema (newbie question)

2010-07-20 Thread Lance Norskog
estions on how to proceed?  My first thought is that I should set up > two SOLR instances, one for indexing only attributes, and one for the > documents themselves. > > Thanks in advance for any help. > > cheers, > > Travis > -- Lance Norskog goks...@gmail.com

Re: indexing best practices

2010-07-20 Thread Lance Norskog
and 2.5 million largish (20 to 30 fields, a couple html text > fields) that get updated monthly. It currently takes about 20 hours to do a > full import. I would like to cut that down as much as possible. > Thanks, > Ken > -- > View this message in context: > http://lucene.472066.n3.nabble.com/indexing-best-practices-tp973274p976313.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Lance Norskog goks...@gmail.com

Re: Spatial filtering

2010-07-19 Thread Lance Norskog
> *:* >                    [meas] => hsin >                    [pt] => 48.85341,2.3488 >                    [bf] => >                    [qt] => standard >                    [fq] => +object_type:Concert +date:[2010-07-19T00:00:00Z > TO 2011-07-19T23:59:59Z] +

Re: SOLR 1.4.1 - Issue with recognition of solr.solr.home system property

2010-07-18 Thread Lance Norskog
ur dataDir config says to use > ${solr.data.dir}.  you can modify your solrconfig.xml to make (almost) > *anything* configurable at runtime... > > http://wiki.apache.org/solr/SolrConfigXml#System_property_substitution > > > -Hoss > > -- Lance Norskog goks...@gmail.com

Re: Spellcheck help

2010-07-18 Thread Lance Norskog
terms that present this behavior, but it is important for > me to get rid of this bug. So can i use the dictionary AND the list built > by the spellchecker? > > -Original Message- From: Lance Norskog > Sent: Sunday, July 18, 2010 1:42 AM > To: solr-user@lucene.apache.org

Re: HTTP ERROR: 500 - java.lang.ArrayIndexOutOfBoundsException

2010-07-17 Thread Lance Norskog
> >>     >>     >>     >>     >>     >> ** >> * >> >> > > Tokenized field is one of multiValued type fields since > multiple tokens (values) are generated in that field by > tokenizer. > > Koji > > -- > http://www.rondhuit.com/en/ > > -- Lance Norskog goks...@gmail.com

Re: Spellcheck help

2010-07-17 Thread Lance Norskog
classic, whitespace tokenizer, with > lowercase filter... Any help would be greatly appreciated :) Thanks, Marc > -- Lance Norskog goks...@gmail.com

Re: HTTP ERROR: 500 - java.lang.ArrayIndexOutOfBoundsException

2010-07-16 Thread Lance Norskog
s I add "sort=first+desc" parameter to the select clause, it throws > ArrayIndexOutOfBound exception. Please suggest if I am missing anything. > > http://localhost:8983/solr/select?q=girish&start=0&indent=on&wt=json&sort=first+desc > > I have close to 1 million records indexed. > > Thanks > Girish > > > -- Lance Norskog goks...@gmail.com

Re: SOLR Search Query : Exception : Software caused connection abort

2010-07-16 Thread Lance Norskog
> http://lucene.472066.n3.nabble.com/SOLR-Search-Query-Exception-Software-caused-connection-abort-tp969444p969444.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Lance Norskog goks...@gmail.com

Re: limiting the total number of documents matched

2010-07-16 Thread Lance Norskog
vancy documents will be interspersed with very low >>> relevancy documents. I'd like to set a limit to the 1000 most relevant >>> documents, then sort those by title. >>> >>> Is there a way to do this? >>> >>> I guess I could always retrieve the top 1000 documents and sort them >>> in the client, but that seems particularly inefficient. I can't find >>> any other way to do this, though. >>> >>> Thanks, >>> Paul >>> >> > -- Lance Norskog goks...@gmail.com

Re: indexing rich documents

2010-07-16 Thread Lance Norskog
> here i attach u my solrconfig , tika config, schema files... if der r any > wrong tell me > -- Lance Norskog goks...@gmail.com

Re: Supplementing already indexed data

2010-07-12 Thread Lance Norskog
d metadata.  I know this can be done but I just can't seem to > dig up the correct docs, can anyone point me in the right direction? > > > Thanks. > -- Lance Norskog goks...@gmail.com

Re: indexing with pdf files problem

2010-07-12 Thread Lance Norskog
ctingRequestHandler.java:76) >    at > org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244) >    ... 16 more > Caused by: java.lang.NullPointerException >    at > org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.ja

Re: Realtime + Batch indexing

2010-07-08 Thread Lance Norskog
I am planning to implement the 2nd approach as we need to make > changes to the UI code if we are going for shards. > > Thanks, > BB > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Realtime-Batch-indexing-tp952293p953410.html > Sent from

Re: Filter multivalue fields from search result

2010-07-08 Thread Lance Norskog
--+++ > > so the query for q=name:Microsoft town:Leeds returns docs 1 & 3. > > How would I remove London/Glasgow from doc 1 and Birmingham from doc 3? > > Or is it that I should create separate doc for each name-event? > > Thanks, > Alex > -- Lance Norskog goks...@gmail.com

Re: Realtime + Batch indexing

2010-07-08 Thread Lance Norskog
http://lucene.472066.n3.nabble.com/Realtime-Batch-indexing-tp952293p952293.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Lance Norskog goks...@gmail.com

Re: DIH batch job

2010-07-08 Thread Lance Norskog
lr Cell module? > > > >  It would be great if you could provide guidance. > > > > Thanks, > > Sanjeev Kakar > > > > -- Lance Norskog goks...@gmail.com

Re: year range field, proper data type?

2010-07-07 Thread Lance Norskog
of some type, to >> efficiently accomodate the range querries. >> >> It seems to me that I probably don't need/want an actual date field, since >> the data isn't complex to demand it, it's just a four-digit year. >> >> So that pretty much leaves storing as a trie integer, or as a trie string. >>   Any advice on which is probably better in this case? Or on how to set up >> the trie field for this kind of data? Thanks for any, >> >> Jonathan >> > -- Lance Norskog goks...@gmail.com

Re: Handling Updates

2010-07-07 Thread Lance Norskog
e delta import but this looks like another bulk process, > I'm looking for more of a semi-realtime way to ask for a specific item to be > reindexed.  Should I move away from the db query approach for that? > > Thanks. > -- Lance Norskog goks...@gmail.com

Re: Per-user results sets

2010-07-07 Thread Lance Norskog
I am wondering if Solr/Lucene can help improve my existing search engine. > > I would like to have different results for each user - but still have > relevant results. Each user would have different score multipliers for > each searchable item. > > Is this something possible? > >

Re: index format error because disk full

2010-07-07 Thread Lance Norskog
andler it may handle the rollback for > you...again, however, it may not deal with disk full situations gracefully > either. > -- > View this message in context: > http://lucene.472066.n3.nabble.com/index-format-error-because-disk-full-tp948249p948968.html > Sent from the Solr -

Re: ClassCastException SOLR

2010-07-07 Thread Lance Norskog
> which is > the same file I get an exception. > > > I'm using the same dependencies as SOLR 1.4.1, because it caused problems > with newer versions of lucene-core. > > Is there some step in plugging in new Filters I forgot? > > I am grateful for any suggestions or advice. > > Thank you, > > Martin > > -- Lance Norskog goks...@gmail.com

Re: document level security: indexing/searching techniques

2010-07-06 Thread Lance Norskog
a fairly well-bounded list of "terms" for an OR query against the >> "acl-groups" field in each file/project document. Just don't forget to set >> the boost to 0 for that portion of the query :) >> >> -- Ken >> >> >> Ken Krugler >> +1 530-210-6378 >> http://bixolabs.com >> e l a s t i c   w e b   m i n i n g >> >> >> >> >> > -- Lance Norskog goks...@gmail.com

Re: general debugging techniques?

2010-07-06 Thread Lance Norskog
at 1:14 PM, Jim Blomo wrote: > On Sat, Jul 3, 2010 at 1:10 PM, Lance Norskog wrote: >> You don't need to optimize, only commit. > > OK, thanks for the tip, Lance.  I thought the "too many open files" > problem was because I wasn't optimizing/merging frequently e

Re: general debugging techniques?

2010-07-03 Thread Lance Norskog
pache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120) >>         at >> org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1306) >>         at >> org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1570) >>         at >> org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1579) >>         at >> org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1559) >>         at >> java.lang.Thread.run(Thread.java:619) >> Jul 3, 2010 1:32:20 AM >> org.apache.solr.update.processor.LogUpdateProcessor finish >> > -- Lance Norskog goks...@gmail.com

Re: Disk usage per-field

2010-07-03 Thread Lance Norskog
> -- > "Good Enough" is not good enough. > To give anything less than your best is to sacrifice the gift. > Quality First. Measure Twice. Cut Once. > http://www.israelekpo.com/ > -- Lance Norskog goks...@gmail.com

Re: upload PDF using curl

2010-07-03 Thread Lance Norskog
you it can't find a > command named "commit" to execute -- that's a dead give away that > something about how you are running curl in your shell doesn't like the > quotes you used arround the URL -- it's splitting on the "&" character. > > i don't have a windows box to test on, but perhaps single quotes will work > better? > > > > > -Hoss > > -- Lance Norskog goks...@gmail.com

Re: REST calls

2010-07-03 Thread Lance Norskog
: solr-user@lucene.apache.org >> Subject: Re: REST calls >> >> Solr's APIs are described as "REST-like", and probably do qualify as >> "restful" the way the term is commonly used. >> >> I'm personally much more interested in making our APIs more powerful >> and easier to use, regardless of any REST purity tests. >> >> -Yonik >> http://www.lucidimagination.com > > -- Lance Norskog goks...@gmail.com

Re: REST calls

2010-06-30 Thread Lance Norskog
I've looked at the problem. It's fairly involved. It probably would take several iterations. (But not as many as field collapsing :) On Wed, Jun 30, 2010 at 2:11 PM, Yonik Seeley wrote: > On Wed, Jun 30, 2010 at 4:55 PM, Lance Norskog wrote: >>  Apparently this is not ReStFuL

Re: OOM on uninvert field request

2010-06-30 Thread Lance Norskog
edField faceting, the fieldType won't matter much at > all for the space it takes up. > > The key here is that it looks like the number of unique terms in these > fields is low - you would probably do much better with > facet.method=enum (which iterates over terms rather than documents). > > -Yonik > http://www.lucidimagination.com > -- Lance Norskog goks...@gmail.com
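A note on the `facet.method=enum` suggestion above: the following is a minimal sketch of building such a request URL with the Python standard library. `facet`, `facet.field`, `facet.method` and `rows` are standard Solr request parameters; the host, the field names and the helper function itself are assumptions for illustration only.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, query, facet_fields, method="enum"):
    """Build a Solr select URL that facets on the given fields.

    method="enum" asks Solr to iterate over the terms of each field rather
    than over documents, which suits fields with few unique values.
    """
    params = [
        ("q", query),
        ("rows", "0"),            # only the facet counts are wanted here
        ("facet", "true"),
        ("facet.method", method),
    ]
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "?" + urlencode(params)


# Hypothetical host and field names.
url = facet_query_url("http://localhost:8983/solr/select", "*:*", ["device", "version"])
```

Passing `method="fc"` instead would request the field-cache approach, which is where the uninverted-field memory cost discussed in this thread comes from.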

Re: Multiple Solr servers and a shared index vs master+slaves

2010-06-30 Thread Lance Norskog
> store those snapshots, so we'd be pulling it over the wire only to write it > right next to the original index.  If we didn't have these HA clustering > mechanisms available already, then I'm sure I'd be much more willing to look > at a Solr master+slave architecture.  But since we do, it seems like I'm a > little bit hamstrung to use Solr's mechanisms anyway.  So, that's my > scenario, comments welcome.  :) > >  -dKt > > > > -- Lance Norskog goks...@gmail.com

Re: REST calls

2010-06-30 Thread Lance Norskog
how efficient and yet simple > SOLR's (and Lucene's) query and response language (incl. response > formats) is. Some things seem complex/difficult at first (like dismax or > function queries) but turn out to be simple/easy to use considering the > complexity of the problems they solve. > > Chantal > > -- Lance Norskog goks...@gmail.com

Re: Unbuffered Exception while setting permissions

2010-06-30 Thread Lance Norskog
e.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:487) >>>> at >>>> org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114) >>>> at >>>> org.apache.commons.httpclient.Ht

Re: Cache hits exposed by API

2010-06-29 Thread Lance Norskog
com/Cache-hits-exposed-by-API-tp930602p930696.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Lance Norskog goks...@gmail.com

Re: Very basic questions: Indexing text - working, but slow!

2010-06-29 Thread Lance Norskog
ible >> >> until you force the SOLR reader to reopen. >> >> >> >> HTH >> >> Erick >> >> >> >> On Mon, Jun 28, 2010 at 6:49 PM, Peter Spam wrote: >> >> >> >>> On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote: >> >>> >> >>>>> 1) I can get my docs in the index, but when I search, it >> >>>>> returns the entire document.  I'd love to have it only >> >>>>> return the line (or two) around the search term. >> >>>> >> >>>> Solr can generate Google-like snippets as you describe. >> >>>> http://wiki.apache.org/solr/HighlightingParameters >> >>> >> >>> Here's how I commit my documents: >> >>> >> >>> J=0; >> >>> for i in `find . -name \*.txt`; do >> >>>      (( J++ )) >> >>>      curl "http://localhost:8983/solr/update/extract?literal.id=doc$J"; >> >>> -F "myfi...@$i"; >> >>> done; >> >>> >> >>> echo "- Committing" >> >>> curl "http://localhost:8983/solr/update/extract?commit=true"; >> >>> >> >>> >> >>> Then, I try to query using >> >>> >> http://localhost:8983/solr/select?rows=10&start=0&fl=*,score&hl=true&q=testing >> >>> but I only get back the document ID rather than the snippet: >> >>> >> >>> >> >>> 0.05030759 >> >>> >> >>> text/plain >> >>> >> >>> doc16 >> >>> >> >>> >> >>> I'm using the schema.xml from the "lucid imagination: Indexing text and >> >>> html files" tutorial. >> >>> >> >>> >> >>> >> >>> -Pete >> >>> >> > >> >> > -- Lance Norskog goks...@gmail.com

Re: Faceted search outofmemory

2010-06-29 Thread Lance Norskog
this but I could not find the answer. >> How can we know the required memory when facets are used so that I try to >> scale my server/index correctly to handle it. >> >> Thanks >> >> Olivier >> > -- Lance Norskog goks...@gmail.com

Re: unknown handler dataimport

2010-06-29 Thread Lance Norskog
> > Jun 28, 2010 8:52:32 PM org.apache.solr.handler.dataimport.DataImporter > loadDataConfig > > INFO: Data Configuration loaded successfully > > > > > > When I go to > http://localhost:8983/solr/admin/dataimport.jsp?handler=/dataimport, on the > right side, I see t

Re: one to many denormalization approach

2010-06-29 Thread Lance Norskog
> no of years. I was thinking I could use a dynamic field, *_skill, and > possibly add them like so: > > 1_skill: Ruby, 2_skill: Java > > But how can I index the years experience? would I then add a dynamic field > like: > > 1_skill_years: 5, 2_skill_years: 9 > > > How would i fit these into the index? > Any help greatly appreciated? > > Regards > -- Lance Norskog goks...@gmail.com

Re: REST calls

2010-06-29 Thread Lance Norskog
is used. > > Am I doing something wrong or is Solr not truly completely RESTful? > > thanks, > > > Jason > -- Lance Norskog goks...@gmail.com

Re: OOM on uninvert field request

2010-06-29 Thread Lance Norskog
at > org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:839) >        at > org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:250) >        at > org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:283) >        at > org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:166) > -- Lance Norskog goks...@gmail.com

Re: DIH and dynamicField

2010-06-23 Thread Lance Norskog
: > > solrconfig.xml > > > data-config.xml > > > Hope this helps. > > - Robert Zotter > -- > View this message in context: > http://lucene.472066.n3.nabble.com/DIH-and-dynamicField-tp917823p918189.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Lance Norskog goks...@gmail.com

Re: Setting up Eclipse with merged Lucene Solr source tree

2010-06-23 Thread Lance Norskog
boolean) is undefined for the type >> DocumentBuilderFactory DataImporter.java >> >> /Solr/contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport >> line >> The method setXIncludeAware(boolean) is undefined for the type Object >> TestXIncludeConfig.java /Solr/src/test/org/apache/solr/core line 32 >> >> Is this the correct way to setup eclipse after the source tree merge? >> >> Thanks in advance >> Ukyo >> > -- Lance Norskog goks...@gmail.com

Re: Field missing when use distributed search + dismax

2010-06-22 Thread Lance Norskog
he result only have "ID". The field "type" > disappeared. I need that "type" to know what the "ID" refer to. Why solr > "eat" my "type"? > > > Thanks. > Regards. > Scott > -- Lance Norskog goks...@gmail.com

Re: OOM on sorting on dynamic fields

2010-06-22 Thread Lance Norskog
No, this is basic to how Lucene works. You will need larger EC2 instances. On Mon, Jun 21, 2010 at 2:08 AM, Matteo Fiandesio wrote: > Compiling solr with lucene 2.9.3 instead of 2.9.1 will solve this issue? > Regards, > Matteo > > On 19 June 2010 02:28, Lance Norskog wrote

Re: Mr Lance : customize the search algorithm of solr

2010-06-22 Thread Lance Norskog
for your > assistance in this work of mine. > > If any part of this mail was not clear to you then plz lemme know, i will > expain that you. > > Regards > > -sarfaraz > > -- Lance Norskog goks...@gmail.com

Re: Indexing HTML files in SOLR

2010-06-19 Thread Lance Norskog
s. > It will be great if u answer my question : > Is there any better approach to achieve the same functionality ? > > Regards, > Siddharth > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Indexing-HTML-files-in-SOLR-tp896530p902644.html >

Re: federated / meta search

2010-06-19 Thread Lance Norskog
search :: http://search-lucene.com/ > > > > - Original Message >> From: Lance Norskog >> To: solr-user@lucene.apache.org >> Sent: Fri, June 18, 2010 8:16:46 PM >> Subject: Re: federated / meta search >> >> Yes, you can do this. You need to have

Re: SolrQuery and escaping special characters

2010-06-18 Thread Lance Norskog
id doesn't do "query parser > escaping" ... mainly because it has no way of knowing which query parser > you are using. > > > -Hoss > > -- Lance Norskog goks...@gmail.com

Re: solr indexing takes a long time and is not reponsive to abort command

2010-06-18 Thread Lance Norskog
t to abort the process doesn’t really work. Does > anyone know what’s happening here? Thanks! > > Wen > -- Lance Norskog goks...@gmail.com

Re: customize the search algorithm of solr

2010-06-18 Thread Lance Norskog
and still allow me to use all > the rest of the features of solr. > > > -- Lance Norskog goks...@gmail.com

Re: OOM on sorting on dynamic fields

2010-06-18 Thread Lance Norskog
before starting a new > development, we want to be sure that we are not doing anything wrong > in the solr configuration or in the index generation. > > Any help would be appreciated. > Regards, > Matteo > -- Lance Norskog goks...@gmail.com

Re: federated / meta search

2010-06-18 Thread Lance Norskog
index. With two indexes from two sources, the terms in the documents will not have the same "fingerprint". Relevance scores from one shard will not match the meaning of a document's score in the other shard. There is a project to make this work in Solr, but it is not nearly finished.
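A minimal sketch of why scores from separately built indexes are not comparable, assuming the classic Lucene inverse-document-frequency formula, idf = 1 + ln(numDocs / (docFreq + 1)). The shard sizes and document frequencies below are made-up numbers for illustration only; they are not from this thread.

```python
import math

def classic_idf(doc_freq, num_docs):
    """Inverse document frequency in the classic Lucene style:
    1 + ln(num_docs / (doc_freq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical shards: the same term is rare in shard A and common in shard B.
idf_shard_a = classic_idf(doc_freq=1_000, num_docs=1_000_000)
idf_shard_b = classic_idf(doc_freq=5_000, num_docs=10_000)

# The same term carries a very different weight in each shard, so a score
# computed in shard A does not mean the same thing as a score from shard B.
print(idf_shard_a, idf_shard_b)
```

Because each shard computes the weight from its own statistics, merging result lists by raw score mixes two different scales.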

Re: MappingCharFilterFactory equivalent for use after tokenizer?

2010-06-18 Thread Lance Norskog
> Is there a token filter which do the same job as >>> MappingCharFilterFactory but after tokenizer, reading the >>> same config file? >> >> No, closest thing can be PatternReplaceFilterFactory. >> >> http://lucene.apache.org/solr/api/org/apache/solr/analysis/PatternReplaceFilterFactory.html >> >> >> > > -- Lance Norskog goks...@gmail.com

Re: Indexing HTML files in SOLR

2010-06-16 Thread Lance Norskog
n3.nabble.com/Indexing-HTML-files-in-SOLR-tp896530p896530.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Lance Norskog goks...@gmail.com

Re: dealing with dash chars in fields when using dismax

2010-06-13 Thread Lance Norskog
ning how it works in google and are > starting to just try it out when they are doing searches. > > What I might end up doing though is not escape dashes only in specific cases: > foo-bar (escape) > foo - bar (escape) > foo -bar (not escape, aka prohibit bar) > > This should enable power users and should rarely hit non power users. > > regards, > Lukas Kahwe Smith > m...@pooteeweet.org > > > > -- Lance Norskog goks...@gmail.com
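The three cases listed in the quoted message can be sketched as a small client-side preprocessing step. This is only one possible reading of the rule, not code from the thread: the function name `escape_dashes` is hypothetical, and it assumes a dash counts as the prohibit operator only when it follows whitespace (or starts the query) and is directly followed by a non-space character. Dashes that already carry a backslash are left alone.

```python
import re

# First alternative: a dash acting as the prohibit operator (kept as-is).
# Second alternative: any other dash that is not already escaped (captured).
_DASH = re.compile(r"(?<!\S)-(?=\S)|(?<!\\)(-)")

def escape_dashes(query):
    """Backslash-escape dashes except when they look like the prohibit operator."""
    def replace(match):
        # group(1) is set only when the second alternative matched.
        return "\\-" if match.group(1) else match.group(0)
    return _DASH.sub(replace, query)

for example in ["foo-bar", "foo - bar", "foo -bar"]:
    print(repr(example), "->", repr(escape_dashes(example)))
```

The escaped string would then be passed as the q parameter; whether this matches what a given query parser expects should be checked against that parser.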

Re: Solr DataConfig / DIH Question

2010-06-12 Thread Lance Norskog
Is there a cleaner / better way of handling these type of relationships?   > I've also tried to specify a default in the Solr schema, but that seems to > only work after all the data is indexed which makes sense but surprised me > initially.  BTW, thanks for the great DIH tutorial on the wiki! > > Thanks! > Charles > -- Lance Norskog goks...@gmail.com

Re: MoreLikeThis and dynamicField

2010-06-12 Thread Lance Norskog
:06 PM, Peter Karich wrote: >> >>> Hi, >>> >>> it seems to me that the MoreLikeThis component doesn't work for dynamic >>> fields. Is that correct? >>> And it also doesn't work for fields which are indexed but not stored, >>> right? e.g. 'text' where dynamic fields could be copied to. >>> >>> Or did I create an incorrect example? >>> >>> Regards, >>> Peter. >>> >>> -- >>> http://karussell.wordpress.com/ >>> >>> >>> >> >> >> > > > -- > http://karussell.wordpress.com/ > > -- Lance Norskog goks...@gmail.com

Re: Request log does not show QTime

2010-06-11 Thread Lance Norskog
app=/solr path=/select/ params={...} hits=4587 status=0 QTime=19 > > > I have read a lot of the documentation on Solr logging and SLF4J, but could > not figure it out from those. > -- Lance Norskog goks...@gmail.com

Re: Solr Delta index questions

2010-06-11 Thread Lance Norskog
rohibited. If you receive this e-mail in error, please notify the sender > by phone or > email immediately and delete it! > > *** > > -- Lance Norskog goks...@gmail.com

Re: Solr spellcheck config

2010-06-11 Thread Lance Norskog
llChecker* > > >  default >  text >  ./spellchecker > > > And I want to specify the dynamic field "*_text" as the field option: > > multiValued="true" indexed="true"> > > How it can be done? > > Thanks, Bogdan > > -- > Bogdan Gusiev. > agre...@gmail.com > -- Lance Norskog goks...@gmail.com

Re: Index search optimization for fulltext remote streaming

2010-06-11 Thread Lance Norskog
threads). Can you suggest me if I > can optimize the process changing any of these configurations? > > with regards, > Danyal Mark > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Index-search-optimization-for-fulltext-remote-streaming-tp828274p881809.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Lance Norskog goks...@gmail.com

Re: ranking question

2010-06-11 Thread Lance Norskog
ing solr1.4 and it seems it does not support sort by function. > > How can this be achieved > > I tried using >  q=(query)^w0 (_val_:field1)^w1 (_val_:field2...)^w2 > > it adds more computations using querynorms > > any suggestions, ideas > > thanks in anticipation > umar > -- Lance Norskog goks...@gmail.com

Re: MoreLikeThis and dynamicField

2010-06-11 Thread Lance Norskog
re dynamic fields could be copied to. > > Or did I create an incorrect example? > > Regards, > Peter. > > -- > http://karussell.wordpress.com/ > > -- Lance Norskog goks...@gmail.com

Re: Indexing HTML

2010-06-10 Thread Lance Norskog
I use the > HTMLStripCharFilterFactory? > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Indexing-HTML-tp884497p885797.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Lance Norskog goks...@gmail.com

Re: Indexing HTML

2010-06-09 Thread Lance Norskog
te  html? Thanks > > > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Indexing-HTML-tp884497p884497.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Lance Norskog goks...@gmail.com

Re: Diagnosing solr timeout

2010-06-09 Thread Lance Norskog
? Is there another way >> to get that info? >> >> Also, I was suspecting GC myself. So, if it is the problem, what do I >> do about it? It seems like increasing RAM might make the problem worse >> because it would wait longer to GC, then it would have more to do. > > -- Lance Norskog goks...@gmail.com

Re: Need help with document format

2010-06-09 Thread Lance Norskog
ti valued fields so if I do it the way Israel >>> suggested it, I will have tons of records. Do you think it will be >>> better if I did this instead ? >>> >>> >>> >>>  123 >>>  tony >>>  marjo >>>  Google_StartDate_EndDate >&

Re: Index-time vs. search-time boosting performance

2010-06-09 Thread Lance Norskog
>> But I think the most important for this date-influenced use case is: >> >> >> >> "Indexing time boosts are preprocessed for storage efficiency and >> written >> >> to >> >> the directory (when writing the document) in a single byte (!)" >> >> >> >> If you do this as an index-time boost, your boosts will lose lots of >> >> precision for this reason. >> >> >> >> -- >> >> Robert Muir >> >> rcm...@gmail.com >> >> >> > >> > >> > >> > -- >> > Asif Rahman >> > Lead Engineer - NewsCred >> > a...@newscred.com >> > http://platform.newscred.com >> > >> >> >> -- >> Lance Norskog >> goks...@gmail.com >> > > > > -- > Asif Rahman > Lead Engineer - NewsCred > a...@newscred.com > http://platform.newscred.com > -- Lance Norskog goks...@gmail.com

Re: Faceted Search Slows Down as index gets larger

2010-06-09 Thread Lance Norskog
facet.enum in this regard). >> >> One strategy is to use distributed search... have some big >> cores that >> don't change often, and then small cores for the new stuff >> that >> changes rapidly. >> >> -Yonik >> http://www.lucidimagination.com >> > > > > -- Lance Norskog goks...@gmail.com

Re: How Solr Manages Connected Database Updates

2010-06-09 Thread Lance Norskog
matically solr should grab those changes and perform Index updation > ? > > > Do I need to Write a Cron Job kind of stuff ? Or Use Data Import Handler ? > (Several ways could be ?) > > Is there any one who can provide his comments or share his experience If > some one gon

Re: general debugging techniques?

2010-06-09 Thread Lance Norskog
en't found a way to move up DEBUG level in > either solr or tomcat.  I was hopeful debug statements would point to > where extraction/indexing hangs were occurring.  I will keep poking > around, thanks for the tips. > > Jim > -- Lance Norskog goks...@gmail.com

Re: IndexSchema from CommonsHttpSolrServer

2010-06-08 Thread Lance Norskog
HttpSolrServer. (where you don't know the location of schema.xml and >> solrconfig.xml or those files are on some other machine.) >> >> I tried looking for SolrJ APIs for the same, but couldn't find any. >> >> Or is there any way to retrieve the schema file from the index using SolrJ? >> schema.xml file can be retrieved from >> http://localhost:8983/solr/CoreX/admin/file/?file=schema.xml >> >> Any pointers? >> >> Regards, >> Raakhi >> >> > > -- Lance Norskog goks...@gmail.com

Re: solrj Unicode queries don't return results

2010-06-08 Thread Lance Norskog
method of SolrServer. >> >> public QueryResponse query(SolrParams params, METHOD method) >> >> >> >> > -- Lance Norskog goks...@gmail.com

Re: Indexing link targets in HTML fragments

2010-06-06 Thread Lance Norskog
/or too heavyweight > -- but please correct me if I'm wrong. > > Maybe something using regular expressions? Does anyone have a code snippet > they could share? > > Many thanks, > > Andrew. > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Indexing-link-targets-in-HTML-fragments-tp874547p874547.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Lance Norskog goks...@gmail.com

Re: Index-time vs. search-time boosting performance

2010-06-06 Thread Lance Norskog
d written >> to >> the directory (when writing the document) in a single byte (!)" >> >> If you do this as an index-time boost, your boosts will lose lots of >> precision for this reason. >> >> -- >> Robert Muir >> rcm...@gmail.com >> > > > > -- > Asif Rahman > Lead Engineer - NewsCred > a...@newscred.com > http://platform.newscred.com > -- Lance Norskog goks...@gmail.com
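A toy illustration of the precision loss described in the quoted message. This is not Lucene's actual byte encoding; it is a simplified quantizer that truncates a positive value to a few mantissa bits, and the choice of three explicit bits is an assumption made only to show the effect of storing a boost in very little space.

```python
import math

def quantize(value, mantissa_bits=3):
    """Truncate a positive float to `mantissa_bits` explicit mantissa bits.

    Simplified stand-in for a low-precision boost encoding, for illustration only.
    """
    mantissa, exponent = math.frexp(value)   # value == mantissa * 2**exponent
    scale = 2 ** (mantissa_bits + 1)          # +1 for the leading bit
    return math.ldexp(math.floor(mantissa * scale) / scale, exponent)

# Nearby boosts can collapse to the same stored value.
for boost in [1.0, 1.05, 1.1, 1.2]:
    print(boost, "->", quantize(boost))
```

With so few bits, fine-grained differences such as a date-based boost that changes slightly per document may not survive storage, which is the argument for applying such boosts at search time.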

Re: Need help with document format

2010-06-06 Thread Lance Norskog
est of accomplishing this? >> >> I was thinking of formatting the document like this >> >> >>Bear Stearns >> 2000-01-01 >> present >> >> >>AIG >> 1999-01-01 >> 2000-01-01 >> >> >> Is this possible? >> >> Thanks, >> >> Moazzam >> > > > > -- > "Good Enough" is not good enough. > To give anything less than your best is to sacrifice the gift. > Quality First. Measure Twice. Cut Once. > http://www.israelekpo.com/ > -- Lance Norskog goks...@gmail.com

Re: Array of arguments in URL?

2010-06-02 Thread Lance Norskog
of specifying an array of strings in HTTP params: if the > param supports multiple values, then you can specify multiple values just > by repeating the key... > >  q=foo&fq=firstValue&fq=secondValue&fq=thirdValue > > ...this results in a SolrParams instance where the "
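The repeated-key form quoted above can be produced from client code with the Python standard library alone. This is a minimal sketch; the helper name `build_query` is hypothetical, and the parameter values are the ones from the quoted example.

```python
from urllib.parse import urlencode, parse_qs

def build_query(q, filters):
    """Build a query string that repeats the fq key once per filter value."""
    pairs = [("q", q)] + [("fq", value) for value in filters]
    return urlencode(pairs)

query_string = build_query("foo", ["firstValue", "secondValue", "thirdValue"])
print(query_string)

# Parsing the string back shows the repeated key arriving as a list of values.
parsed = parse_qs(query_string)
print(parsed["fq"])
```

Passing a list of pairs rather than a dict keeps the keys in order and allows the same key to appear more than once.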

Re: Importing large datasets

2010-06-02 Thread Lance Norskog
ne/database the import process exploded to over 24 hours. >> >> -- >> View this message in context: >> http://lucene.472066.n3.nabble.com/Importing-large-datasets-tp863447p865324.html >> Sent from the Solr - User mailing list archive at Nabble.com. > -- Lance Norskog goks...@gmail.com

Array of arguments in URL?

2010-06-01 Thread Lance Norskog
In the "/spell" declaration in the example solrconfig.xml, we find these lines among the default parameters: spellcheck How does one supply such an array of strings in HTTP parameters? Does Solr have a parsing option for this? -- Lance Norskog goks...@gmail.com
