Re: Get all results from a solr query

2010-09-17 Thread Chris Hostetter
: stores, just a portion of it. Currently, I need to get 16 records at : once, not just the 10 that show. So I have the rows set to "99" for : the testing phase, and I can increase it later. I just wanted to have : a better way of getting all the results that didn't require hard : coding a value

Re: No more trunk support for 2.9 indexes

2010-09-17 Thread Chris Hostetter
: Since Lucene 3.0.2 is 'out there', does this mean the format is nailed down, : and some sort of porting is possible? : Does anyone know of a tool that can read the entire contents of a Solr index : and (re)write it another? (as an indexing operation - eg 2.9 -> 3.0.x, so not : repl) 3.0.2 shoul

Re: Change what gets logged when service is disabled

2010-09-17 Thread Chris Hostetter
: I use the PingRequestHandler option that tells my load balancer whether a : machine is available. : : When the service is disabled, every one of those requests, which my load : balancer makes every five seconds, results in the following in the log: : : Sep 9, 2010 6:06:58 PM org.apache.solr.c

Re: Date faceting +1MONTH problem

2010-09-17 Thread Chris Hostetter
: Reindexing with a +1MILLI hack had occurred to me and I guess that's what : I'll do in the meantime; it just seemed like something that people must have : run into before! I suppose it depends on the granularity of your people have definitely run into it before, and most of them (that i know

Re: Using more than one name for a query field - aliases

2010-09-17 Thread Shawn Heisey
On 9/17/2010 7:22 PM, Chris Hostetter wrote: a) not really. assuming you have no problem modifying the indexing code in the way you want, and are primarily worried about searching from various clients, then the most straight forward approach is probably to use RewriteRules (or something equivi

Re: Extending org.apache.solr.handler.dataimport.Transformer

2010-09-17 Thread Chris Hostetter
: During the actual import - SOLR complains because its looking for method : with signature transformRow(Map row) It would be helpful if you could clarify what you mean by "complains". Are you getting an error? A message in the logs? What exactly does it say? (please cut/paste and provide plen

Re: Using more than one name for a query field - aliases

2010-09-17 Thread Chris Hostetter
: I would like to drop ft_text and make each index shard 3GB smaller, but make : it so that any queries which use ft_text get automatically redirected to : catchall. Ultimately we will be replacing catchall with dismax and : eliminating it. After the switch to dismax is complete and catchall is

Re: Can i do relavence and sorting together?

2010-09-17 Thread Dennis Gearon
'slop' is an actual argument!?!? LOL! I thought you were just describing some ASPECT of the search process, not its workings :-) Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.p

Re: custom sorting / help overriding FieldComparator

2010-09-17 Thread Chris Hostetter
Brad: 1) if you haven't already figured this out, I would suggest emailing the java-user mailing list. It's got a bigger collection of users who are familiar with the internals of the Lucene-Java API (that's the level it seems like you are having difficulty at) 2) Maybe you mentioned your sor

Re: merge indexes from EmbeddedSolrServer

2010-09-17 Thread Chris Hostetter
: Is it possible to use mergeindexes action using EmbeddedSolrServer? : Thanks in advance I haven't tried it, but this should be the same as any other feature of the CoreAdminHandler -- construct an instance using your CoreContainer, and then execute the appropriate request directly. (you may

Re: Simple Filter Query (fq) Use Case Question

2010-09-17 Thread Dennis Gearon
Wow, that's a lot to learn. At some point, I need to really dig in, or find some pretty pictures, graphical aids. Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Fri

Re: Searching solr with a two word query

2010-09-17 Thread Erick Erickson
I suspect that you're seeing the default query operator in action, as an OR. We could tell more if you posted the results of your query with &debugQuery=on Best Erick On Fri, Sep 17, 2010 at 3:58 PM, wrote: > For some reason, when I run a query that has only two words in it, I get > back repeat

Re: Can i do relavence and sorting together?

2010-09-17 Thread Lance Norskog
http://wiki.apache.org/solr/CommonQueryParameters?action=fullsearch&context=180&value=slop&fullsearch=Text On Fri, Sep 17, 2010 at 10:55 AM, Dennis Gearon wrote: > HOw does one 'vary the slop'? > > Dennis Gearon > > Signature Warning > > EARTH has a Right To Life, >  otherwise we

Re: Solr Highlighting Issue

2010-09-17 Thread Lance Norskog
The same as with other formats. You give it strings to drop in before and after the highlighted text. On Fri, Sep 17, 2010 at 9:48 AM, Dennis Gearon wrote: > How does highlighting work with JSON output? > > Dennis Gearon > > Signature Warning > > EARTH has a Right To Life, >  oth

Re: Search the mailinglist?

2010-09-17 Thread Lance Norskog
And http://www.lucidimagination.com/Search taptaptap calling Otis taptaptap On Fri, Sep 17, 2010 at 9:30 AM, alexander sulz wrote: >  Many thank yous to all of you :) > > Am 17.09.2010 17:24, schrieb Walter Underwood: >> >> Or, for a fascinating multi-dimensional UI to mailing list archives: >>

Re: Indexing PDF - literal field already there & many "null"'s in text field

2010-09-17 Thread Lance Norskog
Tika is not perfect. Very much not perfect. I've seen a 10-15% failure rate on randomly sampled files. It works for creating searchable text fields, but not for text fields to return. That is, the analyzers rip out the nulls and make an intelligible stream of words. If you want to save these words

Re: Get all results from a solr query

2010-09-17 Thread Lance Norskog
Look up _docid_ on the Solr wiki. It lets you walk the entire index about as fast as possible. On Fri, Sep 17, 2010 at 8:47 AM, Christopher Gross wrote: > Thanks for being so helpful!  You really helped me to answer my > question!  You aren't condescending at all! > > I'm not using it to pull dow

Re: Index partitioned/ Full indexing by MSSQL or MySQL

2010-09-17 Thread Lance Norskog
An essential problem is that Solr does not let you update just one field. When an ad changes from active to inactive, you have to reindex the whole document. If you have large documents (large text fields for example) this is a big pain. On Fri, Sep 17, 2010 at 5:37 AM, kenf_nc wrote: > > You don

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-17 Thread Peter Sturge
Solr 4.x has new NRT stuff included (uses latest Lucene 3.x, includes per-segment faceting etc.). The Solr 3.x branch doesn't currently.. On Fri, Sep 17, 2010 at 8:06 PM, Andy wrote: > Does Solr use Lucene NRT? > > --- On Fri, 9/17/10, Erick Erickson wrote: > >> From: Erick Erickson >> Subject

Re: getting a list of top page-ranked webpages

2010-09-17 Thread Dennis Gearon
That's pretty good stuff to know, thanks everybody. For my application, it's pretty hard to do crawling and universally assign desired fields from the text returned. However, I would WELCOME someone with that expertise into the company when it gets funded, to prove me wrong :-) Dennis Gearon

Re: into

2010-09-17 Thread Yonik Seeley
On Fri, Sep 17, 2010 at 4:12 PM, facholi wrote: > > Hi, > > I would like a json result like that: > > { >   id:2342, >   name:"Abracadabra", >   metadatas: [ >      {type:"tag", name:"tutorial"}, >      {type:"value", name:"2323.434/434"}, >   ] > } Do you mean JSON with the tags not quoted (that

into

2010-09-17 Thread facholi
Hi, I would like a JSON result like this: { id:2342, name:"Abracadabra", metadatas: [ {type:"tag", name:"tutorial"}, {type:"value", name:"2323.434/434"}, ] } Is it possible? -- View this message in context: http://lucene.472066.n3.nabble.com/doc-into-doc-tp1518090p15180

Importing SlashDot Data

2010-09-17 Thread Adam Estrada
All, I have a new Windows 7 machine and have been trying to import an RSS feed like in the SlashDot example that is included in the software. My dataConfig file looks fine. http://rss.slashdot.org/Slashdot/slashdot"; processor="XPathEnti

Searching solr with a two word query

2010-09-17 Thread noel
For some reason, when I run a query that has only two words in it, I get back repeating results of the last word. If I were to search for something like "good tonight", I'll get results like: good tonight tonight good tonight tonight tonight tonight tonight tonight Basically, the first word if

Re: DIH: alternative approach to deltaQuery

2010-09-17 Thread Shawn Heisey
On 9/17/2010 3:01 AM, Paul Dhaliwal wrote: Another feature missing in DIH is ability to pass parameters into your queries. If one could pass a named or positional parameter for an entity query, it will give them lot of freedom to optimize their delta or full load queries. One can even get creati

Re: getting a list of top page-ranked webpages

2010-09-17 Thread Ian Upright
On Thu, 16 Sep 2010 15:31:02 -0700, you wrote: >The public terabyte dataset project would be a good match for what you >need. > >http://bixolabs.com/datasets/public-terabyte-dataset-project/ > >Of course, that means we have to actually finish the crawl & finalize >the Avro format we use for th

Re: Simple Filter Query (fq) Use Case Question

2010-09-17 Thread Shawn Heisey
On 9/16/2010 12:27 PM, Dennis Gearon wrote: Is a core a running piece of software, or just an index/config pairing? Dennis Gearon A core is one complete index within a Solr instance. http://wiki.apache.org/solr/CoreAdmin My master index servers have five cores - ncmain, ncrss, live, build,

Re: getting a list of top page-ranked webpages

2010-09-17 Thread Ian Upright
On Fri, 17 Sep 2010 04:46:44 -0700 (PDT), kenf_nc wrote: >A slightly different route to take, but one that should help test/refine a >semantic parser is wikipedia. They make available their entire corpus, or >any subset you define. The whole thing is like 14 terabytes, but you can get >smaller se

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-17 Thread Andy
Does Solr use Lucene NRT? --- On Fri, 9/17/10, Erick Erickson wrote: > From: Erick Erickson > Subject: Re: Tuning Solr caches with high commit rates (NRT) > To: solr-user@lucene.apache.org > Date: Friday, September 17, 2010, 1:05 PM > Near Real Time... > > Erick > > On Fri, Sep 17, 2010 at 12

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-17 Thread Dennis Gearon
This means both the indexing and the searching in NRT? Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Fri, 9/17/10, Erick Erickson wrote: > From: Erick Erickson >

Re: Can i do relavence and sorting together?

2010-09-17 Thread Dennis Gearon
How does one 'vary the slop'? Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Fri, 9/17/10, Erick Erickson wrote: > From: Erick Erickson > Subject: Re: Can i do rel

Re: Can i do relavence and sorting together?

2010-09-17 Thread Dennis Gearon
The users will be able to choose the order of sort based on distance, data and time, relevancy. More than likely, my first initial version will do range limits on distance, data and time. Then relevancy will sort, send it to browser. After that, the user will sort it in the browser as desired.

Re: Can i do relavence and sorting together?

2010-09-17 Thread Don Werve
On Sep 17, 2010, at 10:00 AM, Dennis Gearon wrote: > Well .. >> because the date sort overrides all the scoring, by >> definition. > > THAT'S not good for what I want, LOL! > > Is there any way to chain things like distance, date, relevancy, an integer > field to force sort order, like when usin

RE: Can i do relavence and sorting together?

2010-09-17 Thread Jonathan Rochkind
Yes. Just as you'd expect: &sort=score asc,date desc,title asc [url encoded of course] The only trick is knowing the special key 'score' for sorting by relevancy. This is all in the wiki docs: http://wiki.apache.org/solr/CommonQueryParameters#sort Also keep in mind, as the docs say, sort
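The URL encoding mentioned in the snippet above can be sketched in Python. This is only an illustration: the helper name `sort_param` is hypothetical, and the field names `date` and `title` are taken from the example sort string in the message, not from any particular schema. Note that relevancy-first ordering is normally written `score desc`; the sketch reproduces the string as posted.

```python
from urllib.parse import urlencode

def sort_param(clauses):
    """Build a URL-encoded Solr sort parameter from (field, direction) pairs."""
    for field, direction in clauses:
        if direction not in ("asc", "desc"):
            raise ValueError(f"bad sort direction for {field}: {direction}")
    # Clauses are comma-separated; each clause is "<field> <direction>".
    value = ",".join(f"{field} {direction}" for field, direction in clauses)
    # urlencode turns spaces into '+' and commas into '%2C'.
    return urlencode({"sort": value})

# The sort string from the message: score asc,date desc,title asc
encoded = sort_param([("score", "asc"), ("date", "desc"), ("title", "asc")])
print(encoded)
```

The resulting string can be appended to a select URL after the other query parameters.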

Re: Can i do relavence and sorting together?

2010-09-17 Thread Erick Erickson
Sure, you can specify multiple sort fields. If the first sort field results in a tie, then the second is used to resolve. If both first and second match, then the third is used to break the tie. Note that relevancy is tricky to include in the chain because it's infrequent to have two docs with exa
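The tie-breaking behaviour described above can be illustrated outside Solr with plain Python. This is a sketch of the general idea of chained sort keys, not of Solr's implementation; the documents and the helper name `chained_sort` are made up for the example.

```python
def chained_sort(docs):
    """Sort by date descending, breaking ties by title ascending.

    Python's sort is stable, so sorting by the last tie-breaker first and
    the primary key last gives the same result as a chained sort.
    """
    by_title = sorted(docs, key=lambda d: d["title"])
    return sorted(by_title, key=lambda d: d["date"], reverse=True)

# Hypothetical documents: two pairs share the same date.
docs = [
    {"id": 1, "date": "2010-09-16", "title": "b"},
    {"id": 2, "date": "2010-09-17", "title": "z"},
    {"id": 3, "date": "2010-09-16", "title": "a"},
    {"id": 4, "date": "2010-09-17", "title": "c"},
]

ordered_ids = [d["id"] for d in chained_sort(docs)]
print(ordered_ids)
```

The second key only matters where the first one ties, which is why a nearly-unique key such as a relevancy score rarely lets later sort fields have any effect.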

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-17 Thread Erick Erickson
Near Real Time... Erick On Fri, Sep 17, 2010 at 12:55 PM, Dennis Gearon wrote: > BTW, what is NRT? > > Dennis Gearon > > Signature Warning > > EARTH has a Right To Life, > otherwise we all die. > > Read 'Hot, Flat, and Crowded' > Laugh at http://www.yert.com/film.php > > > ---

Re: Can i do relavence and sorting together?

2010-09-17 Thread Dennis Gearon
Well .. > because the date sort overrides all the scoring, by > definition. THAT'S not good for what I want, LOL! Is there any way to chain things like distance, date, relevancy, an integer field to force sort order, like when using SQL 'ORDER BY', the order of sort is the order of listing? Den

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-17 Thread Dennis Gearon
BTW, what is NRT? Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Fri, 9/17/10, Peter Sturge wrote: > From: Peter Sturge > Subject: Re: Tuning Solr caches with high

Re: Solr Highlighting Issue

2010-09-17 Thread Dennis Gearon
How does highlighting work with JSON output? Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Fri, 9/17/10, Ahson Iqbal wrote: > From: Ahson Iqbal > Subject: Solr Hi

Re: Search the mailinglist?

2010-09-17 Thread alexander sulz
Many thank yous to all of you :) Am 17.09.2010 17:24, schrieb Walter Underwood: Or, for a fascinating multi-dimensional UI to mailing list archives: http://markmail.org/ --wunder On Sep 17, 2010, at 7:15 AM, Markus Jelsma wrote: http://www.lucidimagination.com/search/?q= On Friday 17 Sep

Indexing PDF - literal field already there & many "null"'s in text field

2010-09-17 Thread alexander sulz
Hi everyone. I'm successfully indexing PDF files right now but I still have some problems. 1. Tika seems to map some content to appropriate fields in my schema.xml. If I pass on a literal.title=blabla parameter, Tika may have parsed some information out of the PDF to fill in the field "title" its

Re: Can i do relavence and sorting together?

2010-09-17 Thread Erick Erickson
The problem, and it's a practical one, is that terms usually have to be pretty close to each other for proximity to matter, and you can get this with phrase queries by varying the slop. FWIW Erick On Fri, Sep 17, 2010 at 11:05 AM, Andrew Cogan wrote: > I'm a total Lucene/SOLR newbie, and I'm sur

Re: Get all results from a solr query

2010-09-17 Thread Christopher Gross
Thanks for being so helpful! You really helped me to answer my question! You aren't condescending at all! I'm not using it to pull down *everything* that the Solr instance stores, just a portion of it. Currently, I need to get 16 records at once, not just the 10 that show. So I have the rows s

Re: Search the mailinglist?

2010-09-17 Thread Walter Underwood
Or, for a fascinating multi-dimensional UI to mailing list archives: http://markmail.org/ --wunder On Sep 17, 2010, at 7:15 AM, Markus Jelsma wrote: > http://www.lucidimagination.com/search/?q= > > > On Friday 17 September 2010 16:10:23 alexander sulz wrote: >> Im sry to bother you all with

Re: Get all results from a solr query

2010-09-17 Thread Walter Underwood
Go ahead and put an absurdly large value as the rows parameter. Then wait, because that query is going to take a really long time, it can interfere with every other query on the Solr server (denial of service), and quite possibly cause your client to run out of memory as it parses the result. A

Re: Search the mailinglist?

2010-09-17 Thread Thomas Joiner
Also there is http://lucene.472066.n3.nabble.com/Solr-User-f472068.html if you prefer a forum format. On Fri, Sep 17, 2010 at 9:15 AM, Markus Jelsma wrote: > http://www.lucidimagination.com/search/?q= > > > On Friday 17 September 2010 16:10:23 alexander sulz wrote: > > Im sry to bother you all

Re: Color search for images

2010-09-17 Thread Shashi Kant
> > What I am envisioning (at least to start) is have all this add two fields in > the index.  One would be for color information for the color similarity > search.  The other would be a simple multivalued text field that we put > keywords into based on what OpenCV can detect about the image.  If i

Re: Understanding Lucene's File Format

2010-09-17 Thread Michael McCandless
You're welcome! Mike On Fri, Sep 17, 2010 at 10:44 AM, Giovanni Fernandez-Kincade wrote: > Interesting. Thanks for your help Mike! > > -Original Message- > From: Michael McCandless [mailto:luc...@mikemccandless.com] > Sent: Friday, September 17, 2010 10:29 AM > To: solr-user@lucene.apach

RE: Can i do relavence and sorting together?

2010-09-17 Thread Andrew Cogan
I'm a total Lucene/SOLR newbie, and I'm surprised to see that when there are multiple search terms, term proximity isn't part of the scoring process. Has anyone on the list done custom scoring that weights proximity? Andy Cogan -Original Message- From: kenf_nc [mailto:ken.fos...@realestat

Re: Version stability [was: svn branch issues]

2010-09-17 Thread Yonik Seeley
On Fri, Sep 17, 2010 at 10:46 AM, Mark Miller wrote: > I agree it's mainly API wise, but there are other issues - largely due > to Lucene right now - consider the bugs that have been dug up this year > on the 4.x line because flex has been such a large rewrite deep in > Lucene. We wouldn't do flex

Re: Version stability [was: svn branch issues]

2010-09-17 Thread Mark Miller
I agree it's mainly API wise, but there are other issues - largely due to Lucene right now - consider the bugs that have been dug up this year on the 4.x line because flex has been such a large rewrite deep in Lucene. We wouldn't do flex on the 3.x stable line and it's taken a while for everything

RE: Understanding Lucene's File Format

2010-09-17 Thread Giovanni Fernandez-Kincade
Interesting. Thanks for your help Mike! -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Friday, September 17, 2010 10:29 AM To: solr-user@lucene.apache.org Subject: Re: Understanding Lucene's File Format Yes. They are decoded from the deltas in the t

Re: Solr Highlighting Issue

2010-09-17 Thread Ahson Iqbal
Hi Koji thank you very much it really works From: Koji Sekiguchi To: solr-user@lucene.apache.org Sent: Fri, September 17, 2010 7:11:31 PM Subject: Re: Solr Highlighting Issue (10/09/17 16:36), Ahson Iqbal wrote: > Hi All > > I have an issue in highlighting

Re: Understanding Lucene's File Format

2010-09-17 Thread Michael McCandless
Yes. They are decoded from the deltas in the tii file into absolutes in memory, on load. Note that trunk (w/ flex indexing) has changed this substantially: we store only the offset into the terms dict file, as an absolute in a packed int array (no object per indexed term). Then, at the seek poin

Re: Version stability [was: svn branch issues]

2010-09-17 Thread Yonik Seeley
I think we aim for a "stable" trunk (4.0-dev) too, as we always have (in the functional sense... i.e. operate correctly, don't crash, etc). The stability is more a reference to API stability - the Java APIs are much more likely to change on trunk. Solr's *external* APIs are much less likely to ch

Re: Version stability [was: svn branch issues]

2010-09-17 Thread Mark Miller
The 3.x line should be pretty stable. Hopefully we will do a release soon. A conversation was again started about more frequent releases recently, and hopefully that will lead to a 3.x release near term. In any case, 3.x is the stable branch - 4.x is where the more crazy stuff happens. If you are

Re: Search the mailinglist?

2010-09-17 Thread Markus Jelsma
http://www.lucidimagination.com/search/?q= On Friday 17 September 2010 16:10:23 alexander sulz wrote: > Im sry to bother you all with this, but is there a way to search through > the mailinglist archive? Ive found > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/ so far > but there i

Re: Solr Highlighting Issue

2010-09-17 Thread Koji Sekiguchi
(10/09/17 16:36), Ahson Iqbal wrote: Hi All I have an issue in highlighting that if i query solr on more than one fields like "+Contents:risk +Form:1" and even i specify the highlighting field is "Contents" it still highlights risk as well as 1, because it is specified in the query.. now if i s

Search the mailinglist?

2010-09-17 Thread alexander sulz
I'm sorry to bother you all with this, but is there a way to search through the mailing list archive? I've found http://mail-archives.apache.org/mod_mbox/lucene-solr-user/ so far, but there isn't any convenient way to search through the archive. Thanks for your help

RE: Understanding Lucene's File Format

2010-09-17 Thread Giovanni Fernandez-Kincade
> The terms index (once loaded into RAM) has absolute longs, too. So in the TermInfo Index(.tii), the FreqDelta, ProxDelta, And SkipDelta stored with each TermInfo are actually absolute? -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Friday, Septemb

Version stability [was: svn branch issues]

2010-09-17 Thread Mark Allan
OK, 1.5 won't be released, so we'll avoid that. I've now got my code additions compiling against a version of 3.x so we'll stick with that rather than solr_trunk for the time being. Does anyone have any sense of when 3.x might be considered stable enough for a release? We're hoping to go

spatial sorting

2010-09-17 Thread dan sutton
Hi, I'm trying to filter and sort by distance with this URL: http://localhost:8080/solr/select/?q=*:*&fq={!sfilt%20fl=loc_lat_lon}&pt=52.02694,-0.49567&d=2&sort={!func}hsin(52.02694,-0.49567,loc_lat_lon_0_d,%20loc_lat_lon_1_d,3963.205)asc Filtering is fine but it's failing in parsing the sort wi

Re: Solr Rolling Log Files

2010-09-17 Thread Mark Miller
Sure - start here: http://wiki.apache.org/solr/SolrLogging Solr uses java util logging out of the box. You will end up with something like this: java.util.logging.FileHandler.limit=102400 java.util.logging.FileHandler.count=5 - Mark lucidimagination.com On 9/14/10 2:02 PM, Vladimir Sutskever wr

Re: Can i do relavence and sorting together?

2010-09-17 Thread Erick Erickson
What is it about the standard relevance ranking that doesn't suit your needs? And note that if you sort by your date field, relevance doesn't matter at all because the date sort overrides all the scoring, by definition. Best Erick On Fri, Sep 17, 2010 at 6:57 AM, Pawan Darira wrote: > Hi > > My

Re: Index partitioned/ Full indexing by MSSQL or MySQL

2010-09-17 Thread kenf_nc
You don't give an indication of size. How large are the documents being indexed and how many of them are there? However, my opinion would be a single index with an 'active' flag. In your queries you can use FilterQueries (fq=) to optimize on just active if you wish, or just inactive if that is ne
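A filter query on such a flag might be built as in the sketch below. The base URL, the field name `active`, and the value `true` are assumptions for illustration; the actual field name and value format depend on the schema.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust to the actual Solr instance.
SOLR_SELECT = "http://localhost:8983/solr/select"

def flagged_query_url(query, active=True, base_url=SOLR_SELECT):
    """Return a select URL that restricts results with an fq on the flag field."""
    flag = "true" if active else "false"
    # 'active' is an assumed field name; fq filters results separately from q.
    return base_url + "?" + urlencode([("q", query), ("fq", f"active:{flag}")])

active_url = flagged_query_url("blah")
inactive_url = flagged_query_url("blah", active=False)
print(active_url)
```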

Re: Get all results from a solr query

2010-09-17 Thread kenf_nc
Chris, I agree, having the ability to make rows something like -1 to bring back everything would be convenient. However, the 2 call approach (q=blah&rows=0 followed by q=blah&rows=numFound) isn't that slow, and does give you more information up front. You can optimize your Array or List<> sizes in
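The two-call approach described above can be sketched as follows. This is a minimal sketch, assuming a select handler at a hypothetical `http://localhost:8983/solr/select`; the helper name `build_select_url` is illustrative only. It only builds the two request URLs; sending them and reading `numFound` from the first response is left to whatever HTTP client is in use.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust to the actual Solr instance.
SOLR_SELECT = "http://localhost:8983/solr/select"

def build_select_url(query, rows, base_url=SOLR_SELECT):
    """Return a select URL asking for at most `rows` documents."""
    return base_url + "?" + urlencode({"q": query, "rows": rows})

# Call 1: rows=0 returns no documents, only the total hit count (numFound).
count_url = build_select_url("blah", 0)

# Call 2: once numFound is known, request exactly that many rows.
num_found = 16  # assumed value, standing in for the count read from call 1
fetch_url = build_select_url("blah", num_found)

print(count_url)
print(fetch_url)
```

Knowing the count before the second call also makes it possible to refuse or page a request whose result set turns out to be unexpectedly large.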

Re: DataImportHandler with multiline SQL

2010-09-17 Thread kenf_nc
Sounds like you want the CachedSqlEntityProcessor (http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor). It lets you make one query that is cached locally and can be joined to with a separate query. -- View this message in context: http://lucene.472066.n3.nabble.com/DataImportH

Re: Get all results from a solr query

2010-09-17 Thread Christopher Gross
@Markus Jelsma - the wiki confirms what I said before: rows This parameter is used to paginate results from a query. When specified, it indicates the maximum number of documents from the complete result set to return to the client for every request. (You can consider it as the maximum number of re

Re: Can i do relavence and sorting together?

2010-09-17 Thread kenf_nc
Those are at least 3 different questions. Easiest first: sorting. Add &sort=ad_post_date+desc (or asc) to sort on date, descending or ascending. Check out how Lucene scores by default (http://www.supermind.org/blog/378/lucene-scoring-for-dummies). It might be close to what you want. The

Re: getting a list of top page-ranked webpages

2010-09-17 Thread kenf_nc
A slightly different route to take, but one that should help test/refine a semantic parser is wikipedia. They make available their entire corpus, or any subset you define. The whole thing is like 14 terabytes, but you can get smaller sets. -- View this message in context: http://lucene.472066.n

Can i do relavence and sorting together?

2010-09-17 Thread Pawan Darira
Hi, my index has fields named ad_title, ad_description & ad_post_date. Suppose a user searches for more than one keyword; then I want the documents with the maximum occurrence of all the keywords together to come out on top. The closer the keywords in ad_title & ad_description should be give

Re: Understanding Lucene's File Format

2010-09-17 Thread Michael McCandless
The entry for each term in the terms dict stores a long file offset pointer, into the .frq file, and another long for the .prx file. But, these longs are delta-coded, so as you scan you have to sum up these deltas to get the absolute file pointers. The terms index (once loaded into RAM) has absol
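The delta coding described above can be illustrated with a small Python sketch. It shows only the arithmetic of summing deltas into absolute offsets; it does not read the actual Lucene file format, and the delta values are made up.

```python
from itertools import accumulate

def decode_deltas(deltas):
    """Turn delta-coded file pointers into absolute offsets by running sum."""
    return list(accumulate(deltas))

# Hypothetical deltas as they might be scanned for four consecutive terms.
frq_deltas = [100, 20, 5, 300]
frq_pointers = decode_deltas(frq_deltas)
print(frq_pointers)
```

Because each absolute pointer depends on every delta before it, reaching a term in the middle requires scanning from a known absolute position, which is the role the terms index plays.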

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-17 Thread Peter Sturge
Hi, It's great to see such a fantastic response to this thread - NRT is alive and well! I'm hoping to collate this information and add it to the wiki when I get a few free cycles (thanks Erik for the heads up). In the meantime, I thought I'd add a few tidbits of additional information that might

Re: DIH: alternative approach to deltaQuery

2010-09-17 Thread Paul Dhaliwal
Another feature missing in DIH is the ability to pass parameters into your queries. If one could pass a named or positional parameter for an entity query, it would give them a lot of freedom to optimize their delta or full load queries. One can even get creative with entity and delta queries that can take

Solr Highlighting Issue

2010-09-17 Thread Ahson Iqbal
Hi all, I have an issue with highlighting: if I query Solr on more than one field, like "+Contents:risk +Form:1", then even if I specify "Contents" as the highlighting field, it still highlights "risk" as well as "1", because both are specified in the query. Now if I split the query as "+Contents:risk" i