There are some instructions about integrating Nutch with Solr here:
http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html
Joakim
"Otis Gospodnetic" <[EMAIL PROTECTED]> wrote on 9.1.2008:
> Nutch and Solr work nice in tandem. We've used Nutch for its distributed
> fet
Nutch and Solr work nice in tandem. We've used Nutch for its distributed
fetching + parsing and related functionality and have used Solr to index the
resulting text. What glued them together was Solrj, actually.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Origi
I've found that Solr running on modest hardware (a 2.4 GHz PC running
Windows XP Pro for testing changes) is able to index about 23,000
records in under three minutes. Assuming you aren't going to make too
many typos in your naming, you should be fine just doing the
re-indexing. Try timing yo
Marc: sorry that no one seems to have replied to your email until now
(holidays and all that).
: I try to run three SOLR webapps. Tomcat starts up and the three SOLR instances
: work perfectly well. However, in the logfiles I found a
:
: ... "INFO: no /solr/home in JNDI" (followed by a "SEVERE
: Is there a way or a point in filtering all results below a certain score?
: e.g. exclude all results below score Y. Thanks
not out of the box, you'd have to write a custom request handler to do
this ... but unless your custom request handler also changes the scoring
formula drastically, filteri
On Tomcat 5.5, an OutOfMemory on a query leaves the server in an OK state,
and future queries work.
But a facet query that runs out of ram does not free its undone state and
all future requests get OutOfMemory.
I have not tried the Solr 'luke' handler since it took 5 minutes to run on
our index wh
On Tomcat, an OutOfMemory on a query leaves the server in an OK state, and
future queries work.
But a facet query that runs out of ram does not free its undone state and all
future requests get OutOfMemory.
Lance
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On B
Robert Young wrote:
Thanks, that is very helpful. So, is there a way to find out the
total number of distinct tokens, regardless of which field they're
associated with? And to find which are most popular?
nothing standard does that... the semantics of what it would mean get a
little weird -
On Jan 8, 2008 3:07 PM, Robert Young <[EMAIL PROTECTED]> wrote:
> Thanks, that is very helpful. So, is there a way to find out the
> total number of distinct tokens, regardless of which field they're
> associated with?
No. A term in lucene consists of a field and value, so the same word
in diff
Thanks, that is very helpful. So, is there a way to find out the
total number of distinct tokens, regardless of which field they're
associated with? And to find which are most popular?
Cheers
Rob
On Jan 8, 2008 5:04 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> numTerms counts the unique terms
I may be mistaken, but this is not equivalent to my query. In my query I have
matches for x1, matches for x2 without slop and/or boosting, and then a match
for "x1 x2" (exact match) with slop (~) a and boost (b) so that results with
an exact match score better.
The total score is the sum of all
: User Query: x1 x2
: Desired query (Lucene): field:x1 x2 field:"x1 x2"~a^b
:
: In the standard handler the only way i saw how to make this work was:
: field:x1 field:x2 field:"x1 x2"!a^b
:
: Now that i want to try the DisMax is there a way to implement this without
: having duplicate fields? i.
Please see http://wiki.apache.org/solr/DisMaxRequestHandler
-Yonik
On Jan 8, 2008 2:40 PM, s d <[EMAIL PROTECTED]> wrote:
> User Query: x1 x2
> Desired query (Lucene): field:x1 x2 field:"x1 x2"~a^b
>
> In the standard handler the only way i saw how to make this work was:
> field:x1 field:x2 field
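For readers following this thread, here is a minimal sketch of how the desired query might be expressed with DisMax parameters instead of duplicated fields. It assumes the handler is selected with qt=dismax and uses the DisMax parameters qf (query fields), pf (phrase fields) and ps (phrase slop); the class name, the field name "field", the slop 2 and the boost 10 are placeholders, not values from the thread.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class DisMaxQuery {
    // Builds a URL query string for a DisMax request (hypothetical helper).
    // slop stands in for "a" and boost for "b" in the thread's notation.
    static String buildQueryString(String userQuery, String field, int slop, String boost) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("qt", "dismax");             // assumption: handler registered as "dismax"
        params.put("q", userQuery);             // raw user input, e.g. "x1 x2"
        params.put("qf", field);                // match the individual words in this field
        params.put("pf", field + "^" + boost);  // boost documents matching the whole phrase
        params.put("ps", Integer.toString(slop)); // slop applied to the pf phrase query

        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joined.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildQueryString("x1 x2", "field", 2, "10"));
    }
}
```

The point of the sketch is that the phrase part is configured on the request rather than typed into q, so the user query can stay as plain words.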
The multicore stuff has changed around a bit since its debut (and may
change some more before the final release) check:
http://wiki.apache.org/solr/MultiCore
There is no longer a 'SETASDEFAULT' action and all requests require the
core name, so you will need:
http://localhost:8983/solr/core0/
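A small sketch of what "all requests require the core name" means for client code, assuming the base URL from this thread. The helper and class names are made up for illustration, and the per-core update path (.../solr/core0/update) is an assumption based on the URL pattern above, not something confirmed here.

```java
public class CoreUrls {
    // Base URL as used in the thread; adjust host and port for your setup.
    static final String BASE = "http://localhost:8983/solr";

    // Hypothetical helper: puts the core name between the base URL and the handler path.
    static String coreUrl(String coreName, String handlerPath) {
        String path = handlerPath.startsWith("/") ? handlerPath.substring(1) : handlerPath;
        return BASE + "/" + coreName + "/" + path;
    }

    public static void main(String[] args) {
        // Assumed per-core update path; the thread only shows /solr/update without a core.
        System.out.println(coreUrl("core0", "update"));
        // Admin form path as quoted elsewhere in the thread.
        System.out.println(coreUrl("core1", "admin/form.jsp"));
    }
}
```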
Jae Joo wrote:
I have built two cores - core0 and core1.
each core has different set of index.
I can access core0 and core 1 by
http://localhost:8983/solr/core[01]/admin/form.jsp.
Is there any way to access multiple indexes with single query?
nothing standard. From a custom RequestHandler,
User Query: x1 x2
Desired query (Lucene): field:x1 x2 field:"x1 x2"~a^b
In the standard handler the only way i saw how to make this work was:
field:x1 field:x2 field:"x1 x2"!a^b
Now that i want to try the DisMax is there a way to implement this without
having duplicate fields? i.e. since the fiel
On 8-Jan-08, at 4:34 AM, Heba Farouk wrote:
Hello there,
I would like to use something similar to Lucene's ParallelMultiSearcher in
Solr to search multiple indexes. Does it exist?
Nope. Instead, there is work progressing on more flexible
query distribution:
https://issues.apache.org
I have built two cores - core0 and core1.
each core has different set of index.
I can access core0 and core 1 by
http://localhost:8983/solr/core[01]/admin/form.jsp.
Is there any way to access multiple indexes with single query?
Thanks,
Jae
On 08.01.2008 19:09 Yonik Seeley wrote:
There is no shorter way, but if you update to the latest solr-dev
(changes I checked in today), the default will be no encapsulation for
split fields.
Many thanks, also for your patience!
Do you think the dev-version is ready for production?
-Michael
I have set multicores - core0 and core1, core0 is default.
Once I update the index by http://localhost:8983/solr/update, it updates
core1 not core0.
Also, I tried to set the default core using SETASDEFAULT, but it is "unknown
action command".
Can any one help me?
Thanks,
Jae
On Jan 8, 2008 12:59 PM, Michael Lackhoff <[EMAIL PROTECTED]> wrote:
> On 08.01.2008 16:55 Yonik Seeley wrote:
>
> >> - A literal encapsulator should be possible to add by doubling
> >>it ' => '' but this gives the same error
> >
> > > I think you would have to triple it (the first is the encaps
On 08.01.2008 16:55 Yonik Seeley wrote:
- A literal encapsulator should be possible to add by doubling
it ' => '' but this gives the same error
I think you would have to triple it (the first is the encapsulator).
Regardless, don't use encapsulation on the split fields unless you
have to.
Is there any way to make the DisMaxRequestHandler a bit more forgiving with
user queries, I'm only getting results when the user enters a close to
perfect match. I'd like to allow near matches if possible, but I'm not sure
how to add something like this when special query syntax isn't allowed.
--
Brian Whitman wrote:
I found that on the Wiki at
http://wiki.apache.org/solr/UpdateXmlMessages#head-3dfbf90fbc69f168ab6f3389daf68571ad614bef
under the title: Updating a Data Record via curl. I removed it and
now have the following:
0name="QTime">122This response format
is experimental.
try it without and see how you do...
I just updated the wiki
Kirk Beers wrote:
Brian Whitman wrote:
On Jan 8, 2008, at 10:58 AM, Kirk Beers wrote:
curl http://localhost:8080/solr/update -H "Content-Type:text/xml"
--data-binary '/overwritePending="true">0001name="title">TitleIt was the be
I found that on the Wiki at http://wiki.apache.org/solr/UpdateXmlMessages#head-3dfbf90fbc69f168ab6f3389daf68571ad614bef
under the title: Updating a Data Record via curl. I removed it and
now have the following:
0name="QTime">122This response format
is experimental. It is likely to cha
Brian Whitman wrote:
On Jan 8, 2008, at 10:58 AM, Kirk Beers wrote:
curl http://localhost:8080/solr/update -H "Content-Type:text/xml"
--data-binary '/overwritePending="true">0001name="title">TitleIt was the best of
times it was the worst of times blah blah blah'
Why the / after the first
numTerms counts the unique terms (field:value pair) in the index. The
source is:
TermEnum te = reader.terms();
int numTerms = 0;
while (te.next()) {
numTerms++;
}
indexInfo.add("numTerms", numTerms );
"distinct" is a similar calculation, but fo
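To illustrate the difference between counting terms as field:value pairs and counting distinct token values regardless of field, here is a minimal sketch in plain Java. It does not use the Lucene or Luke handler API; the class name and the sample terms are invented for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TermCounts {
    // Counts unique (field, value) pairs, the way a term is defined in the thread.
    static int countFieldValuePairs(List<String[]> terms) {
        Set<List<String>> pairs = new HashSet<>();
        for (String[] term : terms) {
            pairs.add(Arrays.asList(term[0], term[1]));
        }
        return pairs.size();
    }

    // Counts unique values while ignoring which field they came from.
    static int countDistinctValues(List<String[]> terms) {
        Set<String> values = new HashSet<>();
        for (String[] term : terms) {
            values.add(term[1]);
        }
        return values.size();
    }

    public static void main(String[] args) {
        // Made-up sample: "solr" appears in two different fields.
        List<String[]> terms = Arrays.asList(
            new String[] {"title", "solr"},
            new String[] {"body", "solr"},
            new String[] {"body", "nutch"});
        System.out.println(countFieldValuePairs(terms)); // 3
        System.out.println(countDistinctValues(terms));  // 2
    }
}
```

On a real index the second count would need a pass over every term to collect the values, which is why nothing standard reports it.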
On Jan 8, 2008, at 10:58 AM, Kirk Beers wrote:
curl http://localhost:8080/solr/update -H "Content-Type:text/xml" --
data-binary '/overwritePending="true">0001field>TitleIt was
the best of times it was the worst of times blah blah blahdoc>'
Why the / after the first single quote?
Hi,
I am new to Lucene and even newer to Solr and I am attempting to setup
Solr to fit our needs. I am using the Jan 7, 2008 build. I have updated
the schema.xml fields to reflect our preferences:
required="true"/>
required="true"/>
required="false"/>
required="false"/>
re
On Jan 8, 2008 10:32 AM, Michael Lackhoff <[EMAIL PROTECTED]> wrote:
> On 08.01.2008 16:11 Yonik Seeley wrote:
>
> > Ahh, wait, it looks like a single quote is the encapsulator for split field
> > values by default.
> > Try adding f.PUBLPLACE.encapsulator=%00
> > to disable the encapsulation.
>
> Hmm. Y
On 08.01.2008 16:11 Yonik Seeley wrote:
Ahh, wait, it looks like a single quote is the encapsulator for split field
values by default.
Try adding f.PUBLPLACE.encapsulator=%00
to disable the encapsulation.
Hmm. Yes, this works but:
- I didn't find anything about it in the docs (wiki). On the contrar
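For anyone hitting the same problem, a short sketch of how the request URL might be assembled with the encapsulator disabled for one split field. The base URL and the PUBLPLACE field come from this thread; the class and method names are illustrative, and the f.PUBLPLACE.split=true parameter is an assumption about how splitting was enabled, since the thread does not show it.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class CsvUpdateUrl {
    // Builds a CSV update URL that turns off encapsulation for one split field.
    static String csvUpdateUrl(String field) {
        // URL-encoding the NUL character yields "%00", the value suggested in the thread.
        String noEncapsulator = URLEncoder.encode("\u0000", StandardCharsets.UTF_8);
        return "http://localhost:8983/solr/update/csv?commit=true"
            + "&f." + field + ".split=true"                      // assumed, not shown in the thread
            + "&f." + field + ".encapsulator=" + noEncapsulator; // per-field override
    }

    public static void main(String[] args) {
        System.out.println(csvUpdateUrl("PUBLPLACE"));
    }
}
```

With the encapsulator disabled, a leading single quote in a value such as 's-Gravenhage should be treated as ordinary text rather than as the start of a quoted value.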
On Jan 8, 2008 9:58 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Jan 8, 2008 3:07 AM, Michael Lackhoff <[EMAIL PROTECTED]> wrote:
> > After a long weekend I could do a deeper look into this one and it looks
> > as if the problem has to do with splitting.
> >
> > > This one works for me fine.
>
On Jan 7, 2008 12:08 PM, Jae Joo <[EMAIL PROTECTED]> wrote:
> What happens if the Solr application hits the maximum heap memory assigned?
>
> Will it die or just slow down?
In my (limited) experience (with Jetty), Solr will not die but it will
return HTTP 500 errors on all requests until it is restarte
On Jan 8, 2008 3:07 AM, Michael Lackhoff <[EMAIL PROTECTED]> wrote:
> After a long weekend I could do a deeper look into this one and it looks
> as if the problem has to do with splitting.
>
> > This one works for me fine.
> >
> > $ cat t2.csv
> > id,name
> > 12345,"'s-Gravenhage"
> > 12345,'s-Grav
On Jan 8, 2008 9:30 AM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> What's the function? How many matching documents?
Right... and remember that the first time you use a field in a
function query has the same cost as sorting on a field (a cache entry
needs to be generated). Try further queries or
What's the function? How many matching documents?
s d wrote:
Adding a FunctionQuery made the query response time slower by ~300ms, adding
a 2nd FunctionQuery added another ~300ms, so overall I got over 0.5sec for a
response time (slow). Is this expected or am I doing something wrong?
Thx
currently two approaches:
http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html
and:
https://issues.apache.org/jira/browse/NUTCH-442
I have had experience with the former... you may have more luck on the
nutch-user list for help
ryan
Jan Buelens wrote:
Hi,
We are c
Hello there,
I would like to use something similar to Lucene's ParallelMultiSearcher in Solr to search
multiple indexes. Does it exist?
Thanks in advance
Hi,
In the response for the LukeRequestHandler what do the different
fields mean? Some of them are obvious but some are less so. Is
numTerms the total number of terms or the total number of unique terms
(ie the dictionary), if it is the former how can I find the size of
the dictionary across all f
Got it (
http://wiki.apache.org/solr/DisMaxRequestHandler#head-cfa8058622bce1baaf98607b197dc906a7f09590)
.
thx !
On Jan 8, 2008 12:11 AM, Chris Hostetter < [EMAIL PROTECTED]> wrote:
>
> : Isn't there a better way to take the information into account but still
> : normalize? taking the score of on
Is there a way or a point in filtering all results below a certain score?
e.g. exclude all results below score Y. Thanks
Adding a FunctionQuery made the query response time slower by ~300ms, adding
a 2nd FunctionQuery added another ~300ms, so overall I got over 0.5sec for a
response time (slow). Is this expected or am I doing something wrong?
Thx
: I've got this error when trying to search query like q=+myFiled:"some
: value"*
:
: org.apache.solr.core.SolrException: Query parsing error: Cannot parse
: '+myFiled:"some value"*': '*' or '?' not allowed as first character in
: WildcardQuery
...
...this is where the subtleties of the L
Hi,
We are currently using Solr as search engine.
To add an existing website to our search engine, we are investigating Nutch.
Does anyone have more information / experience about an integration between
Solr and Nutch?
Thanks in advance !
Best regards,
Jan
Naveen: it doesn't look like you ever got a reply (probably because of the
holidays) ... did you ever manage to get things working?
looking over your question, the root problem seems to be related to the
XPath Factories that should come standard in tomcat. I have never
seen this error bef
: Isn't there a better way to take the information into account but still
: normalize? taking the score of only one of the fields doesn't sound like the
: best thing to do (it's basically ignoring part of the information).
note the word "mostly" in Mike's response about dismax ... the "tie" param
After a long weekend I could do a deeper look into this one and it looks
as if the problem has to do with splitting.
This one works for me fine.
$ cat t2.csv
id,name
12345,"'s-Gravenhage"
12345,'s-Gravenhage
12345,"""s-Gravenhage"
$ curl http://localhost:8983/solr/update/csv?commit=true --dat
48 matches