I used to store the full text in the Lucene index, but I found it's very
slow when merging the index, because merging 2 segments copies the
.fdt files into a new one. So I want to only index the full text. But when
searching I need the full text for applications such as highlighting and
viewing the full text. I can
Hi Li,
I looked at doing something similar - where we only index the text
but retrieve search results / highlighting from files -- we ended up giving
up because of the amount of customisation required in Solr -- mainly
because we wanted the distributed search functionality in Solr, which
meant making
Thanks for the quick reply!
In fact it was a typo; the 200 rows I got were from Postgres. I meant to say
that the full import was omitting the 100 Oracle rows.
When I run the full import, I run it as a single job, using the url
command=full-import. I've tried to clear the index both using the
Just for testing purposes - I would
1. Use curl to create new docs
2. Use Solrj to go to individual dbs and collect docs.
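For step 1, a minimal sketch of what that curl call could look like, assuming the example single-core Solr URL and a schema with id and name fields (all names here are placeholders for your setup):

```
curl http://localhost:8983/solr/update -H 'Content-type: text/xml' \
  --data-binary '<add><doc><field name="id">doc1</field><field name="name">test doc</field></doc></add>'
curl http://localhost:8983/solr/update -H 'Content-type: text/xml' --data-binary '<commit/>'
```

The explicit commit makes the new doc visible to searches.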
On Wed, Jul 7, 2010 at 12:45 PM, Xavier Rodriguez xee...@gmail.com wrote:
Thanks for the quick reply!
In fact it was a typo, the 200 rows I got were from postgres. I
I was wondering if anyone has any experience using huge pages[1] to
improve Solr (or Lucene) performance (especially on 64-bit).
Some are reporting major performance gains in large, memory-intensive
applications (like EJBs)[2].
Ephemeral but significant performance regressions have also been
solved
1) Shouldn't you put your entity elements under the document tag, i.e.
<dataConfig>
  <dataSource ... />
  <dataSource ... />
  <document name="docs">
    <entity ...></entity>
    <entity ...></entity>
  </document>
</dataConfig>
2) What happens if you try to run full-import with an explicitly
specified entity? GET
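For (2), a sketch of such a request, assuming the default DIH handler path and an entity named entity1 (both are placeholders for your config):

```
GET http://localhost:8983/solr/dataimport?command=full-import&entity=entity1
```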
This looks reasonable. I'll take a look at the patch. Originally, I had
intended that it was just for one Field Sub Type, thinking that if we ever
wanted multiple sub types, a new, separate class would be needed, but if
this proves to be clean this way, then I see no reason not to
Hi,
I am trying to make a Lucene module for SKOS-based synonym expansion. When I
tried to use the Filter in Solr, I got a ClassCastException.
So I tried to take one of the existing Solr Filters and FilterFactories, change
the package information, compress it into a jar, and use it as a
I haven't used this myself, but Solr supports a
http://wiki.apache.org/solr/UpdateXmlMessages#A.22rollback.22 rollback
function. It is supposed to roll back to the state at the previous commit. So
you may want to turn off auto-commit on the index you are updating if you
want to control what that
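For reference, the rollback is just another XML update message; a minimal sketch, assuming the example single-core URL:

```
curl http://localhost:8983/solr/update -H 'Content-type: text/xml' --data-binary '<rollback/>'
```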
Currently our only requirement is to be able to search on the
numerical part of the daterange field, so our field type overrides
getRangeQuery and getFieldQuery to consider only the first two
subfields. If we wanted to be able to search the name subfield as
well, I suppose we could do
So I will have a solr field that contains years, e.g., 1990, 2010,
maybe even 1492, 1209 and 907/0907.
I will be doing range limits over this field, e.g., [1950 TO 1975] or
what have you. The data represents publication dates of books on a
large library's shelves; there will be around 3 million
Hi list,
I am wondering if Solr/Lucene can help improve my existing search engine.
I would like to have different results for each user - but still have
relevant results. Each user would have different score multipliers for
each searchable item.
Is this possible?
Thanks,
--
On Wed, Jul 7, 2010 at 8:15 AM, Grant Ingersoll gsing...@apache.org wrote:
Originally, I had intended that it was just for one Field Sub Type, thinking
that if we ever wanted multiple sub types, that a new, separate class would
be needed
Right - this was my original thinking too.
I'm still pretty new to SOLR and have a question about handling updates. I
currently have a db-config to do a bulk import. I have a single root entity
and then some data that comes from other tables. This works fine for an
initial bulk load. However, once indexed, is there a way I can tell
Hmmm, let's see your schema definitions please. I'm suspicious because
you've implied that you do use a unique key, but your definitions don't
select it into the same name (i.e. you select as id_carrer in one and
id_hidrant in another). So if id_hidrant was defined as your
You need to look carefully at your schema.xml. There are plenty of
comments in that file describing what's going on. That's where you
set up your analyzers by chaining together various tokenizers
and filters.
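As a sketch, such a chain in schema.xml looks something like the following (the specific tokenizer and filters here are only illustrative, not a recommendation for your data):

```
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```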
I think you're confused about indexing and storing. Generally it's
a bad practice to
My index contains data in 2 different languages, English and German. Which
analyzer/stemmer should be applied to this data before feeding it to the index?
-Sarfaraz
This isn't a very worrisome case. Most of the messages you see on the board
about the dangers of dates arise because dates can be stored with many unique
values if they include milliseconds. Then, when sorting on date your memory
explodes because all the dates are loaded into memory.
In your
The short answer is there isn't a single analyzer and stemmer that
really works well for mixed-language indexing and searching.
Take a look through the mail archive; try searching for multilanguage,
multi-language, or multiple languages. There's a wealth of info there
because this topic has been
There are terms in my data like "one-way", separated by '-'. Now the problem
is that the standard analyzer is considering these as a single term instead of
two, but I need these to be stored as two terms in the index. How can I
do this?
Sarfaraz
Thanx Erick
:-)
--- On Thu, 8/7/10, Erick Erickson erickerick...@gmail.com wrote:
From: Erick Erickson erickerick...@gmail.com
Subject: Re: stemming the index
To: solr-user@lucene.apache.org
Date: Thursday, 8 July, 2010, 1:33 AM
The short answer is there isn't a single analyzer and stemmer that
Take a look at WordDelimiterFilterFactory
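A sketch of a field type using it, splitting "one-way" into "one" and "way" (the type name and parameter values here are illustrative, not tuned for your data):

```
<fieldType name="text_split" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With generateWordParts="1", the hyphenated term is indexed as two separate tokens.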
Erick
On Wed, Jul 7, 2010 at 4:15 PM, sarfaraz masood
sarfarazmasood2...@yahoo.com wrote:
There are terms in my data like : one-way , separated by '-' , now the
problem is that the standard analyzer is considering these as a single term
instead
: Ubuntu server (see exception below). The same configuration works when
: injecting from a Windows client to a Windows server.
interesting ... so you're saying that if you use the exact same SolrJ
code, and just change the host:port, it works on windows? are you certain
that the version of
: Does anyone know how to read in data from one or more of the example xml docs
: and ALSO store the filename and path from which it came?
Solr has no knowledge that your xml docs are actually files ... the XML
syntax (<add><doc>...) is just a serialization mechanism for streaming
data to solr
: with multicore. i cannot access:
: http://localhost:8983/solr/collection1/admin/zookeeper.jsp
why would you expect that URL to work? you don't have a core named
collection1 in the solr.xml you posted...
: <cores adminPath="/admin/cores" defaultCoreName="collection1">
: <core name="GPTWPI"
: I am fetching the following details programmatically:
1) you didn't tell us how you were fetching those details programmatically
... what URL are you using?
2) The fact that the handlerStart times are different suggests that you are
not looking at the same handler (maybe you are looking at
Hi,
I have a text file broken apart by carriage returns, and I'd like to only
return entire lines. So, I'm trying to use this:
hl.fragmenter=regex
hl.regex.pattern=^.*$
... but I still get fragments, even if I crank up the hl.regex.slop to 3 or so.
I also tried a pattern of
TokenFilterFactory is an interface. Your factory class has to
implement this interface.
If you look at the Lucene factories, they all subclass from
BaseTokenFilterFactory which then subclasses from
BaseTokenStreamFactory. That last one does various things for the
child factories (I don't know
If autocommit does not do an automatic rollback, that is a serious bug.
There should be a way to detect that an automatic rollback has
happened, but I don't know what it is. Maybe something in the Solr
MBeans?
On Wed, Jul 7, 2010 at 5:41 AM, osocurious2 ken.fos...@realestate.com wrote:
I
Yes, for a user's query you would include a different set of boosts as
a parameter in the search request. It's easy. You need the user-boost
set mapping in your front end, not in Solr.
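As a sketch, the front end could append the user's boosts to each request, e.g. as dismax boost queries (the field names and weights here are placeholders looked up from your user-boost mapping):

```
q=camera&defType=dismax&bq=brand:canon^3.0&bq=condition:used^0.2
```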
On Wed, Jul 7, 2010 at 8:44 AM, Jean-Michel Philippon-Nadeau
j...@jmpnadeau.ca wrote:
Hi list,
I am wondering
You can pass variables to the DIH from the URL parameters. This would
let you pass a query term into the DIH operation.
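As a sketch, a URL parameter is visible inside data-config.xml via ${dataimporter.request.<name>}; for example (entity, table, and parameter names here are placeholders):

```
<entity name="item"
        query="select id, name from item where category = '${dataimporter.request.cat}'"/>
```

invoked with /dataimport?command=full-import&cat=books.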
On Wed, Jul 7, 2010 at 11:53 AM, Frank A fsa...@gmail.com wrote:
I'm still pretty new to SOLR and have a question about handling updates. I
currently have a db-config to do a
There is no 'trie string'.
If you use a trie type for this problem, sorting will take much less
memory. Sorting strings uses memory both per document and per unique
term. The Trie types do not use any memory per unique term. So, yes, a
Trie Integer is a good choice for this problem.
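A sketch of what that could look like in schema.xml (the field and type names and the precisionStep value here are illustrative):

```
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true"/>
<field name="pub_year" type="tint" indexed="true" stored="true"/>
```

A range query like pub_year:[1950 TO 1975] then runs against the numeric values.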
On Wed, Jul
Hey Robert,
You may want to check out Flume for log file collection:
http://github.com/cloudera/flume. We don't currently allow Flume to populate
a Solr index, but that would be quite an interesting use case!
Later,
Jeff
On Wed, Jun 30, 2010 at 3:06 PM, Robert Petersen rober...@buy.com wrote:
I used SegmentInfos to read the segments_N file and found the error is
that it tries to load deletedDocs, but the .del file's size is 0 (because
of a disk error). So I used SegmentInfos to set delGen=-1 to ignore
deleted docs.
But I think there is some bug. The logic of the write may be -- it first
writes the