Hi Otis,
The problem is that we are using Hadoop for batch index building, so in this
case we are not able to do incremental indexing at the moment. It would be
great if we could simulate incremental indexing just for the index uploads.
Dear all,
In my experience, when a Lucene index is updated frequently, its
performance degrades. Is that correct?
In my system, most data crawled from the Web is indexed and the
corresponding index will NOT be updated any more.
However, some indexes should be updated frequently
Hi,
I am using Solr to index documents, and I would like to index my documents with 2
different analyzers to generate 2 indexes.
How could I generate 2 different indexes?
Thank you for your help.
Amel.
You're possibly getting hit by server caching. Are you by chance
submitting the exact same query after your commit? What
happens if you change your query to one you haven't used before?
Turning off http caching might help. Solr should be searching
the new contents after a commit (and any attendant
Dave Stuart
> Hi,
>
> I am testing index-time synonyms, stemming, etc., and it would be great to be
> able to view the raw indexed data. Is there a way to do this using either
> Lucene tools or the Solr admin interface?
>
> Regards,
>
> David
search_returns_too_many_/_too_little_/_unexpected_results,_how_to_debug?
>
>
> 2011/4/19 Dave Stuart
>
>> Hi,
>>
>> I am testing index-time synonyms, stemming, etc., and it would be great to be
>> able to view the raw indexed data. Is there a way to
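For what it's worth, besides Luke and the admin analysis page, you can dump the raw indexed terms for a field straight from the Lucene index under Solr's data directory. A rough sketch against the Lucene 2.9/3.x-era API (the index path and field name are placeholders):

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.store.FSDirectory;

public class DumpTerms {
  public static void main(String[] args) throws Exception {
    String field = "title";  // field to inspect; adjust to your schema
    IndexReader reader = IndexReader.open(FSDirectory.open(new File("solr/data/index")));
    TermEnum terms = reader.terms(new Term(field, ""));
    try {
      do {
        Term t = terms.term();
        if (t == null || !t.field().equals(field)) break;  // stop once we leave this field
        System.out.println(t.text() + "  docFreq=" + terms.docFreq());
      } while (terms.next());
    } finally {
      terms.close();
      reader.close();
    }
  }
}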
I did a facet query on my Data field and it showed a list of words with
their counts, but it misses a lot of words in the facet counts.
The query used was :-
http://localhost:8983/solr/select/?q=*:*&facet=true&facet.field=Data
How can I get the count of each word in my index and one more questio
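One common reason for "missing" terms is that facet.limit defaults to 100. A hedged SolrJ sketch of the same query with the limit lifted, returning every indexed term in the Data field with its document count (the HttpSolrServer class name assumes a SolrJ 3.6+ client; the URL is a placeholder):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class AllTermCounts {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("*:*");
    q.setRows(0);             // we only want the facet counts, not documents
    q.setFacet(true);
    q.addFacetField("Data");
    q.setFacetLimit(-1);      // facet.limit defaults to 100; -1 returns every term
    q.setFacetMinCount(1);
    QueryResponse rsp = server.query(q);
    for (FacetField.Count c : rsp.getFacetField("Data").getValues()) {
      System.out.println(c.getName() + " : " + c.getCount());
    }
  }
}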
hi
I am not sure if SOLR has this feature so just wanted to confirm..
Basically, what I want to do is: for certain query terms I would like to query a
real-time web service, which will return certain results, and at the same time
search the Solr index.
This can be implemented outside Solr, and I am
Hi All,
I understand that I can use a custom queryConverter for the input to
the suggester http://wiki.apache.org/solr/Suggester component, however
there doesn't seem to be anything on the indexing side; TST appears to
take the input verbatim, and Jaspell seems to lowercase everything.
The proble
Take a look at TikaEntityProcessor or the Tika package. I'm on restricted
internet access so I can't look at the exact class.
Erick
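As a rough alternative sketch: if the PDFs can be fetched (the siteminder authentication aside), they can also be pushed to the extracting handler (Solr Cell, which wraps Tika) with SolrJ. The class names, parameters, and handler path below assume a SolrJ 4.x-style client and the default /update/extract configuration:

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractPdf {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
    req.addFile(new File("/tmp/sample.pdf"), "application/pdf");  // file path is a placeholder
    req.setParam("literal.id", "doc-1");          // uniqueKey for the extracted document
    req.setParam("fmap.content", "text");         // map the extracted body to the "text" field
    req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
    server.request(req);
  }
}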
On May 24, 2011 6:45 AM, "Thumuluri, Sai"
wrote:
> Good morning, I am trying to index some PDFs which are protected by
> siteminder, any ideas a
That's rare. How do you add documents to Solr? What do you have as the primary
key?
How do you determine the number of documents in the index?
The "maxDoc" value on the stats page counts deleted documents too,
which are only eliminated when segments are merged.
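A small Lucene sketch (2.9/3.x-era API; the index path is a placeholder) illustrating the difference being described:

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.FSDirectory;

public class DocCounts {
  public static void main(String[] args) throws Exception {
    IndexReader reader = IndexReader.open(FSDirectory.open(new File("solr/data/index")));
    // numDocs excludes deleted documents, maxDoc does not;
    // they converge again once the deletes are merged away
    System.out.println("numDocs = " + reader.numDocs());
    System.out.println("maxDoc  = " + reader.maxDoc());
    System.out.println("deleted = " + reader.numDeletedDocs());
    reader.close();
  }
}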
On Wed, Jun 8, 2011 at 12:18 PM,
erver/app_name/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on
Thanks,
Marius
2011/6/8 Tomás Fernández Löbbe
> That's rare. How do you add documents to Solr? what do you have as primary
> key?
> How do you determine the number of documents in the index?
>
>
After switching to Solr 3.2 and building a new index from scratch, I ran
CheckIndex, which reports:
Segments file=segments_or numSegments=1 version=FORMAT_3_1 [Lucene 3.1]
Why do I get FORMAT_3_1 and Lucene 3.1 - is anything wrong with my index?
from my schema.xml:
from my solrconfig.xml
We are experiencing slowness during reloading/resubmitting index from Database
to the master.
We have two environments:
QA and Prod.
The slowness happens only in Production, not in QA.
It takes only one hour to reload 2.5 million documents in QA, compared to 5-6 hours to load the
same size of index
ut 50 million documents split across 2 servers with
reasonable performance - sub-second response time in most cases. The total
size of the 2 indices is about 300G. I'd say most of the size is from stored
fields, though we index just about everything. This is on 64-bit ubuntu boxes
with 32G of m
I am trying to index to a Solr server from a nightly build. I get the
following error in my catalina.out:
26-Jun-2009 5:52:06 PM
org.apache.solr.update.processor.LogUpdateProcessor
finish
Hello,
I was wondering if there was an option to initialize Solr server with
synonyms pulled from a database while indexing documents? At the moment, the
only option seems to be to use a flat file.
Thanks.
I recently upgraded to a nightly build of 1.4. The build works fine, I
can deploy fine. But when I go to insert data into the index, I get the
following error:
26-Jun-2009 5:52:06 PM
org.apache.solr.update.processor.LogUpdateProcessor
finish
For performance reasons, we're attempting to build the index used with
Solr, directly in Lucene. It works fine for the most part, but I'm
having an issue when it comes to stemming. I'm guessing this is due to a
mismatch between how Lucene is stemming and how Solr stems during i
On Sun, Jul 12, 2009 at 10:55 AM, manuel aldana wrote:
> is it possible to clean up solr index by passing a start param? currently I
> am deleting the data/ folder to achieve this, which feels a bit unnatural.
> It would be cool to have something like -Dsolr.drop.index as parameter.
ntaining a score, lacks the
> designated ID field (from my schema) and thus the document cannot be added
> to the results queue.
>
> Because the example on the wiki works by loading the documents directly
> into
> Solr for indexing, I have come to the conclusion that there is
rDocument instance has
> only
> > a
> > score field -- which proves problematic in the following line where the
> id
> > is requested. The SolrDocument, only containing a score, lacks the
> > designated ID field (from my schema) and thus the document cannot be
&g
g.
If you forgot to include uniqueKeys in some documents, changed the schema to
add a uniqueKey, and then didn't reindex the whole batch, there will be some
documents in the index without a value in the unique key field. In such a
case, if you use distributed search, it will blow up because i
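A hedged SolrJ sketch for spotting such documents, assuming the uniqueKey field is called id and a SolrJ 3.6+ style client:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class FindMissingKeys {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    // documents that have no value in the uniqueKey field
    SolrQuery q = new SolrQuery("*:* -id:[* TO *]");
    q.setRows(10);
    long missing = server.query(q).getResults().getNumFound();
    System.out.println("documents without a uniqueKey: " + missing);
  }
}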
On the slave this command would not work well. The indexversion is not
the actual index version; it is the current replicatable index
version.
Why do you call that API directly?
On Tue, Jul 21, 2009 at 12:53 AM, solr jay wrote:
> If you ask for the index version of a slave instance, you alw
Oh, in case the index data is corrupted on the slave, I want to download the entire
index from the master. During the download, I want the slave to be out of service
and put it back after the download has finished. I was trying to figure out how to
determine when the download is done. Right now, I am calling
http://slave_host:8983
I hope it could be a solution.
But I think I understood that you can use deletedPkQuery like this:
"select document_id from table_document where statusDeleted= 'Y'"
In my case I have no status like "statusDeleted".
The request I would like to write is
"Delete
= 'Y'"
>
> In my case I have no status like "statusDeleted".
>
> The request I would like to write is
>
> "Delete from my solr Index the id that are no longer present in my
> table_document"
>
> With Lucene I had a way to do that :
> ope
As far as I know you can not do that with DIH. What size is your index?
Probably the best you can do is index from scratch again with full-import.
clico wrote:
>
> I hope it could be a solution.
>
> But I think I understood that u can use deletePkQuery like this
>
> "
did you see the deletedPkQuery?
On Thu, Aug 20, 2009 at 8:27 PM, clico wrote:
>
> Hello
>
> I'm trying a way to do that :
>
> I index a db query like
> "select id from table_documents"
>
> Some documents are updated or deleted from the data table.
e "statusDeleted".
I don't think there is a straight solution w/o doing a full-import
>
> The request I would like to write is
>
> "Delete from my solr Index the id that are no longer present in my
> table_document"
>
> With Lucene I had a way to do tha
: > The request I would like to write is
: >
: > "Delete from my solr Index the id that are no longer present in my
: > table_document"
: >
: > With Lucene I had a way to do that :
: > open IndexReader,
: > for each lucene document : check in table_document
hossman wrote:
>
> : > The request I would like to write is
> : >
> : > "Delete from my solr Index the id that are no longer present in my
> : > table_document"
> : >
> : > With Lucene I had a way to do that :
> : > open IndexReader,
>
You can write an onImportEnd event handler as well.
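A minimal sketch of such a listener (the class and package names are made up; it would be referenced from data-config.xml via an onImportEnd attribute on the <document> element):

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.EventListener;

// Wired up in data-config.xml, e.g. <document onImportEnd="com.example.CleanupListener">
public class CleanupListener implements EventListener {
  public void onEvent(Context ctx) {
    // Runs once the import finishes; a natural place to trigger the
    // "delete ids no longer present in table_document" clean-up discussed above.
    System.out.println("DIH import finished, running post-import clean-up");
  }
}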
On Fri, Aug 21, 2009 at 3:28 PM, clico wrote:
>
>
> hossman wrote:
>>
>> : > The request I would like to write is
>> : >
>> : > "Delete from my solr Index the id that are no longer present in my
>>
I don't understand this point.
I had a similar task but I simply used the SolrJ client for that; I was forced to
keep track of deleted docs in table_document... Now I am using "delete by
query" (with SolrJ) - each document has a timestamp which I query...
For instance, you can update (allowOverwrite=true) the whole SOLR inde
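A hedged sketch of that timestamp-based clean-up with SolrJ (the field name, timestamp value, and client class are assumptions, not from the original thread):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class DeleteStaleDocs {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    // Every document is assumed to carry a "last_indexed" timestamp set during the full re-import;
    // anything whose timestamp is at or before the re-import start was not refreshed and can go.
    String importStart = "2009-08-21T00:00:00Z";
    server.deleteByQuery("last_indexed:[* TO " + importStart + "]");
    server.commit();
  }
}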
bati wrote:
>
> Hey,
> I was wondering - is there a mechanism in lucene and/or solr to mark a
> document in the index
> as deleted and then have this change reflect in query serving without
> performing the whole
> commit/warmup cycle? this seems to me largely appealing as it allo
; On Tue, Aug 25, 2009 at 3:10 PM, KaktuChakarabati
>> wrote:
>>>
>>> Hey,
>>> I was wondering - is there a mechanism in lucene and/or solr to mark a
>>> document in the index
>>> as deleted and then have this change reflect in query serving wit
a fairly simple change though perhaps too late for 1.4
> release?
>
> On Tue, Aug 25, 2009 at 3:10 PM, KaktuChakarabati
> wrote:
>>
>> Hey,
>> I was wondering - is there a mechanism in lucene and/or solr to mark a
>> document in the index
>> as deleted and then h
t; This will be implemented as you're stating when
>>> IndexWriter.getReader is incorporated. This will carry over
>>> deletes in RAM until IW.commit is called (i.e. Solr commit).
>>> It's a fairly simple change though perhaps too late for 1.4
>>> relea
ed. This will carry over
>>>> deletes in RAM until IW.commit is called (i.e. Solr commit).
>>>> It's a fairly simple change though perhaps too late for 1.4
>>>> release?
>>>>
>>>> On Tue, Aug 25, 2009 at 3:10 PM, KaktuChakarabati
>&
Use Apache Luke.
If you're using a newer Lucene, you might need to add the Lucene 2.9 jar files to
Luke and build it yourself.
Cheers
Rajan
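In addition to the standalone Luke tool, the LukeRequestHandler that ships with Solr (at /admin/luke in the example config) exposes much of the same information; a rough SolrJ sketch (the client class assumes SolrJ 3.x+ and the URL is a placeholder):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;

public class PeekAtIndex {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    LukeRequest luke = new LukeRequest();   // same data as http://localhost:8983/solr/admin/luke
    luke.setNumTerms(10);                   // top terms per field
    LukeResponse rsp = luke.process(server);
    System.out.println("fields in the index: " + rsp.getFieldInfo().keySet());
  }
}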
On Wed, Sep 2, 2009 at 2:02 PM, Jason Rutherglen wrote:
> Is there a quick way to view index files?
>
>
> On Wed, Sep 2, 2009 at 2:02 PM, Jason Rutherglen > wrote:
>
>> Is there a quick way to view index files?
>>
>
ote:
> > Use Apache Luke.
> >
> > If you're using new Lucene. You might need to add Lucene 2.9 Jar files to
> > the Luke and build it.
> >
> > Cheers
> > Rajan
> >
> >
> > On Wed, Sep 2, 2009 at 2:02 PM, Jason Rutherglen <
> jason.rutherg...@gmail.com
> >> wrote:
> >
> >> Is there a quick way to view index files?
> >>
> >
>
rglen
wrote:
Is there a quick way to view index files?
et of servers which is used only for indexers for
index creation and then every 5 mins or so, the index will be copied to the
searchers (a set of Solr servers used only for querying). For this we tried to use
snapshooter, rsync, etc.
But the problem with this approach is, the same index is present on
Is there a way I can actually tell Solr which index I want it to search
against with the query? I know it will cost a bit in performance, but it
would be helpful.
I have many indexes and it would be nice to determine which one should be
used by the user.
thanks
Hey,
I noticed that with the new in-process replication, it is not as straightforward to
have (production-serving) Solr index snapshots for backup (it used to be a
natural byproduct of the snapshot-taking process.)
I understand there are some command-line utilities for this (abc..)
Can someone please
On Tue, Oct 13, 2009 at 8:36 AM, Varun Gupta wrote:
> Hi,
>
> I am using Solr 1.3 for spell checking. I am facing a strange problem of the
> spell checking index not being generated. When I have a small number of
> documents (fewer than 1000) indexed, the spell check index builds,
ng. I am facing a strange problem of
> > spell checking index not been generated. When I have less number of
> > documents (less than 1000) indexed then the spell check index builds, but
> > when the documents are more (around 40K), then the index for spell
> checking
> >
Solr wants to keep various data directories, like the spellchecking
index, not just the main index. The solr.data.dir option gives the
location of the data/ directory, which defaults to living under solr/. This
line in solrconfig.xml uses the property:
${solr.data.dir:./solr/data}
This starts the example
Have a look at http://wiki.apache.org/solr/TermsComponent
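A rough SolrJ sketch of querying it (this assumes the /terms handler from the example solrconfig.xml, a SolrJ 4.x-style client, and a placeholder field name):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.TermsResponse;

public class ListTerms {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery();
    q.setRequestHandler("/terms");   // handler wired to TermsComponent
    q.set("terms", "true");
    q.set("terms.fl", "text");       // field whose indexed terms we want to see
    q.set("terms.limit", "50");
    QueryResponse rsp = server.query(q);
    for (TermsResponse.Term t : rsp.getTermsResponse().getTerms("text")) {
      System.out.println(t.getTerm() + "  docFreq=" + t.getFrequency());
    }
  }
}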
On Oct 15, 2009, at 5:43 AM, jfmel...@free.fr wrote:
Hi
I use a sample embedded Apache Solr instance to create a Lucene index with
a few documents for test purposes.
Documents have text string, sint, sfloat, bool, and date fields,
each of
1.3.0 replication.
Recently the indexing script ran into problems when the commit was
taking longer than the request timeout. I killed the script, did a
commit by hand (using
bin/commit) and then started to index again and it still wouldn't
commit. We then tried to go to the stats page an
he approval_dt the above was
giving me properly sorted results. Could anyone suggest whether I am doing
something wrong?
thanks and rgds,
Anil.
-- Forwarded message --
From: Anil Cherian
Date: Thu, Nov 19, 2009 at 12:25 PM
Subject: index-time boost ... query
To: solr-user@l
thor: http://www.packtpub.com/solr-1-4-enterprise-search-server/
>
> My ultimate aim is to always bring up records in the result having latest
> approval_dt to appear first using index-time boosting in SOLR. Could you pls
> help me with some directions.
On Nov 19, 2009, at 12:25
o always bring up records in the result having
> latest approval_dt to appear first using index-time boosting in SOLR. Could
> you pls help me with some directions.
>
>
> On Nov 19, 2009, at 12:25 PM, Anil Cherian wrote:
>
> > Hi,
> >
> > I am working on index-ti
Hi David,
I just now tried sorting on the results and I got the records with the latest
approval_dt first.
My question now is: will the index-time boosting method improve the response time? i.e.,
will I be able to achieve the same thing I achieved
using sorting, but much faster, if I use index-time boosting?
If
Hi,
Can I have one instance of Solr write the index and data to multiple drives
? e.g.
Can I configure Solr to do something like -
c:\data
d:\data
e:\data
Or is the suggested way to use multiple Solr cores and have the application
shard the index across the cores ? Or is distributed search
pdev [mailto:vika...@yahoo.com]
> Sent: Wed 04-Nov-2009 5:42 PM
> To: solr-user@lucene.apache.org
> Subject: Index documents with Solr
>
>
> Wanted to find out how people are using Solr's ExtractingRequestHandler to
> index different types of documents from a configurat
I got the records with latest
> approval_dt first.
>
> My question now is will index-time boosting method increase the response. ie
> will I be able to acheive the same thing i achieved
> using sorting much faster if i use index-time boosting.
>
> If you feel it helps could you
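For reference, this is roughly what an index-time boost looks like with SolrJ (field names and values are placeholders). Note that the boost is baked into the field norms at index time, so changing it means reindexing, whereas a sort on approval_dt needs no reindex:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BoostedAdd {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "rec-1");
    doc.addField("approval_dt", "2009-11-19T12:25:00Z");
    doc.addField("title", "some record");
    doc.setDocumentBoost(2.0f);   // whole-document index-time boost; needs omitNorms="false" on the boosted fields
    server.add(doc);
    server.commit();
  }
}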
: I had working index time boosting on documents like so:
:
: Everything was great until I made some changes that I thought were not
: related to the doc boost, but after that my doc boosting appears to be
: missing.
:
: I'm having a tough time debugging this and didn't have th
Hi everybody,
I'm experiencing a problem with my Solr-based web application running on a Sun
Solaris OS.
It seems that the application still holds file descriptors to index files even
after those files are removed. It can be observed mainly when the
snapinstaller script is executed, b
I've tried searching for this answer all over but have found no results
thus far. I am trying to add a new field to my schema.xml with a
default value of 0. I have a ton of data indexed right now and it would
be very hard to retrieve all of the original sources to rebuild my
index.
Hi,
I use the DIH with an RDBMS for indexing a large MySQL database with
about 7 million entries.
A full index works fine; in schema.xml I implemented a uniqueKey
field (which is of the type 'text').
I run queries with the dismax query handler, and get my results as
a PHP ar
Hello,
Is it possible to have the index created by a single Solr instance, but have
several Solr instances field the search queries? Or do I HAVE to replicate
the index for each Solr instance that I want to answer queries? I need to
set up a fail-over instance. Thanks
- ashok
Hi,
Given that the only real way to reindex is to save the document again, what is
the fastest way to extract all the documents from a Solr index to resave
them?
I have tried the id:[* TO *] trick; however, it takes a while once you get a
few thousand into the index. Are there any tools that will
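One approach that sidesteps the slow deep start offset is to walk the index in uniqueKey order with a range filter instead of paging with start. A hedged SolrJ sketch (it assumes a string uniqueKey named id and Solr 4.x mixed-bracket range syntax; on older versions use an inclusive lower bound and skip the repeated document):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.util.ClientUtils;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class DumpAllDocs {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    String lastId = null;
    while (true) {
      SolrQuery q = new SolrQuery("*:*");
      q.set("sort", "id asc");
      q.setRows(500);
      if (lastId != null) {
        // only ids after the last one we saw; avoids ever using a large start offset
        q.addFilterQuery("id:{" + ClientUtils.escapeQueryChars(lastId) + " TO *]");
      }
      SolrDocumentList page = server.query(q).getResults();
      if (page.isEmpty()) break;
      for (SolrDocument d : page) {
        lastId = (String) d.getFieldValue("id");
        // resave / reprocess the document here
      }
    }
  }
}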
Hi,
Starting about one week ago, our index size has been tripling during
optimization.
The current index statistics are:
numDocs : 192702132
size: 76G
And we run an optimize after every 6M-document update.
Since we keep getting new data, the index size increases every day. Before,
the index size was
Hi,
I'm trying to understand the internal structure of the Lucene indexer.
According to the "Lucene in Action" book, the documents are first converted
into the Lucene Document format, then analyzed with the StandardAnalyzer.
I don't understand how the analyzed documents are added t
the issue is open and
> marked for the SOLR 1.5 release. Does the patch that is available update the index file
> for the particular id on Solr 1.3?
>
> Regards,
> Sagar Khetkade
thanks a lot Noble.
`Sagar
> Date: Wed, 18 Feb 2009 14:45:13 +0530
> Subject: Re: Updating the solr index
> From: noble.p...@gmail.com
> To: solr-user@lucene.apache.org
>
> The patch currently does not work. SOLR-828 is supposed to duplicate
> this. But this
ol.
I use Solr 1.3 to build two separate indexes. Then I shut down Solr.
The indexes generated by Solr look ok. I can read them with a Lucene
IndexSearcher, and even open up the index files and see the text of my
documents.
Next I run IndexMergeTool from Lucene 2.4, following the
instructions
I'm using lucene-core-2.4-dev.jar from the Solr 1.3.0 distribution.
Solr doesn't include lucene-misc, so I used lucene-misc-2.4.jar from
the Lucene 2.4.0 distribution.
But I had the exact same problem when I wrote my own index merge tool
using just the Solr distribution jars.
-Stuart S
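For reference, IndexMergeTool's invocation is just the destination followed by the sources; a minimal sketch calling it from Java (paths are placeholders; the lucene-misc and lucene-core jars should come from the same Lucene release):

import org.apache.lucene.misc.IndexMergeTool;

public class MergeSolrIndexes {
  public static void main(String[] args) throws Exception {
    // first argument is the destination index, the rest are merged into it;
    // the schemas of the source indexes must be compatible
    IndexMergeTool.main(new String[] {
        "/path/to/merged/index",
        "/path/to/index1",
        "/path/to/index2"
    });
  }
}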
Hi,
I am using Nutch nightly build #736 (version 1.0?) to crawl and index, and I
would like to use Solr as the indexer. I wonder if there's a way to convert
Nutch indices to Solr?
Thanks!
Tony
--
Are you RCholic? www.RCholic.com
温 良 恭 俭 让 仁 义 礼 智 信
Hi,
I am a new user of Solr and I don't know how to index.
Can anyone tell me the setup so that I can build an index and search,
and also how to crawl any web site and the local file system using Solr?
Thanks In advance.
-Sanjshra
Hi,
I am creating indexes in Solr and facing an unusual issue.
I am creating 5 indexes and the XML file of the 4th index is malformed. So, while
creating the indexes it properly submits indexes #1, 2 & 3 and throws an exception
after submission of index #4.
Now, if I look for indexes #1, 2 & 3, it doesn't
Hi:
Is it easy to do a daily incremental index update in Solr, assuming the
index is around 1G? In terms of giving a document an ID to facilitate
index updates, is using the URL a good way to do so?
Thanks
Victor
Hi All,
I have a txt file, that captured all of my network traffic. How can I use
Solr to filter out a particular IP address?
Thank you,
Nga.
My question is - from a design and query speed point of view - should I add a
new core to handle the additional data, or should I add the data to
the existing core?
Do you ever need to get results from both sets of data in the same
query? If so, putting them in the same index will be faster
Hi,
Without knowing the details, I'd say keep it in the same index if the
additional information shares some/enough fields with the main product data and
separately if it's sufficiently distinct (this also means 2 queries and manual
merging/joining).
Otis --
Sematext -- http://se
lder under solr?
3. Where do we specify whether we want to take a backup of the indexes under a
particular core or of all the indexes under all cores?
Thanks,
Amit Garg
Hi,
We have a distributed Solr system (2-3 boxes with each running 2
instances of Solr and each Solr instance can write to multiple cores).
Our use case is high index volume - we can get up to 100 million
records (1 record = 500 bytes) per day, but very low query traffic
(only administrators
inley [mailto:ryan...@gmail.com]
Sent: Wednesday, March 25, 2009 8:54 PM
To: solr-user@lucene.apache.org
Subject: Re: large index vs multicore
>
> My question is - From design and query speed point of - should I add
> new core to handle the additional data or should I add the d
Hi,
Can someone provide practical advice on how large a Solr search index can
be while still performing well for a consumer-facing media website?
Is it good or bad to think about distributed search and dividing the index at an
earlier stage of development?
Thanks
Ram
do not show; I have made this harder for myself, as all the information is in
English and I don't understand SOLR well. Finally, I hope you can help me,
because I have just days to deliver the project. A thousand thanks in
advance! :)
Hello again!
It's true, that command is the one that erases an index; I already tried it and yes,
that is how I will erase the test records from my project. It would be good to know
how to erase them from the application using solrj. Regards and thanks :)
On Thu, Apr 23, 2009 at 9:25 AM, lupiss wrote:
>
> Hello again!
> It's true, that command is the one that erases an index; I already tried it and yes,
> that is how I will erase the test records from my project. It would be good to know
> how to erase them from the application using solrj. Regards and thanks :)
You can use solrServer.deleteByQuery("*:*") and then call commit by
solrServer.commit(true, true);
This will erase the index.
--
Regards,
Shalin Shekhar Mangar.
Hello, thanks for answering! I had already seen that command, but it erases all
the indexes, right? I would like to erase only
Query("*:*") and then call commit by
> solrServer.commit(true, true);
>
> This will erase the index.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
>
> Hello, thanks for answering! I had already seen that command, but it erases all
> the indexes, right? I
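A hedged SolrJ sketch of deleting only selected records rather than the whole index (the id values and the query field below are placeholders):

import java.util.Arrays;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class DeleteSomeDocs {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    // delete specific documents by their uniqueKey...
    server.deleteById(Arrays.asList("test-1", "test-2"));
    // ...or everything matching a query that identifies just the test records
    server.deleteByQuery("source:test_project");
    server.commit();
  }
}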
Hi All,
I have a requirement wherein I want to update an existing index in Solr.
For example: I have issued an index command in Solr as
123
xxx
The id field is a unique key here.
My requirement is that I should be able to update this index, i.e. add another
field to it, without the need to
Hi, and sorry for slightly hijacking the thread,
On Mar 26, 2009, at 2:54 , Otis Gospodnetic wrote:
Hi,
Without knowing the details, I'd say keep it in the same index if
the additional information shares some/enough fields with the main
product data and separately if it's su
colas Pastorino [mailto:n...@ez.no]
Sent: Thursday, May 07, 2009 10:21 AM
To: solr-user@lucene.apache.org
Subject: Re: large index vs multicore
Hi, and sorry for slightly hijacking the thread,
On Mar 26, 2009, at 2:54 , Otis Gospodnetic wrote:
>
> Hi,
>
> Without knowing the details, I
f the solr svn checkout):
java -ea:org.apache.lucene -cp lib/lucene-core-2.9-dev.jar
org.apache.lucene.index.CheckIndex [path to index directory]
For example, to check the example index:
java -ea:org.apache.lucene -cp lib/lucene-core-2.9-dev.jar
org.apache.lucene.index.CheckIndex example/solr
Hi Peter,
Thank you very much for your quick reply.
I tried the CheckIndex method. It doesn't work on my crashed index.
The error message says the segments file in the directory is missing,
and when I use the -fix param, a new segments file still can't be written.
I even tried the
results too long :-)
But when I do the same with Solritas, I get nothing...
I do this:
http://192.168.105.56:8983/solr/itas/?q=summary%3Aplesnik&version=2.2&start=0&rows=10&indent=on
But when I do this:
http://192.168.105.56:8983/solr/itas/?q=
I get the whole index as a result.
I can't search
First the facts...
Ubuntu 8.10, 2GB RAM
Nightly build from 24.05.09
fields in index:
extension, title, url, last-modified, size, some others...
136 documents in the index
when I do this:
http://192.168.105.54:8983/solr/itas/?q=summary%3Aplesnik&version=2.2&start=0&rows=10&inden
On May 26, 2009, at 9:39 AM, Jörg Agatz wrote:
fields in index:
extension, title, url, last-modified, size, some oter...
136 Documents in the index
when i do this:
http://192.168.105.54:8983/solr/itas/?q=summary%3Aplesnik&version=2.2&start=0&rows=10&indent=on&wt=velocity&
>
>
> Sounds like it's just purely working out the basics of what field(s) you
> want to search on, and indexing them properly.
>
>Erik
>
>
Could be, but when I use the admin page to search.. it works...
I search the field "summary" for the word "plesnik",
so I do "*summary:plesnik*" in th
On May 26, 2009, at 9:59 AM, Jörg Agatz wrote:
Sounds like it's just purely working out the basics of what
field(s) you
want to search on, and indexing them properly.
Erik
coud bee, but when i use the admin page to search.. it works...
I search in the field "summary" the word "p
ok, now i understand.. the phraser fram solrtis, or the phraser, that
solrtis is using... do not work with: (fieldname:term)
but how can I change the default search field for solrtis?
and how can I search in a specific field?
Change /itas to use "lucene" (or remove the defType parameter altogethe
On May 26, 2009, at 10:13 AM, Jörg Agatz wrote:
ok, now i understand.. the phraser fram solrtis, or the phraser, that
solrtis is using... do not work with: (fieldname:term)
To be precise... "parser".
but how i can change the default search field from solrtis?
and how can i search in a specia