Thanks for the links - I've put a posting on the Tika ML.
I've just checked and we're using tika-0.2.jar - does anyone know which
version I can use with Solr 1.3?
Is there any info on upgrading from this far back to the latest
version - is it even possible? Or would I need to re-index everything?
I've been toying with the idea of setting up an experiment to index a large
document set (1+ TB) -- any thoughts on an open data set that one could use
for this purpose?
Thanks.
On Mon, Jan 16, 2012 at 5:00 PM, Burton-West, Tom wrote:
> Hello ,
>
> Searching real-time sounds difficult with that amount of data. [...]
Would it make sense to index in the cloud and periodically (2-4 times
/day) replicate the index to our server for searching? Which service should
we go with for Solr cloud indexing?
Any good and tried services?
Regards
Sujatha
I remember there is another implementation that uses a Lucene index file as
the lookup table instead of the in-memory FST.
FST has an advantage in speed, but if you write documents at runtime,
reconstructing the FST may cause performance issues.
On Tue, Jan 17, 2012 at 11:08 AM, Robert Muir wrote:
> looks like https://issues.apache.org/jira/browse/SOLR-2888. [...]
looks like https://issues.apache.org/jira/browse/SOLR-2888.
Previously, FST would need to hold all the terms in RAM during
construction, but with the patch it uses offline sorts/temporary
files.
I'll reopen the issue to backport this to the 3.x branch.
On Mon, Jan 16, 2012 at 8:31 PM, Dave wrote:
According to http://wiki.apache.org/solr/Suggester, FSTLookup is the least
memory-intensive of the lookupImpls. Are you suggesting a different
approach entirely, or is that a lookupImpl that is not mentioned in the
documentation?
On Mon, Jan 16, 2012 at 9:54 PM, qiu chi wrote:
> you may disable FST look up and use lucene index as the suggest method [...]
You may disable FST lookup and use the Lucene index as the suggest method.
FST lookup loads all documents into memory; you can use the Lucene
spellchecker instead.
On Tue, Jan 17, 2012 at 10:31 AM, Dave wrote:
> I've tried up to -Xmx5g
>
> On Mon, Jan 16, 2012 at 9:15 PM, qiu chi wrote:
>
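For anyone following along, here is a hedged solrconfig.xml sketch of the index-based spellchecker approach qiu chi describes (the field name and index directory are assumptions; adjust to your schema):

```xml
<!-- Sketch: use the Lucene index as the dictionary instead of an
     in-memory FST lookup. Field/dir names below are assumptions. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">name</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

This trades the FST's lookup speed for an on-disk dictionary that does not have to be rebuilt in RAM.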
I've tried up to -Xmx5g
On Mon, Jan 16, 2012 at 9:15 PM, qiu chi wrote:
> What is the largest -Xmx value you have tried?
> Your index size does not seem very big.
> Try -Xmx2048m; it should work.
>
> On Tue, Jan 17, 2012 at 9:31 AM, Dave wrote:
>
> > I'm trying to figure out what my memory needs are
What is the largest -Xmx value you have tried?
Your index size does not seem very big.
Try -Xmx2048m; it should work.
On Tue, Jan 17, 2012 at 9:31 AM, Dave wrote:
> I'm trying to figure out what my memory needs are for a rather large
> dataset. I'm trying to build an auto-complete system for every
>
I'm trying to figure out what my memory needs are for a rather large
dataset. I'm trying to build an auto-complete system for every
city/state/country in the world. I've got a geographic database, and have
set up the DIH to pull the proper data in. There are 2,784,937 documents
which I've formatted
I don't see why not. I'm assuming a *nix system here, so when Solr
updates an index, any deleted files would hang around.
But I have to ask why bother with the Embedded server in the
first place? You already have a Solr instance up and running,
why not just query that instead, perhaps using SolrJ?
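To illustrate the "just query the running instance" option: a minimal, self-contained sketch that builds the HTTP select URL by hand. The host, port, and query here are assumptions for illustration; with SolrJ you would use a SolrServer/SolrQuery instead of constructing the URL yourself.

```java
import java.net.URLEncoder;

public class QueryRunningSolr {

    // Build a /select URL against an already-running Solr instance,
    // instead of opening the index a second time with EmbeddedSolrServer.
    static String selectUrl(String baseUrl, String query) throws Exception {
        return baseUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                + "&wt=json";
    }

    public static void main(String[] args) throws Exception {
        // Host/port and the "title" field are assumptions; adjust to your setup.
        System.out.println(selectUrl("http://localhost:8983/solr",
                                     "title:foo bar"));
    }
}
```

Opening the URL (or issuing the same query through SolrJ) avoids any question of two processes sharing index files.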
Why not just up the maxBooleanClauses parameter in solrconfig.xml?
Best
Erick
On Sat, Jan 14, 2012 at 1:41 PM, Dmitry Kan wrote:
> OK, let me clarify it:
>
> if solrconfig has maxBooleanClauses set to 1000, for example, then queries
> with more than 1000 clauses will be rejected with th
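For reference, a minimal sketch of where this setting lives in solrconfig.xml (the value 4096 is just an example; the default is lower):

```xml
<!-- solrconfig.xml: raise the limit on boolean clauses per query -->
<query>
  <maxBooleanClauses>4096</maxBooleanClauses>
</query>
```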
What have you tried, and what have the results been? This is well
within Solr's out-of-the-box capabilities.
Best
Erick
On Fri, Jan 13, 2012 at 10:37 AM, vibhoreng04 wrote:
> Hi ,
>
> I want to do an 800-word multiple search across an index of 1 million
> records.
> Can anyone suggest something whi
I don't know where the commas are coming from; as far as I know that's
not part of Solr. You must have the catchall field defined
with multiValued="true", so if you set the increment gap to 0, that
should help.
When you do that, what does your return look like?
Best
Erick
P.S. It's rather un
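A hedged schema.xml sketch of what Erick describes (the field and type names are assumptions; the key part is positionIncrementGap="0" on the fieldType used by the multiValued catchall field):

```xml
<!-- Sketch: a gap of 0 makes values of a multiValued field adjacent -->
<fieldType name="text_gap0" class="solr.TextField" positionIncrementGap="0">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<field name="catchall" type="text_gap0" indexed="true" stored="true"
       multiValued="true"/>
```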
For future reference, I had this problem, and it was the debug statements in
Commons HTTP that were printing all the binary data to the log, but my
console appender was set to INFO so I wasn't seeing them. Setting Commons
HTTP to INFO level fixed my speed issue (two orders of magnitude faster).
Did this ever progress? Shall we make a jira?
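In case it helps others hitting the same slowdown, a hedged log4j.properties sketch (the logger names assume Commons HttpClient 3.x; check the actual category names appearing in your logs):

```properties
# Silence Commons HttpClient debug output, including wire-level
# logging of binary request/response bodies
log4j.logger.org.apache.commons.httpclient=INFO
log4j.logger.httpclient.wire=INFO
```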
--
View this message in context:
http://lucene.472066.n3.nabble.com/Detecting-replication-slave-health-tp677584p3664739.html
Sent from the Solr - User mailing list archive at Nabble.com.
Hello,
Searching real-time sounds difficult with that amount of data. With large
documents, 3 million documents, and 5TB of data the index will be very large.
With indexes that large your performance will probably be I/O bound.
Do you plan on allowing phrase or proximity searches? If so, you
Hi,
is it possible to use the same index in a Solr webapp and additionally in an
EmbeddedSolrServer? The embedded one would be read only.
Thank you.
Hi,
I'm not sure which version of Solr/Tika you're using but I had a similar
experience which turned out to be the result of a design change to PDFBox.
https://issues.apache.org/jira/browse/SOLR-2886
Tricia
On Sat, Jan 14, 2012 at 12:53 AM, Wayne W wrote:
> Hi,
>
> we're using Solr running on
David,
The spellchecker normally won't give suggestions for any term that exists in
your index. So even if "wever" is misspelled in context, if it exists in the
index the spellchecker will not try correcting it. There are 3 workarounds:
1. Use the patch included with SOLR-2585 (this is for Trunk/4.x only
Hi Herman,
Try adding this to your replication config:
<str name="commitReserveDuration">00:00:10</str>
See also http://search-lucene.com/?q=commitReserveDuration&fc_project=Solr
Otis
Performance Monitoring SaaS for Solr -
http://sematext.com/spm/solr-performance-monitoring/index.html
- Original Message -
> From: H
We are at times having some difficulty achieving a 'successful' replication.
Our Operations personnel have reported the following behavior (which I cannot
attest to): a master has a set of segment files (let's say 25). A slave then
polls the master, gets the list of segment files that differ an
Wayne,
Have you asked on Tika's ML?
You may also want to watch https://issues.apache.org/jira/browse/SOLR-2901
Otis
Performance Monitoring SaaS for Solr -
http://sematext.com/spm/solr-performance-monitoring/index.html
- Original Message -
> From: Wayne W
> To: solr-user@lucene.
Johnny,
If you are indexing a catalog of songs and artists, you can write a query parser
or search component that recognizes known things like songs (you must have
"bohemian rhapsody" in your catalog) or artist names (you must have the exact
string "queen" in your catalog), or even their combinat
Hello,
>
> From: mustafozbek
>
>All documents that we use are rich text documents and we parse them with
>Tika. We need to search real time.
Because of the real-time requirement, you'll need to use an unreleased/dev
version of Solr.
>Robert Stewart wrote
>> Any idea
Hi Everyone,
Please help out if you know what is going on.
We are upgrading to Solr 3.5 (from 1.4.1) and busy with a re-index and test of
our data.
Everything seems OK, but Date Fields seem to be "broken" when using with the
MoreLikeThis handler
(I also saw the same error on Date Fields using
Okay, thx =)
But I replaced it now in my data-config ;)
-
--- System
One Server, 12 GB RAM, 2 Solr Instances, 8 Cores,
1 Core with 45 Million Documents other Cores < 200.000
- Solr1 for Search-Requests - commit every Minu
(12/01/16 19:43), stockii wrote:
> Why does this not work?
> OR
> i dont know where is my error?
> i only want to replace comma with a blank ...
Try t
Why does this not work?
OR
I don't know where my error is.
I only want to replace the comma with a blank ...
thx =)))
-
---
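Since the fix ended up in data-config, a hedged data-config.xml sketch using DIH's RegexTransformer to replace commas with a space (the entity name, SQL query, and column name are assumptions for illustration):

```xml
<!-- Sketch: RegexTransformer rewrites the column value during import -->
<entity name="doc" transformer="RegexTransformer"
        query="select id, title from docs">
  <field column="title" regex="," replaceWith=" "/>
</entity>
```

An alternative is to do the replacement at analysis time in schema.xml with solr.PatternReplaceFilterFactory, which leaves the stored value untouched.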