Hi, can you post your final configuration?
On Tue, Jun 30, 2015 at 9:57 AM, ssharma7...@gmail.com
ssharma7...@gmail.com wrote:
davidphilip cherian Alessandro Benedetti,
Thanks for your feedback and links, I was able to get the suggestions from
the suggester component.
Thanks & Regards,
Sachin
God damn. Thank you.
*ashamed*
On 30.06.2015 at 00:21, Erick Erickson wrote:
Try not putting it in double quotes?
Best,
Erick
On Mon, Jun 29, 2015 at 12:22 PM, Thomas Michael Engelke
thomas.enge...@posteo.de wrote:
A friend and I are trying to develop some software using Solr in the
Hello.
I have a question about the Solr Data Import Handler. I'm using Solr 5.2.1
on a Linux server with 32G ram.
I have five different collections, and for each collection, I'm trying to
import data from a MySQL database. All of the MySQL queries work properly in
MySQL, and previously I was
Thanks Erick and Upayavira for your inputs.
Is there a way I can associate this with the unique id of a document, either
using the schema browser or the TermsComponent?
Best Regards,
Dinesh Naik
On Tue, Jun 30, 2015 at 2:55 AM, Upayavira u...@odoko.co.uk wrote:
Use the schema browser on the admin UI, and
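For what it's worth, a TermsComponent request along these lines (collection and
field names are only placeholders, and your solrconfig.xml must expose the
/terms handler) lists the indexed terms for a field:

  http://localhost:8983/solr/mycollection/terms?terms.fl=document_name&terms.prefix=jo&terms.limit=10

Note that it returns terms and their document frequencies only, so by itself it
cannot map a term back to the unique id of a document.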
On Tue, 2015-06-30 at 16:39 +1000, Caroline Hind wrote:
We have very recently upgraded from SOLR 4.1 to 5.2.1, and at the same
time increased the physical RAM from 24GB to 96GB. We run multiple
cores on this one server, approximately 20 in total, but primarily we
have one that is huge in
davidphilip cherian Alessandro Benedetti,
Thanks for your feedback and links, I was able to get the suggestions from
the suggester component.
Thanks & Regards,
Sachin Vyas.
Thank you Erick.
This solution fits some of our queries, and we will adopt it for those. Yet we
have use cases in which the results cannot be cached.
Everyone,
What do you think about our assumptions and conclusions?
As a general rule of thumb, at least in our case, would you please
comment
on the
You would have to have a separate instance of the update processor, each
with one of the words.
Or, you could code a JavaScript script with the stateless script update
processor that has the long list of words and replacements as two arrays or
an array of objects, and then iterate through the
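For reference, a minimal sketch of such a chain in solrconfig.xml; the chain
name and script file name are illustrative, not taken from this thread, and the
replacement logic itself would live in the referenced JavaScript file:

  <updateRequestProcessorChain name="replace-shortened-words">
    <processor class="solr.StatelessScriptUpdateProcessorFactory">
      <str name="script">replace-words.js</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

The chain is then selected per update request with
update.chain=replace-shortened-words, or made the default on the update handler.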
I am out of the office until 07/06/2015.
I'll be out of the office through July 4th.
Please contact Jason Brown for any pressing JAS Team related items.
Note: This is an automated response to your message Re: Correcting text
at index time sent on 6/30/2015 8:55:16 PM.
This is the only
Hi Erick,
This is mainly for debugging purposes. If I have 20M records and a few fields
in some of the documents are not indexed as expected, or something went
wrong during indexing, then how do we pinpoint the exact issue and fix the
problem?
Best Regards,
Dinesh Naik
On Tue, Jun 30, 2015 at 5:56
bq: The type of queries that are run can return anything from 1
million to 9.5 million documents, and typically run for anything from
20 to 45 minutes.
Uhhh, are you literally setting the rows parameter to over 9.5M and
getting that many docs all at once? Or is that just numFound and
you're
I would like to add a consideration, if possible.
I find the field type really heavily analysed; are you sure this is OK for
your suggestion requirements?
It is usually better to keep the suggestion field as lightly analysed as
possible and then play with the different types of suggesters.
If you
Update: regarding the solrj changelog I found this:
-
https://cwiki.apache.org/confluence/display/solr/Major+Changes+from+Solr+4+to+Solr+5
and this:
-
https://issues.apache.org/jira/browse/SOLR/component/12324331?selectedTab=com.atlassian.jira.jira-projects-plugin:component-changelog-panel
On
Vincenzo D'Amore,
The following is my (CURRENT) Working Final Configuration:
*Schema.xml*
<fields>
.
.
<field name="text" type="c_text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true" />
<field name="document_name" type="c_document_name" indexed="true"
       stored="true"
Two very quick questions:
1> How big is your transaction log? Well, do you even have one? If
Solr is abnormally terminated, it'll replay the tlog on startup. The
scenario here would be something like you were running DIH without any
kind of hard commit specified and killed Solr for some reason.
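As a hedged illustration only (the interval is arbitrary), a hard-commit
setting in solrconfig.xml that keeps tlog replay short looks roughly like:

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <!-- hard commit at most every 60 seconds -->
      <maxTime>60000</maxTime>
      <!-- don't open a new searcher on these commits -->
      <openSearcher>false</openSearcher>
    </autoCommit>
  </updateHandler>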
I've actually seen this happen right in front of my eyes in the
field. However, that was a very high-performance environment. My
assumption was that fragmented index files were causing more disk
seeks especially for the first-pass query response in distributed
mode. So, if the problem is similar,
In short, not unless you want to get into low-level Lucene coding.
Inverted indexes are, well, inverted so their very structure makes
this difficult. It looks like this:
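Roughly, and purely as an illustration, the postings map terms to the documents
that contain them rather than the other way round:

  term "apache" -> doc1, doc5, doc9
  term "solr"   -> doc1, doc2, doc9

so reconstructing a single document means walking every term's postings to see
whether it points at that document.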
But I'm not convinced yet that this isn't an XY problem. What is the
high-level problem you're trying to solve here? Maybe
Hi All,
I have a bunch of java clients connecting to a solrcloud cluster 4.8.1 with
Solrj 4.8.0.
The question is: do I have to switch the clients and the cluster to the new
version at the same time?
Or could I upgrade the cluster and then upgrade the clients in the following months?
BTW, looking at Solrj 5.2.1 I have seen
Pesky computers, they keep doing exactly what I tell 'em to do, not
what I mean ;)
I'll open a JIRA for making Solr DWIM-compliant, Do What I Mean ;) ;)
On Tue, Jun 30, 2015 at 4:17 AM, Thomas Michael Engelke
thomas.enge...@posteo.de wrote:
God damn. Thank you.
*ashamed*
On 30.06.2015
Hi,
Is it possible to restrict the result returned by the Suggester to selected
fields only?
i.e. Currently, the Suggester returns data in the following structure (XML).
Can I restrict the Solr (5.1) Suggester to return ONLY the term and EXCLUDE
<long name="weight"> and
<str name="payload"/> as per the Suggester result XML below?
Hi,
I have the following Solr 5.1 configuration:
*schema.xml*
<fields>
.
.
<field name="text" type="c_text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true" />
<field name="document_name" type="c_document_name" indexed="true"
       stored="true" required="true" multiValued="false" />
Dinesh:
This is what the admin/analysis page is for. It shows you exactly
what tokens are produced by what steps in the analysis chain.
That would be far better than trying to analyze the indexed
terms.
Best,
Erick
On Tue, Jun 30, 2015 at 8:35 AM, dinesh naik dineshkumarn...@gmail.com wrote:
Thanks Sachin Vyas.
Maybe I have found a typo: there is a stray comment close (-->) alone at the end
of the tag:
<str name="spellcheck.collate">false</str> -->
On Tue, Jun 30, 2015 at 2:09 PM, ssharma7...@gmail.com
ssharma7...@gmail.com wrote:
Vincenzo D'Amore,
The following is my (CURRENT) Working Final
Vincenzo D'Amore,
Yes, you are right, it's a typo; I missed it while cleaning the XML to post on
the solr-user list.
But please *remove* the following line, it was not used in my Solr 5.1
configuration:
<str name="spellcheck.collate">false</str> -->
Regards,
Sachin Vyas.
Erick,
Many thanks for your reply.
1. The file solr.log does not show any errors; however, there is a file
solr.log.8 which is 5MB and has a ton of text that it was trying to index, but
there was an invalid date error. I fixed that. Is it possible that Solr
keeps trying to use that log? Can I
Am I wrong, or has the default index directory type been the
NRTCachingDirectoryFactory since Solr 4.x?
If I remember correctly, this factory creates a Directory implementation
built on top of an MMapDirectory.
In this case we should rely on the operating system's memory-mapping feature
to properly
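For context, recent example configs ship with something along these lines in
solrconfig.xml:

  <directoryFactory name="DirectoryFactory"
                    class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

NRTCachingDirectoryFactory wraps the underlying directory (typically an
MMapDirectory on 64-bit systems) with a small RAM cache for near-real-time
segments.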
Hi,
I have the following Solr 5.1 configuration:
*schema.xml*
<fields>
.
.
<field name="text" type="c_text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true" />
<field name="document_name" type="c_document_name" indexed="true"
       stored="true" required="true" multiValued="false" />
Actually, what you are asking does not really make sense.
The Solr response returns that data structure because it must return as
much information as possible.
It is the responsibility of the client to extract what it needs from the response.
Talking about the Java Client, I contributed the SolrJ code to parse the
1> Not the Solr log, the transaction log. If it is present it'll be a
child directory of your data directory
called tlog, a sibling to your index directory. And big here means
gigabytes. And yes, you can
just nuke it if you want. You get one automatically if you are using SolrCloud.
2> OK, it was a long
This will be pretty much unworkable for any large corpus. The
DocumentDictionaryFactory
builds its index by reading the stored value from every document
in your index to put into a sidecar Solr index (for free text suggester).
This can take many minutes so doing this on every commit is an
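As a rough sketch only (the suggester name, lookup implementation, and field
names are illustrative), such a suggester is usually configured with
buildOnCommit=false and rebuilt on demand via suggest.build=true or on startup:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">document_name</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
      <str name="buildOnCommit">false</str>
    </lst>
  </searchComponent>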
Hi Erick,
I agree with you.
But I was checking whether we could get hold of the whole document (to see all
analyzed field values).
There is a chance that a field value is common to multiple documents.
In such cases it will be difficult to backtrack which document has the
issue. Because
Alessandro Benedetti,
Thanks for the update.
Actually, what I meant by "Is it possible to restrict the result returned
by the Suggester to selected
fields only?" was something like the fl option available for querying (/select) in
Solr, wherein there could be some fields defined in schema.xml, but we
can
Thanks to all for the help - it's now storing text and I can search and get
results just as before in 4.6, but I cannot get snippets to appear when I ask
for highlighting.
when I add documents, here is the URL my script generates:
Hi,
I am currently investigating the queries with a much smaller index size (1M)
to see the effect of grouping and faceting on the performance degradation. This will
allow me to do a lot of tests in a short period of time.
However, it looks like the query is executed much faster the second time.
This is tested
On 6/29/2015 2:48 PM, Reitzel, Charles wrote:
I take your point about shards and segments being different things. I
understand that the hash ranges per segment are not kept in ZK. I guess I
wish they were.
In this regard, I liked how MongoDB uses a 2-level sharding scheme. Each shard
On 6/30/2015 6:40 AM, Vincenzo D'Amore wrote:
I have a bunch of java clients connecting to a solrcloud cluster 4.8.1 with
Solrj 4.8.0.
The question is: do I have to switch the clients and the cluster to the new
version at the same time?
Or could I upgrade the cluster and then in the following months upgrade
Instead of your immense schema, can you give us the details of the
highlighting you are trying to use?
And how are you trying to use it?
Which client? Direct API calls?
let us know!
Cheers
2015-06-30 15:10 GMT+01:00 Mark Ehle marke...@gmail.com:
Thanks to all for the help - it's now storing
Hi Alessandro,
I am able to check the field-wise analyzed results.
I was interested in getting the complete document.
As Erick mentioned -
Reconstructing the doc from the
postings lists is actually quite tedious. The Luke program (not the request
handler) has a
function that
does this, it's not fast
On 6/25/2015 2:20 AM, Mikhail Khludnev wrote:
On Tue, Jun 23, 2015 at 9:23 AM, Rudolf Grigeľ grige...@gmail.com wrote:
How can I prevent opening a new searcher after
every delete statement?
Comment out the updateLog tag in solrconfig.xml (it always helps)
The presence or absence of the updateLog
Do you have the original document available? Or is it stored in the field of
interest?
It should be quite an easy test to reproduce the analysis simply using the
analysis tool Upayavira and Erick suggested.
Just use your real document content and you will see exactly how it is
analysed.
Cheers
2015-06-30
Alessandro -
Someone asked to see the schema, I posted it. Should I have just attached
it? Does this mailing list support that?
I am by no means a SOLR expert. I am a PHP coder who wrote a
(very-much-loved by our library staff and patrons) newspaper indexing tool
that I am trying to update. I
No worries, it is not a big deal that you shared the schema.xml; I said that
only because it made the mail a little hard to read. Anyway, in my
opinion the query is correct, so the problem should reside elsewhere.
Can you share the solrconfig.xml section for your select request handler?
Probably it
But what do you mean by the complete document? Is it not available
anymore?
So have you lost your original document and want to try to reconstruct it
from the index?
2015-06-30 16:05 GMT+01:00 dinesh naik dineshkumarn...@gmail.com:
Hi Alessandro,
I am able to check the field wise
I'd set filterCache and queryResultCache to zero (size and autowarm count).
Leave documentCache alone IMO, as it's used to store documents read off disk
as they pass through the various query components, and it doesn't autowarm anyway.
I'd think taking it out would skew your results because of multiple
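Concretely, that would look something like this in solrconfig.xml (a
benchmarking-only sketch, not a production setting):

  <filterCache class="solr.FastLRUCache" size="0" initialSize="0" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
  <!-- documentCache left at its existing settings; it does not autowarm -->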
From the log fragment it's at least worth further investigation.
You've had 4 searchers open in less than 1/2 second. That's
horribly fast, but you already know that...
Let's see the DIH configs, perhaps there's something
innocent-seeming there that's causing this. Or, there's
a bug somewhere.
Test_results_round_2.doc
http://lucene.472066.n3.nabble.com/file/n4215016/Test_results_round_2.doc
Hi Alessandro,
Let's say I have 20M documents with 50 fields in each.
I have applied text analysis like compression, ngram, and synonym expansion on these
fields.
Checking field-level analysis individually can easily be done via
admin/analysis. But I need to do the analysis check 50 times for these
On Tue, Jun 30, 2015, at 04:42 PM, Shawn Heisey wrote:
On 6/29/2015 2:48 PM, Reitzel, Charles wrote:
I take your point about shards and segments being different things. I
understand that the hash ranges per segment are not kept in ZK. I guess I
wish they were.
In this regard, I
Do you mean this?:
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <!-- <str name="df">text</str> -->
  </lst>
</requestHandler>
On Tue, Jun 30, 2015 at 12:11 PM, Alessandro Benedetti
Something's not right here. Your query does not specify any field;
you have q=JOHN GRAP, which should parse as
q=default_search_field:JOHN GRAP.
BUT, you've commented the default field out of the select request handler.
I don't _think_ that there's a default in the code, but I've been surprised
Here's what I get:
{
  "responseHeader": {
    "status": 0,
    "QTime": 27,
    "params": {
      "echoParams": "all",
      "fl": "year",
      "df": "_text_",
      "indent": "true",
      "q": "\"JOHN GRAP\"",
      "hl.simple.pre": "<em>",
      "debug": "true",
      "hl.simple.post": "</em>",
      "hl.fl": "text",
      "wt": "json",
      "hl": "true",
you can call the same API as the admin UI does. Pass it strings, it
returns tokens in json/xml/whatever.
Upayavira
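For example (collection, field name, and sample value are placeholders), the
field analysis handler behind the admin UI can be called directly and scripted
over many field/value pairs:

  http://localhost:8983/solr/mycollection/analysis/field?analysis.fieldname=text&analysis.fieldvalue=Some+sample+text&wt=json

It returns the token stream produced at each stage of the index-time (and, with
analysis.query set, query-time) analysis chain.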
On Tue, Jun 30, 2015, at 06:55 PM, Dinesh Naik wrote:
Hi Alessandro,
Let's say I have 20M documents with 50 fields in each.
I have applied text analysis like
Hi All,
I did many tests with very consistent test results. Each query was executed
after re-indexing, and only one request was sent to query the index. I
disabled filterCache and queryResultCache for this test based on Erick's
recommendation.
The test document was posted to this email list
Hi Paden,
I believe you could use a PatternReplaceFilterFactory (
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/pattern/PatternReplaceFilterFactory.html
)
configured in your fieldType that could replace '\' with '\\' at index time.
Thanks.
Regards,
Nitin
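A minimal fieldType sketch along those lines (the field type name is made up,
and the regex escaping is worth double-checking on the analysis page, since both
the XML attribute and the Java regex need the backslash doubled):

  <fieldType name="path_text" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <!-- replace each single backslash with two backslashes at index time -->
      <filter class="solr.PatternReplaceFilterFactory"
              pattern="\\" replacement="\\\\" replace="all"/>
    </analyzer>
  </fieldType>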
Hello,
I'm having a slight Catch-22 scenario going on with my Solr indexing
process. I'm using the DataImportHandler to pull a filepath from a database.
The problem is that Windows filepaths have the backslash character inside
them.
\\some\filepath
So when I insert this data into MySQL
Thanks for your explanation.
Off the top of your head, are there any other options which prevent
getting a cursorMark?
Yes, that was also my idea to set up a separate request handler
for harvesting without timeAllowed.
As Shawn suggested, a short note about this should go into the documentation.
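A sketch of such a handler in solrconfig.xml (the name and defaults are
illustrative; the point is simply that no timeAllowed is set, so cursorMark
paging is never cut short):

  <requestHandler name="/harvest" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">1000</int>
      <str name="sort">id asc</str>
    </lst>
  </requestHandler>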
We need to work out why your performance is bad without optimise. What
version of Solr are you using? Can you confirm that your config is using
the TieredMergePolicy?
Upayavira
On Tue, Jun 30, 2015, at 04:48 AM, Summer Shire wrote:
Hi Upayavira and Erick,
There are two things we are talking
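For reference on the TieredMergePolicy question above, a Solr 5.x sketch (the
numbers shown are the stock defaults; later versions switched to the
mergePolicyFactory syntax):

  <indexConfig>
    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
    </mergePolicy>
  </indexConfig>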
Hi,
I am very new to SOLR, and would appreciate some guidance if anyone has the
time to offer it.
We have very recently upgraded from SOLR 4.1 to 5.2.1, and at the same time
increased the physical RAM from 24GB to 96GB. We run multiple cores on this one
server, approximately 20 in total,
Hi all
Thanks for the replies. So there's no getting away from doing it on my own
then...
@Jack: I need to replace a whole list of shortened words... It would make a
crazy regex (which I incidentally wouldn't even know how to formulate).
Cheers
A.
bq: The index size is only 1M records. A 10x increase in record count (10M)
will likely bring the total response time to 1 second
This is an extrapolation you simply cannot make. Plus you cannot really tell
anything from just a few queries about system performance. In fact you must
disregard
It _looks_ like you're searching against _text_ and trying to
highlight on text. On a very brief grep of all the Java code I don't
see _text_ defined anywhere (of course I could be missing something
here).
So none of this makes sense.
you have no df field defined, yet you're getting a default
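A quick way to test that theory (the field names here just mirror the ones echoed
in the response above and may not match the real schema): point the query and the
highlighting at the same stored field, e.g.

  ...&q=text:"JOHN GRAP"&hl=true&hl.fl=text&df=text

If snippets appear then, the problem is the mismatch between the field being
searched (_text_) and the field being highlighted (text).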