Hello Solr devs,
One thing we did recently in Lucene that I would like to expose in Solr is
adding support for protected words to all stemmers.
So the way this works is that a TokenStream attribute 'KeywordAttribute' is
set, and all the stemfilters know to ignore tokens with this boolean value
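
The mechanism described above can be illustrated with a minimal sketch. This is not Lucene's API: `Token`, `mark_keywords`, `toy_stem`, and `analyze` are hypothetical names, and the `keyword` field only stands in for the boolean that KeywordAttribute carries. The "stemmer" is a toy that strips a trailing "s".

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    keyword: bool = False  # stand-in for the KeywordAttribute boolean (assumption)


def mark_keywords(tokens, protected):
    """Flag every token whose text appears in the protected-words set."""
    for tok in tokens:
        tok.keyword = tok.text in protected
        yield tok


def toy_stem(tokens):
    """Toy stemmer: strip a trailing 's', but skip tokens flagged as keywords."""
    for tok in tokens:
        if not tok.keyword and len(tok.text) > 3 and tok.text.endswith("s"):
            tok.text = tok.text[:-1]
        yield tok


def analyze(words, protected):
    """Run the marker stage first, then the stemmer, and return the token texts."""
    tokens = (Token(w) for w in words)
    return [t.text for t in toy_stem(mark_keywords(tokens, protected))]


print(analyze(["cats", "solrs", "dogs"], {"solrs"}))
```

Because the marker stage runs before the stemmer and only sets a flag, any stemmer that checks the flag gets protected-word support without its own word list.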
On Tue, Mar 30, 2010 at 8:06 AM, Robert Muir rcm...@gmail.com wrote:
We have two choices:
* we could treat this stuff as impl details, and add protwords.txt support
to all stemming factories. we could just wrap the filter with a
keywordmarkerfilter internally.
* we could deprecate the
On Tue, Mar 30, 2010 at 8:33 AM, Yonik Seeley yo...@lucidimagination.com wrote:
It would also be nice to make the token categories generated by
tokenizers into tags (like StandardTokenizer's ACRONYM, etc). A
tokenizer that detected many of the properties could significantly
speed up analysis
On Tue, Mar 30, 2010 at 8:33 AM, Yonik Seeley yo...@lucidimagination.com wrote:
On Tue, Mar 30, 2010 at 8:06 AM, Robert Muir rcm...@gmail.com wrote:
We have two choices:
* we could treat this stuff as impl details, and add protwords.txt
support
to all stemming factories. we could just wrap
On Tue, Mar 30, 2010 at 10:07 AM, Robert Muir rcm...@gmail.com wrote:
Sorta unrelated too, but on the same topic of performance, I'd really like
to improve the indexing speed with the example schema, and that's my hidden
motivation here.
I think we've already significantly improved WDF and
On Tue, Mar 30, 2010 at 10:32 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
Unfortunately not... it's normally something ad hoc like uploading a
big CSV file, etc.
There's also the very simplistic TestIndexingPerformance.
ant test -Dtestcase=TestIndexingPerformance -Dargs=-server
It absolutely is a better way to collaborate on development, especially in
conjunction with github:
http://github.com/apache/solr
HOWEVER, the merge of Lucene and Solr has totally disrupted the git mirrors.
Who can fix this?
~ David Smiley
I've opened an issue for this:
https://issues.apache.org/jira/browse/INFRA-2580
-Yonik
http://www.lucidimagination.com
On Tue, Mar 30, 2010 at 11:27 AM, David Smiley (@MITRE.org)
dsmi...@mitre.org wrote:
It absolutely is a better way to collaborate on development, especially in
conjunction
Script to monitor Solr health including replication status
--
Key: SOLR-1855
URL: https://issues.apache.org/jira/browse/SOLR-1855
Project: Solr
Issue Type: New Feature
[
https://issues.apache.org/jira/browse/SOLR-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shawn Smith updated SOLR-1855:
--
Attachment: checksolr
I've attached a first pass implementation of this script: !checksolr!. It's
On Mar 30, 2010, at 8:33 AM, Yonik Seeley wrote:
On Tue, Mar 30, 2010 at 8:06 AM, Robert Muir rcm...@gmail.com wrote:
We have two choices:
* we could treat this stuff as impl details, and add protwords.txt support
to all stemming factories. we could just wrap the filter with a
[
https://issues.apache.org/jira/browse/SOLR-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851462#action_12851462
]
Shawn Smith edited comment on SOLR-1855 at 3/30/10 9:58 PM:
I've
[
https://issues.apache.org/jira/browse/SOLR-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851637#action_12851637
]
Jason Rutherglen commented on SOLR-1375:
{quote}Doesn't this hint at some of this
In Solr Cell, literals should override Tika-parsed values
-
Key: SOLR-1856
URL: https://issues.apache.org/jira/browse/SOLR-1856
Project: Solr
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/SOLR-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Harris updated SOLR-1856:
---
Attachment: SOLR-1856.patch
Initial patch. Notes:
* We allow literal values to override all other
[
https://issues.apache.org/jira/browse/SOLR-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851667#action_12851667
]
Chris Harris commented on SOLR-1633:
bq. It seems like a possible improvement here would
[
https://issues.apache.org/jira/browse/SOLR-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Harris updated SOLR-1856:
---
Description:
I propose that ExtractingRequestHandler / SolrCell literals should take
precedence over
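
The precedence rule proposed here can be sketched as a dictionary merge. This is only an illustration under assumptions: `merge_fields` and the field names are hypothetical, and this is not the ExtractingRequestHandler code or the attached patch.

```python
def merge_fields(tika_fields, literals):
    """Combine extracted metadata with request literals; literals win on conflict."""
    merged = dict(tika_fields)  # start from the Tika-parsed values
    merged.update(literals)     # any literal with the same field name overrides
    return merged


# Hypothetical field names, for illustration only.
tika = {"title": "Untitled-1", "author": "scanner"}
literals = {"title": "Quarterly Report"}
print(merge_fields(tika, literals))
```

Fields that have no literal keep their extracted value, so the override only affects names supplied on the request.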
[
https://issues.apache.org/jira/browse/SOLR-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lance Norskog closed SOLR-1803.
---
Resolution: Fixed
3 other issues go after this same problem - probably SOLR-1856 will win the
turtle
[
https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851700#action_12851700
]
Jonathan Rochkind commented on SOLR-1553:
-
Hoss, I would be EXTREMELY interested in
[
https://issues.apache.org/jira/browse/SOLR-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851722#action_12851722
]
Lance Norskog commented on SOLR-1842:
-
Could the DIH shut down all Datasources
[
https://issues.apache.org/jira/browse/SOLR-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851725#action_12851725
]
Lance Norskog commented on SOLR-1848:
-
Maybe there could be a Solr Apps project
[
https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851726#action_12851726
]
Lance Norskog commented on SOLR-1568:
-
Dublin Core includes conventions for encoding