from:"Koji Sekiguchi"

Re: Tokenizing managed synonyms

2020-07-06 Thread Koji Sekiguchi

I think the question makes sense: as SynonymGraphFilterFactory accepts tokenizerFactory, he asked whether the managed version of SynonymGraphFilter could accept it as well. https://lucene.apache.org/solr/guide/8_5/filter-descriptions.html#synonym-graph-filter The answer seems to be NO. Koji On 2020/0

per field mm

2018-12-14 Thread Koji Sekiguchi

Hi, I have a use case where one of our customers wants to set a different mm parameter per field, as some fields of qf produce unexpectedly many terms because they are N-gram fields, while other fields produce few terms because they are normal text fields. If it is reasonable, I

Re: Implementing NeuralNetworkModel RankNet in Solr LTR

2018-09-19 Thread Koji Sekiguchi

his supported in Solr 7.4.0? Regards, Edwin On Wed, 19 Sep 2018 at 11:02, Koji Sekiguchi wrote: Hi, > https://github.com/airalcorn2/Solr-LTR#RankNet > > Has anyone tried on this before? And what is the format of the training > data that this model requires? I haven't tried

Re: Implementing NeuralNetworkModel RankNet in Solr LTR

2018-09-18 Thread Koji Sekiguchi

Hi, > https://github.com/airalcorn2/Solr-LTR#RankNet > > Has anyone tried on this before? And what is the format of the training > data that this model requires? I haven't tried it, but I'd like to inform you that there is another LTR project we've been developing: https://github.com/LTR4L/

Re: Return only matched multi-valued field

2017-08-21 Thread Koji Sekiguchi

Hi, I don't think Lucene/Solr can know which field matches the query you posted. You should usually use Highlighter to know it. Koji On 2017/08/22 2:46, ruby wrote: Is there a way to return only the matched field from a multivalued field using filtering? -- View this message in context:

Re: Issues trying to boost phrase containing stop word

2017-07-19 Thread Koji Sekiguchi

Hi Shamik, I'm sorry but I don't understand why you use KeywordRepeatFilter. I think it's normal to create separate fields to solve this kind of problem. Why don't you have another separate field which has ShingleFilter as I mentioned in the previous reply? Koji On 2017/07/20 12:13, shamik w

Re: Issues trying to boost phrase containing stop word

2017-07-19 Thread Koji Sekiguchi

Hi Shamik, How about using ShingleFilter which constructs token n-grams from a token stream? http://lucene.apache.org/core/6_6_0/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html As for "about dynamic block", ShingleFilter produces "about dynamic" and "dynamic block". Th
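A minimal sketch of the shingling described above, written in plain Python rather than against Lucene's Java API. The function name `shingles` and its `size` parameter are hypothetical and only illustrate how token bigrams are formed:

```python
def shingles(tokens, size=2):
    """Return word n-grams (shingles) of `size` adjacent tokens."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


# Two-word shingles for the phrase from the message above.
print(shingles("about dynamic block".split()))
# → ['about dynamic', 'dynamic block']
```

Note that Lucene's ShingleFilter can also emit the original single tokens alongside the shingles, depending on its settings; this sketch shows only the shingles themselves.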

Re: Is there any particular reason why ExternalFileField is read from data directory

2017-06-29 Thread Koji Sekiguchi

Hi, ExternalFileField was introduced via SOLR-351. https://issues.apache.org/jira/browse/SOLR-351 The author thought values could optionally be updated often... I think that explains why it is read from the data directory rather than from config. Koji On 2017/06/29 17:17, apoorvqwerty wrote: Hi, As per the doc

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Koji Sekiguchi

Hi Walter, May I ask a tangential question? I'm curious about the following line you wrote: > Solr is a vector-space engine. Some early engines (Verity VDK) were probabilistic engines. Those do give an absolute estimate of the relevance of each hit. Unfortunately, the relevance of results is just no

Re: Classify document using bag of words

2017-03-26 Thread Koji Sekiguchi

Hi, I'm not sure that it can help you but I'd like to show you the link of an article which I wrote about document classification years ago: Comparing Document Classification Functions of Lucene and Mahout http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.

Re: Query/Field Index Analysis corrected but return no docs in search

2017-02-05 Thread Koji Sekiguchi

Hi Peter, I'm not sure I can see the result you attached correctly, but it sounds reasonable to me that you couldn't get search results, because your query 均匀肤色 is used as it is without being analyzed whereas the same string 均匀肤色 is tokenized as 均匀 匀肤 肤色 in the index. So it is obvious t
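A rough sketch of why the unanalyzed query finds nothing, assuming the index side splits the text into overlapping character bigrams as described above. The helper `cjk_bigrams` is hypothetical and is not Lucene's tokenizer:

```python
def cjk_bigrams(text):
    """Split a string into overlapping two-character tokens."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


indexed_terms = cjk_bigrams("均匀肤色")
print(indexed_terms)  # → ['均匀', '匀肤', '肤色']

# The query string is used as a single term because it is not analyzed.
query_terms = ["均匀肤色"]
print(set(query_terms) & set(indexed_terms))  # → set()
```

Since the four-character query term never appears among the indexed bigrams, there is nothing to match until the query goes through the same analysis.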

Re: How to train the model using user clicks when use ltr(learning to rank) module?

2017-02-02 Thread Koji Sekiguchi

Hi, NLP4L[1] has not only Learning-to-Rank module but also a module which calculates click model and converts it into pointwise annotation data. NLP4L has a comprehensive manual[2], but you may want to read "Click Log Analysis" section[3] first to see if it suits your requirements. Hope this h

Re: I cannot get phrases highlighted correctly without using the Fast Vector highlighter

2016-09-20 Thread Koji Sekiguchi

Hello Panagiotis, I'm sorry but it's a feature. As for the hl.usePhraseHighlighter parameter, when you turn it off, you may get only foo or bar highlighted in your snippets. Koji On 2016/09/18 15:55, Panagiotis T wrote: I'm using Solr 6.2 (tried with 6.1 also) I created a new core and the only c

Re: Query Elevation

2016-07-11 Thread Koji Sekiguchi

Hello, I'm curious, why do you want the particular document to be placed second, not top, in the result for a particular query? Sorry this isn't the answer to your question, but I think you can implement it rather easily if you study the existing query elevation. Koji On 2016/07/08 19:59, Swathika

Re: FW: Difference Between Tokenizer and filter

2016-03-02 Thread Koji Sekiguchi

Hi, <analyzer>...</analyzer> must have one and only one <tokenizer> and it can have zero or more <filter>s. From the point of view of the rules, your ... is not correct because it has more than one <tokenizer> and ... is not correct as well because it has no <tokenizer>. Koji On 2016/03/02 20:25, G, Rajesh wrote: Hi Team, Can you please clarify the bel

Re: Help With Phrase Highlighting

2015-12-01 Thread Koji Sekiguchi

Hi Teague, I couldn't understand the "document size" part of your question, but if you'd like Solr to return the snippet <em>My search phrase</em> instead of <em>My</em> <em>search</em> <em>phrase</em> you should use FastVectorHighlighter. When using FVH, your highlight field (hl.fl=text) needs to be indexed with options te

Re: Tokenize ShingleFilterFactory results and apply filters to tokens

2015-10-15 Thread Koji Sekiguchi

Hi Vitaly, I'm not sure I understand you correctly, why don't you put EdgeNGramFilter just after ShingleFilter? That is: Koji On 2015/10/15 22:47, vitaly bulgakov wrote: I want to rephrase my question I asked in another post. As far as I understand filter ShingleFilterFactory creates shin

Re: highlighting

2015-10-01 Thread Koji Sekiguchi

Hi Mark, I think I saw a similar requirement recently on the mailing list. The feature sounds reasonable to me. > If not, how do I go about posting this as a feature request? JIRA can be used for the purpose, but there is no guarantee that the feature will be implemented. :( Koji On 2015/10/01 20:07,

Re: solr.SynonymFilterFactory

2015-09-17 Thread Koji Sekiguchi

Hi Vincenzo, By intuition, regardless of what value you set for attributes such as expand or ignoreCase, I think synonym records where LHS==RHS are meaningless. That is, you can remove these lines. Koji On 2015/09/17 16:51, Vincenzo D'Amore wrote: Hello, this may be a silly question. I have
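Following the suggestion above, a small sketch that drops explicit-mapping records whose left-hand side equals the right-hand side. It assumes the usual `lhs => rhs` syntax of a synonyms file. The function name and the sample lines are made up for illustration:

```python
def drop_identity_mappings(lines):
    """Remove 'a => a' style records; keep comments and everything else."""
    kept = []
    for line in lines:
        if not line.lstrip().startswith("#") and "=>" in line:
            lhs, rhs = (side.strip() for side in line.split("=>", 1))
            if lhs == rhs:
                continue  # LHS == RHS: the record maps a term to itself
        kept.append(line)
    return kept


sample = ["# comment", "tv => tv", "tv, television", "i-pod => ipod"]
print(drop_identity_mappings(sample))
# → ['# comment', 'tv, television', 'i-pod => ipod']
```

The comparison here is an exact string match after trimming whitespace; a case-insensitive comparison would be a further assumption about how the file is used.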

Re: How to export the list of terms indexed in Solr?

2015-04-29 Thread Koji Sekiguchi

Hi brent3600, You can use NLP4L for this purpose. NLP4L is good at counting the number of words not only in whole index but also in a set of documents. There is a tutorial for this function. Count the number of words http://nlp4l.github.io/tutorial_ja.html#useNLP Sorry but the tutorial is writ

Re: Sorting and Rerank

2015-03-25 Thread Koji Sekiguchi

Hi, You're right. Those sets are the same; only the document order is different. Koji On 2015/03/26 0:53, innoculou wrote: If I do an initial search without any field sorting; and then do the exact same query but also sort one field will I get the same result set in the subsequent query b

Re: Lucene cosine similarity score for more like this query

2015-02-03 Thread Koji Sekiguchi

Lucene uses TFIDFSimilarity class to calculate the similarity. It is implemented on the idea of cosine measurement but it modifies the cosine formula. Please take a look at "Lucene Practical Scoring Function" in the following Javadoc: http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/

Re: [ANN] word2vec for Lucene

2014-11-20 Thread Koji Sekiguchi

ays rather pleasant for the LSI/LSA-like approach, but precisely this is mathematically opaque. Maybe it's more a question of presentation. Paul On 20 nov. 2014, at 16:24, Koji Sekiguchi wrote: Hi Paul, I cannot compare it to SemanticVectors as I don't know SemanticVectors. But w

Re: [ANN] word2vec for Lucene

2014-11-20 Thread Koji Sekiguchi

At least I see more transparent math in the web-page. > Maybe this helps a bit? > > SemanticVectors has always rather pleasant for the LSI/LSA-like approach, but > precisely this is mathematically opaque. > Maybe it's more a question of presentation. > > Paul > >

Re: [ANN] word2vec for Lucene

2014-11-20 Thread Koji Sekiguchi

Rome'), and vector('king') - vector('man') + vector('woman') is close to vector('queen') Thanks, Koji (2014/11/20 20:01), Paul Libbrecht wrote: > Hello Koji, > > how would you compare that to SemanticVectors? > > paul > > On
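The vector arithmetic mentioned above can be illustrated with a toy example. The two-dimensional vectors below are invented by hand so that the analogy works; they are not output from word2vec or from "word2vec for Lucene":

```python
import math

# Hand-made toy vectors (assumption): axis 0 ~ "royalty", axis 1 ~ "gender".
vectors = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "prince": [1.0, 0.9],
    "apple": [-1.0, 0.0],
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def analogy(a, b, c):
    """Return the word closest to vector(a) - vector(b) + vector(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("king", "man", "woman"))  # → queen
```

With real word vectors the result is only approximately close to the expected word, which is why the nearest neighbour is chosen by cosine similarity rather than by exact equality.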

[ANN] word2vec for Lucene

2014-11-20 Thread Koji Sekiguchi

Hello, It's my pleasure to share that I have an interesting tool "word2vec for Lucene" available at https://github.com/kojisekig/word2vec-lucene . As you can imagine, you can use "word2vec for Lucene" to extract word vectors from Lucene index. Thank you, Koji -- http://soleami.com/blog/compar

Re: boosting words from specific list

2014-09-29 Thread Koji Sekiguchi

Hi Ali, I don't think Solr has such a function OOTB. One way I can think of is to implement an UpdateRequestProcessor. In the processAdd() method of the UpdateRequestProcessor, as you can read field values, you can calculate the total score and copy the total score to a field e.g. total_score. T

Re: statuscode list

2014-09-07 Thread Koji Sekiguchi

Hi Jan, (2014/09/05 21:01), Jan Verweij - Reeleez wrote: Hi, If I'm correct you will get a statuscode="0" in the response if you use XML messages for updating the solr index. I think you mean by statuscode="0" is status=0 here. 07 Is there a list of possible other statuscodes you can re

Re: ExternalFileFieldReloader and commit

2014-08-05 Thread Koji Sekiguchi

Hi Peter, It seems like a bug to me, too. Please file a JIRA ticket if you can so that someone can take it. Koji -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html (2014/08/05 22:34), Peter Keegan wrote: When there are multiple 'external file field

Re: Understanding the Debug explanations for Query Result Scoring/Ranking

2014-07-24 Thread Koji Sekiguchi

Hi, In addition, this might be useful: Fundamentals of Information Retrieval, Illustration with Apache Lucene https://www.youtube.com/watch?v=SCsS5ePGmCs This video is about 40 minutes long, but you can fast forward to 24:00 to learn scoring based on vector space model and how Lucene customize

Re: Contiguous Phrase Highlighting Example

2014-07-17 Thread Koji Sekiguchi

Hi Teague, If you want phrase-unit tagging for highlighter, you need to use FastVectorHighlighter instead of the ordinary Highlighter. To turn on FVH, set hl.useFastVectorHighlighter=on when querying. In addition, when indexing, you need to set termVectors=on, termPositions=on and termOffsets=on
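As a sketch of the query side, the request parameters named above can be assembled like this. The host, port, handler path, field name and query string are placeholders assumed for illustration; only `hl.useFastVectorHighlighter` comes from the message itself, and `hl` and `hl.fl` are the standard highlighting switches:

```python
from urllib.parse import urlencode


def fvh_query_url(base_url, field, query):
    """Build a select URL that asks for FastVectorHighlighter snippets."""
    params = {
        "q": query,
        "hl": "true",
        "hl.fl": field,
        "hl.useFastVectorHighlighter": "on",
    }
    return base_url + "?" + urlencode(params)


# Placeholder endpoint and field; adjust to the actual core and schema.
print(fvh_query_url("http://localhost:8983/solr/select", "text", 'text:"my search phrase"'))
```

The index side still needs termVectors, termPositions and termOffsets enabled on the highlighted field, as the message says, and documents must be reindexed after changing those options.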

Re: OCR - Saving multi-term position

2014-07-02 Thread Koji Sekiguchi

Hi Manuel, I think OCR error correction is one of the well-known NLP tasks. In the past I'd thought it could be implemented by using Lucene. This is a brief idea: 1. You have got a Lucene index. This existing index is made from correct (i.e. error free) documents that are same domain of OCR documen

Re: Restriction on type of uniqueKey field?

2014-07-01 Thread Koji Sekiguchi

In addition, KeywordTokenizer can seemingly be used, but it should be avoided for the unique key field. One of my customers used it and got an OOM during long-term indexing. As it was difficult to find the problem, I'd like to share my experience. Koji -- http://soleami.com/blog/comparing

Re: Multiple highlight snippet for single field

2014-05-16 Thread Koji Sekiguchi

Hi Bijan, Have you tried setting the hl.maxAnalyzedChars parameter to a larger number? hl.maxAnalyzedChars http://wiki.apache.org/solr/HighlightingParameters#hl.maxAnalyzedChars As the default value of the parameter is 51200, if the second "Andy" is at the end paragraph of your large stored field, the
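A toy model of the effect described above, assuming the highlighter only looks at the first hl.maxAnalyzedChars characters of the stored text. This is not Solr's implementation; `find_highlights` is a hypothetical stand-in that simply searches a truncated string:

```python
DEFAULT_MAX_ANALYZED_CHARS = 51200  # default value quoted in the message


def find_highlights(text, term, max_analyzed_chars=DEFAULT_MAX_ANALYZED_CHARS):
    """Return start offsets of `term` within the analyzed prefix of `text`."""
    window = text[:max_analyzed_chars]
    positions = []
    start = window.find(term)
    while start != -1:
        positions.append(start)
        start = window.find(term, start + len(term))
    return positions


text = "Andy " + "x" * 60000 + " Andy"
print(find_highlights(text, "Andy"))                                 # → [0]
print(find_highlights(text, "Andy", max_analyzed_chars=len(text)))   # → [0, 60006]
```

The second occurrence sits beyond the default limit, so it only shows up once the limit is raised to cover the whole field.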

Re: Searching for tokens does not return any results

2014-05-01 Thread Koji Sekiguchi

Hi Yetkin, welcome! I think StandardAnalyzer of Lucene is the problem you are facing. Why don't you have another field using StandardAnalyzer and see how it tokenizes CRD_PROD on the Solr admin GUI? I forget the details, but we can use Lucene's Analyzer in schema.xml something like this:

Re: AND not as a boolean operator in Phrase

2014-03-25 Thread Koji Sekiguchi

(2014/03/26 2:29), abhishek jain wrote: hi friends, when i search for "A and B" it gives me result for A , B , i am not sure why? Please guide how can i exact match when it is within phrase/quotes. Generally speaking (w/ LuceneQParser), if you want phrase match results, use quotes, i.e. q="A

Re: Solr & Nutch

2014-01-28 Thread Koji Sekiguchi

1. Nutch follows the links within HTML web pages to crawl the full graph of a web of pages. In addition, I think Nutch has a PageRank-like scoring function as opposed to Lucene/Solr, which are based on vector space model scoring. koji -- http://soleami.com/blog/mahout-and-machine-learning-traini

Re: document contained more than 100000 characters

2013-12-25 Thread Koji Sekiguchi

Hi, I'm not sure but you probably hit a Tika exception. Have you checked the Apache Tika mailing list? Hmm, just now I googled "Your document contained more than 100000 characters", and I found a page on Stack Overflow. According to it, there is an API to change the limit. But I don't know whether Solr can ch

Re: indexing from bowser

2013-12-16 Thread Koji Sekiguchi

Hi, (13/12/16 19:46), Nutan wrote: how to index pdf,doc files from browser? I think you can index from browser. If you said that this query is used for indexing : curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F"myfile=@C:\solr\document\src\test1\Codin

Re: Passing a Parameter to a Custom Processor

2013-12-13 Thread Koji Sekiguchi

Hi Dileepa, The stanbolInterceptor processor chain will be used in multiple request handlers. Then I will have to pass the stanbol.enhancer.url param in each of those request handler which will cause redundant configurations. Therefore I need to pass the param to the processor directly. But whe

Re: SOLRJ API to do similar CURL command execution

2013-11-13 Thread Koji Sekiguchi

(13/11/13 22:25), Anupam Bhattacharya wrote: How can I post the whole XML string to SOLR using its SOLRJ API ? The source code of SimplePostTool would be of some help: http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/util/SimplePostTool.html koji -- http://soleami.com/blog/auto

Re: count links pointing to id

2013-11-10 Thread Koji Sekiguchi

(13/11/10 3:43), Andreas Owen wrote: I have a multivalue field with links pointing to ids of solrdocuments. I would like calculate how many links are pointing to each document und put that number into the field links2me. How can I do this, I would prefer to do it with a query and the updater so s

Re: solr sort facets by name

2013-11-05 Thread Koji Sekiguchi

(13/11/06 9:00), PeterKerk wrote: By default solr sorts facets by the amount of hits for each result. However, I want to sort by facetnames alphabetically. Earlier I sorted the facets on the client or via my .NET code, however, this time I need solr to return the results with alphabetically sorte

Re: Unable to add mahout classifier

2013-10-31 Thread Koji Sekiguchi

Caused by: java.lang.ClassCastException: class com.mahout.solr.classifier.CategorizeDocumentFactory at java.lang.Class.asSubclass(Unknown Source) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:433) at org.apache.solr.core.SolrResourceLoad

Re: Return the synonyms as part of Solr response

2013-10-30 Thread Koji Sekiguchi

Hi Siva, (13/10/30 18:12), sivaprasad wrote: Hi, We have a requirement where we need to send the matched synonyms as part of Solr response. I don't think that Solr has such function. Do we need to customize the Solr response handler to do this? So the answer is yes. koji -- http://soleami

Re: Unable to add mahout classifier

2013-10-30 Thread Koji Sekiguchi

(13/10/30 22:09), lovely kasi wrote: Hi, I made few changes to the solrconfig.xml, created a jar file,added it to the lib folder of the solr and tried to start it. THe changes in the solrconfig.xml are LEAD_NOTES category Others naiveBayesModel

Re: Help on solr more like this functionality

2013-10-26 Thread Koji Sekiguchi

Hi Suren, (13/10/25 23:36), Suren Raju wrote: Hi, We are trying to solve a business problem by performing solr more like this query. We are able to perform the more like this search. We have a specific use case that requires different boost on different match fields. Say i do more like this bas

Re: how to debug my own analyzer in solr

2013-10-21 Thread Koji Sekiguchi

Hi Mingz, If you use Eclipse, you can debug Solr with your plugin like this: # go to Solr install directory $ cd $SOLR $ ant run-example -Dexample.debug=true Then connect the JVM from Eclipse via remote debug port 5005. Good luck! koji (13/10/21 18:58), Mingzhu Gao wrote: More information

Re: ExtractRequestHandler, skipping errors

2013-10-18 Thread Koji Sekiguchi

1) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362) ... 16 more On Thu, Oct 17, 2013 at 5:19 PM, Koji Sekiguchi wrote: Hi Roland

Re: ExtractRequestHandler, skipping errors

2013-10-17 Thread Koji Sekiguchi

Hi Roland, (13/10/17 20:44), Roland Everaert wrote: Hi, I helped a customer to deployed solr+manifoldCF and everything is going quite smoothly, but every time solr is raising an exception, the manifoldcfjob feeding solr aborts. I would like to know if it is possible to configure the ExtractRequ

Re: req info : SOLRJ and TermVector

2013-10-16 Thread Koji Sekiguchi

(13/10/16 17:47), elfu wrote: hi, can i access TermVector information using solrj ? There is TermVectorComponent to get termVector info: http://wiki.apache.org/solr/TermVectorComponent So yes, you can access it using solrj. koji -- http://soleami.com/blog/automatically-acquiring-synonym-kno

Re: fq caching question

2013-10-14 Thread Koji Sekiguchi

Hi Tim, (13/10/15 5:22), Tim Vaillancourt wrote: Hey guys, Sorry for such a simple question, but I am curious as to the differences in caching between a "combined" filter query, and many separate filter queries. Here are 2 example queries, one with combined fq, one separate: 1) "/select?q=*:

Re: Please help!, Highlighting exact phrases with solr

2013-10-10 Thread Koji Sekiguchi

(13/10/10 18:17), Silvia Suárez wrote: I am using solrj as client for indexing documents on the solr server I am new to solr, And I am having problem with the highlighting in solr. Highlighting exact phrases with solr does not work. For example if the search keyword is: "dulce hogar" it returns:

Re: defType

2013-08-10 Thread Koji Sekiguchi

See line 33 to 50 at http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/search/QParserPlugin.java?view=markup koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html (13/08/11 8:05), William Bell wrote: Can you list them out?

Re: Proximity and highliting

2013-08-03 Thread Koji Sekiguchi

(13/08/04 14:36), Alex Cougarman wrote: Hi all. I'm having some issues with highlighting and proximity searching in Solr 4.x. Matching words in the query are sometimes highlighted even if they are not within proximity and in some cases, matching words in the query are not highlighted at all. D

Re: ICUTransformFilterFactory

2013-08-02 Thread Koji Sekiguchi

(13/08/02 17:53), Jochen Lienhard wrote: Hello, we have a problem with some special characters: for example æ We are using the ICUTranformFilterFactory for indexing and searching. We have some documents with "urianae" and with "urianæ" If I search "urainae" so I find only the versions with "

Re: Sort by document similarity counts

2013-07-18 Thread Koji Sekiguchi

I have tried doing this via custom SearchComponent, where I can find all similar documents for each document in current search result, then add a new field into document hoping to use sort parameter (q=*&sort=similarityCount). I don't understand this part very well, but: But this will not wo

Re: Find related words

2013-07-04 Thread Koji Sekiguchi

Hi Dotan, (13/07/04 23:51), Dotan Cohen wrote: Thank you Jack and Koji. I will take a look at MLT and also at the .zip files from LUCENE-474. Koji, did you have to modify the code for the latest Solr? Yes. As the Lucene APIs for accessing index have been changed, I had to modify the code. koj

Re: Find related words

2013-07-04 Thread Koji Sekiguchi

You may want collocations of a given word? I implemented LUCENE-474 for Solr a while ago and I found it worked pretty well. https://issues.apache.org/jira/browse/LUCENE-474 Hope this helps. koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html (13/07/04

Re: [blog post] Automatically Acquiring Synonym Knowledge from Wikipedia

2013-05-28 Thread Koji Sekiguchi

e you shared source code / jar for the same so at it could be used ? Thanks, Rajesh On Mon, May 27, 2013 at 8:44 PM, Koji Sekiguchi wrote: Hello, Sorry for cross post. I just wanted to announce that I've written a blog post on how to create synonyms.txt file automatically from Wikiped

Re: Note on The Book

2013-05-27 Thread Koji Sekiguchi

contribution, that would be great. The focus of the book will be hard-core Solr. -- Jack Krupansky -Original Message- From: Koji Sekiguchi Sent: Monday, May 27, 2013 8:07 AM To: solr-user@lucene.apache.org Subject: Re: Note on The Book Hi Jack, I'd like to ask as a person who contributed a

[blog post] Automatically Acquiring Synonym Knowledge from Wikipedia

2013-05-27 Thread Koji Sekiguchi

Hello, Sorry for cross post. I just wanted to announce that I've written a blog post on how to create synonyms.txt file automatically from Wikipedia: http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html Hope that the article gives someone a good experience! koji

Re: Note on The Book

2013-05-27 Thread Koji Sekiguchi

Hi Jack, I'd like to ask as a person who contributed a case study article about "Automatically acquiring synonym knowledge from Wikipedia" to the book. (13/05/24 8:14), Jack Krupansky wrote: To those of you who may have heard about the Lucene/Solr book that I and two others are writing on Luce

Re: cache disable through solrJ

2013-05-20 Thread Koji Sekiguchi

(13/05/20 20:53), J Mohamed Zahoor wrote: Hi How do i disable cache (Solr FieldValueCache) for certain queries... using HTTP it can be done using {!cache=false}... how can i do it from solrj? ./zahoor How about using facet.method=enum? koji -- http://soleami.com/blog/lucene-4-is-super-conv

Re: Solr 3.6.1: changing a field from stored to not stored

2013-04-23 Thread Koji Sekiguchi

(13/04/24 7:09), Petersen, Robert wrote: Hi guys, What would happen if I changed a field definition on an existing field in an existing index from stored to not stored? Would solr just party on ignoring the fact that this field's data is stored in the current index? I noticed I am unnecessa

Re: Returning similarity values for more like this search

2013-04-19 Thread Koji Sekiguchi

(13/04/19 23:24), Achim Domma wrote: Hi, I'm executing a search including a search for similar documents (mlt=true&mlt.fl=) which works fine so far. I would like to get the similarity value for each document. I expected this to be quite common and simple, but I could not find a hint how t

Re: conditional queries?

2013-04-09 Thread Koji Sekiguchi

Hi Mark, > Is it possible to do a conditional query if another query has no results? For example, say I want to search against a given field for: - Search for "car". If there are results, return them. - Else, search for "car*" . If there are results, return them. - Else, search for "car~" .
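One way to picture the fallback sequence in the question is a client-side loop that tries each query form in turn. This is only a sketch of the logic in the question, not necessarily the approach the reply goes on to recommend. The `search` callable is a hypothetical stand-in for whatever client actually sends the query to Solr:

```python
def search_with_fallback(search, term):
    """Try the exact term, then a prefix query, then a fuzzy query."""
    for query in (term, term + "*", term + "~"):
        results = search(query)
        if results:
            return query, results
    return None, []


# Fake search backend for illustration: only the prefix and fuzzy forms hit.
fake_index = {"car*": ["carpet"], "car~": ["cat"]}
print(search_with_fallback(lambda q: fake_index.get(q, []), "car"))
# → ('car*', ['carpet'])
```

Each fallback costs an extra round trip, so this pattern trades latency on misses for simpler query logic.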

Re: Confusion over Solr highlight hl.q parameter

2013-04-02 Thread Koji Sekiguchi

(13/04/03 5:27), Van Tassell, Kristian wrote: > Thanks Koji, this helped with some of our problems, but it is still not > perfect. > > This query, for example, returns no highlighting: > > ?q=id:abc123&hl.q=text_it_IT:l'assieme&hl.fl=text_it_IT&hl=true&defType=edismax > > But this one does (whe

Re: Flow Chart of Solr

2013-04-02 Thread Koji Sekiguchi

(13/04/02 21:45), Furkan KAMACI wrote: Is there any documentation something like flow chart of Solr. i.e. Documents comes into Solr(maybe indicating which classes get documents) and goes to parsing process (i.e. stemming processes etc.) and then reverse indexes are get so on so forth? There is

Re: Retrieving Term vectors

2013-03-19 Thread Koji Sekiguchi

Hi Sarita, I've not dug into your code in detail but my first impression is that you are not storing term positions? > FieldType fieldType = new FieldType(); > IndexOptions indexOptions = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS; > fieldType.setIndexOptions(indexOptions); > fieldTyp

Re: Getting back highlights almost always works...

2013-03-19 Thread Koji Sekiguchi

(13/03/20 6:14), Van Tassell, Kristian wrote: ...but I'm finding some examples where the stored text is so big (14,000 words) that Solr fails to highlight anything. But the data is definitely in the text field and is returning due to that hit. Does anyone have any ideas why this happens? Pr

Re: Incorrect snippets using FastVectorHighlighter

2013-03-18 Thread Koji Sekiguchi

So just to be clear: There is no possibility to highlight results, if I use variable gram size. Neither the original highlighter nor FVH do the job. Or am I missing something? I don't know whether the latest original highlighter has such a restriction today, but when FVH came in 2.9, at that time,

Re: Incorrect snippets using FastVectorHighlighter

2013-03-18 Thread Koji Sekiguchi

Hi Jochen, There is a restriction in FVH. FVH cannot deal with variable gram size. That is, you need minGramSize == maxGramSize in your NGramFilterFactory setting. koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html (13/03/18 22:17), Jochen Just wrote: -BEGIN

Re: Confusion over Solr highlight hl.q parameter

2013-03-16 Thread Koji Sekiguchi

(13/03/16 4:08), Van Tassell, Kristian wrote: > Hello everyone, > > If I search for a term “baz” and tell it to highlight it, it highlights just > fine. > > If, however, I search for “foo bar” using the q parameter, which appears in > that same document/same field, and use the hl.q parameter to

Re: how to overrride pre and post tags when usefastVectorHighlighter is set to true

2013-02-23 Thread Koji Sekiguchi

Hi Alex, (13/02/23 10:53), alx...@aim.com wrote: Hello, I was unable to change pre and post tags for highlighting when usefastVectorHighlighter is set to true. Changing default tags in solrconfig.xml works for standard highlighter though. I searched mailing list and the net with no success.

Re: Order by hl.snippets count

2012-11-19 Thread Koji Sekiguchi

(12/11/20 1:50), Gabriel Croitoru wrote: Hello, I'm using Solr 1.3 with http://wiki.apache.org/solr/HighlightingParameters options. The client just asked us to change the order from the default score to the number of hl.snippets per document. It's this posibble from Solr configuration? (witho

Re: Patch Needed for Issue Solr-3790

2012-11-09 Thread Koji Sekiguchi

(12/11/09 19:20), mechravi25 wrote: Hi All, Im using Solr 3.6.1 version. For the issue given in the following url, there is no patch file provided https://issues.apache.org/jira/browse/SOLR-3790 Can you tell me if there is patch file for the same? Also, We noticed that the below url had the c

Re: SLOR And OpenNlp integration

2012-10-11 Thread Koji Sekiguchi

(12/10/11 20:40), ahmed wrote: Hi, Thanks for the reply. In fact I tried this tutorial, but when I execute 'ant compile' I get a class-not-found problem despite the classes being there. I don't know what the problem is. I think attaching the error you got would help us to understand your problem. Also b

Re: Regarding delta-import and full-import

2012-09-27 Thread Koji Sekiguchi

(12/09/27 22:45), darshan wrote: Hi All, Can anyone refer me few number blogs that explains both imports in little bit more detail and with examples. Thanks, Darshan Asking Google, I got: http://www.arunchinnachamy.com/apache-solr-mysql-data-import/ http://www.andornot.

Re: solr binary protocol

2012-09-26 Thread Koji Sekiguchi

(12/09/27 9:29), Radim Kolar wrote: Is it possible to use the SOLR binary protocol instead of xml for talking TO SOLR? I know that it can be used in Solr reply. Have you looked at javabin? http://wiki.apache.org/solr/javabin koji -- http://soleami.com/blog/starting-lab-work.html

Re: Broken highlight truncation for hl.alternateField

2012-09-14 Thread Koji Sekiguchi

Hi Arcadius, I think it is a feature. If no matching terms are found in the hl.fl fields, it triggers the hl.alternateField function, and if you set hl.maxAlternateFieldLength=[LENGTH], the highlighter extracts the first [LENGTH] characters of stored data of the hl.fl field. As this is the common feature

Re: Doubts in PathHierarchyTokenizer

2012-09-12 Thread Koji Sekiguchi

Use delimiter option instead of pattern for PathHierarchyTokenizerFactory: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PathHierarchyTokenizerFactory koji -- http://soleami.com/blog/starting-lab-work.html (12/09/12 22:22), mechravi25 wrote: Hi, Im Using Solr 3.6.1 version

Re: PathHierarchyTokenizerFactory behavior

2012-07-09 Thread Koji Sekiguchi

(12/07/09 19:41), Alok Bhandari wrote: Hello, this is how the field is declared in schema.xml when I query for this filed with input "M:/Users/User/AppData/Local/test/abc.txt" . It searches for documents containing any of the token generated M,Users, User

Re: using Carrot2 custom ITokenizerFactory

2012-05-21 Thread Koji Sekiguchi

My problem was gone. Thanks Staszek and Dawid! koji -- Query Log Visualizer for Apache Solr http://soleami.com/ (12/05/21 18:11), Stanislaw Osinski wrote: Hi Koji, Dawid came up with a simple fix for this, it's committed to trunk and 3.6 branch. Staszek

Re: using Carrot2 custom ITokenizerFactory

2012-05-20 Thread Koji Sekiguchi

1 branch now. If you hit any other issues with this, let me know. Staszek On Sun, May 20, 2012 at 1:02 PM, Koji Sekiguchi wrote: Hi Staszek, I'll wait your fix. Thank you! Koji Sekiguchi from iPad2 On 2012/05/20, at 18:18, Stanislaw Osinski wrote: Hi Koji, You're right, the cur

Re: Newbie with Carrot2?

2012-05-20 Thread Koji Sekiguchi

(12/05/20 23:21), Xue-Feng Yang wrote: Hi Staszek, I haven't found a way for inputting data into solr in the wiki. Does that mean docs can be inputted in a normal solr way after configuration? for example, DIH or solrj. Thanks, Xue-Feng Right, because Carrot2 clustering is for search time.

Re: using Carrot2 custom ITokenizerFactory

2012-05-20 Thread Koji Sekiguchi

Hi Staszek, I'll wait for your fix. Thank you! Koji Sekiguchi from iPad2 On 2012/05/20, at 18:18, Stanislaw Osinski wrote: > Hi Koji, > > You're right, the current code overwrites the custom tokenizer though it > shouldn't. LuceneCarrot2TokenizerFactory is there to av

using Carrot2 custom ITokenizerFactory

2012-05-20 Thread Koji Sekiguchi

Hello, As I'd like to use a custom ITokenizerFactory, I set the following Carrot2 key in solrconfig.xml: default : my.own.TokenizerFactory But it seems that CarrotClusteringEngine overwrites it with LuceneCarrot2TokenizerFactory in the init() method: BasicPrepro

Re: Is it possible to limit the bandwidth of replication

2012-05-07 Thread Koji Sekiguchi

(12/05/07 15:38), James wrote: I notice that index replication utilizes the full bandwidth, so normal queries stall. Is there any method to control the bandwidth of replication? I don't know the status of Java based replication, but there is a bwlimit option for your problem for script based

Re: Solr 3.5 - Elevate.xml causing issues when placed under /data directory

2012-05-02 Thread Koji Sekiguchi

(12/05/03 1:39), Noordeen, Roxy wrote: Hello, I just started using elevation for solr. I am on solr 3.5, running with Drupal 7, Linux. 1. I updated my solrconfig.xml from ${solr.data.dir:./solr/data} To /usr/local/tomcat2/data/solr/dev_d7/data 2. I placed my elevate.xml in my solr's data dire

Re: How to integrate sen and lucene-ja in SOLR 3.x

2012-05-01 Thread Koji Sekiguchi

(12/05/02 1:47), Shanmugavel SRD wrote: Hi, Can anyone help me on how to integrate sen and lucene-ja.jar in SOLR 3.4 or 3.5 or 3.6 version? I think lucene-ja.jar no longer exists on the Internet and doesn't work with Lucene/Solr 3.x because the interface doesn't match (lucene-ja doesn't know Attribu

Re: Solr: Highlighting word parts in excerpt does not work

2012-04-05 Thread Koji Sekiguchi

(12/04/05 15:34), Thomas Werthmüller wrote: Hi, I configured Solr so that word parts are also found. When I search for "Monday" or "Mond" the right document is found. This is done with the following configuration in the schema.xml:. Now, when I add hl=true to the query string, the excerpt for "Monday"

Re: Why my highlights are wrong(one character offset)?

2012-03-27 Thread Koji Sekiguchi

What does your sequence field look like in schema.xml, fieldType and field? And what version are you using? koji -- Query Log Visualizer for Apache Solr http://soleami.com/ (12/03/27 13:06), neosky wrote: all of my highlights have a one-character mistake in the offset, some fragments from my respons

Re: Reporting tools

2012-03-09 Thread Koji Sekiguchi

(12/03/09 12:35), Donald Organ wrote: Are there any reporting tools out there? So I can analyze search term frequency, filter frequency, etc? You may be interested in: Free Query Log Visualizer for Apache Solr http://soleami.com/ koji -- Query Log Visualizer for Apache Solr http://soleami.

Re: Help with Synonyms

2012-03-05 Thread Koji Sekiguchi

(12/03/06 11:23), Donald Organ wrote: Ok so do I need to use a different format in my synonyms.txt file in order to do this at index time? Right, if you want to apply synonym rules only at index time. Use "," like this: floor locker, storage locker And don't forget to set expand="true" in yo
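The difference between the two synonyms.txt formats mentioned in this thread can be sketched in Python. This is only an illustration of the rule semantics (comma-separated equivalents versus an explicit "=>" mapping), not Solr's parser; parse_synonym_rules and its expand parameter are hypothetical names that mirror the expand attribute named above.

```python
def parse_synonym_rules(lines, expand=True):
    """Build a {source term: [replacement terms]} mapping from rule lines.

    Hypothetical sketch of synonyms.txt semantics, not Solr's own parser.
    """
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit mapping: the left side is replaced by the right side.
            lhs, rhs = line.split("=>", 1)
            sources = [s.strip() for s in lhs.split(",")]
            targets = [t.strip() for t in rhs.split(",")]
        else:
            # Equivalent terms: with expand, every term maps to all terms;
            # without it, every term is reduced to the first one.
            terms = [t.strip() for t in line.split(",")]
            sources = terms
            targets = terms if expand else terms[:1]
        for source in sources:
            replacements = mapping.setdefault(source, [])
            for target in targets:
                if target not in replacements:
                    replacements.append(target)
    return mapping


if __name__ == "__main__":
    print(parse_synonym_rules(["floor locker, storage locker"]))
    print(parse_synonym_rules(["floor locker=>storage locker"]))
```

Under the comma form with expansion, both phrases are kept for either input, so a document containing either phrase can be found by either query. Under the "=>" form, "floor locker" is replaced by "storage locker" and does not survive as a term of its own.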

Re: Help with Synonyms

2012-03-05 Thread Koji Sekiguchi

(12/03/06 11:07), Donald Organ wrote: No I do synonyms at index time. : I am still getting results for storage locker and no results for floor locker synonyms.txt still looks like this: floor locker=>storage locker So that's the cause of the problem. Due to the definition "floor locker=>s

Re: Help with Synonyms

2012-03-05 Thread Koji Sekiguchi

(12/03/06 0:11), Donald Organ wrote: Try to remove tokenizerFactory="**KeywordTokenizerFactory" in your synonym filter definition because I think you would want to tokenize the synonym settings in synonyms.txt as "floor" / "locker" => "storage" / "locker". But if you set it to KeywordTokenizer,

Re: nutch log

2012-03-03 Thread Koji Sekiguchi

LinkDb.java:175) at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:149) at org.apache.nutch.crawl.Crawl.run(Crawl.java:143) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.nutch.crawl.Crawl.main(Crawl.java:55) why, in your opinion? thanks again alessio Il giorno 03 marzo 2012 16:43, Koji Sekiguchi

Re: nutch log

2012-03-03 Thread Koji Sekiguchi

(12/03/04 0:09), alessio crisantemi wrote: is true. this is the solr problem: mar 03, 2012 12:08:04 PM org.apache.solr.common.SolrException log Grave: org.apache.solr.common.SolrException: invalid boolean value: Solr said that there was an erroneous boolean value in your solrconfig.xml. Check th

Re: nutch log

2012-03-03 Thread Koji Sekiguchi

(12/03/03 20:32), alessio crisantemi wrote: this is my nutch log after configured it for solr index: : org.apache.solr.common.SolrException: Internal Server Error Internal Server Error request: http://localhost:8983/solr/update?wt=javabin&version=2 at org.apache.solr.client.solrj.impl.Common


1 - 100 of 545 matches
