Community Over Code NA 2024 Search track, CFP closing soon

2024-04-09 Thread Anshum Gupta
your talks here - https://communityovercode.org/call-for-presentations/ We hope to see many of you talk about Search in Denver! -- Anshum Gupta

[REMINDER] CFP Open for Search Track at Community Over Code EU (Formerly ApacheCon)

2024-01-05 Thread Anshum Gupta
there. -- Anshum Gupta

[REMINDER] CFP Open for Search Track at Community Over Code (Formerly ApacheCon)

2023-06-12 Thread Anshum Gupta
! Hope to see you all there. -- Anshum Gupta

Re: Call for Presentations now open, ApacheCon North America 2022

2022-03-30 Thread Anshum Gupta
. Good luck! -Anshum On Wed, Mar 30, 2022 at 5:47 AM Michael Wechner wrote: > Hi Together > > I would be interested to submit a proposal/presentation re Lucene's > vector search, but would like to ask first whether somebody else wants > to do this as well or might be i

Search Track at ApacheCon 2021, Sep 21-23

2021-09-15 Thread Anshum Gupta
website - https://www.apachecon.com/acah2021/index.html Registration - https://hopin.com/events/apachecon-2021-home Slack - http://s.apache.org/apachecon-slack Search Track - https://www.apachecon.com/acah2021/tracks/search.html See you all at ApacheCon 2021! -Anshum

Fwd: Call for Presentations for ApacheCon 2021 now open

2021-03-08 Thread Anshum Gupta
- To unsubscribe, e-mail: announce-unsubscr...@apachecon.com For additional commands, e-mail: announce-h...@apachecon.com -- Anshum Gupta

[ANNOUNCE] Apache Solr TLP Created

2021-02-18 Thread Anshum Gupta
that they can continue to expect critical bug fixes for releases previously made under the Apache Lucene project. We will send another update as the mailing lists and website are set up for the Solr project. -Anshum On behalf of the Apache Lucene and Solr PMC

ApacheCon at Home 2020 starts tomorrow!

2020-09-28 Thread Anshum Gupta
://www.apachecon.com/acah2020/tracks/search.html See you at ApacheCon. -- Anshum Gupta

Re: [VOTE] Lucene logo contest, third time's a charm

2020-09-08 Thread Anshum Gupta
https://lucene.apache.org/theme/images/lucene/lucene_logo_green_300.png > > > > Please vote for one of the above choices. This vote will close about one > > week from today, Mon, Sept 7, 2020 at 11:59PM. > > > > Thanks! > > > > [jira-issue] https://issues.apache.org/jira/browse/LUCENE-9221 > > [first-vote] > > > http://mail-archives.apache.org/mod_mbox/lucene-dev/202006.mbox/%3cCA+DiXd74Mz4H6o9SmUNLUuHQc6Q1-9mzUR7xfxR03ntGwo=d...@mail.gmail.com%3e > > [second-vote] > > > http://mail-archives.apache.org/mod_mbox/lucene-dev/202009.mbox/%3cCA+DiXd7eBrQu5+aJQ3jKaUtUTJUqaG2U6o+kUZfNe-m=smn...@mail.gmail.com%3e > > [rank-choice-voting] https://en.wikipedia.org/wiki/Instant-runoff_voting > > > -- Anshum Gupta

[ANNOUNCE] Apache Lucene 7.0.0 released

2017-09-20 Thread Anshum Gupta
tensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also applies to Maven access. ReleaseNote70 (last edited 2017-09-20 10:27:30 by AnshumGupta <https://wiki.apache.org/lucene-java/AnshumGupta>) Anshum Gupta

[ANNOUNCE] Apache Lucene 5.5.3 released

2016-09-09 Thread Anshum Gupta
try another mirror. This also goes for Maven access. -Anshum Gupta

[ANNOUNCE] Apache Lucene 5.5.1 released

2016-05-06 Thread Anshum Gupta
replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. -- Anshum Gupta

[ANNOUNCE] Apache Lucene 5.3.2 released

2016-01-23 Thread Anshum Gupta
replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. -- Anshum Gupta

[ANNOUNCE] Apache Lucene 5.2.0 released

2015-06-07 Thread Anshum Gupta
Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. -- Anshum Gupta

Re: [ANNOUNCE] Apache Lucene 5.0.0 released

2015-02-20 Thread Anshum Gupta
: u...@thetaphi.de -Original Message- From: Anshum Gupta [mailto:ans...@anshumgupta.net] Sent: Friday, February 20, 2015 9:55 PM To: d...@lucene.apache.org; gene...@lucene.apache.org; java-u...@lucene.apache.org Subject: [ANNOUNCE] Apache Lucene 5.0.0 released 20 February

[ANNOUNCE] Apache Lucene 5.0.0 released

2015-02-20 Thread Anshum Gupta
and notes on upgrading. Please report any feedback to the mailing lists ( http://lucene.apache.org/core/discussion.html) -- Anshum Gupta http://about.me/anshumgupta

Bangalore Apache Lucene/Solr meetup

2013-05-21 Thread Anshum Gupta
meetup event: http://www.meetup.com/Bangalore-Apache-Solr-Lucene-Group/events/113806762/ . -- Anshum Gupta http://www.anshumgupta.net

Re: Finding match term positions in the document

2011-10-28 Thread Anshum
Hi Vidya, Perhaps this could help you: http://hrycan.com/2009/10/25/lucene-highlighter-howto/ -- Anshum Gupta http://ai-cafe.blogspot.com On Fri, Oct 28, 2011 at 2:18 PM, Vidya Kanigiluppai Sivasubramanian vidya...@hcl.com wrote: Hi, I am using lucene 2.4.1 in my project. I need

Re: Is There a Way To Split The Lucene Index Segments To Samller Size Less Than 1 GB

2011-07-27 Thread Anshum
hand, why do you want to split a 9G index? Is there a reason? performance issue? It'd be good if you could share the reason as the problem could be completely different. -- Anshum Gupta http://ai-cafe.blogspot.com 2011/7/27 Gudi, Ravi Sankar ravisankarg.ravisank...@hp.com Hi Lucene Team

Re: Lucene Result

2011-06-08 Thread Anshum
from the 'search' method. Also, I'd suggest you grab a copy of Lucene in Action 2nd Edition as it'd help you a lot in understanding the way Lucene works/is used. -- Anshum Gupta http://ai-cafe.blogspot.com On Wed, Jun 8, 2011 at 11:00 AM, Pranav goyal pranavgoyal40...@gmail.com wrote: Hi all

Re: Lucene Indexing

2011-06-06 Thread Anshum
the updateDocument function as of now would internally delete the document and add the new supplied document. Hope this answer helps. -- Anshum Gupta http://ai-cafe.blogspot.com On Mon, Jun 6, 2011 at 11:59 AM, Pranav goyal pranavgoyal40...@gmail.com wrote: Hi all, I am a newbie to lucene. I

Re: Lucene Document No

2011-06-06 Thread Anshum
. -- Anshum Gupta http://ai-cafe.blogspot.com On Mon, Jun 6, 2011 at 4:41 PM, Pranav goyal pranavgoyal40...@gmail.com wrote: Hi all, Is there any way to change my lucene document no? Like if I can change my lucene document no's with con_key. I am a newbie and don't know whether this is a silly

Re: Lucene Indexing

2011-06-06 Thread Anshum
Yes, you'd need to delete the document and then re-add a newly created document object. You may use the key and delete the doc using the Term(key, value). -- Anshum Gupta http://ai-cafe.blogspot.com On Mon, Jun 6, 2011 at 4:45 PM, Pranav goyal pranavgoyal40...@gmail.com wrote: Hi Anshum

Re: best practice for reusing documents with multi-valued fields

2011-04-19 Thread Anshum
){ System.out.println(ir.document(scoreDoc.doc)); } is.close(); ir.close(); iw.close(); *--Snip--* -- Anshum Gupta http://ai-cafe.blogspot.com On Fri, Apr 15, 2011 at 6:32 AM, Christopher Condit con...@sdsc.edu wrote: I know that it's best practice to reuse

Re: Lucene: Indexsearcher: java.lang.UnsupportedOperationException

2011-04-19 Thread Anshum
Could you also print and send the entire stack-trace? Also, the query.toString() -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Apr 19, 2011 at 7:40 PM, Patrick Diviacco patrick.divia...@gmail.com wrote: I get the following error message: java.lang.UnsupportedOperationException

Re: Choosing boosting in Lucene

2011-04-18 Thread Anshum
the best. Relevance or an apt method about boost values, can again be figured out by varying the boost *via* *trial and error*. That is pretty much a general practice. Hope this helps you figuring out a reasonable solution and boost values. -- Anshum Gupta http://ai-cafe.blogspot.com On Sat, Apr

Re: Calculate document lucene score after the search

2011-04-18 Thread Anshum
Hi Madhu, You could use IndexSearcher.explain(..) to explain the result and get the detailed breakup of the score. That should probably help you with understanding the boost and score as calculated by lucene for your app. -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Apr 19, 2011 at 2:32

Re: Update Document based on Query instead of Term

2011-04-13 Thread Anshum
So Update basically is nothing but delete and add (a fresh doc). You could just go ahead and use the deleteDocuments(Query query) function and then add the new document. That is the general approach for such cases and it works just about fine. -- Anshum Gupta http://ai-cafe.blogspot.com

Re: how to get all documents in the results ?

2011-03-23 Thread Anshum
So functionally I am assuming you've achieved what you'd been aiming for. About the scores, the matchalldocs does score docs based on norm factors etc. therefore the score wouldn't be 0. -- Anshum Gupta http://ai-cafe.blogspot.com On Wed, Mar 23, 2011 at 1:38 PM, Patrick Diviacco patrick.divia

Re: Is it possible to update only selected fields in a document ?

2011-03-22 Thread Anshum
Hi, No, as of now there's no way to do so. -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Mar 22, 2011 at 12:29 PM, shrinath.m shrinat...@webyog.com wrote: I am asking for partial update in Lucene, where I want to update only a selected field of all fields in the document. Does Lucene

Re: Is it possible to update only selected fields in a document ?

2011-03-22 Thread Anshum
Also, is there a particular reason why you wouldn't want to index that, considering you'd want to 'update' documents? It's good practice to index the unique field, especially if you have one. It has generally helped more often than not. -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Mar 22

Re: Is it possible to update only selected fields in a document ?

2011-03-22 Thread Anshum
Yes, that's how it's generally done. Also, you should just handle data/fields aptly rather than trying to avoid them in the first place. You could safely add these, use these internally and never return these or use these for an end user search. -- Anshum Gupta http://ai-cafe.blogspot.com On Tue

Re: how to get all documents in the results ?

2011-03-22 Thread Anshum
Hi Patrick, You may have a look at this, perhaps this will help you with it. Let me know if you're still stuck. http://stackoverflow.com/questions/3300265/lucene-3-iterating-over-all-hits -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Mar 22, 2011 at 4:10 PM, karl.wri...@nokia.com

Re: how to get all documents in the results ?

2011-03-22 Thread Anshum
So, a few things: 1. Are you looking to get 'all' documents or only docs matching your query? 2. If it's about fetching all docs, why not use the matchalldocs query? 3. Did you try using a collector instead of topdocs? -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Mar 22, 2011 at 4:46 PM

Re: how to get all documents in the results ?

2011-03-22 Thread Anshum
are trying to achieve. You may have a completely different option that you haven't read which someone could advise if they know the exact intent. Hope this helps. -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Mar 22, 2011 at 4:59 PM, Patrick Diviacco patrick.divia...@gmail.com wrote: 1

Re: lucid gaze

2011-03-16 Thread Anshum
Hi Suman, I tried it a while ago. Found it nice and useful. You could get some hints on using it at http://ai-cafe.blogspot.com/2009/09/lucid-gaze-tough-nut.html (in case you need some ! :) ) -- Anshum Gupta http://ai-cafe.blogspot.com On Wed, Mar 16, 2011 at 11:37 AM, suman.holani suman.hol

Re: document object

2011-03-10 Thread Anshum
, otherwise if you're using a very selective field which may be used through a FieldCache it'd be a nice thing to do. Hope that helps. -- Anshum Gupta http://ai-cafe.blogspot.com On Thu, Mar 10, 2011 at 3:01 PM, suman.holani suman.hol...@zapak.co.in wrote: Hi, I am facing the problem The line

Re: document object

2011-03-10 Thread Anshum
Depends on your data. I know that's a vague answer but that's the point. What you could do is use FieldCache if memory and data let you do so. Would it? -- Anshum Gupta http://ai-cafe.blogspot.com On Thu, Mar 10, 2011 at 3:12 PM, suman.holani suman.hol...@zapak.co.in wrote: Hi Anshum, Thanks

Re: finding the length of a field

2011-02-28 Thread Anshum
Hi Lahiru, A few questions here. Why would you need that? Is the field stored? -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Mar 1, 2011 at 11:04 AM, Lahiru Samarakoon lahir...@gmail.com wrote: Hi all, Is there a way to find the length of a field of a lucene index document? Thanks

Re: Multi Index Search Query

2011-02-14 Thread Anshum
If you actually intend to get the intersection of 2 results from a 'union' of 2 indexes, you could use the filter and query approach. You could use a multi searcher or a parallel multi searcher to perform the search in this case. -- Anshum Gupta http://ai-cafe.blogspot.com On Mon, Feb 14

Re: Search in multiple indexes which have differnt field name

2011-02-14 Thread Anshum
Hi Liat, You could open a multi/parallelmultisearcher on the indexes that you have and then construct an OR query e.g. (contents:A OR text:A) I am assuming that the field names do not overlap. If that is not the case then you'd need another solution. -- Anshum Gupta http://ai

Re: construct a field without analyzer?

2011-02-14 Thread Anshum
KeywordAnalyzer()); /snip In the above snip, I instantiate an analyzer which by default would use the StandardAnalyzer but for 'anotherfield' would use KeywordAnalyzer. Hope this helps you. -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Feb 15, 2011 at 2:19 AM, Yuhan Zhang yzh...@onescreen.com wrote

Re: where can i download a sample index

2011-02-13 Thread Anshum
Why don't you generate your own index off some sample docs or dataset? It would give you a lot more flexibility to play around, as otherwise, even if you get an index, you wouldn't have info on the analyzer used etc. while indexing. -- Anshum Gupta http://ai-cafe.blogspot.com On Sun, Feb 13, 2011

Re: lucene 3.0.3 | phrase query problem

2011-02-10 Thread Anshum
Hi Ranjit, That would be because all stop words (space, comma, stop word set, etc.) would be treated in a similar fashion and escaped while indexing, subject to the analyzer you use while indexing your content. Hope that explains the issue. -- Anshum Gupta http://ai-cafe.blogspot.com On Thu, Feb

Re: Please Help

2011-01-20 Thread Anshum
of an ngram, and then treat those phrases as terms. Doing it at runtime would not be a feasible option. -- Anshum Gupta http://ai-cafe.blogspot.com On Thu, Jan 20, 2011 at 3:30 PM, Ashish Pancholi apanch...@chambal.com wrote: Using Lucene_3.0.3. we would like to implement following: The number

Re: Scale out design patterns

2011-01-20 Thread Anshum
mod of some numeric (auto increment) userid. This works well under normal cases unless your partitioning is not predictable. -- Anshum Gupta http://ai-cafe.blogspot.com On Fri, Jan 21, 2011 at 10:52 AM, Ganesh emailg...@yahoo.co.in wrote: Hello all, Could you any one guide me what all
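
The mod-based partitioning mentioned in this reply can be sketched as follows. This is only an illustration under assumptions: the class name `UserShardRouter`, the method `shardFor`, and the shard count are hypothetical and do not come from the thread, and the sketch says nothing about how each shard's index is actually opened or searched.

```java
// Hypothetical helper, not part of Lucene: routes a numeric user id to a shard.
final class UserShardRouter {
    private final int shardCount;

    UserShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    /** Returns the index of the shard that owns this user id. */
    int shardFor(long userId) {
        // floorMod keeps the result in [0, shardCount) even for negative ids.
        return (int) Math.floorMod(userId, (long) shardCount);
    }
}
```

With auto-increment ids this spreads users evenly, but changing the shard count later remaps most ids, which is one reason the reply adds a caveat about predictability.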

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Anshum
mirrors them internally or via a downstream project) -- Anshum Gupta http://ai-cafe.blogspot.com

Re: Result ordering

2011-01-16 Thread Anshum
understanding on lucene and getting a copy of Lucene In Action 2nd Ed (http://www.manning.com/hatcher3/) would be a good idea for you and everyone in your position. Hope that helps. -- Anshum Gupta http://ai-cafe.blogspot.com On Sun, Jan 16, 2011 at 8:03 PM, Pelit Mamani pelit.mam

Re: Creating an index with multiple values for a single field

2011-01-07 Thread Anshum
Hi Ryan, You should try the synonym filter. That should help you with this kinda problem. You could also look at turning off norms for the name field, or turning off tf or idf. -- Anshum Gupta http://ai-cafe.blogspot.com On Sat, Jan 8, 2011 at 6:03 AM, Ryan Aylward r...@glassdoor.com wrote

Re: Lucene index

2010-12-29 Thread Anshum
page, starting at http://lucene.apache.org/java/2_9_1/api/all/org/apache/lucene/index/IndexWriter.html#DEFAULT_RAM_BUFFER_SIZE_MB -- Anshum Gupta http://ai-cafe.blogspot.com On Wed, Dec

Re: Can lucene index survives a machine crash during the merge or optimize operation?

2010-12-29 Thread Anshum
. -- Anshum Gupta http://ai-cafe.blogspot.com On Wed, Dec 29, 2010 at 5:36 PM, Jiang mingyuan mailtojiangmingy...@gmail.com wrote: Can lucene index survives a machine crash during the merge or optimize operation? or can I stop the running index program during the merge or optimize period

Re: Using Lucene to search live, being-edited documents

2010-12-28 Thread Anshum
. -- Anshum Gupta http://ai-cafe.blogspot.com On Wed, Dec 29, 2010 at 3:36 AM, software visualization softwarevisualizat...@gmail.com wrote: This has probably been asked before but I couldn't find it, so... Is it possible / advisable / practical to use Lucene as the basis of a live document

Re: Using Lucene to search live, being-edited documents

2010-12-28 Thread Anshum
Hi Umesh, I'm not really confident that Zoie or anything built on the current version of Lucene would be able to handle search as you type kind of a setup. -- Anshum Gupta http://ai-cafe.blogspot.com On Wed, Dec 29, 2010 at 10:39 AM, Umesh Prasad umesh.i...@gmail.com wrote: You can also look

Re: Editing StopWordList

2010-12-21 Thread Anshum
below). -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Dec 21, 2010 at 3:54 PM, manjula wijewickrema manjul...@gmail.com wrote: Hi Gupta, Thanx a lot for your reply. But I could not understand whether I could modify (adding more words) to the default stop word list or should I have

Re: Editing StopWordList

2010-12-20 Thread Anshum
. -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Dec 21, 2010 at 9:20 AM, manjula wijewickrema manjul...@gmail.com wrote: Hi, 1) In my application, I need to add more words to the stop word list. Therefore, is it possible to add more words into the default lucene stop word list? 2

Re: What is the difference between the AND and + operator?

2010-11-30 Thread Anshum
with a single '=' :) -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Nov 30, 2010 at 3:03 PM, maven apache apachemav...@gmail.com wrote: 2010/11/30 Chris Hostetter hossman_luc...@fucit.org : Subject: What is the difference between the AND and + operator? In this query, y

Re: field cross search in lucene

2010-11-30 Thread Anshum
You could change Occur.SHOULD to Occur.MUST for both fields. This should work for you if what I understood is what you wanted. -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Nov 30, 2010 at 5:12 PM, maven apache apachemav...@gmail.com wrote: Hi: I have two documents: title

Re: What is the difference between the AND and + operator?

2010-11-29 Thread Anshum
#setMinimumNumberShouldMatch(int) Finally, all would depend on the case at hand and what you think is the expected behavior of search. Hope this helps. -- Anshum Gupta http://ai-cafe.blogspot.com On Mon, Nov 29, 2010 at 1:31 PM, yang Yang m4ecli...@gmail.com wrote: What is the difference between

Re: asking about index verification tools

2010-11-17 Thread Anshum
the index and the source. -- Anshum Gupta http://ai-cafe.blogspot.com On Wed, Nov 17, 2010 at 1:36 PM, Lance Norskog goks...@gmail.com wrote: The Lucene CheckIndex program does this. It is a class somewhere in Lucene with a main() method. Samarendra Pratap wrote: It is not guaranteed

Re: lucene anchor-distance based search

2010-11-17 Thread Anshum
/lucene-java/SpatialSearch For your understanding, you could have a look at the bounding box approach. -- Anshum Gupta http://ai-cafe.blogspot.com On Thu, Nov 18, 2010 at 7:38 AM, yang Yang m4ecli...@gmail.com wrote: We are using the hibernate search which is based on lucene as the search engine

Re: asking about index verification tools

2010-11-15 Thread Anshum
. This would also give you a fair idea of the index state. -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Nov 16, 2010 at 11:36 AM, Yakob jacob...@opensuse-id.org wrote: hello all, I would like to ask about lucene index. I mean I created a simple program that created lucene indexes and stored

Re: Update lucene index

2010-10-12 Thread Anshum
wanting to do so? is it that you only index data coming from a stream and you don't have access to the original source at a later time? -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Oct 12, 2010 at 11:35 AM, Nilesh Vijaywargiay nilesh.vi...@gmail.com wrote: Hi Group, I understand

Re: Update lucene index

2010-10-12 Thread Anshum
ParallelReader sounds useful in theory, but I am not sure how large the overhead of maintaining and synchronizing the document ids would be. I haven't used it so far; perhaps someone who's used the ParallelReader for such a purpose in a production environment/at scale may help you. -- Anshum Gupta

Re: Indexing is hung or doesn't complete

2010-10-12 Thread Anshum
Version? Machine and JVM (32/64 bit)? This most probably seems like a code level issue rather than lucene, but I may be wrong. -- Anshum Gupta http://ai-cafe.blogspot.com On Wed, Oct 13, 2010 at 8:08 AM, Ching zchin...@gmail.com wrote: Hi All, Can anyone help with this issue? I have about 2000 pdf

Re: How to make a search log

2010-10-12 Thread Anshum
at SOLR, which provides an out of the box engine. -- Anshum Gupta http://ai-cafe.blogspot.com On Wed, Oct 13, 2010 at 8:57 AM, Hyun Joo Noh dbfudrp...@gmail.com wrote: Hi, how would you make Lucene leave a search log of who searched what, when, etc (i.e. cookie, query, timestamp, etc

Re: how to get the first term from index?

2010-09-30 Thread Anshum
this is what you intended! -- Anshum Gupta http://ai-cafe.blogspot.com On Thu, Sep 30, 2010 at 11:54 PM, Sahin Buyrukbilen sahin.buyrukbi...@gmail.com wrote: Hi all, I need to get the first term in my index and iterate it. Can anybody help me? Best.

Re: slow search threads during a disk copy

2010-08-23 Thread Anshum
Seems like a case of I/O issues. You may be reading content off the index while performing searches while the I/O for copy is also happening. -- Anshum Gupta http://ai-cafe.blogspot.com On Mon, Aug 23, 2010 at 1:12 PM, gag...@graffiti.net wrote: Hi all, We're observing search threads

Re: Wanting batch update to avoid high disk usage

2010-08-23 Thread Anshum
(). -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Aug 24, 2010 at 4:38 AM, Justin cry...@yahoo.com wrote: In an attempt to avoid doubling disk usage when adding new fields to all existing documents, I added a call to IndexWriter::expungeDeletes. Then my colleague pointed out that Lucene

Re: Wanting batch update to avoid high disk usage

2010-08-23 Thread Anshum
of reclaiming lost disc space. -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Aug 24, 2010 at 9:22 AM, Justin cry...@yahoo.com wrote: My actual code did not call expungeDeletes every time through the loop; however, calling expungeDeletes or optimize after the loop means that the index has doubled

Re: Sorting a Lucene index

2010-08-18 Thread Anshum
it comfortably. btw, are you facing any issues in sort time or is it a presumption? -- Anshum Gupta http://ai-cafe.blogspot.com On Wed, Aug 18, 2010 at 5:12 PM, Shelly_Singh shelly_si...@infosys.com wrote: Hi, I have a Lucene index that contains a numeric field along with certain other fields

Re: about RAMDirectory based B/S plantform problem

2010-08-17 Thread Anshum
? -- Anshum Gupta http://ai-cafe.blogspot.com 2010/8/17 xiaoyan Zheng hillyzh...@gmail.com the question is like this: when one user is using IndexWirter.addDocument(doc), and another user has already finished adding part and have closed IndexWirter, then, the first user embraces the error ERROR

Re: Scaling Lucene to 1bln docs

2010-08-10 Thread Anshum
reading the source takes time in your case, though, the indexwriter would have to be shared among all threads. -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Aug 10, 2010 at 12:24 PM, Shelly_Singh shelly_si...@infosys.com wrote: Hi, I am developing an application which uses Lucene

Re: Scaling Lucene to 1bln docs

2010-08-10 Thread Anshum
for that period. This would make the data manageable and searchable within reasonable time. -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Aug 10, 2010 at 5:49 PM, Shelly_Singh shelly_si...@infosys.com wrote: No sort. I will need relevance based on TF. If I shard, I will have to search in all indices

Re: Scaling Lucene to 1bln docs

2010-08-10 Thread Anshum
So, you didn't really use the setRamBuffer.. ? Any reasons for that? -- Anshum Gupta http://ai-cafe.blogspot.com On Wed, Aug 11, 2010 at 10:28 AM, Shelly_Singh shelly_si...@infosys.com wrote: My final settings are: 1. 1.5 gig RAM to the jvm out of 2GB available for my desktop 2

Re: Storing The content

2010-05-17 Thread Anshum
Hi Saurabh, I don't think there's a way to do that? Why not use other constructs? -- Anshum Gupta http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw On Mon, May 17, 2010 at 8:04 PM, Saurabh Agarwal srbh.g

Re: Trace only exactly matching terms!

2010-05-07 Thread Anshum
Hi Manjula, Yes lucene by default would only tackle exact term matches unless you use a custom analyzer to expand the index/query. -- Anshum Gupta http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw On Fri

Re: Range Score in Lucene

2010-04-27 Thread Anshum
Hi Clara, Any particular reason why you'd need the score? Perhaps this would be of help http://lucene.apache.org/java/2_9_1/scoring.html http://lucene.apache.org/java/2_3_2/scoring.pdf Hope this explains whatever you were looking for. -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com

Re: VM options for faster lucene search

2010-04-26 Thread Anshum
There are a few things you could do, 1. Run the JVM in server mode [-server] 2. Assign more RAM (in case you're running a 64 bit architecture) (both initial and max limit) -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to me

Re: Indexing and Searching fields that have unique values

2010-04-22 Thread Anshum
Hi Ravi, Adding to what Erick said, you could index the numbers as numeric fields instead of strings. This should improve things for you by a considerable amount. P.S: I'm talking with my knowledge on Java Lucene. -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed

Re: It is possible to change the meaning of a match in lucene

2010-04-22 Thread Anshum
something like a synonym analyzer while conducting search in this case. -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw On Fri, Apr 23, 2010 at 2:39 AM, Wei Yi jasonwe...@gmail.com

Re: Lucene India Users/Developers

2010-03-31 Thread Anshum
Reposting as the first post didn't get many hits! Apologies for all who consider this spam! -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw On Wed, Feb 17, 2010 at 3:35 PM

Re: Optimising the lucene search

2010-03-23 Thread Anshum
the fields at run time. As far as relational nature is concerned, I'd say lucene's model is pretty different from what you're taking it to be. Lucene documents are just a collection of field/value pairs. -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong

Re: how lucene search works in memory

2010-03-23 Thread Anshum
copy in any manner though) -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw On Tue, Mar 23, 2010 at 3:51 PM, suman.hol...@zapak.co.in wrote: Hello, I am trying

Re: What is the best practice of using synonymy ?

2010-03-23 Thread Anshum
Index time is a much better approach. The only negative about it is the index size increase. I've used it for a considerable sized dataset and even the index time doesn't seem to go up considerably. Searching of multiple terms is generally unoptimized when you can do it with 1. -- Anshum Gupta

Re: Prefix And Fuzzy

2010-03-19 Thread Anshum
tokenized/processed prior to getting indexed. The way the processing would happen depends on your analyzer (which here is StopAnalyzer). So point 1. If you analyze a field with value *'My name is anshum'* it would get broken down into tokens, e.g. [my] [name] [is] [anshum] where each term

Re: search on documents which DO NOT have field defined

2010-03-11 Thread Anshum
Hi, How about indexing a dummy token for empty docs? that way you may pick up all docs that are actually null/empty by querying for the dummy token. Make sure that the dummy token is never a part of any actual document (token stream). Perhaps this should work! -- Anshum Gupta Naukri Labs! http

Re: how to use DuplicateFilter to get unique documents based on a fieldName

2010-03-05 Thread Anshum
multiple genres instead of duplicate entries. I'm still not sure if I've gotten the problem correctly, but hope this is of help! -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw

Lucene India Users/Developers

2010-02-17 Thread Anshum
://groups.google.com/group/luceneindia* to join and share! -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw

Re: Limiting search result for web search engine

2010-02-02 Thread Anshum
Hi Mike, Not really through queries, but you may do this by writing a custom collector. You'd need some supporting data structure to mark/hash the occurrence of a domain in your result set. -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody

Re: CANNOT use a * or ? symbol as the first character of a search.

2009-12-28 Thread Anshum
be: Index flipped terms (using an appropriate analyzer) i.e. cat is also indexed as tac. You may then query on ta* instead of *at. Does that solve your issues/concern? -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to me
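
The flipped-term idea in this reply (index `cat` also as `tac`, then turn a leading-wildcard search into a prefix search) might look roughly like the sketch below. It is a plain-Java illustration, not Lucene code: `ReversedTermIndex`, `add`, and `endingWith` are hypothetical names, and a `TreeSet` stands in for the sorted term dictionary of a real index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a term dictionary that stores reversed terms.
final class ReversedTermIndex {
    private final TreeSet<String> reversedTerms = new TreeSet<>();

    /** Stores the term reversed, e.g. "cat" is kept as "tac". */
    void add(String term) {
        reversedTerms.add(reverse(term));
    }

    /** Answers a suffix search (such as *at) as a prefix scan over reversed terms. */
    List<String> endingWith(String suffix) {
        String prefix = reverse(suffix); // "at" becomes the prefix "ta"
        List<String> matches = new ArrayList<>();
        for (String reversed : reversedTerms.tailSet(prefix, true)) {
            if (!reversed.startsWith(prefix)) {
                break; // sorted order: no later term can match
            }
            matches.add(reverse(reversed)); // flip back to the original term
        }
        return matches;
    }

    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }
}
```

The cost of this approach is a second copy of every term in the index, traded for suffix searches that run as ordinary prefix scans.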

Re: Recover special terms from StandardTokenizer

2009-12-11 Thread Anshum
How about getting the original token stream and then converting c++ to cplusplus or any other such transform. Or perhaps you might look at using/extending (in the non java sense) some other tokenizer! -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong

Re: Lucene 3.0.0 writer with a Lucene 2.3.1 index

2009-12-11 Thread Anshum
in the index size should be anticipated and handled. -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw On Fri, Dec 11, 2009 at 10:50 PM, Rob Staveley (Tom) rstave...@seseit.com wrote: I'm

Re: How to include some more fields to be indexed in the file document class?

2009-12-04 Thread Anshum
an indexer from scratch, you'd have to write a java file on the same lines as the demo (modified) and include it. Does that help? -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw

Re: Storing image with Lucene

2009-12-02 Thread Anshum
(in the wrapper code). -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw On Thu, Dec 3, 2009 at 8:02 AM, blazingwolf7 blazingwo...@gmail.com wrote: Hi, As per title

Re: Need help regarding implementation of autosuggest using jquery

2009-11-26 Thread Anshum
Just add a check in the while statement to exit as soon as the pattern of the term changes. You could check if the term does not start with your input and exit from the while loop there. It would exit wherever the term start changes from what you want. -- Anshum Gupta Naukri Labs! http://ai
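
The early-exit check described in this reply can be sketched as below, assuming the terms are visited in sorted order, as a term enumeration would return them. The sketch uses a plain `SortedSet` with natural string ordering in place of the index; `PrefixSuggester`, `suggest`, and the `limit` parameter are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.SortedSet;

// Hypothetical helper: walks sorted terms from the input onward and stops early.
final class PrefixSuggester {
    static List<String> suggest(SortedSet<String> sortedTerms, String input, int limit) {
        List<String> suggestions = new ArrayList<>();
        // Start at the first term that is >= the input, like seeking a term enum.
        Iterator<String> terms = sortedTerms.tailSet(input).iterator();
        while (terms.hasNext() && suggestions.size() < limit) {
            String term = terms.next();
            if (!term.startsWith(input)) {
                break; // the term start has changed, so nothing further can match
            }
            suggestions.add(term);
        }
        return suggestions;
    }
}
```

Because the terms are sorted, the first term that does not start with the input marks the end of all candidates, so the loop never scans the rest of the dictionary.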

Re: Need help regarding implementation of autosuggest using jquery

2009-11-26 Thread Anshum
Try this, Change the code as required: - import java.io.IOException; import org.apache.lucene.index.CorruptIndexException; import org.apache.lucene.index.IndexReader; import org.apache.lucene.index.Term; import org.apache.lucene.index.TermEnum; /** * @author anshum * */ public class

Re: How to find the fields that are indexed?

2009-11-23 Thread Anshum
By autosuggest, would you mean similar documents? In that case you could try the lucene 'morelikethis' class. -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw On Mon, Nov 23

Re: autosuggest - in the sense of autocomplete

2009-11-23 Thread Anshum
For auto complete, you could try the following: 1. Run a prefix query. [Could be a fuzzy query] 2. Index using something like ngrams. term : sample is indexed as 4 terms, viz: t te ter term -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody
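
The second option in this reply, indexing each word as its leading n-grams, can be illustrated with the reply's own example word. This is a minimal plain-Java sketch, not a Lucene token filter; `EdgeNGrams` and `prefixes` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: produces the leading n-grams of a single term.
final class EdgeNGrams {
    /** For "term" this yields t, te, ter, term, in that order. */
    static List<String> prefixes(String term) {
        List<String> grams = new ArrayList<>(term.length());
        for (int end = 1; end <= term.length(); end++) {
            grams.add(term.substring(0, end));
        }
        return grams;
    }
}
```

Indexing these prefixes lets an exact term lookup on the partial input serve as autocomplete, at the price of a larger index.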

Re: RamDirectory and FS at the same moment

2009-11-23 Thread Anshum
Hi Rafal, If what I understand about your implementation is correct, you could try a parallelmultisearcher http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/ParallelMultiSearcher.html -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong

Re: Could one filed include more than one value?

2009-11-10 Thread Anshum
(field, new FileReader(f12)); iw.addDocument(doc); --snip ends-- -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw On Tue, Nov 10, 2009 at 2:50 PM, Wenhao Xu xuwenhao2...@gmail.com
