Thank you.
Glen
On Sat, 6 Aug 2022 at 23:46, Tomoko Uchida
wrote:
> Hi Glen,
> I verified your Jira/GitHub usernames and added a mapping.
>
> https://github.com/apache/lucene-jira-archive/commit/ae78d583b40f5bafa1f8ee09854294732dbf530b
>
> Tomoko
>
>
jira: gnewton
github: gnewton (github.com/gnewton)
Thanks,
Glen
On Sat, 6 Aug 2022 at 14:11, Tomoko Uchida
wrote:
> Hi everyone.
>
> I wanted to let you know that we'll extend the deadline until the date the
> migration is started (the date is not fixed yet).
> Please let us know your
o source is up-to-date though.
>
> Shai
>
> On Thu, Nov 10, 2016 at 4:40 PM Glen Newton <glen.new...@gmail.com> wrote:
I am looking for documentation on Lucene faceting. The most recent
documentation I can find is for 4.0.0 here:
http://lucene.apache.org/core/4_0_0/facet/org/apache/lucene/facet/doc-files/userguide.html
Is there more recent documentation for 6.3.0? Or 6.x?
Thanks,
Glen
> load a single document (or a fixed number of them) for every step. In
> the case you call loadAll() there is a problem with memory.
>
>
>
>
> 2016-08-19 15:39 GMT+02:00, Glen Newton <glen.new...@gmail.com>:
Making docid an int64 is a non-trivial undertaking, and this work needs to
be compared against the use cases and how compelling they are.
That said, in the lifetime of most software projects a decision is made to
break backward compatibility to move the project forward.
When/if moving to int64
Or maybe it is time Lucene re-examined this limit.
There are use cases out there where >2^31 does make sense in a single index
(huge number of tiny docs).
Also, I think the underlying hardware and the JDK have advanced to make
this more defendable.
Constructively,
Glen
On Thu, Aug 18, 2016 at
of your grouped data. This is really limiting if your
relationships are truly many to many.
Hope that helps,
Greg
On Tue, Dec 16, 2014 at 10:46 AM, Glen Newton glen.new...@gmail.com wrote:
Anyone?
On Thu, Dec 11, 2014 at 2:53 PM, Glen Newton glen.new...@gmail.com wrote:
Is there any reason JoinUtil (below) does not have a 'Query toQuery'
available? I was wanting to filter on the 'to' side as well. I feel I
am missing something here.
To make sure this is not an XY problem, here is my use case:
I have a many-to-many relationship. The left, join, and right 'table'
Hi Koji,
Semantic vectors is here: http://code.google.com/p/semanticvectors/
It is a project that has been around for a number of years and used by many
people (including me
http://zzzoot.blogspot.com/2009/07/project-torngat-building-large-scale.html
).
If you could compare and contrast
You should consider making each _line_ of the log file a (Lucene)
document (assuming it is a log-per-line log file)
-Glen
On Fri, Feb 14, 2014 at 4:12 PM, John Cecere john.cec...@oracle.com wrote:
I'm not sure in today's world I would call 2GB 'immense' or 'enormous'. At
any rate, I don't have
Hello,
I know I've seen it go by on this list and elsewhere, but cannot seem
to find it: can someone point me to the best way to do term expansions
at indexing time.
That is, when the sentence is: This foo is in my way
And somewhere I have: foo=bar|yak
Lucene indexes something like:
This
Thanks :-)
On Fri, May 3, 2013 at 2:31 PM, Alan Woodward a...@flax.co.uk wrote:
Hi Glen,
You want the SynonymFilter:
http://lucene.apache.org/core/4_2_0/analyzers-common/org/apache/lucene/analysis/synonym/SynonymFilter.html
Alan Woodward
www.flax.co.uk
On 3 May 2013, at 19:14, Glen
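Independent of Lucene's SynonymFilter, the expansion idea from the question can be sketched in plain Java. This is only an illustration of index-time term expansion, not Lucene's API; the class name and the foo=bar|yak rule format are hypothetical, taken from the example above.

```java
import java.util.*;

public class SynonymExpander {
    private final Map<String, List<String>> rules = new HashMap<>();

    // rule format from the question: foo=bar|yak
    public void addRule(String rule) {
        String[] kv = rule.split("=", 2);
        rules.put(kv[0], Arrays.asList(kv[1].split("\\|")));
    }

    // emits each token plus any synonyms; a real filter would put
    // the synonyms at the same token position rather than in sequence
    public List<String> expand(String sentence) {
        List<String> out = new ArrayList<>();
        for (String tok : sentence.toLowerCase().split("\\s+")) {
            out.add(tok);
            out.addAll(rules.getOrDefault(tok, Collections.emptyList()));
        }
        return out;
    }

    public static void main(String[] args) {
        SynonymExpander se = new SynonymExpander();
        se.addRule("foo=bar|yak");
        System.out.println(se.expand("This foo is in my way"));
    }
}
```

Tokens are lowercased here, as most analyzers would before synonym matching.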
I am in the process of upgrading LuSql from 2.x to 4.x and I am first
going to 3.6 as the jump to 4.x was too big.
I would suggest this to you. I think it is less work.
Of course I am also able to offer LuSql to 3.6 users, so this is
slightly different from your case.
-Glen
On Wed, Jan 9, 2013
Unfortunately, Lucene doesn't properly index
spans (it records the start position but not the end position), so
that limits what kind of matching you can do at search time.
If this could be fixed (i.e. indexing the _end_ of a span) I think all
the things that I want to do, and the things that can
It is not clear this is exactly what is needed/being discussed.
From the issue:
We are also planning a Tokenizer/TokenFilter that can put parts of
speech as either payloads (PartOfSpeechAttribute?) on a token or at
the same position.
This adds it to a token, not a span. 'same position' does not
example of adding an annotation to text.
On 12/13/2012 01:54 PM, Glen Newton wrote:
+10
These are the kind of things you can do in GATE[1] using annotations[2].
A VERY useful feature.
-Glen
[1]http://gate.ac.uk
[2]http://gate.ac.uk/wiki/jape-repository/annotations.html
On Wed, Dec 12, 2012 at 3:02 PM, Wu, Stephen T., Ph.D.
wu.step...@mayo.edu wrote:
Is there any
Yes, very interested.
-- Quick scan: very cool work! +10 :-)
Thanks,
Glen Newton
On Wed, Sep 26, 2012 at 9:59 AM, Carsten Schnober
schno...@ids-mannheim.de wrote:
Hi,
in case someone is interested in an application of the Lucene indexing
engine in the field of corpus linguistics rather
Storing content in large indexes can significantly add to index time.
The model of indexing fields only in Lucene and storing just a key,
and then storing the content in some other container (DBMS, NoSql,
etc) with the key as lookup is almost a necessity for this use case
unless you have a
Do the check _before_ indexing.
Use https://code.google.com/p/language-detection/ to verify the
language of the text document before you put it in the index.
-Glen Newton
http://zzzoot.blogspot.com/
On Mon, Feb 27, 2012 at 10:53 AM, Ilya Zavorin izavo...@caci.com wrote:
Suppose I have a bunch
I'd suggest writing a perl script or
insert-favourite-scripting-language-here script to pre-filter this
content out of the files before it gets to Lucene/Solr
Or you could just grep for 'Data' and 'Description' (or is
'Description' multi-line)?
-Glen Newton
On Mon, Feb 27, 2012 at 11:55 AM, Prakash
what to do in it.
Regards,
Prakash Bande
Director - Hyperworks Enterprise Software
Altair Eng. Inc.
Troy MI
Ph: 248-614-2400 ext 489
Cell: 248-404-0292
-Original Message-
From: Glen Newton [mailto:glen.new...@gmail.com]
Sent: Monday, February 27, 2012 12:05 PM
To: java-user
Caste -- Castle
https://bitbucket.org/acunu
http://support.acunu.com/entries/20216797-castle-build-instructions
It looks very promising.
It is a kernel module and I'm not sure it can run in user space, which
I'd prefer.
-Glen Newton
On Sat, Sep 3, 2011 at 9:21 PM, Otis Gospodnetic
and
http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.genprogc/doc/genprogc/sys_mem_alloc.htm
Finally (or before doing all of this! :-) ), do some profiling, both
inside of Java, and of the AIX native heap using svmon (see Native
Heap Exhaustion, p.135).
-Glen Newton
Could you elaborate what you want to do with the index of large
documents? Do you want to search at the document or sentence level?
This can drive how to index this content.
-Glen
On Fri, Jul 22, 2011 at 10:52 AM, starz10de farag_ah...@yahoo.com wrote:
Hi,
I have one text file that contains
So to use Lucene-speak, each sentence is a document.
I don't know how you are indexing and what code you are using (and
what hardware, etc.), but you if you are not already, should consider
multi-threading the indexing which should give you a significant
indexing performance boost.
-Glen
On
gmail interprets the closing asterisk as part of the URL, for all
three URLs -- 404s
You might want to add a space before the '*'...
-glen
On Thu, Jul 7, 2011 at 2:17 PM, Abhishek Rakshit abhis...@architexa.com wrote:
Hey folks,
We received great feedback on the Lucene Architecture site that
-threaded-query-lucene.html
http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html
Glen Newton
On Tue, Jan 25, 2011 at 11:31 AM, Siraj Haider si...@jobdiva.com wrote:
Hello there,
I was looking for best practices for indexing/searching on a
multi-processor/core machine
Where do you get your Lucene/Solr downloads from?
[x] ASF Mirrors (linked in our release announcements or via the Lucene website)
[] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[] I/we build them from source via an SVN/Git checkout.
-Glen Newton
me.
Thanks,
Glen Newton
http://zzzoot.blogspot.com
-- Old LuSql benchmarks:
http://zzzoot.blogspot.com/2008/11/lucene-231-vs-24-benchmarks-using-lusql.html
On Thu, Dec 16, 2010 at 12:04 PM, Dyer, James james.d...@ingrambook.com wrote:
We have ~50 long-running SQL queries that need to be joined
Does anyone know what technology they are using: http://www.indextank.com/
Is it Lucene under the hood?
Thanks, and apologies for cross-posting.
-Glen
http://zzzoot.blogspot.com
--
-
-
To unsubscribe, e-mail:
the ClueWeb collection
http://trec.nist.gov/pubs/trec18/papers/arsc.WEB.pdf
Expanding Queries Using Multiple Resources
http://staff.science.uva.nl/~mdr/Publications/Files/trec2006-proceedings-genomics.pdf
-Glen Newton
http://zzzoot.blogspot.com/2008/06/simultaneous-threaded-query-lucene.html
http
Hi Luan,
Could you tell us the name and/or URL of this plugin so that the list
might know about it?
Thanks,
Glen
On 10 August 2010 12:21, Luan Cestari luan.cest...@gmail.com wrote:
We would like to say thanks for the replies.
We found a plugin in Nutch (the Creative Commons plugin) that
, in a Solr context.
http://wiki.apache.org/solr/DataImportHandler
Thanks,
-Glen Newton
LuSql author
http://zzzoot.blogspot.com/
On 23 July 2010 15:46, manjula wijewickrema manjul...@gmail.com wrote:
Hi,
Normally, when I am building my index directory for indexed documents, I
used to keep my
There are a number of strategies, on the Java or OS side of things:
- Use huge pages[1]. Esp on 64 bit and lots of ram. For long running,
large memory (and GC busy) applications, this has achieved significant
improvements. Like 300% on EJBs. See [2],[3],[4]. For a great article
introducing and
Pluggable compression allowing for alternatives to gzip for text
compression for storing.
Specifically I am interested in bzip2[1] as implemented in Apache
Commons Compress[2].
While bzip2 compression is considerably slower than gzip (although
decompression is not too much slower than gzip), it
Hello Uwe.
That will teach me for not keeping up with the versions! :-)
So it is up to the application to keep track of what it used for compression.
Understandable.
Thanks!
Glen
On 27 February 2010 10:17, Uwe Schindler u...@thetaphi.de wrote:
Hi Glen,
Pluggable compression allowing for
Documents cannot be re-used in v3.0?
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
-glen
http://zzzoot.blogspot.com/
On 2 February 2010 02:55, Simon Willnauer
simon.willna...@googlemail.com wrote:
Ganesh,
do you reuse your Document instances in any way or do you create new
docs
, say when looking at their index with
Luke. :)
Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
- Original Message
From: Glen Newton glen.new...@gmail.com
To: java-user@lucene.apache.org
Sent: Tue
Could someone send me where the rationale for the removal of
COMPRESSED fields is? I've looked at
http://people.apache.org/~uschindler/staging-area/lucene-3.0.0-rc1/changes/Changes.html#3.0.0.changes_in_runtime_behavior
but it is a little light on the 'why' of this change.
My fault - of course -
https://issues.apache.org/jira/browse/LUCENE-652
https://issues.apache.org/jira/browse/LUCENE-1960
You might try re-implementing, using ThreadPoolExecutor
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html
glen
2009/11/10 Jamie Band ja...@stimulussoft.com:
Hi There
Our app spends a lot of time waiting for Lucene to finish writing to the
index. I'd like to
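A minimal pure-JDK sketch of the ThreadPoolExecutor suggestion (class and method names hypothetical; a counter stands in for the actual writer.addDocument calls). The bounded queue plus CallerRunsPolicy gives back-pressure, so the app doesn't build an unbounded backlog while waiting on the index.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class IndexPool {
    // submit nJobs "indexing" tasks to a fixed pool; returns how many completed
    public static int runJobs(int nJobs, int nThreads) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                nThreads, nThreads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(64),               // bounded queue caps memory use
                new ThreadPoolExecutor.CallerRunsPolicy()); // when full, the submitter runs the task itself
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < nJobs; i++)
            pool.execute(done::incrementAndGet);            // stand-in for writer.addDocument(doc)
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runJobs(1000, 8)); // 1000
    }
}
```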
Disclosure: I am the author of LuSql.
-Glen Newton
http://zzzoot.blogspot.com/
http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/Glen_Newton
2009/10/22 Paul Taylor paul_t...@fastmail.fm:
I'm building a Lucene index from a database, creating about 1 million
documents
This is basically what LuSql does. The speed-ups (8h to 30 min)
are similar: usually about an order of magnitude.
As for the comments suggesting most of the interaction is with the
database: the answer is, it depends.
With large Lucene documents: Lucene is the limiting factor
and/or tests if
you have them.
Cheers,
Anthony
On Mon, Sep 14, 2009 at 1:03 PM, Glen Newton glen.new...@gmail.com wrote:
Hi,
In 2.4.1, Field has 2 constructors that involve a Reader:
public Field(String name,
Reader reader)
public Field(String name,
Reader
I appreciate your explanation, but I think that the use case I
described merits a deeper exploration:
Scenario 1: 16 threads indexing; queue size = 1000; present api; need to store
In this scenario, there are always 1000 Strings with all the contents
of their respective files.
Averaging 50k per
,
Reader reader,
Field.Store store,
Field.Index index,
Field.TermVector termVector)
Constructively,
Glen Newton
http://zzzoot.blogspot.com/
http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql
In this project:
http://zzzoot.blogspot.com/2009/07/project-torngat-building-large-scale.html
I concatenate all the text of all articles of a single journal into
a single text file.
This can create a text file that is 500MB in size.
Lucene is OK in indexing files this size (in parallel even),
@lucene.apache.org] On
Behalf Of Glen Newton
Sent: Friday, September 11, 2009 9:53 AM
To: java-user@lucene.apache.org
Subject: Re: Indexing large files? - No answers yet...
In this project:
http://zzzoot.blogspot.com/2009/07/project-torngat-building-large-scale.html
I concatenate all
You are optimizing before the threads are finished adding to the index.
I think this should work:
IndexWriter writer = new IndexWriter("D:\\index", new StandardAnalyzer(),
true);
File file = new File(args[0]);
Thread t1 = new Thread(new IndexFiles(writer, file));
Thread t2 = new Thread(new
only the
full-text (no metadata).
For more info howto:
http://zzzoot.blogspot.com/2009/07/project-torngat-building-large-scale.html
Glen Newton
--
see you include Lucene v2.3 in your
code...does it work correctly with indexes created on v2.4 as well?
- Greg
On Mon, Apr 13, 2009 at 6:49 PM, Glen Newton glen.new...@gmail.com wrote:
As the creator of LuSql
[http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql]
I would have
Another solution is to have your application on the AppEngine, but the
index is on another machine. Then the application 'proxies' the
requests to the machine that has the index, which is using Solr
[http://lucene.apache.org/solr/] or some other way to expose the
index to the web.
Yes, this
As the creator of LuSql
[http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql]
I would have hoped for a more creative (and more different) name.
:-)
-glen
2009/4/13 jonathan esposito jonathan.e...@gmail.com:
I created a command-line tool in Java that allows the user to execute
Dear Shashi,
It should work now.
A temporary failure: our apologies.
thanks,
Glen
2009/4/2 Shashi Kant sk...@sloan.mit.edu:
Hi all, I have been trying to get the latest version of LuSQL from the
NRC.ca website but get 404s on the download links. I have written to the
webmaster, but anyone
You might try looking in a list that talks about recommender systems.
Google hits:
- http://en.wikipedia.org/wiki/Recommendation_system
- ACM Recommender Systems 2009 http://recsys.acm.org/
- A Guide to Recommender Systems
http://www.readwriteweb.com/archives/recommender_systems.php
2009/3/17
I would suggest you try LuSql, which was designed specifically to
index relational databases into Lucene.
It has an extensive user manual/tutorial which has some complex
examples involving multi-joins and sub-queries.
I am the author of LuSql.
LuSql home page:
InfoSystems (P) Ltd
http://www.mapmyindia.com
Glen Newton wrote:
Could you give some configuration details:
- Solaris version
- Java VM version, heap size, and any other flags
- disk setup
You should also consider using huge pages (see
http://zzzoot.blogspot.com/2009/02/java-mysql-increased-performance-with.html)
I will also be posting performance gains using
V1 of a project of mine, Ungava[1], which uses Lucene to index
research articles and library catalog metadata, also uses Project
Simile's Metaphor and Timeline. I have some simple examples using
them:
Here is the search for cell in articles:
Congrats and good luck on this new endeavour!
-Glen :-)
2009/1/26 Grant Ingersoll gsing...@apache.org:
Hi Lucene and Solr users,
As some of you may know, Yonik, Erik, Sami, Mark and I teamed up with
Marc Krellenstein to create a company to provide commercial
support (with SLAs), training,
There is a discussion here:
http://www.terracotta.org/web/display/orgsite/Lucene+Integration
Also of interest: Katta - distribute lucene indexes in a grid
http://katta.wiki.sourceforge.net/
-glen
http://zzzoot.blogspot.com/2008/11/lucene-231-vs-24-benchmarks-using-lusql.html
I'm not sure if it's a better idea to use something like Solr or start from
scratch and customize the application as I move forward. What do you think
LuSql might be appropriate for your needs:
LuSql is a high-performance, simple tool for indexing data held in a
DBMS into a Lucene index. It can
- Fast Similarity Search in Large Dictionaries. http://fastss.csg.uzh.ch/
- Paper: Fast Similarity Search in Large Dictionaries.
http://fastss.csg.uzh.ch/ifi-2007.02.pdf
- FastSimilarSearch.java http://fastss.csg.uzh.ch/FastSimilarSearch.java
- Paper: Fast Similarity Search in Peer-to-Peer
From what I understand:
faceted browsing is a taxonomy of depth = 1
A taxonomy in general has an arbitrary depth:
Example: Biological taxonomy:
Kingdom Animalia
  Phylum Acanthocephala
    Class Archiacanthocephala
  Phylum Annelida
Kingdom Fungi
  Phylum Ascomycota
    Class Ascomycetes
I don't think this is an Open Source project: I couldn't find any
source on the site and the only download is a jar with .class files...
-glen
2008/12/10 John Wang [EMAIL PROTECTED]:
www.browseengine.com
-John
On Wed, Dec 10, 2008 at 10:55 AM, Glen Newton [EMAIL PROTECTED] wrote:
From what
Oops. Thanks! :-)
2008/12/10 Gary Moore [EMAIL PROTECTED]:
svn co https://bobo-browse.svn.sourceforge.net/svnroot/bobo-browse/trunk
bobo-browse
-Gary
want concurrent writes.
-John
On Thu, Dec 4, 2008 at 2:44 PM, Glen Newton [EMAIL PROTECTED] wrote:
Am I missing something here?
Why not use:
IndexWriter writer = new IndexWriter(NIOFSDirectory.getDirectory(new
File(filename)), analyzer, true);
Another question: is NIOFSDirectory
Sorry... what version are we talking about? :-)
thanks,
Glen
2008/12/4 Yonik Seeley [EMAIL PROTECTED]:
On Thu, Dec 4, 2008 at 4:11 PM, John Wang [EMAIL PROTECTED] wrote:
Hi guys:
We did some profiling and benchmarking:
The thread contention on FSDIrectory is gone, and for the set of
at 4:32 PM, Glen Newton [EMAIL PROTECTED]
wrote:
Sorry... what version are we talking about? :-)
The current development version of Lucene allows you to directly
instantiate FSDirectory subclasses.
-Yonik
Hi Magnus,
Could you post the OS, version, RAM size, swapsize, Java VM version,
hardware, #cores, VM command line parameters, etc? This can be very
relevant.
Have you tried other garbage collectors and/or tuning as described in
http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html?
Let's say I have 8 indexes on a 4 core system and I want to merge them
(inside a single vm instance).
Is it better to do a single merge of all 8, or to in parallel threads
merge in pairs, until there is only a single index left? I guess the
question involves how multi-threaded merging is and if it
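Independent of Lucene's merge internals, the pairwise strategy being asked about can be sketched with sorted lists standing in for indexes: merge in pairs, round by round, until one remains (8 -> 4 -> 2 -> 1). All names here are hypothetical; this only illustrates the shape of the strategy, not Lucene's actual merge cost model.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class PairwiseMerge {
    // standard two-way merge of sorted lists; a stand-in for merging two indexes
    static List<Integer> merge(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size())
            out.add(a.get(i) <= b.get(j) ? a.get(i++) : b.get(j++));
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    // repeatedly merge in pairs until a single result remains; each round's
    // merges are independent, so they could run on separate threads/cores
    static List<Integer> mergeAll(List<List<Integer>> parts) {
        Deque<List<Integer>> work = new ArrayDeque<>(parts);
        while (work.size() > 1)
            work.addLast(merge(work.pollFirst(), work.pollFirst()));
        return work.pollFirst();
    }
}
```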
I have some simple indexing benchmarks comparing Lucene 2.3.1 with 2.4:
http://zzzoot.blogspot.com/2008/11/lucene-231-vs-24-benchmarks-using-lusql.html
In the next couple of days I will be running benchmarks comparing
Solr's DataImportHandler/JdbcDataSource indexing performance with
LuSql and
an
86GB Lucene index in ~13 hours.
http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql
Glen Newton
--
I have a use case where I want all of my documents to have - in
addition to their other fields - a single field=value.
An example use is where I have multiple Lucene indexes that I search
in parallel, but still need to distinguish them.
Index 1: All documents have: source=a1
Index 2: All
Thanks! :-)
2008/11/6 Michael McCandless [EMAIL PROTECTED]:
The field never changes across all docs? If so, this will work fine.
Mike
Hello,
I am using Lucene 2.3.1.
I have concurrent threads adding Fields to the same Document, but
getting some odd behaviour.
Before going into too much depth, is Document thread-safe?
thanks,
Glen
http://zzzoot.blogspot.com/
--
Yes, the problem goes away when I do the following:
synchronized (doc) {
    doc.add(field);
}
Thanks.
[I'll use a Lock to do this properly]
-glen
2008/10/31 Yonik Seeley [EMAIL PROTECTED]:
On Fri, Oct 31, 2008 at 11:53 AM, Glen Newton [EMAIL PROTECTED] wrote:
I have concurrent threads adding
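The fix above can be sketched in pure JDK terms (no Lucene; all names hypothetical): a list standing in for the Document's field list, with synchronized methods so concurrent adders don't corrupt it or lose adds.

```java
import java.util.ArrayList;
import java.util.List;

public class SyncDoc {
    // stands in for a Lucene Document, whose field list was not thread-safe
    private final List<String> fields = new ArrayList<>();

    public synchronized void add(String field) { fields.add(field); }
    public synchronized int size() { return fields.size(); }

    // nThreads threads each add perThread fields; returns the final count
    public static int run(int nThreads, int perThread) throws InterruptedException {
        SyncDoc doc = new SyncDoc();
        Thread[] ts = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            ts[i] = new Thread(() -> { for (int j = 0; j < perThread; j++) doc.add("f"); });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        return doc.size(); // without synchronization, adds could be lost or throw
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(8, 1000)); // 8000
    }
}
```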
You might want to look at my indexing of 6.4 million PDF articles,
full-text and metadata. It resulted in an 83GB index taking 20.5 hours
to run. It uses multiple writers and is massively multithreaded.
More info here:
http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html
2008/10/23 Mark Miller [EMAIL PROTECTED]:
It sounds like you might have some thread synchronization issues outside of
Lucene. To simplify things a bit, you might try just using one IndexWriter.
If I remember right, the IndexWriter is now pretty efficient, and there
isn't much need to index to
Sorry, could you explain what you mean by a link map over lucene results?
thanks,
-glen
2008/10/16 Darren Govoni [EMAIL PROTECTED]:
Hi,
Has anyone created a link map over lucene results or know of a link
describing the process? If not, I would like to build one to contribute.
Also, I read
vectors in Lucene, but
I've never used TFV's before. And can then be limited to just a set of
results.
HTH,
Darren
See also:
http://zzzoot.blogspot.com/2007/10/drill-clouds-for-search-refinement-id.html
and
http://zzzoot.blogspot.com/2007/10/tag-cloud-inspired-html-select-lists.html
-glen
2008/10/16 Glen Newton [EMAIL PROTECTED]:
Yes, tag clouds.
I've implemented them using Lucene here for NRC Research
IndexWriter is thread-safe and has been for a while
(http://www.mail-archive.com/[EMAIL PROTECTED]/msg00157.html)
so you don't have to worry about that.
As reported in my blog in April
(http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html)
but perhaps not explicitly
I think it is not a good idea to use Lucene as storage; it is just an index.
I strongly disagree with this position.
To qualify my disagreement: yes, you should not use Lucene as your
primary storage for your data in your organization.
But, for a particular application, taking content from your
There are a number of ways to do this. Here is one:
Lose the parentid field (unless you have other reasons to keep it).
Add a field fullName, and a field called depth :
doc1
fullName: state
depth: 0
doc2
fullName: state/department
depth:1
doc3
fullName: state/department/Boston
depth: 2
doc4
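A sketch of deriving those fullName/depth pairs from a path string (plain Java; the class and method names are hypothetical). Each ancestor prefix of the path becomes one fullName value with its depth, which is what lets you filter search to a level of the hierarchy.

```java
import java.util.ArrayList;
import java.util.List;

public class PathFields {
    // expands a slash path into (fullName, depth) pairs, one per ancestor,
    // matching the field scheme above: state -> state/department -> ...
    public static List<String[]> expand(String path) {
        List<String[]> out = new ArrayList<>();
        String[] parts = path.split("/");
        StringBuilder prefix = new StringBuilder();
        for (int depth = 0; depth < parts.length; depth++) {
            if (depth > 0) prefix.append('/');
            prefix.append(parts[depth]);
            out.add(new String[]{prefix.toString(), Integer.toString(depth)});
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] p : expand("state/department/Boston"))
            System.out.println("fullName: " + p[0] + "  depth: " + p[1]);
    }
}
```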
A subset of your questions are answered (or at least examined) in my
postings on multi-thread queries on a multiple-core single system:
http://zzzoot.blogspot.com/2008/06/simultaneous-threaded-query-lucene.html
http://zzzoot.blogspot.com/2008/06/lucene-concurrent-search-performance.html
-Glen
Use Carrot2:
http://project.carrot2.org/
For Lucene + Carrot2:
http://project.carrot2.org/faq.html#lucene-integration
-glen
2008/7/7 Ariel [EMAIL PROTECTED]:
Hi everybody:
Do you have any idea how to do document clustering and topic
classification using Lucene? Is there
Lutan,
Yes, no problem. I am away at a conference next week but plan to
release the code the following week. Is this OK for you?
thanks,
Glen
2008/6/13 lutan [EMAIL PROTECTED]:
TO: Glen Newton: Could I get your test code or code architecture for study?
I have tried using
work and
I will clean things up a bit, write a little documentation.
-Glen
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Glen Newton [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Tuesday, June 10, 2008 12:51:41 AM
Subject
/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got
2.6 Million Euro funding!
On Mon, Jun 9, 2008 at 3:51 PM, Glen Newton [EMAIL PROTECTED] wrote:
A number of people have asked about query benchmarks.
I have
? :-)
Is this something that is already being talked about / looked
into / being implemented? :-)
thanks,
Glen Newton
http://zzzoot.blogspot.com/
indexing and querying out-of-the-box.
Best
Erick
On Thu, Jun 5, 2008 at 12:14 PM, Glen Newton [EMAIL PROTECTED] wrote:
I would like to be able to get multi-language support within a single
index.
I would appreciate input on what I am suggesting:
Assuming that you want something like
You should consider keeping the PageRank (and any other more dynamic
data) in a separate index (with the documents in the same order as your
bigger, more static index) and then use a ParallelReader on both of
them. See:
2008/5/22 Otis Gospodnetic [EMAIL PROTECTED]:
Some quick feedback. Those are all very expensive queries (wildcards and
ranges). The first thing I'd do is try without Hibernate Search (to make
sure HS is not the bottleneck). 100 threads is a lot, I'm guessing you are
reusing your
Vaijanath,
I think I would do things in a different fashion:
Lucene's default distance metric is based on tf-idf and the cosine
model, i.e. the frequencies of items. I believe the values that you
are adding as Fields are the values in n-space for each of these
image-based attributes. I don't
I have created Indexes with 1.5 billion documents.
It was experimental: I took an index with 25 million documents, and
merged it with itself many times. While not definitive as there were
only 25m unique documents that were duplicated, it did prove that
Lucene should be able to handle this number