I don't think there's an easy way to jump straight from term + freq
per doc to a Lucene index.
Mike
On Tue, Apr 21, 2009 at 7:14 AM, Thomas Pönitz
wrote:
Hi,
I have the same problem as discussed here:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200511.mbox/%3c200511021310.18686...@last.fm%3e
I want to specify term vectors directly instead of constructing a dummy
string like "a a a b b c" that will be transformed to a[3] b[2] c[1].
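A minimal sketch of the workaround described above, in plain Python rather than Lucene's Java API: a term-to-frequency table is expanded into the dummy string, and splitting on whitespace recovers the same counts. The function name `expand_term_freqs` is hypothetical and only illustrates the idea.

```python
from collections import Counter

def expand_term_freqs(term_freqs):
    """Repeat each term `freq` times, separated by single spaces."""
    tokens = []
    for term, freq in term_freqs.items():
        tokens.extend([term] * freq)
    return " ".join(tokens)

# Build the dummy string from a known term-frequency table.
dummy = expand_term_freqs({"a": 3, "b": 2, "c": 1})

# A whitespace tokenizer followed by counting gets the table back.
recovered = Counter(dummy.split())
```

The round trip is lossless for frequencies, but the intermediate string grows linearly with the total number of occurrences, which is the overhead the question is trying to avoid.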
> Ah, so the fact that "1" actually appears many times in the string you
> give Lucene is important. Neat application!
>
> Sounds like the custom Analyzer (really a custom TokenStream) approach
> suggested by others may be the way for you to go. If the information
> you get from the MySQL profile
Richard Jones wrote:
> If you're willing to continue subsetting / summarizing the data out into
> Lucene, how about subsetting it out into a dedicated MySQL instance for
> this purpose? 100 artists * 1M profiles * 2 ints * 4 bytes/int =
> roughly 1 GB of data, which would easily fit into RAM. Queries should
> be pret
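The size estimate quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the two ints per entry are an artist id and a play count, and it counts raw data only, ignoring any row or index overhead in the database.

```python
artists_per_profile = 100
profiles = 1_000_000
ints_per_entry = 2   # assumption: artist id + play count
bytes_per_int = 4

# Raw payload size, without storage-engine or index overhead.
total_bytes = artists_per_profile * profiles * ints_per_entry * bytes_per_int
total_gib = total_bytes / 2**30
```

The product is 800,000,000 bytes, a little under 1 GB, which is consistent with the "roughly 1 GB" figure.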
Richard Jones wrote:
The data I'm dealing with is stored across a few MySQL DBs on different
machines, horizontally partitioned so each user is assigned to a single DB.
The queries I'm doing can be run in SQL in parallel over all machines and then
combined, which I've tested - it's unacceptably slo
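The "query every partition in parallel, then combine" pattern mentioned above might look roughly like the following. This is a sketch under stated assumptions: the in-memory `SHARDS` list is a hypothetical stand-in for the separate MySQL machines, and `query_shard` and `top_fans` are illustrative names, not anything from the original setup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the per-machine databases: each shard maps
# user -> {artist id: play count}, and every user lives on exactly one shard.
SHARDS = [
    {"alice": {1: 10, 2: 5}, "bob": {1: 3}},
    {"carol": {1: 7, 3: 2}},
]

def query_shard(shard, artist_id):
    """Return {user: plays} for one artist on a single shard."""
    return {user: plays[artist_id]
            for user, plays in shard.items() if artist_id in plays}

def top_fans(artist_id, limit=10):
    """Query all shards in parallel, then merge and rank the partial results."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda s: query_shard(s, artist_id), SHARDS)
        merged = Counter()
        for partial in partials:
            merged.update(partial)
    return merged.most_common(limit)
```

Because users are partitioned across shards, the merge step never has to reconcile the same user from two machines; it only has to rank the combined rows.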
Not sure if this is feasible, but is there some way you could use a
"fake" analyzer that you constructed using your hashtable/term vector and
then have it output the tokens directly from the hashtable via the
TokenStream? Maybe you would have to pass in an empty/dummy string to
the field constru
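The idea of emitting tokens straight from the hashtable can be illustrated without any Lucene code. The sketch below uses a plain Python generator as a stand-in for a token stream; it is not Lucene's TokenStream API, and the name `tokens_from_table` is hypothetical.

```python
def tokens_from_table(term_freqs):
    """Yield each term `freq` times, reading directly from the table.

    No dummy string is built; tokens are produced lazily, one at a time.
    """
    for term, freq in term_freqs.items():
        for _ in range(freq):
            yield term

# Consuming the stream produces the same tokens the dummy string would.
tokens = list(tokens_from_table({"a": 3, "b": 2, "c": 1}))
```

A real implementation would have to follow the token stream contract of whichever Lucene version is in use.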
Hi Erik
Our Lucene-powered music search went live this week, so your search should
work now: http://www.last.fm/explore/search.php?q=Michael+Hedges
Before we discovered Lucene our search sucked *really* badly ;)
Adding multiple fields like this is similar to what I'm doing now (I am using
whites
> I can think of a few ways. If elegance is your goal, then a little
> relational database theory might help. Specifically, instead of having
> one record per listener, have one record per listener-artist
> combination, with three fields: listenerid, artistid, and count. Your
> example above wo
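The relational layout suggested above (one record per listener-artist combination with listenerid, artistid, and count) can be tried out with SQLite from the Python standard library. This is a sketch only: the table name `listens`, the sample rows, and the `top_fans` helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE listens (
        listenerid INTEGER,
        artistid   INTEGER,
        "count"    INTEGER,
        PRIMARY KEY (listenerid, artistid)
    )
    """
)
# Hypothetical sample rows: (listenerid, artistid, count).
conn.executemany(
    "INSERT INTO listens VALUES (?, ?, ?)",
    [(100, 1, 10), (100, 2, 5), (100, 3, 2), (200, 1, 4), (300, 1, 25)],
)

def top_fans(artistid, limit=10):
    """Return (listenerid, count) rows for one artist, heaviest listeners first."""
    rows = conn.execute(
        'SELECT listenerid, "count" FROM listens'
        ' WHERE artistid = ? ORDER BY "count" DESC LIMIT ?',
        (artistid, limit),
    )
    return rows.fetchall()
```

With this layout a "top fans" lookup is a single filtered, ordered query; an index on artistid would be the obvious next step for larger data.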
On 2 Nov 2005, at 08:10, Richard Jones wrote:
If I've listened to Radiohead (id 1) 10 times, Coldplay (id 2) 5 times and
Beck (id 3) 2 times, the field would look like this: "1 1 1 1 1 1 1 1 1 1 2 2
2 2 2 3 3"
I use this index for quickly finding "top fans" of an artist or combination of
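To make the encoding concrete, the sketch below rebuilds the example field and counts the artist ids back out of it. It is plain Python, not Lucene; `plays_from_field` is a hypothetical helper name.

```python
from collections import Counter

def plays_from_field(field):
    """Count how often each artist id occurs in a whitespace-separated field."""
    return Counter(int(tok) for tok in field.split())

# The field from the example: Radiohead (1) x10, Coldplay (2) x5, Beck (3) x2.
field = " ".join(["1"] * 10 + ["2"] * 5 + ["3"] * 2)
plays = plays_from_field(field)
```

The repeated ids are what give each artist its term frequency in the user's document.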
Richard Jones wrote:
Hi,
I'm using Lucene (which rocks, btw ;) behind the scenes at www.last.fm for
various things, and I've run into a situation that seems somewhat inelegant
regarding populating fields which I already know the term vector for.
I'm creating a document for each user (last.fm tracks music taste for pe