Re: Distributed Fulltext?

2002-02-16 Thread David Axmark
On Fri, 2002-02-15 at 02:44, Alex Aulbach wrote: Wednesday, from David Axmark: Your other point about exact vs. approximate answers is unclear, I expect that Google's answers are exact for their currently available indexes at any given time. But even if they are approximate, I'd be

Re: Distributed Fulltext?

2002-02-14 Thread Alex Aulbach
Wednesday, from Mike Wexler: I don't think that would be appropriate. My example is our site (tias.com), which has lots of antiques and collectibles. One popular category is jewelry. If somebody does a search for gold jewelry and the search engine interprets this as anything that mentions gold or

Re: Distributed Fulltext?

2002-02-14 Thread Alex Aulbach
Wednesday, from David Axmark: Your other point about exact vs. approximate answers is unclear, I expect that Google's answers are exact for their currently available indexes at any given time. But even if they are approximate, I'd be happy with that too. The scoring on a FULLTEXT search

Re: Distributed Fulltext?

2002-02-13 Thread James Montebello
I did this at a previous job, and we split the data up more or less this way (we used a pre-existing item number for the split which was essentially random in relation to the text data), with an aggregator that did the query X ways, each to a separate box holding 1/X of the data. The results from

Re: Distributed Fulltext?

2002-02-13 Thread Brian Bray
It seems to me like the best solution that could be implemented as-is would be to keep a random int column in your table (with a range of say 1-100) and then have fulltext server 1 pseudo-replicate records with the random number in the range of 1-10, server 2 11-20 and server 3 21-30 and so

Re: Distributed Fulltext?

2002-02-13 Thread Tod Harter
On Thursday 07 February 2002 14:53, Brian DeFeyter wrote: Has anyone made a suggestion or thought about ways to distribute databases which focus on fulltext indexes? fulltext indexes do a good job of indexing a moderate amount of data, but when you get a lot of data to be indexed, the

Re: Distributed Fulltext?

2002-02-13 Thread Mike Wexler
Steve Rapaport wrote: On Friday 08 February 2002 06:14 pm, James Montebello wrote: Distribution is how Google gets its speed. You say clustering won't solve the problem, but distributing the indices across many processors *is* going to gain you a huge speed increase through sheer

Re: Distributed Fulltext?

2002-02-13 Thread alec . cawley
Why is it that Altavista can index terabytes overnight and return a fulltext boolean for the WHOLE WEB within a second, and Mysql takes so long? I don't know about Altavista, but if you read up on Google, they do indeed do some sort of spreading of keywords across multiple machines - last I

Re: Distributed Fulltext?

2002-02-13 Thread Steve Rapaport
Ooops, factual error: If, say, Google, can search 2 trillion web pages, averaging say 70k bytes each, in 1 second, and Mysql can search 22 million records, with an index on 40 bytes each, in 3 seconds (my experience) on a good day, what's the order of magnitude difference? Roughly 10^9.

Re: Distributed Fulltext?

2002-02-13 Thread Steve Rapaport
I said: Why is it that Altavista can index terabytes overnight and return a fulltext boolean for the WHOLE WEB within a second, and Mysql takes so long? On Friday 08 February 2002 08:56, Vincent Stoessel wrote: Apples and oranges. Yeah, I know. But let's see if we can make some

Re: Distributed Fulltext?

2002-02-13 Thread David Axmark
On Tue, 2002-02-12 at 15:38, Steve Rapaport wrote: David Axmark writes: So the standard answer with Apples and Oranges certainly applies here! More like Äpplen och Apelsiner, that is, different but similar. You Swedish guys should know. Thanks for answering, David, I appreciate the

Re: Distributed Fulltext?

2002-02-13 Thread Steven Roussey
[comparisons to Google...] While any speed up with a full table fulltext search would be helpful and useful, there are instances where the search is intersected with another column and the problem of search is therefore more complex but also leads to potential optimizations. In our case we

Re: Distributed Fulltext?

2002-02-13 Thread Mike Wexler
My understanding is that part of how Google and Altavista get such high speeds is to keep everything in memory. Is it possible to create a HEAP table with a full text index? If so, does the full text index take advantage of being in memory? For example, I would imagine that if you were keeping

Re: Distributed Fulltext?

2002-02-13 Thread Mike Wexler
Steve Rapaport wrote: Someone correctly pointed out today that it's not Mysql's job to be Google, and I agree. But it seems to me that it would be fair for mysql to be able to handle searches in under 1 second for databases 1 millionth the size of Google. All I want here is a decent

Re: Distributed Fulltext?

2002-02-13 Thread Brian DeFeyter
I sorta like that idea. I don't know exactly what you can and can't do as far as indexing inside of HEAP tables.. but the index size would likely differ from the written index. Then you can expand the idea and use the X/(num slices) on (num slices) boxes technique.. sending the query to each, and

Re: Distributed Fulltext?

2002-02-13 Thread Mike Wexler
Brian DeFeyter wrote: I sorta like that idea. I don't know exactly what you can and can't do as far as indexing inside of HEAP tables.. but the index size would likely differ from the written index. Then you can expand the idea and use the X/(num slices) on (num slices) boxes technique..

Re: Distributed Fulltext?

2002-02-13 Thread Brian DeFeyter
On Wed, 2002-02-13 at 16:39, Mike Wexler wrote: Brian DeFeyter wrote: I sorta like that idea. I don't know exactly what you can and can't do as far as indexing inside of HEAP tables.. but the index size would likely differ from the written index. Then you can expand the idea and use

Re: Distributed Fulltext?

2002-02-13 Thread Mike Wexler
Brian DeFeyter wrote: On Wed, 2002-02-13 at 16:39, Mike Wexler wrote: Brian DeFeyter wrote: I sorta like that idea. I don't know exactly what you can and can't do as far as indexing inside of HEAP tables.. but the index size would likely differ from the written index. Then you can expand

Re: Distributed Fulltext?

2002-02-13 Thread hooker
Steve Rapaport wrote: Someone correctly pointed out today that it's not Mysql's job to be Google, and I agree. But it seems to me that it would be fair for mysql to be able to handle searches in under 1 second for databases 1 millionth the size of Google. All I want here is a decent

Re: Distributed Fulltext?

2002-02-13 Thread hooker
While any speed up with a full table fulltext search would be helpful and useful, there are instances where the search is intersected with another column and the problem of search is therefore more complex but also leads to potential optimizations. In our case we rarely do searches that

Re: Distributed Fulltext?

2002-02-13 Thread Steven Roussey
4. If you do #2 and #3 you'll notice that you can have x (10 for us) number of servers partition the FTS. We don't actually do this, but we could and therefore get 'Distributed Fulltext' -- the title of this thread!!! Numbers 1 and 3 should work for everyone, I think. Only if your app can partition

Re: Distributed Fulltext?

2002-02-12 Thread Brian DeFeyter
On Friday 08 February 2002 08:56, Vincent Stoessel wrote: Apples and oranges. Yeah, I know. But let's see if we can make some distinctions. If, say, Google, can search 2 trillion web pages, averaging say 70k bytes each, in 1 second, and Mysql can search 22 million records, with an

Re: Distributed Fulltext?

2002-02-12 Thread James Montebello
For the slice servers, you simply assume that if one is lost, you lose X% of the data until it is revived, which is usually not even noticeable by the end user. For the aggregators, we had four behind a load-balancer. In practice, we had nearly zero downtime over a roughly 18 month period.

Re: Distributed Fulltext?

2002-02-12 Thread George M. Ellenburg
Last week on Slashdot there was an article where the CEO of Google mentioned he uses DRAM (solid state disk arrays) rather than hard drives for the indexes and arrays because of the magnitude of difference in speed they provide. There's your 10^6 difference in speed (or part of it). G.

Re: Distributed Fulltext?

2002-02-12 Thread David Axmark
On Fri, 2002-02-08 at 11:11, Steve Rapaport wrote: I said: Why is it that Altavista can index terabytes overnight and return a fulltext boolean for the WHOLE WEB within a second, and Mysql takes so long? On Friday 08 February 2002 08:56, Vincent Stoessel wrote: Apples and

Re: Distributed Fulltext?

2002-02-12 Thread Steve Rapaport
David Axmark writes: So the standard answer with Apples and Oranges certainly applies here! More like Äpplen och Apelsiner, that is, different but similar. You Swedish guys should know. Thanks for answering, David, I appreciate the attention from a founder. I also appreciate your point that

Re: Distributed Fulltext?

2002-02-11 Thread Steve Rapaport
I second the question. It could also reduce the size of the fulltext index and the time taken to update it. -steve On Thursday 07 February 2002 20:53, Brian wrote: Has anyone made a suggestion or thought about ways to distribute databases which focus on fulltext indexes? fulltext

Re: Distributed Fulltext?

2002-02-11 Thread Brian DeFeyter
On Thu, 2002-02-07 at 15:40, Tod Harter wrote: [snip] Wouldn't be too tough to write a little query routing system if you are using perl. Use DBD::Proxy on the web server side, and just hack the perl proxy server so it routes the query to several places and returns a single result set.

Re: Distributed Fulltext?

2002-02-11 Thread Steve Rapaport
Also, I have to ask the question: Why is it that Altavista can index terabytes overnight and return a fulltext boolean for the WHOLE WEB within a second, and Mysql takes so long? On Friday 08 February 2002 11:50, Steve Rapaport wrote: I second the question. It could also reduce the size

Re: Distributed Fulltext?

2002-02-10 Thread Steve Rapaport
On Friday 08 February 2002 06:14 pm, James Montebello wrote: Distribution is how Google gets its speed. You say clustering won't solve the problem, but distributing the indices across many processors *is* going to gain you a huge speed increase through sheer parallelism. True, but not

Re: Distributed Fulltext?

2002-02-10 Thread George M. Ellenburg
Last week on Slashdot there was an article where the CEO of Google mentioned he uses DRAM (solid state disk arrays) rather than hard drives for the indexes and arrays because of the magnitude of difference in speed they provide. There's your 10^6 difference in speed (or part of it). G.

Re: Distributed Fulltext?

2002-02-08 Thread alec . cawley
Why is it that Altavista can index terabytes overnight and return a fulltext boolean for the WHOLE WEB within a second, and Mysql takes so long? I don't know about Altavista, but if you read up on Google, they do indeed do some sort of spreading of keywords across multiple machines - last I

Re: Distributed Fulltext?

2002-02-08 Thread Alex Aulbach
Yesterday, from Brian DeFeyter: Has anyone made a suggestion or thought about ways to distribute databases which focus on fulltext indexes? fulltext indexes do a good job of indexing a moderate amount of data, but when you get a lot of data to be indexed, the queries slow down

Re: Distributed Fulltext?

2002-02-08 Thread Steve Rapaport
Ooops, factual error: If, say, Google, can search 2 trillion web pages, averaging say 70k bytes each, in 1 second, and Mysql can search 22 million records, with an index on 40 bytes each, in 3 seconds (my experience) on a good day, what's the order of magnitude difference? Roughly 10^9.

Re: Distributed Fulltext?

2002-02-08 Thread James Montebello
For the slice servers, you simply assume that if one is lost, you lose X% of the data until it is revived, which is usually not even noticeable by the end user. For the aggregators, we had four behind a load-balancer. In practice, we had nearly zero downtime over a roughly 18 month period.

Distributed Fulltext?

2002-02-07 Thread Brian DeFeyter
Has anyone made a suggestion or thought about ways to distribute databases which focus on fulltext indexes? fulltext indexes do a good job of indexing a moderate amount of data, but when you get a lot of data to be indexed, the queries slow down significantly. I have an example table, with

Re: Distributed Fulltext?

2002-02-07 Thread Tod Harter
On Thursday 07 February 2002 14:53, Brian DeFeyter wrote: Has anyone made a suggestion or thought about ways to distribute databases which focus on fulltext indexes? fulltext indexes do a good job of indexing a moderate amount of data, but when you get a lot of data to be indexed, the

Re: Distributed Fulltext?

2002-02-07 Thread Brian DeFeyter
On Thu, 2002-02-07 at 15:40, Tod Harter wrote: [snip] Wouldn't be too tough to write a little query routing system if you are using perl. Use DBD::Proxy on the web server side, and just hack the perl proxy server so it routes the query to several places and returns a single result set.

Re: Distributed Fulltext?

2002-02-07 Thread Steve Rapaport
I second the question. It could also reduce the size of the fulltext index and the time taken to update it. -steve On Thursday 07 February 2002 20:53, Brian wrote: Has anyone made a suggestion or thought about ways to distribute databases which focus on fulltext indexes? fulltext

Re: Distributed Fulltext?

2002-02-07 Thread Brian Bray
It seems to me like the best solution that could be implemented as-is would be to keep a random int column in your table (with a range of say 1-100) and then have fulltext server 1 pseudo-replicate records with the random number in the range of 1-10, server 2 11-20 and server 3 21-30 and so
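
A rough sketch of the bucket-to-server mapping described in the message above, assuming 100 buckets split evenly across 10 fulltext servers. The function names (assign_bucket, server_for_bucket, bucket_range) are made up for illustration and are not part of MySQL or of the original post:

```python
import random

NUM_BUCKETS = 100         # range of the random int column (1-100), as in the post
BUCKETS_PER_SERVER = 10   # assumption: server 1 gets 1-10, server 2 gets 11-20, ...


def assign_bucket(rng=random):
    """Pick the random int stored alongside a new record."""
    return rng.randint(1, NUM_BUCKETS)


def server_for_bucket(bucket):
    """Return the 1-based number of the fulltext server that holds this bucket."""
    if not 1 <= bucket <= NUM_BUCKETS:
        raise ValueError("bucket out of range: %r" % (bucket,))
    return (bucket - 1) // BUCKETS_PER_SERVER + 1


def bucket_range(server):
    """Return the inclusive (low, high) bucket range a server should copy."""
    low = (server - 1) * BUCKETS_PER_SERVER + 1
    return low, low + BUCKETS_PER_SERVER - 1
```

Each server would then copy only the rows whose random value falls inside its own bucket_range; how that copy is done is left open here, as it is in the post.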

Re: Distributed Fulltext?

2002-02-07 Thread James Montebello
I did this at a previous job, and we split the data up more or less this way (we used a pre-existing item number for the split which was essentially random in relation to the text data), with an aggregator that did the query X ways, each to a separate box holding 1/X of the data. The results from
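
A minimal sketch of the "query X ways, then combine" idea from the message above. The message is cut off before it says what was done with the results, so merging by score is an assumption, and the in-memory shards built by make_shard are hypothetical stand-ins for the separate boxes, not the poster's actual system:

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def make_shard(rows):
    """Build a stand-in shard: rows maps item_id -> text, scored by word count."""
    def search(query):
        words = query.lower().split()
        hits = []
        for item_id, text in rows.items():
            tokens = text.lower().split()
            score = sum(tokens.count(w) for w in words)
            if score:
                hits.append((score, item_id))
        return hits
    return search


def aggregate(query, shards, limit=10):
    """Send the query to every shard in parallel and keep the best hits overall."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda shard: shard(query), shards))
    merged = (hit for part in partials for hit in part)
    # Assumption: scores from different shards are comparable enough to merge.
    return heapq.nlargest(limit, merged, key=lambda hit: hit[0])
```

With real FULLTEXT indexes on separate servers, relevance scores may not be directly comparable across shards, since each index has its own word statistics; a random split of the data, as described in the message, should keep those statistics roughly similar.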

Re: Distributed Fulltext?

2002-02-07 Thread Amir Aliabadi
How do you make something like this fault tolerant? The answer is probably what I suspect, two of everything. How does the aggregator handle this or are these machines in a cluster? We are thinking of how to rebuild our fulltext search. Currently it is in MS SQL 7.0 - MySQL 4.0 seems to blow

Re: Distributed Fulltext?

2002-02-07 Thread Steve Rapaport
Also, I have to ask the question: Why is it that Altavista can index terabytes overnight and return a fulltext boolean for the WHOLE WEB within a second, and Mysql takes so long? On Friday 08 February 2002 11:50, Steve Rapaport wrote: I second the question. It could also reduce the size