On Fri, 2002-02-15 at 02:44, Alex Aulbach wrote:
Wednesday, from David Axmark:
Your other point about exact vs. approximate answers is unclear, I expect
that Google's answers are exact for their currently available indexes at any
given time. But even if they are approximate, I'd be
Wednesday, from Mike Wexler:
I don't think that would be appropriate. My example: our site (tias.com) has
lots of antiques and collectibles. One popular category is jewelry. If
somebody does a search for gold jewelry and the search engine interprets this
as anything that mentions gold or
Wednesday, from David Axmark:
Your other point about exact vs. approximate answers is unclear, I expect
that Google's answers are exact for their currently available indexes at any
given time. But even if they are approximate, I'd be happy with that too.
The scoring on a FULLTEXT search
I did this at a previous job, and we split the data up more or less
this way (we used a pre-existing item number for the split which was
essentially random in relation to the text data), with an aggregator that
did the query X ways, each to a separate box holding 1/X of the data.
The results from
It seems to me like the best solution that could be implemented as-is
would be to keep a random int column in your table (with a range of say
1-100) and then have fulltext server 1 pseudo-replicate records with
the random number in the range of 1-10, server 2 11-20 and server 3
21-30 and so
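That range scheme can be sketched roughly as follows; this assumes the pattern continues to ten servers covering 1-100, and all names here are made up for illustration:

```python
import random

NUM_SLICES = 10  # hypothetical: ten fulltext servers, each holding one band

def slice_for(rand_col: int) -> int:
    """Map a record's random column (1-100) to the server holding it."""
    return (rand_col - 1) // (100 // NUM_SLICES) + 1

# At insert time each record gets a random 1-100 value once; the slice
# server whose band contains that value replicates the record.
record_rand = random.randint(1, 100)
server = slice_for(record_rand)

# A search has to be fanned out to all NUM_SLICES servers, since any
# record can land in any band; the results are merged afterwards.
```

Note the trade-off: writes touch one slice, but every search touches all of them.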
On Thursday 07 February 2002 14:53, Brian DeFeyter wrote:
Has anyone made a suggestion or thought about ways to distribute
databases which focus on fulltext indexes?
fulltext indexes do a good job of indexing a moderate amount of data,
but when you get a lot of data to be indexed, the
Steve Rapaport wrote:
On Friday 08 February 2002 06:14 pm, James Montebello wrote:
Distribution is how Google gets its speed. You say clustering won't
solve the problem, but distributing the indices across many processors
*is* going to gain you a huge speed increase through sheer
Why is it that Altavista can index terabytes overnight and return
a fulltext boolean for the WHOLE WEB
within a second, and Mysql takes so long?
I don't know about Altavista, but if you read up on Google, they do indeed
do some sort of
spreading of keywords across multiple machines - last I
Ooops, factual error:
If, say, Google can search 2 trillion web pages, averaging say 70k
bytes each, in 1 second, and Mysql can search 22 million records, with
an index on 40 bytes each, in 3 seconds (my experience) on a good day,
what's the order of magnitude difference? Roughly 10^9.
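Taking those figures at face value, the back-of-envelope ratio works out like this (a sketch only; the two workloads are wildly different, as the thread notes):

```python
import math

# Figures quoted above, taken at face value.
google_bytes_per_sec = 2e12 * 70e3 / 1.0   # 2 trillion pages, ~70k bytes each, 1 second
mysql_bytes_per_sec = 22e6 * 40 / 3.0      # 22 million records, 40-byte index, 3 seconds

ratio = google_bytes_per_sec / mysql_bytes_per_sec
print(round(math.log10(ratio)))  # prints 9 -- the "roughly 10^9" above
```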
I said:
Why is it that Altavista can index terabytes overnight and return
a fulltext boolean for the WHOLE WEB
within a second, and Mysql takes so long?
On Friday 08 February 2002 08:56, Vincent Stoessel wrote:
Apples and oranges.
Yeah, I know. But let's see if we can make some distinctions.
On Tue, 2002-02-12 at 15:38, Steve Rapaport wrote:
David Axmark writes:
So the standard answer with Apples and Oranges certainly applies here!
More like Äpplen och Apelsiner, that is, different but similar. You Swedish
guys should know. Thanks for answering, David, I appreciate the attention
from a founder. I also appreciate your point that
My understanding is that part of how google and Altavista get such high speeds
is to keep everything in memory. Is it possible to create a HEAP table with a
full text index? If so, does the full text index take advantage of being in
memory? For example, I would imagine that if you were keeping
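Whatever HEAP tables can or can't index, the in-memory idea itself is simple. A toy RAM-resident inverted index (not MySQL's actual structure, just an illustration of why memory-resident lookups are fast) might look like:

```python
from collections import defaultdict

# Toy RAM-resident inverted index: word -> set of document ids.
index = defaultdict(set)

def add_doc(doc_id, text):
    for word in text.lower().split():
        index[word].add(doc_id)

def search(*words):
    """Boolean AND over the posting sets -- one dict lookup per word."""
    sets = [index[w.lower()] for w in words]
    return set.intersection(*sets) if sets else set()

add_doc(1, "gold jewelry and antiques")
add_doc(2, "silver jewelry")
print(search("gold", "jewelry"))  # matches only document 1
```

Every query is a handful of hash lookups and a set intersection, with no disk seeks at all; that, rather than anything exotic, is most of the speed.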
Steve Rapaport wrote:
Someone correctly pointed out today that it's not Mysql's job
to be Google, and I agree. But it seems to me that it would be
fair for mysql to be able to handle searches in under 1 second
for databases 1 millionth the size of Google. All I want here
is a decent
I sorta like that idea. I don't know exactly what you can and can't do
as far as indexing inside of HEAP tables.. but the index size would
likely differ from the written index. Then you can expand the idea and
use the X/(num slices) on (num slices) boxes technique.. sending the
query to each, and
While any speed up with a full table fulltext search would be helpful
and useful, there are instances where the search is intersected with
another column and the problem of search is therefore more complex but
also leads to potential optimizations.
In our case we rarely do searches that
4. If you do #2 and #3, you'll notice that you can have x servers (10 for
us) partition the FTS. We don't actually do this, but we
could and therefore get 'Distributed Fulltext' -- the title of this
thread!!!
Number 1 and 3 should work for everyone, I think. Only if your app can
partition
For the slice servers, you simply assume that if one is lost, you lose X%
of the data until it is revived, which is usually not even noticeable by
the end user. For the aggregators, we had four behind a load-balancer.
In practice, we had nearly zero downtime over a roughly 18 month period.
Last week on Slashdot there was an article where the CEO of Google mentioned he
uses DRAM (solid state disk arrays) rather than hard drives for the indexes and
arrays because of the magnitude of difference in speed they provide.
There's your 10^6 difference in speed (or part of it).
G.
I second the question. It could also reduce the size of the
fulltext index and the time taken to update it.
-steve
On Thursday 07 February 2002 20:53, Brian wrote:
Has anyone made a suggestion or thought about ways to distribute
databases which focus on fulltext indexes?
fulltext
On Thu, 2002-02-07 at 15:40, Tod Harter wrote:
[snip]
Wouldn't be too tough to write a little query routing system if you are using
perl. Use DBD::Proxy on the web server side, and just hack the perl proxy
server so it routes the query to several places and returns a single result
set.
Also, I have to ask the question:
Why is it that Altavista can index terabytes overnight and return
a fulltext boolean for the WHOLE WEB
within a second, and Mysql takes so long?
On Friday 08 February 2002 06:14 pm, James Montebello wrote:
Distribution is how Google gets its speed. You say clustering won't
solve the problem, but distributing the indices across many processors
*is* going to gain you a huge speed increase through sheer parallelism.
True, but not
Has anyone made a suggestion or thought about ways to distribute
databases which focus on fulltext indexes?
fulltext indexes do a good job of indexing a moderate amount of data,
but when you get a lot of data to be indexed, the queries slow down
significantly.
I have an example table, with
How do you make something like this fault tolerant?
The answer is probably what I suspect: two of everything.
How does the aggregator handle this or are these machines in a cluster?
We are thinking of how to rebuild our fulltext search. Currently it is
in MS SQL 7.0 - MySQL 4.0 seems to blow