On Fri, Nov 27, 2009 at 8:06 AM, Ryan Chan wrote:
> On Fri, Nov 27, 2009 at 6:58 AM, Aryeh Gregor wrote:
>> On Thu, Nov 26, 2009 at 8:29 AM, Ryan Chan wrote:
>>> so what are your suggestions?
>>>
>>> use Java?
>>
>> Python!
>>
>
> In what aspect?
>
> Just another scripting language!
>
On Fri, Nov 27, 2009 at 6:58 AM, Aryeh Gregor wrote:
> On Thu, Nov 26, 2009 at 8:29 AM, Ryan Chan wrote:
>> so what are your suggestions?
>>
>> use Java?
>
> Python!
>
In what aspect?
Just another scripting language!
On Thu, Nov 26, 2009 at 8:29 AM, Ryan Chan wrote:
> so what are your suggestions?
>
> use Java?
Python!
On Thu, Nov 26, 2009 at 12:23 AM, Greg Sabino Mullane wrote:
> Frankly, the choice of using PHP as the language for MediaWiki has
> probably caused more problems over the years than the choice of database
> backend. :)
>
so what are your suggestions?
use Java?
Hi!
> They do. Google ditched all existing databases and built their own system
> to handle their main stock in trade. For some things, they use MySQL,
> albeit a modified one.
Their main stock in trade is selling ads, and even though nobody will
ever admit what they are running in publicly
> Top 10? Top 20? Because I did a quick count a while back and found
> at least six of the top ten used MySQL in some capacity. And two of
> the remaining four were owned by Microsoft. :) I'm no DBA, but
> that suggests to me that MySQL is pretty suitable for large websites,
> compared to th
On Wed, Nov 25, 2009 at 9:55 AM, Greg Sabino Mullane wrote:
>> P.P.S. Anyone running PG in production on a big website?
>
> Yep. Course, you might also want to define "big"
Top 10? Top 20? Because I did a quick count a while back and found
at least six of the top ten used MySQL in some capacity
On Wed, Nov 25, 2009 at 10:47 AM, Domas Mituzas wrote:
> Hi!
>
>> Great, so we can close the thread and agree that we'll move everything
>> over to Oracle in the near future. :)
>
> InnoDB has been part of Oracle since 2005... ;-)
And bought MySQL this past April, right?
Maybe they'll rename
Hi!
> Great, so we can close the thread and agree that we'll move everything
> over to Oracle in the near future. :)
InnoDB has been part of Oracle since 2005... ;-)
Domas
On Wed, Nov 25, 2009 at 16:17, Greg Sabino Mullane wrote:
> Great, so we can close the thread and agree that we'll move everything
> over to Oracle in the near future. :)
Oh yeah! And rewrite MediaWiki in PL/SQL while we're at it!
Christopher
On 11/25/2009 10:13 AM, Domas Mituzas wrote:
>> I'm probably done myself. I just wanted to point out a few factual
>> errors in Mr. Mituzas' email. I'm not so interested in the grey-area
>> "which is better, which is worse" debate.
>
> my emails are also just because someone said "any reason not to... "
> I'm probably done myself. I just wanted to point out a few factual
> errors in Mr. Mituzas' email. I'm not so interested in the grey-area
> "which is better, which is worse" debate.
my emails are also just because someone said "any reason not to... "
and "it should provide better performance
On Wed, Nov 25, 2009 at 9:55 AM, Greg Sabino Mullane wrote:
> I'll resist the urge to say too much more on this thread right now, and
> go back to watching from the sidelines.
I'm probably done myself. I just wanted to point out a few factual
errors in Mr. Mituzas' email. I'm not so interested in the grey-area
"which is better, which is worse" debate.
Domas points out:
> Which would make sense if no other queries are being run :) With PG
> though you can define an index on a smaller subset, which may be better
> than partitioning.
Exactly - this is a perfect use case for partial indexes, not for
partitioning. The MW Postgres schema is already using some partial indexes.
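For anyone unfamiliar with partial indexes, a minimal sketch of the idea
in PostgreSQL (the index name is invented; this is not the actual
MediaWiki schema definition):

  -- Index only the non-redirect rows; the index stays small because
  -- the WHERE clause excludes everything else.
  CREATE INDEX page_nonredirect_title
      ON page (page_namespace, page_title)
      WHERE page_is_redirect = 0;

  -- Queries that filter on the same condition can use this much
  -- smaller index instead of scanning the whole table:
  SELECT page_namespace, page_title
  FROM page
  WHERE page_is_redirect = 0;

This gives much of the benefit of partitioning by is_redirect without
physically splitting the table.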
Hi!!!
> It'd make sense if most of your queries used one partition or the
> other, and not both. Kind of like Wikipedia's history/current tables,
Are you talking about the MediaWiki 1.4 schema, which we got rid of back
in 2005?
> Not in this case. You want to physically move the data so you can
On Wed, Nov 25, 2009 at 2:22 AM, Domas Mituzas wrote:
> Hi!
>
>> Please read my comment over again: "I can't imagine this is a query
>> you want to run over and over again. If it is, you'd probably want to
>> use partitioning."
>
> Which would make sense if no other queries are being run :)
It'd make sense if most of your queries used one partition or the other,
and not both.
Hi!
> Please read my comment over again: "I can't imagine this is a query
> you want to run over and over again. If it is, you'd probably want to
> use partitioning."
Which would make sense if no other queries are being run :)
With PG though you can define an index on a smaller subset, which may be
better than partitioning.
On Wed, Nov 25, 2009 at 12:39 AM, Domas Mituzas wrote:
> But who will partition based on is_redirect? If it is for a one-off
> task, you can just create two separate tables and do 'manual
> partitioning' even in sqlite :)
> Even though your is_redirect queries may become faster, you just added
>
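A rough sketch of that "manual partitioning" idea (table names invented
for illustration; a real setup would also need indexes and logic to keep
both tables current):

  -- CREATE TABLE ... AS SELECT works in MySQL, PostgreSQL, and SQLite:
  CREATE TABLE page_redirects AS
      SELECT * FROM page WHERE page_is_redirect = 1;

  CREATE TABLE page_articles AS
      SELECT * FROM page WHERE page_is_redirect = 0;

A one-off query over just the redirects now scans the small table only;
the cost is that every later insert or update has to pick the right table.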
Hi,
> That's just not at all true, not for PostgreSQL at least. Say you
> have 100 million records in the page table, of which 20 million are
> is_redirect=1 and 80 million are is_redirect=0. Say the average size
> of a record is 100 bytes, so on average 80 records fit in one page.
> The table is
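Carrying the quoted figures through: 100 million rows at about 100 bytes
each is roughly 10 GB, i.e. around 1.25 million 8 KB pages. If the
20 million is_redirect=1 rows are scattered uniformly, nearly every page
holds a few of them, so even a redirects-only scan reads close to the
full 10 GB. Physically segregating those rows, by partitioning or
clustering, would cut that to about 2 GB, with or without parallel
execution.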
On Tue, Nov 24, 2009 at 7:57 PM, Domas Mituzas wrote:
> Ryan writes:
> One question I would like to ask: why not use PostgreSQL?
> Any reason we should?
> Actually the main problem with using utf8 is that most language-
> specific collations are case-insensitive, which would mean lots of
> pa
On Tue, Nov 24, 2009 at 10:17 PM, Anthony wrote:
> On Tue, Nov 24, 2009 at 7:57 PM, Domas Mituzas wrote:
>> Anthony writes:
>>> If it is, you'd probably want to use partitioning
>>
>> Partitioning makes selects faster only when there's parallel execution
>> on multiple partitions at once.
> [snip]
On Tue, Nov 24, 2009 at 7:57 PM, Domas Mituzas wrote:
> Anthony writes:
>> If it is, you'd probably want to use partitioning
>
> Partitioning makes selects faster only when there's parallel execution
> on multiple partitions at once.
That's just not at all true, not for PostgreSQL at least. Say y
Hi!
I was traveling around a bit, missed some of the threads entirely!
Ryan writes:
> One question I would like to ask: why not use PostgreSQL?
Any reason we should?
> Seems MySQL is not suitable for handling large tables (e.g. over a few
> GB), so I just wonder why Wikipedia doesn't use PostgreSQL?
Is PG
* Ryan Chan [Mon, 23 Nov 2009 00:45:33 +0800]:
> May I ask why you're still using the 4.0 version? It seems 5.1 and above
> provide many performance enhancements.
>
There was a message on the mysql.com site that the Google performance
enhancements were incorporated into version 5.4.
Dmitriy
On Sun, Nov 22, 2009 at 17:57, Anthony wrote:
> Hmm, okay. Should I be using character set and collation "binary"?
Yes. The main problem with using UTF-8 for the tables is that
MySQL only supports Unicode characters U+0000 .. U+FFFF.
Other characters are silently removed, which leads to problems
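For illustration, a minimal sketch of the binary-schema approach being
recommended (an invented table, not the actual MediaWiki definition):

  -- Columns declared binary store raw bytes: MySQL applies no charset
  -- conversion or collation, so any UTF-8 sequence (including
  -- characters outside the Basic Multilingual Plane) survives intact.
  CREATE TABLE example_page (
      page_title VARBINARY(255) NOT NULL
  ) DEFAULT CHARSET=binary;

The trade-off is that sorting and case-insensitive matching then have to
happen in the application, since the database no longer knows the encoding.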
On Sun, Nov 22, 2009 at 12:02 PM, Aryeh Gregor wrote:
> Wikipedia uses binary fields for everything. It has to, since MySQL
> 4.0 doesn't support anything else.
Reading through https://bugzilla.wikimedia.org/show_bug.cgi?id=164 was
enlightening.
On Sun, Nov 22, 2009 at 11:57 AM, Anthony wrote:
> Hmm, okay. Should I be using character set and collation "binary"?
> The dumps build the table using character set utf8, and don't say
> anything about the collation. Is this specific to edit summaries, or
> does it apply to all the text fields?
On Sun, Nov 22, 2009 at 11:26 AM, Aryeh Gregor wrote:
> Last I checked, edit summaries were just
> passed to the database with no hard length check, so MySQL in
> non-strict mode with a binary schema (like Wikipedia) will just
> truncate them to fit. (In strict mode it will raise an error, and if
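The strict versus non-strict behaviour described above is easy to
reproduce in a MySQL session (a generic sketch, nothing
MediaWiki-specific):

  CREATE TABLE t (summary VARBINARY(5));

  SET SESSION sql_mode = '';                   -- non-strict mode
  INSERT INTO t VALUES ('abcdefgh');           -- truncated to 'abcde', warning only

  SET SESSION sql_mode = 'STRICT_ALL_TABLES';  -- strict mode
  INSERT INTO t VALUES ('abcdefgh');           -- rejected: "Data too long for column"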
On Sun, Nov 22, 2009 at 11:45 AM, Ryan Chan wrote:
> May I ask why you're still using the 4.0 version? It seems 5.1 and above
> provide many performance enhancements.
I'm not the one to ask. My understanding is that it's mostly a
question of how much administrative effort it would take to switch.
On Mon, Nov 23, 2009 at 12:26 AM, Aryeh Gregor wrote:
> On Sat, Nov 21, 2009 at 10:40 PM, Ryan Chan wrote:
>> Is the source available in the svn? Can you point me in the right direction?
>
> http://svn.wikimedia.org/viewvc/mysql/
>
Thanks for the link.
May I ask why you're still using the 4.0 version?
On Sat, Nov 21, 2009 at 9:48 PM, Jona Christopher Sahnwaldt wrote:
> The one thing that is slow is building the indexes after
> the data has been imported (eight hours or so). Maybe
> we could omit some indexes that are not used in our
> application, but I haven't really looked into that.
MyISAM
On Sun, Nov 22, 2009 at 7:39 AM, Aryeh Gregor wrote:
> It's also worth pointing out that Wikipedia uses a version of MySQL
> with substantial modifications,
Is the source available in the svn? Can you point me in the right direction?
Thanks.
On Sat, Nov 21, 2009 at 8:45 PM, Aryeh Gregor wrote:
> Yeah, pretty much. If you had an index on (page_is_redirect,
> page_namespace, page_title) or such, that would speed it up
> significantly (at least in MySQL). It would have to scan through the
> whole index, but that only contains three columns
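For concreteness, a sketch of the covering index being described (the
index name is invented):

  CREATE INDEX page_redirect_ns_title
      ON page (page_is_redirect, page_namespace, page_title);

  -- All columns the query needs are in the index itself, so MySQL can
  -- answer it with an index-only scan (EXPLAIN shows "Using index")
  -- instead of reading full table rows:
  SELECT page_namespace, page_title
  FROM page
  WHERE page_is_redirect = 0;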
On Sun, Nov 22, 2009 at 02:45, Aryeh Gregor wrote:
> Yeah, pretty much. If you had an index on (page_is_redirect,
> page_namespace, page_title) or such, that would speed it up
> significantly (at least in MySQL). It would have to scan through the
> whole index, but that only contains three columns
On Sat, Nov 21, 2009 at 7:02 PM, Anthony wrote:
> You could build an index on page_is_redirect in the "page" table (see
> enwiki-*-page.sql.gz). But I'm pretty sure PostgreSQL wouldn't use
> it, and would do a sequential scan, since pretty much all the pages
> are going to have to be accessed anyway
On Sat, Nov 21, 2009 at 6:39 PM, Aryeh Gregor wrote:
> Selecting a list of all titles that are not
> redirects will take a long time on any database, unless you have
> everything in memory, because it requires a table scan -- there's no
> index that covers the relevant columns (IIRC).
You could build an index on page_is_redirect in the "page" table
On Fri, Nov 20, 2009 at 10:47 PM, Ryan Chan wrote:
> One question I would like to ask: why not use PostgreSQL?
>
> Seems MySQL is not suitable for handling large tables (e.g. over a few
> GB), so I just wonder why Wikipedia doesn't use PostgreSQL?
>
> It should provide better performance.
MySQL is easily
Ryan Chan wrote:
> On Sat, Nov 21, 2009 at 12:05 PM, Q wrote:
>> MediaWiki has used MySQL since the beginning and has let the code get
>> away with things that shouldn't have been done, which makes switching to
>> Postgres hard.
>
> Doesn't MediaWiki also support pgsql?
On Sat, Nov 21, 2009 at 12:05 PM, Q wrote:
>
> MediaWiki has used MySQL since the beginning and has let the code get
> away with things that shouldn't have been done, which makes switching to
> Postgres hard.
Doesn't MediaWiki also support pgsql?
http://www.mediawiki.org/wiki/Installation
Ryan Chan wrote:
> On Sat, Nov 21, 2009 at 12:38 AM, Jona Christopher Sahnwaldt wrote:
>> With the indexes defined in tables.sql, query performance
>> is ok. For example, selecting the titles of all articles that
>> are not redirects takes five or ten minutes (didn't profile it exactly).
On Sat, Nov 21, 2009 at 12:38 AM, Jona Christopher Sahnwaldt wrote:
> With the indexes defined in tables.sql, query performance
> is ok. For example, selecting the titles of all articles that
> are not redirects takes five or ten minutes (didn't profile it
> exactly).
>
One question I would like to ask: why not use PostgreSQL?
Hi Ryan,
pages-meta-history hasn't been generated for enwiki in a while
(it's gotten too big), so I can't tell you anything about it. We're
importing pages-articles.xml (currently about 20 GB, 5 GB as
bzip2) using mwdumper. We're using MyISAM, not InnoDB.
The import takes about 8 hours, most of it
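For reference, the usual mwdumper pipeline looks something like this
(database name and credentials are placeholders; check the mwdumper
documentation for the exact options):

  java -jar mwdumper.jar --format=sql:1.5 pages-articles.xml.bz2 \
      | mysql -u wikiuser -p wikidb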
Hello,
Does anyone have experience importing the enwiki database dump at
http://download.wikimedia.org/backup-index.html into a real MySQL
server?
1. It seems pages-meta-history is the largest download; how much storage
space does it take when imported into a table (including indexes)?
2. Wh