Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-27 Thread Chad
On Fri, Nov 27, 2009 at 8:06 AM, Ryan Chan wrote:
> On Fri, Nov 27, 2009 at 6:58 AM, Aryeh Gregor wrote:
>> On Thu, Nov 26, 2009 at 8:29 AM, Ryan Chan wrote:
>>> So what are your suggestions?
>>>
>>> Use Java?
>>
>> Python!
>>
> In what aspect?
>
> Just another scripting language!

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-27 Thread Ryan Chan
On Fri, Nov 27, 2009 at 6:58 AM, Aryeh Gregor wrote:
> On Thu, Nov 26, 2009 at 8:29 AM, Ryan Chan wrote:
>> So what are your suggestions?
>>
>> Use Java?
>
> Python!

In what aspect? Just another scripting language!

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-26 Thread Aryeh Gregor
On Thu, Nov 26, 2009 at 8:29 AM, Ryan Chan wrote:
> So what are your suggestions?
>
> Use Java?

Python!

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-26 Thread Ryan Chan
On Thu, Nov 26, 2009 at 12:23 AM, Greg Sabino Mullane wrote:
> Frankly, the choice of using PHP as the language for MediaWiki has
> probably caused more problems over the years than the choice of database
> backend. :)

So what are your suggestions? Use Java?

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-25 Thread Domas Mituzas
Hi!

> They do. Google ditched all existing databases and built their own system
> to handle their main stock in trade. For some things, they use MySQL,
> albeit a modified one.

Their main stock in trade is selling ads, and even though nobody will ever admit what they are running in publicly

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-25 Thread Greg Sabino Mullane
> Top 10? Top 20? Because I did a quick count a while back and found
> at least six of the top ten used MySQL in some capacity. And two of
> the remaining four were owned by Microsoft. :) I'm no DBA, but
> that suggests to me that MySQL is pretty suitable for large websites,
> compared to th

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-25 Thread Aryeh Gregor
On Wed, Nov 25, 2009 at 9:55 AM, Greg Sabino Mullane wrote:
>> P.P.S. Anyone running PG in production on a big website?
>
> Yep. Course, you might also want to define "big"

Top 10? Top 20? Because I did a quick count a while back and found at least six of the top ten used MySQL in some capacity

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-25 Thread Anthony
On Wed, Nov 25, 2009 at 10:47 AM, Domas Mituzas wrote:
> Hi!
>
>> Great, so we can close the thread and agree that we'll move everything
>> over to Oracle in the near future. :)
>
> InnoDB has been part of Oracle since 2005... ;-)

And bought MySQL this past April, right? Maybe they'll rename

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-25 Thread Domas Mituzas
Hi!

> Great, so we can close the thread and agree that we'll move everything
> over to Oracle in the near future. :)

InnoDB has been part of Oracle since 2005... ;-)

Domas

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-25 Thread Christopher Sahnwaldt
On Wed, Nov 25, 2009 at 16:17, Greg Sabino Mullane wrote:
> Great, so we can close the thread and agree that we'll move everything
> over to Oracle in the near future. :)

Oh yeah! And rewrite MediaWiki in PL/SQL while we're at it!

Christopher

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-25 Thread Greg Sabino Mullane
On 11/25/2009 10:13 AM, Domas Mituzas wrote:
>> I'm probably done myself. I just wanted to point out a few factual
>> errors in Mr. Mituzas' email. I'm not so interested in the grey-area
>> "which is better, which is worse" debate.
>
> my emails are also just because someone said "any reason not

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-25 Thread Domas Mituzas
> I'm probably done myself. I just wanted to point out a few factual
> errors in Mr. Mituzas' email. I'm not so interested in the grey-area
> "which is better, which is worse" debate.

my emails are also just because someone said "any reason not to... " and "it should provide better performance

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-25 Thread Anthony
On Wed, Nov 25, 2009 at 9:55 AM, Greg Sabino Mullane wrote:
> I'll resist the urge to say too much more on this thread right now, and
> go back to watching from the sidelines.

I'm probably done myself. I just wanted to point out a few factual errors in Mr. Mituzas' email. I'm not so interested

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-25 Thread Greg Sabino Mullane
Domas points out:
> Which would make sense if no other queries are being run :) With PG
> though you can define an index on a smaller subset, which may be better
> than partitioning.

Exactly - this is a perfect use case for partial indexes, not for partitioning. The MW Postgres schema is already using som
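
For concreteness, a minimal sketch of the partial-index approach being described, assuming the standard MediaWiki page table; the index name and choice of predicate are illustrative, not taken from the MW Postgres schema:

  -- PostgreSQL partial index: only redirect rows are indexed, so the
  -- index stays far smaller than one over the whole page table.
  CREATE INDEX page_redirect_titles
      ON page (page_namespace, page_title)
      WHERE page_is_redirect = 1;

  -- A query with the same predicate can be served from the small index:
  SELECT page_namespace, page_title
  FROM page
  WHERE page_is_redirect = 1;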

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-25 Thread Domas Mituzas
Hi!!!

> It'd make sense if most of your queries used one partition or the
> other, and not both. Kind of like Wikipedia's history/current tables,

Are you talking about the MediaWiki 1.4 schema, which we got rid of back in 2005?

> Not in this case. You want to physically move the data so you can

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-25 Thread Anthony
On Wed, Nov 25, 2009 at 2:22 AM, Domas Mituzas wrote:
> Hi!
>
>> Please read my comment over again: "I can't imagine this is a query
>> you want to run over and over again. If it is, you'd probably want to
>> use partitioning."
>
> Which would make sense if no other queries are being run :)

It'd

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-24 Thread Domas Mituzas
Hi!

> Please read my comment over again: "I can't imagine this is a query
> you want to run over and over again. If it is, you'd probably want to
> use partitioning."

Which would make sense if no other queries are being run :) With PG though you can define an index on a smaller subset, which may be better than partitioning.

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-24 Thread Anthony
On Wed, Nov 25, 2009 at 12:39 AM, Domas Mituzas wrote:
> But who will partition based on is_redirect? If it is for a one-off
> task, you can just create two separate tables and do 'manual
> partitioning' even in sqlite :)
> Even though your is_redirect queries may become faster, you just added
>
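
A sketch of the 'manual partitioning' Domas mentions - splitting the page table by page_is_redirect into two ordinary tables (the table names are illustrative, not part of any MediaWiki schema):

  -- 'Manual partitioning': split the page table once by page_is_redirect,
  -- then query only the piece you need.
  CREATE TABLE page_redirects AS
      SELECT * FROM page WHERE page_is_redirect = 1;

  CREATE TABLE page_articles AS
      SELECT * FROM page WHERE page_is_redirect = 0;

  -- Listing non-redirect titles now scans only the smaller table.
  SELECT page_namespace, page_title FROM page_articles;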

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-24 Thread Domas Mituzas
Hi,

> That's just not at all true, not for PostgreSQL at least. Say you
> have 100 million records in the page table, of which 20 million are
> is_redirect=1 and 80 million are is_redirect=0. Say the average size
> of a record is 100 bytes, so on average 80 records fit in one page.
> The table is
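
To make the quoted arithmetic concrete - assuming PostgreSQL's default 8 KB disk page, which is what makes "80 records of ~100 bytes per page" come out right:

  8192 bytes/page / 100 bytes/row  ~  80 rows per page
  whole table:     100,000,000 / 80  ~  1,250,000 pages
  is_redirect = 0:  80,000,000 / 80  ~  1,000,000 pages
  is_redirect = 1:  20,000,000 / 80  ~    250,000 pages

A sequential scan confined to one partition therefore reads proportionally fewer disk pages than a scan of the whole table, with no parallelism involved - presumably the direction the truncated argument was heading.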

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-24 Thread Anthony
On Tue, Nov 24, 2009 at 7:57 PM, Domas Mituzas wrote:
> Ryan writes:
>> One thing I would like to ask is: why not use PostgreSQL?
> Any reason we should?
> Actually the main problem with using utf8 is that most language-specific
> collations are case-insensitive, which would mean lots of pa
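
A small illustration of the collation issue Domas raises: under MySQL's case-insensitive utf8 collations, strings differing only in case compare as equal, which matters for a unique index on page titles, where 'Foo' and 'foo' must stay distinct (the literals are made up for the example):

  -- Case-insensitive collation: 'Foo' and 'FOO' are the same key.
  SELECT _utf8'Foo' = _utf8'FOO' COLLATE utf8_general_ci;  -- 1 (equal)
  SELECT _utf8'Foo' = _utf8'FOO' COLLATE utf8_bin;         -- 0 (distinct)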

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-24 Thread Anthony
On Tue, Nov 24, 2009 at 10:17 PM, Anthony wrote:
> On Tue, Nov 24, 2009 at 7:57 PM, Domas Mituzas wrote:
>> Anthony writes:
>>> If it is, you'd probably want to use partitioning
>>
>> Partitioning makes selects faster only when there's parallel execution
>> on multiple partitions at once.
> [snip]

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-24 Thread Anthony
On Tue, Nov 24, 2009 at 7:57 PM, Domas Mituzas wrote:
> Anthony writes:
>> If it is, you'd probably want to use partitioning
>
> Partitioning makes selects faster only when there's parallel execution
> on multiple partitions at once.

That's just not at all true, not for PostgreSQL at least. Say y

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-24 Thread Domas Mituzas
Hi!

I was traveling around a bit and missed some threads entirely!

Ryan writes:
> One thing I would like to ask is: why not use PostgreSQL?

Any reason we should?

> Seems MySQL is not suitable for handling large tables (e.g. over a few
> GB), I just wonder why Wikipedia doesn't use PostgreSQL?

Is PG

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-23 Thread Dmitriy Sintsov
* Ryan Chan [Mon, 23 Nov 2009 00:45:33 +0800]:
> May I ask why you are still using the 4.0 version? It seems 5.1 and above
> provide many performance enhancements?

There was a message on the mysql.com site that the Google performance enhancements were incorporated into version 5.4.

Dmitriy

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-22 Thread Jona Christopher Sahnwaldt
On Sun, Nov 22, 2009 at 17:57, Anthony wrote:
> Hmm, okay. Should I be using character set and collation "binary"?

Yes. The main problem with using UTF-8 for the tables is that MySQL's utf8 character set only supports Unicode characters U+0000 .. U+FFFF. Other characters are silently removed, which leads to problems
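
In concrete terms, "character set and collation binary" means byte-oriented columns that MySQL never tries to convert, so any UTF-8 sequence - including characters above U+FFFF - round-trips intact. A minimal sketch; the column list is abbreviated, not the full definition from tables.sql:

  -- Binary columns store raw bytes; MySQL applies no charset conversion.
  CREATE TABLE page (
      page_id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      page_namespace   INT NOT NULL,
      page_title       VARBINARY(255) NOT NULL,
      page_is_redirect TINYINT UNSIGNED NOT NULL DEFAULT 0
  ) DEFAULT CHARSET=binary;

  -- An existing utf8 table can also be converted wholesale:
  ALTER TABLE page CONVERT TO CHARACTER SET binary;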

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-22 Thread Anthony
On Sun, Nov 22, 2009 at 12:02 PM, Aryeh Gregor wrote:
> Wikipedia uses binary fields for everything. It has to, since MySQL
> 4.0 doesn't support anything else.

Reading through https://bugzilla.wikimedia.org/show_bug.cgi?id=164 was enlightening.

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-22 Thread Aryeh Gregor
On Sun, Nov 22, 2009 at 11:57 AM, Anthony wrote:
> Hmm, okay. Should I be using character set and collation "binary"?
> The dumps build the table using character set utf8, and don't say
> anything about the collation. Is this specific to edit summaries, or
> does it apply to all the text fields?

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-22 Thread Anthony
On Sun, Nov 22, 2009 at 11:26 AM, Aryeh Gregor wrote:
> Last I checked, edit summaries were just
> passed to the database with no hard length check, so MySQL in
> non-strict mode with a binary schema (like Wikipedia) will just
> truncate them to fit. (In strict mode it will raise an error, and if
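
A quick demonstration of the strict vs. non-strict behaviour Aryeh describes; the table and column width are made up for the example, the real summary column is a length-limited binary field:

  -- Non-strict mode: an over-long value is silently truncated to fit.
  SET SESSION sql_mode = '';
  CREATE TABLE t (summary VARBINARY(10));
  INSERT INTO t VALUES (REPEAT('x', 20));  -- succeeds, stores 10 bytes

  -- Strict mode: the same insert is rejected.
  SET SESSION sql_mode = 'STRICT_ALL_TABLES';
  INSERT INTO t VALUES (REPEAT('x', 20));  -- error: Data too long for column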

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-22 Thread Aryeh Gregor
On Sun, Nov 22, 2009 at 11:45 AM, Ryan Chan wrote:
> May I ask why you are still using the 4.0 version? It seems 5.1 and above
> provide many performance enhancements?

I'm not the one to ask. My understanding is that it's mostly a question of how much administrative resources it would take to switch.

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-22 Thread Ryan Chan
On Mon, Nov 23, 2009 at 12:26 AM, Aryeh Gregor wrote:
> On Sat, Nov 21, 2009 at 10:40 PM, Ryan Chan wrote:
>> Is the source available in the svn? Can you point me in the right direction?
>
> http://svn.wikimedia.org/viewvc/mysql/

Thanks for the link.

May I ask why you are still using the 4.0 version? It seems 5.1 and above provide many performance enhancements?

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-22 Thread Aryeh Gregor
On Sat, Nov 21, 2009 at 9:48 PM, Jona Christopher Sahnwaldt wrote:
> The one thing that is slow is building the indexes after
> the data has been imported (eight hours or so). Maybe
> we could omit some indexes that are not used in our
> application, but I haven't really looked into that.

MyISAM
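
One common way to manage that cost on MyISAM - a general MySQL technique, not something the thread says Wikimedia does - is to suspend non-unique key maintenance during the bulk load and rebuild the indexes in one pass afterwards, or to drop unneeded indexes before importing:

  -- Skip non-unique index maintenance while the dump is loading ...
  ALTER TABLE page DISABLE KEYS;
  -- ... run the import here ...
  ALTER TABLE page ENABLE KEYS;  -- rebuilds the indexes in one pass

  -- Or drop an index the application never queries before importing
  -- (index name is illustrative):
  ALTER TABLE page DROP INDEX some_unused_index;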

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-21 Thread Ryan Chan
On Sun, Nov 22, 2009 at 7:39 AM, Aryeh Gregor wrote:
> It's also worth pointing out that Wikipedia uses a version of MySQL
> with substantial modifications,

Is the source available in the svn? Can you point me in the right direction?

Thanks.

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-21 Thread Anthony
On Sat, Nov 21, 2009 at 8:45 PM, Aryeh Gregor wrote:
> Yeah, pretty much. If you had an index on (page_is_redirect,
> page_namespace, page_title) or such, that would speed it up
> significantly (at least in MySQL). It would have to scan through the
> whole index, but that only contains three col
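
A sketch of the covering index Aryeh is suggesting, together with the query it would serve; the index name is illustrative, the columns are the ones from MediaWiki's page table:

  -- The query below can be answered from the index alone, so MySQL scans
  -- the narrow three-column index instead of the whole page table.
  CREATE INDEX page_redirect_ns_title
      ON page (page_is_redirect, page_namespace, page_title);

  SELECT page_namespace, page_title
  FROM page
  WHERE page_is_redirect = 0;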

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-21 Thread Jona Christopher Sahnwaldt
On Sun, Nov 22, 2009 at 02:45, Aryeh Gregor wrote:
> Yeah, pretty much. If you had an index on (page_is_redirect,
> page_namespace, page_title) or such, that would speed it up
> significantly (at least in MySQL). It would have to scan through the
> whole index, but that only contains three colum

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-21 Thread Aryeh Gregor
On Sat, Nov 21, 2009 at 7:02 PM, Anthony wrote:
> You could build an index on page_is_redirect in the "page" table (see
> enwiki-*-page.sql.gz). But I'm pretty sure PostgreSQL wouldn't use
> it, and would do a sequential scan, since pretty much all the pages
> are going to have to be accessed any

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-21 Thread Anthony
On Sat, Nov 21, 2009 at 6:39 PM, Aryeh Gregor wrote:
> Selecting a list of all titles that are not
> redirects will take a long time on any database, unless you have
> everything in memory, because it requires a table scan -- there's no
> index that covers the relevant columns (IIRC).

You could b

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-21 Thread Aryeh Gregor
On Fri, Nov 20, 2009 at 10:47 PM, Ryan Chan wrote:
> One thing I would like to ask is: why not use PostgreSQL?
>
> Seems MySQL is not suitable for handling large tables (e.g. over a few
> GB), I just wonder why Wikipedia doesn't use PostgreSQL?
>
> It should provide better performance.

MySQL is easily

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-20 Thread Q
Ryan Chan wrote:
> On Sat, Nov 21, 2009 at 12:05 PM, Q wrote:
>> MediaWiki has used MySQL since the beginning and has let the code get
>> away with things that shouldn't have been done, which makes switching to
>> Postgres hard.
>
> Isn't that media

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-20 Thread Ryan Chan
On Sat, Nov 21, 2009 at 12:05 PM, Q wrote:
> MediaWiki has used MySQL since the beginning and has let the code get
> away with things that shouldn't have been done, which makes switching to
> Postgres hard.

Doesn't MediaWiki also support pgsql? http://www.mediawiki.org/wiki/Installation

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-20 Thread Q
Ryan Chan wrote:
> On Sat, Nov 21, 2009 at 12:38 AM, Jona Christopher Sahnwaldt wrote:
>> With the indexes defined in tables.sql, query performance
>> is ok. For example, selecting the titles of all articles that
>> are not redirects takes five or

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-20 Thread Ryan Chan
On Sat, Nov 21, 2009 at 12:38 AM, Jona Christopher Sahnwaldt wrote:
> With the indexes defined in tables.sql, query performance
> is ok. For example, selecting the titles of all articles that
> are not redirects takes five or ten minutes (didn't profile it
> exactly).

One thing I would like to ask is: why not use PostgreSQL?
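
For reference, the query being timed is presumably something like the following, taking "articles" to mean main-namespace pages:

  -- Titles of all main-namespace pages that are not redirects; with only
  -- the default indexes this touches most of the table, hence the minutes.
  SELECT page_title
  FROM page
  WHERE page_namespace = 0
    AND page_is_redirect = 0;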

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-20 Thread Jona Christopher Sahnwaldt
Hi Ryan,

pages-meta-history hasn't been generated for enwiki in a while (it's gotten too big), so I can't tell you anything about it.

We're importing pages-articles.xml (currently about 20 GB, 5 GB as bzip2) using mwdumper. We're using MyISAM, not InnoDB. The import takes about 8 hours, most of it

[Wikitech-l] Size of DB/table of enwiki after import into MySQL

2009-11-20 Thread Ryan Chan
Hello,

Does anyone have experience importing the enwiki database dump at http://download.wikimedia.org/backup-index.html into a real MySQL server?

1. It seems pages-meta-history is the largest download; how much storage space does it take when imported into a table (including indexes)?
2. Wh
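
For anyone measuring this on their own import, per-table data and index sizes can be read from information_schema (MySQL 5.0+; 'enwiki' here stands for whatever database name you imported into):

  -- On-disk size of each table, split into data and index bytes.
  SELECT table_name,
         data_length  / 1024 / 1024 / 1024 AS data_gb,
         index_length / 1024 / 1024 / 1024 AS index_gb
  FROM information_schema.TABLES
  WHERE table_schema = 'enwiki'
  ORDER BY data_length + index_length DESC;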