Re: [Wikitech-l] Using MySQL as a NoSQL

2010-12-24 Thread Domas Mituzas
Hi!

 I have recently encountered this text in which the author claims very 
 high MySQL speedups for simple queries

It is not that he speeds up simple queries (you'd maybe notice that if you used 
InfiniBand, and even then it wouldn't matter much :)
He just avoids hitting some expensive critical sections that make scaling on 
multicore systems problematic. 
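
For context, HandlerSocket talks to the storage engine over a tiny tab-separated text protocol rather than SQL, so each lookup skips the parser, the optimizer, and the contended server-level locks. A hypothetical sketch of the request framing (field layout per the plugin's published protocol notes; the database, table, and column names are invented):

```python
# Rough sketch of HandlerSocket-style request framing (an assumption based
# on the plugin's protocol notes: tab-separated fields, one newline-terminated
# request per line). The db/table/column names here are invented.

def open_index(index_id, db, table, index, columns):
    """Frame an 'open index' request: P <id> <db> <table> <index> <cols>."""
    return "\t".join(["P", str(index_id), db, table, index,
                      ",".join(columns)]) + "\n"

def find(index_id, op, values, limit=1, offset=0):
    """Frame a read request: <id> <op> <nvals> <val...> <limit> <offset>."""
    fields = ([str(index_id), op, str(len(values))]
              + list(values) + [str(limit), str(offset)])
    return "\t".join(fields) + "\n"

# The index is opened once; subsequent reads reuse it, so there is no SQL
# to parse and no optimizer to run per lookup.
req = open_index(0, "test", "user", "PRIMARY", ["user_id", "user_name"])
lookup = find(0, "=", ["42"])  # point read on the primary key
```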

 It looks interesting. There are some places where mediawiki could take
 that shortcut if available.

It wouldn't be a shortcut if you had to establish another database connection 
besides the existing one. 

 I wonder if we have such CPU bottleneck, though.

No, not really. Our average (do note, this isn't the median and is affected 
more by heavy queries) DB response time is 1.3 ms (measured on the client). 

Domas
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Using MySQL as a NoSQL

2010-12-24 Thread Domas Mituzas
Hi!

A:
 It's easy to get fast results if you don't care about your reads being
 atomic (*), and I find it hard to believe they've managed to get
 atomic reads without going through MySQL.

MySQL upper layers know nothing much about transactions; it is all 
engine-specific - BEGIN and COMMIT processing is deferred to table handlers.  
It would be incredibly easy for them to implement repeatable read snapshots :) (if 
that's what you mean by atomic read)
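
A toy model (deliberately not MySQL source; all class names invented) of what "deferred to table handlers" means: the upper layer merely dispatches BEGIN/COMMIT to whichever engine backs the table, so snapshot semantics would live entirely inside the engine:

```python
# Toy model (not MySQL source) of the server layer deferring BEGIN/COMMIT
# to per-engine table handlers; class names are invented for illustration.

class MyISAMHandler:
    """Non-transactional engine: BEGIN/COMMIT are no-ops."""
    def begin(self): pass
    def commit(self): pass

class InnoDBHandler:
    """Transactional engine: tracks its own open-transaction state."""
    def __init__(self):
        self.in_txn = False
    def begin(self):
        self.in_txn = True
    def commit(self):
        self.in_txn = False

class UpperLayer:
    """Knows nothing about transactions; just forwards to the handler."""
    def __init__(self, handler):
        self.handler = handler
    def execute(self, stmt):
        if stmt == "BEGIN":
            self.handler.begin()
        elif stmt == "COMMIT":
            self.handler.commit()
```

The point being that a repeatable-read snapshot would be implemented inside the engine's handler, with the upper layer unchanged.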

 (*) Among other possibilities, just use MyISAM.

How is that applicable to any discussion? 

Domas


Re: [Wikitech-l] Using MySQL as a NoSQL

2010-12-24 Thread Nikola Smolenski
On 12/24/2010 10:01 AM, Domas Mituzas wrote:
 I wonder if we have such CPU bottleneck, though.

 No, not really. Our average (do note, this isn't the median and is affected 
 more by heavy queries) DB response time is 1.3 ms (measured on the client).

This could also reduce memory usage by not using memcached (as often) 
which, I understand, is a bigger problem.



Re: [Wikitech-l] Using MySQL as a NoSQL

2010-12-24 Thread Domas Mituzas
Hi!

 This could also reduce memory usage by not using memcached (as often) 
 which, I understand, is a bigger problem.

No it is not. 

First of all, our memcached and database access times are not that far apart - 0.7 
vs 1.3 ms (again, the memcached figure is a static response time, whereas the database average 
is impacted by calculations). 
On the other hand, we don't store in memcached what is stored in the database, and we 
don't store in the database what is stored in memcached.

Think of these as two separate systems, not as complementing each other too 
much.
We use memcached to offload the application cluster, not the database cluster. 

And the database cluster already has over a terabyte of RAM (replicas and whatnot), 
whereas our memcached lives in a puny 158 GB arena. 

I described some of the fundamental differences in how we use memcached in 
http://dom.as/uc/workbook2007.pdf (pages 11-13). Not much has changed since 
then. 

Domas


Re: [Wikitech-l] Using MySQL as a NoSQL

2010-12-24 Thread Jared Williams

Hi,  

 -Original Message-
 From: wikitech-l-boun...@lists.wikimedia.org 
 [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of 
 Domas Mituzas
 Sent: 24 December 2010 09:09
 To: Wikimedia developers
 Subject: Re: [Wikitech-l] Using MySQL as a NoSQL
 
 Hi!
 
 A:
  It's easy to get fast results if you don't care about your 
 reads being 
  atomic (*), and I find it hard to believe they've managed to get 
  atomic reads without going through MySQL.
 
 MySQL upper layers know nothing much about transactions; it 
 is all engine-specific - BEGIN and COMMIT processing is 
 deferred to table handlers.  
 It would be incredibly easy for them to implement repeatable 
 read snapshots :) (if that's what you mean by atomic read)
 

It seems from my tinkering that MySQL query cache handling is
circumvented via HandlerSocket.
So if you update/insert/delete via HandlerSocket and then query via SQL,
you're not guaranteed to see the changes unless you use SQL_NO_CACHE.
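
The staleness risk can be illustrated with a toy model (invented structures, not MySQL internals): a result cache keyed on SQL text stays correct only if every write path invalidates it, which a HandlerSocket-style write does not:

```python
# Toy model of a query cache keyed on SQL text (not MySQL internals).
table = {"id=1": "old"}   # the underlying row
query_cache = {}          # cached results, keyed by query string

def sql_select(query, no_cache=False):
    if not no_cache and query in query_cache:
        return query_cache[query]      # served from cache, possibly stale
    result = table["id=1"]
    query_cache[query] = result
    return result

def sql_update(value):
    table["id=1"] = value
    query_cache.clear()                # SQL writes invalidate the cache

def handlersocket_update(value):
    table["id=1"] = value              # bypasses the cache entirely

sql_select("SELECT v FROM t WHERE id=1")                         # warms the cache
handlersocket_update("new")
stale = sql_select("SELECT v FROM t WHERE id=1")                 # → "old"
fresh = sql_select("SELECT v FROM t WHERE id=1", no_cache=True)  # → "new"
```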


  (*) Among other possibilities, just use MyISAM.
 
 How is that applicable to any discussion? 
 
 Domas

Jared




Re: [Wikitech-l] Using MySQL as a NoSQL

2010-12-24 Thread Domas Mituzas
Hi!

 It seems from my tinkering that MySQL query cache handling is
 circumvented via HandlerSocket.

On busy systems (I assume we are talking about busy systems, as the discussion is about 
HS) the query cache is usually eliminated anyway - 
either by compiling it out, or by patching the code not to take the qcache mutexes 
unless it really, really is enabled. In the worst case, it is simply disabled. 
:) 

 So if you update/insert/delete via HandlerSocket and then query via SQL,
 you're not guaranteed to see the changes unless you use SQL_NO_CACHE.

You are probably right. Again, nobody cares about the qcache at those performance 
boundaries. 

Domas


Re: [Wikitech-l] Using MySQL as a NoSQL

2010-12-24 Thread Anthony
On Fri, Dec 24, 2010 at 4:08 AM, Domas Mituzas midom.li...@gmail.com wrote:
 Hi!

 A:
 It's easy to get fast results if you don't care about your reads being
 atomic (*), and I find it hard to believe they've managed to get
 atomic reads without going through MySQL.

 MySQL upper layers know nothing much about transactions; it is all 
 engine-specific - BEGIN and COMMIT processing is deferred to table handlers.
 It would be incredibly easy for them to implement repeatable read snapshots :) 
 (if that's what you mean by atomic read)

I suppose it's possible in theory, but in any case, it's not what
they're doing.  They *are* going through MySQL, via the HandlerSocket
plugin.

I wonder if they'd get much different performance by just using
prepared statements and read committed isolation, with the
transactions spanning multiple requests.  The tables would only get
locked once per transaction, right?

Or do I just have no idea what I'm talking about?

 (*) Among other possibilities, just use MyISAM.

 How is that applicable to any discussion?

It was an example of a way to get fast results if you don't care about
your reads being atomic.



Re: [Wikitech-l] Using MySQL as a NoSQL

2010-12-24 Thread Platonides
Domas Mituzas wrote:
 It looks interesting. There are some places where mediawiki could take
 that shortcut if available.
 
 It wouldn't be a shortcut if you had to establish another database connection 
 besides the existing one. 

I was assuming usage of pfsockopen(), of course.




Re: [Wikitech-l] Using MySQL as a NoSQL

2010-12-24 Thread Jared Williams
 

 -Original Message-
 From: wikitech-l-boun...@lists.wikimedia.org 
 [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of 
 Domas Mituzas
 Sent: 24 December 2010 13:42
 To: Wikimedia developers
 Subject: Re: [Wikitech-l] Using MySQL as a NoSQL
 
 Hi!
 
  It seems from my tinkering that MySQL query cache handling is 
  circumvented via HandlerSocket.
 
 On busy systems (I assume we are talking about busy systems, as 
 the discussion is about HS) the query cache is usually eliminated anyway - 
 either by compiling it out, or by patching the code not to 
 take the qcache mutexes unless it really, really is enabled. In 
 the worst case, it is simply disabled. :) 
 
  So if you update/insert/delete via HandlerSocket and then query via SQL, 
  you're not guaranteed to see the changes unless you use SQL_NO_CACHE.
 
 You are probably right. Again, nobody cares about qcache at 
 those performance boundaries. 
 
 Domas

Ah, interesting. The only reason I took a look at it was because you don't
have to faff about with encoding/escaping values* the way you have to with SQL.
SQL injection vulnerabilities simply don't exist. 

* And the protocol handles binary values, which normally require faffing
about to get in and out of MySQL with the various PHP APIs.

Does seem a bit specialised; could serve as a persistent cache, maybe as a
session handler.

Jared




Re: [Wikitech-l] Using MySQL as a NoSQL

2010-12-24 Thread Jared Williams
 

 -Original Message-
 From: wikitech-l-boun...@lists.wikimedia.org 
 [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of 
 Jared Williams
 Sent: 24 December 2010 16:18
 To: 'Wikimedia developers'
 Subject: Re: [Wikitech-l] Using MySQL as a NoSQL
 
  
 
  -Original Message-
  From: wikitech-l-boun...@lists.wikimedia.org
  [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Domas

  Mituzas
  Sent: 24 December 2010 13:42
  To: Wikimedia developers
  Subject: Re: [Wikitech-l] Using MySQL as a NoSQL
  
  Hi!
  
   It seems from my tinkering that MySQL query cache handling is 
   circumvented via HandlerSocket.
  
  On busy systems (I assume we talk about busy systems, as 
 discussion is 
  about HS) query cache is usually eliminated anyway.
  Either by compiling it out, or by patching the code not to 
 use qcache 
  mutexes unless it really really is enabled. In worst case, 
 it is just 
  simply disabled. :)
  
   So if you update/insert/delete via HandlerSocket, then
  query via SQL
   your not guarenteed to see the changes unless you use
 SQL_NO_CACHE.
  
  You are probably right. Again, nobody cares about qcache at those 
  performance boundaries.
  
  Domas
 
 Ah, interesting. The only reason I took a look at it was because you 
 don't have to faff about with encoding/escaping values* the way 
 you have to with SQL.
 SQL injection vulnerabilities simply don't exist. 
 
 * And the protocol handles binary values, which normally require 
 faffing about to get in and out of MySQL with the various PHP 
 APIs.
 
 Does seem a bit specialised; could serve as a persistent cache, 
 maybe as a session handler.

Maybe a session handler even.

 
 Jared
 
 




Re: [Wikitech-l] Using MySQL as a NoSQL

2010-12-24 Thread Domas Mituzas
Hi!

 I was assuming usage of pfsockopen(), of course.

Though the protocol is slightly cheaper, you still have to do the TCP handshake :)

Domas



Re: [Wikitech-l] Alternative to opendir() functions?

2010-12-24 Thread Soxred93
In the HISTORY file:

* glob() is horribly unreliable and doesn't work on some systems, including 
free.fr shared hosting. No longer using it in Language::getLanguageNames() 

-X!

On Dec 24, 2010, at 12:24 PM, Brion Vibber wrote:

 Glob works too I think.
 
 -- brion
 On Dec 23, 2010 12:06 PM, Ilmari Karonen nos...@vyznev.net wrote:
 On 12/22/2010 12:16 AM, Platonides wrote:
 
 We are only using opendir for getting a full directory list.
 
 That's a good point. Perhaps what we need is simply a utility method to
 list all files in a directory.
 
 In fact, I just realized that PHP already has one. It's called
 scandir(). Its only flaw IMO is that it doesn't automatically skip the
 current and parent dir entries, but you could always do something like
 
 $files = array_diff( scandir( $dir ), array( '.', '..' ) );
 
 to accomplish that cleanly (or use preg_grep() to remove all dotfiles if
 you prefer).
 
 --
 Ilmari Karonen
 




Re: [Wikitech-l] [Xmldatadumps-l] dataset1, xml dumps

2010-12-24 Thread Ariel T. Glenn
The new host Dataset2 is now up and running and serving XML dumps. Those
of you paying attention to DNS entries should see the change within the
hour.  We are not generating new dumps yet but expect to do so soon.  

Ariel



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Xmldatadumps-l] dataset1, xml dumps

2010-12-24 Thread Jamie Morken

Hi,

That is great news; thank you for all the hard work you have done on this, and 
most of all, Season's Greetings, Merry Christmas, and Happy New Year! :)

best regards,
Jamie


- Original Message -
From: Ariel T. Glenn ar...@wikimedia.org
Date: Friday, December 24, 2010 10:42 am
Subject: Re: [Xmldatadumps-l] [Wikitech-l]   dataset1, xml dumps
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Cc: xmldatadump...@lists.wikimedia.org

 The new host Dataset2 is now up and running and serving XML 
 dumps. Those
 of you paying attention to DNS entries should see the change 
 within the
 hour.  We are not generating new dumps yet but expect to do 
 so soon.  
 
 Ariel
 
 
 
 