Re: [Wikitech-l] Speed up tests, make @group Database smarter

2012-07-03 Thread Christian Aistleitner
Hi Platonides,

On Tue, Jul 03, 2012 at 12:40:35AM +0200, Platonides wrote:
 On 01/07/12 13:49, Christian Aistleitner wrote:
  But as I said, one would mock /above/ the SQL layer.

 I was thinking about it. Some operations are easy (insert a field, select
 a field); you can perform the joins in PHP.

Yes.

 But you also need table schema knowledge.

Yes, obviously.

But this sounds more severe than it actually is. And regardless of
whether we use a mock or a real database, this coupling of schema and
tests is pretty much inevitable once we decide to stop labelling our
“PHP-wide database integration tests” as “unit tests” and start
implementing real unit tests in addition ;-)

Note that changes in the database schema are not much of an issue
anyway. In 2012, there have been no major schema changes and only two
minimal ones [1]. The situation is similar when looking further back in
time.

Compare it to changing method names on dependent objects: you'd have to
reflect such changes in the tests as well.



However, the main remaining hurdle lies elsewhere: the somewhat tight
coupling of business objects. So when mocking the database, it may be
worthwhile to decouple objects bottom-up along the way to keep the
effort manageable.


Kind regards,
Christian


[1] Adding the job_timestamp column and the ipb_parent_block_id column.


-- 
quelltextlich e.U.  \\  Christian Aistleitner
Companies' registry: 360296y in Linz
Gruendbergstrasze 65a    Email:    christ...@quelltextlich.at
4040 Linz, Austria       Phone:    +43 732 / 26 95 63
                         Fax:      +43 732 / 26 95 63
                         Homepage: http://quelltextlich.at/
---



Re: [Wikitech-l] Speed up tests, make @group Database smarter

2012-07-02 Thread Chris McMahon
Christian, thanks for the information about DatabaseBase for mocking, that
makes sense.

I don't know PHP at all, but I know something about how to do automated
tests.  Besides manipulating tables directly, I've seen a couple of other
things in the unit tests that also struck me as strange:

* at least one test relies on a hard-coded path to a file on disk
* there is a global timeout variable used by a number of tests, settable (I
think) to 1s, 10s, or 60s (as opposed to, for example, polling for a
particular state from within the test itself)

On Friday Erik sent a msg. to Engineering, "Alternatives to the 20%
approach", where he discussed a number of aspects of code review.  My
impression is that production code gets a lot of scrutiny, but the tests do
not.  I wonder whether reviewing the unit tests for good practice as well
would eventually reduce the chore and burden of the current situation
with code review.

-Chris

PS It would be great to see some of this information on
http://www.mediawiki.org/wiki/Manual:PHP_unit_testing which is pretty spare
right now.

On Sun, Jul 1, 2012 at 5:49 AM, Christian Aistleitner
<christ...@quelltextlich.at> wrote:

 Hi Platonides,

 On Sat, Jun 30, 2012 at 03:45:14PM +0200, Platonides wrote:
  On 30/06/12 14:24, Christian Aistleitner wrote:
   [ Mocking the database ]
  
   [...]
   One would have to abstract database access above the SQL
   layer (separate methods for select, insert, ...) [...]
 
  You still need to implement some complex SQL logic.

 One might think so, yes.
 But as I said, one would mock /above/ the SQL layer.
 For typical database operations, SQL would not even get generated in
 the first place!

 Consider for example code containing
   $db->insert( $param1, $param2, ... );

 The mock db's insert function would compare $param1, $param2,
 ... against the invocations the test setup injected. If there is no
 match, the test fails. If there is a match, the mock returns the
 corresponding return value right away.
 No generating SQL.
 No call to $db->tableName.
 No call to $db->makeList.
 No call to $db->query.
 No nothing. \o/

 But maybe you hinted at DatabaseBase::query?
 DatabaseBase::query should not be used directly, and it hardly
 is. We can go straight for parameter comparison there as well. No
 need to parse the SQL.


 Unit testing is about decoupling and testing things in isolation. With
 DatabaseBase and the corresponding factories, MediaWiki has a layer
 that naturally decouples business logic from direct database access.

 Use the decoupling, Luke!

 Christian


 P.S.: As an example for decoupling and mocking in MW, consider
 tests/phpunit/maintenance/backupTextPassTest.php:testPrefetchPlain. This
 test is about dumping a wiki's database using prefetch.

 The idea behind prefetch is to take an old dump and reuse texts from this
 old dump instead of asking the database for every single text of the
 new dump.

 To test dumping using prefetch /without/ mocking, one would have to
 set up an XML file for the old dump. This old dump's XML would get read,
 parsed, interpreted, ... upon every single test invocation. Tedious and
 time consuming. Upon each update of the XML format, we'd also have to
 update the XML representation of the old dump. Yikes! [1]
 Besides, it duplicates effort, as reading and interpreting dumps is
 a separate concern that is already dealt with in isolation in
 tests/phpunit/maintenance/backupPrefetchTest.php

 So the handling of the old dump (reading, parsing, ...) has been mocked
 out. All that's necessary for this is lines 143--153 and line 160 of
 tests/phpunit/maintenance/backupTextPassTest.php:testPrefetchPlain

 $prefetchMock is the mock for the prefetch (i.e.: old
 dump). $prefetchMap models the expected parameters and return values
 of the mocked method. So for example invoking
   $prefetchMock->prefetch( $this->pageId1, $this->revId1_1 )
 yields
   Prefetch_1Text1.


 [1] Yes, we had that situation recently, when the parentid tag got
 introduced [2]. The XML dumps of
 tests/phpunit/maintenance/backupTextPassTest.php and
 tests/phpunit/maintenance/backupPrefetchTest.php were updated, so the
 tests assert that both dumping and prefetching work with parentid. But
 thanks to this decoupling, we did not have to touch the mock.

 [2] See commit d04b8ceea67660485245beaa4aca1625cf2170aa
 https://gerrit.wikimedia.org/r/#/c/10743/


 --
 quelltextlich e.U.  \\  Christian Aistleitner
 Companies' registry: 360296y in Linz
 Gruendbergstrasze 65a    Email:    christ...@quelltextlich.at
 4040 Linz, Austria       Phone:    +43 732 / 26 95 63
                          Fax:      +43 732 / 26 95 63
                          Homepage: http://quelltextlich.at/
 ---


Re: [Wikitech-l] Speed up tests, make @group Database smarter

2012-07-02 Thread Platonides
On 01/07/12 13:49, Christian Aistleitner wrote:
 One might think so, yes.
 But as I said, one would mock /above/ the SQL layer.
 For typical database operations, SQL would not even get generated in
 the first place!
 
 Consider for example code containing
   $db->insert( $param1, $param2, ... );
 
 The mock db's insert function would compare $param1, $param2,
 ... against the invocations the test setup injected. If there is no
 match, the test fails. If there is a match, the mock returns the
 corresponding return value right away.
 No generating SQL.
 No call to $db->tableName.
 No call to $db->makeList.
 No call to $db->query.
 No nothing. \o/
 
 But maybe you hinted at DatabaseBase::query?
 DatabaseBase::query should not be used directly, and it hardly
 is. We can go straight for parameter comparison there as well. No
 need to parse the SQL.
 
 
 Unit testing is about decoupling and testing things in isolation. With
 DatabaseBase and the corresponding factories, MediaWiki has a layer
 that naturally decouples business logic from direct database access.
 
 Use the decoupling, Luke!
 
 Christian

I was thinking about it. Some operations are easy (insert a field, select
a field); you can perform the joins in PHP.
But you also need table schema knowledge: select('*', 'user'), an
insert() that generates a new id, an insert that violates the uniqueness
of a primary key, transactions...
It can obviously be done, but beware of the corner cases! As nifty as it
would be, it may take quite a bit more effort than expected.




Re: [Wikitech-l] Speed up tests, make @group Database smarter

2012-07-01 Thread Christian Aistleitner
Hi Platonides,

On Sat, Jun 30, 2012 at 03:45:14PM +0200, Platonides wrote:
 On 30/06/12 14:24, Christian Aistleitner wrote:
  [ Mocking the database ]
  
  [...]
  One would have to abstract database access above the SQL
  layer (separate methods for select, insert, ...) [...]
 
 You still need to implement some complex SQL logic.

One might think so, yes.
But as I said, one would mock /above/ the SQL layer.
For typical database operations, SQL would not even get generated in
the first place!

Consider for example code containing
  $db->insert( $param1, $param2, ... );

The mock db's insert function would compare $param1, $param2,
... against the invocations the test setup injected. If there is no
match, the test fails. If there is a match, the mock returns the
corresponding return value right away.
No generating SQL.
No call to $db->tableName.
No call to $db->makeList.
No call to $db->query.
No nothing. \o/

But maybe you hinted at DatabaseBase::query?
DatabaseBase::query should not be used directly, and it hardly
is. We can go straight for parameter comparison there as well. No
need to parse the SQL.
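
With PHPUnit's mock framework, this boils down to something like the
following sketch (the table and column names are made up; the code under
test would be handed $db instead of a real connection):

  // Build a DatabaseBase mock; no database connection gets opened.
  $db = $this->getMockBuilder( 'DatabaseBase' )
      ->disableOriginalConstructor()
      ->getMock();

  // The test fails unless insert() gets called exactly once with these
  // parameters; on a match, the canned return value comes back right
  // away, without any SQL being generated or run.
  $db->expects( $this->once() )
      ->method( 'insert' )
      ->with( 'page', array( 'page_title' => 'Foo' ) )
      ->will( $this->returnValue( true ) );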


Unit testing is about decoupling and testing things in isolation. With
DatabaseBase and the corresponding factories, MediaWiki has a layer
that naturally decouples business logic from direct database access.

Use the decoupling, Luke!

Christian


P.S.: As an example for decoupling and mocking in MW, consider
tests/phpunit/maintenance/backupTextPassTest.php:testPrefetchPlain. This
test is about dumping a wiki's database using prefetch.

The idea behind prefetch is to take an old dump and reuse texts from this
old dump instead of asking the database for every single text of the
new dump.

To test dumping using prefetch /without/ mocking, one would have to
set up an XML file for the old dump. This old dump's XML would get read,
parsed, interpreted, ... upon every single test invocation. Tedious and
time consuming. Upon each update of the XML format, we'd also have to
update the XML representation of the old dump. Yikes! [1]
Besides, it duplicates effort, as reading and interpreting dumps is
a separate concern that is already dealt with in isolation in
tests/phpunit/maintenance/backupPrefetchTest.php

So the handling of the old dump (reading, parsing, ...) has been mocked
out. All that's necessary for this is lines 143--153 and line 160 of
tests/phpunit/maintenance/backupTextPassTest.php:testPrefetchPlain

$prefetchMock is the mock for the prefetch (i.e.: old
dump). $prefetchMap models the expected parameters and return values
of the mocked method. So for example invoking
  $prefetchMock->prefetch( $this->pageId1, $this->revId1_1 )
yields
  Prefetch_1Text1.
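
In PHPUnit terms, the setup looks roughly like this (a sketch only; the
real code is in the test file itself, and using BaseDump as the mocked
class is an assumption here):

  // Each map entry: ( pageId, revId ) -> the text the mock hands back.
  $prefetchMap = array(
      array( $this->pageId1, $this->revId1_1, "Prefetch_1Text1" ),
  );

  $prefetchMock = $this->getMockBuilder( 'BaseDump' )
      ->disableOriginalConstructor()
      ->getMock();
  $prefetchMock->expects( $this->any() )
      ->method( 'prefetch' )
      ->will( $this->returnValueMap( $prefetchMap ) );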


[1] Yes, we had that situation recently, when the parentid tag got
introduced [2]. The XML dumps of
tests/phpunit/maintenance/backupTextPassTest.php and
tests/phpunit/maintenance/backupPrefetchTest.php were updated, so the
tests assert that both dumping and prefetching work with parentid. But
thanks to this decoupling, we did not have to touch the mock.

[2] See commit d04b8ceea67660485245beaa4aca1625cf2170aa
https://gerrit.wikimedia.org/r/#/c/10743/


-- 
quelltextlich e.U.  \\  Christian Aistleitner
Companies' registry: 360296y in Linz
Gruendbergstrasze 65a    Email:    christ...@quelltextlich.at
4040 Linz, Austria       Phone:    +43 732 / 26 95 63
                         Fax:      +43 732 / 26 95 63
                         Homepage: http://quelltextlich.at/
---



Re: [Wikitech-l] Speed up tests, make @group Database smarter

2012-06-30 Thread Christian Aistleitner
Hi Chris,

On Thu, Jun 28, 2012 at 08:25:05AM -0600, Chris McMahon wrote:
  P.S.: On a related note ... one could think about mocking the database
  as a whole for PHPUnit tests. Thereby, one would get rid of
  unnecessary database coupling for unit testing, get better
  control/detection of side effects, and really solve the database
  performance problem for unit tests in one go.
 
 
 I'd like to hear more about this.

Luckily enough, we're in a pretty much standard situation here.

Object A relies on an Object B that it receives from a factory C.

For unit testing Object A, you do not want to use a real, heavy-weight
Object B, but rather a fake Object B, say B'. B' does not need to
incorporate the business logic of B. Only for some well-defined cases,
B' should give the same answers that B would give. B' is an extremely
double-plus lightweight B. What B' should do, however, is bookkeeping:
which methods have been called with what parameters.

So when unit testing Object A using B'
- the tests typically run faster because
  - B' contains no business logic of B or objects B depends on.
  - B' already knows (instead of B having to compute them) the
correct return values for the defined method invocations.
- we can check which methods Object A invoked on B' and thereby assert
  that the expected functions (and only the expected ones) have been
  invoked on the factory-provided object [1].
- failed tests typically signify problems closely related to Object A,
  and no longer problems caused by Object X, that is used by Object Y,
  that ... is used by Object B.

In our case, Object A is the object we want to test.
B is an implementation of DatabaseBase.
B' is a mock implementation of DatabaseBase.
C is the load balancer / wfGetDB(...) / ... whatever layer you choose.

Typically, mocking the database is a huge issue, and hardly
feasible. One would have to abstract database access above the SQL
layer (separate methods for select, insert, ...) to be able to get a
meaningful and usable interface for the tests.

But wait!

We already have all that ;-)
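
Concretely, DatabaseBase plus wfGetDB() already provide the seam for
injecting B'. A minimal sketch (ArticleCounter and its method are made
up for illustration; DatabaseBase::selectField and wfGetDB are the real
interfaces):

  // A only ever talks to the DatabaseBase it is handed; it does not
  // care whether that is a real connection (B) or a mock (B').
  class ArticleCounter {
      private $db;

      public function __construct( DatabaseBase $db ) {
          $this->db = $db;
      }

      public function countArticles() {
          return $this->db->selectField( 'page', 'COUNT(*)', '', __METHOD__ );
      }
  }

  // Production wiring: new ArticleCounter( wfGetDB( DB_SLAVE ) );
  // Unit test wiring:  new ArticleCounter( $mockedDatabaseBase );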

Kind regards,
Christian


[1] Note that we do /not/ have to change B for this, as it all happens
on B'.


-- 
quelltextlich e.U.  \\  Christian Aistleitner
Companies' registry: 360296y in Linz
Gruendbergstrasze 65a    Email:    christ...@quelltextlich.at
4040 Linz, Austria       Phone:    +43 732 / 26 95 63
                         Fax:      +43 732 / 26 95 63
                         Homepage: http://quelltextlich.at/
---



Re: [Wikitech-l] Speed up tests, make @group Database smarter

2012-06-30 Thread Platonides
On 30/06/12 14:24, Christian Aistleitner wrote:
 In our case, Object A is the object we want to test.
 B is an implementation of DatabaseBase.
 B' is a mock implementation of DatabaseBase.
 C is the load balancer / wfGetDB(...) / ... whatever layer you choose.
 
 Typically, mocking the database is a huge issue, and hardly
 feasible. One would have to abstract database access above the SQL
 layer (separate methods for select, insert, ...) to be able to get a
 meaningful and usable interface for the tests.
 
 But wait!
 
 We already have all that ;-)

You still need to implement some complex SQL logic.




Re: [Wikitech-l] Speed up tests, make @group Database smarter

2012-06-28 Thread Chris McMahon


 P.S.: On a related note ... one could think about mocking the database
 as a whole for PHPUnit tests. Thereby, one would get rid of
 unnecessary database coupling for unit testing, get better
 control/detection of side effects, and really solve the database
 performance problem for unit tests in one go.


I'd like to hear more about this.
-Chris


Re: [Wikitech-l] Speed up tests, make @group Database smarter

2012-06-28 Thread Platonides
On 28/06/12 01:41, Christian Aistleitner wrote:
 Hi Daniel,
 
 On Wed, Jun 27, 2012 at 09:59:19PM +0200, Daniel Kinzler wrote:
 * MediaWikiTestCase will notice this group and use temporary tables
 instead of the wiki database's actual tables. The temporary tables are
 re-created for every test. This protects the wiki from modifications
 through test cases, and isolates tests. So far, so good.
 
 The picture you're painting here is a bit too pessimistic ...
 ... performance-wise.

Daniel, which db backend are you using for your tests?
It can matter where the data resides. Even for temporary tables, you can
end up paying the fsync() penalty of storing the data on disk (when
you really don't care about durability).


 Maybe attacking those parser tests directly will solve the
 database-caused performance problem for you without going through the
 trouble of adding read-only Database tests?

You can just run the parser tests with parserTests.php, which doesn't
recreate tables and is much faster.
(you'll need to apply https://gerrit.wikimedia.org/r/#/c/4111/ under MySQL)
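
A typical invocation (the exact path may differ between branches) would
be something like:

  php tests/parserTests.php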





[Wikitech-l] Speed up tests, make @group Database smarter

2012-06-27 Thread Daniel Kinzler
Hi all

PHPUnit tests that use the database should use @group Database. But that makes
them extremely slow. I'd like to alleviate this problem. Here's the situation:

@group Database does two things, as far as I understand:

* MediaWikiTestCase will notice this group and use temporary tables instead of
the wiki database's actual tables. The temporary tables are re-created for every
test. This protects the wiki from modifications through test cases, and
isolates tests. So far, so good.

* Jenkins will do one run for tests with @group Database, and a separate run for
tests *without* @group Database - the latter run happens without any
database connection. Any test that uses the database in any way, be it only
indirectly and for reading, will fail if it doesn't declare @group Database. Or
so it seems.
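
For reference, a test opts into this behaviour purely through its class
docblock; a minimal, made-up example:

  /**
   * @group Database
   */
  class SomethingTest extends MediaWikiTestCase {
      public function testSomething() {
          $dbw = wfGetDB( DB_MASTER );
          // Writes like this one end up in the temporary copies of the
          // tables, not in the wiki's real tables.
          $dbw->insert( 'somelog', array( /* ... */ ), __METHOD__ );
      }
  }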


There are a number of test cases (and there could and should be many, many
more) that use the database for reading only. It would be good to have a
@group DatabaseRead that does not enforce the slow and expensive creation of
temporary tables, but allows the tests to be run with a database connection in
place. This run could use a connection with a database user that has read-only
rights to the database.

If we decide to do this, there should be some way to define and re-create the
initial contents of the read-only database. Perhaps a maintenance script that
other parts of the code (and extensions) can register with could do the trick.

What do you think? Is this feasible?

-- daniel



Re: [Wikitech-l] Speed up tests, make @group Database smarter

2012-06-27 Thread Platonides
On 27/06/12 21:59, Daniel Kinzler wrote:
 If we decide to do this, there should be some way to define and re-create the
 initial contents of the read-only database. Perhaps a maintenance script that
 other parts of the code (and extensions) can register with could do the trick.
 
 What do you think? Is this feasible?
 
 -- daniel

How do you define a database read? What would they read if there had
been nothing written earlier?
The idea seems fine, but if we are not really interested in setting up
something to be read (i.e. we don't care about the read results), that
suggests that it should be possible to make the test run without a db
(e.g. see patchset 13037 for one change of that kind).




Re: [Wikitech-l] Speed up tests, make @group Database smarter

2012-06-27 Thread Christian Aistleitner
Hi Daniel,

On Wed, Jun 27, 2012 at 09:59:19PM +0200, Daniel Kinzler wrote:
 * MediaWikiTestCase will notice this group and use temporary tables instead of
 the wiki database's actual tables. The temporary tables are re-created for every
 test. This protects the wiki from modifications through test cases, and
 isolates tests. So far, so good.

The picture you're painting here is a bit too pessimistic ...
... performance-wise.

The tables are /not/ recreated for each test. Run for example the
tests in the maintenance subdirectory. 119 tests. 23 database
tests. But MediaWikiTestCase::initDB is called just /once/.
(Using MySQL backend)

However, the tables' data (a single user, and a single page with a
single revision) is re-injected for each test---but without removing
the data first.

And as a matter of fact, (currently) nothing in the way database unit
tests work isolates the database to a single test. (At least for MySQL)
everything happens on the very same database connection.
To properly isolate (at least table-wise), you'll have to use the
MediaWikiTestCase::tablesUsed array. :-(
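
In practice that means a test has to announce the tables it touches
itself, roughly like in this sketch (the table list is made up):

  /**
   * @group Database
   */
  class SomethingTest extends MediaWikiTestCase {
      // Tables listed here get reset between the individual tests;
      // anything else keeps whatever state earlier tests left behind.
      protected $tablesUsed = array( 'page', 'revision' );
  }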

But nonetheless, you are right: yes, the current situation around
Database tests is sub-optimal ;-)

 [ Fighting unit test slowness by separate read-only Database tests ]
 What do you think? Is this feasible?

While your approach is certainly feasible, I am not too sure whether
it will cut off much run time.

Of the ~950 Database tests:
 ~70 do not carry out SQL statements [1] (most of them probably
     could be treated with or without your approach)
 ~30 carry out at least one read, but no writes [1] (obviously,
     they benefit from your approach)
~850 would have to be inspected by hand to see whether they can benefit
     from your approach. And one would have to merge their data into a
     /single/ fixture :-(

Note however, that of those ~950 Database tests, ~700 are parser
tests. So gaining considerable speed depends on the handling of those.

Maybe attacking those parser tests directly will solve the
database-caused performance problem for you without going through the
trouble of adding read-only Database tests?

Kind regards,
Christian


P.S.: On a related note ... one could think about mocking the database
as a whole for PHPUnit tests. Thereby, one would get rid of
unnecessary database coupling for unit testing, get better
control/detection of side effects, and really solve the database
performance problem for unit tests in one go.


[1] In addDBData / the test function itself.


-- 
quelltextlich e.U.  \\  Christian Aistleitner
Companies' registry: 360296y in Linz
Gruendbergstrasze 65a    Email:    christ...@quelltextlich.at
4040 Linz, Austria       Phone:    +43 732 / 26 95 63
                         Fax:      +43 732 / 26 95 63
                         Homepage: http://quelltextlich.at/
---

