> I have done some manual WAL decoding for my forensic software that can > identifiy a previous DB state - its fun :)... > > (From memory) > > To determine which pages belong to the last transaction, you need to : > > Read the WAL header to obtain the current salt, then read each wal > frame to determine which frames belong to the current transaction and > then you can get the page number from the frame. > > To determine which table/index the page belongs to you will need to > check each b-tree (making sure that you take account of pages that are > in the current WAL checkpoint - i.e. exist in the WAL prior to the > page you are looking for). > > You will then need to manually decode each page you have identified in > the WAL to get the rows within the page and then you need to determine > what rows have been added/deleted or altered. > > The issues here are that a small change to a table (one row > added/edited) can result in many changes to the pages for that table, > e.g. if a string is appended to then the row may no longer fit in the > page that it currently occupies so the row will be moved. Depending on > the key structure for the table this may also result in other rows > being moved to maintain the structure of the B-Tree. So if you read a > page from the WAL and then find the* previous version of that page > (which may be earlier in the WAL or in the DB) and a row is not > present you will need to parse all the changed pages in that tree to > determine whether it has been deleted or just moved. > > (* I say THE previous page because you can and very often do have > multiple copies of a page in the WAL.) > > All of the information you need is in the DB file format at the link > provided earlier. It is however not as simple as just reading the > format spec though, it took me a lot of experimentation (including > writing some DB/WAL page level visualisation tools) to work out what > is actually going on.
Thanks for the info. I think this is what I need. Can you comment on time complexity in this method?