I'm trying to write a parser for the Mozilla/Firefox history file. The format of the file is very ugly.
The file starts off with a comment marking the version:

    // <!-- <mdb:mork:z v="1.4"/> -->

Then you get the table that defines what the fields mean:

    < <(a=c)> // (f=iso-8859-1)
    (8A=Typed)(8B=LastPageVisited)(80=ns:history:db:row:scope:history:all)
    (81=ns:history:db:table:kind:history)(82=URL)(83=Referrer)
    (84=LastVisitDate)(85=FirstVisitDate)(86=VisitCount)(87=Name)
    (88=Hostname)(89=Hidden)>

After that you start to get into the data:

    <(80=http://www.google.com/)(9442=1128471102815097)
    (81=1124134918854512)(82=www.google.com)(83 =L$00o$00r$00e$00n$00 $00B$00a$00n$00d$00i$00e$00r$00a$00$19 s$00 $00w$00e\ $00b$00l$00o$00g$00)(919E=171)(86=1)>

This is where I start to run into problems. I want to extract that block of data, which appears to be in the format:

    <(key=value)(...repeating pattern...)>

I read the file into a string and then get rid of the first-line comment. Next I use the following Regex to get the key table:

    Regex keyTable = new Regex (@"\s*<\(a=c\)>\s*(?:\/\/)?\s*(\(.+?\))\s*>",
                                RegexOptions.Compiled | RegexOptions.Singleline);
    m = keyTable.Match (morkData);

I can then use the Match and parse the table fine. The next thing I do is create a substring that starts where the key table ends and runs to the end of the data I read from the file. I then use the following Regex to pull out the value table:

    Regex valueTable = new Regex (@"<\s*(\(.+?\))\s*>",
                                  RegexOptions.Compiled | RegexOptions.Singleline);
    sub = morkData.Substring (pos);
    m = valueTable.Match (sub);

This doesn't work at all. I get a chunk of the data (around 3623 bytes) but I'm expecting more like 800,000. The strange thing is that the last part of the string I get back is:

    "// <!-- <mdb:mork:z v="1.4"/> -->< <(a=c)>"

That shouldn't even be there, and I'm not sure where it is coming from. I get the same output on Mono as I do with MS.NET, so it appears the problem is something I'm doing.
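
Since the block is just a run of (key=value) cells, one possible alternative to a single large regex is a small hand-written scanner. The following is only a minimal sketch, not something taken from mork.pl or demork.py. The names MorkCellScanner and ReadCells are made up for illustration, and it assumes that a backslash inside a cell escapes the next character, so that "\)" does not end the cell. It does not decode $XX sequences and does not treat a backslash-newline continuation specially.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not part of any Mork library.
public static class MorkCellScanner
{
    // Returns the (key, value) cells found in text such as "<(80=abc)(86=1)>".
    // Assumption: inside a cell a backslash escapes the next character,
    // so "\)" is kept as a literal ')' instead of closing the cell.
    public static List<KeyValuePair<string, string>> ReadCells (string text)
    {
        var cells = new List<KeyValuePair<string, string>> ();
        int i = 0;
        while (i < text.Length) {
            if (text [i] != '(') {
                i++;                        // skip anything outside a cell
                continue;
            }
            i++;                            // step past '('
            var body = new StringBuilder ();
            while (i < text.Length && text [i] != ')') {
                if (text [i] == '\\' && i + 1 < text.Length)
                    i++;                    // drop the backslash, keep the escaped char
                body.Append (text [i]);
                i++;
            }
            if (i >= text.Length)
                break;                      // unterminated cell: stop without adding it
            i++;                            // step past ')'

            string cell = body.ToString ();
            int eq = cell.IndexOf ('=');
            if (eq > 0)                     // ignore cells that are not key=value
                cells.Add (new KeyValuePair<string, string> (
                    cell.Substring (0, eq).Trim (), cell.Substring (eq + 1)));
        }
        return cells;
    }
}
```

Calling MorkCellScanner.ReadCells on the whole substring after the key table would return every cell in one flat list. Splitting that list back into rows would still need the surrounding "<" and ">" (or row markers) to be tracked, which this sketch leaves out.
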
I've tried looking at some of the other solutions to this problem to see what they do:

    http://www.jwz.org/hacks/mork.pl
    http://off.net/~mhoye/moz/demork.py

But that didn't really help. Does anyone have any suggestions on how I can extract that value data ("<(key=value)(...)>") from the string?

-- 
Loren Bandiera, CISSP <[EMAIL PROTECTED]>
MMG Security, Inc.