If all you need is the contents of the <title> tag, I'd just get the file size (or pick a byte count large enough to be sure the title is read), calloc() that much memory plus one byte for the terminating NUL, read the file in as a single string, use strstr() to find <title> and </title>, and copy out what lies between the two pointers strstr() returns.
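A minimal sketch of that approach, under some assumptions: the function names title_from_buffer() and title_from_file() and the max_bytes parameter are made up for illustration, the match is case-sensitive, and a tag with attributes (such as <title lang="en">) or an upper-case <TITLE> will not be found. The error checks make it somewhat longer than the bare idea.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copy the text between the first "<title>" and the next "</title>" in the
 * NUL-terminated string html into out (at most outsz - 1 bytes, truncating
 * if needed). Returns 0 on success, -1 if the tag pair is not found.
 * Hypothetical helper name, not part of any library. */
int title_from_buffer(const char *html, char *out, size_t outsz)
{
    static const char open_tag[] = "<title>";
    const char *start, *end;
    size_t len;

    assert(html != NULL && out != NULL && outsz > 0);

    start = strstr(html, open_tag);
    if (start == NULL)
        return -1;
    start += sizeof open_tag - 1;        /* skip past "<title>" */

    end = strstr(start, "</title>");
    if (end == NULL)
        return -1;

    len = (size_t)(end - start);
    if (len >= outsz)
        len = outsz - 1;                 /* truncate to fit the caller's buffer */
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}

/* Read up to max_bytes from path into calloc'd memory and extract the title.
 * max_bytes is an assumed cap; pass the file size to read the whole file.
 * Returns 0 on success, -1 on I/O failure or if no title was found. */
int title_from_file(const char *path, size_t max_bytes, char *out, size_t outsz)
{
    FILE *fp;
    char *buf;
    size_t n;
    int rc;

    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    buf = calloc(max_bytes + 1, 1);      /* +1 leaves room for the NUL */
    if (buf == NULL) {
        fclose(fp);
        return -1;
    }

    n = fread(buf, 1, max_bytes, fp);
    buf[n] = '\0';                       /* strstr() needs a terminated string */
    fclose(fp);

    rc = title_from_buffer(buf, out, outsz);
    free(buf);
    return rc;
}
```

A caller would declare something like char title[256] and call title_from_file("page.html", 65536, title, sizeof title), where the file name and the 64 KB cap are placeholders. A NUL byte embedded in the file ends the search early, since strstr() stops at the first NUL.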
For something that simple this is much easier: you don't need to link libxml2, etc. Yes, I am the contrary one who is always looking for the quick and easy way. The above requires no link changes in the Makefile and 10 lines of code in your C program.

E

Eric S Eberhard
VICS (Vertical Integrated Computer Systems)
Voice: 928 567 3529
Cell : 928 301 7537 (not reliable except for text or if not home)
2933 W Middle Verde Rd
Camp Verde, AZ 86322

-----Original Message-----
From: xml [mailto:xml-boun...@gnome.org] On Behalf Of Liam R E Quin
Sent: Thursday, August 09, 2018 7:23 PM
To: James Read <jamesread5...@gmail.com>; xml@gnome.org
Subject: Re: [xml] Extract title from html file

On Fri, 2018-08-10 at 02:46 +0100, James Read via xml wrote:
> I have a bunch of html files on disk and want to open them and extract
> the contents of the title tag using libxml2.

By this do you mean the title element in the head?

You can use XPath on an XML document to extract /html/head/title but you may need to use the HTML reader, as most HTML files are not well-formed XML syntactically.

You can experiment first with
xmllint --xpath /html/head/title foo.xml
and see what happens.

If "a bunch" means tens of thousands of HTML files and you do this often, consider a tree store such as dbxml or (much easier to get started with, I think) BaseX, so that there's an element index (or btree) and retrieval might be orders of magnitude faster.

Liam

--
Liam Quin, https://www.holoweb.net/liam/cv/
Web slave for vintage clipart http://www.fromoldbooks.org/
Available for XML/Document/Information Architecture/
XSL/XQuery/Web/Text Processing/A11Y work & consulting.
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml