If all you need is the <title> tag, I'd just get the file size (or enough
bytes to be sure the title is included), calloc a buffer of that size plus
one byte for the terminating NUL, read the file in as a single string, use
strstr() to find <title> and </title>, and take whatever lies between the
two pointers.
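
A minimal sketch of that approach, under some assumptions: the function
names extract_title() and read_file() are hypothetical, the file has no
embedded NUL bytes, and the tag is written exactly as lowercase <title>
with no attributes. strstr() is case-sensitive, so <TITLE> or
<title lang="en"> would not match.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of the text between the
 * first <title> and the following </title>, or NULL if either tag is
 * missing or allocation fails. The caller frees the result. */
char *extract_title(const char *html)
{
    static const char open_tag[] = "<title>";
    static const char close_tag[] = "</title>";

    assert(html != NULL);

    const char *start = strstr(html, open_tag);
    if (start == NULL)
        return NULL;
    start += sizeof open_tag - 1;          /* skip past "<title>" */

    const char *end = strstr(start, close_tag);
    if (end == NULL)
        return NULL;

    size_t len = (size_t)(end - start);
    char *title = malloc(len + 1);
    if (title == NULL)
        return NULL;
    memcpy(title, start, len);
    title[len] = '\0';
    return title;
}

/* Hypothetical helper: read a whole file into a NUL-terminated buffer,
 * or return NULL on error. The caller frees the result. */
char *read_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    if (fseek(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return NULL;
    }
    long size = ftell(fp);
    if (size < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);

    /* calloc zero-fills, so the extra byte is already the terminator. */
    char *buf = calloc((size_t)size + 1, 1);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }
    size_t n = fread(buf, 1, (size_t)size, fp);
    buf[n] = '\0';                          /* terminate after bytes read */
    fclose(fp);
    return buf;
}
```

To use it, pass the buffer from read_file() to extract_title() and free
both results when done. For anything beyond well-behaved files (mixed-case
tags, attributes on the tag, entities in the title), a real HTML parser is
likely the safer choice.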

For something that simple this is much easier: you don't need to link
libxml2, etc.

Yes, I am the contrary one that is always looking for the quick and easy way
-- the above will require no changes to the link Makefile and 10 lines of
code in your C program.

E


Eric S Eberhard
VICS (Vertical Integrated Computer Systems)
Voice: 928 567 3529
Cell    : 928 301 7537  (not reliable except for text or if not home)
2933 W Middle Verde Rd
Camp Verde, AZ  86322

-----Original Message-----
From: xml [mailto:xml-boun...@gnome.org] On Behalf Of Liam R E Quin
Sent: Thursday, August 09, 2018 7:23 PM
To: James Read <jamesread5...@gmail.com>; xml@gnome.org
Subject: Re: [xml] Extract title from html file

On Fri, 2018-08-10 at 02:46 +0100, James Read via xml wrote:
> I have a bunch of html files on disk and want to open them and extract 
> the contents of the title tag using libxml2.

By this do you mean the title element in the head?

You can use XPath on an XML document to extract /html/head/title, but you
may need to use the HTML reader, as most HTML files are not syntactically
well-formed XML. You can experiment first with
xmllint --xpath /html/head/title foo.xml and see what happens.

If "a bunch" means tens of thousands of HTML files and you do this often,
consider a tree store such as dbxml or (much easier to get started with, I
think) BaseX, so that there's an element index (or btree) and retrieval
might be orders of magnitude faster.

Liam


--
Liam Quin, https://www.holoweb.net/liam/cv/ Web slave for vintage clipart
http://www.fromoldbooks.org/ Available for XML/Document/Information
Architecture/ XSL/XQuery/Web/Text Processing/A11Y work & consulting.

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/ xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml

