> to be able to retrieve information from man pages, especially existing
> options and their help strings.

I'd actually try parsing the man page sources first. They are already in a layout language, so finding the options should be easy; there is a man macro for that, I believe. But I've never tried it!

If that turns out to be harder than I think, then an XML option would be next best, but I don't know of a man2xml command.

Finally, the man2html route is definitely possible, and BeautifulSoup should be able to find the option tags for you. It may find many others besides, so you might need some clever selection criteria, but the parsing at least should be done for you.

> could parse directly the troff sources (is there already a parser
> for that?),

I don't know if there is a parser, but ISTR there is a specific option tag in the man macros for command options, so it should be easy to find and extract the data by looking for .OP or whatever the tag is. The good news is that troff macros are nearly always located at the margin and start with a dot, so they are easy to locate using a simple regex.

Actually, I just had a quick look and it's not so good after all. The format seems to be:

    .SH Options
    ......text in here
    .B <option>

But worse, not all pages follow the official format, as exemplified in the "yes" man page; some don't even have an Options subheading...

Another option (sic) to consider is parsing the .cat files that are produced by man on first use (a bit like Python produces .pyc files). They are plain text, so they might be easier to search.

Finally, you could look at the info files (generated from Texinfo sources), if they exist for all your commands. But I think an xml/html solution is looking better!

HTH,

Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld
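
PS. A rough sketch of the "macros start with a dot at the margin" idea, in case it helps. It only assumes the layout described above (an Options section heading followed by .B lines); the names MACRO_RE, OPTION_RE, find_options and the SAMPLE page are made up for illustration, and this is in no way a real troff parser:

```python
import re

# A macro line starts with a dot at the left margin, e.g. ".SH OPTIONS".
# Comment lines such as '.\" text' deliberately do not match.
MACRO_RE = re.compile(r"^\.\s*([A-Za-z]{1,2})\b[ \t]*(.*)$")

# Something that looks like a command-line option: -n, --verbose, ...
OPTION_RE = re.compile(r"(?<![\w-])--?[A-Za-z0-9][\w-]*")


def find_options(troff_source):
    """Collect option names from bold macros inside an Options section."""
    options = []
    in_options = False
    for line in troff_source.splitlines():
        match = MACRO_RE.match(line)
        if not match:
            continue  # ordinary text line, not a macro
        macro, args = match.group(1), match.group(2)
        if macro == "SH":
            # A new section heading: are we entering (or leaving) Options?
            in_options = args.strip().strip('"').upper() == "OPTIONS"
        elif in_options and macro in ("B", "BR", "BI"):
            # troff writes a literal hyphen as "\-"; undo that first.
            text = args.replace("\\-", "-")
            options.extend(OPTION_RE.findall(text))
    return options


# A tiny made-up man page source to try it on.
SAMPLE = r'''.TH DEMO 1
.SH NAME
demo \- a made-up command
.SH OPTIONS
.TP
.B \-n
Do not print the trailing newline.
.TP
.B \-\-verbose
Say more.
.SH BUGS
.B \-x
is mentioned outside the Options section.
'''

if __name__ == "__main__":
    print(find_options(SAMPLE))  # prints ['-n', '--verbose']
```

Pages that put the option inline with font escapes (\fB\-a\fR after a .TP) or that have no Options heading at all will slip straight through this, which is exactly the "not all pages follow the official format" problem.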
