> So I got my code working now and it looks like this
>
> TAG = '{http://www.mediawiki.org/xml/export-0.10/}page'
> doc = etree.iterparse(wiki)
>
> for _, node in doc:
>     if node.tag == TAG:
>         title = node.find("{http://www.mediawiki.org/xml/export-0.10/}title").text
>         if title in page_titles:
>             print (etree.tostring(node))
>         node.clear()
>
> Its mostly giving me what I want. However it is adding extra formatting (I
> believe name_spaces and attributes). I was wondering if there was a way to
> strip these out when I'm printing the node tostring?
I suspect that you'll want to do an explicit walk over the node. etree.tostring() indiscriminately serializes the entire subtree, namespaces and attributes included, so instead you'll probably want to write a function that walks over selected portions of the tree structure. See:

    https://docs.python.org/2/library/xml.etree.elementtree.html#tutorial

for an introduction to navigating portions of the tree, given a node.

As a more general response: you have significantly more information about the problem than we do. At the moment, we don't have enough context to help effectively. Do you have a sample *input* file that folks here can run your program on? Providing sample input matters for reproducibility: it puts us all on the same footing about what the problem's inputs are. See:

    http://sscce.org/

As for the form of the desired output: can you say more precisely which parts of the document you want? Rather than saying "this doesn't look the way I want it to", it is more helpful to say "here's *exactly* what I'd like it to look like..." and show us the desired text output.

By expressing what you want as a collection of concrete input/output examples, you gain an added benefit: once you have revised your program, you can re-run it and see whether it produces what you expect. That is, you can use these concrete examples as a "regression test suite". Software engineers use this technique regularly in their day-to-day work: they make believe that their incomplete programs already work, write out explicitly what those programs should do, and then work on the programs until they do it.

Good luck to you!
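For what it's worth, here is a rough sketch of what such an explicit walk might look like, paired with one concrete input/output example used as a tiny regression test. It is only an illustration under assumptions: I'm guessing that you want just the title and text of each matching page, and the helper names (local_name, page_fields, extract_pages) and the SAMPLE document are made up for this example. Only the namespace string comes from your code.

```python
import io
import xml.etree.ElementTree as etree

# Namespace taken from the code quoted above.
NS = '{http://www.mediawiki.org/xml/export-0.10/}'


def local_name(tag):
    """Return the tag without its '{namespace}' prefix, if it has one."""
    return tag.rsplit('}', 1)[-1]


def page_fields(page, wanted=('title', 'text')):
    """Walk a <page> element and collect the text of selected descendants.

    `wanted` is an assumed choice; adjust it to the parts you need.
    Only the first occurrence of each name is kept.
    """
    fields = {}
    for child in page.iter():
        name = local_name(child.tag)
        if name in wanted and name not in fields:
            fields[name] = child.text or ''
    return fields


def extract_pages(source, page_titles):
    """Stream through `source`, yielding fields of pages whose title matches."""
    for _, node in etree.iterparse(source):
        if node.tag == NS + 'page':
            fields = page_fields(node)
            if fields.get('title') in page_titles:
                yield fields
            node.clear()  # free memory once the page has been handled


# A tiny made-up sample standing in for a real dump.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Python</title>
    <revision><text xml:space="preserve">A programming language.</text></revision>
  </page>
  <page>
    <title>Perl</title>
    <revision><text xml:space="preserve">Another language.</text></revision>
  </page>
</mediawiki>"""

# The desired output for SAMPLE, written down before running anything.
EXPECTED = [{'title': 'Python', 'text': 'A programming language.'}]

result = list(extract_pages(io.BytesIO(SAMPLE), {'Python'}))
assert result == EXPECTED, result
```

With a real dump you would pass the file name (or an open file) instead of the io.BytesIO wrapper. Because the walk builds plain strings rather than serializing the element, no namespace prefixes or attributes end up in the output, and the SAMPLE/EXPECTED pair can be re-run after every change to the program.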