The sx command is available as osx from the OpenJade package in many Linux distributions.

On Sep 21, 2012 12:29 PM, "DM Smith" <dmsm...@crosswire.org> wrote:
> So far the discussion is around whether the xml is well-formed.
> Once you get that working, then you need to make sure it is valid wrt the
> OSIS schema.
>
> There's an old tool that will convert sgml to well-formed xml. I think it
> was James Clark's "sx". I've used it successfully on initial conversions
> and getting something that will work within xml tools.
>
> Finally, OSIS has the notion of milestones for start and end elements.
> There are semantic rules regarding this that cannot be checked by standard
> xml tools. Osis2mod tries to handle this. When you get to that point, I can
> help unravel the logging options.
>
> The purpose of milestoned elements is to allow for two competing document
> models to be in the same xml document: BSP and BCV (names we've given it
> here and in the wiki).
>
> We recommend using BSP (book, chapter, section, paragraph, poetry, lists
> to all be containers, not milestoned) and verse elements be milestoned.
>
> Note, the OSIS manual says that if you have one element milestoned, then
> all other elements with the same tag name have to be milestoned.
> Practically speaking, this does not matter. SWORD and JSword don't care.
> Having verses milestoned only if necessary is probably a better way to
> create a good XML document. Start out with all of them as containers and
> each place where that causes a problem, either fix the xml or if otherwise
> correct, convert to milestoned verses.
>
> Generally speaking these BSP elements should not start just inside or at
> the end of a verse. Rather they should be between verse elements or within
> the text. When they are placed just after the verse start, they often will
> cause the verse number to be orphaned. When they are placed just before the
> verse end, then it is generally not noticeable (just bad form).
>
> Quotes will create the biggest grief in the above. They often cross
> boundaries.
> Certainly, the Beatitudes do, starting in one chapter and
> ending a couple of chapters later. For this reason, using the milestoned
> version is necessary.
>
> If your document follows some simple rules (some required by xml, others
> simplifications), then checking nesting is a simple matter of having a
> push/pop stack of elements. The simple rules:
> 1) All attributes when present have quoted values.
> 2) All entities are properly formed and used when needed. Also, < and >
> are not in attribute values.
> 3) Tags are marked with < ... >, </ ... >, or < ... />, and no new lines
> between < and >.
>
> If this is true then a simple perl script can be written to find the
> problems in the file:
> Look for < ... /> and skip them. They cause no problems.
> Look for < xxx ... > and push the tag name along with its location in the
> file on to the stack.
> Look for </ xxx >, compare xxx to the top element on the stack. If it
> matches, pop it. If it doesn't match, then it causes an error.
> When you get to the end of the document and the stack is not empty, then
> the elements on the stack are not closed properly.
>
> Printing out the stack (elements and locations) would help find what the
> problem is.
>
> For example:
> if xxx is deeper in the stack, then there is a problem with nesting. Look
> at all the elements above the xxx on the stack for problems.
> if it is not in the stack, then the element was not started prior to that
> point or it may have been ended twice.
>
> Here is a simple perl script (that I wrote), which doesn't do that, but
> could be adapted to do it. This creates a histogram/dictionary of tag and
> attribute names.
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> my %tags  = ();   # tag name => number of occurrences
> my %attrs = ();   # "tag.attribute" => number of occurrences
> while (<>)
> {
>     # While there is a start tag on the line
>     while (/<[^\/\s>]+[\/\s>]/)
>     {
>         # While there is an attribute in the tag
>         while (/<[^\/\s>]+\s+[^=\/>]+="[^"]+"/)
>         {
>             # remove the attribute
>             s/<([^\/\s>]+)\s+([^=\/>]+)(="[^"]+")(.*)/<$1 $4/;
>             my ($tag, $attr) = ($1, $2);
>             $attrs{"$tag.$attr"}++;
>         }
>         # remove the tag
>         s/<([^\/\s>]+)[\/\s>]//;
>         $tags{$1}++;
>     }
> }
>
> # List every tag name seen, then every tag.attribute pair
> foreach my $tag (sort keys %tags)
> {
>     print("$tag\n");
> }
>
> foreach my $attr (sort keys %attrs)
> {
>     print("$attr\n");
> }
>
> Hope this helps,
> DM
>
> On Sep 21, 2012, at 10:52 AM, Andrew Thule <thules...@gmail.com> wrote:
>
> Thanks everyone for suggestions. I'll give them all a try.
>
> That said, the emacs recommendation is nearly a religious conversion
> recommendation. (I'm on the vi side of the vi versus emacs debate. I
> suppose as long as it doesn't kill me I should give it a try, though I'm
> not certain what impact it will have on the health of my soul ... :D )
>
> ~A
>
>
> On Thursday, September 20, 2012, Daniel Owens wrote:
>
>> I use jEdit with the XML plugin installed. I find it helps me find
>> problems fairly easily.
>>
>> Daniel
>>
>> On 09/20/2012 05:26 PM, Greg Hellings wrote:
>>
>>> There are a number of pieces of software out there that will
>>> pretty-print the XML for you, with indenting and whatnot. Overly
>>> indented for what you would want in production but decent for
>>> debugging mismatching nesting and the like.
>>>
>>> For example, 'xmllint --format' will properly indent the file, etc. I
>>> don't know how it will handle poorly formed XML.
>>>
>>> GUI editors can do wonders as well. On Windows I use Notepad++ and
>>> manually set it to display XML. gEdit and Geany - I believe - both
>>> support similar display modes.
>>> And there are some plugins for Eclipse
>>> that might handle what you need as well.
>>>
>>> --Greg
>>>
>>> On Thu, Sep 20, 2012 at 4:19 PM, Karl Kleinpaste <k...@kleinpaste.org>
>>> wrote:
>>>
>>>> Andrew Thule <thules...@gmail.com> writes:
>>>>
>>>>> One of my least favourite things is finding mismatched tags in OSIS.xml
>>>>> files
>>>>> Has anyone successfully climbed this summit?
>>>>>
>>>> XEmacs and xml-mode (and font-lock-mode). M-C-f and M-C-b execute
>>>> sgml-forward-element and -backward-. That is, sitting at the beginning
>>>> of <tag>, M-C-f (meta-control-f) moves forward to the matching </tag>,
>>>> properly handling nested tags.
>>>>
>>>> _______________________________________________
>>>> sword-devel mailing list: sword-devel@crosswire.org
>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>> Instructions to unsubscribe/change your settings at above page
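
For the push/pop stack check that DM describes in the quoted message, a minimal Perl sketch might look like the following. It is only an illustration under the three "simple rules" above (quoted attribute values, no < or > inside attribute values, no newline between < and >). The name check_nesting, the wording of the messages, and the sample OSIS fragment are assumptions made up for this example; none of it comes from osis2mod or any other SWORD tool.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Check tag nesting with a push/pop stack and return a list of problems.
# check_nesting is a hypothetical helper name used only for this sketch.
sub check_nesting {
    my ($text) = @_;
    my @stack;       # open elements, each stored as [tag name, line number]
    my @problems;    # human-readable descriptions of what went wrong
    my $line_no = 0;

    foreach my $line (split /\n/, $text) {
        $line_no++;
        # Visit every tag on the line: $1 is "/" for an end tag, $2 is the
        # tag name, $3 is "/" for an empty tag. Comments and processing
        # instructions (<!...> and <?...>) are skipped by the name pattern.
        while ($line =~ m{<(/?)([^\s/>!?]+)[^>]*?(/?)>}g) {
            my ($end, $name, $empty) = ($1, $2, $3);
            next if $empty;                      # < ... /> causes no problems
            if (!$end) {
                push @stack, [$name, $line_no];  # start tag: record its location
                next;
            }
            # End tag: it should match the element on top of the stack.
            if (@stack && $stack[-1][0] eq $name) {
                pop @stack;
                next;
            }
            # Mismatch: find the nearest open element with that name, if any.
            my ($depth) = grep { $stack[$_][0] eq $name } reverse 0 .. $#stack;
            if (defined $depth) {
                # Deeper in the stack: the elements above it are the suspects.
                my @open = map { "<$_->[0]> (line $_->[1])" }
                           @stack[$depth + 1 .. $#stack];
                push @problems, "line $line_no: </$name> closes over unclosed "
                              . join(', ', @open);
                splice @stack, $depth;           # discard them and carry on
            } else {
                push @problems,
                    "line $line_no: </$name> was never opened, or was closed twice";
            }
        }
    }
    # Whatever is still on the stack at the end was never closed.
    push @problems, "end of document: <$_->[0]> opened on line $_->[1] is not closed"
        for reverse @stack;
    return @problems;
}

# Made-up sample: <q> and <p> overlap on the second line.
my $sample = <<'XML';
<div type="chapter">
<p><verse sID="Gen.1.1"/>In the beginning <q>God created</p></q>
</div>
XML
print "$_\n" for check_nesting($sample);
```

On the sample this reports two problems, both on line 2: the </p> that closes over the still-open <q>, and the </q> that no longer has a matching start tag. To check a real file, read it into a string and pass that to check_nesting. Recovering from a mismatch by discarding the elements above the match is one choice among several; stopping at the first error would be simpler but reports less.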