Hi,

my main advice up-front: if you want good advice, give good details about the problem you face and enough background to let others understand what the real problem is and what your environmental constraints are.


Karjer Jdfjdf, 15.04.2010 08:03:
I know the theory of XML but have never used it really and
I'm a bit unsecure about it.

At least in Python, XML is trivial to use, as long as you use proper XML tools. It's a bit more tricky in Java, but there are some usable tools there, too. From what you write about the program, it's rather unlikely that is uses them, though.


Basically I'm doing the following:

1. retrieve data from a database ( instance in q )
2. pass the data to an external java-program that requires file-input

XML input, I assume. So this is the part where you generate your XML output from database input. Fair enough.


3. the java-program modifies the inputfile and creates an outputfile based on 
the inputfile

Is this program outside of your control or do you have the sources?

Does it do anything really complex and tricky, or could it be reimplemented in a couple of lines in Python? The mere fact that a Java program "is huge and takes minutes to run" doesn't mean much in general. That's often just due to a bad design.


4. I read the outputfile and try to parse it.

Is the output file a newly created XML file or does the Java program do text processing on the input file?


1 to 3 are performed by a seperate program that creates the XML
4 is a program that tries to parse it (and then perform other
modifications using python)

When I try to parse the outputfile it creates different errors such as:
    * ExpatError: not well-formed (invalid token):

Basically it ususally has something to do with not-well-formed XML.

I.e. not XML at all.


Unfortunately the Java-program also alters the content on essential
points such as inserting spaces in tags (e.g. id="value" to id = " value " ),

Any XML parser will handle the whitespace around 'id' and '=' for you, but the fact that the program alters the content of an 'id' attribute seems rather frightening.


The Java is really a b&%$#!, but I have
no alternatives because it is custommade (but very poorly imho).

You should make sure it's really carved into stone. Could you give a hint about what the program actually does?


This means trying a few little things takes hours.
Because of the long load and processing time of the Java-program I'm forced
to store the output in a single file instead of processing it record by record.

Maybe you could reduce the input file size for testing, or does it really take 10 Minutes to run a small file through it?


Also each time I have to change something I have to modify functions in
different libraries that perform specific functions. This probably means
that I've not done it the right way in the first place.

Not sure I understand what you mean here.


      text = str('<record id="' + str(instance.id)+ '">\n' + \
'<date>' + str(instance.datetime) + '</date>\n' + \
'<order>' + instance.order + '</order>\n' + \
'</record>\n')

You can simplify this quite a lot. You almost certaionly don;t need
the outer str() and you probably don;t need the \ characters either.

I use a very simplified text-variable here. In reality I also include
other fields which contain numeric values as well. I use the \ to
keep each XML-tag on a seperate line to keep the overview.

You should really use a proper tool to generate the XML. I never used jaxml myself, but I guess it would do the job at hand. If the file isn't too large (a couple of MB maybe), you can also go with cElementTree and just create the file in memory before writing it out.


Also it might be easier to use a triple quoted string and format
characters to insert the dasta values.

That's certainly good advice if you really want to take the "string output" road. Check the tutorial for '%' string formatting.


Did you try to use the standard library tools that come with Python,
like elementTree or even sax?

I've been trying to do this with minidom, but I'm not sure if this
is the right solution because I'm pretty unaware of XML-writing/parsing

If anything, use cElementTree instead.


At the moment I'm tempted to do a line-by-line parse and trigger on
an identifier-string that identifies the end and start of a record.
But that way I'll never learn XML.

If the Java program really can't be forced into outputting XML, running an XML parser over the result is doomed to fail. XML is a very well specified format and a compliant XML parser is bound to reject anything that violates the spec.

Stefan

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Reply via email to