Hi again,

My data are big. I'm trying to extract subsets from the Medline database, and I also have to read CDATA sections and store the ones I am interested in. My current version of the code simply reads the XML elements and stores them, and that takes 13 hours to process, which is not good at all :S
Thanks,
Ashjan

On Sat, 6 Jul 2019 at 13:00, <[email protected]> wrote:

> Today's Topics:
>
>    1. Re: Xml Question (Eric Eberhard)
>    2. Re: Xml Question (Liam R E Quin)
>    3. Re: Xml Question (Eric Eberhard)
>    4. Re: Xml Question (Eric Eberhard)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 5 Jul 2019 12:18:41 -0700
> From: "Eric Eberhard" <[email protected]>
> Subject: Re: [xml] Xml Question
>
> Dear Ashjan,
>
> If it was me I'd do it the cheap way and not use the parser. Get the file
> and then read through it with your favorite language and look for starting
> tags you want moved, then scan until you hit the ending tag, write that out.
> Rinse and repeat. You can use the parser on each piece you write out.
>
> It is surely possible to do it in both ways described and I know of another
> that works on small files. But this is a LOT easier.
>
> Eric
>
> -----Original Message-----
> From: xml [mailto:[email protected]] On Behalf Of Liam R E Quin
> Sent: Thursday, July 04, 2019 6:28 AM
> Subject: Re: [xml] Xml Question
>
> On Thu, 2019-07-04 at 10:33 +0100, Ashjan Alsulaimani wrote:
> > What's the best way to approach such a task and the most efficient way
> > as I'm dealing with Medline database!
>
> If your input files are a few hundred megabytes or less, start with the XSLT
> identity transform and add empty templates to match what you want to delete.
>
> If your input is over a gigabyte (say) or you do lots of different subsets
> of the same document, you may find XQuery update works better for you, with
> a database (e.g. BaseX or eXist-db).
>
> Liam
>
> --
> Liam Quin, https://www.delightfulcomputing.com/
>
> ------------------------------
>
> Message: 2
> Date: Fri, 05 Jul 2019 17:24:05 -0400
> From: Liam R E Quin <[email protected]>
> Subject: Re: [xml] Xml Question
>
> On Fri, 2019-07-05 at 12:18 -0700, Eric Eberhard wrote:
> > Dear Ashjan,
> >
> > If it was me I'd do it the cheap way and not use the parser.
>
> Make sure to handle markup in comments and CDATA sections properly, and
> to process external files included with XInclude or by entities defined
> in the DTD.
>
> Working with XML at the text level can be reasonably safe if you know
> the input files well, and yes, I sometimes do it too, but cheap isn't
> the same as good :)
>
> Liam
>
> ------------------------------
>
> Message: 3
> Date: Fri, 5 Jul 2019 14:49:01 -0700
> From: "Eric Eberhard" <[email protected]>
> Subject: Re: [xml] Xml Question
>
> Your answer is spot on. I don't know if he has markup and CDATA or if his
> files are large. If none of those are true, cheap is good :-) If it is a
> gig file with CDATA and markup, cheap would be bad.
>
> E
>
> ------------------------------
>
> Message: 4
> Date: Fri, 5 Jul 2019 14:57:57 -0700
> From: "Eric Eberhard" <[email protected]>
> Subject: Re: [xml] Xml Question
>
> Oh -- if it is a smaller file, here is some cheap code that works fine. You
> will have to create a new document for each smaller piece and then copy the
> pieces over like so:
>
>     for (cur = fromwrk->cur; cur; cur = cur->next) {
>         /* deep-copy the node together with its subtree */
>         tmp = xmlCopyNode(cur, 1);
>         /* attach the copy under the target document's current node */
>         xmlAddChild(towrk->cur, tmp);
>     }
>
> From being your original file and cur being your current little file.
>
> E
>
> ------------------------------
>
> End of xml Digest, Vol 180, Issue 4
> ***********************************
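For reference, the text-level approach Eric describes in the quoted digest (look for a starting tag, scan until the ending tag, write that piece out) might look roughly like the sketch below. It is only an illustration under stated assumptions, not code from the thread: the function name next_element is hypothetical, the input is assumed to be already loaded into a NUL-terminated buffer, the wanted element is assumed never to nest inside itself or to be self-closing, and, as Liam warns, tag text that appears inside comments or CDATA sections is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libxml2): find the next complete
 * <name ...>...</name> element in a NUL-terminated buffer, searching
 * from `from`. On success, returns a pointer to the '<' of the start
 * tag and stores the element's length in bytes (end tag included) in
 * *len. Returns NULL when no further complete element is found.
 *
 * Assumptions: the element never nests inside itself, is never
 * self-closing, and its tag text never occurs inside a comment or a
 * CDATA section. */
const char *next_element(const char *from, const char *name, size_t *len)
{
    assert(from != NULL && name != NULL && len != NULL);
    assert(name[0] != '\0');

    size_t nlen = strlen(name);
    const char *p = from;

    /* Look for "<name" followed by '>' or whitespace, so that
     * "<Article" does not match "<ArticleTitle". */
    while ((p = strchr(p, '<')) != NULL) {
        if (strncmp(p + 1, name, nlen) == 0) {
            char c = p[1 + nlen];
            if (c == '>' || c == ' ' || c == '\t' || c == '\n' || c == '\r')
                break;
        }
        p++;
    }
    if (p == NULL)
        return NULL;

    /* Scan forward until the matching "</name>" end tag. */
    const char *q = p;
    while ((q = strstr(q, "</")) != NULL) {
        if (strncmp(q + 2, name, nlen) == 0 && q[2 + nlen] == '>') {
            *len = (size_t)(q + 2 + nlen + 1 - p);
            return p;
        }
        q += 2;
    }
    return NULL; /* start tag seen, but no end tag in this buffer */
}
```

A caller would loop, passing the returned pointer plus *len as the next `from`, and write each piece out (or hand it to the parser). For a file too large to hold in memory, the scan would have to run over chunks and carry partial elements across chunk boundaries, which this sketch does not attempt.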
_______________________________________________ xml mailing list, project page http://xmlsoft.org/ [email protected] https://mail.gnome.org/mailman/listinfo/xml
