RE: xml problem

2001-06-21 Thread Chas Owens
> > Try > [...]encoding='ISO-8859-4'[...] > > ISO-8859-1 (aka Latin-1) covers W. Europe, ISO-8859-4 is the specific > Scandinavian character set (almost, but not quite, the same as -1). > > If this does not work, have a look at using UTF-8 (but this means those > accented characters wil
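
For readers following the encoding suggestion above, here is a minimal sketch of what moving from ISO-8859-1 to UTF-8 does to the Swedish letters. It assumes a Perl with the core Encode module (5.8 or later), and the variable names are illustrative only; it is not code from the thread.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "åäö" as raw ISO-8859-1 bytes (one byte per letter).
my $latin1_bytes = "\xE5\xE4\xF6";

# Decode the bytes into Perl characters, then re-encode them as UTF-8.
my $text       = decode('ISO-8859-1', $latin1_bytes);
my $utf8_bytes = encode('UTF-8', $text);

# Each of these letters takes two bytes in UTF-8.
printf "%d Latin-1 bytes become %d UTF-8 bytes\n",
    length $latin1_bytes, length $utf8_bytes;
```

The point of the sketch is only that the same three letters have different byte representations, so the encoding declared in the XML header has to match the bytes actually stored in the file.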

RE: xml problem

2001-06-21 Thread Richard_Cox
Chas Owens [mailto:[EMAIL PROTECTED]] wrote: > On 21 Jun 2001 10:38:08 +0200, Morgan wrote: > > This script is excellent but I need the script to read the > letters "åäö" > > and "ÅÄÖ" too. > > Cuz this is part of my language (Swedish) and those > letters are in the > > articles. > I am working o

Re: xml problem

2001-06-21 Thread Chas Owens
On 21 Jun 2001 10:38:08 +0200, Morgan wrote: > This script is excellent but I need the script to read the letters "åäö" > and "ÅÄÖ" too. > Cuz this is part of my language (Swedish) and those letters are in the > articles. I am working on this, I don't understand what it is doing with them. If I ad

Re: xml problem

2001-06-21 Thread Morgan
This script is excellent but I need the script to read the letters "åäö" and "ÅÄÖ" too. Cuz this is part of my language (Swedish) and those letters are in the articles. And I need to have the word between the tags in too. Finally, how do I enclose the article with cuz Chas has right there, it can

Re: xml problem

2001-06-20 Thread Chas Owens
> > Notwithstanding my other comment, this code is also inefficient, > both tactically and strategically. I know it was horrendous code, it was just the first thing that popped into my head. After I had it working I was going to make it more efficient. > > Take for example the string "\200a

Re: xml problem

2001-06-20 Thread Randal L. Schwartz
> "Randal" == Randal L Schwartz <[EMAIL PROTECTED]> writes: Randal> Take for example the string "\200abc"... Randal> After you replace "\200" with "&200;", Oh blah. I can't do math this early. "\200" is replaced with "&#128;", but the rest of the comment stands. :) -- Randal L. Schwartz - St
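
As an illustration of the replacement being discussed (octal \200 is decimal 128, hence "&#128;"), here is a hedged sketch that does the whole job in one substitution pass instead of walking the string with substr. The helper name escape_high_bytes is hypothetical and does not appear in the thread, and the sketch treats every byte above 127 as non-ASCII.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every byte above 127 with a numeric
# character reference such as &#128;.
sub escape_high_bytes {
    my ($text) = @_;
    # /e evaluates the replacement as code; /g applies it to every match.
    # The scan continues after each replacement, so inserted text is
    # never examined again.
    $text =~ s/([\x80-\xFF])/'&#' . ord($1) . ';'/ge;
    return $text;
}

print escape_high_bytes("\200abc"), "\n";   # prints &#128;abc
```

Note that the character class starts at \x80 (128), so byte 128 itself is included.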

Re: xml problem

2001-06-20 Thread Randal L. Schwartz
> "Chas" == Chas Owens <[EMAIL PROTECTED]> writes: Chas> #replace anything not in lower ASCII, Damn Americans Chas> for (my $i = 0; $i < length($file); $i++) { Chas> my $char = ord(substr($file, $i, 1)); Chas> if ($char > 128) { Chas> print "replacing ", chr($

Re: xml problem

2001-06-20 Thread Randal L. Schwartz
> "Chas" == Chas Owens <[EMAIL PROTECTED]> writes: Chas> #replace anything not in lower ASCII, Damn Americans Chas> for (my $i = 0; $i < length($file); $i++) { Chas> my $char = ord(substr($file, $i, 1)); Chas> if ($char > 128) { Chas> print "replacing ", chr($

Re: xml problem

2001-06-20 Thread Chas Owens
On 20 Jun 2001 11:54:12 -0400, Chas Owens wrote: > open FH, ">$ARGV[0].tmp.$$" or die "Could not open $ARGV[0]:$!"; > > print FH $file; > > close FH; > > > and change > > $parser->parsefile($ARGV[0]); > > to > > $parser->parsefile("$ARGV[0].tmp.$$") I have removed the writing of $file t

Re: xml problem

2001-06-20 Thread Chas Owens
Hrmmm... There is a classic joke: What do you call someone who speaks many languages? A polyglot. What do you call someone who speaks two languages? A bilingual. What do you call someone who speaks one language? An American. My first attempt to fix this was to add: open FH, $ARGV[0] or die "Cou

Re: xml problem

2001-06-20 Thread Morgan
Thank you very much for this script. If English had been my native language it would have been perfect. I need the script to add my native letters as well; I'm Swedish and therefore use "åäöÅÄÖ" in the articles. Is this possible? And in the text some words are wrapped in tags; is it possible to remove

Re: xml problem

2001-06-20 Thread Me
> > I don't know about how XML::Parser handles memory - last time > > I tried to use it to parse content.rdf from http://dmoz.org , > > it soaked up all my memory, then bombed. Sometimes, you need > > to write your own parsing subs :) > > Is the file you referred to a really big file? dmoz is

RE: xml problem

2001-06-20 Thread Grant McLean
From: Nigel Wetters [mailto:[EMAIL PROTECTED]] > I don't know about how XML::Parser handles memory - last time > I tried to use it to parse content.rdf from http://dmoz.org , > it soaked up all my memory, then bombed. Sometimes, you need > to write your own parsing subs :) A casual reader coul

Re: xml problem

2001-06-20 Thread Nigel Wetters
TMTOWTDI. I don't know about how XML::Parser handles memory - last time I tried to use it to parse content.rdf from http://dmoz.org , it soaked up all my memory, then bombed. Sometimes, you need to write your own parsing subs :) >>> Chas Owens <[EMAIL PROTECTED]> 06/19/01 09:39pm >>> Please, p

Re: xml problem

2001-06-19 Thread Chas Owens
Please, please, please, do not try to parse XML with regexps. They only work in the simplest cases. There are perfectly good XML modules designed to parse XML for you and they are not that hard to use. The following code parses an XML file similar to the one you described, but has an additional

Re: xml problem

2001-06-19 Thread Markus Peter
On Tue, 19 Jun 2001, Morgan wrote: > Here is the problem. > I will receive news articles three times a day in XML format and I need to > automatically publish those articles on a web page, on the first page it > should only show the tags down to > tag and a link to the whole page. Well - as mention

Re: xml problem

2001-06-19 Thread Nigel Wetters
I think I can give you some clues. Here's some code out of the Perl Cookbook (6.8 Extracting a Range of Lines), which I've adapted for you. You should be able to nest such structures to get what you want. my $extracted_lines = ''; while (<>) { if (/BEGIN PATTERN/ .. /END PATTERN/) {
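
A self-contained sketch of the range (flip-flop) operator technique mentioned above, for anyone who wants to try it without an input file. The sample lines and the <article> markers are made up for illustration and stand in for BEGIN PATTERN and END PATTERN.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for lines read from a file.
my @lines = (
    "header\n",
    "<article>\n",
    "body text\n",
    "</article>\n",
    "footer\n",
);

my $extracted_lines = '';
for (@lines) {
    # The flip-flop is true from the line matching the first pattern
    # through the line matching the second pattern, inclusive.
    if (/<article>/ .. /<\/article>/) {
        $extracted_lines .= $_;
    }
}

print $extracted_lines;
```

This keeps the start and end marker lines in the result. It works line by line, so it only suits input where the markers sit on their own lines, which is one reason later replies in the thread recommend a real XML parser.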

xml problem

2001-06-19 Thread Morgan
Hi, I'm a newbie Perl developer and an XML rookie :( Is there anyone who can give me some hints or help me out with a problem I have? Here is the problem. I will receive news articles three times a day in XML format and I need to automatically publish those articles on a web page, on the first page