[sqlalchemy] Re: Queries very slow for parsing Wikipedia dump -- any ideas for speeding it up?

2008-09-23 Thread az
This is a general programming approach, not SQL-specific. For 7 million objects... you have to try to do some vertical (wrong term, probably) layer-splitting of the data. Imagine the objects as rectangles on a horizontal line, each containing the same layers. Now you walk the rectangles like for each in X:
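
A rough, hypothetical illustration of that multi-pass "layer-splitting" idea in Python (none of these names come from the original post): rather than doing every kind of work for each object inside one loop, walk the whole collection once per layer, so a given pass only touches one slice of the data.

    # Hypothetical, simplified sketch of the "layer-splitting" idea: rather
    # than doing all the work for every object in one loop, walk the whole
    # collection once per layer so that each pass only does one kind of work.

    def load_titles(articles):
        # pass 1: only the cheap layer (titles), so every key exists afterwards
        return set(art["title"] for art in articles)

    def extract_links(articles, titles):
        # pass 2: only the expensive layer (link extraction), resolved against
        # the complete set of titles built by the first pass
        links = []
        for art in articles:
            for dest in art["text"].split():   # stand-in for real link parsing
                if dest in titles:
                    links.append((art["title"], dest))
        return links

    if __name__ == "__main__":
        articles = [
            {"title": "A", "text": "B C"},
            {"title": "B", "text": "A"},
            {"title": "C", "text": "A B"},
        ]
        titles = load_titles(articles)
        print(extract_links(articles, titles))

The point is that each pass stays simple and memory-friendly, instead of one loop that parses, resolves, and stores everything per object.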

[sqlalchemy] Re: Queries very slow for parsing Wikipedia dump -- any ideas for speeding it up?

2008-09-23 Thread Shawn Church
Just a couple of thoughts that might help you out:
1) I would profile the code. It seems to me that running a regular expression on an entire Wikipedia article would be a VERY expensive operation.
2) Did the first pass succeed, and how long did it take?
3) Taking a quick look at
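
To make point 1 concrete, here is a minimal sketch of profiling the parsing step with the standard library's cProfile, with the regex compiled once up front; the pattern and function names are illustrative, not taken from the original code.

    import cProfile
    import pstats
    import re

    # Compile the pattern once, up front, instead of on every article.
    WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")

    def parse_links(text):
        # Illustrative stand-in for the real parse_links(): yields (label, dest).
        for match in WIKILINK.finditer(text):
            dest = match.group(1)
            label = match.group(2) or dest
            yield label, dest

    def parse_all(articles):
        return [list(parse_links(text)) for text in articles]

    if __name__ == "__main__":
        sample = ["See [[Python (programming language)|Python]] and [[SQLAlchemy]]."] * 10000
        cProfile.run("parse_all(sample)", "parse.prof")
        pstats.Stats("parse.prof").sort_stats("cumulative").print_stats(10)

The pstats output shows which functions dominate cumulative time, which should confirm or rule out the regex (versus the database work) as the bottleneck.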

[sqlalchemy] Re: Queries very slow for parsing Wikipedia dump -- any ideas for speeding it up?

2008-09-23 Thread Michael Bayer
On Sep 22, 2008, at 11:20 PM, CodeIsMightier wrote:

    for link_label, link_dest_title, dest_frag in self.parse_links(self.text):
        print 'LINK from:', repr(self.title), 'to', repr(link_dest_title + '#' + dest_frag), 'label', repr(link_label)
        try:
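
The rest of this reply is cut off in the archive, so the sketch below is not necessarily what was suggested; it only illustrates one common way to speed up a per-link loop like the one quoted: collect the parsed links into dictionaries and hand them to a single SQLAlchemy Core executemany-style insert. The table and column names here are invented for the example.

    # Hypothetical sketch (table/column names invented): instead of issuing one
    # INSERT per parsed link from inside the loop, collect the rows and let
    # SQLAlchemy send them in a single executemany()-style call.
    from sqlalchemy import MetaData, Table, Column, Integer, String, create_engine

    metadata = MetaData()
    links = Table(
        "links", metadata,
        Column("id", Integer, primary_key=True),
        Column("src_title", String(255)),
        Column("dest_title", String(255)),
        Column("label", String(255)),
    )

    def store_links(conn, src_title, parsed_links):
        rows = [
            {"src_title": src_title,
             "dest_title": (dest_title + "#" + dest_frag) if dest_frag else dest_title,
             "label": link_label}
            for link_label, dest_title, dest_frag in parsed_links
        ]
        if rows:
            conn.execute(links.insert(), rows)   # one executemany call, not N single inserts

    if __name__ == "__main__":
        engine = create_engine("sqlite:///:memory:")
        metadata.create_all(engine)
        with engine.begin() as conn:
            store_links(conn, "Python", [("Py", "Python (language)", ""),
                                         ("SA", "SQLAlchemy", "Intro")])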