Yes, I haven't had time to work on or review this for a while now, for a number of reasons. My calendar clears up a bit after Easter, and I'm hoping to pick it up again then. However, it's worth pointing out that the patches as they stand are about as far as I want to push the current API (which is based on selecting a 'region', which under the hood I made select a run of text). What I'd started on when I ran out of time was a different approach:

* Support tagged PDF first, with a 'guessed' text order as a fallback (which will obviously give much better results on tagged PDF). This would be ridiculously hard in the current TextDevice; the structures don't match at all.
* Use a caret-less version of the AT-SPI AccessibleText interface (http://library.gnome.org/devel/at-spi-cspi/unstable/at-spi-cspi-AccessibleText-Interface.html). This simplifies a lot of the code I wrote, since you don't have to /repeatedly/ figure out the text order at each level of the block hierarchy. It should also make it easier to build accessibility efforts on top of poppler.
* Implement some automated testing of text extraction via the command line. Changing how reading order is guessed always introduces regressions, so right now every change I make has to be checked by hand against a dozen or so documents. It's needlessly painful. This added a command line tool to exercise the AccessibleText-style interface, so I don't need to patch evince to make progress and don't break the existing tools.
* All of this means a new Device class in poppler: as well as completely replacing the internals, the interface is completely different, so evince needs to change as well to integrate it. That's not going to happen quickly, I suspect.

I mention all this to make it clear that I'm not actively developing the old patch any more; I've moved on to looking at a long-term solution. I'm of the opinion that the old patch is a big improvement, but I'm not getting code review feedback upstream either (I know it doesn't help that I can only work on this from time to time).

--
Evince doesn't handle columns properly
https://bugs.launchpad.net/bugs/33288
You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber.

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
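To make the proposed design a little more concrete, here is a minimal C++ sketch of a caret-less, offset-based text interface whose reading order comes from the tagged structure when available and from a guessed column order otherwise. Everything in it is hypothetical and is not poppler's API: `TextBlock`, `CaretlessText`, `PageText`, `guessReadingOrder`, and the `tagged` flag (standing in for "the blocks are already in structure-tree order") are illustrative names only. The query methods are loosely modelled on AccessibleText's character-count, get-text and offset-at-point queries, with no caret or selection state.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: one block of text on a page (y grows downward).
struct TextBlock {
    double x0, y0, x1, y1;
    std::string text;
};

// Hypothetical caret-less interface: read-only queries by offset only.
// Offsets are byte offsets here for simplicity; a real implementation
// would count Unicode characters.
class CaretlessText {
public:
    virtual ~CaretlessText() = default;
    virtual std::size_t characterCount() const = 0;
    // Text in [start, end); end is clamped to characterCount().
    virtual std::string text(std::size_t start, std::size_t end) const = 0;
    // Offset of the first character of the block containing (x, y), if any.
    virtual std::optional<std::size_t> offsetAtPoint(double x, double y) const = 0;
};

// Assumed heuristic: blocks that don't overlap horizontally are separate
// columns; columns are read left to right, each one top to bottom.
std::vector<TextBlock> guessReadingOrder(std::vector<TextBlock> blocks) {
    std::sort(blocks.begin(), blocks.end(),
              [](const TextBlock &a, const TextBlock &b) { return a.x0 < b.x0; });
    std::vector<std::vector<TextBlock>> columns;
    double columnRight = 0;
    for (const TextBlock &b : blocks) {
        if (columns.empty() || b.x0 >= columnRight) {
            columns.emplace_back();  // no horizontal overlap: start a new column
            columnRight = b.x1;
        } else {
            columnRight = std::max(columnRight, b.x1);
        }
        columns.back().push_back(b);
    }
    std::vector<TextBlock> ordered;
    for (auto &col : columns) {
        std::sort(col.begin(), col.end(),
                  [](const TextBlock &a, const TextBlock &b) { return a.y0 < b.y0; });
        ordered.insert(ordered.end(), col.begin(), col.end());
    }
    return ordered;
}

// Reading order is fixed once, at construction, so no query ever has to
// re-derive it at each level of a block hierarchy.
class PageText : public CaretlessText {
public:
    PageText(std::vector<TextBlock> blocks, bool tagged) {
        if (!tagged) blocks = guessReadingOrder(std::move(blocks));  // fallback
        for (const TextBlock &b : blocks) {
            if (!text_.empty()) text_ += '\n';  // separate blocks with newlines
            starts_.push_back(text_.size());
            text_ += b.text;
        }
        blocks_ = std::move(blocks);
    }
    std::size_t characterCount() const override { return text_.size(); }
    std::string text(std::size_t start, std::size_t end) const override {
        end = std::min(end, text_.size());
        if (start >= end) return {};
        return text_.substr(start, end - start);
    }
    std::optional<std::size_t> offsetAtPoint(double x, double y) const override {
        for (std::size_t i = 0; i < blocks_.size(); ++i) {
            const TextBlock &b = blocks_[i];
            if (x >= b.x0 && x <= b.x1 && y >= b.y0 && y <= b.y1) return starts_[i];
        }
        return std::nullopt;
    }

private:
    std::vector<TextBlock> blocks_;
    std::vector<std::size_t> starts_;
    std::string text_;
};
```

For a two-column page whose blocks arrive in arbitrary order, constructing `PageText(blocks, false)` yields the left column followed by the right column, while `PageText(blocks, true)` trusts the given order. The guessing heuristic is deliberately naive: a full-width heading that overlaps both columns would merge them into one, which is exactly the kind of regression a command-line comparison against a set of reference documents would catch.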