Yes, I have had no time to work on or review this for a while now, for a
number of reasons. My calendar clears up a bit after Easter, and I hope
to pick it up again then. However, it's worth pointing out that the patch
as it stands is about as far as I want to push the current API (which
is based on selecting a 'region', and then under the hood I made it
select a run of text). What I'd started on when I ran out of time was a
different approach:

* support tagged PDF first, with 'guessed' text order as a fallback
(which will give much better results on tagged PDF, obviously). This
would be ridiculously hard in the current TextDevice; the structures
don't match at all.

* use a caret-less version of the AT-SPI AccessibleText interface
(http://library.gnome.org/devel/at-spi-cspi/unstable/at-spi-cspi-AccessibleText-Interface.html).
This simplifies a lot of the code I
wrote, since you don't have to /repeatedly/ figure out the text order at
each level of the block hierarchy. This should also make it easier to
hook in accessibility efforts on top of poppler.

* implement some automated testing of text extraction via the command
line; changing how reading order is guessed always introduces
regressions, so any time I do anything right now I need to check it
against a dozen or so documents. It's needlessly painful. This adds a
command-line tool to exercise the AccessibleText-style interface, so I
don't need to patch evince to make progress, and don't break the
existing tools.

* All of this means a new Device class in poppler, since as well as
completely replacing the internals, the interface is completely
different, and hence evince needs to change as well to integrate it.
That's not going to happen quickly, I suspect.
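
To make the second and third points a little more concrete, here is a
minimal Python sketch. It is not poppler code and does not reflect the
actual patch. The names FlatText, from_blocks and regressions are
hypothetical, and the methods are only loosely modelled on offset-based
queries in the AccessibleText style. The assumption is that reading order
is resolved once, into a flat string, and every later query works on
character offsets instead of re-walking the block hierarchy.

```python
import difflib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FlatText:
    """Hypothetical caret-less text view over a page (illustration only)."""

    text: str

    @classmethod
    def from_blocks(cls, blocks: List[str]) -> "FlatText":
        # Assumption: 'blocks' are already in reading order (taken from the
        # tag structure, or guessed as a fallback). Order is fixed here once.
        return cls("\n".join(blocks))

    def get_character_count(self) -> int:
        return len(self.text)

    def get_text(self, start: int, end: int) -> str:
        # Plain offset range; there is no caret or selection state.
        return self.text[start:end]

    def get_word_at_offset(self, offset: int) -> Tuple[str, int, int]:
        # Returns (word, start, end); empty word if offset is on whitespace
        # or out of range.
        if not 0 <= offset < len(self.text) or self.text[offset].isspace():
            return ("", offset, offset)
        start = offset
        while start > 0 and not self.text[start - 1].isspace():
            start -= 1
        end = offset
        while end < len(self.text) and not self.text[end].isspace():
            end += 1
        return (self.text[start:end], start, end)


def regressions(actual: str, expected: str) -> List[str]:
    """Unified diff of extracted text against a stored expected result.

    An empty list means the extraction still matches the reference.
    """
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )
```

A command-line test tool along these lines would extract each test
document, call something like regressions() against a stored reference
file, and fail if any diff comes back non-empty.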

I mention all this to make it clear that I'm not actively developing the
old patch any more; I've moved on to looking at a long-term solution.
I'm of the opinion that the old patch is a big improvement, but I'm not
getting code review feedback upstream either (I know it doesn't help
when I can only work on this from time to time).

-- 
Evince doesn't handle columns properly
https://bugs.launchpad.net/bugs/33288