[Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2025-11-26 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

--- Comment #17 from Lightsky  ---
(In reply to Eyal Rozenberg from comment #16)
> we would need to decide what the relative
> linear order of most boxes is; 

Could you please explain this part in more detail?

> and which of them start paragraphs.

This would be ideal of course, but my point is not to try to implement a
perfect solution as a first step. Couldn't this be ignored at first, and a new
paragraph simply be created for EACH textbox in Writer?
Citing from my earlier post: "1 paragraph for each line of original PDF".
Bug 32249 has been pending since 2010; the perfect solution isn't gonna happen
anytime soon, I guess.
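
To make the "one paragraph per textbox" idea concrete, here is a minimal sketch. It is not LibreOffice code: the `TextBox` type, its fields, and the function name are hypothetical, and the ordering is a naive top-to-bottom, left-to-right sort.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    """Hypothetical stand-in for one text box produced by the PDF import."""
    x: float   # left edge, in points
    y: float   # top edge, in points, measured from the top of the page
    text: str


def boxes_to_paragraphs(boxes: list[TextBox]) -> list[str]:
    """Emit one paragraph per non-empty box, in naive reading order."""
    ordered = sorted(boxes, key=lambda b: (b.y, b.x))
    return [b.text.strip() for b in ordered if b.text.strip()]


page = [
    TextBox(72, 120, "second line of the page"),
    TextBox(72, 100, "First line of the page,"),
    TextBox(72, 160, "A later line."),
]
paragraphs = boxes_to_paragraphs(page)
```

The sort key is where the simplification lives: on a multi-column page it would interleave lines from different columns, so the ordering question is not avoided, only answered crudely.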

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2025-11-26 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

--- Comment #16 from Eyal Rozenberg  ---
(In reply to Lightsky from comment #15)
> But specifically for Writer even outputting text with new paragraphs (even
> if its 1 paragraph for each line of original PDF) instead of a textbox for
> each line would make it much easier to edit in Writer (as a 1st step).

That is actually not just a "first step" - it is already a big part of the
work. To do what you suggest, we would need to decide what the relative linear
order of most boxes is; and which of them start paragraphs. If you've done that
- you're already half way to complete document structure reconstruction (and
note that I didn't say _perfect_ document reconstruction).
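
The two decisions named above (the linear order of boxes, and which boxes start paragraphs) can be illustrated with a rough sketch. Everything here is an assumption for illustration only: the `Box` type, the single-column layout, and the two thresholds are hypothetical and not taken from any LibreOffice code.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical text box; points, origin at the top-left of the page."""
    x: float
    y: float
    height: float
    text: str


def order_and_mark(boxes, gap_factor=1.5, indent=10.0):
    """Return (starts_paragraph, text) pairs in single-column reading order.

    Assumed heuristic: a box starts a paragraph if it is the first box, if
    the vertical step from the previous box exceeds gap_factor * height, or
    if it is indented more than `indent` points from the left margin.
    """
    ordered = sorted(boxes, key=lambda b: (b.y, b.x))
    if not ordered:
        return []
    left_margin = min(b.x for b in ordered)
    marked = []
    prev = None
    for box in ordered:
        starts = (
            prev is None
            or box.y - prev.y > gap_factor * prev.height
            or box.x - left_margin > indent
        )
        marked.append((starts, box.text))
        prev = box
    return marked


def to_paragraphs(marked):
    """Merge consecutive boxes until the next paragraph start."""
    paragraphs = []
    for starts, text in marked:
        if starts or not paragraphs:
            paragraphs.append(text)
        else:
            paragraphs[-1] += " " + text
    return paragraphs


boxes = [
    Box(72, 150, 12, "Second paragraph"),
    Box(72, 100, 12, "Lorem ipsum dolor"),
    Box(72, 164, 12, "continues here."),
    Box(72, 114, 12, "sit amet."),
]
result = to_paragraphs(order_and_mark(boxes))
```

Even this toy version has to commit to a reading order and a paragraph-break rule, which is the point being made: columns, tables, headers and footers would each need further rules.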

> One could add another "behavior" parameter to a filter function in order not
> to split the code between Draw and Writer.

I'm not a developer working on this code; but I can still share experience from
other projects and note that making a single piece of code have multiple
configurable behaviors sometimes makes it more complicated than just splitting
the code into distinct somewhat-similar pieces. Think about, say, a pair of
smaller and larger knives, vs. a swiss-army-knife... I'm sure that if and when
developers start working on this, they'll make a good call regarding what parts
of the code to share and what parts to keep apart.


[Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2025-11-25 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

--- Comment #15 from Lightsky  ---
> My reason for agreeing here is that recombining text in PDFs is technically 
> hard; 

I certainly understand that recombining text from PDF can be an algorithmic
issue.
But specifically for Writer, even outputting text as new paragraphs (even if
it's 1 paragraph for each line of the original PDF) instead of a textbox for
each line would make it much easier to edit in Writer (as a 1st step).
One could add another "behavior" parameter to a filter function so as not to
split the code between Draw and Writer.
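
A rough sketch of what such a "behavior" parameter could look like. The names `Target` and `import_text_lines`, and the dictionary output, are hypothetical and do not reflect the actual filter interface.

```python
from enum import Enum


class Target(Enum):
    """Hypothetical selector for the application the import is aimed at."""
    DRAW = "draw"
    WRITER = "writer"


def import_text_lines(lines: list[str], target: Target) -> list[dict]:
    """Turn extracted PDF text lines into output objects for the target.

    Shared extraction is assumed to have happened already; only the final
    step differs: paragraphs for Writer, text boxes for Draw.
    """
    kind = "paragraph" if target is Target.WRITER else "textbox"
    return [{"type": kind, "text": line} for line in lines]


writer_objects = import_text_lines(["Line one", "Line two"], Target.WRITER)
draw_objects = import_text_lines(["Line one", "Line two"], Target.DRAW)
```

Here only the last step branches on the target; whether that stays manageable in real code is a separate question.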

So as not to duplicate it here, see also my comment:
https://bugs.documentfoundation.org/show_bug.cgi?id=32249#c45


[Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2025-07-01 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

--- Comment #10 from Eyal Rozenberg  ---
Half-an-apology Stuart, I realize now that you didn't actually mark this as a
dupe, you merely made the baseless claim of this being a dupe, and Dave Gilbert
obliged you.

Oh, and this bug has almost nothing to do with 118370.

Dave: Please don't do these kinds of things.


[Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2025-07-01 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

--- Comment #14 from Dave Gilbert  ---
(In reply to Eyal Rozenberg from comment #10)
> Half-an-apology Stuart, I realize now that you didn't actually mark this as
> a dupe, you merely made the baseless claim of this being a dupe, and Dave
> Gilbert obliged you.

Hey! I often disagree with Stuart - I don't 'oblige' people - I read it,
consider and do what I think is technically correct.

> Oh, and this bug has almost nothing to do with 118370.
> 
> Dave: Please don't do these kinds of things.

Please don't get into ranting matches.

My reason for agreeing here is that recombining text in PDFs is technically
hard; it certainly needs doing - but it's not like we have to fix some existing
code; we need to go and try a whole bunch of systems to see what would work and
write a whole new thing.
It's not like we're missing some small corner/feature that we need to fix.

Having said that, I really don't care if this is a dupe; I just fix code.


[Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2025-07-01 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

Eyal Rozenberg  changed:

   What|Removed |Added

     CC|        |paolo.vecchi@documentfoundation.org

--- Comment #13 from Eyal Rozenberg  ---
(In reply to V Stuart Foote from comment #12)
> @Eyal,
> 
> Its been mentioned on multiple occasions it servs no purpose to open
> multiple BZ issues for what are essentially identical issues. 

And it has also been mentioned it serves no purpose, or negative purpose, to
fold different issues into a single kitchen-sink issue.

> And splitting a hair here and calling it a bug rather than an enhancement to
> the Writer PDF import filter is just petty--calling them "your bugs" just
> shows the extent of your ego.
> 
> When you do this petty sniping, I can't help but compare you to Paolo. Is
> that the reputation you're looking for. 

A Bugzilla page about importing PDFs into Writer is definitely not where I
would write anything regarding Paolo's virtues or faults. So absolutely no
comment on that.

> But to issue at hand, the poppler -> cairo based PDF filters are monolithic,
> what affect one module affects all.

These are two filters, for two applications, which should have significantly
different behavior. Perhaps a separate bug (or meta-bug) should be filed about
drawing them apart from each other, as part of a wider re-organization of
PDF-support-related bugs.

> The scope of effort remaining for bug 32249 is exactly what would still be
> required to work with text runs as Paragraph objects in swriter, or Text
> Boxes in sdraw.

Whatever we may think of bug 32249 (I am hoping to get enough support to split
it up and either keep it as a meta-bug or replace it with one), it regards
importing PDFs into Draw; this bug regards importing them into Writer.


[Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2025-07-01 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

--- Comment #12 from V Stuart Foote  ---
@Eyal,

It's been mentioned on multiple occasions that it serves no purpose to open
multiple BZ issues for what are essentially identical issues.

And splitting a hair here and calling it a bug rather than an enhancement to
the Writer PDF import filter is just petty--calling them "your bugs" just shows
the extent of your ego.

When you do this petty sniping, I can't help but compare you to Paolo. Is that
the reputation you're looking for?

But to the issue at hand: the poppler -> cairo based PDF filters are
monolithic; what affects one module affects all.

The scope of effort remaining for bug 32249 is exactly what would still be
required to work with text runs as Paragraph objects in swriter, or Text Boxes
in sdraw. If anything, more so, given the distinction between text held inside
sd text box objects and the likely need to place extracted paragraphs into
object frames to be able to replicate the document layout from a PDF source on
a swriter page.


[Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2025-07-01 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

Eyal Rozenberg  changed:

    What|Removed |Added

See Also|        |https://bugs.documentfoundation.org/show_bug.cgi?id=32249


[Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2025-07-01 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

Eyal Rozenberg  changed:

   What|Removed |Added

   Severity|enhancement |normal

--- Comment #11 from Eyal Rozenberg  ---
Oh, and of course this isn't an enhancement, it's just a bug. If we open a PDF
in Writer, the filter should reconstitute a Writer document - as best it can -
from the PDF. Failing to constitute paragraphs and filling each page with a
bunch of drawing objects is simply a failure.


[Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2025-07-01 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

Eyal Rozenberg  changed:

   What|Removed |Added

 Blocks|99746   |
 Ever confirmed|1   |0
 Status|RESOLVED|UNCONFIRMED
 Resolution|DUPLICATE   |---

--- Comment #9 from Eyal Rozenberg  ---
Stuart, stop messing with my bugs. 

> Sorry, it is a dupe of bug 33249 clear an simple.

It clearly and simply isn't.


Referenced Bugs:

https://bugs.documentfoundation.org/show_bug.cgi?id=99746
[Bug 99746] [META] PDF import filter in Draw

[Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2025-07-01 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

V Stuart Foote  changed:

    What|Removed                                                   |Added

See Also|https://bugs.documentfoundation.org/show_bug.cgi?id=32249 |


[Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2025-07-01 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

V Stuart Foote  changed:

   What|Removed |Added

   Keywords|needsDevAdvice  |


[Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2025-07-01 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

--- Comment #7 from V Stuart Foote  ---
(In reply to V Stuart Foote from comment #6)
> Sorry, it is a dupe of bug 33249 clear an simple. Filter functions needed to
> ...

better make that bug 32249


[Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2025-07-01 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

V Stuart Foote  changed:

          What|Removed     |Added

        Status|UNCONFIRMED |NEEDINFO
      Keywords|            |needsDevAdvice
Ever confirmed|0           |1
      See Also|            |https://bugs.documentfoundation.org/show_bug.cgi?id=32249,
              |            |https://bugs.documentfoundation.org/show_bug.cgi?id=118370
      Severity|normal      |enhancement
        Blocks|            |99746
            CC|            |[email protected],
              |            |[email protected],
              |            |[email protected],
              |            |[email protected],
              |            |[email protected]

--- Comment #6 from V Stuart Foote  ---
Sorry, it is a dupe of bug 33249, clear and simple. Filter functions needed to
render PDF text spans back as Paragraph objects would be the same across all LO
modules.

Comment 0 was opened against a Writer-originated ODF document, but there is no
distinction made in the export filter(s) (PDF has no "paragraph" object keeping
text spans together as sentences; even words might be broken apart). And this
*enhancement* is not about the LO Hybrid PDF, which attaches the ODF source
document to the PDF so that LO can selectively open that attachment on import,
bypassing the PDF facsimile. That already functions as an export option.

For bug 32249 and bug 118370 Justin L. completed *one* reasonable approach
working with the poppler -> cairo extracted sd text box objects from the PDF
BT/ET spans, of "consolidating" a selection of the generated text boxes into a
single text box object.

An alternative was proposed at
https://bugs.documentfoundation.org/show_bug.cgi?id=32249#c19 of a process
taking the extracted strings (still poppler -> cairo based) and reflowing them
into lexically correct full sentences or full paragraph objects, and assembling
those into an ODF-ready object available to style, spell check, etc. Focus
would be less on the layout of the PDF and more on extracting a
lexicographically correct representation of a page.
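
As an illustration of the reflow step only, a minimal sketch follows. It is not the process proposed in the linked comment: the function name is hypothetical, and the de-hyphenation rule is an assumption.

```python
def reflow(lines: list[str]) -> str:
    """Join extracted lines into one run of text.

    Assumption: a line ending in '-' followed by a line that starts with a
    lowercase letter is one word hyphenated across a line break.
    """
    out = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip empty fragments
        if out.endswith("-") and line[0].islower():
            out = out[:-1] + line   # rejoin the split word
        elif out:
            out += " " + line       # ordinary line break becomes a space
        else:
            out = line
    return out


text = reflow([
    "Recombining text is techni-",
    "cally hard; it certainly",
    "needs doing.",
])
```

The rule is deliberately crude: a genuine compound such as "well-" / "known" split at its hyphen would be joined incorrectly, so a real implementation would need a dictionary or language-aware check.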

So, this bz issue could be that additional work, more fully scoped here. Or we
could set it back to the dupe it is, as bug 33249 was left open after the work
on bug 118370 but its scope was not expanded to all PDF import filters.

Added the devs with insight, for their opinions, but coin flip set it again as
the dupe it is.


Referenced Bugs:

https://bugs.documentfoundation.org/show_bug.cgi?id=99746
[Bug 99746] [META] PDF import filter in Draw

[Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2025-07-01 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

Dave Gilbert  changed:

   What|Removed |Added

 Status|NEEDINFO|RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #8 from Dave Gilbert  ---
(In reply to V Stuart Foote from comment #6)
> Sorry, it is a dupe of bug 33249 clear an simple. Filter functions needed to
> render PDF text spans back as Paragraph objects would be the same across all
> LO modules. 

The poppler import code does have an abstraction of which module it's
targeting, so it _could_ do something different for writer than draw;
however...

> 
> Comment 0 was opened against a Writer originated ODF document, but there is
> no distinction made in the export filter(s) (PDF has no "paragraph" object
> keeping text spans together as sentences, even words might be broken apart).
> And this *enhancement* is not about the LO Hybrid PDF that attaches the ODF
> source document into the PDF and selectively LO will open that attachment on
> import--bypassing the PDF facsimile. But that already functions as an export
> option.
> 
> For bug 32249 and bug 118370 Justin L. completed *one* reasonable approach
> working with the poppler -> cairo extracted sd text box objects from the PDF
> BT/ET spans, of "consolidating" a selection of the generated text boxes into
> a single text box object.
> 
> An alternative was proposed at
> https://bugs.documentfoundation.org/show_bug.cgi?id=32249#c19 of an process
> taking the extracted strings (still poppler -> cairo based) and reflowing
> that into lexically correct full sentences or full paragraph objects. And
> assembling those into as an ODF ready object available to style, spell
> check, etc. Focus would be less on the layout of the PDF and more on
> extracting a lexicographic correct representation of a page.
> 
> So, this bz issue could be that additional work. More fully scoped here. Or,
> we  could set back to the dupe it is as bug 33249 was left open after the
> work on bug 118370 but scope was not expanded to all PDF import filters. 
> 
> Added the devs with insight, for their opinions, but coin flip set it again
> as the dupe it is.

Yeh, the hard part is deciding how to assemble the chunks of text; once you
have those, spitting them out as a paragraph object for writer feels relatively
easy.
There's some recent separate non-LO tools that try various heuristics for it
which look pretty neat, so while it's never going to be perfect, something
better should be doable.

Duping as suggested.

If you want to repeatedly edit through a PDF you create from LO, tick the
hybrid box - that's what it's for!

*** This bug has been marked as a duplicate of bug 32249 ***


[Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2025-07-01 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

V Stuart Foote  changed:

   What|Removed |Added

Version|7.5.0.0 alpha0+ |6.4.0.3 release


[Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2025-06-30 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

--- Comment #5 from Eyal Rozenberg  ---
(In reply to Heiko Tietze from comment #4)
> Assuming PDF is a document format that allows editing... but it isn't.

NO, there's no assuming that. We don't edit any format directly except for ODF.
The rest - we import, edit, and export.

Moreover, and perhaps more importantly - whoever said we were writing back to
the PDF? Assume we're going to be saving an ODF file.

> No UX aspect in file format questions.

Ok.


[Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2025-06-30 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

Heiko Tietze  changed:

    What|Removed                                     |Added

Keywords|needsUXEval                                 |
      CC|libreoffice-ux-advise@lists.freedesktop.org |heiko.tietze@documentfoundation.org

--- Comment #4 from Heiko Tietze  ---
Assuming PDF is a document format that allows editing... but it isn't.
No UX aspect in file format questions.


[Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2025-06-30 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

Eyal Rozenberg  changed:

    What|Removed |Added

Keywords|        |needsUXEval
      CC|        |libreoffice-ux-advise@lists.freedesktop.org


[Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2025-06-30 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

Eyal Rozenberg  changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|DUPLICATE   |---

--- Comment #3 from Eyal Rozenberg  ---
Considering this bug is about Writer, and 32249 is about Draw, I think this
should not be a dupe. Also, (re)constructing paragraphs is not only for the
purpose of ease-of-editing.


[Libreoffice-bugs] [Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2022-10-16 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

m.a.riosv  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED
 CC||[email protected]
   ||rg

--- Comment #2 from m.a.riosv  ---


*** This bug has been marked as a duplicate of bug 32249 ***


[Libreoffice-bugs] [Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2022-10-16 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

Eyal Rozenberg  changed:

   What|Removed |Added

 Blocks||113123


Referenced Bugs:

https://bugs.documentfoundation.org/show_bug.cgi?id=113123
[Bug 113123] [META] PDF import filter in Writer

[Libreoffice-bugs] [Bug 151577] Writer PDF import filter should default to producing paragraphs of text, not drawing objects

2022-10-16 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151577

--- Comment #1 from Eyal Rozenberg  ---
Created attachment 183090
  --> https://bugs.documentfoundation.org/attachment.cgi?id=183090&action=edit
The original Writer document

Opening the PDF should result in a document that is very similar to this one
(the original document exported to PDF).
