[frameworks-baloo] [Bug 506187] baloo_file_extractor crashes on attempting to index specific files and spams tray notifications

2025-07-14 Thread Stefan Brüns
https://bugs.kde.org/show_bug.cgi?id=506187

Stefan Brüns  changed:

   What|Removed |Added

 Resolution|--- |FIXED
  Latest Commit||https://invent.kde.org/frameworks/baloo/-/commit/d97f3f832f31a89f5ca4ee058043003bc1474223
 Status|ASSIGNED|RESOLVED

--- Comment #17 from Stefan Brüns  ---
Git commit d97f3f832f31a89f5ca4ee058043003bc1474223 by Stefan Brüns.
Committed on 14/07/2025 at 12:13.
Pushed by bruns into branch 'master'.

[TermGenerator] Check input text validity

If the supplied text contains invalid surrogates (i.e. unpaired surrogates,
such as a low surrogate without a preceding high surrogate), the text is not
valid Unicode. This can also cause QString::toUtf8() to return an
empty QByteArray.
Related: bug 506570

M  +43  -0    autotests/unit/engine/termgeneratortest.cpp
M  +12  -3    src/engine/termgenerator.cpp
A  +50  -0    src/engine/termgenerator_p.h [License: LGPL(v2.1+)]

https://invent.kde.org/frameworks/baloo/-/commit/d97f3f832f31a89f5ca4ee058043003bc1474223
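A minimal sketch of the validity rule described in the commit message, using only the C++ standard library. The function name hasOnlyPairedSurrogates is hypothetical and is not taken from termgenerator_p.h, whose contents are not shown here; in Qt code, QString::isValidUtf16() (Qt 5.15 and later) performs a comparable check.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not Baloo's code): returns true only if every
// surrogate code unit in `text` belongs to a well-formed high/low pair.
bool hasOnlyPairedSurrogates(const std::u16string& text)
{
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t unit = text[i];
        const bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        const bool isLow = unit >= 0xDC00 && unit <= 0xDFFF;
        if (isLow) {
            // Valid low surrogates are consumed together with their high
            // surrogate below, so reaching one here means it is unpaired.
            return false;
        }
        if (isHigh) {
            // A high surrogate must be followed directly by a low surrogate.
            if (i + 1 >= text.size())
                return false;
            const char16_t next = text[i + 1];
            if (next < 0xDC00 || next > 0xDFFF)
                return false;
            ++i; // skip the low half of the pair
        }
    }
    return true;
}
```

A check like this lets a caller reject or skip such text before converting it to UTF-8, instead of discovering the problem later through an empty result.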

-- 
You are receiving this mail because:
You are watching all bug changes.

2025-07-13 Thread Bug Janitor Service
https://bugs.kde.org/show_bug.cgi?id=506187

Bug Janitor Service  changed:

   What|Removed |Added

 Status|CONFIRMED   |ASSIGNED

--- Comment #16 from Bug Janitor Service  ---
A possibly relevant merge request was started @
https://invent.kde.org/frameworks/baloo/-/merge_requests/241

2025-07-07 Thread Stefan Brüns
https://bugs.kde.org/show_bug.cgi?id=506187

--- Comment #15 from Stefan Brüns  ---
Git commit 9fa1aaaf4a841224161e791cb8ffd366485dc7e3 by Stefan Brüns.
Committed on 06/07/2025 at 18:16.
Pushed by bruns into branch 'master'.

[PlaintextExtractor] Fix various issues with UTF-16

Read the file in binary mode, feed the complete data into QStringDecoder
with the detected encoding, and split the lines last.

Opening a file with open mode "QIODevice::Text" mangles carriage return
sequences: the UTF-16LE sequence "\r\0\n\0" ends up as "\0\n\0", i.e.
an invalid sequence.

QIODevice::readLine() only supports 8-bit encodings (see QTBUG-121812),
and the fixup attempts here did not work in general.

Unfortunately, QTextStream::setEncoding() supports only UTF encodings,
not the legacy ISO-8859 or Windows encodings, nor e.g. GB18030.

M  +0   -2    autotests/indexerextractortests.cpp
M  +53  -25   src/extractors/plaintextextractor.cpp

https://invent.kde.org/frameworks/kfilemetadata/-/commit/9fa1aaaf4a841224161e791cb8ffd366485dc7e3
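A sketch of why the order of operations matters, assuming UTF-16LE input without a BOM and using a hand-written decoder as a stand-in for QStringDecoder (encoding detection is omitted). dropCarriageReturnBytes is a hypothetical simulation of a byte-level CRLF translation that removes every 0x0D byte, i.e. the effect the commit message describes for QIODevice::Text; it is not Qt's implementation. All function names here are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Decode raw UTF-16LE bytes into code units; a trailing odd byte is ignored.
// Stand-in for QStringDecoder, assuming the encoding is already known.
std::u16string decodeUtf16LE(const std::string& bytes)
{
    std::u16string out;
    out.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const unsigned lo = static_cast<unsigned char>(bytes[i]);
        const unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }
    return out;
}

// Hypothetical simulation: drop every 0x0D byte before decoding,
// as a byte-oriented CRLF translation would.
std::string dropCarriageReturnBytes(std::string bytes)
{
    bytes.erase(std::remove(bytes.begin(), bytes.end(), '\r'), bytes.end());
    return bytes;
}

// Split already-decoded text into lines, accepting both "\n" and "\r\n".
std::vector<std::u16string> splitLines(const std::u16string& text)
{
    std::vector<std::u16string> lines;
    std::size_t start = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] != u'\n')
            continue;
        std::size_t end = i;
        if (end > start && text[end - 1] == u'\r')
            --end; // strip the CR of a CRLF pair
        lines.push_back(text.substr(start, end - start));
        start = i + 1;
    }
    if (start < text.size())
        lines.push_back(text.substr(start)); // last line without a newline
    return lines;
}
```

For the bytes of "A\r\nB" in UTF-16LE, decoding first and splitting last yields the lines "A" and "B". Dropping the 0x0D byte first shifts every later byte pair by one, so the decoder sees U+0041, U+0A00 and U+4200; U+4200 lies in the CJK Unified Ideographs Extension A block (U+3400 to U+4DBF).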

2025-07-07 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=506187

--- Comment #14 from [email protected] ---
There's a fix for the UTF-16 issue here:
https://invent.kde.org/frameworks/kfilemetadata/-/merge_requests/193
Thank you Stefan!

That's just landed on Neon Unstable. I don't know how long "due course" is, but
if it's on Neon Unstable it will arrive on Neon User in "due course" :-)

This doesn't address Bug 506570 (a binary file that says it's UTF-32); that
seems to be a different issue.

2025-07-06 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=506187

[email protected] changed:

   What|Removed |Added

 CC||[email protected]

--- Comment #13 from [email protected] ---
*** Bug 506570 has been marked as a duplicate of this bug. ***

2025-07-06 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=506187

[email protected] changed:

   What|Removed |Added

 CC||[email protected]

--- Comment #12 from [email protected] ---
*** Bug 506608 has been marked as a duplicate of this bug. ***

2025-07-06 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=506187

--- Comment #11 from [email protected] ---
*** Bug 506598 has been marked as a duplicate of this bug. ***

2025-07-05 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=506187

--- Comment #10 from [email protected] ---
(In reply to Stefan Brüns from comment #9)
> This seems to be a cascade of bugs/implementation errors, finally triggering
> the assert.
I find a couple of things confusing:

*  This has suddenly started happening, with several very similar bugs. 
*  All appear on Neon User; the test case we have works on Neon Testing and
Unstable.

> - The KFileMetaData plaintext extractor uses QIODevice::readLine, although
> this is not supported for 16-bit encodings (see
> https://bugreports.qt.io/browse/QTBUG-121812)
> - The split code returns a term QString which only contains invalid Unicode
> code points
> - QString::toUtf8() returns an empty QByteArray
That would explain why Baloo is happy once you convert the file to UTF-8 with
iconv.

2025-07-05 Thread Stefan Brüns
https://bugs.kde.org/show_bug.cgi?id=506187

Stefan Brüns  changed:

   What|Removed |Added

 CC||[email protected]

--- Comment #9 from Stefan Brüns  ---
This seems to be a cascade of bugs/implementation errors, finally triggering
the assert.

- The KFileMetaData plaintext extractor uses QIODevice::readLine, although this
is not supported for 16-bit encodings (see
https://bugreports.qt.io/browse/QTBUG-121812)
- The split code returns a term QString which only contains invalid Unicode
code points
- QString::toUtf8() returns an empty QByteArray
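The first point can be illustrated with a hypothetical byte-oriented splitter (splitOnNewlineByte is not a Qt or KFileMetaData function) that cuts after every 0x0A byte and keeps it, as QIODevice::readLine() keeps the newline. In UTF-16LE a newline is the byte pair 0x0A 0x00, so each cut leaves a stray 0x00 at the start of the next chunk, and a character such as U+010A (bytes 0x0A 0x01) is split even though it is not a newline.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical 8-bit line splitter: cuts after every 0x0A byte and keeps
// that byte at the end of the chunk, like a readLine() loop would.
std::vector<std::string> splitOnNewlineByte(const std::string& bytes)
{
    std::vector<std::string> chunks;
    std::size_t start = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (bytes[i] == '\n') {
            chunks.push_back(bytes.substr(start, i + 1 - start));
            start = i + 1;
        }
    }
    if (start < bytes.size())
        chunks.push_back(bytes.substr(start)); // trailing data without 0x0A
    return chunks;
}
```

For "A\nB" in UTF-16LE (six bytes), this yields two three-byte chunks, "A\0\n" and "\0B\0"; neither has an even length, so neither decodes cleanly as UTF-16 on its own.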

2025-07-04 Thread toni_rocha
https://bugs.kde.org/show_bug.cgi?id=506187

toni_rocha  changed:

   What|Removed |Added

 CC||[email protected]

2025-07-04 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=506187

[email protected] changed:

   What|Removed |Added

 CC||[email protected]

--- Comment #8 from [email protected] ---
*** Bug 506516 has been marked as a duplicate of this bug. ***

2025-06-30 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=506187

--- Comment #7 from [email protected] ---
To tidy up the UTF-16 loose end: after converting the file from UTF-16 to
UTF-8 with

$ iconv -f UTF-16 -t UTF-8 home.csv > home2.csv

Baloo can read and index it.

2025-06-29 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=506187

[email protected] changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|REPORTED|CONFIRMED

--- Comment #6 from [email protected] ---
(In reply to Garirry from comment #5)
> No, I don't use CJK generally. Speaking of that, though, I do know that many
> if not all of the affected files display CJK characters when opened in an
> editor like KWrite, which detects their encoding as UTF-16.
If I look at your uploaded "home.csv" (in LibreOffice Calc), it looks like a set
of translations: 10 languages that include Japanese and Chinese scripts (plus
English, German, French etc, etc and etc)

(In reply to tagwerk19 from comment #4)
> I don't get a crash 
... I've just tried a clean install of Neon User. I now see a crash.

> ... a completely different set of plain text terms on a Neon User (dodgy) ...
Best to discard that result: it was on a system with a custom locale (it had
LC_TIME=en_SE.UTF-8 to get ISO-format short dates; maybe that's too weird...)

So, I can flag "Confirmed", but I don't really know where it goes from here
(given that I don't get the crash on Neon Unstable or Neon Testing).
Summarising what I see...

                Neon User    Neon Testing    Neon Unstable
Plasma          6.4.1        6.4.1           6.4.80
Frameworks      6.15.0       6.16.0          6.16.0
Qt              6.9.0        6.9.0           6.9.0
Session         Wayland      Wayland         Wayland
Result          Crashes      Seems OK        Seems OK

2025-06-28 Thread Garirry
https://bugs.kde.org/show_bug.cgi?id=506187

--- Comment #5 from Garirry  ---
(In reply to tagwerk19 from comment #4)
> I'm afraid I don't really have an idea here... You are using CJK - Chinese? I
> apologise for not being familiar.
No, I don't use CJK generally. Speaking of that, though, I do know that many if
not all of the affected files display CJK characters when opened in an editor
like KWrite, which detects their encoding as UTF-16. If it would help, I could
upload more sample files.

-- 
You are receiving this mail because:
You are watching all bug changes.

[frameworks-baloo] [Bug 506187] baloo_file_extractor crashes on attempting to index specific files and spams tray notifications

2025-06-28 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=506187

--- Comment #4 from [email protected] ---
(In reply to Garirry from comment #3)
> The files that cause the crash do so consistently: if I don't exclude them,
> Baloo scans them again on system boot and crashes for each file.
I'm afraid I don't really have an idea here... You are using CJK - Chinese? I
apologise for not being familiar.

I don't get a crash, but if I index the file and check with
"balooshow6 -x home.csv", I get a completely different set of plain text terms
on a Neon User (dodgy) compared to a Neon Unstable (more sensible).

As a marker, we've also had a recent Bug 505968 where there is some strange
behaviour with CJK. https://bugs.kde.org/show_bug.cgi?id=505968#c2

2025-06-26 Thread Garirry
https://bugs.kde.org/show_bug.cgi?id=506187

--- Comment #3 from Garirry  ---
(In reply to tagwerk19 from comment #2)
> Does the same happen if you have the file in a folder of its own and just
> index that folder? (You can close down baloo and rename the
> .local/share/baloo/index file to keep it safe)

Yes, the exact same error occurs.

The files that cause the crash do so consistently: if I don't exclude them,
Baloo scans them again on system boot and crashes for each file.

2025-06-26 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=506187

[email protected] changed:

   What|Removed |Added

 CC||[email protected]

--- Comment #2 from [email protected] ---
(In reply to Garirry from comment #0)
>  #13 0x750c0a24e1f9 n/a (kfilemetadata_plaintextextractor.so + 0x31f9)
If I run the same file on a more-or-less scratch system (Neon Unstable), I see
Baloo deciding to use the plain text extractor (using inherited mimetype...)
and then successfully indexing the file. It is an empty index though.

> ASSERT: "!term.isEmpty()" in file ./src/engine/document.cpp, line 23
OK... that's fairly clear
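One way to keep such input away from that assert, sketched in standard C++ under stated assumptions: toUtf8Strict is a hypothetical strict encoder that reports an unpaired surrogate as std::nullopt (this is not how QString::toUtf8() behaves; the thread only reports that it can return an empty QByteArray for such text), and addTermChecked is a hypothetical caller that skips unencodable or empty terms instead of asserting. Neither is Baloo's actual code or fix.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical strict UTF-16 -> UTF-8 encoder: std::nullopt on an unpaired
// surrogate, otherwise the UTF-8 bytes.
std::optional<std::string> toUtf8Strict(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return std::nullopt;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return std::nullopt; // low surrogate without a preceding high one
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical term sink: drops terms that cannot be encoded or encode to
// nothing, rather than asserting that the term is non-empty.
bool addTermChecked(std::vector<std::string>& terms, const std::u16string& term)
{
    const std::optional<std::string> utf8 = toUtf8Strict(term);
    if (!utf8 || utf8->empty())
        return false;
    terms.push_back(*utf8);
    return true;
}
```

The design choice is to treat a term made only of invalid code units as "no term" at the point of insertion, so a malformed file costs some missing index terms instead of an extractor crash.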

Does the same happen if you have the file in a folder of its own and just index
that folder? (You can close down baloo and rename the .local/share/baloo/index
file to keep it safe)

2025-06-25 Thread Garirry
https://bugs.kde.org/show_bug.cgi?id=506187

--- Comment #1 from Garirry  ---
After rebuilding the entire index again, I can add that the .csv files are not
specifically the culprit: many other files cause exactly the same type of
crash. The only thing they have in common is that their text is encoded as
UTF-16.

2025-06-25 Thread Garirry
https://bugs.kde.org/show_bug.cgi?id=506187

Garirry  changed:

   What|Removed |Added

Summary|baloo_file_extractor crashes on attempting to index specific CSV files and spams tray notifications|baloo_file_extractor crashes on attempting to index specific files and spams tray notifications
