[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-02-21 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Fridolin Somers  changed:

   What|Removed |Added

 CC||fridolin.som...@biblibre.com

--- Comment #30 from Fridolin Somers  ---
Depends on Bug 38416, which is not in 23.11.x.

-- 
You are receiving this mail because:
You are watching all bug changes.
___
Koha-bugs mailing list
Koha-bugs@lists.koha-community.org
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/


[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-02-19 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Alex Buckley  changed:

   What|Removed |Added

 Version(s) released in|25.05.00,24.11.02|25.05.00,24.11.02,24.05.07
 Status|Pushed to stable|Pushed to oldstable
 CC||alexbuck...@catalyst.net.nz

--- Comment #29 from Alex Buckley  ---
Backported to 24.05.x for 24.05.07

Note: The test plan of the first patch worked exactly as expected.

We ran the t/db_dependent/Koha/SearchEngine/Elasticsearch. unit test before and
after applying the second patch and it was successful both times.

Please let us know if we should not backport the second patch to 24.05



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-02-13 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Simon Hohl  changed:

   What|Removed |Added

 CC||simon.h...@dainst.org

--- Comment #28 from Simon Hohl  ---
*** Bug 39104 has been marked as a duplicate of this bug. ***



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-02-12 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

David Cook  changed:

   What|Removed |Added

 Blocks|39104   |
   See Also||https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=39104


Referenced Bugs:

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=39104
[Bug 39104] Elasticsearch indexing crashes with exception in catch block


[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-02-12 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Katrin Fischer  changed:

   What|Removed |Added

 Blocks||39104


Referenced Bugs:

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=39104
[Bug 39104] Elasticsearch indexing crashes with exception in catch block


[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-02-07 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Paul Derscheid  changed:

   What|Removed |Added

 CC||paul.dersch...@lmscloud.de
   Keywords|rel_24_11_candidate |



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-02-04 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #27 from Paul Derscheid  ---
Nice work everyone!

Pushed to 24.11.x for 24.11.02



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-02-04 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Paul Derscheid  changed:

   What|Removed |Added

 Version(s) released in|25.05.00|25.05.00,24.11.02
 Status|Pushed to main  |Pushed to stable



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-02-04 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Tomás Cohen Arazi (tcohen)  changed:

   What|Removed |Added

   Keywords||rel_24_05_candidate
 CC||tomasco...@gmail.com



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-01-24 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Katrin Fischer  changed:

   What|Removed |Added

 Version(s) released in||25.05.00
 Status|Passed QA   |Pushed to main



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-01-24 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #26 from Katrin Fischer  ---
Pushed for 25.05!

Well done everyone, thank you!



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-01-24 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Katrin Fischer  changed:

   What|Removed |Added

   Keywords||rel_24_11_candidate
Version|unspecified |Main



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-01-23 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Michaela Sieber  changed:

   What|Removed |Added

 CC||clemens.tub...@kit.edu,
   ||lukasz.kos...@kit.edu,
   ||michaela.sie...@kit.edu,
   ||raphael.str...@kit.edu



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-01-21 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #25 from Jonathan Druart  ---
(In reply to Katrin Fischer from comment #24)
> I see there is still a lot of discussion going on - is it ok to push these
> patches as is and continue on another bug for remaining issues or should I
> wait?

You can push.



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-01-21 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #24 from Katrin Fischer  ---
I see there is still a lot of discussion going on - is it ok to push these
patches as is and continue on another bug for remaining issues or should I
wait?



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-01-21 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #23 from Jonathan Druart  ---
(In reply to Janusz Kaczmarek from comment #22)
> (In reply to Jonathan Druart from comment #20)
> > > Well, every Koha::Item->store triggers $indexer->index_records, so no wonder
> > > -- we have 2508 952 fields in the test record :)
> > 
> > Yes, the "interesting" was sarcastic, hence the "..." but that was not
> > obvious, sorry.
> > 
> > It's still a bug IMO.
> > Especially with this:
> > 718 $indexer->update_index( \@search_engine_record_ids,
> > \@search_engine_records ) unless $skip_indexing;
> 
> Does it mean that both in bulkmarcimport and in import staged records from
> UI we should add bibliographic records with { skip_record_index => 1 } and
> then add items with { skip_record_index => 1 }, and then, at the very end,
> or after a certain number of records, or after each record, explicitly call: 
> 
> $indexer->index_records( $biblionumber(s), ...) ? 
> 
> Would it be a right way?

Yes, see what we do in Koha::Items->batch_update.
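
The deferred-indexing pattern discussed above can be sketched with stand-in classes. This is only an illustration under stated assumptions: `My::StubIndexer`, `My::StubItem` and `import_items` are hypothetical names, not Koha code, and the stub merely counts calls to `index_records` instead of enqueueing a background job. Only the `skip_record_index` parameter name and the figure of 2508 items come from the thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real indexer: it only counts how often it is called.
package My::StubIndexer {
    sub new {
        my ($class) = @_;
        return bless { calls => 0, indexed => [] }, $class;
    }

    sub index_records {
        my ( $self, $biblionumbers ) = @_;
        $self->{calls}++;
        push @{ $self->{indexed} }, @$biblionumbers;
        return;
    }
}

# Hypothetical stand-in for an item: store() reindexes the parent record
# unless the caller passes skip_record_index.
package My::StubItem {
    sub new {
        my ( $class, %args ) = @_;
        my $self = {%args};
        return bless $self, $class;
    }

    sub store {
        my ( $self, $params ) = @_;
        $self->{indexer}->index_records( [ $self->{biblionumber} ] )
            unless $params->{skip_record_index};
        return $self;
    }
}

package main;

# Store $count items for one record; with $defer set, index once at the end.
sub import_items {
    my ( $indexer, $biblionumber, $count, $defer ) = @_;
    for ( 1 .. $count ) {
        My::StubItem->new( biblionumber => $biblionumber, indexer => $indexer )
            ->store( { skip_record_index => $defer } );
    }
    $indexer->index_records( [$biblionumber] ) if $defer;
    return $indexer->{calls};
}

my $naive    = import_items( My::StubIndexer->new, 1, 2508, 0 );
my $deferred = import_items( My::StubIndexer->new, 1, 2508, 1 );
print "index calls without deferral: $naive\n";
print "index calls with deferral:    $deferred\n";
```

In the stub, storing 2508 items without deferral produces one indexing call per item, while the deferred variant produces a single call after the loop. Whether the final call should happen per record, per batch, or once at the very end is the open question raised in comment #22.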



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-01-21 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #22 from Janusz Kaczmarek  ---
(In reply to Jonathan Druart from comment #20)
> > Well, every Koha::Item->store triggers $indexer->index_records, so no wonder
> > -- we have 2508 952 fields in the test record :)
> 
> Yes, the "interesting" was sarcastic, hence the "..." but that was not
> obvious, sorry.
> 
> It's still a bug IMO.
> Especially with this:
> 718 $indexer->update_index( \@search_engine_record_ids,
> \@search_engine_records ) unless $skip_indexing;

Does it mean that both in bulkmarcimport and in import staged records from UI
we should add bibliographic records with { skip_record_index => 1 } and then
add items with { skip_record_index => 1 }, and then, at the very end, or after
a certain number of records, or after each record, explicitly call: 

$indexer->index_records( $biblionumber(s), ...) ? 

Would it be a right way?



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-01-20 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #21 from Martin Renvoize (ashimema) ---
This all reminds me a little about:

Bug 35104 - We should warn when attempting to save MARC records that contain
characters invalid in XML

Whilst it's not specifically about record length, it's meant to try and prevent
bad data making its way into Koha entirely.  That said, it sounds like this
isn't "bad" data so much as just data our MARC utilities don't deal with well.
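
As a rough illustration of why such a record is not "bad" data but still cannot round-trip through ISO 2709: the MARC leader stores the total record length in positions 00-04, i.e. five decimal digits, so 99999 bytes is the largest length the format can express. The helper names below (`fits_in_iso2709`, `leader_length_field`) are hypothetical, and 527856 is the record length reported in the worker log in comment #12.

```perl
use strict;
use warnings;

# ISO 2709 stores the total record length in leader positions 00-04,
# i.e. five decimal digits, so the largest representable length is 99999.
use constant MAX_ISO2709_LENGTH => 99_999;

# Hypothetical helper: true when the length can be written in the leader.
sub fits_in_iso2709 {
    my ($length_in_bytes) = @_;
    return $length_in_bytes <= MAX_ISO2709_LENGTH ? 1 : 0;
}

# Hypothetical helper: what a five-character length field would have to hold.
sub leader_length_field {
    my ($length_in_bytes) = @_;
    return sprintf '%05d', $length_in_bytes;
}

for my $length ( 4_096, 99_999, 527_856 ) {
    my $field = leader_length_field($length);
    printf "%7d bytes -> field '%s' (%d chars), fits: %s\n",
        $length, $field, length($field), fits_in_iso2709($length) ? 'yes' : 'no';
}
```

MARCXML has no such length field, which is consistent with the suggestion elsewhere in this thread to read the marcxml representation instead of the iso2709 one.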



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-01-20 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #20 from Jonathan Druart  ---
(In reply to Janusz Kaczmarek from comment #14)
> (In reply to Jonathan Druart from comment #13)
> > With this patch:
> > "1 MARC records done in 81.9053399562836 seconds"
> > 
> > However, I have deleted all biblio and background_jobs before the import and
> > now I have:
> > 
> > MariaDB [koha_kohadev]> select count(*) from biblio\G
> > count(*): 1
> > 
> > 
> > MariaDB [koha_kohadev]> select count(*) from background_jobs\G
> > count(*): 2508
> > 
> > Interesting!...
> 
> Well, every Koha::Item->store triggers $indexer->index_records, so no wonder
> -- we have 2508 952 fields in the test record :)

Yes, the "interesting" was sarcastic, hence the "..." but that was not obvious,
sorry.

It's still a bug IMO.
Especially with this:
718 $indexer->update_index( \@search_engine_record_ids,
\@search_engine_records ) unless $skip_indexing;



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-01-20 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #19 from David Cook  ---
(In reply to Janusz Kaczmarek from comment #18)
> (In reply to David Cook from comment #16)
> > 
> > I raised Bug 32638 a couple years ago. I'm sure there's a bunch of reports
> > about the MARC import failing silently. 
> 
> At first glance, this seems to be a different (but somehow related) problem.
> The cause of 32638 seems to lie elsewhere, not in the MARC transformation
> itself.  Am I right?

Yeah, I just meant that the MARC import doesn't surface errors/failures.



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-01-20 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #18 from Janusz Kaczmarek  ---
(In reply to David Cook from comment #16)
> 
> I raised Bug 32638 a couple years ago. I'm sure there's a bunch of reports
> about the MARC import failing silently. 

At first glance, this seems to be a different (but somehow related) problem. 
The cause of 32638 seems to lie elsewhere, not in the MARC transformation
itself.  Am I right?



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-01-20 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #17 from David Cook  ---
(In reply to Jonathan Druart from comment #12)
> This is clearly not enough (could go on a separate bug).

Yep. Step by step.



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-01-20 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #16 from David Cook  ---
(In reply to Jonathan Druart from comment #12)
> No info on the problematic record! We should tell which record failed.

I raised Bug 32638 a couple years ago. I'm sure there's a bunch of reports
about the MARC import failing silently. 

Never been high enough priority to fix it.



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-01-20 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #15 from Janusz Kaczmarek  ---
I've created Bug 38933 for this stage/import from UI issue.



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-01-20 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Janusz Kaczmarek  changed:

   What|Removed |Added

   See Also||https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38933



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-01-20 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #14 from Janusz Kaczmarek  ---
(In reply to Jonathan Druart from comment #13)
> With this patch:
> "1 MARC records done in 81.9053399562836 seconds"
> 
> However, I have deleted all biblio and background_jobs before the import and
> now I have:
> 
> MariaDB [koha_kohadev]> select count(*) from biblio\G
> count(*): 1
> 
> 
> MariaDB [koha_kohadev]> select count(*) from background_jobs\G
> count(*): 2508
> 
> Interesting!...

Well, every Koha::Item->store triggers $indexer->index_records, so no wonder --
we have 2508 952 fields in the test record :)

> > 2. Stage the file and import using the UI
> 
> Same with this patch.

This is also more or less clear.  In C4::ImportBatch::BatchCommitRecords called
by the worker we call: 

my $marc_record = MARC::Record->new_from_usmarc($rowref->{'marc'});

despite also having the marcxml representation of the record in the
import_records table (import_records.marcxml). 

This is exactly the same issue that made David's patch die with this kind of
record.  The worker dies because of the uncaught die generated by new_from_usmarc.
This has nothing to do with the patch (or with David's previous patch) --
it is just another case of a call to a function that potentially dies without
any eval / try. 
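
A minimal sketch of the "call it inside eval and say which record failed" idea, assuming a hypothetical `parse_record` that stands in for any parser that may die (it is not MARC::Record; it simply dies when the input exceeds 99999 bytes). The record ids and the `parse_all` helper are likewise made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical parser standing in for a call that may die on a bad record.
sub parse_record {
    my ($raw) = @_;
    die "Record length of " . length($raw) . " is too large\n"
        if length($raw) > 99_999;
    return { length => length($raw) };
}

# Parse each record inside eval so that one failure is reported with its id
# instead of killing the whole run.
sub parse_all {
    my (%raw_by_id) = @_;
    my ( %parsed, %errors );
    for my $id ( sort { $a <=> $b } keys %raw_by_id ) {
        my $record = eval { parse_record( $raw_by_id{$id} ) };
        if ( defined $record ) {
            $parsed{$id} = $record;
        }
        else {
            ( my $reason = $@ || 'unknown error' ) =~ s/\s+\z//;
            $errors{$id} = $reason;
            warn "Skipping record $id: $reason\n";
        }
    }
    return ( \%parsed, \%errors );
}

my ( $parsed, $errors ) = parse_all(
    1 => 'x' x 100,
    2 => 'x' x 527_856,
    3 => 'x' x 50,
);
printf "parsed: %s; failed: %s\n",
    join( ',', sort keys %$parsed ),
    join( ',', sort keys %$errors );
```

The point of the sketch is only that the failing id travels with the error message, so a worker or reindex script could put it in its report rather than dying silently.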


Now, if we create and save both versions (iso2709 and marcxml) to the
import_records table in C4::ImportBatch::_create_import_record, why not use the
marcxml version in C4::ImportBatch::BatchCommitRecords instead of iso2709, which
causes trouble with oversized records?

After this little change it seems to work - I was able to import the huge test
record with UI:

diff --git a/C4/ImportBatch.pm b/C4/ImportBatch.pm
index 5aebaafacf..799b69f0ca 100644
--- a/C4/ImportBatch.pm
+++ b/C4/ImportBatch.pm
@@ -531,7 +531,7 @@ sub BatchCommitRecords {
 my $item_tag;
 my $item_subfield;
 my $dbh = C4::Context->dbh;
-my $sth = $dbh->prepare("SELECT import_records.import_record_id, record_type, status, overlay_status, marc, encoding
+my $sth = $dbh->prepare("SELECT import_records.import_record_id, record_type, status, overlay_status, marc, marcxml, encoding
  FROM import_records
  LEFT JOIN import_auths ON (import_records.import_record_id=import_auths.import_record_id)
  LEFT JOIN import_biblios ON (import_records.import_record_id=import_biblios.import_record_id)
@@ -568,7 +568,7 @@ sub BatchCommitRecords {
 } else {
 $marc_type = 'USMARC';
 }
-my $marc_record = MARC::Record->new_from_usmarc($rowref->{'marc'});
+my $marc_record = MARC::Record->new_from_xml($rowref->{'marcxml'}, $rowref->{'encoding'});

 if ($record_type eq 'biblio') {
 # remove any item tags - rely on _batchCommitItems


> 
> > 3. Now the record is in the DB, start a full reindex:
> > 
> > % koha-elasticsearch --rebuild -b  kohadev
> > UTF-8 "\xC4" does not map to Unicode at /usr/share/perl5/MARC/File/Encode.pm
> > line 35.
> > Something went wrong rebuilding indexes for kohadev
> > 
> > No info on the problematic record! We should tell which record failed.
> 
> We don't have anything in the output, which is problematic IMO.

Yes, this is problematic, because new_from_usmarc died and we didn't catch it. 
But now that we call it in eval, we should be safe with this.



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-01-20 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #13 from Jonathan Druart  ---
(In reply to Jonathan Druart from comment #12)

Hey, this was tested on bug 38713 not 38913, oops!
So basically what I described was the behaviour in main.

> This is clearly not enough (could go on a separate bug).
> 
> Testing this patch I have noticed several things:
> 1. 
> $ ./misc/migration_tools/bulkmarcimport.pl -b -file test.xml -m=MARCXML
> .UTF-8 "\xC4" does not map to Unicode at
> /usr/share/perl5/MARC/File/Encode.pm line 35.
> 
> Not really useful to guess where the error is, but we know it's in the file
> so we can search in it easily

With this patch:
"1 MARC records done in 81.9053399562836 seconds"

However, I have deleted all biblio and background_jobs before the import and now
I have:

MariaDB [koha_kohadev]> select count(*) from biblio\G
count(*): 1


MariaDB [koha_kohadev]> select count(*) from background_jobs\G
count(*): 2508

Interesting!...


> 2. Stage the file and import using the UI

Same with this patch.

> 3. Now the record is in the DB, start a full reindex:
> 
> % koha-elasticsearch --rebuild -b  kohadev
> UTF-8 "\xC4" does not map to Unicode at /usr/share/perl5/MARC/File/Encode.pm
> line 35.
> Something went wrong rebuilding indexes for kohadev
> 
> No info on the problematic record! We should tell which record failed.

We don't have anything in the output, which is problematic IMO.

-- 
You are receiving this mail because:
You are watching all bug changes.
___
Koha-bugs mailing list
Koha-bugs@lists.koha-community.org
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/


[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-01-20 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Jonathan Druart  changed:

   What|Removed |Added

 CC||jonathan.dru...@gmail.com

--- Comment #12 from Jonathan Druart  ---
This is clearly not enough (could go on a separate bug).

Testing this patch I have noticed several things:
1. 
$ ./misc/migration_tools/bulkmarcimport.pl -b -file test.xml -m=MARCXML
.UTF-8 "\xC4" does not map to Unicode at /usr/share/perl5/MARC/File/Encode.pm
line 35.

Not really useful to guess where the error is, but we know it's in the file so
we can search in it easily

2. Stage the file and import using the UI

            id: 2
        status: failed
      progress: 0
          size: 1
borrowernumber: 51
          type: marc_import_commit_batch
         queue: long_tasks
          data: {"report":{"import_batch_id":"1","num_items_added":null,"num_ignored":null,"num_items_replaced":null,"num_items_errored":null,"num_added":null,"num_updated":null},"import_batch_id":"1","overlay_framework":null,"messages":[],"frameworkcode":""}
       context: {"firstname":null,"surname":"koha","flags":"1","number":"51","register_name":null,"emailaddress":null,"desk_id":null,"desk_name":null,"branchname":"Centerville","register_id":null,"shibboleth":"0","cardnumber":"42","branch":"CPL","interface":"intranet","id":"koha"}
   enqueued_on: 2025-01-20 08:40:28
    started_on: 2025-01-20 08:40:29
      ended_on: 2025-01-20 08:40:29

No info on the failure.

There is something in the log, but that should go in the report of the job

==> /var/log/koha/kohadev/worker-output.log <== 
Record length of 527856 is larger than the MARC spec allows (99999 bytes). at
/usr/share/perl5/MARC/File/USMARC.pm line 314.
UTF-8 "\x85" does not map to Unicode at /usr/share/perl5/MARC/File/Encode.pm
line 35.

3. Now the record is in the DB, start a full reindex:

% koha-elasticsearch --rebuild -b  kohadev
UTF-8 "\xC4" does not map to Unicode at /usr/share/perl5/MARC/File/Encode.pm
line 35.
Something went wrong rebuilding indexes for kohadev

No info on the problematic record! We should tell which record failed.
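
For context on the "does not map to Unicode" messages: the bytes named in the errors above, \xC4 and \x85, are together the UTF-8 encoding of "ą" (U+0105). The thread does not state the exact mechanism, so the following is only a sketch of how Perl's core Encode module produces this class of error when a byte slice splits a multi-byte character. That this is what happens inside the MARC parser for oversized records is an assumption. `try_decode` is a hypothetical helper.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: strict decode that returns (text, error) instead of dying.
sub try_decode {
    my ($bytes) = @_;
    my $text = eval { decode( 'UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC ) };
    return ( $text, defined $text ? '' : $@ );
}

# The word "ksi\x{105}\x{17C}ka" contains U+0105 (bytes C4 85) and
# U+017C (bytes C5 BC) once encoded as UTF-8.
my $bytes = encode( 'UTF-8', "ksi\x{105}\x{17C}ka" );

my ( $whole, $whole_error ) = try_decode($bytes);
my ( $head,  $head_error )  = try_decode( substr( $bytes, 0, 4 ) );    # ends with a lone C4
my ( $tail,  $tail_error )  = try_decode( substr( $bytes, 4 ) );       # starts with a lone 85

print "whole string decodes: ", ( defined $whole ? 'yes' : 'no' ), "\n";
print "head slice error: $head_error";
print "tail slice error: $tail_error";
```

The complete byte string decodes cleanly, while either slice that cuts through the two-byte character fails under strict checking. Note that the error text carries no record identifier, which matches the complaint above that the output does not say which record failed.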



[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters

2025-01-19 Thread bugzilla-daemon--- via Koha-bugs
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

David Cook  changed:

   What|Removed |Added

Summary|Elasticsearch indexing explodes with oversized records|Elasticsearch indexing explodes with some oversized records with UTF-8 characters
