[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Fridolin Somers changed:

What | Removed | Added
CC   |         | fridolin.som...@biblibre.com

--- Comment #30 from Fridolin Somers ---
Depends on Bug 38416, which is not in 23.11.x.

--
You are receiving this mail because:
You are watching all bug changes.
___
Koha-bugs mailing list
Koha-bugs@lists.koha-community.org
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Alex Buckley changed:

What                   | Removed           | Added
Version(s) released in | 25.05.00,24.11.02 | 25.05.00,24.11.02,24.05.07
Status                 | Pushed to stable  | Pushed to oldstable
CC                     |                   | alexbuck...@catalyst.net.nz

--- Comment #29 from Alex Buckley ---
Backported to 24.05.x for 24.05.07.

Note: The test plan of the first patch worked exactly as expected. We ran the t/db_dependent/Koha/SearchEngine/Elasticsearch. unit test before and after applying the second patch, and it passed both times. Please let us know if we should not backport the second patch to 24.05.
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Simon Hohl changed:

What | Removed | Added
CC   |         | simon.h...@dainst.org

--- Comment #28 from Simon Hohl ---
*** Bug 39104 has been marked as a duplicate of this bug. ***
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

David Cook changed:

What     | Removed | Added
Blocks   | 39104   |
See Also |         | https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=39104

Referenced Bugs:

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=39104
[Bug 39104] Elasticsearch indexing crashes with exception in catch block
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Katrin Fischer changed:

What   | Removed | Added
Blocks |         | 39104

Referenced Bugs:

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=39104
[Bug 39104] Elasticsearch indexing crashes with exception in catch block
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Paul Derscheid changed:

What     | Removed             | Added
CC       |                     | paul.dersch...@lmscloud.de
Keywords | rel_24_11_candidate |
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #27 from Paul Derscheid ---
Nice work everyone!

Pushed to 24.11.x for 24.11.02.
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Paul Derscheid changed:

What                   | Removed        | Added
Version(s) released in | 25.05.00       | 25.05.00,24.11.02
Status                 | Pushed to main | Pushed to stable
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Tomás Cohen Arazi (tcohen) changed:

What     | Removed | Added
Keywords |         | rel_24_05_candidate
CC       |         | tomasco...@gmail.com
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Katrin Fischer changed:

What                   | Removed   | Added
Version(s) released in |           | 25.05.00
Status                 | Passed QA | Pushed to main
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #26 from Katrin Fischer ---
Pushed for 25.05!

Well done everyone, thank you!
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Katrin Fischer changed:

What     | Removed     | Added
Keywords |             | rel_24_11_candidate
Version  | unspecified | Main
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Michaela Sieber changed:

What | Removed | Added
CC   |         | clemens.tub...@kit.edu,
     |         | lukasz.kos...@kit.edu,
     |         | michaela.sie...@kit.edu,
     |         | raphael.str...@kit.edu
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #25 from Jonathan Druart ---
(In reply to Katrin Fischer from comment #24)
> I see there is still a lot of discussion going on - is it ok to push these
> patches as is and continue on another bug for remaining issues or should I
> wait?

You can push.
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #24 from Katrin Fischer ---
I see there is still a lot of discussion going on - is it ok to push these patches as is and continue on another bug for remaining issues or should I wait?
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #23 from Jonathan Druart ---
(In reply to Janusz Kaczmarek from comment #22)
> (In reply to Jonathan Druart from comment #20)
> > > Well, every Koha::Item->store triggers $indexer->index_records, so no
> > > wonder -- we have 2508 952 fields in the test record :)
> >
> > Yes, the "interesting" was sarcastic, hence the "...", but that was not
> > obvious, sorry.
> >
> > It's still a bug IMO.
> > Especially with this:
> > 718 $indexer->update_index( \@search_engine_record_ids,
> > \@search_engine_records ) unless $skip_indexing;
>
> Does this mean that both in bulkmarcimport and when importing staged records
> from the UI we should add bibliographic records with { skip_record_index => 1 }
> and then add items with { skip_record_index => 1 }, and then, at the very end,
> or after a certain number of records, or after each record, explicitly call:
>
> $indexer->index_records( $biblionumber(s), ...) ?
>
> Would that be the right way?

Yes, see what we do in Koha::Items->batch_update.
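The pattern agreed on here (store every item with indexing suppressed, then index the affected record once, as Jonathan points to in Koha::Items->batch_update) can be sketched as below. This is a Python illustration, not Koha's Perl code: `Indexer`, `store_item` and `import_record` are hypothetical stand-ins for the search engine indexer, `Koha::Item->store` and the `{ skip_record_index => 1 }` option. It only compares how many indexing jobs each strategy enqueues for one record with 2508 items, the count reported in comment #13.

```python
from dataclasses import dataclass, field


@dataclass
class Indexer:
    """Hypothetical stand-in for the indexer.

    Each call to index_records() represents one enqueued background job.
    """
    jobs: list = field(default_factory=list)

    def index_records(self, biblionumbers):
        # One job per call, however many records the call covers.
        self.jobs.append(sorted(set(biblionumbers)))


def store_item(indexer, biblionumber, skip_record_index=False):
    # As described in the thread: storing an item triggers indexing of
    # its parent record unless indexing is explicitly skipped.
    if not skip_record_index:
        indexer.index_records([biblionumber])


def import_record(indexer, biblionumber, item_count, defer_indexing):
    for _ in range(item_count):
        store_item(indexer, biblionumber, skip_record_index=defer_indexing)
    if defer_indexing:
        # Index the record once, after all of its items are stored.
        indexer.index_records([biblionumber])


eager, deferred = Indexer(), Indexer()
import_record(eager, biblionumber=1, item_count=2508, defer_indexing=False)
import_record(deferred, biblionumber=1, item_count=2508, defer_indexing=True)
print(len(eager.jobs), len(deferred.jobs))  # 2508 1
```

With per-item indexing every store enqueues its own job; deferring leaves a single job for the record. Collecting several biblionumbers per `index_records` call would cut the count further for multi-record imports.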
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #22 from Janusz Kaczmarek ---
(In reply to Jonathan Druart from comment #20)
> > Well, every Koha::Item->store triggers $indexer->index_records, so no wonder
> > -- we have 2508 952 fields in the test record :)
>
> Yes, the "interesting" was sarcastic, hence the "...", but that was not
> obvious, sorry.
>
> It's still a bug IMO.
> Especially with this:
> 718 $indexer->update_index( \@search_engine_record_ids,
> \@search_engine_records ) unless $skip_indexing;

Does this mean that both in bulkmarcimport and when importing staged records from the UI we should add bibliographic records with { skip_record_index => 1 } and then add items with { skip_record_index => 1 }, and then, at the very end, or after a certain number of records, or after each record, explicitly call:

$indexer->index_records( $biblionumber(s), ...) ?

Would that be the right way?
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #21 from Martin Renvoize (ashimema) ---
This all reminds me a little of:

Bug 35104 - We should warn when attempting to save MARC records that contain characters invalid in XML

Whilst it's not specifically about record length, it's meant to try to prevent bad data from making its way into Koha at all.

That said, it sounds like this isn't "bad" data so much as data our MARC utilities don't deal with well.
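For context on the kind of check Bug 35104 describes: XML 1.0 only allows tab, line feed, carriage return and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD and U+10000 to U+10FFFF (the `Char` production), so anything else cannot be stored in MARCXML. A minimal Python sketch follows; `invalid_xml_chars` is a hypothetical helper, not Koha's implementation. One example of such a character is a stray ISO 2709 subfield delimiter (0x1F) left inside field data.

```python
import re

# Anything NOT matched by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def invalid_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 cannot carry."""
    return [
        (m.start(), f"U+{ord(m.group()):04X}")
        for m in _INVALID_XML_CHAR.finditer(text)
    ]


print(invalid_xml_chars("Zażółć gęślą jaźń"))  # []
print(invalid_xml_chars("title\x1fsubfield"))  # [(5, 'U+001F')]
```

Note that multi-byte UTF-8 text such as Polish diacritics is perfectly valid XML, which fits Martin's point that the records in this bug are not "bad" data.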
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #20 from Jonathan Druart ---
(In reply to Janusz Kaczmarek from comment #14)
> (In reply to Jonathan Druart from comment #13)
> > With this patch:
> > "1 MARC records done in 81.9053399562836 seconds"
> >
> > However, I have deleted all biblio and background_jobs before the import and
> > now I have:
> >
> > MariaDB [koha_kohadev]> select count(*) from biblio\G
> > count(*): 1
> >
> >
> > MariaDB [koha_kohadev]> select count(*) from background_jobs\G
> > count(*): 2508
> >
> > Interesting!...
>
> Well, every Koha::Item->store triggers $indexer->index_records, so no wonder
> -- we have 2508 952 fields in the test record :)

Yes, the "interesting" was sarcastic, hence the "...", but that was not obvious, sorry.

It's still a bug IMO. Especially with this:

718 $indexer->update_index( \@search_engine_record_ids, \@search_engine_records ) unless $skip_indexing;
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #19 from David Cook ---
(In reply to Janusz Kaczmarek from comment #18)
> (In reply to David Cook from comment #16)
> >
> > I raised Bug 32638 a couple years ago. I'm sure there's a bunch of reports
> > about the MARC import failing silently.
>
> At first glance, this seems to be a different (but somehow related) problem.
> The cause of 32638 seems to lie elsewhere, not in the MARC transformation
> itself. Am I right?

Yeah, I just meant that the MARC import doesn't surface errors/failures.
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #18 from Janusz Kaczmarek ---
(In reply to David Cook from comment #16)
>
> I raised Bug 32638 a couple years ago. I'm sure there's a bunch of reports
> about the MARC import failing silently.

At first glance, this seems to be a different (but somehow related) problem. The cause of 32638 seems to lie elsewhere, not in the MARC transformation itself. Am I right?
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #17 from David Cook ---
(In reply to Jonathan Druart from comment #12)
> This is clearly not enough (could go on a separate bug).

Yep. Step by step.
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #16 from David Cook ---
(In reply to Jonathan Druart from comment #12)
> No info on the problematic record! We should tell which record failed.

I raised Bug 32638 a couple years ago. I'm sure there's a bunch of reports about the MARC import failing silently. It has never been high enough priority to fix.
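What comments #12 and #16 ask for, naming the record that failed instead of failing silently, comes down to catching the error per record and carrying the record's identifier into the report. A hedged Python sketch of that idea: `decode_records` and its `(record_id, raw_bytes)` input are hypothetical, and in Koha the equivalent would be an eval or try block around the Perl MARC parsing call, with the message written to the job report or log.

```python
def decode_records(raw_records):
    """Decode (record_id, raw_bytes) pairs as UTF-8, one record at a time.

    A record that fails is reported with its id and the run continues,
    instead of one bad record aborting everything without a trace.
    """
    decoded, failures = {}, []
    for record_id, raw in raw_records:
        try:
            decoded[record_id] = raw.decode("utf-8")
        except UnicodeDecodeError as err:
            bad_byte = raw[err.start:err.start + 1]
            failures.append(
                f"record {record_id}: byte {bad_byte!r} at offset {err.start} "
                "could not be decoded as UTF-8"
            )
    return decoded, failures


ok, errors = decode_records([(1, "Kraków".encode("utf-8")), (2, b"Ko\xc4")])
print(sorted(ok))  # [1]
for line in errors:
    print(line)  # record 2: byte b'\xc4' at offset 2 could not be decoded as UTF-8
```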
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #15 from Janusz Kaczmarek ---
I've created Bug 38933 for this stage/import-from-UI issue.
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Janusz Kaczmarek changed:

What     | Removed | Added
See Also |         | https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38933
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #14 from Janusz Kaczmarek ---
(In reply to Jonathan Druart from comment #13)
> With this patch:
> "1 MARC records done in 81.9053399562836 seconds"
>
> However, I have deleted all biblio and background_jobs before the import and
> now I have:
>
> MariaDB [koha_kohadev]> select count(*) from biblio\G
> count(*): 1
>
>
> MariaDB [koha_kohadev]> select count(*) from background_jobs\G
> count(*): 2508
>
> Interesting!...

Well, every Koha::Item->store triggers $indexer->index_records, so no wonder -- we have 2508 952 fields in the test record :)

> > 2. Stage the file and import using the UI
>
> Same with this patch.

This is also more or less clear. In C4::ImportBatch::BatchCommitRecords, called by the worker, we call:

my $marc_record = MARC::Record->new_from_usmarc($rowref->{'marc'});

even though the MARCXML representation of the record is also stored in the import_records table (import_records.marcxml). This is exactly the same issue that made David's patch die with this kind of record. The worker dies because of the uncaught die generated by new_from_usmarc. This has nothing to do with this patch (or with David's previous patch); it is just another call to a function that can die without any eval / try around it.

Now, if C4::ImportBatch::_create_import_record creates and saves both versions (ISO 2709 and MARCXML) to the import_records table, why not use the MARCXML version in C4::ImportBatch::BatchCommitRecords instead of the ISO 2709 one, which causes trouble with oversized records?

After this little change it seems to work; I was able to import the huge test record with the UI:

diff --git a/C4/ImportBatch.pm b/C4/ImportBatch.pm
index 5aebaafacf..799b69f0ca 100644
--- a/C4/ImportBatch.pm
+++ b/C4/ImportBatch.pm
@@ -531,7 +531,7 @@ sub BatchCommitRecords {
     my $item_tag;
     my $item_subfield;
     my $dbh = C4::Context->dbh;
-    my $sth = $dbh->prepare("SELECT import_records.import_record_id, record_type, status, overlay_status, marc, encoding
+    my $sth = $dbh->prepare("SELECT import_records.import_record_id, record_type, status, overlay_status, marc, marcxml, encoding
                              FROM import_records
                              LEFT JOIN import_auths ON (import_records.import_record_id=import_auths.import_record_id)
                              LEFT JOIN import_biblios ON (import_records.import_record_id=import_biblios.import_record_id)
@@ -568,7 +568,7 @@ sub BatchCommitRecords {
         } else {
             $marc_type = 'USMARC';
         }
-        my $marc_record = MARC::Record->new_from_usmarc($rowref->{'marc'});
+        my $marc_record = MARC::Record->new_from_xml($rowref->{'marcxml'}, $rowref->{'encoding'});
 
         if ($record_type eq 'biblio') {
             # remove any item tags - rely on _batchCommitItems

>
> > 3. Now the record is in the DB, start a full reindex:
> >
> > % koha-elasticsearch --rebuild -b kohadev
> > UTF-8 "\xC4" does not map to Unicode at /usr/share/perl5/MARC/File/Encode.pm
> > line 35.
> > Something went wrong rebuilding indexes for kohadev
> >
> > No info on the problematic record! We should tell which record failed.
>
> We don't have anything in the output, which is problematic IMO.

Yes, this is problematic, because new_from_usmarc died and we didn't catch it. But now that we call it in an eval, we should be safe.
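Why the stored ISO 2709 copy is the weak point: in ISO 2709 (the format MARC::Record calls USMARC) the record length occupies leader positions 00-04 as five ASCII digits, so no record can declare more than 99999 bytes, while MARCXML has no such length field. The Python sketch below only illustrates that arithmetic; `leader_record_length` is a hypothetical helper, not part of MARC::Record, and it does not model what MARC::File::USMARC actually does when a record overflows.

```python
MAX_ISO2709_LENGTH = 99999  # five decimal digits in leader positions 00-04


def leader_record_length(record_bytes):
    """Return the 5-character record length for an ISO 2709 leader.

    Raises ValueError when the record is too long to be described,
    instead of writing a length that does not match the data.
    """
    length = len(record_bytes)
    if length > MAX_ISO2709_LENGTH:
        raise ValueError(
            f"record of {length} bytes exceeds the ISO 2709 maximum of "
            f"{MAX_ISO2709_LENGTH} bytes; keep it as MARCXML instead"
        )
    return f"{length:05d}"


print(leader_record_length(b"x" * 1234))  # 01234
try:
    leader_record_length(b"x" * 527856)  # size reported in the worker log
except ValueError as err:
    print(err)

# The limit counts bytes, not characters: multi-byte UTF-8 text uses up
# the budget faster than its character count suggests.
print(len("ą"), len("ą".encode("utf-8")))  # 1 2
```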
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

--- Comment #13 from Jonathan Druart ---
(In reply to Jonathan Druart from comment #12)

Hey, this was tested on bug 38713, not 38913, oops! So basically what I described was the behaviour in main.

> This is clearly not enough (could go on a separate bug).
>
> Testing this patch I have noticed several things:
> 1.
> $ ./misc/migration_tools/bulkmarcimport.pl -b -file test.xml -m=MARCXML
> .UTF-8 "\xC4" does not map to Unicode at
> /usr/share/perl5/MARC/File/Encode.pm line 35.
>
> Not really useful to guess where the error is, but we know it's in the file
> so we can search in it easily

With this patch:
"1 MARC records done in 81.9053399562836 seconds"

However, I have deleted all biblio and background_jobs before the import and now I have:

MariaDB [koha_kohadev]> select count(*) from biblio\G
count(*): 1

MariaDB [koha_kohadev]> select count(*) from background_jobs\G
count(*): 2508

Interesting!...

> 2. Stage the file and import using the UI

Same with this patch.

> 3. Now the record is in the DB, start a full reindex:
>
> % koha-elasticsearch --rebuild -b kohadev
> UTF-8 "\xC4" does not map to Unicode at /usr/share/perl5/MARC/File/Encode.pm
> line 35.
> Something went wrong rebuilding indexes for kohadev
>
> No info on the problematic record! We should tell which record failed.

We don't have anything in the output, which is problematic IMO.
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

Jonathan Druart changed:

What | Removed | Added
CC   |         | jonathan.dru...@gmail.com

--- Comment #12 from Jonathan Druart ---
This is clearly not enough (could go on a separate bug).

Testing this patch I have noticed several things:

1.
$ ./misc/migration_tools/bulkmarcimport.pl -b -file test.xml -m=MARCXML
.UTF-8 "\xC4" does not map to Unicode at /usr/share/perl5/MARC/File/Encode.pm line 35.

Not really useful for guessing where the error is, but we know it's in the file, so we can search in it easily.

2. Stage the file and import using the UI

id: 2
status: failed
progress: 0
size: 1
borrowernumber: 51
type: marc_import_commit_batch
queue: long_tasks
data: {"report":{"import_batch_id":"1","num_items_added":null,"num_ignored":null,"num_items_replaced":null,"num_items_errored":null,"num_added":null,"num_updated":null},"import_batch_id":"1","overlay_framework":null,"messages":[],"frameworkcode":""}
context: {"firstname":null,"surname":"koha","flags":"1","number":"51","register_name":null,"emailaddress":null,"desk_id":null,"desk_name":null,"branchname":"Centerville","register_id":null,"shibboleth":"0","cardnumber":"42","branch":"CPL","interface":"intranet","id":"koha"}
enqueued_on: 2025-01-20 08:40:28
started_on: 2025-01-20 08:40:29
ended_on: 2025-01-20 08:40:29

No info on the failure. There is something in the log, but that should go in the report of the job:

==> /var/log/koha/kohadev/worker-output.log <==
Record length of 527856 is larger than the MARC spec allows (9 bytes). at /usr/share/perl5/MARC/File/USMARC.pm line 314.
UTF-8 "\x85" does not map to Unicode at /usr/share/perl5/MARC/File/Encode.pm line 35.

3. Now the record is in the DB, start a full reindex:

% koha-elasticsearch --rebuild -b kohadev
UTF-8 "\xC4" does not map to Unicode at /usr/share/perl5/MARC/File/Encode.pm line 35.
Something went wrong rebuilding indexes for kohadev

No info on the problematic record! We should tell which record failed.
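The bytes named in these errors, 0xC4 from Encode.pm and 0x85 in the worker log, are together the UTF-8 encoding of "ą" (U+0105). One reading consistent with that, which the thread itself does not confirm, is that a length or offset computed for an oversized record lands between the two bytes of a multi-byte character, so neither half decodes on its own; Perl's Encode appears to report such malformed bytes as "does not map to Unicode". A short Python illustration of the split:

```python
# "ą" (U+0105) is encoded in UTF-8 as the two bytes C4 85.
raw = "mąka".encode("utf-8")
print(raw)  # b'm\xc4\x85ka'

# Split at byte offset 2, i.e. between the two bytes of "ą",
# as a miscalculated field boundary might do.
head, tail = raw[:2], raw[2:]

for part in (head, tail):
    try:
        part.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first undecodable byte in `part`.
        print(f"{part!r}: byte {part[err.start]:#04x} -> {err.reason}")
# b'm\xc4': byte 0xc4 -> unexpected end of data
# b'\x85ka': byte 0x85 -> invalid start byte
```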
[Koha-bugs] [Bug 38913] Elasticsearch indexing explodes with some oversized records with UTF-8 characters
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38913

David Cook changed:

What    | Removed                              | Added
Summary | Elasticsearch indexing explodes with | Elasticsearch indexing explodes with
        | oversized records                    | some oversized records with UTF-8
        |                                      | characters