Felix Paul Kühne pushed to branch 3.0.x at VideoLAN / VLC
Commits:
76ac9b21 by Lyndon Brown at 2026-01-23T06:07:27+01:00
demux/subtitle: clarify comment
this does not just apply to srt files, it is done for all formats handled
by this demuxer. it also needed additional clarification.
(cherry picked from commit 550cd9529945a7fbe5afaedd50cb9247212d4152)
- - - - -
7006ffef by Lyndon Brown at 2026-01-23T06:07:27+01:00
demux/subtitle: remove pointless statement
we find the period for the extension and set it to null such that we can
then find the second to last period that should precede the substring we
want to extract. there is absolutely no point in restoring the period
afterwards in the working copy which we are then just about to destroy.
the comment did not even make any sense.
(cherry picked from commit eb07c5f2f80ade696cc2d7bed13a31249d2ca4f6)
- - - - -
d7605d2e by Lyndon Brown at 2026-01-23T06:07:27+01:00
demux/subtitle: add missing alloc check
(cherry picked from commit d582bf9835eeb9236bc7084bb57a29a025dbb17b)
- - - - -
a25f04ba by Lyndon Brown at 2026-01-23T06:07:27+01:00
demux/subtitle: minor reorganisation of filename language extraction
(non-functional)
prepares for handling an alternate common pattern; removes unnecessary
variable (we can reuse the `psz_tmp` var now instead of also having
`psz_language_begin`); better readability.
(cherry picked from commit 04aed075ce0b5bfe6de2b8dd6b2ca09338ddef7d)
- - - - -
893a483b by Lyndon Brown at 2026-01-23T06:07:27+01:00
demux/subtitle: use only filename for filename language substr extraction
...and thus prevent some ugly failures.
(the string given to the function is the full filepath not just the filename).
the attempt to determine the language from subtitle filenames is based upon
a single common pattern - PATH/filename.LANG.ext. whilst it works just fine
for this pattern, it is not the only pattern commonly used, for instance
PATH/Subs/x_LANG.ext (where 'x' is an integer).
in such cases where the period for the extension is the only one in the
filename, the function could produce an ugly result should any directory in
the path happen to contain a period (if not, NULL would be returned). it
would incorrectly capture a chunk of the path as part of the substring
extraction, producing results like "FOOBAR/Subs/1_English" (or worse) which
then end up as the language name displayed under the subtitle menu and
elsewhere.
this commit strips the string processed down to filename only and thus
prevents such ugliness. the next commit will introduce proper handling for
the just mentioned alternate common pattern.
(cherry picked from commit 0984f11c58865af87c2b73676f12120702af25b3)
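The failure mode described above can be reproduced with a small sketch. The helper names `between_last_two_periods` and `filename_part` are hypothetical and not part of VLC, and `/` is assumed as the directory separator (the diff below uses `DIR_SEP_CHAR`).

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a copy of the substring between the last two
 * periods of `s`, or NULL if `s` has fewer than two. Caller frees the result. */
static char *between_last_two_periods(const char *s)
{
    char *work = strdup(s);
    if (!work)
        return NULL;

    char *ret = NULL;
    char *ext = strrchr(work, '.');      /* find the extension */
    if (ext) {
        *ext = '\0';                     /* cut it off */
        char *dot = strrchr(work, '.');  /* next-to-last period */
        if (dot)
            ret = strdup(dot + 1);
    }
    free(work);
    return ret;
}

/* Hypothetical helper: filename portion of a path ('/' assumed). */
static const char *filename_part(const char *path)
{
    const char *sep = strrchr(path, '/');
    return sep ? sep + 1 : path;
}
```

Given the made-up path `/media/Show.S01/Subs/1_English.srt`, passing the full path yields `S01/Subs/1_English`, because the period inside the directory name is picked up. Passing only `filename_part(path)` yields NULL, since `1_English.srt` contains no second period.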
- - - - -
d6afaa90 by Lyndon Brown at 2026-01-23T06:07:27+01:00
demux/subtitle: handle PATH/Subs/1_English.srt type filename lang extraction
the only pattern handled was PATH/filename.LANG.ext. another common one is
PATH/Subs/x_LANG.ext which this adds handling for.
this simply relies upon falling back to trying to get the substring after
the last underscore if trying to get the substring after a period fails.
we do not explicitly require the second pattern to only occur in files
found under a 'Subs' subdir, since it is not certain that there is value in
implementing such a restriction.
(cherry picked from commit 0dd411061ba2494b072a5f18cc65e643b479948e)
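A minimal sketch of the two-pattern lookup, under stated assumptions: `lang_substr_from_path` is a hypothetical stand-in for the real function, and `/` is assumed as the directory separator. The sketch follows the wording of the message and uses `strrchr` for the underscore. The diff below calls `strchr`, which finds the first underscore; the two agree for names such as `1_English.srt` that contain a single underscore.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup */
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the real function. Caller frees the result. */
static char *lang_substr_from_path(const char *path)
{
    if (!path)
        return NULL;

    /* Keep only the filename; '/' is an assumption of this sketch. */
    const char *fname = strrchr(path, '/');
    fname = fname ? fname + 1 : path;

    char *work = strdup(fname);
    if (!work)
        return NULL;

    char *ret = NULL;
    char *p = strrchr(work, '.');       /* find the extension */
    if (p) {
        *p = '\0';                      /* remove it */
        p = strrchr(work, '.');         /* pattern: filename.LANG.ext */
        if (!p)
            p = strrchr(work, '_');     /* fallback pattern: x_LANG.ext */
        if (p)
            ret = strdup(p + 1);
    }
    free(work);
    return ret;
}
```

With this sketch, `movie.en.srt` gives `en`, `Subs/1_English.srt` gives `English`, and `movie.srt` gives NULL. As the message notes, nothing checks that the file sits under a `Subs` directory, so any name with an underscore and no second period takes the fallback.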
- - - - -
368320b6 by Lyndon Brown at 2026-01-23T06:07:27+01:00
demux/subtitle: prepare for lang detection via codec properties
at least one subtitle format handled by this demuxer may hold the language
as a property specified within the file. we should allow the parser to
extract and use that as an alternative to the filename based substring
extraction. this sets things up to allow the parser functions to provide
that extracted property string.
(adapted from commit 58ce4b9646de0c10097be300c8ff334e86bef61c)
- - - - -
cbc5fc61 by Lyndon Brown at 2026-01-23T06:07:27+01:00
demux/subtitles: clarify debug message
the substring obtained from filename extraction in some cases is perfect
but in other cases may not be a language at all, just some portion of the
filename. stating 'detected language FOO' is a bit odd if it turns out to
not actually be a language name that we've extracted. let's fix that by
clarifying what we've actually retrieved, and thus distinguish the less
reliable filename extraction result from the likely more reliable property
available in some subtitle files.
also, enclose in quotes in both cases. for the filename based case since
this simply makes sense. in the property case, since this may be a language
code, as it is for ASS/SSA.
(adapted from commit 17b2c064fafffb34f10977bbeb4a1347a87d1da7)
- - - - -
a2900eeb by Lyndon Brown at 2026-01-23T06:07:27+01:00
demux/subtitles: capture language attribute from SSA/ASS files
... for language identification.
this info property has been supported by libass since v0.10.0. it is currently
a 2-char iso-639-1 code.
libass commit adding support:
https://github.com/libass/libass/commit/c979365946b2dc2499ede862b6f7da15f9bc0ed1
discussion about enhancing the attribute to support 3-char iso-639-2 codes,
possibly bcp-47: https://github.com/libass/libass/issues/404
(cherry picked from commit 49c7098e8070a3be857d028ec8d95930100daf80)
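As an illustration of the capture, here is a minimal sketch using the same `sscanf` pattern as the diff below. The function name `parse_ssa_language` and the sample line `Language: en` are assumptions made for this example and are not taken from VLC or libass.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of the value of a
 * "Language: xx" line, or NULL if the line is not such a property.
 * Caller frees the result. */
static char *parse_ssa_language(const char *line)
{
    /* Cheap first-character check before allocating anything. */
    if (line[0] != 'L')
        return NULL;

    /* The captured value can never be longer than the input line. */
    char *value = malloc(strlen(line) + 1);
    if (!value)
        return NULL;

    /* %[^\r\n] reads up to, but not including, the line terminator. */
    if (sscanf(line, "Language: %[^\r\n]", value) != 1) {
        free(value);
        return NULL;
    }
    return value;
}
```

For the input `Language: en\r\n` this returns `en`. Other lines starting with `L`, for example `Layer: 0`, fail the `sscanf` match and return NULL.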
- - - - -
d4c431d4 by Lyndon Brown at 2026-01-23T06:07:27+01:00
demux/subtitles: avoid unnecessary allocations for SSA/ASS
we only need to allocate the `psz_text` buffer when handling `Dialogue`
and `Language` lines. restricting allocation to lines beginning with 'D'
or 'L' is a simple way of avoiding most/all that are unnecessary.
(cherry picked from commit d7d8cff680540cdf9fbdb38276ae7c7e99035860)
- - - - -
1 changed file:
- modules/demux/subtitle.c
Changes:
=====================================
modules/demux/subtitle.c
=====================================
@@ -143,6 +143,7 @@ typedef struct
int64_t i_microsecperframe;
char *psz_header; /* SSA */
+ char *psz_lang;
struct
{
@@ -344,6 +345,7 @@ static int Open ( vlc_object_t *p_this )
p_sys->subtitles.p_array = NULL;
p_sys->props.psz_header = NULL;
+ p_sys->props.psz_lang = NULL;
p_sys->props.i_microsecperframe = 40000;
p_sys->props.jss.b_inited = false;
p_sys->props.mpsub.b_inited = false;
@@ -720,15 +722,20 @@ static int Open ( vlc_object_t *p_this )
else
es_format_Init( &fmt, SPU_ES, VLC_CODEC_SUBT );
- /* Stupid language detection in the filename */
- char * psz_language = get_language_from_filename( p_demux->psz_file );
-
- if( psz_language )
+ if( p_sys->props.psz_lang )
{
- fmt.psz_language = psz_language;
- msg_Dbg( p_demux, "detected language %s of subtitle: %s", psz_language,
+ fmt.psz_language = p_sys->props.psz_lang;
+ p_sys->props.psz_lang = NULL;
+ msg_Dbg( p_demux, "detected language '%s' of subtitle: %s", fmt.psz_language,
p_demux->psz_location );
}
+ else
+ {
+ fmt.psz_language = get_language_from_filename( p_demux->psz_file );
+ if( fmt.psz_language )
+ msg_Dbg( p_demux, "selected '%s' as possible filename language substring of subtitle: %s",
+ fmt.psz_language, p_demux->psz_location );
+ }
char *psz_description = var_InheritString( p_demux, "sub-description" );
if( psz_description && *psz_description )
@@ -1259,12 +1266,25 @@ static int ParseSSA( vlc_object_t *p_obj, subs_properties_t *p_props,
* Dialogue: Layer#,0:02:40.65,0:02:41.79,Wolf main,Cher,0000,0000,0000,,Et les enregistrements de ses ondes delta ?
*/
- /* The output text is - at least, not removing numbers - 18 chars shorter than the input text. */
- psz_text = malloc( strlen(s) );
- if( !psz_text )
- return VLC_ENOMEM;
+ psz_text = NULL;
+ if( s[0] == 'D' || s[0] == 'L' )
+ {
+ /* The output text is always shorter than the input text. */
+ psz_text = malloc( strlen(s) );
+ if( !psz_text )
+ return VLC_ENOMEM;
+ }
- if( sscanf( s,
+ /* Try to capture the language property */
+ if( s[0] == 'L' &&
+ sscanf( s, "Language: %[^\r\n]", psz_text ) == 1 )
+ {
+ free( p_props->psz_lang ); /* just in case of multiple instances */
+ p_props->psz_lang = psz_text;
+ psz_text = NULL;
+ }
+ else if( s[0] == 'D' &&
+ sscanf( s,
"Dialogue: %15[^,],%d:%d:%d.%d,%d:%d:%d.%d,%[^\r\n]",
temp,
&h1, &m1, &s1, &c1,
@@ -2465,24 +2485,37 @@ static int ParseSCC( vlc_object_t *p_obj, subs_properties_t *p_props,
return VLC_SUCCESS;
}
-/* Matches filename.xx.srt */
+/* Tries to extract language from common filename patterns PATH/filename.LANG.ext
+ and PATH/Subs/x_LANG.ext (where 'x' is an integer). */
static char * get_language_from_filename( const char * psz_sub_file )
{
char *psz_ret = NULL;
- char *psz_tmp, *psz_language_begin;
+ char *psz_tmp;
+
+ if( !psz_sub_file )
+ return NULL;
- if( !psz_sub_file ) return NULL;
- char *psz_work = strdup( psz_sub_file );
+ /* Remove path */
+ const char *psz_fname = strrchr( psz_sub_file, DIR_SEP_CHAR );
+ psz_fname = (psz_fname == NULL) ? psz_sub_file : psz_fname + 1;
- /* Removing extension, but leaving the dot */
- psz_tmp = strrchr( psz_work, '.' );
+ char *psz_work = strdup( psz_fname );
+ if( !psz_work )
+ return NULL;
+
+ psz_tmp = strrchr( psz_work, '.' ); /* Find extension */
if( psz_tmp )
{
- psz_tmp[0] = '\0';
- psz_language_begin = strrchr( psz_work, '.' );
- if( psz_language_begin )
- psz_ret = strdup(++psz_language_begin);
- psz_tmp[0] = '.';
+ psz_tmp[0] = '\0'; /* Remove it */
+
+ /* Get substr after next last period - hopefully our language string */
+ psz_tmp = strrchr( psz_work, '.' );
+ /* Otherwise try substr after last underscore for alternate pattern */
+ if( !psz_tmp )
+ psz_tmp = strchr( psz_work, '_' );
+
+ if( psz_tmp )
+ psz_ret = strdup(++psz_tmp);
}
free( psz_work );
View it on GitLab:
https://code.videolan.org/videolan/vlc/-/compare/dba99ba8ed29e40dcd939ef86181b4f075258457...d4c431d4fcf73900335b3f5c0f31ec6fb3126379
--
You're receiving this email because of your account on code.videolan.org.
VideoLAN code repository instance
_______________________________________________
vlc-commits mailing list
[email protected]
https://mailman.videolan.org/listinfo/vlc-commits