Re: [PATCH v4 19/33] migration: Add save_live_complete_precopy_thread handler
On Thu, Feb 06, 2025 at 12:41:50PM +0100, Maciej S. Szmigiero wrote:
> On 5.02.2025 16:55, Peter Xu wrote:
> > On Wed, Feb 05, 2025 at 12:53:21PM +0100, Maciej S. Szmigiero wrote:
> > > On 4.02.2025 21:34, Peter Xu wrote:
> > > > On Tue, Feb 04, 2025 at 08:32:15PM +0100, Maciej S. Szmigiero wrote:
> > > > > On 4.02.2025 18:54, Peter Xu wrote:
> > > > > > On Thu, Jan 30, 2025 at 11:08:40AM +0100, Maciej S. Szmigiero wrote:
> > > > > > > +static int multifd_device_state_save_thread(void *opaque)
> > > > > > > +{
> > > > > > > +struct MultiFDDSSaveThreadData *data = opaque;
> > > > > > > +int ret;
> > > > > > > +
> > > > > > > +ret = data->hdlr(data->idstr, data->instance_id,
> > > > > > > &send_threads_abort,
> > > > > > > + data->handler_opaque);
> > > > > >
> > > > > > I thought we discussed somewhere and the plan was we could use
> > > > > > Error** here
> > > > > > to report errors. Would that still make sense, or maybe I lost some
> > > > > > context?
> > > > >
> > > > > That was about *load* threads, here these are *save* threads.
> > > >
> > > > Ah OK.
> > > >
> > > > >
> > > > > Save handlers do not return an Error value, neither
> > > > > save_live_iterate, nor
> > > > > save_live_complete_precopy or save_state does so.
> > > >
> > > > Let's try to make new APIs work with Error* if possible.
> > >
> > > Let's assume that these threads return an Error object.
> > >
> > > What's qemu_savevm_state_complete_precopy_iterable() supposed to do with
> > > it?
> >
> > IIUC it's not about qemu_savevm_state_complete_precopy_iterable() in this
> > context, as the Error* will be used in one of the thread of the pool, not
> > migration thread.
> >
> > The goal is to be able to set Error* with migrate_set_error(), so that when
> > migration failed, query-migrate can return the error to libvirt, so
> > migration always tries to remember the 1st error hit if ever possible.
> >
> > It's multifd_device_state_save_thread() to do migrate_set_error(), not in
> > migration thread. qemu_savevm_state_complete_*() are indeed not ready to
> > pass Errors, but it's not in the discussed stack.
>
> I understand what are you proposing now - you haven't written about using
> migrate_set_error() for save threads earlier, just about returning an Error
> object.
>
> While this might work it has tendency to uncover errors in other parts of
> the migration core - much as using it in the load threads case uncovered
> the TLS session error.
Yes, hitting the tls issue is unfortunate, thanks for finding the bug. I'm
ok if we have any workaround for that that is easily revertable, then we
fix it later. Or if Fabiano's series can land earlier. To me it seems
easier you stick with Fabiano's series and focus on the rest of the patches.
If we hit another bug, it's more unfortunate, but imho there's not much we
can do but try to fix all of them..
>
> (Speaking of which, could you please respond to the issue at the bottom of
> this message from 2 days ago?:
> https://lore.kernel.org/qemu-devel/[email protected]/
> It is blocking rework of the TLS session EOF handling in this patch set.
> Thanks.)
>
> But I can try this migrate_set_error() approach here and see if something
> breaks.
>
> (..)
> > > > >
> > > > > > Meanwhile, I still feel uneasy on having these globals
> > > > > > (send_threads_abort,
> > > > > > send_threads_ret). Can we make MultiFDDSSaveThreadData the only
> > > > > > interface
> > > > > > between migration and the threads impl? So I wonder if it can be:
> > > > > >
> > > > > > ret = data->hdlr(data);
> > > > > >
> > > > > > With extended struct like this (I added thread_error and
> > > > > > thread_quit):
> > > > > >
> > > > > > struct MultiFDDSSaveThreadData {
> > > > > >SaveLiveCompletePrecopyThreadHandler hdlr;
> > > > > >char *idstr;
> > > > > >uint32_t instance_id;
> > > > > >void *handler_opaque;
> > > > > >/*
> > > > > > * Should be NULL when struct passed over to thread, the
> > > > > > thread should
> > > > > > * set this if the handler would return false. It must be
> > > > > > kept NULL if
> > > > > > * the handler returned true / success.
> > > > > > */
> > > > > >Error *thread_error;
> > > > >
> > > > > As I mentioned above, these handlers do not generally return Error
> > > > > type,
> > > > > so this would need to be an *int;
> > > > >
> > > > > >/*
> > > > > > * Migration core would set this when it wants to notify
> > > > > > thread to
> > > > > > * quit, for example, when error occured in other threads,
> > > > > > or migration is
> > > > > > * cancelled by the user.
> > > > > > */
> > > > > >bool thread_quit;
> > > > >
> > > > > ^ I guess that was supposed to be a pointer too
> > > > > (*thread_quit).
> > > >
> > > > It's my intention to make thi
Re: [PATCH v4 19/33] migration: Add save_live_complete_precopy_thread handler
On 5.02.2025 16:55, Peter Xu wrote:
On Wed, Feb 05, 2025 at 12:53:21PM +0100, Maciej S. Szmigiero wrote:
On 4.02.2025 21:34, Peter Xu wrote:
On Tue, Feb 04, 2025 at 08:32:15PM +0100, Maciej S. Szmigiero wrote:
On 4.02.2025 18:54, Peter Xu wrote:
On Thu, Jan 30, 2025 at 11:08:40AM +0100, Maciej S. Szmigiero wrote:
+static int multifd_device_state_save_thread(void *opaque)
+{
+struct MultiFDDSSaveThreadData *data = opaque;
+int ret;
+
+ret = data->hdlr(data->idstr, data->instance_id, &send_threads_abort,
+ data->handler_opaque);
I thought we discussed somewhere and the plan was we could use Error** here
to report errors. Would that still make sense, or maybe I lost some
context?
That was about *load* threads, here these are *save* threads.
Ah OK.
Save handlers do not return an Error value, neither save_live_iterate, nor
save_live_complete_precopy or save_state does so.
Let's try to make new APIs work with Error* if possible.
Let's assume that these threads return an Error object.
What's qemu_savevm_state_complete_precopy_iterable() supposed to do with it?
IIUC it's not about qemu_savevm_state_complete_precopy_iterable() in this
context, as the Error* will be used in one of the thread of the pool, not
migration thread.
The goal is to be able to set Error* with migrate_set_error(), so that when
migration failed, query-migrate can return the error to libvirt, so
migration always tries to remember the 1st error hit if ever possible.
It's multifd_device_state_save_thread() to do migrate_set_error(), not in
migration thread. qemu_savevm_state_complete_*() are indeed not ready to
pass Errors, but it's not in the discussed stack.
I understand what are you proposing now - you haven't written about using
migrate_set_error() for save threads earlier, just about returning an Error
object.
While this might work it has tendency to uncover errors in other parts of
the migration core - much as using it in the load threads case uncovered
the TLS session error.
(Speaking of which, could you please respond to the issue at the bottom of
this message from 2 days ago?:
https://lore.kernel.org/qemu-devel/[email protected]/
It is blocking rework of the TLS session EOF handling in this patch set.
Thanks.)
But I can try this migrate_set_error() approach here and see if something
breaks.
(..)
Meanwhile, I still feel uneasy on having these globals (send_threads_abort,
send_threads_ret). Can we make MultiFDDSSaveThreadData the only interface
between migration and the threads impl? So I wonder if it can be:
ret = data->hdlr(data);
With extended struct like this (I added thread_error and thread_quit):
struct MultiFDDSSaveThreadData {
SaveLiveCompletePrecopyThreadHandler hdlr;
char *idstr;
uint32_t instance_id;
void *handler_opaque;
/*
* Should be NULL when struct passed over to thread, the thread should
* set this if the handler would return false. It must be kept NULL if
* the handler returned true / success.
*/
Error *thread_error;
As I mentioned above, these handlers do not generally return Error type,
so this would need to be an *int;
/*
* Migration core would set this when it wants to notify thread to
* quit, for example, when error occured in other threads, or migration
is
* cancelled by the user.
*/
bool thread_quit;
^ I guess that was supposed to be a pointer too (*thread_quit).
It's my intention to make this bool, to make everything managed per-thread.
But that's unnecessary since this flag is common to all these threads.
One bool would be enough, but you'll need to export another API for VFIO to
use otherwise. I suppose that's ok too.
Some context of multifd threads and how that's done there..
We started with one "quit" per thread struct, but then we switched to one
bool exactly as you said, see commit 15f3f21d598148.
If you want to stick with one bool, it's okay too, you can export something
similar in misc.h, e.g. multifd_device_state_save_thread_quitting(), then
we can avoid passing in the "quit" either as handler parameter, or
per-thread flag.
Of course I can "export" this flag via a getter function rather than passing
it as a parameter to SaveLiveCompletePrecopyThreadHandler.
It's actually what we do with multifd, these are a bunch of extra threads
to differeciate from the "IO threads" / "multifd threads".
};
Then if any multifd_device_state_save_thread() failed, for example, it
should notify all threads to quit by setting thread_quit, instead of
relying on yet another global variable to show migration needs to quit.
multifd_abort_device_state_save_threads() needs to access
send_threads_abort too.
This may need to become something like:
QLIST_FOREACH() {
MultiFDDSSaveThreadData *data = ...;
data->thread_quit = true;
}
Re: [PATCH v4 19/33] migration: Add save_live_complete_precopy_thread handler
On Wed, Feb 05, 2025 at 12:53:21PM +0100, Maciej S. Szmigiero wrote:
> On 4.02.2025 21:34, Peter Xu wrote:
> > On Tue, Feb 04, 2025 at 08:32:15PM +0100, Maciej S. Szmigiero wrote:
> > > On 4.02.2025 18:54, Peter Xu wrote:
> > > > On Thu, Jan 30, 2025 at 11:08:40AM +0100, Maciej S. Szmigiero wrote:
> > > > > +static int multifd_device_state_save_thread(void *opaque)
> > > > > +{
> > > > > +struct MultiFDDSSaveThreadData *data = opaque;
> > > > > +int ret;
> > > > > +
> > > > > +ret = data->hdlr(data->idstr, data->instance_id,
> > > > > &send_threads_abort,
> > > > > + data->handler_opaque);
> > > >
> > > > I thought we discussed somewhere and the plan was we could use Error**
> > > > here
> > > > to report errors. Would that still make sense, or maybe I lost some
> > > > context?
> > >
> > > That was about *load* threads, here these are *save* threads.
> >
> > Ah OK.
> >
> > >
> > > Save handlers do not return an Error value, neither save_live_iterate, nor
> > > save_live_complete_precopy or save_state does so.
> >
> > Let's try to make new APIs work with Error* if possible.
>
> Let's assume that these threads return an Error object.
>
> What's qemu_savevm_state_complete_precopy_iterable() supposed to do with it?
IIUC it's not about qemu_savevm_state_complete_precopy_iterable() in this
context, as the Error* will be used in one of the thread of the pool, not
migration thread.
The goal is to be able to set Error* with migrate_set_error(), so that when
migration failed, query-migrate can return the error to libvirt, so
migration always tries to remember the 1st error hit if ever possible.
It's multifd_device_state_save_thread() to do migrate_set_error(), not in
migration thread. qemu_savevm_state_complete_*() are indeed not ready to
pass Errors, but it's not in the discussed stack.
> Both it and its caller qemu_savevm_state_complete_precopy() only handle int
> errors.
>
> qemu_savevm_state_complete_precopy() in turn has 4 callers, half of which (2)
> also would need to be enlightened with Error handling somehow.
Right, we don't need to touch those, as explained above.
Generally speaking, IMHO it's always good to add new code with Error*
reports, rather than retvals in migration, even if the new code is in the
migration thread stack. It made future changes easier to switch to Error*.
>
> >
> > >
> > > > Meanwhile, I still feel uneasy on having these globals
> > > > (send_threads_abort,
> > > > send_threads_ret). Can we make MultiFDDSSaveThreadData the only
> > > > interface
> > > > between migration and the threads impl? So I wonder if it can be:
> > > >
> > > > ret = data->hdlr(data);
> > > >
> > > > With extended struct like this (I added thread_error and thread_quit):
> > > >
> > > > struct MultiFDDSSaveThreadData {
> > > > SaveLiveCompletePrecopyThreadHandler hdlr;
> > > > char *idstr;
> > > > uint32_t instance_id;
> > > > void *handler_opaque;
> > > > /*
> > > >* Should be NULL when struct passed over to thread, the thread
> > > > should
> > > >* set this if the handler would return false. It must be kept
> > > > NULL if
> > > >* the handler returned true / success.
> > > >*/
> > > > Error *thread_error;
> > >
> > > As I mentioned above, these handlers do not generally return Error type,
> > > so this would need to be an *int;
> > >
> > > > /*
> > > >* Migration core would set this when it wants to notify thread to
> > > >* quit, for example, when error occured in other threads, or
> > > > migration is
> > > >* cancelled by the user.
> > > >*/
> > > > bool thread_quit;
> > >
> > > ^ I guess that was supposed to be a pointer too
> > > (*thread_quit).
> >
> > It's my intention to make this bool, to make everything managed per-thread.
>
> But that's unnecessary since this flag is common to all these threads.
One bool would be enough, but you'll need to export another API for VFIO to
use otherwise. I suppose that's ok too.
Some context of multifd threads and how that's done there..
We started with one "quit" per thread struct, but then we switched to one
bool exactly as you said, see commit 15f3f21d598148.
If you want to stick with one bool, it's okay too, you can export something
similar in misc.h, e.g. multifd_device_state_save_thread_quitting(), then
we can avoid passing in the "quit" either as handler parameter, or
per-thread flag.
>
> > It's actually what we do with multifd, these are a bunch of extra threads
> > to differeciate from the "IO threads" / "multifd threads".
> >
> > >
> > > > };
> > > >
> > > > Then if any multifd_device_state_save_thread() failed, for example, it
> > > > should notify all threads to quit by setting thread_quit, instead of
> > > > relying on yet another global variable to show migration needs to quit.
> > >
> > > multifd_abort_device_state_save_threads() needs to ac
Re: [PATCH v4 19/33] migration: Add save_live_complete_precopy_thread handler
On 4.02.2025 21:34, Peter Xu wrote:
On Tue, Feb 04, 2025 at 08:32:15PM +0100, Maciej S. Szmigiero wrote:
On 4.02.2025 18:54, Peter Xu wrote:
On Thu, Jan 30, 2025 at 11:08:40AM +0100, Maciej S. Szmigiero wrote:
+static int multifd_device_state_save_thread(void *opaque)
+{
+struct MultiFDDSSaveThreadData *data = opaque;
+int ret;
+
+ret = data->hdlr(data->idstr, data->instance_id, &send_threads_abort,
+ data->handler_opaque);
I thought we discussed somewhere and the plan was we could use Error** here
to report errors. Would that still make sense, or maybe I lost some
context?
That was about *load* threads, here these are *save* threads.
Ah OK.
Save handlers do not return an Error value, neither save_live_iterate, nor
save_live_complete_precopy or save_state does so.
Let's try to make new APIs work with Error* if possible.
Let's assume that these threads return an Error object.
What's qemu_savevm_state_complete_precopy_iterable() supposed to do with it?
Both it and its caller qemu_savevm_state_complete_precopy() only handle int
errors.
qemu_savevm_state_complete_precopy() in turn has 4 callers, half of which (2)
also would need to be enlightened with Error handling somehow.
Meanwhile, I still feel uneasy on having these globals (send_threads_abort,
send_threads_ret). Can we make MultiFDDSSaveThreadData the only interface
between migration and the threads impl? So I wonder if it can be:
ret = data->hdlr(data);
With extended struct like this (I added thread_error and thread_quit):
struct MultiFDDSSaveThreadData {
SaveLiveCompletePrecopyThreadHandler hdlr;
char *idstr;
uint32_t instance_id;
void *handler_opaque;
/*
* Should be NULL when struct passed over to thread, the thread should
* set this if the handler would return false. It must be kept NULL if
* the handler returned true / success.
*/
Error *thread_error;
As I mentioned above, these handlers do not generally return Error type,
so this would need to be an *int;
/*
* Migration core would set this when it wants to notify thread to
* quit, for example, when error occured in other threads, or migration is
* cancelled by the user.
*/
bool thread_quit;
^ I guess that was supposed to be a pointer too (*thread_quit).
It's my intention to make this bool, to make everything managed per-thread.
But that's unnecessary since this flag is common to all these threads.
It's actually what we do with multifd, these are a bunch of extra threads
to differeciate from the "IO threads" / "multifd threads".
};
Then if any multifd_device_state_save_thread() failed, for example, it
should notify all threads to quit by setting thread_quit, instead of
relying on yet another global variable to show migration needs to quit.
multifd_abort_device_state_save_threads() needs to access
send_threads_abort too.
This may need to become something like:
QLIST_FOREACH() {
MultiFDDSSaveThreadData *data = ...;
data->thread_quit = true;
}
At the most basic level that's turning O(1) operation into O(n).
Besides, it creates a question now who now owns these MultiFDDSSaveThreadData
structures - they could be owned by either thread pool or the
multifd_device_state code.
Currently the ownership is simple - the multifd_device_state code
allocates such per-thread structure in multifd_spawn_device_state_save_thread()
and immediately passes its ownership to the thread pool which
takes care to free it once it no longer needs it.
Now, with the list implementation if the thread pool were to free
that MultiFDDSSaveThreadData it would also need to release it from
the list.
Which in turn would need appropriate locking around this removal
operation and probably also each time the list is iterated over.
On the other hand if the multifd_device_state code were to own
that MultiFDDSSaveThreadData then it would linger around until
multifd_device_state_send_cleanup() cleans it up even though its
associated thread might be long gone.
We may want to double check qmp 'migrate_cancel' will work when save
threads are running, but this can also be done for later.
And multifd_join_device_state_save_threads() needs to access
send_threads_ret.
Then this one becomes:
thread_pool_wait(send_threads);
QLIST_FOREACH() {
MultiFDDSSaveThreadData *data = ...;
if (data->thread_error) {
return false;
}
}
return true;
Same here, having a common error return would save us from having
to iterate over a list (or having a list in the first place).
These variables ultimately will have to be stored somewhere since
there can be multiple save threads and so multiple instances of
MultiFDDSSaveThreadData.
So these need to be stored somewhere where
multifd_spawn_device_state_save_thread() can reach them to assign
their addresses to MultiFDDSSaveThreadData member
Re: [PATCH v4 19/33] migration: Add save_live_complete_precopy_thread handler
On Tue, Feb 04, 2025 at 08:32:15PM +0100, Maciej S. Szmigiero wrote:
> On 4.02.2025 18:54, Peter Xu wrote:
> > On Thu, Jan 30, 2025 at 11:08:40AM +0100, Maciej S. Szmigiero wrote:
> > > +static int multifd_device_state_save_thread(void *opaque)
> > > +{
> > > +struct MultiFDDSSaveThreadData *data = opaque;
> > > +int ret;
> > > +
> > > +ret = data->hdlr(data->idstr, data->instance_id, &send_threads_abort,
> > > + data->handler_opaque);
> >
> > I thought we discussed somewhere and the plan was we could use Error** here
> > to report errors. Would that still make sense, or maybe I lost some
> > context?
>
> That was about *load* threads, here these are *save* threads.
Ah OK.
>
> Save handlers do not return an Error value, neither save_live_iterate, nor
> save_live_complete_precopy or save_state does so.
Let's try to make new APIs work with Error* if possible.
>
> > Meanwhile, I still feel uneasy on having these globals (send_threads_abort,
> > send_threads_ret). Can we make MultiFDDSSaveThreadData the only interface
> > between migration and the threads impl? So I wonder if it can be:
> >
> >ret = data->hdlr(data);
> >
> > With extended struct like this (I added thread_error and thread_quit):
> >
> > struct MultiFDDSSaveThreadData {
> > SaveLiveCompletePrecopyThreadHandler hdlr;
> > char *idstr;
> > uint32_t instance_id;
> > void *handler_opaque;
> > /*
> > * Should be NULL when struct passed over to thread, the thread should
> > * set this if the handler would return false. It must be kept NULL if
> > * the handler returned true / success.
> > */
> > Error *thread_error;
>
> As I mentioned above, these handlers do not generally return Error type,
> so this would need to be an *int;
>
> > /*
> > * Migration core would set this when it wants to notify thread to
> > * quit, for example, when error occured in other threads, or
> > migration is
> > * cancelled by the user.
> > */
> > bool thread_quit;
>
> ^ I guess that was supposed to be a pointer too (*thread_quit).
It's my intention to make this bool, to make everything managed per-thread.
It's actually what we do with multifd, these are a bunch of extra threads
to differeciate from the "IO threads" / "multifd threads".
>
> > };
> >
> > Then if any multifd_device_state_save_thread() failed, for example, it
> > should notify all threads to quit by setting thread_quit, instead of
> > relying on yet another global variable to show migration needs to quit.
>
> multifd_abort_device_state_save_threads() needs to access
> send_threads_abort too.
This may need to become something like:
QLIST_FOREACH() {
MultiFDDSSaveThreadData *data = ...;
data->thread_quit = true;
}
We may want to double check qmp 'migrate_cancel' will work when save
threads are running, but this can also be done for later.
>
> And multifd_join_device_state_save_threads() needs to access
> send_threads_ret.
Then this one becomes:
thread_pool_wait(send_threads);
QLIST_FOREACH() {
MultiFDDSSaveThreadData *data = ...;
if (data->thread_error) {
return false;
}
}
return true;
>
> These variables ultimately will have to be stored somewhere since
> there can be multiple save threads and so multiple instances of
> MultiFDDSSaveThreadData.
>
> So these need to be stored somewhere where
> multifd_spawn_device_state_save_thread() can reach them to assign
> their addresses to MultiFDDSSaveThreadData members.
Then multifd_spawn_device_state_save_thread() will need to manage the
qlist, making sure migration core remembers what jobs it submitted. It
sounds good to have that bookkeeping when I think about it, instead of
throw the job to the thread pool and forget it..
>
> However, at that point multifd_device_state_save_thread() can
> access them too so it does not need to have them passed via
> MultiFDDSSaveThreadData.
>
> However, nothing prevents putting send_threads* variables
> into a global struct (with internal linkage - "static", just as
> these separate ones are) if you like such construct more.
This should be better than the current global vars indeed, but less
favoured if the per-thread way could work above.
Thanks,
--
Peter Xu
Re: [PATCH v4 19/33] migration: Add save_live_complete_precopy_thread handler
On 4.02.2025 18:54, Peter Xu wrote:
On Thu, Jan 30, 2025 at 11:08:40AM +0100, Maciej S. Szmigiero wrote:
+static int multifd_device_state_save_thread(void *opaque)
+{
+struct MultiFDDSSaveThreadData *data = opaque;
+int ret;
+
+ret = data->hdlr(data->idstr, data->instance_id, &send_threads_abort,
+ data->handler_opaque);
I thought we discussed somewhere and the plan was we could use Error** here
to report errors. Would that still make sense, or maybe I lost some
context?
That was about *load* threads, here these are *save* threads.
Save handlers do not return an Error value, neither save_live_iterate, nor
save_live_complete_precopy or save_state does so.
Meanwhile, I still feel uneasy on having these globals (send_threads_abort,
send_threads_ret). Can we make MultiFDDSSaveThreadData the only interface
between migration and the threads impl? So I wonder if it can be:
ret = data->hdlr(data);
With extended struct like this (I added thread_error and thread_quit):
struct MultiFDDSSaveThreadData {
SaveLiveCompletePrecopyThreadHandler hdlr;
char *idstr;
uint32_t instance_id;
void *handler_opaque;
/*
* Should be NULL when struct passed over to thread, the thread should
* set this if the handler would return false. It must be kept NULL if
* the handler returned true / success.
*/
Error *thread_error;
As I mentioned above, these handlers do not generally return Error type,
so this would need to be an *int;
/*
* Migration core would set this when it wants to notify thread to
* quit, for example, when error occured in other threads, or migration is
* cancelled by the user.
*/
bool thread_quit;
^ I guess that was supposed to be a pointer too (*thread_quit).
};
Then if any multifd_device_state_save_thread() failed, for example, it
should notify all threads to quit by setting thread_quit, instead of
relying on yet another global variable to show migration needs to quit.
multifd_abort_device_state_save_threads() needs to access
send_threads_abort too.
And multifd_join_device_state_save_threads() needs to access
send_threads_ret.
These variables ultimately will have to be stored somewhere since
there can be multiple save threads and so multiple instances of
MultiFDDSSaveThreadData.
So these need to be stored somewhere where
multifd_spawn_device_state_save_thread() can reach them to assign
their addresses to MultiFDDSSaveThreadData members.
However, at that point multifd_device_state_save_thread() can
access them too so it does not need to have them passed via
MultiFDDSSaveThreadData.
However, nothing prevents putting send_threads* variables
into a global struct (with internal linkage - "static", just as
these separate ones are) if you like such construct more.
Thanks,
Thanks,
Maciej
Re: [PATCH v4 19/33] migration: Add save_live_complete_precopy_thread handler
On Thu, Jan 30, 2025 at 11:08:40AM +0100, Maciej S. Szmigiero wrote:
> +static int multifd_device_state_save_thread(void *opaque)
> +{
> +struct MultiFDDSSaveThreadData *data = opaque;
> +int ret;
> +
> +ret = data->hdlr(data->idstr, data->instance_id, &send_threads_abort,
> + data->handler_opaque);
I thought we discussed somewhere and the plan was we could use Error** here
to report errors. Would that still make sense, or maybe I lost some
context?
Meanwhile, I still feel uneasy on having these globals (send_threads_abort,
send_threads_ret). Can we make MultiFDDSSaveThreadData the only interface
between migration and the threads impl? So I wonder if it can be:
ret = data->hdlr(data);
With extended struct like this (I added thread_error and thread_quit):
struct MultiFDDSSaveThreadData {
SaveLiveCompletePrecopyThreadHandler hdlr;
char *idstr;
uint32_t instance_id;
void *handler_opaque;
/*
* Should be NULL when struct passed over to thread, the thread should
* set this if the handler would return false. It must be kept NULL if
* the handler returned true / success.
*/
Error *thread_error;
/*
* Migration core would set this when it wants to notify thread to
* quit, for example, when error occured in other threads, or migration is
* cancelled by the user.
*/
bool thread_quit;
};
Then if any multifd_device_state_save_thread() failed, for example, it
should notify all threads to quit by setting thread_quit, instead of
relying on yet another global variable to show migration needs to quit.
Thanks,
> +if (ret && !qatomic_read(&send_threads_ret)) {
> +/*
> + * Racy with the above read but that's okay - which thread error
> + * return we report is purely arbitrary anyway.
> + */
> +qatomic_set(&send_threads_ret, ret);
> +}
> +
> +return 0;
> +}
--
Peter Xu
