-horenchu...@bytedance.com
> [1/2]
> https://lkml.kernel.org/r/20240405000707.2670063-2-horenchu...@bytedance.com
> [2/2]
> https://lkml.kernel.org/r/20240405000707.2670063-3-horenchu...@bytedance.com
>
> Signed-off-by: Ho-Ren (Jack) Chuang
> Suggested-by: Jonathan Cameron
ith some other memory that isn't DRAM where the granularity
> doesn't match) the CPU nodes contain no DRAM but rather it's one node away.
> Handling that can be a job for another day though.
>
> Why does this need to be computed here? Why not do it in
> hmat_set_default_dram_perf? Doesn't seem to be used anywhere else.
IMO, which node is the default DRAM node is a general concept rather than
an HMAT-specific one. So, I think that it's better to decide that in the
general code (memory-tiers.c).
>>
>> hotplug_memory_notifier(memtier_hotplug_callback, MEMTIER_HOTPLUG_PRI);
>> return 0;
--
Best Regards,
Huang, Ying
itialized
> - * after firmware and devices are initialized.
> - */
> - continue;
> -
> - memtier = set_node_memory_tier(node);
> - if (IS_ERR(memtier))
> - /*
> - * Continue with memtiers we are able to setup
> - */
> - break;
> - }
> - establish_demotion_targets();
> - mutex_unlock(&memory_tier_lock);
> + /* Record nodes with memory and CPU to set default DRAM performance. */
> + nodes_and(default_dram_nodes, node_states[N_MEMORY],
> + node_states[N_CPU]);
>
> hotplug_memory_notifier(memtier_hotplug_callback, MEMTIER_HOTPLUG_PRI);
> return 0;
--
Best Regards,
Huang, Ying
"Ho-Ren Chuang" writes:
> June 24, 2024 at 1:27 AM, "Huang, Ying" wrote:
>
> Hi Huang, Ying,
>
> Thanks for your feedback. Replies inlined.
>
>>
>> Hi, Jack,
>>
>> Thanks for the patch!
>>
>> "Ho-Ren (Jack) Ch
))
> - /*
> - * Defer memory tier initialization on
> - * CPUless numa nodes. These will be initialized
> - * after firmware and devices are initialized.
> - */
> - continue;
> -
> - memtier = set_node_memory_tier(node);
> - if (IS_ERR(memtier))
> - /*
> - * Continue with memtiers we are able to setup
> - */
> - break;
> - }
> - establish_demotion_targets();
> - mutex_unlock(&memory_tier_lock);
> + for_each_node_state(node, N_MEMORY)
> + if (node_state(node, N_CPU))
> + node_set(node, default_dram_nodes);
Why not use
nodes_and(default_dram_nodes, node_states[N_MEMORY],
node_states[N_CPU]);
instead?
> hotplug_memory_notifier(memtier_hotplug_callback, MEMTIER_HOTPLUG_PRI);
> return 0;
--
Best Regards,
Huang, Ying
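For reference, the per-node loop in the quoted hunk and a single mask intersection select the same set of nodes. The following is a minimal userspace sketch, assuming a 64-bit integer as a stand-in for nodemask_t; the type and function names are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for nodemask_t: bit n set means node n is in the set. */
typedef uint64_t nodemask64;

/* Loop form: walk every node with memory and keep those that also have a CPU. */
static nodemask64 default_dram_by_loop(nodemask64 n_memory, nodemask64 n_cpu)
{
    nodemask64 out = 0;

    for (int node = 0; node < 64; node++) {
        nodemask64 bit = UINT64_C(1) << node;

        if (!(n_memory & bit))
            continue;           /* not a memory node */
        if (n_cpu & bit)
            out |= bit;         /* memory node that also has a CPU */
    }
    return out;
}

/* Mask form: plain intersection, which is what nodes_and() computes. */
static nodemask64 default_dram_by_and(nodemask64 n_memory, nodemask64 n_cpu)
{
    return n_memory & n_cpu;
}

/* For contrast: an and-not selects memory nodes WITHOUT a CPU. */
static nodemask64 cpuless_memory_nodes(nodemask64 n_memory, nodemask64 n_cpu)
{
    return n_memory & ~n_cpu;
}
```

With memory on nodes 0-3 and CPUs on nodes 0-1, both the loop and the intersection yield nodes 0-1, while the and-not form yields the CPU-less nodes 2-3.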
it@redhat.com/
When developing the Linux kernel patchset "[PATCH v3 0/3] cxl/region:
Support to calculate memory tier abstract distance" as in [1].
[1]
https://lore.kernel.org/linux-cxl/20240618084639.1419629-1-ying.hu...@intel.com/
I use this patchset to test my kernel patchset a
l/74e2845c5f95b0c139c79233ddb65bb17f2dd679.1710282274.git@redhat.com/
>
Thanks a lot for your work!
I need this to test some memory tiering kernel patches. I found the
following git branch,
https://gitlab.com/jic23/qemu/-/commits/cxl-2024-03-05/?ref_type=heads
Can I use that branch directly?
And, can you share an example QEMU command line to set up Genport, CDAT,
and HMAT?
--
Best Regards,
Huang, Ying
memory types that are not
>> > initialized by device drivers.
>> > Because late initialized memory and default DRAM memory need to be managed,
>> > a default memory type is created for storing all memory types that are
>> > not initialized by device drivers and as
types that are
> not initialized by device drivers and as a fallback.
>
> Signed-off-by: Ho-Ren (Jack) Chuang
> Signed-off-by: Hao Xiang
> Reviewed-by: "Huang, Ying"
> ---
> mm/memory-tiers.c | 94 +++
> 1 file chan
"node_memory_types[nid].memtype"
will be !NULL. And it's possible (in theory) that some nodes become
"node_state(nid, N_CPU) == true" between memory_tier_init() and
memory_tier_late_init().
Otherwise, looks good to me. Feel free to add
Reviewed-by: "Huang, Y
> &default_memory_types);
> if (IS_ERR(default_dram_type))
> panic("%s() failed to allocate default DRAM tier\n", __func__);
>
> @@ -868,6 +919,14 @@ static int __init memory_tier_init(void)
> * types assigned.
> */
> for_each_node_state(node, N_MEMORY) {
> + if (!node_state(node, N_CPU))
> + /*
> + * Defer memory tier initialization on CPUless numa nodes.
> + * These will be initialized after firmware and devices are
> + * initialized.
> + */
> + continue;
> +
> memtier = set_node_memory_tier(node);
> if (IS_ERR(memtier))
> /*
--
Best Regards,
Huang, Ying
"Ho-Ren (Jack) Chuang" writes:
> On Fri, Mar 22, 2024 at 1:41 AM Huang, Ying wrote:
>>
>> "Ho-Ren (Jack) Chuang" writes:
>>
>> > The current implementation treats emulated memory devices, such as
>> > CXL1.1 type3 mem
_dram_type = mt_find_alloc_memory_type(MEMTIER_ADISTANCE_DRAM,
> + &default_memory_types);
> if (IS_ERR(default_dram_type))
> panic("%s() failed to allocate default DRAM tier\n", __func__);
>
> @@ -868,6 +913,14 @@ static int __init memory_tier_init(void)
> * types assigned.
> */
> for_each_node_state(node, N_MEMORY) {
> + if (!node_state(node, N_CPU))
> + /*
> + * Defer memory tier initialization on CPUless numa nodes.
> + * These will be initialized after firmware and devices are
> + * initialized.
> + */
> + continue;
> +
> memtier = set_node_memory_tier(node);
> if (IS_ERR(memtier))
> /*
--
Best Regards,
Huang, Ying
* For now we can have 4 faster memory tiers with smaller adistance
> * than default DRAM tier.
> */
> - default_dram_type = alloc_memory_type(MEMTIER_ADISTANCE_DRAM);
> + default_dram_type = mt_find_alloc_memory_type(
> + MEMTIER_ADISTANCE_DRAM, &default_memory_types);
> if (IS_ERR(default_dram_type))
> panic("%s() failed to allocate default DRAM tier\n", __func__);
>
> @@ -836,6 +908,14 @@ static int __init memory_tier_init(void)
> * types assigned.
> */
> for_each_node_state(node, N_MEMORY) {
> + if (!node_state(node, N_CPU))
> + /*
> + * Defer memory tier initialization on CPUless numa nodes.
> + * These will be initialized after firmware and devices are
> + * initialized.
> + */
> + continue;
> +
> memtier = set_node_memory_tier(node);
> if (IS_ERR(memtier))
> /*
--
Best Regards,
Huang, Ying
"Ho-Ren (Jack) Chuang" writes:
> On Tue, Mar 12, 2024 at 2:21 AM Huang, Ying wrote:
>>
>> "Ho-Ren (Jack) Chuang" writes:
>>
>> > The current implementation treats emulated memory devices, such as
>> > CXL1.1 type3 mem
> default_dram_perf.write_latency) *
> (default_dram_perf.read_bandwidth +
> default_dram_perf.write_bandwidth) /
> (perf->read_bandwidth + perf->write_bandwidth);
> - mutex_unlock(&memory_tier_lock);
> + mutex_unlock(&mt_perf_lock);
>
> return 0;
> }
> @@ -836,6 +890,14 @@ static int __init memory_tier_init(void)
> * types assigned.
> */
> for_each_node_state(node, N_MEMORY) {
> + if (!node_state(node, N_CPU))
> + /*
> + * Defer memory tier initialization on CPUless numa nodes.
> + * These will be initialized when HMAT information is
HMAT is platform-specific; we should avoid mentioning it in general code
if possible.
> + * available.
> + */
> + continue;
> +
> memtier = set_node_memory_tier(node);
> if (IS_ERR(memtier))
> /*
--
Best Regards,
Huang, Ying
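For readers following the arithmetic in the quoted hunk, here is a minimal userspace sketch of a performance-to-abstract-distance conversion. Only the trailing bandwidth terms are visible in the quote above; the leading latency ratio and the base constant are assumptions made for illustration, and the struct and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical mirror of the four performance fields used in the quoted expression. */
struct perf_sketch {
    unsigned int read_latency;
    unsigned int write_latency;
    unsigned int read_bandwidth;
    unsigned int write_bandwidth;
};

/* Made-up base distance for default DRAM; NOT the kernel's MEMTIER_ADISTANCE_DRAM. */
#define SKETCH_ADISTANCE_DRAM 1000ULL

/*
 * Assumed model: the distance grows with the latency ratio (device / DRAM)
 * and with the inverse bandwidth ratio (DRAM / device). Integer division is
 * applied left to right, as in the quoted expression.
 */
static unsigned long long
perf_to_adistance_sketch(const struct perf_sketch *dram,
                         const struct perf_sketch *perf)
{
    /* Guard the two divisors; a real caller would reject such input earlier. */
    assert(dram->read_latency + dram->write_latency != 0);
    assert(perf->read_bandwidth + perf->write_bandwidth != 0);

    return SKETCH_ADISTANCE_DRAM *
        ((unsigned long long)perf->read_latency + perf->write_latency) /
        ((unsigned long long)dram->read_latency + dram->write_latency) *
        ((unsigned long long)dram->read_bandwidth + dram->write_bandwidth) /
        ((unsigned long long)perf->read_bandwidth + perf->write_bandwidth);
}
```

Under these assumptions, a device with twice the latency and half the bandwidth of the default DRAM lands at four times the base distance, and a device identical to DRAM lands exactly at the base distance.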
"Ho-Ren (Jack) Chuang" writes:
> On Sun, Mar 3, 2024 at 6:42 PM Huang, Ying wrote:
>>
>> Hi, Jack,
>>
>> "Ho-Ren (Jack) Chuang" writes:
>>
>> > * Introduce `mt_init_with_hmat()`
>> > We defer memory tier ini
"Ho-Ren (Jack) Chuang" writes:
> On Mon, Mar 4, 2024 at 10:36 PM Huang, Ying wrote:
>>
>> "Ho-Ren (Jack) Chuang" writes:
>>
>> > On Sun, Mar 3, 2024 at 6:47 PM Huang, Ying wrote:
>> >>
>> >> "Ho-Ren (Jac
"Ho-Ren (Jack) Chuang" writes:
> On Sun, Mar 3, 2024 at 6:47 PM Huang, Ying wrote:
>>
>> "Ho-Ren (Jack) Chuang" writes:
>>
>> > The memory tiering component in the kernel is functionally useless for
>> > CPUless memory/non-DRAM devices
s/acpi/numa/hmat.c | 3 ++
> include/linux/memory-tiers.h | 6 +++
> mm/memory-tiers.c | 76 ++++
> 3 files changed, 77 insertions(+), 8 deletions(-)
--
Best Regards,
Huang, Ying
> if (!IS_ERR(memtier))
> establish_demotion_targets();
> mutex_unlock(&memory_tier_lock);
> @@ -836,7 +888,15 @@ static int __init memory_tier_init(void)
> * types assigned.
> */
> for_each_node_state(node, N_MEMORY) {
> - memtier = set_node_memory_tier(node);
> + if (!node_state(node, N_CPU))
> + /*
> + * Defer memory tier initialization on CPUless numa nodes.
> + * These will be initialized when HMAT information is
> + * available.
> + */
> + continue;
> +
> + memtier = set_node_memory_tier(node, default_dram_type);
On a system with HMAT, how do CPU-less nodes fall back to
default_dram_type? I found your description, but I can't find it in the code.
> if (IS_ERR(memtier))
> /*
> * Continue with memtiers we are able to setup
--
Best Regards,
Huang, Ying
:
- Add QMP support
Signed-off-by: Max Asbock
Signed-off-by: Jiajia Zheng
Signed-off-by: Huang Ying
---
hmp-commands.hx | 15 +++
monitor.c | 22 ++
2 files changed, 37 insertions(+)
--- a/monitor.c
+++ b/monitor.c
@@ -2713,6 +2713,28 @@ static void
On Thu, 2011-02-10 at 16:52 +0800, Jan Kiszka wrote:
> On 2011-02-10 01:27, Huang Ying wrote:
> >>> @@ -1882,6 +1919,7 @@ int kvm_arch_on_sigbus_vcpu(CPUState *en
> >>> hardware_memory_error();
> >>> }
> >>>
On Wed, 2011-02-09 at 16:00 +0800, Jan Kiszka wrote:
> On 2011-02-09 04:00, Huang Ying wrote:
> > In Linux kernel HWPoison processing implementation, the virtual
> > address in processes mapping the error physical memory page is marked
> > as HWPoison. So that, the fur
new page to recover the issue.
Signed-off-by: Huang Ying
---
target-i386/kvm.c | 39 +++
1 file changed, 39 insertions(+)
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -508,6 +508,42 @@ static int kvm_get_supported_msrs(KVMSta
return ret
qemu_ram_remap() unmaps the specified RAM pages, then re-maps these
pages again. This is used by KVM HWPoison support to clear HWPoisoned
page tables across guest rebooting, so that a new page may be
allocated later to recover the memory error.
Signed-off-by: Huang Ying
---
cpu-all.h|4
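The unmap-then-remap idea described above can be illustrated in userspace. This is a minimal sketch for Linux, assuming anonymous private memory; it is not QEMU's qemu_ram_remap(), and both function names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: drop the pages backing [addr, addr + len) and map
 * fresh anonymous memory at the same virtual address.
 * Returns 0 on success, -1 on failure.
 */
static int remap_fresh_pages(void *addr, size_t len)
{
    void *p;

    assert(len > 0);

    /* Step 1: unmap, which discards the old (possibly poisoned) pages. */
    if (munmap(addr, len) != 0)
        return -1;

    /* Step 2: re-map at the same address; MAP_FIXED pins the location. */
    p = mmap(addr, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == addr ? 0 : -1;
}

/* Demo: dirty one page, remap it, and check that it reads back as zeros. */
static int remap_demo(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int ok;

    if (buf == MAP_FAILED)
        return -1;

    memset(buf, 0xAB, len);             /* old contents */
    if (remap_fresh_pages(buf, len) != 0)
        return -1;

    /* New anonymous pages are zero-filled, so the old data is gone. */
    ok = (buf[0] == 0 && buf[len - 1] == 0);
    munmap(buf, len);
    return ok ? 0 : 1;
}
```

Calling remap_demo() returns 0 when the remapped page no longer holds the old contents. In a multi-threaded program another thread could claim the address between the two steps, so this two-step form is only meant to mirror the description, not to be race-free.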
On Fri, 2011-01-14 at 16:38 +0800, Jan Kiszka wrote:
> Am 14.01.2011 02:51, Huang Ying wrote:
> > On Thu, 2011-01-13 at 17:01 +0800, Jan Kiszka wrote:
> >> Am 13.01.2011 09:34, Huang Ying wrote:
[snip]
> >>> +
> >>> +void kvm_unpoison_all(void *param)
On Thu, 2011-01-13 at 17:01 +0800, Jan Kiszka wrote:
> Am 13.01.2011 09:34, Huang Ying wrote:
> > In Linux kernel HWPoison processing implementation, the virtual
> > address in processes mapping the error physical memory page is marked
> > as HWPoison. So that, the fur
On Fri, 2011-01-14 at 05:14 +0800, Blue Swirl wrote:
> On Thu, Jan 13, 2011 at 8:34 AM, Huang Ying wrote:
> > qemu_ram_remap() unmaps the specified RAM pages, then re-maps these
> > pages again. This is used by KVM HWPoison support to clear HWPoisoned
> > page tables acros
qemu_ram_remap() unmaps the specified RAM pages, then re-maps these
pages again. This is used by KVM HWPoison support to clear HWPoisoned
page tables across guest rebooting, so that a new page may be
allocated later to recover the memory error.
Signed-off-by: Huang Ying
---
cpu-all.h|4
page to recover the issue.
Signed-off-by: Huang Ying
---
kvm.h |2 ++
target-i386/kvm.c | 39 +++
2 files changed, 41 insertions(+)
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -580,6 +580,42 @@ static int kvm_get_supported_msrs(void
On Wed, 2011-01-05 at 16:07 +0800, Jan Kiszka wrote:
> Am 05.01.2011 07:42, Huang Ying wrote:
> > On Tue, 2011-01-04 at 16:32 +0800, Jan Kiszka wrote:
> >> From: Jan Kiszka
> >>
> >> There is no need to restrict writing back MCE MSRs to reset or full
> >>
On Fri, 2010-12-31 at 17:10 +0800, Jan Kiszka wrote:
> Am 31.12.2010 06:22, Huang Ying wrote:
> > In Linux kernel HWPoison processing implementation, the virtual
> > address in processes mapping the error physical memory page is marked
> > as HWPoison. So that, the fur
G_STATUS, so
their content should be kept. And the following sequence may set an
uncorrected value in the MCE registers.
savevm -> loadvm -> (OS clears MCE registers) -> reset -> (MCE registers
have a new (uncorrected) value)
Best Regards,
Huang Ying
> Signed-off-by: Jan Kiszka
> CC: Huang Yi
qemu_ram_remap() unmaps the specified RAM pages, then re-maps these
pages again. This is used by KVM HWPoison support to clear HWPoisoned
page tables across guest rebooting, so that a new page may be
allocated later to recover the memory error.
Signed-off-by: Huang Ying
---
cpu-all.h|4
new page to recover the issue.
Signed-off-by: Huang Ying
---
kvm.h |2 ++
qemu-kvm.c| 37 +
target-i386/kvm.c |2 ++
3 files changed, 41 insertions(+)
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1803,6 +1803,7 @@ int
On Sun, 2010-12-19 at 00:25 +0800, Andreas Färber wrote:
> softfloat.h's uint64 type has least-width semantics,
> which seems unintended here since uint64_t is used in helpers.
>
> v3:
> * Split off.
>
> Cc: Huang Ying
> Cc: Juan Quintela
> Signed-off-by: Andreas Färber
Acked-by: Huang Ying
> One problem here is that SRAR is not broadcasted.
> The guest might observe the event differently, like "some cpus
> don't enter machine check."
Yes. SRAR "Broadcast" follows spec better.
Best Regards,
Huang Ying
d of being sent via touching poisoned virtual address.
> > I would think that if the bad page can't be sidelined, such that
> > the newly booting guest can't use it, then the new guest shouldn't be
> > allowed to boot. But perhaps there is some merit in letting it try to
> > boot and see if one gets 'lucky'.
>
> In the real world, when booting a real machine, hardware and firmware
> usually (or often) run a self-test before passing control to the OS.
> Some platforms can boot the OS with a degraded configuration (for example,
> less memory) if a component has trouble. Some BIOSes may
> stop booting and show messages like "please reseat [component]" on the
> screen. So we could implement/request that QEMU have such a mechanism.
>
> I can understand the merit you mentioned here, to some degree. But I
> think it is hard to say "unlucky" to a customer in business...
Because the contents of poisoned pages are not relevant after a reboot,
QEMU can replace the poisoned pages with good pages when rebooting the guest.
Do you think that is good?
Best Regards,
Huang Ying
ives VAL|UC|!PCC and RIPV event
> > from virtual processor that doesn't have SER_P.
>
> Dean also noted this. I don't think it was deliberate choice to not
> expose SER_P. Huang?
In fact, that should be a BUG. I will fix it as soon as possible.
Best Regards,
Huang Ying
real hardware.
This patch fixes this by setting env->mcg_status to 0 during system reset.
Signed-off-by: Huang Ying
---
target-i386/helper.c |2 ++
1 file changed, 2 insertions(+)
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -617,6 +617,8 @@ void cpu_reset(CPUX86State *