public inbox for linux-next@vger.kernel.org 
 help / color / mirror / Atom feed
From: "Aithal, Srikanth" <sraithal@amd•com>
To: Sean Christopherson <seanjc@google•com>,
	Zheyun Shen <szy0127@sjtu•edu.cn>
Cc: linux-next@vger•kernel.org, kvm@vger•kernel.org,
	linux-kernel@vger•kernel.org
Subject: Re: [BUG] NULL pointer dereference in sev_writeback_caches during KVM SEV migration kselftest on AMD platform
Date: Tue, 15 Jul 2025 12:07:13 +0530	[thread overview]
Message-ID: <5e2d0b4c-fd1b-41f0-a099-64c897b7ec6b@amd.com> (raw)
In-Reply-To: <aHWB-JPG8r_x2w-A@google.com>

On 7/15/2025 3:47 AM, Sean Christopherson wrote:
> On Mon, Jul 14, 2025, Sean Christopherson wrote:
>> So as much as I want to avoid allocating another cpumask (ugh), it's the right
>> thing to do.  And practically speaking, I doubt many real world users of SEV will
>> be using MAXSMP, i.e. the allocations don't exist anyways.
>>
>> Unless someone objects and/or has a better idea, I'll squash this:
>>
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index 95668e84ab86..e39726d258b8 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -2072,6 +2072,17 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
>>          if (ret)
>>                  goto out_source_vcpu;
>>   
>> +       /*
>> +        * Allocate a new have_run_cpus for the destination, i.e. don't copy
>> +        * the set of CPUs from the source.  If a CPU was used to run a vCPU in
>> +        * the source VM but is never used for the destination VM, then the CPU
>> +        * can only have cached memory that was accessible to the source VM.
>> +        */
>> +       if (!zalloc_cpumask_var(&dst_sev->have_run_cpus, GFP_KERNEL_ACCOUNT)) {
>> +               ret = -ENOMEM;
>> +               goto out_source_vcpu;
>> +       }
>> +
>>          sev_migrate_from(kvm, source_kvm);
>>          kvm_vm_dead(source_kvm);
>>          cg_cleanup_sev = src_sev;
>> @@ -2771,13 +2782,18 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
>>                  goto e_unlock;
>>          }
>>   
>> +       mirror_sev = to_kvm_sev_info(kvm);
>> +       if (!zalloc_cpumask_var(&mirror_sev->have_run_cpus, GFP_KERNEL_ACCOUNT)) {
>> +               ret = -ENOMEM;
>> +               goto e_unlock;
>> +       }
>> +
>>          /*
>>           * The mirror kvm holds an enc_context_owner ref so its asid can't
>>           * disappear until we're done with it
>>           */
>>          source_sev = to_kvm_sev_info(source_kvm);
>>          kvm_get_kvm(source_kvm);
>> -       mirror_sev = to_kvm_sev_info(kvm);
>>          list_add_tail(&mirror_sev->mirror_entry, &source_sev->mirror_vms);
>>   
>>          /* Set enc_context_owner and copy its encryption context over */
> 
> This isn't quite right either, because sev_vm_destroy() won't free the cpumask
> for mirror VMs.
> 
> Aha!  And KVM will also unnecessarily leak have_run_cpus if SNP decomission
> fails (though that should be an extremely rare error scecnario).
> 
> KVM is guaranteed to have blasted WBINVD before reaching sev_vm_destroy() (see
> commit 7e00013bd339 "KVM: SVM: Remove wbinvd in sev_vm_destroy()"), so unless I'm
> missing something, KVM can simply free have_run_cpus at the start of sev_vm_destroy().
> 
> Ooh, side topic!  The fact that sev_vm_destroy() wasn't blasting WBINVD would
> have been a bug if not for kvm_arch_guest_memory_reclaimed() and
> kvm_arch_gmem_invalidate() taking care of mirror VMs.
> 
> New hash for the patch:
> 
>    KVM: SVM: Flush cache only on CPUs running SEV guest
>    https://github.com/kvm-x86/linux/commit/6f38f8c57464


kselftest sev_migrate_tests passes with current 
https://github.com/kvm-x86/linux/tree/next (head 2a046f6), which has 
commit 6f38f8c.


Reported-by: Srikanth Aithal <sraithal@amd•com>
Tested-by: Srikanth Aithal <sraithal@amd•com>


> 
> And the full contexts of what I force-pushed:
> 
> --
> From: Zheyun Shen <szy0127@sjtu•edu.cn>
> Date: Thu, 22 May 2025 16:37:32 -0700
> Subject: [PATCH] KVM: SVM: Flush cache only on CPUs running SEV guest
> 
> On AMD CPUs without ensuring cache consistency, each memory page
> reclamation in an SEV guest triggers a call to do WBNOINVD/WBINVD on all
> CPUs, thereby affecting the performance of other programs on the host.
> 
> Typically, an AMD server may have 128 cores or more, while the SEV guest
> might only utilize 8 of these cores. Meanwhile, host can use qemu-affinity
> to bind these 8 vCPUs to specific physical CPUs.
> 
> Therefore, keeping a record of the physical core numbers each time a vCPU
> runs can help avoid flushing the cache for all CPUs every time.
> 
> Take care to allocate the cpumask used to track which CPUs have run a
> vCPU when copying or moving an "encryption context", as nothing guarantees
> memory in a mirror VM is a strict subset of the ASID owner, and the
> destination VM for intrahost migration needs to maintain it's own set of
> CPUs.  E.g. for intrahost migration, if a CPU was used for the source VM
> but not the destination VM, then it can only have cached memory that was
> accessible to the source VM.  And a CPU that was run in the source is also
> used by the destination is no different than a CPU that was run in the
> destination only.
> 
> Note, KVM is guaranteed to do flush caches prior to sev_vm_destroy(),
> thanks to kvm_arch_guest_memory_reclaimed for SEV and SEV-ES, and
> kvm_arch_gmem_invalidate() for SEV-SNP.  I.e. it's safe to free the
> cpumask prior to unregistering encrypted regions and freeing the ASID.
> 
> Cc: Srikanth Aithal <sraithal@amd•com>
> Reviewed-by: Tom Lendacky <thomas.lendacky@amd•com>
> Signed-off-by: Zheyun Shen <szy0127@sjtu•edu.cn>
> Co-developed-by: Sean Christopherson <seanjc@google•com>
> Link: https://lore.kernel.org/r/20250522233733.3176144-9-seanjc@google.com
> Link: https://lore.kernel.org/all/935a82e3-f7ad-47d7-aaaf-f3d2b62ed768@amd.com
> Signed-off-by: Sean Christopherson <seanjc@google•com>
> ---
>   arch/x86/kvm/svm/sev.c | 71 ++++++++++++++++++++++++++++++++++++------
>   arch/x86/kvm/svm/svm.h |  1 +
>   2 files changed, 63 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index ed39f8a4d9df..a62cd27a4f45 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -447,7 +447,12 @@ static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp,
>   	init_args.probe = false;
>   	ret = sev_platform_init(&init_args);
>   	if (ret)
> -		goto e_free;
> +		goto e_free_asid;
> +
> +	if (!zalloc_cpumask_var(&sev->have_run_cpus, GFP_KERNEL_ACCOUNT)) {
> +		ret = -ENOMEM;
> +		goto e_free_asid;
> +	}
>   
>   	/* This needs to happen after SEV/SNP firmware initialization. */
>   	if (vm_type == KVM_X86_SNP_VM) {
> @@ -465,6 +470,8 @@ static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp,
>   	return 0;
>   
>   e_free:
> +	free_cpumask_var(sev->have_run_cpus);
> +e_free_asid:
>   	argp->error = init_args.error;
>   	sev_asid_free(sev);
>   	sev->asid = 0;
> @@ -709,16 +716,31 @@ static void sev_clflush_pages(struct page *pages[], unsigned long npages)
>   	}
>   }
>   
> -static void sev_writeback_caches(void)
> +static void sev_writeback_caches(struct kvm *kvm)
>   {
> +	/*
> +	 * Note, the caller is responsible for ensuring correctness if the mask
> +	 * can be modified, e.g. if a CPU could be doing VMRUN.
> +	 */
> +	if (cpumask_empty(to_kvm_sev_info(kvm)->have_run_cpus))
> +		return;
> +
>   	/*
>   	 * Ensure that all dirty guest tagged cache entries are written back
>   	 * before releasing the pages back to the system for use.  CLFLUSH will
>   	 * not do this without SME_COHERENT, and flushing many cache lines
>   	 * individually is slower than blasting WBINVD for large VMs, so issue
> -	 * WBNOINVD (or WBINVD if the "no invalidate" variant is unsupported).
> +	 * WBNOINVD (or WBINVD if the "no invalidate" variant is unsupported)
> +	 * on CPUs that have done VMRUN, i.e. may have dirtied data using the
> +	 * VM's ASID.
> +	 *
> +	 * For simplicity, never remove CPUs from the bitmap.  Ideally, KVM
> +	 * would clear the mask when flushing caches, but doing so requires
> +	 * serializing multiple calls and having responding CPUs (to the IPI)
> +	 * mark themselves as still running if they are running (or about to
> +	 * run) a vCPU for the VM.
>   	 */
> -	wbnoinvd_on_all_cpus();
> +	wbnoinvd_on_cpus_mask(to_kvm_sev_info(kvm)->have_run_cpus);
>   }
>   
>   static unsigned long get_num_contig_pages(unsigned long idx,
> @@ -2046,6 +2068,17 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
>   	if (ret)
>   		goto out_source_vcpu;
>   
> +	/*
> +	 * Allocate a new have_run_cpus for the destination, i.e. don't copy
> +	 * the set of CPUs from the source.  If a CPU was used to run a vCPU in
> +	 * the source VM but is never used for the destination VM, then the CPU
> +	 * can only have cached memory that was accessible to the source VM.
> +	 */
> +	if (!zalloc_cpumask_var(&dst_sev->have_run_cpus, GFP_KERNEL_ACCOUNT)) {
> +		ret = -ENOMEM;
> +		goto out_source_vcpu;
> +	}
> +
>   	sev_migrate_from(kvm, source_kvm);
>   	kvm_vm_dead(source_kvm);
>   	cg_cleanup_sev = src_sev;
> @@ -2707,7 +2740,7 @@ int sev_mem_enc_unregister_region(struct kvm *kvm,
>   		goto failed;
>   	}
>   
> -	sev_writeback_caches();
> +	sev_writeback_caches(kvm);
>   
>   	__unregister_enc_region_locked(kvm, region);
>   
> @@ -2749,13 +2782,18 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
>   		goto e_unlock;
>   	}
>   
> +	mirror_sev = to_kvm_sev_info(kvm);
> +	if (!zalloc_cpumask_var(&mirror_sev->have_run_cpus, GFP_KERNEL_ACCOUNT)) {
> +		ret = -ENOMEM;
> +		goto e_unlock;
> +	}
> +
>   	/*
>   	 * The mirror kvm holds an enc_context_owner ref so its asid can't
>   	 * disappear until we're done with it
>   	 */
>   	source_sev = to_kvm_sev_info(source_kvm);
>   	kvm_get_kvm(source_kvm);
> -	mirror_sev = to_kvm_sev_info(kvm);
>   	list_add_tail(&mirror_sev->mirror_entry, &source_sev->mirror_vms);
>   
>   	/* Set enc_context_owner and copy its encryption context over */
> @@ -2817,7 +2855,13 @@ void sev_vm_destroy(struct kvm *kvm)
>   
>   	WARN_ON(!list_empty(&sev->mirror_vms));
>   
> -	/* If this is a mirror_kvm release the enc_context_owner and skip sev cleanup */
> +	free_cpumask_var(sev->have_run_cpus);
> +
> +	/*
> +	 * If this is a mirror VM, remove it from the owner's list of a mirrors
> +	 * and skip ASID cleanup (the ASID is tied to the lifetime of the owner).
> +	 * Note, mirror VMs don't support registering encrypted regions.
> +	 */
>   	if (is_mirroring_enc_context(kvm)) {
>   		struct kvm *owner_kvm = sev->enc_context_owner;
>   
> @@ -3106,7 +3150,7 @@ static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va)
>   	return;
>   
>   do_sev_writeback_caches:
> -	sev_writeback_caches();
> +	sev_writeback_caches(vcpu->kvm);
>   }
>   
>   void sev_guest_memory_reclaimed(struct kvm *kvm)
> @@ -3119,7 +3163,7 @@ void sev_guest_memory_reclaimed(struct kvm *kvm)
>   	if (!sev_guest(kvm) || sev_snp_guest(kvm))
>   		return;
>   
> -	sev_writeback_caches();
> +	sev_writeback_caches(kvm);
>   }
>   
>   void sev_free_vcpu(struct kvm_vcpu *vcpu)
> @@ -3451,6 +3495,15 @@ int pre_sev_run(struct vcpu_svm *svm, int cpu)
>   	if (sev_es_guest(kvm) && !VALID_PAGE(svm->vmcb->control.vmsa_pa))
>   		return -EINVAL;
>   
> +	/*
> +	 * To optimize cache flushes when memory is reclaimed from an SEV VM,
> +	 * track physical CPUs that enter the guest for SEV VMs and thus can
> +	 * have encrypted, dirty data in the cache, and flush caches only for
> +	 * CPUs that have entered the guest.
> +	 */
> +	if (!cpumask_test_cpu(cpu, to_kvm_sev_info(kvm)->have_run_cpus))
> +		cpumask_set_cpu(cpu, to_kvm_sev_info(kvm)->have_run_cpus);
> +
>   	/* Assign the asid allocated with this SEV guest */
>   	svm->asid = asid;
>   
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index e6f3c6a153a0..a7c6f07260cf 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -113,6 +113,7 @@ struct kvm_sev_info {
>   	void *guest_req_buf;    /* Bounce buffer for SNP Guest Request input */
>   	void *guest_resp_buf;   /* Bounce buffer for SNP Guest Request output */
>   	struct mutex guest_req_mutex; /* Must acquire before using bounce buffers */
> +	cpumask_var_t have_run_cpus; /* CPUs that have done VMRUN for this VM. */
>   };
>   
>   #define SEV_POLICY_NODBG	BIT_ULL(0)
> 
> base-commit: a77896eea33db6fe393d1db1380e2e52f74546a2
> --


      reply	other threads:[~2025-07-15  6:37 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-14  5:21 [BUG] NULL pointer dereference in sev_writeback_caches during KVM SEV migration kselftest on AMD platform Aithal, Srikanth
2025-07-14 14:12 ` Zheyun Shen
2025-07-14 14:48   ` Sean Christopherson
2025-07-14 14:56     ` Zheyun Shen
2025-07-14 15:14       ` Sean Christopherson
2025-07-14 16:50         ` Sean Christopherson
2025-07-14 22:17           ` Sean Christopherson
2025-07-15  6:37             ` Aithal, Srikanth [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5e2d0b4c-fd1b-41f0-a099-64c897b7ec6b@amd.com \
    --to=sraithal@amd$(echo .)com \
    --cc=kvm@vger$(echo .)kernel.org \
    --cc=linux-kernel@vger$(echo .)kernel.org \
    --cc=linux-next@vger$(echo .)kernel.org \
    --cc=seanjc@google$(echo .)com \
    --cc=szy0127@sjtu$(echo .)edu.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox