sync kvm patch for loongarch #635
base: linux-6.6.y
Conversation
Upstream: no

On LoongArch, the host and guest have their own PMU CSR registers and they share the PMU hardware resources. A set of PMU CSRs consists of a CTRL register and a CNTR register. We can choose which PMU CSRs are used by the guest by writing to the GCFG register [24:26] bits.

On the KVM side:
- Save the host PMU CSRs into struct kvm_context.
- If the host supports the PMU feature:
  - When entering guest mode, save the host PMU CSRs and restore the guest PMU CSRs.
  - When exiting guest mode, save the guest PMU CSRs and restore the host PMU CSRs.

Signed-off-by: Song Gao <gaosong@loongson.cn>
Link: https://lore.kernel.org/all/20240613120539.41021-1-gaosong@loongson.cn/
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
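To make the entry/exit flow above concrete, here is a minimal sketch assuming four CTRL/CNTR pairs; the struct and helper names are illustrative rather than the patch's exact code, while csr_read64()/csr_write64() and the LOONGARCH_CSR_PERFCTRL0/PERFCNTR0 names follow asm/loongarch.h:

```c
#define NR_PMU_PAIRS 4	/* assumed counter count, for illustration only */

struct kvm_pmu_csrs {
	unsigned long ctrl[NR_PMU_PAIRS];
	unsigned long cntr[NR_PMU_PAIRS];
};

/* Save the currently live PMU CSRs (CTRL/CNTR pairs) into ctx. */
static void kvm_save_pmu_csrs(struct kvm_pmu_csrs *ctx)
{
	int i;

	for (i = 0; i < NR_PMU_PAIRS; i++) {
		ctx->ctrl[i] = csr_read64(LOONGARCH_CSR_PERFCTRL0 + 2 * i);
		ctx->cntr[i] = csr_read64(LOONGARCH_CSR_PERFCNTR0 + 2 * i);
	}
}

/* Restore PMU CSRs from ctx, overwriting the live registers. */
static void kvm_restore_pmu_csrs(struct kvm_pmu_csrs *ctx)
{
	int i;

	for (i = 0; i < NR_PMU_PAIRS; i++) {
		csr_write64(ctx->ctrl[i], LOONGARCH_CSR_PERFCTRL0 + 2 * i);
		csr_write64(ctx->cntr[i], LOONGARCH_CSR_PERFCNTR0 + 2 * i);
	}
}
```

On guest entry the host set is saved and the guest set restored; on guest exit the roles are reversed.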
Upstream: no

Add iocsr and MMIO read/write emulation to the kernel. When the VM accesses the device address space through iocsr instructions or MMIO, it no longer needs to return to QEMU user mode; the access completes directly in kernel mode.

Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Upstream: no

Add a device model for the IPI interrupt controller, implement the basic create/destroy interfaces, and register the device model in the KVM device table.

Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Upstream: no

Implement read/write emulation for the IPI interrupt controller address space.

Signed-off-by: Min Zhou <zhoumin@loongson.cn>
Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Upstream: no

Implement the communication interface between user-mode programs and the kernel for the emulated IPI interrupt controller. It allows a user-mode process to get or set the emulated controller state, which is used for VM migration and for saving/restoring a VM.

Signed-off-by: Min Zhou <zhoumin@loongson.cn>
Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Upstream: no

Add a device model for the EXTIOI interrupt controller, implement the basic create/destroy interfaces, and register the device model in the KVM device table.

Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Upstream: no

Implement read/write emulation for the EXTIOI interrupt controller address space.

Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Upstream: no

Implement the communication interface between user-mode programs and the kernel for the emulated EXTIOI interrupt controller. It allows a user-mode process to get or set the emulated controller state, which is used for VM migration and for saving/restoring a VM.

Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Upstream: no

Add a device model for the PCH-PIC interrupt controller, implement the basic create/destroy interfaces, and register the device model in the KVM device table.

Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Upstream: no

Implement read/write emulation for the PCH-PIC interrupt controller address space, and implement the interrupt injection interface on LoongArch.

Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Upstream: no

Implement the communication interface between user-mode programs and the kernel for the emulated PCH-PIC interrupt controller. It allows a user-mode process to get or set the emulated controller state, which is used for VM migration and for saving/restoring a VM.

Signed-off-by: Tianrui Zhao <zhaotianrui@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Upstream: no

Enable the KVM_IRQ_ROUTING, KVM_IRQCHIP and KVM_MSI configuration options, add the KVM_CAP_IRQCHIP capability, and implement the query interface for the kernel irqchip.

Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Upstream: no

Fix a build failure introduced by the PR: https://gitee.com/anolis/cloud-kernel/pulls/3517

[...]
arch/loongarch/kvm/vcpu.c:1253:13: error: redefinition of 'kvm_lose_pmu'
 1253 | static void kvm_lose_pmu(struct kvm_vcpu *vcpu)
      |             ^~~~~~~~~~~~
arch/loongarch/kvm/vcpu.c:202:13: note: previous definition of 'kvm_lose_pmu' with type 'void(struct kvm_vcpu *)'
  202 | static void kvm_lose_pmu(struct kvm_vcpu *vcpu)
      |             ^~~~~~~~~~~~
arch/loongarch/kvm/vcpu.c: In function 'kvm_lose_pmu':
arch/loongarch/kvm/vcpu.c:1257:38: error: 'KVM_LARCH_PERF' undeclared (first use in this function); did you mean 'KVM_LARCH_LSX'?
 1257 |         if (!(vcpu->arch.aux_inuse & KVM_LARCH_PERF))
      |                                      ^~~~~~~~~~~~~~
      |                                      KVM_LARCH_LSX
arch/loongarch/kvm/vcpu.c:1280:35: error: 'KVM_PMU_PLV_ENABLE' undeclared (first use in this function); did you mean 'KVM_PV_ENABLE'?
 1280 |                 & KVM_PMU_PLV_ENABLE) == 0)
      |                   ^~~~~~~~~~~~~~~~~~
      |                   KVM_PV_ENABLE
[...]

Signed-off-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
commit a2cd37863518 ("LoongArch: KVM: Add PV IPI support on host side")
Conflict: none
Backport-reason: Synchronize upstream linux loongarch kvm patch to support loongarch virtualization.
Checkpatch: no, to be consistent with the upstream commit.

On LoongArch systems, the IPI hardware uses iocsr registers. There is one iocsr register access on IPI sending, and two iocsr accesses on IPI receiving in the IPI interrupt handler. In VM mode, every iocsr access causes the VM to trap into the hypervisor, so a single hardware IPI notification costs three traps.

This patch adds PV IPI for VMs: a hypercall instruction is used on the IPI sender side, and the hypervisor injects an SWI into the destination vCPU. In the SWI interrupt handler, only the CSR.ESTAT register is written to clear the irq, and CSR.ESTAT accesses do not trap into the hypervisor. With PV IPI there is one trap on the sender side and none on the receiver side, so one IPI notification costs only one trap.

This patch also adds IPI multicast support, using a method similar to x86. With multicast, an IPI notification can be sent to at most 128 vCPUs at a time, greatly reducing the number of traps into the hypervisor.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
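A hedged sketch of the host-side hypercall handling: the ABI (two 64-bit bitmap halves in a1/a2 plus the starting physical cpu id in a3) follows the commit text, while helper names such as kvm_get_vcpu_by_cpuid() and kvm_queue_irq() are assumptions here:

```c
static int kvm_pv_send_ipi(struct kvm_vcpu *vcpu)
{
	int i;
	u64 ipi_bitmap;
	unsigned int min, cpu;
	struct kvm_vcpu *dest;

	/* a3 holds the physical cpu id of bit 0; a1/a2 hold the bitmap. */
	min = vcpu->arch.gprs[LOONGARCH_GPR_A3];
	for (i = 0; i < 2; i++, min += BITS_PER_LONG) {
		ipi_bitmap = vcpu->arch.gprs[LOONGARCH_GPR_A1 + i];
		for_each_set_bit(cpu, (unsigned long *)&ipi_bitmap, BITS_PER_LONG) {
			dest = kvm_get_vcpu_by_cpuid(vcpu->kvm, cpu + min);
			if (!dest)
				continue;
			/* Emulate the IPI by injecting SWI0 into the target. */
			kvm_queue_irq(dest, INT_SWI0);
			kvm_vcpu_kick(dest);
		}
	}

	return 0;
}
```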
commit 30cf03a606b7 ("LoongArch: KVM: Add PV IPI support on guest side")
Conflict: none
Backport-reason: Synchronize upstream linux loongarch kvm patch to support loongarch virtualization.
Checkpatch: no, to be consistent with the upstream commit.

The PARAVIRT config option and PV IPI are added on the guest side; function pv_ipi_init() installs the IPI sending and receiving hooks. It first checks whether the system runs in VM mode, and if so calls kvm_para_available() to detect the current hypervisor type (only KVM detection is supported for now). The paravirt functions are used only if the hypervisor type is KVM, since KVM is the only hypervisor supported on LoongArch at present.

PV IPI uses virtual IPI sender and receiver functions. With the virtual sender, the IPI message is stored in memory rather than in emulated hardware. IPI multicast is also supported: 128 vCPUs can receive IPIs at the same time, like the x86 KVM method. A hypercall is used for IPI sending.

With the virtual receiver, hardware SWI0 is used rather than the real IPI hardware. Since each vCPU has a separate hardware SWI0 (like the hardware timer), there is no trap in IPI interrupt acknowledgement, and since the IPI message is stored in memory, there is no trap in fetching the IPI message.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
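For the sender, a minimal sketch of the single-target path modeled on the commit text; the per-cpu message field and the kvm_hypercall3()/KVM_HCALL_FUNC_IPI names are assumptions, and the real patch also implements the clustered multicast variant:

```c
static void pv_send_ipi_single(int cpu, unsigned int action)
{
	int old;
	irq_cpustat_t *info = &per_cpu(irq_stat, cpu);

	/* Store the message in memory; only trap if none was pending. */
	old = atomic_fetch_or(BIT(action), &info->message);
	if (old)
		return;

	/* Bitmap low half = BIT(0), high half = 0, min = target cpu. */
	kvm_hypercall3(KVM_HCALL_FUNC_IPI, 1, 0, cpu_logical_map(cpu));
}
```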
commit bfd2ecade039 ("LoongArch: KVM: Add software breakpoint support")
Conflict: none
Backport-reason: Synchronize upstream linux loongarch kvm patch to support loongarch virtualization.
Checkpatch: no, to be consistent with the upstream commit.

When a VM runs in KVM mode, the system does not exit to host mode when executing a general software breakpoint instruction such as INSN_BREAK; the trap exception happens in guest mode rather than host mode. To debug the guest kernel from the host side, some mechanism is needed to make the VM exit to host mode. Here a hypercall instruction with a special code is used for software breakpoints: the VM exits to host mode, the KVM hypervisor recognizes the special hypercall code, sets exit_reason to KVM_EXIT_DEBUG, and then lets QEMU handle it.

The idea comes from PPC KVM. An API, KVM_REG_LOONGARCH_DEBUG_INST, is added to retrieve the hypercall code; the VMM fetches the software breakpoint instruction through this API and sets the corresponding breakpoints for the guest kernel.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
commit 4bf3b972cad4 ("LoongArch: KVM: Add mmio trace events support")
Conflict: none
Backport-reason: Synchronize upstream linux loongarch kvm patch to support loongarch virtualization.
Checkpatch: no, to be consistent with the upstream commit.

Add mmio trace events support; the generic mmio events KVM_TRACE_MMIO_WRITE/xxx_READ/xxx_READ_UNSATISFIED are added here. A vcpu id field is also added to all kvm trace events, since the perf kvm tool parses vcpu id information from the kvm entry event.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
commit 2584b0859859 ("LoongArch: KVM: Sync pending interrupt when getting ESTAT from user mode")
Conflict: none
Backport-reason: Synchronize upstream linux loongarch kvm patch to support loongarch virtualization.
Checkpatch: no, to be consistent with the upstream commit.

Currently interrupts are posted and cleared asynchronously; meanwhile they are saved in the software state vcpu::arch::irq_pending and vcpu::arch::irq_clear. When a vcpu is ready to run, pending interrupts are written back to the CSR.ESTAT register from vcpu::arch::irq_pending at guest entry. During VM migration, the vcpu is put into stopped state, but pending interrupts are not synced to the CSR.ESTAT register, so interrupts can be lost when the vCPU is migrated to another host machine.

With this patch, when the ESTAT CSR register is read from VMM user mode, pending interrupts are synchronized into ESTAT as well, so the VMM sees the correct pending interrupts.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
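The shape of the fix, as a hedged sketch; the kvm_deliver_intr() and kvm_read_sw_gcsr() helper names are assumptions based on the commit text:

```c
static u64 kvm_get_estat(struct kvm_vcpu *vcpu)
{
	/*
	 * Fold the software pending state (vcpu->arch.irq_pending and
	 * vcpu->arch.irq_clear) into ESTAT before handing it to user
	 * space, so a stopped vCPU reports pending interrupts correctly.
	 */
	kvm_deliver_intr(vcpu);	/* assumed sync helper */
	return kvm_read_sw_gcsr(vcpu->arch.csr, LOONGARCH_CSR_ESTAT);
}
```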
commit 846f1299b2e2 ("LoongArch: KVM: Delay secondary mmu tlb flush until guest entry")
Conflict: none
Backport-reason: Synchronize upstream linux loongarch kvm patch to support loongarch virtualization.
Checkpatch: no, to be consistent with the upstream commit.

With hardware-assisted virtualization there are two levels of hardware mmu: one is the GVA-to-GPA mapping, the other is the GPA-to-HPA mapping, generally called the secondary mmu. On a page fault in the secondary mmu, a tlb flush indexed by the faulting GPA address and the VMID is needed. The VMID is stored in register CSR.GSTAT and is reloaded or recalculated before guest entry.

Currently CSR.GSTAT is not saved and restored during vCPU context switch; instead it is recalculated at guest entry. So CSR.GSTAT is only valid while a vCPU runs in guest mode and may be stale once the vCPU exits to host mode: it may hold the VMID of the last scheduled-out vCPU rather than the current one. Function kvm_flush_tlb_gpa() must be called with the real VMID, so it is moved to the guest entry path. An arch-specific request id, KVM_REQ_TLB_FLUSH_GPA, is added to flush the tlb of the secondary mmu; the flush can be skipped as an optimization when the VMID is updated, since updating the VMID invalidates all guest tlb entries anyway.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
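A sketch of the request flow under stated assumptions (an INVALID_GPA sentinel and a flush_gpa field on the vcpu; names are illustrative): the fault path records the GPA and raises the request, and the guest-entry path services it once the current vCPU's VMID is in place.

```c
/* Fault path: remember the GPA, flush later at guest entry. */
static void kvm_queue_gpa_flush(struct kvm_vcpu *vcpu, unsigned long gpa)
{
	vcpu->arch.flush_gpa = gpa;
	kvm_make_request(KVM_REQ_TLB_FLUSH_GPA, vcpu);
}

/* Guest-entry path: the VMID in CSR.GSTAT is now the current vCPU's. */
static void kvm_late_check_requests(struct kvm_vcpu *vcpu)
{
	if (kvm_check_request(KVM_REQ_TLB_FLUSH_GPA, vcpu) &&
	    vcpu->arch.flush_gpa != INVALID_GPA) {
		kvm_flush_tlb_gpa(vcpu, vcpu->arch.flush_gpa);
		vcpu->arch.flush_gpa = INVALID_GPA;
	}
}
```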
commit 038f365107ed ("LoongArch: KVM: Select huge page only if secondary mmu supports it")
Conflict: none
Backport-reason: Synchronize upstream linux loongarch kvm patch to support loongarch virtualization.
Checkpatch: no, to be consistent with the upstream commit.

Currently the page level selection for the secondary mmu depends on the memory slot and the page level of the host mmu. There are problems if the page level of the secondary mmu is already zero: a huge page cannot be selected if a normal page is already mapped in the secondary mmu, since merging normal pages into a huge page is not supported yet. So page level selection should depend on the following three conditions:
1. The memslot is aligned for huge pages and the VM is not migrating.
2. The page level of the host mmu is also huge page.
3. The page level of the secondary mmu is suitable for huge page.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
commit 1706dcacd354 ("LoongArch: KVM: Discard dirty page tracking on readonly memslot")
Conflict: none
Backport-reason: Synchronize upstream linux loongarch kvm patch to support loongarch virtualization.
Checkpatch: no, to be consistent with the upstream commit.

For a readonly memslot, such as UEFI BIOS or the UEFI variable space, the guest cannot write to the memory space directly, so it is not necessary to track dirty pages for it. Make this optimization in function kvm_arch_commit_memory_region().

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
commit 21d29364e1d1 ("LoongArch: KVM: Add memory barrier before update pmd entry")
Conflict: none
Backport-reason: Synchronize upstream linux loongarch kvm patch to support loongarch virtualization.
Checkpatch: no, to be consistent with the upstream commit.

When updating a pmd entry, such as allocating a new pmd page or splitting a huge page into normal pages, all pte entries must be updated first, and only then the pmd entry. LoongArch systems are weakly ordered, so there will be problems if other vCPUs see the pmd update before the pte updates. Add smp_wmb() to ensure this ordering.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
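The barrier placement, as a minimal sketch; the publish helper is illustrative, and the patch applies this wherever a freshly filled pte page is hooked into a pmd slot:

```c
static void kvm_publish_pte_page(unsigned long *pmd_slot, unsigned long pte_page_val)
{
	/*
	 * All pte stores into the new page must be visible before the
	 * pmd store: on weakly ordered LoongArch, another vCPU walking
	 * the table must never see the pmd before the ptes.
	 */
	smp_wmb();
	WRITE_ONCE(*pmd_slot, pte_page_val);
}
```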
commit e31e55145f76 ("LoongArch: KVM: Add dirty bitmap initially all set support")
Conflict: none
Backport-reason: Synchronize upstream linux loongarch kvm patch to support loongarch virtualization.
Checkpatch: no, to be consistent with the upstream commit.

Add KVM_DIRTY_LOG_INITIALLY_SET support on LoongArch; this feature comes from other architectures such as x86 and arm64.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
commit dfd5f11227e6 ("LoongArch: KVM: Mark page accessed and dirty with page ref added")
Conflict: none
Backport-reason: Synchronize upstream linux loongarch kvm patch to support loongarch virtualization.
Checkpatch: no, to be consistent with the upstream commit.

Function kvm_map_page_fast() is the fast path of the secondary mmu page fault flow: the pfn is obtained from the secondary mmu page table walker, but the corresponding page reference is not taken, and it is dangerous to access the page outside mmu_lock. Here the page ref is added inside mmu_lock, and kvm_set_pfn_accessed() and kvm_set_pfn_dirty() are called with the page ref held, so the page cannot be freed by others. Also, kvm_set_pfn_accessed() is removed here since it is already called by the subsequent kvm_release_pfn_clean().

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
commit d808899636ea ("LoongArch: KVM: always make pte young in page map's fast path") Conflict: none Backport-reason: Synchronize upstream linux loongarch kvm patch to support loongarch virtualization. Checkpatch: no, to be consistent with upstream commit. It seems redundant to check if pte is young before the call to kvm_pte_mkyoung() in kvm_map_page_fast(). Just remove the check. Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Jia Qingtong <jiaqingtong97@gmail.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
commit 40e12dbc794b ("LoongArch: KVM: Add PV steal time support in host side")
Conflict: none
Backport-reason: Synchronize upstream linux loongarch kvm patch to support loongarch virtualization.
Checkpatch: no, to be consistent with the upstream commit.

Add the ParaVirt steal time feature on the host side. The VM can query the features provided by the KVM hypervisor; a feature bit KVM_FEATURE_STEAL_TIME is added here. Like x86, the steal time structure is stored in guest memory, and a hypercall function KVM_HCALL_FUNC_NOTIFY is added for the guest to notify KVM to enable the feature.

A CPU attr ioctl command, KVM_LOONGARCH_VCPU_PVTIME_CTRL, is added to save and restore the base address of the steal time structure when a VM is migrated.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
commit 00c8fa6c61c5 ("LoongArch: KVM: Add PV steal time support in guest side")
Conflict: none
Backport-reason: Synchronize upstream linux loongarch kvm patch to support loongarch virtualization.
Checkpatch: no, to be consistent with the upstream commit.

A per-cpu struct kvm_steal_time is added here; its size is 64 bytes and its alignment is also defined as 64 bytes, so the whole structure sits in one physical page.

When a vCPU comes online, function pv_enable_steal_time() is called. It passes the guest physical address of struct kvm_steal_time and tells the hypervisor to enable steal time. When a vCPU goes offline, the physical address is set to 0 to tell the hypervisor to disable steal time.

Here is vmstat output from a guest while there is workload on both host and guest; it shows the steal time statistics:

procs -----------memory---------- -----io---- -system-- ------cpu-----
 r  b   swpd    free   inact active  bi    bo    in    cs us sy id wa st
15  1      0 7583616 184112  72208  20     0   162    52 31  6 43  0 20
17  0      0 7583616 184704  72192   0     0  6318  6885  5 60  8  5 22
16  0      0 7583616 185392  72144   0     0  1766  1081  0 49  0  1 50
16  0      0 7583616 184816  72304   0     0  6300  6166  4 62 12  2 20
18  0      0 7583632 184480  72240   0     0  2814  1754  2 58  4  1 35

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
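The guest read side follows the usual version-counter pattern (an odd version means an update is in progress), as on x86. A hedged sketch; the per-cpu variable name and hook-up are assumptions based on the commit text:

```c
struct kvm_steal_time {
	__u64 steal;		/* accumulated steal time in ns */
	__u32 version;		/* bumped around each hypervisor update */
	__u32 flags;
	__u32 pad[12];		/* pad the structure to 64 bytes */
};

static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);

static u64 pv_steal_clock(int cpu)
{
	u32 version;
	u64 steal;
	struct kvm_steal_time *src = &per_cpu(steal_time, cpu);

	do {
		version = READ_ONCE(src->version);
		virt_rmb();	/* read version before steal */
		steal = READ_ONCE(src->steal);
		virt_rmb();	/* read steal before re-checking version */
	} while ((version & 1) || version != READ_ONCE(src->version));

	return steal;
}
```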
commit 0063265ee183 ("perf kvm: Add kvm-stat for loongarch64")
Conflict: none
Backport-reason: Synchronize upstream linux loongarch kvm patch to support loongarch virtualization.
Checkpatch: no, to be consistent with the upstream commit.

Add support for 'perf kvm stat' on the loongarch64 platform; currently only the kvm exit event is supported. Here is example output of the "perf kvm --host stat report" command:

Event name     Samples  Sample%    Time (ns)   Time%  Mean Time (ns)
Mem Store        83969   51.00%    625697070   8.00%            7451
Mem Read         37641   22.00%    112485730   1.00%            2988
Interrupt        15542    9.00%     20620190   0.00%            1326
IOCSR            15207    9.00%     94296190   1.00%            6200
Hypercall         4873    2.00%     12265280   0.00%            2516
Idle              3713    2.00%   6322055860  87.00%         1702681
FPU               1819    1.00%      2750300   0.00%            1511
Inst Fetch         502    0.00%      1341740   0.00%            2672
Mem Modify         324    0.00%       602240   0.00%            1858
CPUCFG              55    0.00%        77610   0.00%            1411
CSR                 12    0.00%        19690   0.00%            1640
LASX                 3    0.00%         4870   0.00%            1623
LSX                  2    0.00%         2100   0.00%            1050

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
commit 05086585ea4f ("KVM: Discard zero mask with function kvm_dirty_ring_reset")
Conflict: none
Backport-reason: Synchronize upstream linux loongarch kvm patch to support loongarch virtualization.
Checkpatch: no, to be consistent with the upstream commit.

Function kvm_reset_dirty_gfn() may be called with cur_slot, cur_offset and mask all equal to zero, which does not describe a real dirty page, so there is no need to clear a dirty page in that case. Also, the return value of macro __fls(), which is called in kvm_reset_dirty_gfn(), is undefined when mask is zero. Just return in this case.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Message-ID: <20240613122803.1031511-1-maobibo@loongson.cn>
[Move the conditional inside kvm_reset_dirty_gfn; suggested by Sean Christopherson. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
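The fix reduces to an early return before __fls() can see a zero mask. A sketch of the guard as finally placed inside kvm_reset_dirty_gfn(); the rest of the body is elided:

```c
static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)
{
	/*
	 * An all-zero slot/offset/mask entry does not describe a real
	 * dirty page, and __fls(0) below would be undefined; just return.
	 */
	if (!mask)
		return;

	/* ... existing reset logic, which uses __fls(mask) ... */
}
```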
commit 0c9fa3e92629 ("LoongArch: KVM: Invalidate guest steal time address on vCPU reset")
Conflict: none
Backport-reason: Synchronize upstream linux loongarch kvm patch to support loongarch virtualization.
Checkpatch: no, to be consistent with the upstream commit.

If the ParaVirt steal time feature is enabled, a per-cpu gpa address is passed from the guest vCPU and the host modifies guest memory at this gpa. When a vCPU is reset normally, it notifies the host and invalidates the gpa address. However, if the VM crashes and the VMM reboots it forcibly, the vCPU reboot notification callback is not called in the VM. The host must invalidate the gpa address itself, otherwise it will keep modifying guest memory while the VM reboots. Here it is invalidated from the vCPU KVM_REG_LOONGARCH_VCPU_RESET ioctl interface.

Also, function kvm_reset_timer() is removed from the vCPU reset path, since the software-emulated timer is only used while a vCPU is blocked. When a vCPU leaves the block waiting queue, kvm_restore_timer() is called and the software timer is cancelled; the timer register is also cleared by the VMM when a vCPU is reset.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Upstream: no Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Upstream: no

On LoongArch systems there are two places where a cpu's NUMA node is set: one in the arch-specific function smp_prepare_boot_cpu(), the other in the generic function early_numa_node_init(). The latter overwrites the NUMA node information. For a hot-added cpu without NUMA information, cpu_logical_map() fails to find its physical cpuid at the beginning since the cpu is not enabled in the ACPI MADT table, so early_cpu_to_node() also fails to get its NUMA node, and the generic early_numa_node_init() overwrites it with an incorrect NUMA node.

APIs topo_get_cpu() and topo_add_cpu() are added here. As on other architectures, a logical cpu is allocated while parsing the MADT table; when parsing the SRAT table or hot-adding a cpu, the logical cpu is acquired by searching all allocated logical cpus for a matching physical id. This solves problems such as the following (see the sketch after this list):
1. The boot cpu is not the first entry in the MADT table, so the first entry would be overwritten by the later boot cpu.
2. A physical cpu id not present in the MADT table is invalid; during later SRAT parsing or cpu hot-add, such an invalid physical cpu is detected.
3. For a hot-added cpu, its logical cpu is allocated while parsing the MADT table, so early_cpu_to_node() can be used for it and cpu_to_node() returns the correct node.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
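A minimal sketch of the lookup idea, assuming the MADT parse has already populated the logical-to-physical map; names are illustrative:

```c
/* Find the logical cpu already allocated for a physical cpu id. */
static int topo_get_cpu(int physid)
{
	int cpu;

	for (cpu = 0; cpu < nr_cpu_ids; cpu++)
		if (cpu_logical_map(cpu) == physid)
			return cpu;

	return -ENOENT;	/* not in MADT: treat as invalid */
}
```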
Upstream: no

Fix a pch-pic spinlock deadlock.

Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Upstream: no

Add IOMMU support for LoongArch.

Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Upstream: no

When the virtual machine restarts, the extioi state is not zeroed; a stale set interrupt bit remains and causes a hang.

Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Upstream: no

Fix the issue that 16 VFIO devices could not be passed through to VMs.

Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Upstream: no

The LLBCTL register is a CSR register separate from the host's: a host exception eret instruction clears the host LLBCTL CSR register, and a guest exception clears the guest LLBCTL CSR register. Consider two vCPUs both executing atomic64_fetch_add_unless:

VCPU0                                 VCPU1
0: ll.d  %[p], %[c]
   beq   %[p], %[u], 1f
                                      (here the secondary mmu mapping is
                                       changed and the hpa page is replaced
                                       with a new page; VCPU1 executes the
                                       atomic instruction on the new page)
                                      0: ll.d  %[p], %[c]
                                         beq   %[p], %[u], 1f
                                         add.d %[rc], %[p], %[a]
                                         sc.d  %[rc], %[c]
   add.d %[rc], %[p], %[a]
   sc.d  %[rc], %[c]

VCPU0's LLBCTL is still on, which indicates the memory was not modified, so its sc.d modifies the memory directly. Here the guest LLBCTL_WCLLB register is cleared when the mapping is changed.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
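A hedged sketch of the fix: when the secondary mmu mapping under a pending ll.d/sc.d pair changes, the guest's link bit is dropped so the pending sc.d fails and the atomic sequence retries on the new page. The gcsr_write() call and the hook point are assumptions:

```c
static void kvm_drop_guest_llbit(struct kvm_vcpu *vcpu)
{
	/*
	 * Writing WCLLB clears the guest LLbit, so an ll.d taken on the
	 * old hpa page cannot complete with a silent sc.d store on the
	 * replacement page.
	 */
	gcsr_write(CSR_LLBCTL_WCLLB, LOONGARCH_CSR_LLBCTL);	/* assumed helper */
}
```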
Upstream: no

Add the PTW feature bit to CPUCFG.

Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Upstream: no Delete duplicate function definitions Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Upstream: no

The irqchip state needs to be cleared when the VM restarts.

Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Upstream: no

Fix an occasional physical machine crash during repeated VM restarts with device passthrough.

Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
Upstream: no

Before PTW is enabled, when a virtual machine writes data to a physical page, a page modification exception is triggered; the exception handler sets the dirty bit of the pte and sets KVM's dirty page bitmap. During migration, the dirty page bitmap drives dirty page migration.

After PTW is enabled, when the virtual machine writes data to a physical page, the PTW hardware writes the dirty bit of the pte directly without triggering a page modification exception. KVM then cannot set the dirty page bitmap correctly, resulting in partial data loss and migration failure.

To solve this problem, the write bit and dirty bit of the pte are used to mark whether the current page needs to be migrated: at the beginning of migration, the write and dirty bits of the pte are cleared to zero, so that even with PTW enabled a page modification exception is still triggered. This enters the correct dirty page marking path and completes a correct memory migration. Bit 50 of the pte is used to record the original write property, so the correct write property of the pte can be restored after dirty page marking completes.

Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
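A sketch of the pte transformation under this scheme; the shadow-bit macro is an assumption (the commit uses pte bit 50), and _PAGE_WRITE/_PAGE_DIRTY follow the usual LoongArch pte bit names:

```c
#define _PAGE_WRITE_SHADOW	(1UL << 50)	/* saved original write bit */

/* Called at migration start: force the next guest write to fault. */
static unsigned long kvm_pte_start_dirty_track(unsigned long pte)
{
	if (pte & _PAGE_WRITE)
		pte |= _PAGE_WRITE_SHADOW;

	return pte & ~(_PAGE_WRITE | _PAGE_DIRTY);
}

/* Called after dirty marking: restore the original write permission. */
static unsigned long kvm_pte_stop_dirty_track(unsigned long pte)
{
	if (pte & _PAGE_WRITE_SHADOW)
		pte |= _PAGE_WRITE;

	return pte & ~_PAGE_WRITE_SHADOW;
}
```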
Upstream: no

The extioi controller could send interrupts to only four cpus. This patch enables extioi to send interrupts to a maximum of 256 cpus.

Signed-off-by: Xianglai Li <lixianglai@loongson.cn>
🧙 Sourcery has finished reviewing your pull request!
Hi @lixianglai. Thanks for your PR. I'm waiting for a deepin-community member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hey @lixianglai - I've reviewed your changes - here's some feedback:
Overall Comments:
- The PMU save/restore functions could be simplified by using a loop instead of unrolling the registers.
- Consider using helper functions to improve the readability of the MMIO read/write emulation code.
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 5 issues found
- 🟢 Documentation: all looks good
	return ret;

/* Register loongarch pch pic interrupt controller interface. */
ret = kvm_loongarch_register_pch_pic_device();
suggestion (bug_risk): Device registration return value handling.
Since the registration of pch_pic (and the other devices) is performed sequentially, ensure that on failure, any previously registered devices are correctly cleaned up to avoid resource leaks or inconsistent state.
Suggested implementation:

```c
	ret = kvm_loongarch_register_ipi_device();
	if (ret)
		goto fail_ipi;

	/* Register loongarch extioi interrupt controller interface. */
	ret = kvm_loongarch_register_extioi_device();
	if (ret)
		goto fail_extioi;

	/* Register loongarch pch pic interrupt controller interface. */
	ret = kvm_loongarch_register_pch_pic_device();
	if (ret)
		goto fail_pch_pic;

	return 0;

fail_pch_pic:
	/* Clean up previously registered extioi device */
	kvm_loongarch_unregister_extioi_device();
fail_extioi:
	/* Clean up previously registered ipi device */
	kvm_loongarch_unregister_ipi_device();
fail_ipi:
	return ret;
}
```
Ensure that the functions kvm_loongarch_unregister_ipi_device() and kvm_loongarch_unregister_extioi_device() exist and correctly reverse the initialization performed during registration.
}

#define KVM_IPI_CLUSTER_SIZE	(2 * BITS_PER_LONG)

static void pv_send_ipi_mask(const struct cpumask *mask, unsigned int action) |
issue (complexity): Consider extracting the bit-twiddling and clustering logic in `pv_send_ipi_mask()` into a helper function. Isolating the "cluster update" logic reduces the nested conditionals in the loop and makes the code easier to follow. For example, a helper that updates the cluster bounds and the bitmap:

```c
static void update_cluster(int cpu, int *min, int *max, __uint128_t *bitmap)
{
	if (!*bitmap) {
		*min = *max = cpu;
	} else if (cpu < *min && cpu > (*max - KVM_IPI_CLUSTER_SIZE)) {
		*bitmap <<= (*min - cpu);
		*min = cpu;
	} else if (cpu > *min && cpu < (*min + KVM_IPI_CLUSTER_SIZE)) {
		*max = cpu > *max ? cpu : *max;
	} else {
		/* Cluster full; send IPI with current bitmap and start new cluster */
		kvm_hypercall3(KVM_HCALL_FUNC_IPI, (unsigned long)*bitmap,
			       (unsigned long)(*bitmap >> BITS_PER_LONG), *min);
		*min = *max = cpu;
		*bitmap = 0;
	}
	__set_bit(cpu - *min, (unsigned long *)bitmap);
}
```

Then in your loop in `pv_send_ipi_mask()`, simply call:

```c
for_each_cpu(i, mask) {
	info = &per_cpu(irq_stat, i);
	old = atomic_fetch_or(action, &info->message);
	if (old)
		continue;

	cpu = cpu_logical_map(i);
	update_cluster(cpu, &min, &max, &bitmap);
}
```

This refactoring isolates the complex decision logic and keeps the loop more readable while retaining all functionality.
arch/loongarch/mm/pgtable.c (outdated)
@@ -116,6 +116,26 @@ void pud_init(void *addr)
EXPORT_SYMBOL_GPL(pud_init);
#endif

void kernel_pte_init(void *addr)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (complexity): Consider using a simple loop or memset() instead of an unrolled loop with manual assignments to improve code clarity and maintainability.

Consider replacing the unrolled loop with a simple, clear loop. For example, instead of mixing pointer arithmetic like:

```c
p[-3] = _PAGE_GLOBAL;
p[-2] = _PAGE_GLOBAL;
p[-1] = _PAGE_GLOBAL;
```

you could write:

```c
for (int i = 0; i < NUM_ENTRIES; i++) {
	p[i] = _PAGE_GLOBAL;
}
```

If all entries share the same value in a contiguous memory block, even a `memset()` (or a helper wrapping it) could make the intent clearer. This change preserves functionality without the non-intuitive backward indexing, improving readability and maintainability.
	}
}

static int loongarch_extioi_writeb(struct kvm_vcpu *vcpu,
issue (complexity): Consider extracting the common bit-manipulation loops and update routines into helper functions to improve readability and reduce code duplication, such as the IRQ update loop using ffs(). The loops that iterate with ffs() (or equivalent) to update IRQ state appear in multiple places; consolidating them into a well-named helper decouples the locking/update logic and makes the intent clearer.

For example, you could refactor the IRQ update loop into a helper function:

```c
static void update_pending_irq(struct loongarch_extioi *s, int base_irq, u8 bitmask, int level)
{
	int irq = ffs(bitmask);

	while (irq) {
		extioi_update_irq(s, base_irq + irq - 1, level);
		bitmask &= ~(1 << (irq - 1));
		irq = ffs(bitmask);
	}
}
```

Then in your write functions you would replace patterns like:

```c
irq = ffs(coreisr);
while (irq != 0) {
	extioi_update_irq(s, irq - 1 + index * 8, 0);
	coreisr &= ~(1 << (irq - 1));
	irq = ffs(coreisr);
}
```

with:

```c
update_pending_irq(s, index * 8, coreisr, 0);
```
Similarly, extract routines for common bit fiddling when handling enable/disable transitions. This keeps functionality intact while improving readability and reducing nesting.
#include <linux/count_zeros.h>

/* update the isr according to irq level and route irq to extioi */
static void pch_pic_update_irq(struct loongarch_pch_pic *s, int irq, int level) |
issue (complexity): Consider simplifying pch_pic_update_irq() by using early returns, and using the for_each_set_bit() macro in pch_pic_update_batch_irqs(), to reduce nested logic and repeated code. Failing fast in the update function removes the duplicated branches of the nested if/else. Suggested refactoring for pch_pic_update_irq():

```c
static void pch_pic_update_irq(struct loongarch_pch_pic *s, int irq, int level)
{
	u64 mask = 1ULL << irq;

	if (level) {
		if (!(mask & s->irr & ~s->mask))
			return;
		s->isr |= mask;
	} else {
		if (!(mask & s->isr & ~s->irr))
			return;
		s->isr &= ~mask;
	}

	irq = s->htmsi_vector[irq];
	extioi_set_irq(s->kvm->arch.extioi, irq, level);
}
```
And for pch_pic_update_batch_irqs():

```c
static void pch_pic_update_batch_irqs(struct loongarch_pch_pic *s, u64 irq_mask, int level)
{
	unsigned int irq;

	for_each_set_bit(irq, &irq_mask, sizeof(irq_mask) * 8)
		pch_pic_update_irq(s, irq, level);
}
```
These changes reduce nested logic and repeated code while preserving the original behavior.
	spin_unlock(&vcpu->arch.ipi_state.lock);
}

static uint64_t read_mailbox(struct kvm_vcpu *vcpu, int offset, int len) |
issue (complexity): Consider refactoring the common lock-acquire/release and mailbox access patterns to reduce code duplication: extract helper functions, consolidate the logic into a single function with a boolean write parameter, and create simple wrappers for the read and write operations. Similar factoring applies to other areas with repeated lock patterns or switch statements.
Consider refactoring the common lock-acquire/release and mailbox access patterns. For example, instead of duplicating similar locking and pointer arithmetic in both read_mailbox and write_mailbox, you can extract a helper:

```c
static inline void *get_mailbox_ptr(struct kvm_vcpu *vcpu, int offset)
{
	return (void *)vcpu->arch.ipi_state.buf + (offset - 0x20);
}
```

Then rewrite the two functions to differ only in the read/write part:

```c
static uint64_t mailbox_access(struct kvm_vcpu *vcpu, int offset, int len,
			       bool write, uint64_t data)
{
	void *pbuf;
	uint64_t ret = 0;

	spin_lock(&vcpu->arch.ipi_state.lock);
	pbuf = get_mailbox_ptr(vcpu, offset);

	if (write) {
		if (len == 1)
			*(unsigned char *)pbuf = (unsigned char)data;
		else if (len == 2)
			*(unsigned short *)pbuf = (unsigned short)data;
		else if (len == 4)
			*(unsigned int *)pbuf = (unsigned int)data;
		else if (len == 8)
			*(unsigned long *)pbuf = (unsigned long)data;
		else
			kvm_err("%s: unknown data len: %d\n", __func__, len);
	} else {
		if (len == 1)
			ret = *(unsigned char *)pbuf;
		else if (len == 2)
			ret = *(unsigned short *)pbuf;
		else if (len == 4)
			ret = *(unsigned int *)pbuf;
		else if (len == 8)
			ret = *(unsigned long *)pbuf;
		else
			kvm_err("%s: unknown data len: %d\n", __func__, len);
	}

	spin_unlock(&vcpu->arch.ipi_state.lock);
	return ret;
}
```

Then redefine the read and write versions as simple wrappers:

```c
static uint64_t read_mailbox(struct kvm_vcpu *vcpu, int offset, int len)
{
	return mailbox_access(vcpu, offset, len, false, 0);
}

static void write_mailbox(struct kvm_vcpu *vcpu, int offset,
			  uint64_t data, int len)
{
	(void)mailbox_access(vcpu, offset, len, true, data);
}
```
Similarly, you could factor out the repeated lock patterns in ipi_send/ipi_clear or in the switch statements (e.g. by mapping register offsets to member pointers) to reduce duplicate code paths.
These are non-breaking changes that consolidate duplicated logic and lower cognitive complexity.
case 8:
	ret = loongarch_extioi_writel(vcpu, extioi, addr, len, val);
	break;
default:
ret is not assigned in the default case.
}

loongarch_ext_irq_unlock(extioi, flags);

Is this an extra blank line?
case 8:
	ret = loongarch_extioi_readl(vcpu, extioi, addr, len, val);
	break;
default:
ret is not assigned in the default case.
__setup("loongarch_iommu=", la_iommu_setup); | ||
|
||
static const struct pci_device_id loongson_iommu_pci_tbl[] = { | ||
{ PCI_DEVICE(0x14, 0x3c0f) }, |
Should PCI_VENDOR_ID_LOONGSON be used here?
	return ret;
}

if (h->length == 0)
Should the `if (h->type == la_iommu_target_ivhd_type)` check be placed before the `if (h->length == 0)` check?
if (h->length == 0)
	break;

if (*p == la_iommu_target_ivhd_type) {
The other place uses `if (h->type == la_iommu_target_ivhd_type)`; can the two be made consistent?
Summary by Sourcery
This pull request synchronizes the KVM patches for the LoongArch architecture. It introduces support for performance monitoring units (PMU), LBT, steal time, and IPI, and includes changes to the MMU, timer, and other core components to improve virtualization performance and stability.
New Features: