* [PATCH v2 0/2] iommu/arm-smmu-v3: Tegra264 invalidation workaround
@ 2026-05-29 14:08 Ashish Mhetre
2026-05-29 14:08 ` [PATCH v2 1/2] iommu/arm-smmu-v3: Detect Tegra264 erratum Ashish Mhetre
2026-05-29 14:08 ` [PATCH v2 2/2] iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264 Ashish Mhetre
From: Ashish Mhetre @ 2026-05-29 14:08 UTC
To: will, robin.murphy, joro, jgg, nicolinc
Cc: linux-arm-kernel, iommu, linux-kernel, linux-tegra, Ashish Mhetre

Nvidia Tegra264 SMMUs are affected by an erratum where a TLB entry can
survive an invalidation that races with concurrent traffic targeting
the same entry. The hardware-recommended software workaround is to
issue every CFGI/TLBI command (each followed by CMD_SYNC) twice. The
second issue must execute only after the first issue's CMD_SYNC has
completed, giving the sequence:

  TLBI/CFGI ... CMD_SYNC
  TLBI/CFGI ... CMD_SYNC

ATC_INV is not affected and must not be doubled.

This series implements the workaround by hooking the duplication into
the single chokepoint that every synchronous submission flows through,
arm_smmu_cmdq_issue_cmdlist().
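
To make the wrapper idea concrete, here is a small user-space model, not driver code. Everything in it is hypothetical: `mock_cmdq`, `issue_raw()` and `issue_cmdlist()` stand in for the CMDQ, `__arm_smmu_cmdq_issue_cmdlist()` and the new `arm_smmu_cmdq_issue_cmdlist()` wrapper, `OPT_TLBI_TWICE` stands in for `ARM_SMMU_OPT_TLBI_TWICE`, and "issuing" a command only records its opcode in a trace. As in the series, the whole list is classified by its first command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode subset; values follow the SMMUv3 command encoding. */
enum {
	OP_CFGI_STE     = 0x03,
	OP_TLBI_NH_ASID = 0x11,
	OP_ATC_INV      = 0x40,
	OP_CMD_SYNC     = 0x46,
};

#define OPT_TLBI_TWICE (1u << 5) /* stand-in for ARM_SMMU_OPT_TLBI_TWICE */
#define TRACE_MAX 64

/* Toy queue: "issuing" a command just records its opcode, in order. */
struct mock_cmdq {
	uint32_t options;
	uint8_t trace[TRACE_MAX];
	size_t len;
};

static void trace_push(struct mock_cmdq *q, uint8_t op)
{
	assert(q->len < TRACE_MAX);
	q->trace[q->len++] = op;
}

/* Stand-in for __arm_smmu_cmdq_issue_cmdlist(): cmds, then an optional CMD_SYNC. */
static int issue_raw(struct mock_cmdq *q, const uint8_t *ops, int n, bool sync)
{
	for (int i = 0; i < n; i++)
		trace_push(q, ops[i]);
	if (sync)
		trace_push(q, OP_CMD_SYNC); /* the real driver also waits for completion */
	return 0;
}

static bool needs_twice(uint8_t op)
{
	return op == OP_CFGI_STE || op == OP_TLBI_NH_ASID;
}

/* Stand-in for the wrapper: re-issue only synced CFGI/TLBI lists. */
static int issue_cmdlist(struct mock_cmdq *q, const uint8_t *ops, int n, bool sync)
{
	int ret = issue_raw(q, ops, n, sync);

	/* Like the patch, classify the whole list by its first command. */
	if (!ret && sync && (q->options & OPT_TLBI_TWICE) && n > 0 &&
	    needs_twice(ops[0]))
		ret = issue_raw(q, ops, n, sync);
	return ret;
}
```

On a flagged instance, a synced TLBI produces the trace TLBI, CMD_SYNC, TLBI, CMD_SYNC. A synced ATC_INV is issued only once, and an unsynced TLBI is not doubled by the wrapper. That last case is why the batch rollover path also needs attention.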

Patch 1 detects affected instances using the existing
"nvidia,tegra264-smmu" compatible string and exposes the condition
via a new ARM_SMMU_OPT_TLBI_TWICE option bit.

Patch 2 wires the option into the CMDQ submission path: when @sync is
true and the cmdlist carries a CFGI/TLBI opcode (judged by its first
command), the same cmdlist is re-issued a second time. The batch
capacity-rollover path is also adjusted to force a SYNC on chunks that
carry CFGI/TLBI commands so each flushed chunk is correctly doubled.
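
The rollover problem can be reproduced with a similar toy model. This sketch uses hypothetical names again, and a batch capacity of 4 rather than the driver's `CMDQ_BATCH_ENTRIES`. `batch_add()` flushes a full batch before adding a new command, and sets `sync` only when the first queued command needs doubling. `batch_submit()` always syncs. `count_op()` tallies how often an opcode reached the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_TLBI_NH_VA = 0x12, OP_ATC_INV = 0x40, OP_CMD_SYNC = 0x46 };

#define BATCH_ENTRIES 4 /* toy capacity; the driver uses CMDQ_BATCH_ENTRIES */
#define TRACE_MAX 64

struct toy_smmu {
	bool tlbi_twice; /* stand-in for ARM_SMMU_OPT_TLBI_TWICE */
	uint8_t trace[TRACE_MAX];
	size_t len;
};

struct toy_batch {
	uint8_t ops[BATCH_ENTRIES];
	int num;
};

static bool needs_twice(const struct toy_smmu *s, uint8_t op)
{
	return s->tlbi_twice && op == OP_TLBI_NH_VA;
}

static void issue_raw(struct toy_smmu *s, const uint8_t *ops, int n, bool sync)
{
	assert(s->len + n + 1 <= TRACE_MAX);
	for (int i = 0; i < n; i++)
		s->trace[s->len++] = ops[i];
	if (sync)
		s->trace[s->len++] = OP_CMD_SYNC;
}

/* Doubling happens only on synced submissions, as in the series' wrapper. */
static void issue(struct toy_smmu *s, const uint8_t *ops, int n, bool sync)
{
	issue_raw(s, ops, n, sync);
	if (sync && n > 0 && needs_twice(s, ops[0]))
		issue_raw(s, ops, n, sync);
}

static void batch_add(struct toy_smmu *s, struct toy_batch *b, uint8_t op)
{
	if (b->num == BATCH_ENTRIES) {
		/* Capacity rollover: force a SYNC if this chunk must be doubled. */
		issue(s, b->ops, b->num, needs_twice(s, b->ops[0]));
		b->num = 0;
	}
	b->ops[b->num++] = op;
}

static void batch_submit(struct toy_smmu *s, struct toy_batch *b)
{
	issue(s, b->ops, b->num, true);
	b->num = 0;
}

static int count_op(const struct toy_smmu *s, uint8_t op)
{
	int c = 0;

	for (size_t i = 0; i < s->len; i++)
		c += (s->trace[i] == op);
	return c;
}
```

With the option set, adding six TLBIs yields 12 TLBI entries and 4 CMD_SYNCs, so every invalidation is issued twice. If the rollover had kept `sync = false`, the first chunk of four would reach the queue only once (8 TLBIs in total). That is the gap the series closes.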

The series is based on Jason Gunthorpe's "Remove SMMUv3
struct arm_smmu_cmdq_ent" series [1], specifically commit 13428b0bf794
("iommu/arm-smmu-v3: Directly encode TLBI commands"), which is the
final patch of that series in linux-next.

[1] https://lore.kernel.org/all/177919957385.1012282.14787407041669291032.b4-ty@kernel.org/

Changes since v1:
- Patch 1: Add IIDR/IDR/ACPI rationale to the commit message, explaining
  why the erratum is detected from the device tree compatible string
  rather than from a HW register or ACPI/IORT.
- Rebased onto the publicly accessible base 13428b0bf794 (final commit
  of Jason's series in linux-next) so that the base-commit is resolvable
  on lore.kernel.org.
- Patch 2: Picked up Reviewed-by: Jason Gunthorpe <jgg@nvidia•com>.
- No code changes since v1.

Ashish Mhetre (2):
  iommu/arm-smmu-v3: Detect Tegra264 erratum
  iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 70 +++++++++++++++++++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  8 +++
 2 files changed, 72 insertions(+), 6 deletions(-)

base-commit: 13428b0bf7947098daf9a1db14a74d33eb1b5079
-- 
2.50.1

* [PATCH v2 1/2] iommu/arm-smmu-v3: Detect Tegra264 erratum
From: Ashish Mhetre @ 2026-05-29 14:08 UTC
To: will, robin.murphy, joro, jgg, nicolinc
Cc: linux-arm-kernel, iommu, linux-kernel, linux-tegra, Ashish Mhetre

Tegra264 SMMU is affected by an erratum where a TLB entry can survive an
invalidation that races with concurrent traffic targeting the same
entry. The hardware-recommended software workaround is to issue every
CFGI/TLBI command (each followed by CMD_SYNC) twice. The second issue is
guaranteed to evict the entry. ATC_INV is not affected and must not be
doubled.

The erratum is not flagged by any SMMUv3 IDR/IIDR register, so it cannot
be detected from a hardware register. Tegra264 boots from device tree only
and has no ACPI/IORT support, so detection is through device tree only.

Add the ARM_SMMU_OPT_TLBI_TWICE option and set it on instances matching
the existing "nvidia,tegra264-smmu" compatible. No callers consume the
option yet, next patch wires the workaround into the CMDQ issue paths.

Signed-off-by: Ashish Mhetre <amhetre@nvidia•com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 4 +++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 8 ++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 9be589d14a3b..88296c0a5337 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -5229,8 +5229,10 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev,
 	if (of_dma_is_coherent(dev->of_node))
 		smmu->features |= ARM_SMMU_FEAT_COHERENCY;
 
-	if (of_device_is_compatible(dev->of_node, "nvidia,tegra264-smmu"))
+	if (of_device_is_compatible(dev->of_node, "nvidia,tegra264-smmu")) {
 		tegra_cmdqv_dt_probe(dev->of_node, smmu);
+		smmu->options |= ARM_SMMU_OPT_TLBI_TWICE;
+	}
 
 	return ret;
 }
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 16353596e08a..08d1abaf31ae 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -928,6 +928,14 @@ struct arm_smmu_device {
 #define ARM_SMMU_OPT_MSIPOLL		(1 << 2)
 #define ARM_SMMU_OPT_CMDQ_FORCE_SYNC	(1 << 3)
 #define ARM_SMMU_OPT_TEGRA241_CMDQV	(1 << 4)
+/*
+ * Tegra264 erratum: a TLB entry can survive an invalidation that races
+ * with concurrent traffic targeting the same entry. The software
+ * workaround is to issue every CFGI/TLBI command twice, each followed
+ * by CMD_SYNC. The second issue is guaranteed to evict the entry.
+ * ATC_INV commands are not affected and must not be doubled.
+ */
+#define ARM_SMMU_OPT_TLBI_TWICE		(1 << 5)
 	u32				options;
 
 	struct arm_smmu_cmdq		cmdq;
-- 
2.50.1

* Re: [PATCH v2 1/2] iommu/arm-smmu-v3: Detect Tegra264 erratum
From: Nicolin Chen @ 2026-05-29 21:30 UTC
To: Ashish Mhetre
Cc: will, robin.murphy, joro, jgg, linux-arm-kernel, iommu, linux-kernel, linux-tegra

On Fri, May 29, 2026 at 02:08:29PM +0000, Ashish Mhetre wrote:
> Tegra264 SMMU is affected by an erratum where a TLB entry can survive an
> invalidation that races with concurrent traffic targeting the same
> entry. The hardware-recommended software workaround is to issue every
> CFGI/TLBI command (each followed by CMD_SYNC) twice. The second issue is
> guaranteed to evict the entry. ATC_INV is not affected and must not be
> doubled.
>
> The erratum is not flagged by any SMMUv3 IDR/IIDR register, so it cannot
> be detected from a hardware register. Tegra264 boots from device tree only
> and has no ACPI/IORT support, so detection is through device tree only.
>
> Add the ARM_SMMU_OPT_TLBI_TWICE option and set it on instances matching
> the existing "nvidia,tegra264-smmu" compatible. No callers consume the
> option yet, next patch wires the workaround into the CMDQ issue paths.

I was told to avoid "patch": once a patch is applied it becomes
a commit. So, maybe "a subsequent change will wire".

>
> Signed-off-by: Ashish Mhetre <amhetre@nvidia•com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia•com>

* Re: [PATCH v2 1/2] iommu/arm-smmu-v3: Detect Tegra264 erratum
From: Ashish Mhetre @ 2026-06-01 9:04 UTC
To: Nicolin Chen
Cc: will, robin.murphy, joro, jgg, linux-arm-kernel, iommu, linux-kernel, linux-tegra

On 5/30/2026 3:00 AM, Nicolin Chen wrote:
> On Fri, May 29, 2026 at 02:08:29PM +0000, Ashish Mhetre wrote:
>> Tegra264 SMMU is affected by an erratum where a TLB entry can survive an
>> invalidation that races with concurrent traffic targeting the same
>> entry. The hardware-recommended software workaround is to issue every
>> CFGI/TLBI command (each followed by CMD_SYNC) twice. The second issue is
>> guaranteed to evict the entry. ATC_INV is not affected and must not be
>> doubled.
>>
>> The erratum is not flagged by any SMMUv3 IDR/IIDR register, so it cannot
>> be detected from a hardware register. Tegra264 boots from device tree only
>> and has no ACPI/IORT support, so detection is through device tree only.
>>
>> Add the ARM_SMMU_OPT_TLBI_TWICE option and set it on instances matching
>> the existing "nvidia,tegra264-smmu" compatible. No callers consume the
>> option yet, next patch wires the workaround into the CMDQ issue paths.
> I was told to avoid "patch": once a patch is applied it becomes
> a commit. So, maybe "a subsequent change will wire".

Sure, I'll update it in the next version.

>> Signed-off-by: Ashish Mhetre <amhetre@nvidia•com>
> Reviewed-by: Nicolin Chen <nicolinc@nvidia•com>

Thanks Nicolin.

* [PATCH v2 2/2] iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264
From: Ashish Mhetre @ 2026-05-29 14:08 UTC
To: will, robin.murphy, joro, jgg, nicolinc
Cc: linux-arm-kernel, iommu, linux-kernel, linux-tegra, Ashish Mhetre, Jason Gunthorpe

Apply the workaround for the Tegra264 erratum by issuing every CFGI/TLBI
command twice on affected SMMU instances, with CMD_SYNC after each. The
erratum requires this exact sequencing:

  TLBI/CFGI ... CMD_SYNC
  TLBI/CFGI ... CMD_SYNC

To get this sequence with minimal surgery, hook the workaround into
arm_smmu_cmdq_issue_cmdlist(). Rename the original function to
__arm_smmu_cmdq_issue_cmdlist() and add a thin wrapper that, on affected
SMMUs and when @sync is true, re-issues the same cmdlist a second time.
A new arm_smmu_cmd_needs_tlbi_twice() helper classifies which opcodes
need the doubling: CFGI_* and TLBI_*.

For batches that exceed CMDQ_BATCH_ENTRIES commands,
arm_smmu_cmdq_batch_add_cmd_p() normally flushes the full buffer with
sync=false, deferring the SYNC to the eventual batch_submit(). On
affected SMMUs this would leave the first chunk's commands issued only
once, since the workaround hook in arm_smmu_cmdq_issue_cmdlist() only
fires on synced submissions. Force a SYNC on the capacity rollover when
the buffer carries CFGI/TLBI commands so every flushed chunk is
correctly doubled.

Signed-off-by: Ashish Mhetre <amhetre@nvidia•com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia•com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 66 +++++++++++++++++++--
 1 file changed, 61 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 88296c0a5337..38d45f175a2c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -698,10 +698,10 @@ static void arm_smmu_cmdq_write_entries(struct arm_smmu_cmdq *cmdq,
  * insert their own list of commands then all of the commands from one
  * CPU will appear before any of the commands from the other CPU.
  */
-int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
-				struct arm_smmu_cmdq *cmdq,
-				struct arm_smmu_cmd *cmds, int n,
-				bool sync)
+static int __arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
+					 struct arm_smmu_cmdq *cmdq,
+					 struct arm_smmu_cmd *cmds, int n,
+					 bool sync)
 {
 	struct arm_smmu_cmd cmd_sync;
 	u32 prod;
@@ -820,6 +820,52 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 	return ret;
 }
 
+/*
+ * Returns true if @opcode is a CFGI_* or TLBI_* command, i.e. one of the
+ * invalidations covered by Tegra264 erratum (see ARM_SMMU_OPT_TLBI_TWICE).
+ */
+static bool arm_smmu_cmd_needs_tlbi_twice(u8 opcode)
+{
+	switch (opcode) {
+	case CMDQ_OP_CFGI_STE:
+	case CMDQ_OP_CFGI_ALL:
+	case CMDQ_OP_CFGI_CD:
+	case CMDQ_OP_CFGI_CD_ALL:
+	case CMDQ_OP_TLBI_NH_ALL:
+	case CMDQ_OP_TLBI_NH_ASID:
+	case CMDQ_OP_TLBI_NH_VA:
+	case CMDQ_OP_TLBI_NH_VAA:
+	case CMDQ_OP_TLBI_EL2_ALL:
+	case CMDQ_OP_TLBI_EL2_ASID:
+	case CMDQ_OP_TLBI_EL2_VA:
+	case CMDQ_OP_TLBI_S12_VMALL:
+	case CMDQ_OP_TLBI_S2_IPA:
+	case CMDQ_OP_TLBI_NSNH_ALL:
+		return true;
+	default:
+		return false;
+	}
+}
+
+int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
+				struct arm_smmu_cmdq *cmdq,
+				struct arm_smmu_cmd *cmds, int n,
+				bool sync)
+{
+	int ret = __arm_smmu_cmdq_issue_cmdlist(smmu, cmdq, cmds, n, sync);
+
+	/*
+	 * The driver's batch invariants keep a single submission's
+	 * opcode class uniform, so checking the first command is enough.
+	 */
+	if (!ret && sync && (smmu->options & ARM_SMMU_OPT_TLBI_TWICE) &&
+	    arm_smmu_cmd_needs_tlbi_twice(FIELD_GET(CMDQ_0_OP,
+						    cmds[0].data[0])))
+		ret = __arm_smmu_cmdq_issue_cmdlist(smmu, cmdq, cmds, n, sync);
+
+	return ret;
+}
+
 static int arm_smmu_cmdq_issue_cmd_p(struct arm_smmu_device *smmu,
 				     struct arm_smmu_cmd *cmd, bool sync)
 {
@@ -863,8 +909,18 @@ static void arm_smmu_cmdq_batch_add_cmd_p(struct arm_smmu_device *smmu,
 	}
 
 	if (cmds->num == CMDQ_BATCH_ENTRIES) {
+		/*
+		 * Force a SYNC only when the batch carries commands that
+		 * have to be doubled (see ARM_SMMU_OPT_TLBI_TWICE).
+		 * The batch holds a uniform opcode class, so checking
+		 * the first command is sufficient.
+		 */
+		bool need_sync = (smmu->options & ARM_SMMU_OPT_TLBI_TWICE) &&
+				 arm_smmu_cmd_needs_tlbi_twice(FIELD_GET(CMDQ_0_OP,
+						cmds->cmds[0].data[0]));
+
 		arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmdq, cmds->cmds,
-					    cmds->num, false);
+					    cmds->num, need_sync);
 		arm_smmu_cmdq_batch_init_cmd(smmu, cmds, cmd);
 	}
 
-- 
2.50.1
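
The classifier only inspects the opcode byte. The sketch below is a stand-alone version of the helper. It assumes, as the driver's `CMDQ_0_OP` field does, that the opcode occupies bits [7:0] of the first 64-bit command word. The numeric opcode values are meant to match the `CMDQ_OP_*` definitions in `arm-smmu-v3.h`; treat them as assumptions and verify them against your tree. `cmd_opcode()` is a hypothetical user-space replacement for the kernel-only `FIELD_GET()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Opcode values assumed to match arm-smmu-v3.h; verify against your tree. */
#define CMDQ_OP_CFGI_STE	0x03
#define CMDQ_OP_CFGI_ALL	0x04
#define CMDQ_OP_CFGI_CD		0x05
#define CMDQ_OP_CFGI_CD_ALL	0x06
#define CMDQ_OP_TLBI_NH_ALL	0x10
#define CMDQ_OP_TLBI_NH_ASID	0x11
#define CMDQ_OP_TLBI_NH_VA	0x12
#define CMDQ_OP_TLBI_NH_VAA	0x13
#define CMDQ_OP_TLBI_EL2_ALL	0x20
#define CMDQ_OP_TLBI_EL2_ASID	0x21
#define CMDQ_OP_TLBI_EL2_VA	0x22
#define CMDQ_OP_TLBI_S12_VMALL	0x28
#define CMDQ_OP_TLBI_S2_IPA	0x2a
#define CMDQ_OP_TLBI_NSNH_ALL	0x30
#define CMDQ_OP_ATC_INV		0x40
#define CMDQ_OP_CMD_SYNC	0x46

/* Hypothetical stand-in for FIELD_GET(CMDQ_0_OP, data0): opcode in bits [7:0]. */
static uint8_t cmd_opcode(uint64_t data0)
{
	return (uint8_t)(data0 & 0xff);
}

/* True for the CFGI_* and TLBI_* opcodes the erratum workaround doubles. */
static bool cmd_needs_tlbi_twice(uint64_t data0)
{
	switch (cmd_opcode(data0)) {
	case CMDQ_OP_CFGI_STE:
	case CMDQ_OP_CFGI_ALL:
	case CMDQ_OP_CFGI_CD:
	case CMDQ_OP_CFGI_CD_ALL:
	case CMDQ_OP_TLBI_NH_ALL:
	case CMDQ_OP_TLBI_NH_ASID:
	case CMDQ_OP_TLBI_NH_VA:
	case CMDQ_OP_TLBI_NH_VAA:
	case CMDQ_OP_TLBI_EL2_ALL:
	case CMDQ_OP_TLBI_EL2_ASID:
	case CMDQ_OP_TLBI_EL2_VA:
	case CMDQ_OP_TLBI_S12_VMALL:
	case CMDQ_OP_TLBI_S2_IPA:
	case CMDQ_OP_TLBI_NSNH_ALL:
		return true;
	default:
		return false; /* ATC_INV, CMD_SYNC, etc. are issued once */
	}
}
```

Only the low byte is inspected, so other fields in the same word do not affect the result. For example, an ASID placed in the upper bits of a TLBI_NH_ASID command leaves the classification unchanged.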

* Re: [PATCH v2 2/2] iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264
From: Nicolin Chen @ 2026-05-29 21:10 UTC
To: Ashish Mhetre
Cc: will, robin.murphy, joro, jgg, linux-arm-kernel, iommu, linux-kernel, linux-tegra, Jason Gunthorpe

On Fri, May 29, 2026 at 02:08:30PM +0000, Ashish Mhetre wrote:
> +int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> +				struct arm_smmu_cmdq *cmdq,
> +				struct arm_smmu_cmd *cmds, int n,
> +				bool sync)
> +{
> +	int ret = __arm_smmu_cmdq_issue_cmdlist(smmu, cmdq, cmds, n, sync);
> +
> +	/*
> +	 * The driver's batch invariants keep a single submission's
> +	 * opcode class uniform, so checking the first command is enough.
> +	 */
> +	if (!ret && sync && (smmu->options & ARM_SMMU_OPT_TLBI_TWICE) &&
> +	    arm_smmu_cmd_needs_tlbi_twice(FIELD_GET(CMDQ_0_OP,
> +						    cmds[0].data[0])))
> +		ret = __arm_smmu_cmdq_issue_cmdlist(smmu, cmdq, cmds, n, sync);

https://sashiko.dev/#/patchset/20260529140830.629738-1-amhetre%40nvidia.com

Sashiko pointed out that the iommufd path might mix commands when
calling arm_smmu_cmdq_issue_cmdlist(), which is valid I think.

>  static int arm_smmu_cmdq_issue_cmd_p(struct arm_smmu_device *smmu,
>  				     struct arm_smmu_cmd *cmd, bool sync)
>  {
> @@ -863,8 +909,18 @@ static void arm_smmu_cmdq_batch_add_cmd_p(struct arm_smmu_device *smmu,
>  	}
>  
>  	if (cmds->num == CMDQ_BATCH_ENTRIES) {
> +		/*
> +		 * Force a SYNC only when the batch carries commands that
> +		 * have to be doubled (see ARM_SMMU_OPT_TLBI_TWICE).
> +		 * The batch holds a uniform opcode class, so checking
> +		 * the first command is sufficient.
> +		 */
> +		bool need_sync = (smmu->options & ARM_SMMU_OPT_TLBI_TWICE) &&
> +				 arm_smmu_cmd_needs_tlbi_twice(FIELD_GET(CMDQ_0_OP,
> +						cmds->cmds[0].data[0]));
> +

Also, given that this does "force a sync", I think it might be nicer
to go to the force_sync path. One of my ongoing series also needs to
add another force_sync condition, so I think it would be cleaner to
have a helper function.

Maybe try the following diff. That arm_smmu_cmdq_batch_force_sync()
might be added with a preparatory patch, but it's up to you.

--------------------------------------------------------------------
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index 1e9f7d2de3441..4c9ce974d31a8 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -350,6 +350,18 @@ static int arm_vsmmu_convert_user_cmd(struct arm_vsmmu *vsmmu,
 	return 0;
 }
 
+static bool arm_vsmmu_can_batch_cmd(struct arm_smmu_device *smmu,
+				    struct arm_vsmmu_invalidation_cmd *last,
+				    struct arm_vsmmu_invalidation_cmd *next)
+{
+	struct arm_smmu_cmd next_cmd = {
+		.data[0] = le64_to_cpu(next->ucmd.cmd[0]),
+	};
+
+	return arm_smmu_cmd_needs_tlbi_twice(smmu, &last->cmd) ==
+	       arm_smmu_cmd_needs_tlbi_twice(smmu, &next_cmd);
+}
+
 int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
 			       struct iommu_user_data_array *array)
 {
@@ -382,7 +394,8 @@ int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
 
 		/* FIXME work in blocks of CMDQ_BATCH_ENTRIES and copy each block? */
 		cur++;
-		if (cur != end && (cur - last) != CMDQ_BATCH_ENTRIES - 1)
+		if (cur != end && (cur - last) != CMDQ_BATCH_ENTRIES - 1 &&
+		    arm_vsmmu_can_batch_cmd(smmu, last, cur))
 			continue;
 
 		/* FIXME always uses the main cmdq rather than trying to group by type */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index a63155e9e7f28..9b150e3145034 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -820,33 +820,6 @@ static int __arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 	return ret;
 }
 
-/*
- * Returns true if @opcode is a CFGI_* or TLBI_* command, i.e. one of the
- * invalidations covered by Tegra264 erratum (see ARM_SMMU_OPT_TLBI_TWICE).
- */
-static bool arm_smmu_cmd_needs_tlbi_twice(u8 opcode)
-{
-	switch (opcode) {
-	case CMDQ_OP_CFGI_STE:
-	case CMDQ_OP_CFGI_ALL:
-	case CMDQ_OP_CFGI_CD:
-	case CMDQ_OP_CFGI_CD_ALL:
-	case CMDQ_OP_TLBI_NH_ALL:
-	case CMDQ_OP_TLBI_NH_ASID:
-	case CMDQ_OP_TLBI_NH_VA:
-	case CMDQ_OP_TLBI_NH_VAA:
-	case CMDQ_OP_TLBI_EL2_ALL:
-	case CMDQ_OP_TLBI_EL2_ASID:
-	case CMDQ_OP_TLBI_EL2_VA:
-	case CMDQ_OP_TLBI_S12_VMALL:
-	case CMDQ_OP_TLBI_S2_IPA:
-	case CMDQ_OP_TLBI_NSNH_ALL:
-		return true;
-	default:
-		return false;
-	}
-}
-
 int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 				struct arm_smmu_cmdq *cmdq,
 				struct arm_smmu_cmd *cmds, int n,
@@ -858,9 +831,7 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 	 * The driver's batch invariants keep a single submission's
 	 * opcode class uniform, so checking the first command is enough.
 	 */
-	if (!ret && sync && (smmu->options & ARM_SMMU_OPT_TLBI_TWICE) &&
-	    arm_smmu_cmd_needs_tlbi_twice(FIELD_GET(CMDQ_0_OP,
-						    cmds[0].data[0])))
+	if (!ret && sync && arm_smmu_cmd_needs_tlbi_twice(smmu, &cmds[0]))
 		ret = __arm_smmu_cmdq_issue_cmdlist(smmu, cmdq, cmds, n, sync);
 
 	return ret;
@@ -893,34 +864,48 @@ static void arm_smmu_cmdq_batch_init_cmd(struct arm_smmu_device *smmu,
 	cmds->cmdq = arm_smmu_get_cmdq(smmu, cmd);
 }
 
+static bool arm_smmu_cmdq_batch_force_sync(struct arm_smmu_device *smmu,
+					   struct arm_smmu_cmdq_batch *cmds,
+					   struct arm_smmu_cmd *cmd)
+{
+	if (!cmds->num)
+		return false;
+
+	/* The batch's pre-assigned cmdq doesn't support the new command */
+	if (!arm_smmu_cmdq_supports_cmd(cmds->cmdq, cmd))
+		return true;
+
+	/* Arm erratum 2812531 */
+	if (cmds->num == CMDQ_BATCH_ENTRIES - 1 &&
+	    (smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC))
+		return true;
+
+	/*
+	 * Tegra264 erratum (see ARM_SMMU_OPT_TLBI_TWICE). The batch holds a
+	 * uniform opcode class, so checking the first command is enough.
+	 */
+	if ((cmds->num == CMDQ_BATCH_ENTRIES) &&
+	    arm_smmu_cmd_needs_tlbi_twice(smmu, &cmds->cmds[0]))
+		return true;
+
+	return false;
+}
+
 static void arm_smmu_cmdq_batch_add_cmd_p(struct arm_smmu_device *smmu,
 					  struct arm_smmu_cmdq_batch *cmds,
 					  struct arm_smmu_cmd *cmd)
 {
-	bool force_sync = (cmds->num == CMDQ_BATCH_ENTRIES - 1) &&
-			  (smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC);
-	bool unsupported_cmd;
+	bool force_sync = arm_smmu_cmdq_batch_force_sync(smmu, cmds, cmd);
 
-	unsupported_cmd = !arm_smmu_cmdq_supports_cmd(cmds->cmdq, cmd);
-	if (force_sync || unsupported_cmd) {
+	if (force_sync) {
 		arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmdq, cmds->cmds,
 					    cmds->num, true);
 		arm_smmu_cmdq_batch_init_cmd(smmu, cmds, cmd);
 	}
 
 	if (cmds->num == CMDQ_BATCH_ENTRIES) {
-		/*
-		 * Force a SYNC only when the batch carries commands that
-		 * have to be doubled (see ARM_SMMU_OPT_TLBI_TWICE).
-		 * The batch holds a uniform opcode class, so checking
-		 * the first command is sufficient.
-		 */
-		bool need_sync = (smmu->options & ARM_SMMU_OPT_TLBI_TWICE) &&
-				 arm_smmu_cmd_needs_tlbi_twice(FIELD_GET(CMDQ_0_OP,
-						cmds->cmds[0].data[0]));
-
 		arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmdq, cmds->cmds,
-					    cmds->num, need_sync);
+					    cmds->num, false);
 		arm_smmu_cmdq_batch_init_cmd(smmu, cmds, cmd);
 	}
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 08d1abaf31ae2..e6afc832c0078 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -1219,6 +1219,37 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 				struct arm_smmu_cmd *cmds, int n,
 				bool sync);
 
+/*
+ * Returns true if @cmd opcode is a CFGI_* or TLBI_* command, i.e. one of the
+ * invalidations covered by Tegra264 erratum (see ARM_SMMU_OPT_TLBI_TWICE).
+ */
+static inline bool arm_smmu_cmd_needs_tlbi_twice(struct arm_smmu_device *smmu,
+						 struct arm_smmu_cmd *cmd)
+{
+	if (!(smmu->options & ARM_SMMU_OPT_TLBI_TWICE))
+		return false;
+
+	switch (FIELD_GET(CMDQ_0_OP, cmd->data[0])) {
+	case CMDQ_OP_CFGI_STE:
+	case CMDQ_OP_CFGI_ALL:
+	case CMDQ_OP_CFGI_CD:
+	case CMDQ_OP_CFGI_CD_ALL:
+	case CMDQ_OP_TLBI_NH_ALL:
+	case CMDQ_OP_TLBI_NH_ASID:
+	case CMDQ_OP_TLBI_NH_VA:
+	case CMDQ_OP_TLBI_NH_VAA:
+	case CMDQ_OP_TLBI_EL2_ALL:
+	case CMDQ_OP_TLBI_EL2_ASID:
+	case CMDQ_OP_TLBI_EL2_VA:
+	case CMDQ_OP_TLBI_S12_VMALL:
+	case CMDQ_OP_TLBI_S2_IPA:
+	case CMDQ_OP_TLBI_NSNH_ALL:
+		return true;
+	default:
+		return false;
+	}
+}
+
 #ifdef CONFIG_ARM_SMMU_V3_SVA
 bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
 void arm_smmu_sva_notifier_synchronize(void);
--------------------------------------------------------------------

Nicolinc
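
Another toy model illustrates the mixed-list concern raised in this review. `issue_synced()` mimics the v2 wrapper's first-command check. The hypothetical `submit_split()` cuts a user-supplied list whenever the doubling class changes, in the spirit of the suggested `arm_vsmmu_can_batch_cmd()`. `MAX_RUN` stands in for the `CMDQ_BATCH_ENTRIES - 1` limit. None of these names exist in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_TLBI_NH_VA = 0x12, OP_ATC_INV = 0x40, OP_CMD_SYNC = 0x46 };

#define MAX_RUN   8 /* toy limit standing in for CMDQ_BATCH_ENTRIES - 1 */
#define TRACE_MAX 64

struct toy_q {
	uint8_t trace[TRACE_MAX];
	size_t len;
};

static bool needs_twice(uint8_t op)
{
	return op == OP_TLBI_NH_VA;
}

/* Append the commands followed by a CMD_SYNC. */
static void issue_raw(struct toy_q *q, const uint8_t *ops, size_t n)
{
	assert(q->len + n + 1 <= TRACE_MAX);
	for (size_t i = 0; i < n; i++)
		q->trace[q->len++] = ops[i];
	q->trace[q->len++] = OP_CMD_SYNC;
}

/* Synced submission with the v2 wrapper's first-command check. */
static void issue_synced(struct toy_q *q, const uint8_t *ops, size_t n)
{
	issue_raw(q, ops, n);
	if (n > 0 && needs_twice(ops[0]))
		issue_raw(q, ops, n);
}

/*
 * Hypothetical splitter: end the current run when the list ends, the run
 * reaches MAX_RUN, or the next command's doubling class differs from the
 * run's first command. Each submission is then class-uniform.
 */
static void submit_split(struct toy_q *q, const uint8_t *ops, size_t n)
{
	size_t start = 0;

	for (size_t cur = 1; cur <= n; cur++) {
		if (cur != n && cur - start < MAX_RUN &&
		    needs_twice(ops[cur]) == needs_twice(ops[start]))
			continue;
		issue_synced(q, ops + start, cur - start);
		start = cur;
	}
}

static int count_op(const struct toy_q *q, uint8_t op)
{
	int c = 0;

	for (size_t i = 0; i < q->len; i++)
		c += (q->trace[i] == op);
	return c;
}
```

If ATC_INV followed by a TLBI is submitted as one list, the TLBI is issued only once. The reverse order doubles the ATC_INV, which must not be doubled. Splitting the list into class-uniform runs avoids both problems.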

* Re: [PATCH v2 2/2] iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264
From: Ashish Mhetre @ 2026-06-01 9:20 UTC
To: Nicolin Chen
Cc: will, robin.murphy, joro, jgg, linux-arm-kernel, iommu, linux-kernel, linux-tegra, Jason Gunthorpe

On 5/30/2026 2:40 AM, Nicolin Chen wrote:
> On Fri, May 29, 2026 at 02:08:30PM +0000, Ashish Mhetre wrote:
>> +int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
>> +				struct arm_smmu_cmdq *cmdq,
>> +				struct arm_smmu_cmd *cmds, int n,
>> +				bool sync)
>> +{
>> +	int ret = __arm_smmu_cmdq_issue_cmdlist(smmu, cmdq, cmds, n, sync);
>> +
>> +	/*
>> +	 * The driver's batch invariants keep a single submission's
>> +	 * opcode class uniform, so checking the first command is enough.
>> +	 */
>> +	if (!ret && sync && (smmu->options & ARM_SMMU_OPT_TLBI_TWICE) &&
>> +	    arm_smmu_cmd_needs_tlbi_twice(FIELD_GET(CMDQ_0_OP,
>> +						    cmds[0].data[0])))
>> +		ret = __arm_smmu_cmdq_issue_cmdlist(smmu, cmdq, cmds, n, sync);
> https://sashiko.dev/#/patchset/20260529140830.629738-1-amhetre%40nvidia.com
> Sashiko pointed out that the iommufd path might mix commands when
> calling arm_smmu_cmdq_issue_cmdlist(), which is valid I think.

Okay, I'll update the batching for the iommufd path as you suggested in
the next version.

>>  static int arm_smmu_cmdq_issue_cmd_p(struct arm_smmu_device *smmu,
>>  				     struct arm_smmu_cmd *cmd, bool sync)
>>  {
>> @@ -863,8 +909,18 @@ static void arm_smmu_cmdq_batch_add_cmd_p(struct arm_smmu_device *smmu,
>>  	}
>>  
>>  	if (cmds->num == CMDQ_BATCH_ENTRIES) {
>> +		/*
>> +		 * Force a SYNC only when the batch carries commands that
>> +		 * have to be doubled (see ARM_SMMU_OPT_TLBI_TWICE).
>> +		 * The batch holds a uniform opcode class, so checking
>> +		 * the first command is sufficient.
>> +		 */
>> +		bool need_sync = (smmu->options & ARM_SMMU_OPT_TLBI_TWICE) &&
>> +				 arm_smmu_cmd_needs_tlbi_twice(FIELD_GET(CMDQ_0_OP,
>> +						cmds->cmds[0].data[0]));
>> +
> Also, given that this does "force a sync", I think it might be nicer
> to go to the force_sync path. One of my ongoing series also needs to
> add another force_sync condition, so I think it would be cleaner to
> have a helper function.
>
> Maybe try the following diff. That arm_smmu_cmdq_batch_force_sync()
> might be added with a preparatory patch, but it's up to you.
>
> [...]
>
> Nicolinc

Ack, I'll add your force_sync patch as a preparatory patch, rebase my
changes on top of it, and send it in v3.

Thanks,
Ashish Mhetre