public inbox for linux-arm-kernel@lists.infradead.org 
* [PATCH 0/3] iommu: Add PCI vendor:device ID to IOMMU fault logs
@ 2026-05-06 15:05 Yigit Oguz
  2026-05-06 15:05 ` [PATCH 1/3] iommu/arm-smmu-v3: Print PCI vendor:device ID in SMMU translation " Yigit Oguz
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Yigit Oguz @ 2026-05-06 15:05 UTC (permalink / raw)
  To: joro, will, robin.murphy, baolu.lu, dwmw2, suravee.suthikulpanit
  Cc: jgg, nicolinc, iommu, linux-arm-kernel, linux-kernel, Yigit Oguz

IOMMU fault and event logs currently identify devices using only their
PCI segment/bus/device/function (SSSS:BB:DD.F). While mapping a single
BDF to a device type is straightforward, doing so at scale across many
hosts and thousands of fault events requires additional tooling and
manual cross-referencing. Including the vendor:device ID directly in
the log line makes each event self-contained and immediately actionable
without any post-processing.

This series adds vendor:device ID (VVVV:DDDD) to IOMMU event logs for
ARM SMMUv3, Intel VT-d and AMD IOMMU.

Before:
  arm-smmu-v3 arm-smmu-v3.0.auto: event: F_TRANSLATION client: 0000:2b:11.6
    sid: 0x158e ssid: 0x0 iova: 0x280000000000 ipa: 0x0
  DMAR: [DMA Write NO_PASID] Request device [86:00.0] fault addr 0xe0000000
    [fault reason 0x05] PTE Write access is not set
  AMD-Vi: Event logged [IO_PAGE_FAULT device=0000:41:00.0 domain=0x000a
    address=0xe0000000 flags=0x0020]

After:
  arm-smmu-v3 arm-smmu-v3.0.auto: event: F_TRANSLATION client: 0000:2b:11.6 [8086:1533]
    sid: 0x158e ssid: 0x0 iova: 0x280000000000 ipa: 0x0
  DMAR: [DMA Write NO_PASID] Request device [0000:86:00.0 8086:1533] fault addr 0xe0000000
    [fault reason 0x05] PTE Write access is not set
  AMD-Vi: Event logged [IO_PAGE_FAULT device=0000:41:00.0 8086:1533 domain=0x000a
    address=0xe0000000 flags=0x0020]
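With the ID on the same line as the BDF, a consumer of these logs no longer needs lspci or sysfs on the faulting host to classify an event. As an illustration only, here is a hypothetical userspace helper. `parse_fault_ids()` is not part of this series. It extracts the BDF and vendor:device ID from the three "After" formats shown above. It assumes exactly those layouts and rejects the "Before" formats, which carry no ID.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical log-consumer helper (not part of this series): find the
 * first "SSSS:BB:DD.F" in a fault line and read the vendor:device ID that
 * follows it, either as " VVVV:DDDD" (VT-d, AMD-Vi) or " [VVVV:DDDD]"
 * (SMMUv3). Returns 0 on success, -1 if the line carries no ID.
 */
static int parse_fault_ids(const char *line, unsigned int *seg, unsigned int *bus,
			   unsigned int *dev, unsigned int *fn,
			   unsigned int *vendor, unsigned int *device)
{
	assert(line != NULL);

	for (const char *p = line; *p; p++) {
		int n = -1;

		/* %n records where the text after "SSSS:BB:DD.F " begins. */
		if (sscanf(p, "%4x:%2x:%2x.%1x %n", seg, bus, dev, fn, &n) != 4 || n < 0)
			continue;
		/* Plain form first, then the bracketed SMMUv3 form. */
		if (sscanf(p + n, "%4x:%4x", vendor, device) == 2 ||
		    sscanf(p + n, "[%4x:%4x]", vendor, device) == 2)
			return 0;
	}
	return -1;
}
```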

Patch 1 adds vendor:device ID to ARM SMMUv3 translation fault logs.
Patch 2 adds PCI segment and vendor:device ID to Intel VT-d DMAR
        fault logs.
Patch 3 adds an amd_iommu_devid_str() helper and vendor:device ID to all AMD IOMMU
        event log paths.

Testing:
Build-tested against mainline Linux (torvalds/master).

Runtime-tested on a custom downstream branch on ARM SMMUv3, Intel VT-d and
AMD IOMMU hosts. Translation faults were induced in a virtualized setup
by removing DMA mappings for an in-use region, causing the assigned device's
subsequent DMA transactions to hit unmapped IOVAs and produce
translation fault events. The resulting log lines were verified to
contain the PCI vendor:device ID on all three platforms.

Lilit Janpoladyan (1):
  iommu/arm-smmu-v3: Print PCI vendor:device ID in SMMU translation
    fault logs

Yigit Oguz (2):
  iommu/vt-d: Add PCI segment and vendor:device ID to DMAR fault logs
  iommu/amd: Add vendor:device ID to AMD IOMMU event logs

 drivers/iommu/amd/iommu.c                   | 94 +++++++++++++--------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 ++++++-
 drivers/iommu/intel/dmar.c                  | 33 +++++---
 3 files changed, 104 insertions(+), 52 deletions(-)

-- 
2.47.3




Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christof Hellmis, Andreas Stieger
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597




* [PATCH 1/3] iommu/arm-smmu-v3: Print PCI vendor:device ID in SMMU translation fault logs
  2026-05-06 15:05 [PATCH 0/3] iommu: Add PCI vendor:device ID to IOMMU fault logs Yigit Oguz
@ 2026-05-06 15:05 ` Yigit Oguz
  2026-05-07 17:01   ` Pranjal Shrivastava
  2026-05-06 15:05 ` [PATCH 2/3] iommu/vt-d: Add PCI segment and vendor:device ID to DMAR " Yigit Oguz
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Yigit Oguz @ 2026-05-06 15:05 UTC (permalink / raw)
  To: joro, will, robin.murphy, baolu.lu, dwmw2, suravee.suthikulpanit
  Cc: jgg, nicolinc, iommu, linux-arm-kernel, linux-kernel,
	Lilit Janpoladyan, Yigit Oguz

From: Lilit Janpoladyan <lilitj@amazon•com>

For translation, address-size, access, and permission faults, look up
the pci_dev from the event and append the PCI vendor:device ID after
the device name, e.g.:

  event: F_TRANSLATION client: 0001:02:02.4 [1d0f:8061] sid: ...

For non-PCI devices or unassigned SIDs the output is unchanged.

Signed-off-by: Lilit Janpoladyan <lilitj@amazon•com>
Signed-off-by: Yigit Oguz <yigitogu@amazon•de>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 ++++++++++++++++++---
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index e8d7dbe495f0..ab1afa36965a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2213,12 +2213,30 @@ static void arm_smmu_dump_raw_event(struct arm_smmu_device *smmu, u64 *raw,
 
 #define ARM_SMMU_EVT_KNOWN(e)	((e)->id < ARRAY_SIZE(event_str) && event_str[(e)->id])
 #define ARM_SMMU_LOG_EVT_STR(e) ARM_SMMU_EVT_KNOWN(e) ? event_str[(e)->id] : "UNKNOWN"
-#define ARM_SMMU_LOG_CLIENT(e)	(e)->dev ? dev_name((e)->dev) : "(unassigned sid)"
+
+/* "SSSS:BB:DD.F [VVVV:DDDD]\0" — 12 + 1 + 11 + 1 = 25; round to power of 2 */
+#define ARM_SMMU_CLIENT_LEN	32
+
+static const char *arm_smmu_fmt_client(struct arm_smmu_event *e, char *buf, size_t sz)
+{
+	struct pci_dev *p;
+
+	if (!e->dev)
+		return "(unassigned sid)";
+	if (!dev_is_pci(e->dev))
+		return dev_name(e->dev);
+
+	p = to_pci_dev(e->dev);
+	snprintf(buf, sz, "%s [%04x:%04x]", dev_name(e->dev), p->vendor, p->device);
+	return buf;
+}
 
 static void arm_smmu_dump_event(struct arm_smmu_device *smmu, u64 *raw,
 				struct arm_smmu_event *evt,
 				struct ratelimit_state *rs)
 {
+	char clientbuf[ARM_SMMU_CLIENT_LEN];
+
 	if (!__ratelimit(rs))
 		return;
 
@@ -2230,7 +2248,8 @@ static void arm_smmu_dump_event(struct arm_smmu_device *smmu, u64 *raw,
 	case EVT_ID_ACCESS_FAULT:
 	case EVT_ID_PERMISSION_FAULT:
 		dev_err(smmu->dev, "event: %s client: %s sid: %#x ssid: %#x iova: %#llx ipa: %#llx",
-			ARM_SMMU_LOG_EVT_STR(evt), ARM_SMMU_LOG_CLIENT(evt),
+			ARM_SMMU_LOG_EVT_STR(evt),
+			arm_smmu_fmt_client(evt, clientbuf, ARM_SMMU_CLIENT_LEN),
 			evt->sid, evt->ssid, evt->iova, evt->ipa);
 
 		dev_err(smmu->dev, "%s %s %s %s \"%s\"%s%s stag: %#x",
@@ -2247,14 +2266,16 @@ static void arm_smmu_dump_event(struct arm_smmu_device *smmu, u64 *raw,
 	case EVT_ID_CD_FETCH_FAULT:
 	case EVT_ID_VMS_FETCH_FAULT:
 		dev_err(smmu->dev, "event: %s client: %s sid: %#x ssid: %#x fetch_addr: %#llx",
-			ARM_SMMU_LOG_EVT_STR(evt), ARM_SMMU_LOG_CLIENT(evt),
+			ARM_SMMU_LOG_EVT_STR(evt),
+			arm_smmu_fmt_client(evt, clientbuf, ARM_SMMU_CLIENT_LEN),
 			evt->sid, evt->ssid, evt->fetch_addr);
 
 		break;
 
 	default:
 		dev_err(smmu->dev, "event: %s client: %s sid: %#x ssid: %#x",
-			ARM_SMMU_LOG_EVT_STR(evt), ARM_SMMU_LOG_CLIENT(evt),
+			ARM_SMMU_LOG_EVT_STR(evt),
+			arm_smmu_fmt_client(evt, clientbuf, ARM_SMMU_CLIENT_LEN),
 			evt->sid, evt->ssid);
 	}
 }
-- 
2.47.3
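The fallback order in arm_smmu_fmt_client() can be modelled outside the kernel. The following is a userspace approximation under stated assumptions. `struct mock_dev`, `fmt_client()` and `client_len_needed()` are hypothetical stand-ins for struct device/struct pci_dev, dev_name() and dev_is_pci(), not kernel APIs. The sketch also checks the sizing comment. A four-digit segment needs 25 bytes including the NUL, and a domain number wider than four hex digits still fits in 32. In any case snprintf() truncates rather than overflows. The patch calls the helper from all three dev_err() sites in arm_smmu_dump_event(). As a result, fetch faults and the default case gain the suffix too, not only the four fault types named in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the struct device / struct pci_dev fields used. */
struct mock_dev {
	const char *name;	/* what dev_name() would return */
	int is_pci;		/* what dev_is_pci() would return */
	unsigned short vendor;	/* pci_dev->vendor */
	unsigned short device;	/* pci_dev->device */
};

#define CLIENT_LEN 32	/* same value as ARM_SMMU_CLIENT_LEN in the patch */

/* Same three cases as arm_smmu_fmt_client(): no device, non-PCI, PCI. */
static const char *fmt_client(const struct mock_dev *d, char *buf, size_t sz)
{
	if (!d)
		return "(unassigned sid)";
	if (!d->is_pci)
		return d->name;

	assert(buf != NULL && sz > 0);
	snprintf(buf, sz, "%s [%04x:%04x]", d->name,
		 (unsigned int)d->vendor, (unsigned int)d->device);
	return buf;
}

/* Bytes, including the NUL, that a PCI client string for @name needs. */
static size_t client_len_needed(const char *name)
{
	return strlen(name) + strlen(" [ffff:ffff]") + 1;
}
```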






* [PATCH 2/3] iommu/vt-d: Add PCI segment and vendor:device ID to DMAR fault logs
  2026-05-06 15:05 [PATCH 0/3] iommu: Add PCI vendor:device ID to IOMMU fault logs Yigit Oguz
  2026-05-06 15:05 ` [PATCH 1/3] iommu/arm-smmu-v3: Print PCI vendor:device ID in SMMU translation " Yigit Oguz
@ 2026-05-06 15:05 ` Yigit Oguz
  2026-05-07 19:21   ` Pranjal Shrivastava
  2026-05-06 15:05 ` [PATCH 3/3] iommu/amd: Add vendor:device ID to AMD IOMMU event logs Yigit Oguz
  2026-05-08 10:45 ` [PATCH 0/3] iommu: Add PCI vendor:device ID to IOMMU fault logs Robin Murphy
  3 siblings, 1 reply; 11+ messages in thread
From: Yigit Oguz @ 2026-05-06 15:05 UTC (permalink / raw)
  To: joro, will, robin.murphy, baolu.lu, dwmw2, suravee.suthikulpanit
  Cc: jgg, nicolinc, iommu, linux-arm-kernel, linux-kernel, Yigit Oguz,
	Lilit Janpoladyan

Include the full SSSS:BB:DD.F address with PCI segment and
vendor:device ID (VVVV:DDDD) in DMAR fault messages. Uses
iommu->segment for the PCI domain and pci_get_domain_bus_and_slot
to look up the pci_dev. Falls back to segment:BDF without
vendor:device if the device is not found.

This brings Intel IOMMU fault logging in line with the ARM SMMUv3
event decoding, making it easier to identify faulting devices
(e.g. after FLR) without cross-referencing lspci.

Before:
  DMAR: [DMA Write NO_PASID] Request device [86:00.0] fault addr 0xe0000000
	[fault reason 0x05] PTE Write access is not set

After:
  DMAR: [DMA Write NO_PASID] Request device [0000:86:00.0 8086:1533] fault addr 0xe0000000
  	[fault reason 0x05] PTE Write access is not set

Signed-off-by: Yigit Oguz <yigitogu@amazon•de>
Signed-off-by: Lilit Janpoladyan <lilitj@amazon•com>
Assisted-by: Claude:claude-4.6-opus
---
 drivers/iommu/intel/dmar.c | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index d33c119a935e..225fa498d714 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1890,30 +1890,39 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
 {
 	const char *reason;
 	int fault_type;
+	u8 bus = source_id >> 8;
+	u8 devfn = source_id & 0xFF;
+	struct pci_dev *pdev;
+	char devid[48];
 
 	reason = dmar_get_fault_reason(fault_reason, &fault_type);
 
+	pdev = pci_get_domain_bus_and_slot(iommu->segment, bus, devfn);
+	if (pdev) {
+		snprintf(devid, sizeof(devid), "%04x:%02x:%02x.%d %04x:%04x",
+			 iommu->segment, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+			 pdev->vendor, pdev->device);
+		pci_dev_put(pdev);
+	} else {
+		snprintf(devid, sizeof(devid), "%04x:%02x:%02x.%d",
+			 iommu->segment, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
+	}
+
 	if (fault_type == INTR_REMAP) {
-		pr_err("[INTR-REMAP] Request device [%02x:%02x.%d] fault index 0x%llx [fault reason 0x%02x] %s\n",
-		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
-		       PCI_FUNC(source_id & 0xFF), addr >> 48,
-		       fault_reason, reason);
+		pr_err("[INTR-REMAP] Request device [%s] fault index 0x%llx [fault reason 0x%02x] %s\n",
+		       devid, addr >> 48, fault_reason, reason);
 
 		return 0;
 	}
 
 	if (pasid == IOMMU_PASID_INVALID)
-		pr_err("[%s NO_PASID] Request device [%02x:%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
+		pr_err("[%s NO_PASID] Request device [%s] fault addr 0x%llx [fault reason 0x%02x] %s\n",
 		       type ? "DMA Read" : "DMA Write",
-		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
-		       PCI_FUNC(source_id & 0xFF), addr,
-		       fault_reason, reason);
+		       devid, addr, fault_reason, reason);
 	else
-		pr_err("[%s PASID 0x%x] Request device [%02x:%02x.%d] fault addr 0x%llx [fault reason 0x%02x] %s\n",
+		pr_err("[%s PASID 0x%x] Request device [%s] fault addr 0x%llx [fault reason 0x%02x] %s\n",
 		       type ? "DMA Read" : "DMA Write", pasid,
-		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
-		       PCI_FUNC(source_id & 0xFF), addr,
-		       fault_reason, reason);
+		       devid, addr, fault_reason, reason);
 
 	dmar_fault_dump_ptes(iommu, source_id, addr, pasid);
 
-- 
2.47.3
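The source_id decoding in dmar_fault_do_one() can be sketched in userspace as follows. As in the patch, the bus is source_id >> 8 and devfn is source_id & 0xFF. PCI_SLOT() and PCI_FUNC() are redefined locally with the kernel's bit layout. The found/vendor/device parameters are a hypothetical stand-in for the result of pci_get_domain_bus_and_slot(). DMAR_DEVID_LEN is a made-up name for the bare 48 the patch uses. Assuming a 16-bit segment, the longest string is 22 characters plus the NUL. The sketch says nothing about which context the real lookup may run in; that question is raised in the review further down.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same bit layout as the kernel's PCI_SLOT()/PCI_FUNC() macros. */
#define PCI_SLOT(devfn)	(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)	((devfn) & 0x07)

/* Hypothetical name for the buffer size (the patch uses a bare 48). */
#define DMAR_DEVID_LEN	48

/*
 * Decode a DMAR fault source_id (bus in bits 15:8, devfn in bits 7:0) into
 * "SSSS:BB:DD.F", appending " VVVV:DDDD" when the device lookup succeeded.
 * found/vendor/device model the result of pci_get_domain_bus_and_slot().
 * Returns the snprintf() length.
 */
static int fmt_source_id(char *buf, size_t sz, unsigned int segment,
			 uint16_t source_id, int found,
			 unsigned int vendor, unsigned int device)
{
	unsigned int bus = source_id >> 8;
	unsigned int devfn = source_id & 0xff;

	if (found)
		return snprintf(buf, sz, "%04x:%02x:%02x.%u %04x:%04x", segment, bus,
				PCI_SLOT(devfn), PCI_FUNC(devfn), vendor, device);
	return snprintf(buf, sz, "%04x:%02x:%02x.%u", segment, bus,
			PCI_SLOT(devfn), PCI_FUNC(devfn));
}
```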








* [PATCH 3/3] iommu/amd: Add vendor:device ID to AMD IOMMU event logs
  2026-05-06 15:05 [PATCH 0/3] iommu: Add PCI vendor:device ID to IOMMU fault logs Yigit Oguz
  2026-05-06 15:05 ` [PATCH 1/3] iommu/arm-smmu-v3: Print PCI vendor:device ID in SMMU translation " Yigit Oguz
  2026-05-06 15:05 ` [PATCH 2/3] iommu/vt-d: Add PCI segment and vendor:device ID to DMAR " Yigit Oguz
@ 2026-05-06 15:05 ` Yigit Oguz
  2026-05-07 19:52   ` Pranjal Shrivastava
  2026-05-08 10:45 ` [PATCH 0/3] iommu: Add PCI vendor:device ID to IOMMU fault logs Robin Murphy
  3 siblings, 1 reply; 11+ messages in thread
From: Yigit Oguz @ 2026-05-06 15:05 UTC (permalink / raw)
  To: joro, will, robin.murphy, baolu.lu, dwmw2, suravee.suthikulpanit
  Cc: jgg, nicolinc, iommu, linux-arm-kernel, linux-kernel, Yigit Oguz,
	Lilit Janpoladyan

Add amd_iommu_devid_str() helper that formats PCI device identity as
SSSS:BB:DD.F VVVV:DDDD by looking up the pci_dev via
pci_get_domain_bus_and_slot. Falls back to SSSS:BB:DD.F when the
device is not found.

Before:
  AMD-Vi: Event logged [IO_PAGE_FAULT device=0000:41:00.0 domain=0x000a
  	address=0xe0000000 flags=0x0020]

After:
  AMD-Vi: Event logged [IO_PAGE_FAULT device=0000:41:00.0 8086:1533 domain=0x000a
  	address=0xe0000000 flags=0x0020]

Signed-off-by: Yigit Oguz <yigitogu@amazon•de>
Signed-off-by: Lilit Janpoladyan <lilitj@amazon•com>
Assisted-by: Claude:claude-4.6-opus
---
 drivers/iommu/amd/iommu.c | 94 ++++++++++++++++++++++++---------------
 1 file changed, 58 insertions(+), 36 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 01171361f9bc..441b4a7e85d5 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -779,11 +779,34 @@ static void dump_command(unsigned long phys_addr)
 		pr_err("CMD[%d]: %08x\n", i, cmd->data[i]);
 }
 
+#define AMD_IOMMU_DEVID_SIZE	48
+
+static void amd_iommu_devid_str(struct amd_iommu *iommu, u16 devid, char *buf,
+				size_t size)
+{
+	struct pci_dev *pdev;
+
+	pdev = pci_get_domain_bus_and_slot(iommu->pci_seg->id,
+					   PCI_BUS_NUM(devid), devid & 0xff);
+	if (pdev) {
+		snprintf(buf, size, "%04x:%02x:%02x.%x %04x:%04x",
+			 iommu->pci_seg->id, PCI_BUS_NUM(devid),
+			 PCI_SLOT(devid), PCI_FUNC(devid),
+			 pdev->vendor, pdev->device);
+		pci_dev_put(pdev);
+	} else {
+		snprintf(buf, size, "%04x:%02x:%02x.%x",
+			 iommu->pci_seg->id, PCI_BUS_NUM(devid),
+			 PCI_SLOT(devid), PCI_FUNC(devid));
+	}
+}
+
 static void amd_iommu_report_rmp_hw_error(struct amd_iommu *iommu, volatile u32 *event)
 {
 	struct iommu_dev_data *dev_data = NULL;
 	int devid, vmg_tag, flags;
 	struct pci_dev *pdev;
+	char devid_str[AMD_IOMMU_DEVID_SIZE];
 	u64 spa;
 
 	devid   = (event[0] >> EVENT_DEVID_SHIFT) & EVENT_DEVID_MASK;
@@ -796,15 +819,16 @@ static void amd_iommu_report_rmp_hw_error(struct amd_iommu *iommu, volatile u32
 	if (pdev)
 		dev_data = dev_iommu_priv_get(&pdev->dev);
 
+	amd_iommu_devid_str(iommu, devid, devid_str, sizeof(devid_str));
+
 	if (dev_data) {
 		if (__ratelimit(&dev_data->rs)) {
-			pci_err(pdev, "Event logged [RMP_HW_ERROR vmg_tag=0x%04x, spa=0x%llx, flags=0x%04x]\n",
-				vmg_tag, spa, flags);
+			pci_err(pdev, "Event logged [RMP_HW_ERROR device=%s vmg_tag=0x%04x, spa=0x%llx, flags=0x%04x]\n",
+				devid_str, vmg_tag, spa, flags);
 		}
 	} else {
-		pr_err_ratelimited("Event logged [RMP_HW_ERROR device=%04x:%02x:%02x.%x, vmg_tag=0x%04x, spa=0x%llx, flags=0x%04x]\n",
-			iommu->pci_seg->id, PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
-			vmg_tag, spa, flags);
+		pr_err_ratelimited("Event logged [RMP_HW_ERROR device=%s vmg_tag=0x%04x, spa=0x%llx, flags=0x%04x]\n",
+			devid_str, vmg_tag, spa, flags);
 	}
 
 	if (pdev)
@@ -816,6 +840,7 @@ static void amd_iommu_report_rmp_fault(struct amd_iommu *iommu, volatile u32 *ev
 	struct iommu_dev_data *dev_data = NULL;
 	int devid, flags_rmp, vmg_tag, flags;
 	struct pci_dev *pdev;
+	char devid_str[AMD_IOMMU_DEVID_SIZE];
 	u64 gpa;
 
 	devid     = (event[0] >> EVENT_DEVID_SHIFT) & EVENT_DEVID_MASK;
@@ -831,13 +856,12 @@ static void amd_iommu_report_rmp_fault(struct amd_iommu *iommu, volatile u32 *ev
 
 	if (dev_data) {
 		if (__ratelimit(&dev_data->rs)) {
-			pci_err(pdev, "Event logged [RMP_PAGE_FAULT vmg_tag=0x%04x, gpa=0x%llx, flags_rmp=0x%04x, flags=0x%04x]\n",
-				vmg_tag, gpa, flags_rmp, flags);
+			pci_err(pdev, "Event logged [RMP_PAGE_FAULT device=%s vmg_tag=0x%04x, gpa=0x%llx, flags_rmp=0x%04x, flags=0x%04x]\n",
+				devid_str, vmg_tag, gpa, flags_rmp, flags);
 		}
 	} else {
-		pr_err_ratelimited("Event logged [RMP_PAGE_FAULT device=%04x:%02x:%02x.%x, vmg_tag=0x%04x, gpa=0x%llx, flags_rmp=0x%04x, flags=0x%04x]\n",
-			iommu->pci_seg->id, PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
-			vmg_tag, gpa, flags_rmp, flags);
+		pr_err_ratelimited("Event logged [RMP_PAGE_FAULT device=%s vmg_tag=0x%04x, gpa=0x%llx, flags_rmp=0x%04x, flags=0x%04x]\n",
+			devid_str, vmg_tag, gpa, flags_rmp, flags);
 	}
 
 	if (pdev)
@@ -856,12 +880,15 @@ static void amd_iommu_report_page_fault(struct amd_iommu *iommu,
 {
 	struct iommu_dev_data *dev_data = NULL;
 	struct pci_dev *pdev;
+	char devid_str[AMD_IOMMU_DEVID_SIZE];
 
 	pdev = pci_get_domain_bus_and_slot(iommu->pci_seg->id, PCI_BUS_NUM(devid),
 					   devid & 0xff);
 	if (pdev)
 		dev_data = dev_iommu_priv_get(&pdev->dev);
 
+	amd_iommu_devid_str(iommu, devid, devid_str, sizeof(devid_str));
+
 	if (dev_data) {
 		/*
 		 * If this is a DMA fault (for which the I(nterrupt)
@@ -872,9 +899,8 @@ static void amd_iommu_report_page_fault(struct amd_iommu *iommu,
 			/* Device not attached to domain properly */
 			if (dev_data->domain == NULL) {
 				pr_err_ratelimited("Event logged [Device not attached to domain properly]\n");
-				pr_err_ratelimited("  device=%04x:%02x:%02x.%x domain=0x%04x\n",
-						   iommu->pci_seg->id, PCI_BUS_NUM(devid), PCI_SLOT(devid),
-						   PCI_FUNC(devid), domain_id);
+				pr_err_ratelimited("  device=%s domain=0x%04x\n",
+						   devid_str, domain_id);
 				goto out;
 			}
 
@@ -887,13 +913,12 @@ static void amd_iommu_report_page_fault(struct amd_iommu *iommu,
 		}
 
 		if (__ratelimit(&dev_data->rs)) {
-			pci_err(pdev, "Event logged [IO_PAGE_FAULT domain=0x%04x address=0x%llx flags=0x%04x]\n",
-				domain_id, address, flags);
+			pci_err(pdev, "Event logged [IO_PAGE_FAULT device=%s domain=0x%04x address=0x%llx flags=0x%04x]\n",
+				devid_str, domain_id, address, flags);
 		}
 	} else {
-		pr_err_ratelimited("Event logged [IO_PAGE_FAULT device=%04x:%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n",
-			iommu->pci_seg->id, PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
-			domain_id, address, flags);
+		pr_err_ratelimited("Event logged [IO_PAGE_FAULT device=%s domain=0x%04x address=0x%llx flags=0x%04x]\n",
+			devid_str, domain_id, address, flags);
 	}
 
 out:
@@ -909,6 +934,7 @@ static void iommu_print_event(struct amd_iommu *iommu, void *__evt)
 	int count = 0;
 	u64 address, ctrl;
 	u32 pasid;
+	char devid_str[AMD_IOMMU_DEVID_SIZE];
 
 retry:
 	type    = (event[1] >> EVENT_TYPE_SHIFT)  & EVENT_TYPE_MASK;
@@ -934,24 +960,22 @@ static void iommu_print_event(struct amd_iommu *iommu, void *__evt)
 		return;
 	}
 
+	amd_iommu_devid_str(iommu, devid, devid_str, sizeof(devid_str));
+
 	switch (type) {
 	case EVENT_TYPE_ILL_DEV:
-		dev_err(dev, "Event logged [ILLEGAL_DEV_TABLE_ENTRY device=%04x:%02x:%02x.%x pasid=0x%05x address=0x%llx flags=0x%04x]\n",
-			iommu->pci_seg->id, PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
-			pasid, address, flags);
+		dev_err(dev, "Event logged [ILLEGAL_DEV_TABLE_ENTRY device=%s pasid=0x%05x address=0x%llx flags=0x%04x]\n",
+			devid_str, pasid, address, flags);
 		dev_err(dev, "Control Reg : 0x%llx\n", ctrl);
 		dump_dte_entry(iommu, devid);
 		break;
 	case EVENT_TYPE_DEV_TAB_ERR:
-		dev_err(dev, "Event logged [DEV_TAB_HARDWARE_ERROR device=%04x:%02x:%02x.%x "
-			"address=0x%llx flags=0x%04x]\n",
-			iommu->pci_seg->id, PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
-			address, flags);
+		dev_err(dev, "Event logged [DEV_TAB_HARDWARE_ERROR device=%s address=0x%llx flags=0x%04x]\n",
+			devid_str, address, flags);
 		break;
 	case EVENT_TYPE_PAGE_TAB_ERR:
-		dev_err(dev, "Event logged [PAGE_TAB_HARDWARE_ERROR device=%04x:%02x:%02x.%x pasid=0x%04x address=0x%llx flags=0x%04x]\n",
-			iommu->pci_seg->id, PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
-			pasid, address, flags);
+		dev_err(dev, "Event logged [PAGE_TAB_HARDWARE_ERROR device=%s pasid=0x%04x address=0x%llx flags=0x%04x]\n",
+			devid_str, pasid, address, flags);
 		break;
 	case EVENT_TYPE_ILL_CMD:
 		dev_err(dev, "Event logged [ILLEGAL_COMMAND_ERROR address=0x%llx]\n", address);
@@ -962,14 +986,12 @@ static void iommu_print_event(struct amd_iommu *iommu, void *__evt)
 			address, flags);
 		break;
 	case EVENT_TYPE_IOTLB_INV_TO:
-		dev_err(dev, "Event logged [IOTLB_INV_TIMEOUT device=%04x:%02x:%02x.%x address=0x%llx]\n",
-			iommu->pci_seg->id, PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
-			address);
+		dev_err(dev, "Event logged [IOTLB_INV_TIMEOUT device=%s address=0x%llx]\n",
+			devid_str, address);
 		break;
 	case EVENT_TYPE_INV_DEV_REQ:
-		dev_err(dev, "Event logged [INVALID_DEVICE_REQUEST device=%04x:%02x:%02x.%x pasid=0x%05x address=0x%llx flags=0x%04x]\n",
-			iommu->pci_seg->id, PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
-			pasid, address, flags);
+		dev_err(dev, "Event logged [INVALID_DEVICE_REQUEST device=%s pasid=0x%05x address=0x%llx flags=0x%04x]\n",
+			devid_str, pasid, address, flags);
 		break;
 	case EVENT_TYPE_RMP_FAULT:
 		amd_iommu_report_rmp_fault(iommu, event);
@@ -980,8 +1002,8 @@ static void iommu_print_event(struct amd_iommu *iommu, void *__evt)
 	case EVENT_TYPE_INV_PPR_REQ:
 		pasid = PPR_PASID(*((u64 *)__evt));
 		tag = event[1] & 0x03FF;
-		dev_err(dev, "Event logged [INVALID_PPR_REQUEST device=%04x:%02x:%02x.%x pasid=0x%05x address=0x%llx flags=0x%04x tag=0x%03x]\n",
-			iommu->pci_seg->id, PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
+		dev_err(dev, "Event logged [INVALID_PPR_REQUEST device=%s pasid=0x%05x address=0x%llx flags=0x%04x tag=0x%03x]\n",
+			devid_str,
 			pasid, address, flags, tag);
 		break;
 	default:
-- 
2.47.3
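One detail of amd_iommu_devid_str() is easy to misread. The lookup passes devid & 0xff as devfn, but the formatting calls PCI_SLOT(devid) and PCI_FUNC(devid) on the full 16-bit devid, as the code it replaces did. The masks inside those macros discard the bus byte, so both decodings agree. The following userspace check uses local copies of the kernel's PCI_BUS_NUM(), PCI_SLOT() and PCI_FUNC() definitions and confirms this for all 65536 devid values. `devid_decodings_agree()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Same definitions as the kernel's PCI_BUS_NUM(), PCI_SLOT() and PCI_FUNC(). */
#define PCI_BUS_NUM(x)	(((x) >> 8) & 0xff)
#define PCI_SLOT(devfn)	(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)	((devfn) & 0x07)

/*
 * Returns 1 if slot/function decoded from the full 16-bit devid match
 * those decoded from its low (devfn) byte, for every possible devid.
 */
static int devid_decodings_agree(void)
{
	for (uint32_t devid = 0; devid <= 0xffff; devid++) {
		uint32_t devfn = devid & 0xff;

		if (PCI_SLOT(devid) != PCI_SLOT(devfn) ||
		    PCI_FUNC(devid) != PCI_FUNC(devfn))
			return 0;
	}
	return 1;
}
```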








* Re: [PATCH 1/3] iommu/arm-smmu-v3: Print PCI vendor:device ID in SMMU translation fault logs
  2026-05-06 15:05 ` [PATCH 1/3] iommu/arm-smmu-v3: Print PCI vendor:device ID in SMMU translation " Yigit Oguz
@ 2026-05-07 17:01   ` Pranjal Shrivastava
  0 siblings, 0 replies; 11+ messages in thread
From: Pranjal Shrivastava @ 2026-05-07 17:01 UTC (permalink / raw)
  To: Yigit Oguz
  Cc: joro, will, robin.murphy, baolu.lu, dwmw2, suravee.suthikulpanit,
	jgg, nicolinc, iommu, linux-arm-kernel, linux-kernel,
	Lilit Janpoladyan

On Wed, May 06, 2026 at 03:05:37PM +0000, Yigit Oguz wrote:
> From: Lilit Janpoladyan <lilitj@amazon•com>
> 
> For translation, address-size, access, and permission faults, look up
> the pci_dev from the event and append the PCI vendor:device ID after
> the device name, e.g.:
> 
>   event: F_TRANSLATION client: 0001:02:02.4 [1d0f:8061] sid: ...
> 
> For non-PCI devices or unassigned SIDs the output is unchanged.
> 
> Signed-off-by: Lilit Janpoladyan <lilitj@amazon•com>
> Signed-off-by: Yigit Oguz <yigitogu@amazon•de>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 ++++++++++++++++++---
>  1 file changed, 25 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index e8d7dbe495f0..ab1afa36965a 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2213,12 +2213,30 @@ static void arm_smmu_dump_raw_event(struct arm_smmu_device *smmu, u64 *raw,
>  
>  #define ARM_SMMU_EVT_KNOWN(e)	((e)->id < ARRAY_SIZE(event_str) && event_str[(e)->id])
>  #define ARM_SMMU_LOG_EVT_STR(e) ARM_SMMU_EVT_KNOWN(e) ? event_str[(e)->id] : "UNKNOWN"
> -#define ARM_SMMU_LOG_CLIENT(e)	(e)->dev ? dev_name((e)->dev) : "(unassigned sid)"
> +
> +/* "SSSS:BB:DD.F [VVVV:DDDD]\0" — 12 + 1 + 11 + 1 = 25; round to power of 2 */
> +#define ARM_SMMU_CLIENT_LEN	32
> +

Nit: s/ARM_SMMU_CLIENT_LEN/ARM_SMMU_LOG_CLIENT_LEN to maintain the
convention?

> +static const char *arm_smmu_fmt_client(struct arm_smmu_event *e, char *buf, size_t sz)
> +{
> +	struct pci_dev *p;

Minor nit: maybe we could initialize it here?

struct pci_dev *p = to_pci_dev(e->dev);

Not a strong opinion though.

> +
> +	if (!e->dev)
> +		return "(unassigned sid)";
> +	if (!dev_is_pci(e->dev))
> +		return dev_name(e->dev);
> +
> +	p = to_pci_dev(e->dev);
> +	snprintf(buf, sz, "%s [%04x:%04x]", dev_name(e->dev), p->vendor, p->device);
> +	return buf;
> +}
>  
>  static void arm_smmu_dump_event(struct arm_smmu_device *smmu, u64 *raw,
>  				struct arm_smmu_event *evt,
>  				struct ratelimit_state *rs)
>  {
> +	char clientbuf[ARM_SMMU_CLIENT_LEN];

Nit: s/clientbuf/client_str ?

I was able to test this with 7.1-rc1 & it looks good:

   [  106.880820] arm-smmu-v3 9050000.smmuv3: event: F_TRANSLATION client: 0000:00:01.0 [8086:10c9] sid: 0x8 ssid: 0x0 iova: 0xffffc000 ipa: 0x0
   [  106.880855] arm-smmu-v3 9050000.smmuv3: unpriv data read s1 "Input address caused fault" stag: 0x0
   [  106.880894] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
   [  106.880922] arm-smmu-v3 9050000.smmuv3:     0x0000000800000010
   [  106.880948] arm-smmu-v3 9050000.smmuv3:     0x0000020800000000
   [  106.880974] arm-smmu-v3 9050000.smmuv3:     0x00000000ffffc004
   [  106.881001] arm-smmu-v3 9050000.smmuv3:     0x0000000000000000
   [  106.881030] arm-smmu-v3 9050000.smmuv3: event: F_TRANSLATION client: 0000:00:01.0 [8086:10c9] sid: 0x8 ssid: 0x0 iova: 0xffffc004 ipa: 0x0
   [  106.881061] arm-smmu-v3 9050000.smmuv3: unpriv data read s1 "Input address caused fault" stag: 0x0
   [  106.881104] arm-smmu-v3 9050000.smmuv3: event 0x10 received:
   [  106.881136] arm-smmu-v3 9050000.smmuv3:     0x0000000800000010
   [  106.881163] arm-smmu-v3 9050000.smmuv3:     0x0000020800000000
   [  106.881189] arm-smmu-v3 9050000.smmuv3:     0x00000000ffffc008
   [  106.881215] arm-smmu-v3 9050000.smmuv3:     0x0000000000000000

Apart from the nits above:

Reviewed-by: Pranjal Shrivastava <praan@google•com>
Tested-by: Pranjal Shrivastava <praan@google•com>

Thanks,
Praan



* Re: [PATCH 2/3] iommu/vt-d: Add PCI segment and vendor:device ID to DMAR fault logs
  2026-05-06 15:05 ` [PATCH 2/3] iommu/vt-d: Add PCI segment and vendor:device ID to DMAR " Yigit Oguz
@ 2026-05-07 19:21   ` Pranjal Shrivastava
  2026-05-22 15:45     ` Oguz, Yigit
  0 siblings, 1 reply; 11+ messages in thread
From: Pranjal Shrivastava @ 2026-05-07 19:21 UTC (permalink / raw)
  To: Yigit Oguz
  Cc: joro, will, robin.murphy, baolu.lu, dwmw2, suravee.suthikulpanit,
	jgg, nicolinc, iommu, linux-arm-kernel, linux-kernel,
	Lilit Janpoladyan

On Wed, May 06, 2026 at 03:05:38PM +0000, Yigit Oguz wrote:
> Include the full SSSS:BB:DD.F address with PCI segment and
> vendor:device ID (VVVV:DDDD) in DMAR fault messages. Uses
> iommu->segment for the PCI domain and pci_get_domain_bus_and_slot
> to look up the pci_dev. Falls back to segment:BDF without
> vendor:device if the device is not found.
> 
> This brings Intel IOMMU fault logging in line with the ARM SMMUv3
> event decoding, making it easier to identify faulting devices
> (e.g. after FLR) without cross-referencing lspci.
> 
> Before:
>   DMAR: [DMA Write NO_PASID] Request device [86:00.0] fault addr 0xe0000000
> 	[fault reason 0x05] PTE Write access is not set
> 
> After:
>   DMAR: [DMA Write NO_PASID] Request device [0000:86:00.0 8086:1533] fault addr 0xe0000000
>   	[fault reason 0x05] PTE Write access is not set
> 
> Signed-off-by: Yigit Oguz <yigitogu@amazon•de>
> Signed-off-by: Lilit Janpoladyan <lilitj@amazon•com>
> Assisted-by: Claude:claude-4.6-opus
> ---
>  drivers/iommu/intel/dmar.c | 33 +++++++++++++++++++++------------
>  1 file changed, 21 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index d33c119a935e..225fa498d714 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -1890,30 +1890,39 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
>  {
>  	const char *reason;
>  	int fault_type;
> +	u8 bus = source_id >> 8;
> +	u8 devfn = source_id & 0xFF;
> +	struct pci_dev *pdev;
> +	char devid[48];

Why not have a #define for this like you have for AMD and Arm?

>  
>  	reason = dmar_get_fault_reason(fault_reason, &fault_type);
>  
> +	pdev = pci_get_domain_bus_and_slot(iommu->segment, bus, devfn);

Not an Intel iommu expert, but I have concerns about using 
pci_get_domain_bus_and_slot() in this path.

AFAICT, dmar_fault_do_one() is running in an IRQ context & the pci_get_*
family of functions iterates the global PCI klist. It eventually calls
bus_to_subsys(), which takes a plain spin_lock(&bus_kset->list_lock) [1]
which isn't IRQ-safe. Same thing with klist_put [2] called in klist_iter_exit.

> +	if (pdev) {
> +		snprintf(devid, sizeof(devid), "%04x:%02x:%02x.%d %04x:%04x",
> +			 iommu->segment, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> +			 pdev->vendor, pdev->device);
> +		pci_dev_put(pdev);

Same here, pci_dev_put calls put_device, which might sleep [3] and hence
shouldn't be called in hard IRQ context.

> +	} else {
> +		snprintf(devid, sizeof(devid), "%04x:%02x:%02x.%d",
> +			 iommu->segment, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
> +	}
> +
>  	if (fault_type == INTR_REMAP) {
> -		pr_err("[INTR-REMAP] Request device [%02x:%02x.%d] fault index 0x%llx [fault reason 0x%02x] %s\n",
> -		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
> -		       PCI_FUNC(source_id & 0xFF), addr >> 48,
> -		       fault_reason, reason);
> +		pr_err("[INTR-REMAP] Request device [%s] fault index 0x%llx [fault reason 0x%02x] %s\n",
> +		       devid, addr >> 48, fault_reason, reason);
>  
>  		return 0;
>  	}
>  

[-------------- >8 -------------------]

Thanks,
Praan

[1] https://elixir.bootlin.com/linux/v7.0.1/source/drivers/base/bus.c#L60
[2] https://elixir.bootlin.com/linux/v7.0.1/source/lib/klist.c#L209
[3] https://elixir.bootlin.com/linux/v7.0.1/source/drivers/base/core.c#L3794




* Re: [PATCH 3/3] iommu/amd: Add vendor:device ID to AMD IOMMU event logs
  2026-05-06 15:05 ` [PATCH 3/3] iommu/amd: Add vendor:device ID to AMD IOMMU event logs Yigit Oguz
@ 2026-05-07 19:52   ` Pranjal Shrivastava
  0 siblings, 0 replies; 11+ messages in thread
From: Pranjal Shrivastava @ 2026-05-07 19:52 UTC (permalink / raw)
  To: Yigit Oguz
  Cc: joro, will, robin.murphy, baolu.lu, dwmw2, suravee.suthikulpanit,
	jgg, nicolinc, iommu, linux-arm-kernel, linux-kernel,
	Lilit Janpoladyan

On Wed, May 06, 2026 at 03:05:39PM +0000, Yigit Oguz wrote:
> Add amd_iommu_devid_str() helper that formats PCI device identity as
> SSSS:BB:DD.F VVVV:DDDD by looking up the pci_dev via
> pci_get_domain_bus_and_slot. Falls back to SSSS:BB:DD.F when the
> device is not found.
> 
> Before:
>   AMD-Vi: Event logged [IO_PAGE_FAULT device=0000:41:00.0 domain=0x000a
>   	address=0xe0000000 flags=0x0020]
> 
> After:
>   AMD-Vi: Event logged [IO_PAGE_FAULT device=0000:41:00.0 8086:1533 domain=0x000a
>   	address=0xe0000000 flags=0x0020]
> 
> Signed-off-by: Yigit Oguz <yigitogu@amazon•de>
> Signed-off-by: Lilit Janpoladyan <lilitj@amazon•com>
> Assisted-by: Claude:claude-4.6-opus
> ---
>  drivers/iommu/amd/iommu.c | 94 ++++++++++++++++++++++++---------------
>  1 file changed, 58 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 01171361f9bc..441b4a7e85d5 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -779,11 +779,34 @@ static void dump_command(unsigned long phys_addr)
>  		pr_err("CMD[%d]: %08x\n", i, cmd->data[i]);
>  }
>  
> +#define AMD_IOMMU_DEVID_SIZE	48
> +
> +static void amd_iommu_devid_str(struct amd_iommu *iommu, u16 devid, char *buf,
> +				size_t size)
> +{
> +	struct pci_dev *pdev;
> +
> +	pdev = pci_get_domain_bus_and_slot(iommu->pci_seg->id,
> +					   PCI_BUS_NUM(devid), devid & 0xff);
> +	if (pdev) {
> +		snprintf(buf, size, "%04x:%02x:%02x.%x %04x:%04x",
> +			 iommu->pci_seg->id, PCI_BUS_NUM(devid),
> +			 PCI_SLOT(devid), PCI_FUNC(devid),
> +			 pdev->vendor, pdev->device);
> +		pci_dev_put(pdev);

From a quick glance, it looks like we call this in bottom halves, which
seems fine.

> +	} else {
> +		snprintf(buf, size, "%04x:%02x:%02x.%x",
> +			 iommu->pci_seg->id, PCI_BUS_NUM(devid),
> +			 PCI_SLOT(devid), PCI_FUNC(devid));
> +	}
> +}
> +
>  static void amd_iommu_report_rmp_hw_error(struct amd_iommu *iommu, volatile u32 *event)
>  {
>  	struct iommu_dev_data *dev_data = NULL;
>  	int devid, vmg_tag, flags;
>  	struct pci_dev *pdev;
> +	char devid_str[AMD_IOMMU_DEVID_SIZE];
>  	u64 spa;
>  
>  	devid   = (event[0] >> EVENT_DEVID_SHIFT) & EVENT_DEVID_MASK;
> @@ -796,15 +819,16 @@ static void amd_iommu_report_rmp_hw_error(struct amd_iommu *iommu, volatile u32
>  	if (pdev)
>  		dev_data = dev_iommu_priv_get(&pdev->dev);
>  
> +	amd_iommu_devid_str(iommu, devid, devid_str, sizeof(devid_str));

Will this iterate the global PCI device list for EVERY event? I'm wondering
if we could improve that somehow.

[------------- >8 ---------------]

Thanks,
Praan
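
For reference, the 16-bit devid decoded by amd_iommu_devid_str() above packs the PCI address as bus[15:8], slot[7:3] and function[2:0]. The userspace sketch below reproduces only the formatting step, so the output format and the fallback can be checked outside the kernel. The SKETCH_* macros are local stand-ins that mirror the bit layout of the kernel's PCI_BUS_NUM(), PCI_SLOT() and PCI_FUNC(), and the has_ids flag stands in for a successful pci_get_domain_bus_and_slot() lookup. It does not address the lookup-cost question raised above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local stand-ins mirroring the bit layout behind the kernel's
 * PCI_BUS_NUM(), PCI_SLOT() and PCI_FUNC() macros. */
#define SKETCH_BUS_NUM(id)	(((id) >> 8) & 0xff)
#define SKETCH_SLOT(devfn)	(((devfn) >> 3) & 0x1f)
#define SKETCH_FUNC(devfn)	((devfn) & 0x07)

#define SKETCH_DEVID_SIZE	48

/*
 * Format "SSSS:BB:DD.F VVVV:DDDD" when the IDs are known, otherwise the
 * bare "SSSS:BB:DD.F". has_ids stands in for "the pci_dev lookup found a
 * device". size must be non-zero. Returns the length of the result.
 */
static size_t sketch_devid_str(char *buf, size_t size, uint16_t seg,
			       uint16_t devid, bool has_ids,
			       uint16_t vendor, uint16_t device)
{
	unsigned int devfn = devid & 0xff;

	if (has_ids)
		snprintf(buf, size, "%04x:%02x:%02x.%x %04x:%04x",
			 (unsigned int)seg, (unsigned int)SKETCH_BUS_NUM(devid),
			 SKETCH_SLOT(devfn), SKETCH_FUNC(devfn),
			 (unsigned int)vendor, (unsigned int)device);
	else
		snprintf(buf, size, "%04x:%02x:%02x.%x",
			 (unsigned int)seg, (unsigned int)SKETCH_BUS_NUM(devid),
			 SKETCH_SLOT(devfn), SKETCH_FUNC(devfn));

	return strlen(buf);
}
```

With a 16-bit segment, the longest possible result is 22 characters plus the terminating NUL, so the 48-byte buffer in the patch leaves ample headroom.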


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/3] iommu: Add PCI vendor:device ID to IOMMU fault logs
  2026-05-06 15:05 [PATCH 0/3] iommu: Add PCI vendor:device ID to IOMMU fault logs Yigit Oguz
                   ` (2 preceding siblings ...)
  2026-05-06 15:05 ` [PATCH 3/3] iommu/amd: Add vendor:device ID to AMD IOMMU event logs Yigit Oguz
@ 2026-05-08 10:45 ` Robin Murphy
  3 siblings, 0 replies; 11+ messages in thread
From: Robin Murphy @ 2026-05-08 10:45 UTC (permalink / raw)
  To: Yigit Oguz, joro, will, baolu.lu, dwmw2, suravee.suthikulpanit
  Cc: jgg, nicolinc, iommu, linux-arm-kernel, linux-kernel

On 2026-05-06 4:05 pm, Yigit Oguz wrote:
> IOMMU fault and event logs currently identify devices using only their
> PCI segment/bus/device/function (SSSS:BB:DD.F). While mapping a single
> BDF to a device type is straightforward, doing so at scale across many
> hosts and thousands of fault events requires additional tooling and
> manual cross-referencing. Including the vendor:device ID directly in
> the log line makes each event self-contained and immediately actionable
> without any post-processing.

Sorry, but why are unexpected DMA faults happening "at scale" in the 
first place? If you have so many broken drivers that disambiguating them 
needs help from the kernel, something seems fundamentally wrong with 
that picture. Conversely if these are devices assigned to userspace then 
we should perhaps reconsider their ability to spam up the host kernel 
log at will anyway.

I'm not saying I necessarily have anything against this change in 
particular, but it has a strong smell of effort being spent on the wrong 
thing...

(And even then AFAICS it only really helps in the specific scenario of 
having only one of each type of device, otherwise you're back to still 
needing per-system knowledge of how BDFs map to physical instances to 
know what's what.)

Thanks,
Robin.

> This series adds vendor:device ID (VVVV:DDDD) to IOMMU event logs for
> ARM SMMUv3, Intel VT-d and AMD IOMMU.
> 
> Before:
>    arm-smmu-v3 arm-smmu-v3.0.auto: event: F_TRANSLATION client: 0000:2b:11.6
>      sid: 0x158e ssid: 0x0 iova: 0x280000000000 ipa: 0x0
>    DMAR: [DMA Write NO_PASID] Request device [86:00.0] fault addr 0xe0000000
>      [fault reason 0x05] PTE Write access is not set
>    AMD-Vi: Event logged [IO_PAGE_FAULT device=0000:41:00.0 domain=0x000a
>      address=0xe0000000 flags=0x0020]
> 
> After:
>    arm-smmu-v3 arm-smmu-v3.0.auto: event: F_TRANSLATION client: 0000:2b:11.6 [8086:1533]
>      sid: 0x158e ssid: 0x0 iova: 0x280000000000 ipa: 0x0
>    DMAR: [DMA Write NO_PASID] Request device [0000:86:00.0 8086:1533] fault addr 0xe0000000
>      [fault reason 0x05] PTE Write access is not set
>    AMD-Vi: Event logged [IO_PAGE_FAULT device=0000:41:00.0 8086:1533 domain=0x000a
>      address=0xe0000000 flags=0x0020]
> 
> Patch 1 adds vendor:device ID to ARM SMMUv3 translation fault logs.
> Patch 2 adds PCI segment and vendor:device ID to Intel VT-d DMAR
>          fault logs.
> Patch 3 adds a devid_str helper and vendor:device ID to all AMD IOMMU
>          event log paths.
> 
> Testing:
> Build-tested against mainline Linux (torvalds/master).
> 
> Runtime-tested on a custom downstream branch on ARM SMMUv3, Intel VT-d and
> AMD IOMMU hosts. Translation faults were induced in a virtualized setup
> by removing DMA mappings for an in-use region, causing the assigned device's
> subsequent DMA transactions to hit unmapped IOVAs and produce
> translation fault events. The resulting log lines were verified to
> contain the PCI vendor:device ID on all three platforms.
> 
> Lilit Janpoladyan (1):
>    iommu/arm-smmu-v3: Print PCI vendor:device ID in SMMU translation
>      fault logs
> 
> Yigit Oguz (2):
>    iommu/vt-d: Add PCI segment and vendor:device ID to DMAR fault logs
>    iommu/amd: Add vendor:device ID to AMD IOMMU event logs
> 
>   drivers/iommu/amd/iommu.c                   | 94 +++++++++++++--------
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 ++++++-
>   drivers/iommu/intel/dmar.c                  | 33 +++++---
>   3 files changed, 104 insertions(+), 52 deletions(-)
> 



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/3] iommu: Add PCI vendor:device ID to IOMMU fault logs
       [not found] <C1C278E8-E5F6-4701-9127-DCDBC64636E1@amazon.de>
@ 2026-05-18 15:52 ` Robin Murphy
  2026-05-18 17:54   ` Jason Gunthorpe
  0 siblings, 1 reply; 11+ messages in thread
From: Robin Murphy @ 2026-05-18 15:52 UTC (permalink / raw)
  To: Oguz, Yigit, joro@8bytes•org, will@kernel•org,
	baolu.lu@linux•intel.com, dwmw2@infradead•org,
	suravee.suthikulpanit@amd•com
  Cc: jgg@ziepe•ca, nicolinc@nvidia•com, iommu@lists•linux.dev,
	linux-arm-kernel@lists•infradead.org,
	linux-kernel@vger•kernel.org

On 18/05/2026 4:19 pm, Oguz, Yigit wrote:
> On 2026-05-08, Robin Murphy wrote:
>> Sorry, but why are unexpected DMA faults happening "at scale" in the
>> first place? If you have so many broken drivers that disambiguating them
>> needs help from the kernel, something seems fundamentally wrong with
>> that picture. Conversely if these are devices assigned to userspace then
>> we should perhaps reconsider their ability to spam up the host kernel
>> log at will anyway.
> 
> The use case is VFIO passthrough environments where translation faults
> show up during device lifecycle operations, mainly around device reset.
> When mappings are torn down and a device still has DMA in flight or
> issues DMA during/after FLR, the IOMMU blocks it and logs the fault.
> This series doesn't change when or whether events get logged, it just
> makes the existing lines more useful for triage when they do fire.
> 
>> I'm not saying I necessarily have anything against this change in
>> particular, but it has a strong smell of effort being spent on the wrong
>> thing...
> 
> Fair point. Whether the faults themselves should be addressed is a
> separate question, but since the kernel already logs them unconditionally,
> making the output more immediately useful seemed like low-hanging fruit.

TBH I think the more appropriate solution would be to have vfio-pci 
register its own fault handler, wherein it can properly deal with 
rate-limiting and/or entirely suppressing fault reports from misbehaving 
userspace, and if and when it does want to log something it is then free 
to do that in whatever format it wants, independent of the underlying 
IOMMU driver.

Thanks,
Robin.
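
To make the rate-limiting/suppression idea concrete, here is a minimal userspace sketch of the per-device decision a fault consumer such as vfio-pci might make before logging anything. Everything in it (struct fault_limiter, fault_should_report(), and the millisecond clock supplied by the caller) is hypothetical and only loosely modelled on the interval/burst idea behind the kernel's printk rate-limiting helpers. It is not vfio-pci code or an existing IOMMU fault-handler API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device limiter; not an existing kernel or vfio API. */
struct fault_limiter {
	uint64_t interval_ms;		/* length of one rate-limiting window */
	unsigned int burst;		/* reports allowed per window */
	bool suppress_all;		/* e.g. userspace-owned device: never log */
	uint64_t window_start;		/* start of the current window, in ms */
	unsigned int printed;		/* reports allowed in the current window */
	unsigned long suppressed;	/* total reports dropped so far */
};

/*
 * Decide whether a fault observed at now_ms should reach the host log.
 * Assumes now_ms comes from a monotonic clock and never goes backwards.
 */
static bool fault_should_report(struct fault_limiter *fl, uint64_t now_ms)
{
	if (fl->suppress_all) {
		fl->suppressed++;
		return false;
	}

	/* Open a new window once the current one has expired. */
	if (now_ms - fl->window_start >= fl->interval_ms) {
		fl->window_start = now_ms;
		fl->printed = 0;
	}

	if (fl->printed < fl->burst) {
		fl->printed++;
		return true;
	}

	fl->suppressed++;
	return false;
}
```

Keeping one limiter per device would stop a single noisy assigned device from using up the log budget of the others, and the suppress_all flag covers the "entirely suppressing" case; the suppressed counter could be reported in aggregate instead.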

>> (And even then AFAICS it only really helps in the specific scenario of
>> having only one of each type of device, otherwise you're back to still
>> needing per-system knowledge of how BDFs map to physical instances to
>> know what's what.)
> 
> The vendor:device ID answers the first question in triage: "what kind of
> device is this?" Even with multiple instances of the same type, narrowing
> by type cuts down the search space when correlating faults with device
> lifecycle events.
> 
> Thanks,
> Yigit
> 
> 
> On 2026-05-06 4:05 pm, Yigit Oguz wrote:
>> IOMMU fault and event logs currently identify devices using only their
>> PCI segment/bus/device/function (SSSS:BB:DD.F). While mapping a single
>> BDF to a device type is straightforward, doing so at scale across many
>> hosts and thousands of fault events requires additional tooling and
>> manual cross-referencing. Including the vendor:device ID directly in
>> the log line makes each event self-contained and immediately actionable
>> without any post-processing.
> 
> 
> Sorry, but why are unexpected DMA faults happening "at scale" in the
> first place? If you have so many broken drivers that disambiguating them
> needs help from the kernel, something seems fundamentally wrong with
> that picture. Conversely if these are devices assigned to userspace then
> we should perhaps reconsider their ability to spam up the host kernel
> log at will anyway.
> 
> 
> I'm not saying I necessarily have anything against this change in
> particular, but it has a strong smell of effort being spent on the wrong
> thing...
> 
> 
> (And even then AFAICS it only really helps in the specific scenario of
> having only one of each type of device, otherwise you're back to still
> needing per-system knowledge of how BDFs map to physical instances to
> know what's what.)
> 
> 
> Thanks,
> Robin.
> 
> 
>> This series adds vendor:device ID (VVVV:DDDD) to IOMMU event logs for
>> ARM SMMUv3, Intel VT-d and AMD IOMMU.
>>
>> Before:
>> arm-smmu-v3 arm-smmu-v3.0.auto: event: F_TRANSLATION client: 0000:2b:11.6
>> sid: 0x158e ssid: 0x0 iova: 0x280000000000 ipa: 0x0
>> DMAR: [DMA Write NO_PASID] Request device [86:00.0] fault addr 0xe0000000
>> [fault reason 0x05] PTE Write access is not set
>> AMD-Vi: Event logged [IO_PAGE_FAULT device=0000:41:00.0 domain=0x000a
>> address=0xe0000000 flags=0x0020]
>>
>> After:
>> arm-smmu-v3 arm-smmu-v3.0.auto: event: F_TRANSLATION client: 0000:2b:11.6 [8086:1533]
>> sid: 0x158e ssid: 0x0 iova: 0x280000000000 ipa: 0x0
>> DMAR: [DMA Write NO_PASID] Request device [0000:86:00.0 8086:1533] fault addr 0xe0000000
>> [fault reason 0x05] PTE Write access is not set
>> AMD-Vi: Event logged [IO_PAGE_FAULT device=0000:41:00.0 8086:1533 domain=0x000a
>> address=0xe0000000 flags=0x0020]
>>
>> Patch 1 adds vendor:device ID to ARM SMMUv3 translation fault logs.
>> Patch 2 adds PCI segment and vendor:device ID to Intel VT-d DMAR
>> fault logs.
>> Patch 3 adds a devid_str helper and vendor:device ID to all AMD IOMMU
>> event log paths.
>>
>> Testing:
>> Build-tested against mainline Linux (torvalds/master).
>>
>> Runtime-tested on a custom downstream branch on ARM SMMUv3, Intel VT-d and
>> AMD IOMMU hosts. Translation faults were induced in a virtualized setup
>> by removing DMA mappings for an in-use region, causing the assigned device's
>> subsequent DMA transactions to hit unmapped IOVAs and produce
>> translation fault events. The resulting log lines were verified to
>> contain the PCI vendor:device ID on all three platforms.
>>
>> Lilit Janpoladyan (1):
>> iommu/arm-smmu-v3: Print PCI vendor:device ID in SMMU translation
>> fault logs
>>
>> Yigit Oguz (2):
>> iommu/vt-d: Add PCI segment and vendor:device ID to DMAR fault logs
>> iommu/amd: Add vendor:device ID to AMD IOMMU event logs
>>
>> drivers/iommu/amd/iommu.c | 94 +++++++++++++--------
>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 ++++++-
>> drivers/iommu/intel/dmar.c | 33 +++++---
>> 3 files changed, 104 insertions(+), 52 deletions(-)
>>
> 
> 
> 
> 
> 
> 
> 
> 
> Amazon Web Services Development Center Germany GmbH
> Tamara-Danz-Str. 13
> 10243 Berlin
> Geschaeftsfuehrung: Christof Hellmis, Andreas Stieger
> Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
> Sitz: Berlin
> Ust-ID: DE 365 538 597



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/3] iommu: Add PCI vendor:device ID to IOMMU fault logs
  2026-05-18 15:52 ` Robin Murphy
@ 2026-05-18 17:54   ` Jason Gunthorpe
  0 siblings, 0 replies; 11+ messages in thread
From: Jason Gunthorpe @ 2026-05-18 17:54 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Oguz, Yigit, joro@8bytes•org, will@kernel•org,
	baolu.lu@linux•intel.com, dwmw2@infradead•org,
	suravee.suthikulpanit@amd•com, nicolinc@nvidia•com,
	iommu@lists•linux.dev, linux-arm-kernel@lists•infradead.org,
	linux-kernel@vger•kernel.org

On Mon, May 18, 2026 at 04:52:57PM +0100, Robin Murphy wrote:

> TBH I think the more appropriate solution would be to have vfio-pci register
> its own fault handler, wherein it can properly deal with rate-limiting
> and/or entirely suppressing fault reports from misbehaving userspace, and if
> and when it does want to log something it is then free to do that in
> whatever format it wants, independent of the underlying IOMMU driver.

+1

Jason


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] iommu/vt-d: Add PCI segment and vendor:device ID to DMAR fault logs
  2026-05-07 19:21   ` Pranjal Shrivastava
@ 2026-05-22 15:45     ` Oguz, Yigit
  0 siblings, 0 replies; 11+ messages in thread
From: Oguz, Yigit @ 2026-05-22 15:45 UTC (permalink / raw)
  To: Pranjal Shrivastava
  Cc: joro@8bytes•org, will@kernel•org, robin.murphy@arm•com,
	baolu.lu@linux•intel.com, dwmw2@infradead•org,
	suravee.suthikulpanit@amd•com, jgg@ziepe•ca, nicolinc@nvidia•com,
	iommu@lists•linux.dev, linux-arm-kernel@lists•infradead.org,
	linux-kernel@vger•kernel.org, Janpoladyan, Lilit, Yigit Oguz,
	Saenz Julienne, Nicolas

> Not an Intel iommu expert, but I have concerns about using
> pci_get_domain_bus_and_slot() in this path.
>
> AFAICT, dmar_fault_do_one() is running in an IRQ context & the pci_get_*
> family of functions iterates the global PCI klist. It eventually calls
> bus_to_subsys(), which takes a plain spin_lock(&bus_kset->list_lock) [1]
> which isn't IRQ-safe. Same thing with klist_put [2] called in klist_iter_exit.

Yes, confirmed. bus_to_subsys() takes a non-IRQ-safe spinlock, so this
is indeed broken in hard IRQ context.

> Same here, pci_dev_put calls put_device which might sleep [3] and hence
> shouldn't be called in hard IRQ context.

Agreed.

I looked at converting this to request_threaded_irq() so the handler
runs in process context, but the DMAR fault interrupt is registered
early in boot, before kthreads exist. Rearranging the boot sequence just
to enrich a log message isn't feasible.

I also considered a manual linear search: walking the PCI bus and device
lists to find the matching BDF. But on systems with hundreds of
registered devices, that spends too much time in hard IRQ context.

Do you (or anyone on the list) have ideas for a clean way to get the
vendor:device ID in this context?

Thanks,
Yigit
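
One possible direction for the question above, not proposed anywhere in this thread and offered only as a hypothetical sketch: capture vendor:device in process context when a device is added, so the fault path does a single lock-free load instead of walking the PCI klist or taking a device reference. The sketch models this in userspace C11 with one table per PCI segment, indexed by the 16-bit source ID. The table and the vdid_record(), vdid_forget() and vdid_lookup() names are invented for illustration. A kernel version would presumably use READ_ONCE()/WRITE_ONCE() and something like a bus notifier or probe hook to populate the table; those details, and the memory cost (512 KiB per segment with 64-bit entries), are unverified assumptions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define VDID_SLOTS	65536		/* one slot per 16-bit bus:devfn source ID */
#define VDID_VALID	(1ULL << 32)	/* distinguishes "absent" from vendor 0 */

/* Hypothetical snapshot table for a single PCI segment (zero = absent). */
static _Atomic uint64_t vdid_table[VDID_SLOTS];

/* Process context, e.g. when a device is added: publish its IDs. */
static void vdid_record(uint16_t sid, uint16_t vendor, uint16_t device)
{
	uint64_t v = VDID_VALID | ((uint64_t)vendor << 16) | device;

	atomic_store_explicit(&vdid_table[sid], v, memory_order_release);
}

/* Process context, e.g. when a device is removed: retract the entry. */
static void vdid_forget(uint16_t sid)
{
	atomic_store_explicit(&vdid_table[sid], 0, memory_order_release);
}

/* Fault path: one atomic load; no locks, no refcounts, no list walk. */
static bool vdid_lookup(uint16_t sid, uint16_t *vendor, uint16_t *device)
{
	uint64_t v = atomic_load_explicit(&vdid_table[sid],
					  memory_order_acquire);

	if (!(v & VDID_VALID))
		return false;
	*vendor = (uint16_t)(v >> 16);
	*device = (uint16_t)(v & 0xffff);
	return true;
}
```

A stale entry between device removal and vdid_forget() would at worst mislabel one log line. Whether the source ID reported by the hardware always matches the pci_dev's own BDF (DMA aliases, bridges) is a further open assumption.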

On Wed, May 06, 2026 at 03:05:38PM +0000, Yigit Oguz wrote:
> Include the full SSSS:BB:DD.F address with PCI segment and
> vendor:device ID (VVVV:DDDD) in DMAR fault messages. Uses
> iommu->segment for the PCI domain and pci_get_domain_bus_and_slot
> to look up the pci_dev. Falls back to segment:BDF without
> vendor:device if the device is not found.
>
> This brings Intel IOMMU fault logging in line with the ARM SMMUv3
> event decoding, making it easier to identify faulting devices
> (e.g. after FLR) without cross-referencing lspci.
>
> Before:
> DMAR: [DMA Write NO_PASID] Request device [86:00.0] fault addr 0xe0000000
> [fault reason 0x05] PTE Write access is not set
>
> After:
> DMAR: [DMA Write NO_PASID] Request device [0000:86:00.0 8086:1533] fault addr 0xe0000000
> [fault reason 0x05] PTE Write access is not set
>
> Signed-off-by: Yigit Oguz <yigitogu@amazon•de>
> Signed-off-by: Lilit Janpoladyan <lilitj@amazon•com>
> Assisted-by: Claude:claude-4.6-opus
> ---
>  drivers/iommu/intel/dmar.c | 33 +++++++++++++++++++++------------
>  1 file changed, 21 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index d33c119a935e..225fa498d714 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -1890,30 +1890,39 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
>  {
>  	const char *reason;
>  	int fault_type;
> +	u8 bus = source_id >> 8;
> +	u8 devfn = source_id & 0xFF;
> +	struct pci_dev *pdev;
> +	char devid[48];

Why not have a #define for this like you have for AMD and Arm?

>  
>  	reason = dmar_get_fault_reason(fault_reason, &fault_type);
>  
> +	pdev = pci_get_domain_bus_and_slot(iommu->segment, bus, devfn);

Not an Intel iommu expert, but I have concerns about using
pci_get_domain_bus_and_slot() in this path.

AFAICT, dmar_fault_do_one() is running in an IRQ context & the pci_get_*
family of functions iterates the global PCI klist. It eventually calls
bus_to_subsys(), which takes a plain spin_lock(&bus_kset->list_lock) [1]
which isn't IRQ-safe. Same thing with klist_put [2] called in klist_iter_exit.

> +	if (pdev) {
> +		snprintf(devid, sizeof(devid), "%04x:%02x:%02x.%d %04x:%04x",
> +			 iommu->segment, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> +			 pdev->vendor, pdev->device);
> +		pci_dev_put(pdev);

Same here, pci_dev_put calls put_device which might sleep [3] and hence
shouldn't be called in hard IRQ context.

> +	} else {
> +		snprintf(devid, sizeof(devid), "%04x:%02x:%02x.%d",
> +			 iommu->segment, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
> +	}
> +
>  	if (fault_type == INTR_REMAP) {
> -		pr_err("[INTR-REMAP] Request device [%02x:%02x.%d] fault index 0x%llx [fault reason 0x%02x] %s\n",
> -		       source_id >> 8, PCI_SLOT(source_id & 0xFF),
> -		       PCI_FUNC(source_id & 0xFF), addr >> 48,
> -		       fault_reason, reason);
> +		pr_err("[INTR-REMAP] Request device [%s] fault index 0x%llx [fault reason 0x%02x] %s\n",
> +		       devid, addr >> 48, fault_reason, reason);
>  
>  		return 0;
>  	}
>  

[-------------- >8 -------------------]

Thanks,
Praan

[1] https://elixir.bootlin.com/linux/v7.0.1/source/drivers/base/bus.c#L60
[2] https://elixir.bootlin.com/linux/v7.0.1/source/lib/klist.c#L209
[3] https://elixir.bootlin.com/linux/v7.0.1/source/drivers/base/core.c#L3794

Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christof Hellmis, Andreas Stieger
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread (newest message: 2026-05-22 15:45 UTC)

Thread overview: 11+ messages
2026-05-06 15:05 [PATCH 0/3] iommu: Add PCI vendor:device ID to IOMMU fault logs Yigit Oguz
2026-05-06 15:05 ` [PATCH 1/3] iommu/arm-smmu-v3: Print PCI vendor:device ID in SMMU translation " Yigit Oguz
2026-05-07 17:01   ` Pranjal Shrivastava
2026-05-06 15:05 ` [PATCH 2/3] iommu/vt-d: Add PCI segment and vendor:device ID to DMAR " Yigit Oguz
2026-05-07 19:21   ` Pranjal Shrivastava
2026-05-22 15:45     ` Oguz, Yigit
2026-05-06 15:05 ` [PATCH 3/3] iommu/amd: Add vendor:device ID to AMD IOMMU event logs Yigit Oguz
2026-05-07 19:52   ` Pranjal Shrivastava
2026-05-08 10:45 ` [PATCH 0/3] iommu: Add PCI vendor:device ID to IOMMU fault logs Robin Murphy
     [not found] <C1C278E8-E5F6-4701-9127-DCDBC64636E1@amazon.de>
2026-05-18 15:52 ` Robin Murphy
2026-05-18 17:54   ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox