From: "Oguz, Yigit" <yigitogu@amazon•de>
To: Pranjal Shrivastava <praan@google•com>
Cc: "joro@8bytes•org" <joro@8bytes•org>,
"will@kernel•org" <will@kernel•org>,
"robin.murphy@arm•com" <robin.murphy@arm•com>,
"baolu.lu@linux•intel.com" <baolu.lu@linux•intel.com>,
"dwmw2@infradead•org" <dwmw2@infradead•org>,
"suravee.suthikulpanit@amd•com" <suravee.suthikulpanit@amd•com>,
"jgg@ziepe•ca" <jgg@ziepe•ca>,
"nicolinc@nvidia•com" <nicolinc@nvidia•com>,
"iommu@lists•linux.dev" <iommu@lists•linux.dev>,
"linux-arm-kernel@lists•infradead.org"
<linux-arm-kernel@lists•infradead.org>,
"linux-kernel@vger•kernel.org" <linux-kernel@vger•kernel.org>,
"Janpoladyan, Lilit" <lilitj@amazon•de>,
Yigit Oguz <yigit.oguz2000@gmail•com>,
"Saenz Julienne, Nicolas" <nsaenz@amazon•es>
Subject: Re: [PATCH 2/3] iommu/vt-d: Add PCI segment and vendor:device ID to DMAR fault logs
Date: Fri, 22 May 2026 15:45:18 +0000
Message-ID: <C0E7D57F-B7AF-42EC-BF22-C5DC43DBB022@amazon.de>
In-Reply-To: <afzmQ6FieJ2YIt9Y@google.com>
> Not an Intel iommu expert, but I have concerns about using
> pci_get_domain_bus_and_slot() in this path.
>
> AFAICT, dmar_fault_do_one() is running in an IRQ context & the pci_get_*
> family of functions iterates the global PCI klist. It eventually calls
> bus_to_subsys(), which takes a plain spin_lock(&bus_kset->list_lock) [1]
> which isn't IRQ-safe. Same thing with klist_put [2] called in klist_iter_exit.
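Yes, confirmed. bus_to_subsys() takes a non-IRQ-safe spinlock, so this
is indeed broken in hard IRQ context: if the fault interrupt arrives on a
CPU that already holds bus_kset->list_lock, the handler deadlocks.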
> Same here, pci_dev_put calls put_device which might sleep [3] and hence
> shouldn't be called in hard IRQ context.
Agreed.
I looked at converting this to request_threaded_irq() so the handler
runs in process context, but the DMAR fault interrupt is registered
early in boot, before kthreads exist. Rearranging the boot sequence just
to enrich a log message isn't feasible.
I also considered a manual linear search: walking the PCI bus and device
lists to find the matching BDF. But on systems with hundreds of registered
devices, that is too much time to spend in hard IRQ context.
Do you (or anyone on the list) have ideas for a clean way to get the
vendor:device ID in this context?
Thanks,
Yigit
On Wed, May 06, 2026 at 03:05:38PM +0000, Yigit Oguz wrote:
> Include the full SSSS:BB:DD.F address with PCI segment and
> vendor:device ID (VVVV:DDDD) in DMAR fault messages. Uses
> iommu->segment for the PCI domain and pci_get_domain_bus_and_slot
> to look up the pci_dev. Falls back to segment:BDF without
> vendor:device if the device is not found.
>
> This brings Intel IOMMU fault logging in line with the ARM SMMUv3
> event decoding, making it easier to identify faulting devices
> (e.g. after FLR) without cross-referencing lspci.
>
> Before:
> DMAR: [DMA Write NO_PASID] Request device [86:00.0] fault addr 0xe0000000
> [fault reason 0x05] PTE Write access is not set
>
> After:
> DMAR: [DMA Write NO_PASID] Request device [0000:86:00.0 8086:1533] fault addr 0xe0000000
> [fault reason 0x05] PTE Write access is not set
>
> Signed-off-by: Yigit Oguz <yigitogu@amazon•de>
> Signed-off-by: Lilit Janpoladyan <lilitj@amazon•com>
> Assisted-by: Claude:claude-4.6-opus
> ---
> drivers/iommu/intel/dmar.c | 33 +++++++++++++++++++++------------
> 1 file changed, 21 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index d33c119a935e..225fa498d714 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -1890,30 +1890,39 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
> {
> const char *reason;
> int fault_type;
> + u8 bus = source_id >> 8;
> + u8 devfn = source_id & 0xFF;
> + struct pci_dev *pdev;
> + char devid[48];
Why not have a #define for this like you have for AMD and Arm?
>
> reason = dmar_get_fault_reason(fault_reason, &fault_type);
>
> + pdev = pci_get_domain_bus_and_slot(iommu->segment, bus, devfn);
Not an Intel iommu expert, but I have concerns about using
pci_get_domain_bus_and_slot() in this path.
AFAICT, dmar_fault_do_one() is running in an IRQ context & the pci_get_*
family of functions iterates the global PCI klist. It eventually calls
bus_to_subsys(), which takes a plain spin_lock(&bus_kset->list_lock) [1]
which isn't IRQ-safe. Same thing with klist_put [2] called in klist_iter_exit.
> + if (pdev) {
> + snprintf(devid, sizeof(devid), "%04x:%02x:%02x.%d %04x:%04x",
> + iommu->segment, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> + pdev->vendor, pdev->device);
> + pci_dev_put(pdev);
Same here, pci_dev_put calls put_device which might sleep [3] and hence
shouldn't be called in hard IRQ context.
> + } else {
> + snprintf(devid, sizeof(devid), "%04x:%02x:%02x.%d",
> + iommu->segment, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
> + }
> +
> if (fault_type == INTR_REMAP) {
> - pr_err("[INTR-REMAP] Request device [%02x:%02x.%d] fault index 0x%llx [fault reason 0x%02x] %s\n",
> - source_id >> 8, PCI_SLOT(source_id & 0xFF),
> - PCI_FUNC(source_id & 0xFF), addr >> 48,
> - fault_reason, reason);
> + pr_err("[INTR-REMAP] Request device [%s] fault index 0x%llx [fault reason 0x%02x] %s\n",
> + devid, addr >> 48, fault_reason, reason);
>
> return 0;
> }
>
[-------------- >8 -------------------]
Thanks,
Praan
[1] https://elixir.bootlin.com/linux/v7.0.1/source/drivers/base/bus.c#L60
[2] https://elixir.bootlin.com/linux/v7.0.1/source/lib/klist.c#L209
[3] https://elixir.bootlin.com/linux/v7.0.1/source/drivers/base/core.c#L3794
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christof Hellmis, Andreas Stieger
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
Thread overview: 9+ messages
2026-05-06 15:05 [PATCH 0/3] iommu: Add PCI vendor:device ID to IOMMU fault logs Yigit Oguz
2026-05-06 15:05 ` [PATCH 1/3] iommu/arm-smmu-v3: Print PCI vendor:device ID in SMMU translation " Yigit Oguz
2026-05-07 17:01 ` Pranjal Shrivastava
2026-05-06 15:05 ` [PATCH 2/3] iommu/vt-d: Add PCI segment and vendor:device ID to DMAR " Yigit Oguz
2026-05-07 19:21 ` Pranjal Shrivastava
2026-05-22 15:45 ` Oguz, Yigit [this message]
2026-05-06 15:05 ` [PATCH 3/3] iommu/amd: Add vendor:device ID to AMD IOMMU event logs Yigit Oguz
2026-05-07 19:52 ` Pranjal Shrivastava
2026-05-08 10:45 ` [PATCH 0/3] iommu: Add PCI vendor:device ID to IOMMU fault logs Robin Murphy