On Wed, 2017-09-20 at 12:54 -0700, Kees Cook wrote:
> On Wed, Sep 20, 2017 at 12:40 AM, Abdul Haleem wrote:
> > On Tue, 2017-09-12 at 12:11 +0530, abdul wrote:
> >> Hi,
> >>
> >> Memory hot-unplug on PowerVM LPAR running next-20170911 results in
> >> Faulting instruction address: 0xc0000000002b56c4
> >>
> >> which maps to the below code path:
> >>
> >> 0xc0000000002b56c4 is in __rmqueue (./include/linux/list.h:104).
> >> 99      * This is only for internal list manipulation where we know
> >> 100     * the prev/next entries already!
> >> 101     */
> >> 102    static inline void __list_del(struct list_head * prev, struct
> >> list_head * next)
> >> 103    {
> >> 104            next->prev = prev;
> >> 105            WRITE_ONCE(prev->next, next);
> >> 106    }
> >> 107
> >> 108    /**
> >>
> >
> > I see another kernel Oops when running transparent hugepages
> > de-fragmentation test.
> >
> > And the faulty instruction address again pointing to same code line
> > 0xc00000000026f9f4 is in compaction_alloc (./include/linux/list.h:104)
> >
> > steps to recreate:
> > -----------------
> > 1. Enable transparent hugepages ("always")
> > 2. Turn off the defrag $ echo 0 > khugepaged/defrag
> > 3. Write random to memory path
> > 4. Set huge pages numbers
> > 5. Turn on defrag $ echo 1 > khugepaged/defrag
> >
> >
> > new trace:
> > ----------
> > Unable to handle kernel paging request for data at address
> > 0x5deadbeef0000108
>
> This looks like use-after-list-removal, that value appears to be LIST_POISON1.
>
> Try enabling CONFIG_DEBUG_LIST to see if you get better details?

With the above config enabled, I see the messages below along with call
traces, but no kernel Oops.

BUG: Bad page state in process drmgr  pfn:770c7
page:f000000001dc31c0 count:0 mapcount:0 mapping:f000000001dc31c8 index:0x1
flags: 0x33ffff800000000()
raw: 033ffff800000000 f000000001dc31c8 0000000000000001 00000000ffffffff
raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000 0000000000000000
page dumped because: non-NULL mapping

--
Regards,
Abdul Haleem
IBM Linux Technology Centre
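
The faulting data address 0x5deadbeef0000108 from the quoted trace can be cross-checked against the LIST_POISON1 reading with a small user-space sketch. It is only an illustration under stated assumptions: the POISON_POINTER_DELTA value is assumed to be the ppc64 default for CONFIG_ILLEGAL_POINTER_VALUE, the 0x100/0x200 offsets are assumed from include/linux/poison.h of that kernel era, and `struct list_head64` and `list_del_store_address()` are hypothetical names that model 64-bit pointers as integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: ppc64 default CONFIG_ILLEGAL_POINTER_VALUE. */
#define POISON_POINTER_DELTA 0x5deadbeef0000000ULL
/* Assumption: poison offsets as in include/linux/poison.h at the time. */
#define LIST_POISON1 (0x100ULL + POISON_POINTER_DELTA)
#define LIST_POISON2 (0x200ULL + POISON_POINTER_DELTA)

/* Hypothetical stand-in for struct list_head on a 64-bit kernel;
 * pointers are modeled as plain 64-bit integers. */
struct list_head64 {
    uint64_t next;
    uint64_t prev;
};

/* Address that "next->prev = prev" (list.h:104) stores to for a given
 * value of next: the base address plus the offset of the prev member. */
uint64_t list_del_store_address(uint64_t next)
{
    return next + offsetof(struct list_head64, prev);
}

/* Compare the computed values with the addresses seen in the report. */
void check_against_report(void)
{
    assert(list_del_store_address(LIST_POISON1) == 0x5deadbeef0000108ULL);
    assert(LIST_POISON2 == 0x5deadbeef0000200ULL);
}
```

If these assumptions hold, a store through a `next` pointer equal to LIST_POISON1 lands at exactly the reported address, and the two poison values also appear to match the first two words of the second `raw:` line in the page dump above, which would be consistent with a list entry that had already been removed.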