From: Stephen Hemminger <stephen@networkplumber•org>
To: "André Pribil" <a.pribil@beck-ipc•com>
Cc: "netdev@vger•kernel.org" <netdev@vger•kernel.org>
Subject: Re: Deadlock with restart_syscall()
Date: Fri, 27 Jul 2018 08:53:51 -0700
Message-ID: <20180727085351.36210a12@xeon-e3>
In-Reply-To: <681500CE65202E47A192754B01DAB4673BE3D87D8D@SDE12.beckipc.net>

On Mon, 16 Jul 2018 09:31:06 +0200
André Pribil <a.pribil@beck-ipc•com> wrote:

> Hello,
> 
> I'm using kernel 4.14.52-rt34 on a single core ARM system and I'm seeing a 
> deadlock inside the kernel when two RT processes make calls in the right 
> temporal distance. The first process is trying to bring the Ethernet interface 
> up, with the SIOCSIFFLAGS ioctl(). The second process is checking the Ethernet 
> carrier, speed and duplex status, by reading e.g. "/sys/class/net/eth1/speed".
> 
> The first process finally gets to phy_poll_reset() in 
> drivers/net/phy/phy_device.c, where it calls msleep(50). 
> It never returns from the sleep.
> 
> The second process gets to speed_show() in net/core/net-sysfs.c. It tries to get
> the RTNL lock with rtnl_trylock(), but fails and calls restart_syscall(). 
> This happens over and over again.
> 
> It seems like the first process is no longer scheduled and cannot release the
> RTNL lock, while the second process is busy restarting the syscall. The first 
> process has a higher RT priority than the second process.
>                                                          
> Just for testing I've added the TIF_NEED_RESCHED flag to the restart_syscall() 
> function and I did not see the deadlock again with this change.
> 
> static inline int restart_syscall(void)
> {
> 	set_tsk_thread_flag(current, TIF_SIGPENDING | TIF_NEED_RESCHED);
> 	return -ERESTARTNOINTR;
> }
> 
> As a second test I released the RTNL lock while calling msleep() in 
> phy_poll_reset(). This also made the problem disappear.
> 
> I've found this thread, where a similar issue with restart_syscall() has been 
> reported:
> https://www.spinics.net/lists/netdev/msg415144.html
> 
> Any ideas how to fix this issue?
> 
> Andre   

Don't do control operations from RT processes!
There can be cases of priority inversion where an RT process is waiting for
something that requires a kthread to complete the operation.
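
To make the inversion concrete, here is a toy userspace model, not kernel
code. It assumes a single core with strict priority scheduling, and it
assumes that the wake-up from msleep() depends on a separate timer kthread.
The task names, the priority values, and simulate() itself are hypothetical
and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical task set for the model; none of these names are kernel APIs. */
enum { TASK_UP, TASK_POLL, TASK_KTIMER, NTASKS };

struct task {
	int prio;      /* higher value wins the CPU */
	bool runnable;
};

/*
 * Model one CPU that always runs the highest-priority runnable task.
 * Returns the tick at which the polling task finally gets the lock,
 * or -1 if it is still spinning after max_ticks.
 */
int simulate(int ktimer_prio, int max_ticks)
{
	assert(max_ticks > 0);

	struct task t[NTASKS] = {
		/* Holds the lock and is asleep, waiting for its timer. */
		[TASK_UP]     = { .prio = 80, .runnable = false },
		/* Tries the lock and restarts without ever sleeping. */
		[TASK_POLL]   = { .prio = 50, .runnable = true },
		/* Assumed kthread that must run to expire the timer. */
		[TASK_KTIMER] = { .prio = ktimer_prio, .runnable = true },
	};
	bool lock_held = true;

	for (int tick = 0; tick < max_ticks; tick++) {
		int cur = -1;

		/* Pick the highest-priority runnable task. */
		for (int i = 0; i < NTASKS; i++)
			if (t[i].runnable && (cur < 0 || t[i].prio > t[cur].prio))
				cur = i;

		switch (cur) {
		case TASK_KTIMER:
			/* Timer expires: wake the sleeper. */
			t[TASK_UP].runnable = true;
			t[TASK_KTIMER].runnable = false;
			break;
		case TASK_UP:
			/* Sleeper returns and drops the lock. */
			lock_held = false;
			t[TASK_UP].runnable = false;
			break;
		case TASK_POLL:
			if (!lock_held)
				return tick;
			/* trylock failed: retry at once, stay runnable. */
			break;
		}
	}
	return -1;
}
```

In this model simulate(1, 1000) returns -1: the polling task at priority 50
never yields, so the priority 1 timer thread never runs and the lock holder
is never woken, even though it has the highest priority of the three. With
the timer thread above the poller, simulate(60, 1000) returns 2.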
