From: Stewart Smith <stewart@linux•vnet.ibm.com>
To: Michael Ellerman <mpe@ellerman•id.au>,
Vipin K Parashar <vipin@linux•vnet.ibm.com>,
linuxppc-dev@lists•ozlabs.org
Subject: Re: [PATCH] powernv/opal: Handle OPAL_WRONG_STATE error from OPAL fails
Date: Thu, 23 Feb 2017 14:52:33 +1100
Message-ID: <871supftry.fsf@linux.vnet.ibm.com>
In-Reply-To: <87r32th2rt.fsf@concordia.ellerman.id.au>

Michael Ellerman <mpe@ellerman•id.au> writes:
> Stewart Smith <stewart@linux•vnet.ibm.com> writes:
>
>> Vipin K Parashar <vipin@linux•vnet.ibm.com> writes:
>>> On Monday 13 February 2017 06:13 AM, Michael Ellerman wrote:
>>>> Vipin K Parashar <vipin@linux•vnet.ibm.com> writes:
>>>>
>>>>> OPAL returns OPAL_WRONG_STATE for XSCOM operations
>>>>> done to read any core FIR which is sleeping, offline.
>>>> OK.
>>>>
>>>> Do we know why Linux is causing that to happen?
>>>
>>> This issue is originally seen upon running STAF (Software Test
>>> Automation Framework) stress tests and off-lining some cores
>>> with stress tests running.
>>>
>>> It can also be re-created by off-lining a few cores and then using
>>> one of the methods below.
>>> 1. Executing Linux "sensors" command
>>> 2. Reading contents of file /sys/class/hwmon/hwmon0/tempX_input,
>>> where X is offline CPU.
>>>
>>> It's the "opal_get_sensor_data" Linux API that triggers the
>>> OPAL call "opal_sensor_read", performing XSCOM ops here.
>>> If the core is found sleeping/offline, Linux prints the
>>> "opal_error_code: Unexpected OPAL error" error on the console.
>>>
>>> Currently Linux isn't aware of the OPAL_WRONG_STATE return code
>>> from OPAL. Thus it prints the "Unexpected OPAL error" message, the
>>> same as it would log for any unknown OPAL return code.
>>>
>>> Seeing this error on the console has been a concern for Test and
>>> would puzzle a real user as well. This patch makes Linux aware of the
>>> OPAL_WRONG_STATE return code from OPAL and stops it printing the
>>> "Unexpected OPAL error" message on the console for OPAL calls that
>>> fail with OPAL_WRONG_STATE.
>>
>> Ahh... so this is a DTS sensor, which indeed is just XSCOMs and we
>> return the xscom_read return code in event of error.
>>
>> I would argue that converting to EIO in that instance is probably
>> correct... or EAGAIN? EAGAIN may be more correct in the situation where
>> the core is just sleeping.
>>
>> What kind of offlining are you doing?
>>
>> Arguably, the correct behaviour would be to remove said sensors when the
>> core is offline.
>
> Right, that would be ideal. There appear to be at least two other hwmon
> drivers that are CPU hotplug aware (coretemp and via-cputemp).
>
> But perhaps it's not possible to work out which sensors are attached to
> which CPU etc., I haven't looked in detail.

Each core-temp@ sensor has an ibm,pir property, so linking back to the
core it belongs to shouldn't be too hard. For mem-temp@ sensors, we have
the chip-id.

> In that case changing just opal_get_sensor_data() to handle
> OPAL_WRONG_STATE would be OK, with a comment explaining that we might be
> asked to read a sensor on an offline CPU and we aren't able to detect
> that.

Agree.
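
To make the suggestion concrete, here is a rough sketch of the shape it
could take. It is a standalone userspace model, not the actual kernel
change: model_opal_error_code() and model_get_sensor_data() are
hypothetical stand-ins for opal_error_code() and opal_get_sensor_data(),
the OPAL_* values are assumed to match opal-api.h, and the errno mappings
are illustrative. It returns -EIO for OPAL_WRONG_STATE; whether -EAGAIN
fits better for a core that is only sleeping is still open, as discussed
above.

```c
#include <errno.h>
#include <stdio.h>

/* OPAL return codes; values assumed to match the kernel's opal-api.h. */
#define OPAL_SUCCESS		0
#define OPAL_PARAMETER		-1
#define OPAL_BUSY		-2
#define OPAL_HARDWARE		-6
#define OPAL_WRONG_STATE	-14

/*
 * Hypothetical model of the generic OPAL-to-errno translation: any code
 * it does not know about is logged as unexpected and mapped to -EIO.
 */
static int model_opal_error_code(int rc)
{
	switch (rc) {
	case OPAL_SUCCESS:
		return 0;
	case OPAL_PARAMETER:
		return -EINVAL;
	case OPAL_BUSY:
		return -EBUSY;
	case OPAL_HARDWARE:
		return -EIO;
	default:
		fprintf(stderr, "Unexpected OPAL error %d\n", rc);
		return -EIO;
	}
}

/*
 * Hypothetical model of the sensor read path. opal_rc is the value the
 * firmware call returned.
 */
static int model_get_sensor_data(int opal_rc)
{
	/*
	 * We might be asked to read a sensor on an offline or sleeping
	 * core, and we aren't able to detect that beforehand. Treat
	 * OPAL_WRONG_STATE as an expected failure and return quietly
	 * instead of logging "Unexpected OPAL error".
	 */
	if (opal_rc == OPAL_WRONG_STATE)
		return -EIO;

	return model_opal_error_code(opal_rc);
}
```

With this, model_get_sensor_data(OPAL_WRONG_STATE) fails without touching
the log, while any genuinely unknown code still goes through the generic
path and is reported. Keeping the special case in the sensor path only
means other OPAL callers that see OPAL_WRONG_STATE are still flagged.
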
--
Stewart Smith
OPAL Architect, IBM.