public inbox for git@vger.kernel.org 
From: Patrick Steinhardt <ps@pks•im>
To: Joey Hess <id@joeyh•name>
Cc: git@vger•kernel.org
Subject: Re: infelicities in git hash-object --stdin-paths with special characters
Date: Thu, 5 Dec 2024 10:44:55 +0100
Message-ID: <Z1F2F58coUV7hpak@pks.im>
In-Reply-To: <Z03xM9AvbUpqXpkI@kitenet.net>

On Mon, Dec 02, 2024 at 01:41:07PM -0400, Joey Hess wrote:
> Apparently "Icon\r" is a common filename on OSX, anyway it's a legal
> unix filename. It seems that sending a line containing that filename to
> git hash-object --stdin-paths triggers some DOS-style CRLF handling.
> Here I am running git version 2.45.2 on Linux.
> 
> $ touch Icon^M
> $ printf 'Icon\r\n' | git hash-object --stdin-paths
> fatal: could not open 'Icon' for reading: No such file or directory
> 
> $ echo 'wrong file!' > Icon
> $ printf 'Icon\r\n' | git hash-object --stdin-paths
> 1c43b74a7787621318ee7442eb5a36e32476f326
> 
> While looking at builtin/hash-object.c to see why it might do this, I quickly
> noticed another odd behavior:
> 
> $ touch '"foo"'
> $ printf '"foo"\n' | git hash-object --stdin-paths
> fatal: could not open 'foo' for reading: No such file or directory
> 
> $ touch '"foo'
> $ printf '"foo\n' | git hash-object --stdin-paths
> fatal: line is badly quoted
> 
> The documentation does not seem to mention that quoted lines in
> --stdin-paths are at all special. Of course, quoting would be one way to
> work around the CRLF problem, if it were documented.

Indeed -- the documentation does not mention quoting at all, but we do
use `unquote_c_style()` to parse any line that starts with a double
quote. So a path that fails when passed verbatim works once it is quoted:

    $ echo foobar >"$(printf 'something\n\rsomething')"
    $ printf 'something\n\rsomething' | git hash-object --stdin-paths
    fatal: could not open 'something' for reading: No such file or directory
    $ printf '"something\\n\\rsomething"' | git hash-object --stdin-paths
    323fae03f4606ea9991df8befbb2fca795e648fa

Note that you have to write both "\n" and "\r" as C-style escapes inside
the double quotes; Git then unquotes the path for you. This really needs
documentation, though.
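To make these rules concrete, here is a rough Python model of how one line of `--stdin-paths` input appears to become a path. It rests on two assumptions. First, the trailing LF is dropped together with a CR before it, which is what bites "Icon\r". Second, a line that starts with a double quote is C-unquoted and otherwise rejected with "line is badly quoted".

This is a sketch under those assumptions, not Git's code. The escape table is assumed to mirror what `unquote_c_style()` accepts: \a \b \f \n \r \t \v \\ \" and three-digit octal. The helpers `unquote_path()`, `parse_stdin_paths_line()` and `quote_path_for_stdin()` are hypothetical names.

```python
# Rough model (not Git's source) of how `git hash-object --stdin-paths`
# appears to turn one input line into a path, plus a hypothetical encoder
# that produces lines this model decodes back to the original bytes.

# Escapes assumed to be accepted inside "..." (mirroring unquote_c_style()).
_UNESCAPE = {
    ord("a"): 0x07, ord("b"): 0x08, ord("f"): 0x0C, ord("n"): 0x0A,
    ord("r"): 0x0D, ord("t"): 0x09, ord("v"): 0x0B,
    ord("\\"): ord("\\"), ord('"'): ord('"'),
}
_ESCAPE = {raw: letter for letter, raw in _UNESCAPE.items()}


def unquote_path(quoted: bytes) -> bytes:
    """Decode a C-style quoted path; raise ValueError if it is malformed."""
    if not quoted.startswith(b'"'):
        raise ValueError("path is not quoted")
    out = bytearray()
    i = 1
    while i < len(quoted):
        c = quoted[i]
        i += 1
        if c == ord('"'):
            return bytes(out)  # model: stop at the closing quote
        if c != ord("\\"):
            out.append(c)
            continue
        if i >= len(quoted):
            break  # backslash at the very end of the line
        e = quoted[i]
        i += 1
        if e in _UNESCAPE:
            out.append(_UNESCAPE[e])
        elif ord("0") <= e <= ord("3"):
            digits = quoted[i - 1:i + 2]  # e.g. b"015" for a CR
            if len(digits) != 3 or any(d < ord("0") or d > ord("7") for d in digits):
                break
            out.append(int(digits, 8))
            i += 2
        else:
            break  # unknown escape such as \x
    raise ValueError("line is badly quoted")


def parse_stdin_paths_line(line: bytes) -> bytes:
    """Model the path hash-object would open for one raw stdin line."""
    if line.endswith(b"\n"):
        line = line[:-1]
        # Assumed strbuf_getline() behaviour: a CR before the LF is also
        # dropped, which is why "Icon\r\n" ends up as "Icon".
        if line.endswith(b"\r"):
            line = line[:-1]
    if line.startswith(b'"'):
        return unquote_path(line)
    return line


def quote_path_for_stdin(path: bytes) -> bytes:
    """Hypothetical encoder: quote a path so the decoder above restores it."""
    out = bytearray(b'"')
    for b in path:
        if b in _ESCAPE:
            out += b"\\" + bytes([_ESCAPE[b]])
        elif b < 0x20 or b == 0x7F:
            out += b"\\%03o" % b  # other control bytes as 3-digit octal
        else:
            out.append(b)  # everything else, including non-ASCII, verbatim
    out += b'"'
    return bytes(out)
```

A caller would then write `quote_path_for_stdin(p) + b"\n"` for every path to the standard input of `git hash-object --stdin-paths`. Quoting every path unconditionally keeps the caller simple. Under this model, only paths that start with a double quote, end in CR, or contain LF strictly need quoting.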

> It seems that some parts of git that read filenames from stdin use
> strbuf_getline_lf and others use strbuf_getdelim_strip_crlf. There does
> not seem to be any consistency, and my impression is any user is best
> off using -z, when the command supports it, to avoid the mess.
> 
> Given all that, maybe adding -z to hash-object would be a good "fix".

I think this is a good idea regardless of whether we document the
quoting behaviour or not. It is way easier for programs to separate
paths with NUL characters than to implement the quoting rules used by Git.
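To illustrate the framing argument: NUL is the one byte a path cannot contain, so "Icon\r", "something\n\rsomething" and '"foo"' can all be passed as NUL-terminated records without any quoting or CR handling. The Python sketch below uses hypothetical helpers `frame_nul()`, `split_nul()` and `split_newlines()`. It only shows the framing idea and does not assume that `git hash-object` already accepts `-z`.

```python
# Illustration of NUL-terminated framing for path lists, compared with
# naive LF framing. Helper names are hypothetical; no Git option is assumed.
from typing import List


def frame_nul(paths: List[bytes]) -> bytes:
    """Terminate each path with NUL, the one byte a path cannot hold."""
    for p in paths:
        if b"\0" in p:
            raise ValueError("a path cannot contain NUL")
    return b"".join(p + b"\0" for p in paths)


def split_nul(data: bytes) -> List[bytes]:
    """Inverse of frame_nul(); no escaping or CR handling is needed."""
    if not data:
        return []
    if not data.endswith(b"\0"):
        raise ValueError("last record is not NUL-terminated")
    return data.split(b"\0")[:-1]


def split_newlines(data: bytes) -> List[bytes]:
    """Naive LF framing for comparison: breaks on paths containing LF."""
    return data.split(b"\n")[:-1] if data.endswith(b"\n") else data.split(b"\n")
```

With LF framing, the path "something\n\rsomething" is split into two records. With NUL framing, all three example paths round-trip unchanged.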

Patrick
