From: "Shawn O. Pearce" <spearce@spearce•org>
To: Marc Strapetz <marc.strapetz@syntevo•com>
Cc: EGit developer discussion <egit-dev@eclipse•org>,
	git@vger•kernel.org, robin.rosenberg@dewire•com
Subject: Re: [egit-dev] Re: jgit problems for file paths with non-ASCII characters
Date: Thu, 26 Nov 2009 12:03:35 -0800
Message-ID: <20091126200335.GW11919@spearce.org>
In-Reply-To: <4B0E8FF2.8040206@syntevo.com>

Marc Strapetz <marc.strapetz@syntevo•com> wrote:
> > We should try to work harder with the git-core folks to get character
> > set encoding for file names worked out.  We might be able to use a
> > configuration setting in the repository to tell us what the proper
> > encoding should be, and if not set, assume UTF-8.
> 
> I agree that this should be the ultimate goal, though the default should
> better be "system encoding" for compatibility with current git
> repositories and instead have newer git versions always set encoding to
> UTF-8. Thus, for our jgit clone I've introduced a system property to
> configure Constants.PATH_ENCODING set to system encoding. It's used by
> PathFilter and this resolves my original problem.

That's probably a good point.  Using the system encoding on a
repository may produce file names that are more compatible with
git-core.  But we probably don't want the encoding to be a single
JVM-wide constant.  We probably need to support a per-repository
configuration of the path name encoding, so that we can eventually
move to an encoding that isn't platform specific.
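
A rough sketch of what a per-repository lookup might look like.
Everything here is an assumption for illustration: the key name
"i18n.pathEncoding" is hypothetical (neither git-core nor JGit
defines it), a plain Map stands in for the repository's config,
and the fallback is passed in by the caller, so the sketch takes
no side on whether the default should be UTF-8 or the system
encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Collections;
import java.util.Map;

/** Sketch: choose the path name encoding per repository, not per JVM. */
final class PathEncodingResolver {
    /** Hypothetical config key, used only for this illustration. */
    static final String KEY = "i18n.pathEncoding";

    /**
     * @param repoConfig stand-in for one repository's configuration
     * @param fallback   charset to use when the key is unset or unusable
     * @return the charset to use for path names in that repository
     */
    static Charset resolve(Map<String, String> repoConfig, Charset fallback) {
        String name = repoConfig.get(KEY);
        if (name == null || name.trim().isEmpty()) {
            return fallback;
        }
        try {
            return Charset.forName(name.trim());
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // Malformed or unknown charset name: fall back instead of failing.
            return fallback;
        }
    }

    public static void main(String[] args) {
        Map<String, String> legacy = Collections.singletonMap(KEY, "ISO-8859-1");
        Map<String, String> unset = Collections.emptyMap();
        // Two repositories in the same JVM can resolve to different encodings.
        System.out.println(resolve(legacy, StandardCharsets.UTF_8));
        System.out.println(resolve(unset, Charset.defaultCharset()));
    }
}
```

The point is only that the charset is resolved from something
owned by the repository, so two repositories open in the same JVM
need not agree on it.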

> I have tried to switch more usages from Constants.CHARACTER_ENCODING to
> Constants.PATH_ENCODING, but ended up in confusion due to my lack of
> understanding: primarily because I couldn't tell anymore whether encoded
> strings were file names or not.

Heh.  Yea.  There are a number of file name encoding sites.  I think
they include everything in the treewalk package, as well as the
GitIndex, Tree and DirCache* classes, plus the Patch class and its
FileHeader friend.

> Does it make sense to explicitly
> distinguish encoding usages in that way? We could try to contribute here
> (and hopefully cause less review effort to jgit developers than the
> changes itself are worth ;-)

Yes, it does, because we eventually need to support encodings
other than the UTF-8 we currently assume for file names, especially
if a repository is using the local filesystem encoding and that
isn't UTF-8.
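
To make the failure mode concrete, here is a small self-contained
sketch.  It does not use any JGit class; the class name and the
sample file name are made up for illustration.  It shows that the
same name produces different bytes under ISO-8859-1 and UTF-8, and
that bytes written in ISO-8859-1 don't survive being decoded as
UTF-8 and encoded again.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Sketch: why assuming UTF-8 for every path name loses information. */
final class PathBytesDemo {
    /** Decode path bytes as UTF-8, re-encode, and check the bytes survive. */
    static boolean survivesUtf8RoundTrip(byte[] pathBytes) {
        // Malformed input is replaced with U+FFFD during decoding.
        String decoded = new String(pathBytes, StandardCharsets.UTF_8);
        return Arrays.equals(pathBytes, decoded.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String name = "\u00fcber.txt"; // u-umlaut followed by "ber.txt"
        byte[] latin1 = name.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);

        // The umlaut is one byte in ISO-8859-1 and two bytes in UTF-8.
        System.out.println(latin1.length + " bytes vs " + utf8.length + " bytes");
        System.out.println("UTF-8 bytes round-trip:      " + survivesUtf8RoundTrip(utf8));
        System.out.println("ISO-8859-1 bytes round-trip: " + survivesUtf8RoundTrip(latin1));
    }
}
```

A path compared or written back after such a round trip no longer
matches the bytes stored in the tree, which is why each encoding
site needs to know which encoding applies.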

-- 
Shawn.

