public inbox for git@vger.kernel.org 
From: "Shawn O. Pearce" <spearce@spearce•org>
To: EGit developer discussion <egit-dev@eclipse•org>
Cc: Marc Strapetz <marc.strapetz@syntevo•com>, git@vger•kernel.org
Subject: Re: [egit-dev] Re: jgit problems for file paths with non-ASCII characters
Date: Wed, 25 Nov 2009 16:54:23 -0800
Message-ID: <20091126005423.GM11919@spearce.org>
In-Reply-To: <200911252211.55137.robin.rosenberg@dewire.com>

Robin Rosenberg <robin.rosenberg@dewire•com> wrote:
> On Wednesday 25 November 2009 14:47:25, Marc Strapetz wrote:
> > I have noticed that jgit converts file paths to UTF-8 when querying the
> > repository.
...
> > Is this a bug or a misconfiguration of my repository? I'm using jgit
> > (commit e16af839e8a0cc01c52d3648d2d28e4cb915f80f) on Windows.
> 
> A bug. 
> 
> The problem here is that we need to allow multiple encodings since there
> is no reliable encoding specified anywhere.

This is a design fault of both Linux and git.  git gets a byte
sequence from readdir and stores that as-is into the repository.
We have no way of knowing what that encoding is.  So now everyone
touching a Git repository is screwed.

> The approach I advocate is
> the one we use for handling encoding in general, i.e. if it looks like UTF-8,
> treat it as UTF-8, else fall back. This is expensive, however,

We should try to work harder with the git-core folks to get character
set encoding for file names worked out.  We might be able to use a
configuration setting in the repository to tell us what the proper
encoding should be, and if not set, assume UTF-8.
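The "looks like UTF-8, else fall back" idea can be sketched roughly as
follows. This is only an illustration, not jgit's actual code: the
class name PathDecoder, the method decodePath and the fallback
parameter are hypothetical, and the fallback charset stands in for
whatever a repository configuration setting might supply.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of jgit.
final class PathDecoder {
    // Decode the raw path bytes as strict UTF-8; if they are not valid
    // UTF-8, decode them with the caller-supplied fallback charset.
    static String decodePath(byte[] raw, Charset fallback) {
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strict.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException notUtf8) {
            // Assumption: the fallback would come from configuration
            // or from the platform default.
            return new String(raw, fallback);
        }
    }
}
```

The strict decoder is what makes the check meaningful: the default
String constructor would silently substitute replacement characters
instead of reporting malformed input. A byte sequence in another
encoding can still happen to be valid UTF-8, so this remains a guess.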

> and then we have
> all the other issues with case-insensitive names and the funny property that
> Unicode has when it allows characters to be encoded using multiple sequences
> of code points, as employed by Apple.

But as you said, this still doesn't make the Apple normal form
any easier.  Though if we know we are on such a strange filesystem
we might be able to assume the paths in the repository are equally
damaged.  Or not.
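
To illustrate the normal form problem: the same visible name can be
stored as different code point sequences, so a plain string compare
fails. A minimal sketch using the JDK's java.text.Normalizer follows;
the class name PathNormalization and the method samePath are
hypothetical, and choosing NFC as the comparison form is an assumption
made here, not something decided in this thread. Apple's filesystem
form is a decomposed variant that may not match standard NFD exactly.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of jgit.
final class PathNormalization {
    // Compare two path names after bringing both to NFC, so that
    // precomposed and decomposed spellings compare as equal.
    static boolean samePath(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }
}
```

For example, "\u00e4" (precomposed) and "a\u0308" (letter plus
combining diaeresis) are unequal as raw strings but compare as the
same path here.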

-- 
Shawn.

