public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Jakub Narebski <jnareb@gmail•com>
To: "Ian Clarke" <ian.clarke@gmail•com>
Cc: git@vger•kernel.org
Subject: Re: A better approach to diffing and merging
Date: Mon, 01 Dec 2008 03:41:38 -0800 (PST)	[thread overview]
Message-ID: <m3y6z0i0mu.fsf@localhost.localdomain> (raw)
In-Reply-To: <823242bd0811291012g15c4d442qa5d7afc9cc762b20@mail.gmail.com>

"Ian Clarke" <ian.clarke@gmail•com> writes:

> Apologies if this is off-topic, but I recently had an idea for a
> better way to do diffs and merging which I thought may be of interest
> to those on this list.

[...]
> While I'm no merging expert, it seems that most merging algorithms do
> it on a line-by-line basis, treating source code as nothing but a list
> of lines of text.  It got me thinking, what if the merging algorithm
> understood the structure of the source code it is trying to merge?
> 
> So the idea is this:
> 
> Provide the merge algorithm with the grammar of the programming
> language, perhaps in the form of a Bison grammar file, or some other
> standardized way to represent a grammar.
> 
> The merge algorithm then uses this to parse the files to be diffed
> and/or merged into trees, and then the diff and merge are treated as
> operations on these trees.  These operations may include creating,
> deleting, or moving nodes or branches, renaming nodes, etc.  There has
> been quite a bit (pdf) of academic research on this topic, although I
> haven't yet found off-the-shelf code that will do what we need.

First, as Brian Dessent said it would be hard to generate parse tree
in the presence of compile-time configuration (using preprocessor
in C/C++, but in principle this applies to programs in any language;
not only you have to know conditionals, but also compile options).
And for dynamic languages you would have to take care about
self-modifying programs.

Second, from what I understand we have _good_, established algorithms
for merging sequences (which includes sequence of lines, or sequence
of words), and for merging special kinds of trees that are
representations of directory structure.  I haven't read link to
mentioned research, but I think that it is still unproven research,
and not something well established and well tested.

Third, it would require embedding knowledge about various programming
languages (including C, shell, Perl, TeX) and document formats
(including XML, HTML, AsciiDoc) in version control system...

> Still, it shouldn't be terribly hard to implement.

So, try to provide us with some proof-of-concept patches, then.
-- 
Jakub Narebski
Poland
ShadeHawk on #git

  parent reply	other threads:[~2008-12-01 11:42 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-11-29 18:12 A better approach to diffing and merging Ian Clarke
2008-11-29 23:40 ` Boyd Stephen Smith Jr.
2008-11-30  2:54   ` Miklos Vajna
2008-11-30  1:56 ` Brian Dessent
2008-12-01  9:54   ` Karl Hasselström
2008-12-01 11:41 ` Jakub Narebski [this message]
2008-12-02  8:37   ` Karl Hasselström

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m3y6z0i0mu.fsf@localhost.localdomain \
    --to=jnareb@gmail$(echo .)com \
    --cc=git@vger$(echo .)kernel.org \
    --cc=ian.clarke@gmail$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox