From: Simon Richter <Simon.Richter@hogyros•de>
To: Cedric Sodhi <manday@openmail•cc>, git@vger•kernel.org
Subject: Re: Git for structured data
Date: Sun, 7 Dec 2025 14:26:46 +0900
Message-ID: <2ae0a2d5-e909-4c51-9459-83f5c6950d51@hogyros.de>
In-Reply-To: <aTMNdQ_NHTVPtwG8@air>
Hi,
On 12/6/25 01:51, Cedric Sodhi wrote:
> Why can't we have structured, version controlled data?
You can version control inside a relational database by adding valid
time columns with a range-between-timestamps type and a constraint to
disallow overlaps. There are good indexing techniques for this; the first
that springs to mind is [1], but I'm fairly sure there are others, and a
modern RDBMS should provide constraints on range types.
A valid time column can encode either "time at which the data is valid"
or "time at which the data was current in the database". With two
columns, you can encode both at the same time.
If you hide the "data is current within" column behind a view and
automatically update it, this creates the historical log of when an
entry was updated.
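A minimal sketch of the "hidden behind a view" idea, using Python's
sqlite3 module so that it is self-contained. All table, view, and
trigger names here are hypothetical. SQLite has no range types, so a
partial unique index (at most one open row per item) stands in for the
overlap constraint that an RDBMS with range types could express directly.

```python
import sqlite3

SCHEMA = """
CREATE TABLE item_history (
    item_id      INTEGER NOT NULL,
    value        TEXT    NOT NULL,
    current_from TEXT    NOT NULL,
    current_to   TEXT              -- NULL means "still current"
);

-- Assumption: stand-in for a range-overlap constraint, which SQLite lacks.
-- It only guarantees that each item has at most one open (current) row.
CREATE UNIQUE INDEX one_current_row ON item_history(item_id)
    WHERE current_to IS NULL;

-- Users only ever see and modify this view; the time columns stay hidden.
CREATE VIEW item AS
    SELECT item_id, value FROM item_history WHERE current_to IS NULL;

CREATE TRIGGER item_insert INSTEAD OF INSERT ON item
BEGIN
    INSERT INTO item_history VALUES
        (NEW.item_id, NEW.value, strftime('%Y-%m-%d %H:%M:%f', 'now'), NULL);
END;

-- An update closes the current row and opens a new one, so the old
-- value stays in the table as history.
CREATE TRIGGER item_update INSTEAD OF UPDATE ON item
BEGIN
    UPDATE item_history
       SET current_to = strftime('%Y-%m-%d %H:%M:%f', 'now')
     WHERE item_id = OLD.item_id AND current_to IS NULL;
    INSERT INTO item_history VALUES
        (OLD.item_id, NEW.value, strftime('%Y-%m-%d %H:%M:%f', 'now'), NULL);
END;
"""


def demo():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO item VALUES (1, 'draft')")
    db.execute("UPDATE item SET value = 'final' WHERE item_id = 1")
    current = db.execute("SELECT value FROM item").fetchall()
    history = db.execute(
        "SELECT value, current_to IS NULL FROM item_history ORDER BY rowid"
    ).fetchall()
    return current, history
```

Calling demo() returns the single current row from the view plus both
rows from the history table, the first of which has been closed. A
second pair of columns maintained by the application could carry the
"time at which the data is valid" in the same way.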
Tracking arbitrary data in git is, of course, also possible, but
requires diff/merge tools adequate for the data. The built-in tools are
adequate for the main use case, text files that usually change on a
line-by-line basis and are seldom reorganized as a whole, so we can
pretend they are one-dimensional.
In KiCad, the files we generate describe a three-dimensional structure.
No matter how we normalize the file contents, elements can only be moved
on one axis without requiring us to move them to a different position in
the file.
So if I sort by z,y,x, then moving an object to a different z coordinate
likely results in "deletion" of the old object at the existing place
and "creation" of a new object at a different place in the file. The
one-dimensional diff algorithm is unable to create a minimal diff here
that shows that only the z coordinate changed.
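The effect can be reproduced with Python's difflib on a made-up
one-line-per-object format. The serialization below is an assumption
for illustration only and is not KiCad's actual file format.

```python
import difflib


def serialize(objects):
    # Hypothetical format: one line per object, sorted by (z, y, x).
    ordered = sorted(objects.items(), key=lambda kv: kv[1][::-1])
    return [f"(object {name} (at {x} {y} {z}))" for name, (x, y, z) in ordered]


def line_diff(a, b):
    # Report only the non-equal opcodes of a plain line-based comparison.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


before = {"R1": (0, 0, 0), "C1": (5, 0, 1), "U1": (2, 3, 2)}
after = dict(before, R1=(0, 0, 3))  # only R1's z coordinate changes

changes = line_diff(serialize(before), serialize(after))
```

Here changes holds a "delete" of the R1 line at the top of the file and
a separate "insert" of the new R1 line at the bottom, with nothing that
connects the two.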
Not sorting (i.e. leaving elements in creation order) means that
deleting and recreating an object with the same parameters causes it to
move within the file.
The solution is to treat the serialized representation as just that, a
serialization, and not try to interpret order in any meaningful way, but
this requires dedicated diff/patch tools and heuristics that guess
whether deleting and creating similar objects constitutes a move or if
the objects are unrelated, same as git does in its move detection.
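A rough sketch of such a tool, assuming objects carry no stable
identifier. The Obj type, the field-equality similarity measure, and the
0.6 threshold are all made up for illustration; a real tool would need a
domain-specific notion of similarity.

```python
from collections import Counter
from typing import NamedTuple


class Obj(NamedTuple):
    kind: str
    value: str
    x: int
    y: int
    z: int


def similarity(a, b):
    # Fraction of fields that are identical (hypothetical heuristic).
    return sum(p == q for p, q in zip(a, b)) / len(a)


def structural_diff(before, after, threshold=0.6):
    # Compare as multisets, so the order in the file does not matter.
    removed = list((Counter(before) - Counter(after)).elements())
    added = list((Counter(after) - Counter(before)).elements())
    changes = []
    for old in removed:
        best = max(added, key=lambda new: similarity(old, new), default=None)
        if best is not None and similarity(old, best) >= threshold:
            # Similar enough: guess that this is the same object, modified.
            added.remove(best)
            fields = [f for f, p, q in zip(Obj._fields, old, best) if p != q]
            changes.append(("modified", old, best, fields))
        else:
            changes.append(("deleted", old, None, []))
    changes.extend(("created", None, new, []) for new in added)
    return changes


r_old = Obj("resistor", "10k", 0, 0, 0)
r_new = Obj("resistor", "10k", 0, 0, 3)
cap = Obj("capacitor", "100n", 5, 0, 1)
via = Obj("via", "", 9, 9, 0)

result = structural_diff([r_old, cap], [cap, r_new, via])
```

With this input, result reports the resistor as "modified" with only the
z field changed and the via as "created", regardless of where either
object sits in the serialized file. The greedy pairing is a guess, just
like the threshold, and can pair unrelated objects that happen to look
alike.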
I think that diff/merge on relational data is more difficult than
expressing history inside the relational tables. For other data
structures, this may be different, and git might be a viable storage
method for history -- but in any case it requires the effort to build an
appropriate plug-in.
Simon
[1] https://link.springer.com/chapter/10.1007/BFb0054512