* clean/smudge filters for pdf files
From: Leo Razoumov @ 2008-10-23 19:44 UTC
To: git

I am trying to improve storage efficiency for PDF files in a git repo.
Following earlier discussions on this list, I am trying to set up
proper clean/smudge filters. What follows is my current setup:

# in ~/.gitconfig
[filter "pdf"]
    clean = "pdftk - output - uncompress"
    smudge = "pdftk - output - compress"

# in .gitattributes
*.pdf filter=pdf

Unfortunately, it seems that pdftk uncompress followed by pdftk
compress does not leave the file invariant. I tried several
uncompress+compress iterations and the file still keeps changing
(though the size stays the same).
Is there another way to store PDF files in a git repo more
efficiently?
Is there any alternative to pdftk on Linux?

--Leo--

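The property the setup above relies on can be tested directly: once a blob has been cleaned, smudging it and cleaning it again should give back the same bytes, otherwise git will keep seeing the file as modified. The following is a minimal sketch of such a check, not something from the thread. The names `shell_filter` and `first_unstable_round` are hypothetical, and the sketch assumes each filter is a command that reads stdin and writes stdout, as the `pdftk - output - ...` commands in the post do.

```python
import subprocess
from typing import Callable

# A filter maps bytes to bytes, like a git clean or smudge command.
Filter = Callable[[bytes], bytes]


def shell_filter(command: str) -> Filter:
    """Wrap a shell command that reads stdin and writes stdout (hypothetical helper)."""
    def run(data: bytes) -> bytes:
        result = subprocess.run(command, shell=True, input=data,
                                stdout=subprocess.PIPE, check=True)
        return result.stdout
    return run


def first_unstable_round(clean: Filter, smudge: Filter,
                         blob: bytes, rounds: int = 3) -> int:
    """Return 0 if clean(smudge(stored)) keeps reproducing the stored bytes,
    otherwise the 1-based round in which the bytes first changed."""
    stored = clean(blob)  # what would be stored in the repository
    for i in range(1, rounds + 1):
        again = clean(smudge(stored))  # simulate checkout followed by re-add
        if again != stored:
            return i
        stored = again
    return 0
```

With pdftk installed, the pair from the post could be checked as `first_unstable_round(shell_filter("pdftk - output - uncompress"), shell_filter("pdftk - output - compress"), open("some.pdf", "rb").read())`, where `some.pdf` is a placeholder file name. A nonzero result would correspond to the instability described above.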
* Re: clean/smudge filters for pdf files
From: Pierre Habouzit @ 2008-10-23 21:32 UTC
To: Leo Razoumov; +Cc: git

On Thu, Oct 23, 2008 at 07:44:39PM +0000, Leo Razoumov wrote:
> I am trying to improve storage efficiency for PDF files in a git repo.
> Following earlier discussions in this list I am trying to set up
> proper clean/smudge filters. What follows is my current setup
>
> # in ~/.gitconfig
> [filter "pdf"]
>     clean = "pdftk - output - uncompress"
>     smudge = "pdftk - output - compress"
>
> # in .gitattributes
> *.pdf filter=pdf
>
> Unfortunately, it seems as though that pdftk uncompress followed by
> pdftk compress do not leave the file invariant. I tried several
> uncompress+compress iterations and the file still keep changing (the
> size though stays the same).
> Is there any other alternative way to store PDF files in git repo more
> efficiently?
> Any alternative to pdftk on Linux?

Actually, it uses some kind of zlib algorithm, so it's pretty normal
that you don't get the same result with a packer. Maybe one could write
a tool like pristine-tar for that purpose.

-- 
·O· Pierre Habouzit
··O madcoder@debian•org
OOO http://www.madism.org

* Re: clean/smudge filters for pdf files
From: Leo Razoumov @ 2008-10-24 1:40 UTC
To: Pierre Habouzit; +Cc: git

On 10/23/08, Pierre Habouzit <madcoder@debian•org> wrote:
> On Thu, Oct 23, 2008 at 07:44:39PM +0000, Leo Razoumov wrote:
> > I am trying to improve storage efficiency for PDF files in a git repo.
> > Following earlier discussions in this list I am trying to set up
> > proper clean/smudge filters. What follows is my current setup
> >
> > # in ~/.gitconfig
> > [filter "pdf"]
> >     clean = "pdftk - output - uncompress"
> >     smudge = "pdftk - output - compress"
> >
> > # in .gitattributes
> > *.pdf filter=pdf
> >
> > Unfortunately, it seems as though that pdftk uncompress followed by
> > pdftk compress do not leave the file invariant. I tried several
> > uncompress+compress iterations and the file still keep changing (the
> > size though stays the same).
> > Is there any other alternative way to store PDF files in git repo more
> > efficiently?
> > Any alternative to pdftk on Linux?
>
> actually it uses some kind of zlib algorithm so that's pretty normal you
> don't have the same result with a packer. Maybe one could write a tool
> like pristine-tar for that purpose.
>

With zlib you get the same deterministic result as long as you use the
same zlib packer and unpacker. With pdftk, compress and uncompress do
not seem to form a bijective pair. This issue was briefly discussed on
this list back in April 2008, but no resolution emerged.

--Leo--

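Both points in this exchange can be illustrated with Python's standard `zlib` module. The sketch below is only an illustration under stated assumptions: the payload is an arbitrary made-up byte string, and the levels 1 and 9 are chosen for the demonstration. Nothing here says which settings pdftk or the original PDF producer actually use.

```python
import zlib

# Arbitrary sample payload (an assumption for illustration only).
payload = b"BT /F1 12 Tf (Hello, PDF) Tj ET\n" * 200

packed_fast = zlib.compress(payload, 1)  # stand-in for the original producer
packed_best = zlib.compress(payload, 9)  # stand-in for a different packer setting

# Same packer, same settings: compressing again yields identical bytes.
same_settings_match = zlib.compress(payload, 9) == packed_best

# Either stream decompresses to the same payload, so no content is lost.
both_decode = (zlib.decompress(packed_fast) == payload
               and zlib.decompress(packed_best) == payload)

# Repacking with other settings does not reproduce the original stream;
# the 2-byte zlib header alone already encodes a different level hint.
repack_differs = zlib.compress(zlib.decompress(packed_fast), 9) != packed_fast
```

So decompressing is lossless and recompressing is deterministic for a fixed packer, yet a round trip only restores the original bytes if the recompression uses the same packer and settings that produced them.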
* Re: clean/smudge filters for pdf files
From: Michael J Gruber @ 2008-10-24 8:10 UTC
To: SLONIK.AZ; +Cc: Pierre Habouzit, git

Leo Razoumov venit, vidit, dixit 24.10.2008 03:40:
> On 10/23/08, Pierre Habouzit <madcoder@debian•org> wrote:
>> On Thu, Oct 23, 2008 at 07:44:39PM +0000, Leo Razoumov wrote:
>> > I am trying to improve storage efficiency for PDF files in a git repo.
>> > Following earlier discussions in this list I am trying to set up
>> > proper clean/smudge filters. What follows is my current setup
>> >
>> > # in ~/.gitconfig
>> > [filter "pdf"]
>> >     clean = "pdftk - output - uncompress"
>> >     smudge = "pdftk - output - compress"
>> >
>> > # in .gitattributes
>> > *.pdf filter=pdf
>> >
>> > Unfortunately, it seems as though that pdftk uncompress followed by
>> > pdftk compress do not leave the file invariant. I tried several
>> > uncompress+compress iterations and the file still keep changing (the
>> > size though stays the same).
>> > Is there any other alternative way to store PDF files in git repo more
>> > efficiently?
>> > Any alternative to pdftk on Linux?
>>
>> actually it uses some kind of zlib algorithm so that's pretty normal you
>> don't have the same result with a packer. Maybe one could write a tool
>> like pristine-tar for that purpose.
>>
>
> With zlib you get the same deterministic result as long as you use the
> same zlib packer and unpacker. With pdftk compress/uncompress seem not
> to form a bijection pair. This issue was briefly discussed on this
> list back in April 2008 but no resolution emerged.

For a different file format I use the pair "gzip -c, gunzip -c" without
any problems, so zlib is not a problem. I do see the effect that
checkouts on different machines may have different compressed files
(same gzip version), but this is a non-issue.

Your experience with pdftk confirms mine. It shuffles things around
because it parses the files into objects and then writes them out
again in a possibly different order. This is no problem for PDF because
it uses "pointers" (it's a bijection up to reordering), but it's a
weird design, and it complicates things for us.

I'm still looking for something viable. I'll let the list know when
I've found something...

Michael

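The "bijection up to reordering" remark can be made concrete with a toy model. The sketch below is not a PDF parser and does not reflect how pdftk works internally. It assumes a hypothetical container of `(id, payload)` objects that refer to each other by id, and the names `serialize` and `canonical` are made up for this illustration.

```python
from typing import List, Tuple

# Toy stand-in for an indirect object: (object id, payload bytes).
Obj = Tuple[int, bytes]


def serialize(objects: List[Obj]) -> bytes:
    """Write objects in the order given; references are by id, not by position."""
    return b"".join(b"%d obj\n%s\nendobj\n" % (oid, body) for oid, body in objects)


def canonical(objects: List[Obj]) -> bytes:
    """Serialize in a fixed order (sorted by id) so that reordering cancels out."""
    return serialize(sorted(objects))


# The same two objects written in two different orders.
written_once = [(1, b"<< /Kids [2 0 R] >>"), (2, b"<< /Type /Page >>")]
written_again = list(reversed(written_once))

bytes_differ = serialize(written_once) != serialize(written_again)
canonical_equal = canonical(written_once) == canonical(written_again)
```

In this model the two outputs differ byte for byte while describing the same content, and a filter would only be stable if it always wrote objects in some fixed order. Whether a real tool can be made to do that for PDF is exactly the open question in this thread.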
* Re: clean/smudge filters for pdf files
From: Michael J Gruber @ 2008-10-24 8:44 UTC
To: SLONIK.AZ; +Cc: Pierre Habouzit, git

A small addition to my previous reply: Multivalent apparently almost
gets there. After 2 iterations most of the uncompressed file is
stable, except for some binary blob at the end. Alas, it's Java and
not even completely open source.

Michael
