Git Plumbing in Action

2022-06-02

LANG LIU

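What's Git

Questions Haunting My Mind

  • What is the staging area

  • What the file states mean: tracked & untracked, modified & unmodified

  • Why switching branches is so fast

  • How git reflog works, and what if the reflog data is gone too

  • Whether each git commit stores a delta or a full snapshot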

Pro Git

Again, What's Git

  • Plain speak: Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency

  • Jargon: Git is a content-addressable filesystem

Subcommands

  • Porcelain commands: add, commit, checkout, branch ...

      • a full user-friendly VCS

  • Plumbing commands: cat-file, hash-object, update-index, write-tree, commit-tree ...

      • a toolkit for a version control system

      • designed to be chained together UNIX-style or called from scripts

Today's work

echo "test content 1" > test1.txt
echo "test content 2" > test2.txt
git add .
git commit -m "first commit"

echo "test content 1 modified" > test1.txt
git add .
git commit -m "second commit"

Cheatsheet 1 - git data types

Seq  Type    How to create
1    blob    git hash-object
2    tree    git update-index + git write-tree
3    commit  git commit-tree
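
Cheatsheet 2 - git data structure
  • data storage: find .git/objects -type f

  • file content = zlib-compress(header + raw content), where header = "<type> <content length>" followed by a NUL byte

  • file name = sha1(header + raw content), computed before compression; the first 2 hex characters become the directory name and the remaining 38 the file name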
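
The naming rule can be checked by hand. The following is a minimal sketch, assuming `sha1sum` (GNU coreutils) and `git` are on the PATH and the content is plain ASCII; the sample string and the variable names are made up for illustration.

```shell
# Arbitrary sample content (an assumption, not taken from the slides' repo).
content='test content'

# A blob header is "blob <byte length>" followed by a NUL byte.
# The +1 counts the newline that printf appends after the content.
size=$(( ${#content} + 1 ))

# Hash header + raw content, before any compression happens.
manual=$(printf 'blob %d\0%s\n' "$size" "$content" | sha1sum | cut -d' ' -f1)

# Let git compute the object ID for the same bytes (nothing is stored without -w).
by_git=$(printf '%s\n' "$content" | git hash-object --stdin)

echo "manual: $manual"
echo "git:    $by_git"
```

Both lines should print the same 40-character ID, which suggests that the object name depends only on the type, the length and the raw bytes, not on the compressed file.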

Cheatsheet 3 - git command mapping

Seq  Porcelain   Plumbing
1    git add     git hash-object + git update-index
2    git commit  git write-tree + git commit-tree

Cheatsheet 4 - other commands
  • show data type: git cat-file -t hashxx

  • show data content: git cat-file -p hashxx
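
A small sketch of the two cat-file commands in use, run in a throwaway repository. The temp directory, the sample content and the variable names are assumptions for illustration; `-s` (object size) is an extra option not listed on the slide.

```shell
# Throwaway repository in a temp directory (hypothetical location).
repo=$(mktemp -d)
cd "$repo" && git init -q .

# Store a sample blob and remember its ID.
blob=$(echo 'test content' | git hash-object -w --stdin)

obj_type=$(git cat-file -t "$blob")   # object type
obj_body=$(git cat-file -p "$blob")   # pretty-printed content
obj_size=$(git cat-file -s "$blob")   # content length in bytes

echo "$blob: type=$obj_type size=$obj_size body=$obj_body"
```

The same two options work on tree and commit IDs, which is how the objects created later in this deck can be inspected.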

Implement with plumbing commands

echo "test content 1" > test1.txt
echo "test content 2" > test2.txt
hash1=$(git hash-object -w test1.txt) # hash1: blob
hash2=$(git hash-object -w test2.txt) # hash2: blob
git update-index --add --cacheinfo 100644 "$hash1" test1.txt
git update-index --add --cacheinfo 100644 "$hash2" test2.txt
hash3=$(git write-tree) # hash3: tree
hash4=$(echo 'First commit' | git commit-tree "$hash3") # hash4: commit

echo "test content 1 modified" > test1.txt
hash5=$(git hash-object -w test1.txt) # hash5: blob
git update-index --add --cacheinfo 100644 "$hash5" test1.txt
hash6=$(git write-tree) # hash6: tree
hash7=$(echo 'Second commit' | git commit-tree "$hash6" -p "$hash4") # hash7: commit

git log "$hash7"
# to make it more convenient, point branch refs at the new commit
git update-ref refs/heads/master "$hash7"
git update-ref refs/heads/test-branch "$hash7"
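
One way to convince yourself that the plumbing steps match what git add and git commit do is to compare tree IDs. The sketch below builds the first commit's files in two throwaway repositories, one via plumbing and one via porcelain. The directories, the identity values and the variable names are hypothetical; commit IDs are not compared because they include timestamps.

```shell
# Identity for the porcelain commit; hypothetical values.
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'

# Two throwaway repositories holding the same two files.
plumbing_repo=$(mktemp -d)
porcelain_repo=$(mktemp -d)
for repo in "$plumbing_repo" "$porcelain_repo"; do
  git init -q "$repo"
  echo "test content 1" > "$repo/test1.txt"
  echo "test content 2" > "$repo/test2.txt"
done

# Plumbing path: blobs -> index entries -> tree.
cd "$plumbing_repo"
for f in test1.txt test2.txt; do
  blob=$(git hash-object -w "$f")
  git update-index --add --cacheinfo 100644 "$blob" "$f"
done
plumbing_tree=$(git write-tree)

# Porcelain path: add + commit, then read the tree the commit points at.
cd "$porcelain_repo"
git add .
git commit -q -m "first commit"
porcelain_tree=$(git rev-parse 'HEAD^{tree}')

echo "plumbing:  $plumbing_tree"
echo "porcelain: $porcelain_tree"
```

If the two IDs match, both paths produced byte-identical tree objects, which is what content addressing predicts.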

Back to the Questions

staging area
  • Plain speak: staging area

  • Jargon: .git/index

tracked & untracked
  • Plain speak: untracked means the file did not exist in the previous commit and has not been staged either

  • Jargon: untracked means hash-object/update-index has not been run on the file yet, i.e. it has not been added to .git/index

modified & unmodified
  • Plain speak: modified means a tracked file has been changed in the working directory but not yet staged

  • Jargon: modified means the file was changed and hash-object/update-index has not been run on it since, i.e. the new content has not been added to .git/index
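
The states above can be observed with git status --porcelain and git ls-files --stage. This is a minimal sketch in a throwaway repository; the file name, identity values and variable names are assumptions for illustration.

```shell
# Identity for the commit below; hypothetical values.
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'

repo=$(mktemp -d)
cd "$repo" && git init -q .

echo "v1" > demo.txt
untracked=$(git status --porcelain)    # file exists, but the index has no entry for it

git add demo.txt
staged=$(git status --porcelain)       # entry is in the index, not yet committed
index_entry=$(git ls-files --stage)    # what .git/index records: mode, blob ID, stage, path

git commit -q -m "add demo.txt"
echo "version 2" > demo.txt
modified=$(git status --porcelain)     # tracked, changed on disk, index still has the old blob

printf '%s\n' "$untracked" "$staged" "$index_entry" "$modified"
```

In the short status format, `??` marks an untracked file, `A` in the first column a newly staged one, and `M` in the second column a tracked file whose working copy differs from the index.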

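Why switching branches is so fast

Because branch storage and file storage are completely separate. The file contents git tracks are stored under the `.git/objects` directory, while branch operations are operations on `.git/refs`. Creating a new branch merely creates a file that holds the SHA-1 of a commit object (40 characters long), for example: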

➜  test git:(master) cat .git/refs/heads/test-branch
058cdaf0b58a13b5945548da3054087e9f9b265c
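
A short sketch of the same idea, assuming the default files-based ref storage (a repository using a different ref backend, or one whose refs have been packed, would not have these loose files). The branch name `handmade`, the identity values and the variable names are hypothetical.

```shell
# Identity for the commit below; hypothetical values.
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'

repo=$(mktemp -d)
cd "$repo" && git init -q .
git commit -q --allow-empty -m "first commit"

head=$(git rev-parse HEAD)

# Porcelain: create a branch, then read the file behind it.
git branch test-branch
from_file=$(cat .git/refs/heads/test-branch)

# By hand: writing the same 40 characters into a new file also yields a branch.
echo "$head" > .git/refs/heads/handmade
handmade=$(git rev-parse handmade)

echo "HEAD:        $head"
echo "test-branch: $from_file"
echo "handmade:    $handmade"
```

Writing ref files by hand is only for illustration; git update-ref is the safer way to do the same thing.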

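How git reflog works

  • .git/logs: a time machine for .git/HEAD

  • command: git reflog or git log -g

What if .git/logs is deleted?
Can the time machine still work?
Yes!
  • Use the git fsck --full command

  • Look for the unreferenced git objects left lying around

  • Among them are the commit objects from earlier commits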
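
The recovery steps can be rehearsed in a throwaway repository. This sketch loses a commit on purpose (branch deleted, .git/logs removed) and then finds it again with git fsck; the branch names, identity values and variable names are made up for illustration.

```shell
# Identity for the commits below; hypothetical values.
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'

repo=$(mktemp -d)
cd "$repo" && git init -q .
git commit -q --allow-empty -m "first commit"
base_branch=$(git symbolic-ref --short HEAD)

# Make a second commit on a side branch, then drop every reference to it.
git checkout -q -b experiment
git commit -q --allow-empty -m "second commit"
lost=$(git rev-parse HEAD)
git checkout -q "$base_branch"
git branch -q -D experiment
rm -rf .git/logs    # the reflog is gone as well

# fsck reports unreachable objects; the lost commit appears as "dangling commit <id>".
found=$(git fsck --full 2>/dev/null | awk '/^dangling commit/ {print $3}')

# Point a new branch at it to bring it back.
git branch recovered "$found"
git log --oneline recovered
```

This only works while the object is still in .git/objects; once unreachable objects have been pruned, there is nothing left for fsck to find.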

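Delta storage or full storage
  • At first, full storage

      • the blob object created by git hash-object stores the full file content

  • After git gc, delta storage

      • loose blob objects holding full content are repacked into pack files, where similar objects can be stored as deltas against one another

      • git verify-pack -v .git/objects/pack/xxxxx

      • git also runs gc automatically: per the git-gc documentation, roughly when loose objects exceed gc.auto (default 6700) or pack files exceed gc.autoPackLimit (default 50); check .git/objects/pack/ to see the result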
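
A minimal sketch of the loose-to-packed transition, replaying the two commits from "Today's work" in a throwaway repository. The identity values and variable names are assumptions; whether git actually chooses a delta for files this small is up to its heuristics, so the sketch only counts objects and lists the pack.

```shell
# Identity for the commits below; hypothetical values.
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'

repo=$(mktemp -d)
cd "$repo" && git init -q .
echo "test content 1" > test1.txt
echo "test content 2" > test2.txt
git add . && git commit -q -m "first commit"
echo "test content 1 modified" > test1.txt
git add . && git commit -q -m "second commit"

# Loose objects before packing: one file per blob, tree and commit.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

git gc -q

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
in_pack=$(git count-objects -v | awk '/^in-pack:/ {print $2}')
echo "loose before gc: $loose_before, after: $loose_after, packed: $in_pack"

# List the pack's contents; entries stored as deltas carry a depth and a base object ID.
git verify-pack -v .git/objects/pack/pack-*.idx
```

On a larger history, comparing the sizes reported by git count-objects -v before and after gc gives a rough sense of how much the delta encoding saves.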

Have fun

The END