Robin O'Connell

   

Git in Rust

wyag (Write Yourself a Git) is a simplified clone of Git written in Rust. I wrote it to deepen my understanding of Git and to solidify and expand my knowledge of Rust after working through the Rust book. I started out following a Python-based guide written by Thibault Polge (hence the name) but diverged considerably along the way, not least because Polge's guide is unfinished.

See below for a detailed account of wyag's features, limitations, and test methodology as well as my reflection on how the codebase could be improved.

Tools & Technologies

Languages

  • Rust
  • Python

Crates

  • CLI: clap
  • Error handling: anyhow, thiserror
  • Testing: assert_fs, dir-diff, predicates, sevenz-rust
  • Utility: base16ct, byteorder, flate2, itertools, ordered-multimap, path-absolutize, regex, rust-ini, sha1

Features

The core functionality of the following Git commands is implemented in wyag:

This subset of commands is sufficient for a single-branch workflow. branch and switch allow the creation and use of additional branches, but the presently unimplemented merge command is needed to take full advantage of them.

Limitations

All Git commands not listed in the previous section are unavailable. Notably, merge, rebase, revert, and reset have not been implemented. Furthermore, most commands support only a subset of the options available in Git.

While the checkout command is not implemented, switch and restore cover the majority of its use cases. In fact, these commands were created with the intent of splitting up the overloaded checkout command: see commit f496b06 in the Git repo.

Additionally:

Tests

Unit tests are present in many modules where appropriate, but because Git primarily operates on the file system, the testing strategy relies heavily on integration tests.

Snapshots

Testing code that interacts with the file system is difficult in general, and Git adds a further complication: file timestamps such as modification time are meaningful because they are stored in the index file. To overcome this, the testing apparatus is based around "snapshots" stored in .7z archives, which are able to save and restore the timestamps associated with their contents. These archives are stored in the /snapshots directory.
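To illustrate why timestamps need special handling, here is a minimal Python sketch (not part of wyag) showing that a plain file copy gets a fresh modification time while a metadata-preserving copy keeps the original one. The file names and the helper copy_mtimes are made up for the example.

```python
import os
import shutil
import tempfile
from pathlib import Path

# A fixed, clearly "old" modification time: 10^9 seconds after the epoch, in ns.
FIXED_MTIME_NS = 1_000_000_000 * 1_000_000_000


def copy_mtimes() -> tuple[int, int]:
    """Return the mtimes (ns) of a plain copy and a metadata-preserving copy."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "tracked.txt"  # hypothetical file name
        src.write_text("hello\n")
        # Pretend the file was last modified long ago.
        os.utime(src, ns=(FIXED_MTIME_NS, FIXED_MTIME_NS))

        # shutil.copy copies contents and permissions, not timestamps.
        plain = shutil.copy(src, Path(tmp) / "plain.txt")
        # shutil.copy2 also tries to preserve timestamps.
        kept = shutil.copy2(src, Path(tmp) / "kept.txt")

        return os.stat(plain).st_mtime_ns, os.stat(kept).st_mtime_ns


if __name__ == "__main__":
    plain_ns, kept_ns = copy_mtimes()
    print("plain copy kept mtime:", plain_ns == FIXED_MTIME_NS)
    print("copy2 kept mtime:     ", kept_ns == FIXED_MTIME_NS)
```

A test fixture that loses the original mtime would no longer agree with the timestamps recorded in the index, which is the problem the timestamp-preserving archives avoid.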

Most integration tests follow this procedure:

  1. Extract a "before" snapshot to a temporary directory.
  2. Execute a command on the temp directory.
  3. Extract an "after" snapshot and compare it to the temp directory.
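The three steps above can be sketched in Python as follows. This is an illustration only: wyag's real tests are written in Rust with the crates listed earlier. The sketch assumes the snapshots are plain directories rather than .7z archives, and it assumes the command under test is a Python callable. The names trees_match and run_snapshot_test are hypothetical.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def trees_match(left: Path, right: Path) -> bool:
    """Recursively compare two directory trees by name and file content."""
    # ignore=[] matters: by default dircmp skips directories such as ".git".
    cmp = filecmp.dircmp(left, right, ignore=[])
    if cmp.left_only or cmp.right_only or cmp.common_funny:
        return False
    # shallow=False forces a byte-by-byte comparison of the common files.
    _, mismatch, errors = filecmp.cmpfiles(
        left, right, cmp.common_files, shallow=False
    )
    if mismatch or errors:
        return False
    return all(trees_match(left / d, right / d) for d in cmp.common_dirs)


def run_snapshot_test(
    before: Path, after: Path, command: Callable[[Path], None]
) -> bool:
    """Copy `before` to a temp dir, run `command` on it, compare with `after`."""
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "work"
        shutil.copytree(before, work)     # step 1: restore the "before" state
        command(work)                     # step 2: run the command under test
        return trees_match(work, after)   # step 3: compare with "after"
```

Note that this sketch compares contents only. It does not restore or check timestamps, which is the part the .7z archives handle in the real apparatus.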

Recipes

To generate these snapshots and vet their correctness, the Python script /scripts/snapshots.py is provided. Each snapshot is based on a "recipe" located in the /scripts/recipes directory. A recipe is a Python script that supplies the steps to generate the "before" snapshot (via the setup function), the wyag command(s) being tested (run_wyag function), and the equivalent Git commands (run_git function), which serve as the ground truth.

When the recipe is executed with the snapshots.py generate command, two identical copies of the "before" snapshot are created. The wyag commands are executed on one copy and the git commands are executed on the other. Then, a diff of the two directories and the contents of their index files is produced. The user is asked to confirm that any differences between the two are acceptable. (Differences often arise from unimplemented features rather than errors.) If accepted, two archives are created in the /snapshots directory: before_<recipe name>.7z and after_<recipe name>.7z. By convention, a recipe should use the exact name of the test that it supports.
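The flow described above could be sketched roughly as follows. This is not the actual snapshots.py. It assumes the "before" snapshot is a plain directory, takes the wyag and Git steps as Python callables, skips the index-file comparison and the archiving itself, and only returns the archive names that would be written. The function names and parameters are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional


def file_map(root: Path) -> dict[str, bytes]:
    """Map each file's path (relative to root) to its contents."""
    return {
        p.relative_to(root).as_posix(): p.read_bytes()
        for p in root.rglob("*")
        if p.is_file()
    }


def differing_paths(a: Path, b: Path) -> list[str]:
    """List files that differ between the trees or exist on only one side."""
    left, right = file_map(a), file_map(b)
    return sorted(k for k in left.keys() | right.keys() if left.get(k) != right.get(k))


def generate(
    recipe_name: str,
    before: Path,
    run_wyag: Callable[[Path], None],
    run_git: Callable[[Path], None],
    confirm: Callable[[list[str]], bool],
) -> Optional[tuple[str, str]]:
    """Run both tools on identical copies; return archive names if accepted."""
    with tempfile.TemporaryDirectory() as tmp:
        wyag_copy = Path(tmp) / "wyag"
        git_copy = Path(tmp) / "git"
        shutil.copytree(before, wyag_copy)  # two identical "before" copies
        shutil.copytree(before, git_copy)
        run_wyag(wyag_copy)                 # wyag commands on one copy
        run_git(git_copy)                   # Git commands on the other
        diffs = differing_paths(wyag_copy, git_copy)
        # The user decides whether any differences are acceptable.
        if diffs and not confirm(diffs):
            return None
    return (f"before_{recipe_name}.7z", f"after_{recipe_name}.7z")
```

In this sketch, confirm stands in for the interactive prompt and receives the list of differing paths, so a caller could print them before asking.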

To use the script, git and 7z must be present on your $PATH. To learn more, run this command from the project's root directory:

python scripts/snapshots.py --help

Coverage

Integration tests have been written for almost all commands that modify the file system, including:

Most other commands simply read from the file system and output information to the console. Notably, however, the switch and restore commands (which do modify the file system) lack appropriate coverage due to challenges with timestamps that will require a rework of the testing apparatus.

Future Work

Due to the educational nature of this project, I have limited plans to continue its development. However, I plan to implement the most critical missing feature, the merge command, which is the final piece required to support a multi-branch workflow.

If I were to continue development beyond that, I would: