---
title: Git Training
tags:
  - git
  - long-read
  - training
---

These are git training materials for people who would like to understand how git works rather than try to memorize all of its commands without knowing what they do.

It is divided into 45 chapters. You can read them back to back, or pick out a specific topic you are interested in.

This training does not cover everything there is to know about git. Instead, it aims to make you familiar with git in a way that will give you the confidence to explore further on your own.

:::note
This is the **long read** format (20k words). This training is also available as a 1¾-hour **[audiobook][audio]** or **[video series][video]**.
:::

[text]: /training/git/
[audio]: /training/git/audio/
[video]: /training/git/video/

## Chapter 1: Welcome

Hello everyone and welcome to this **git** training. My name is Joost, and today I will explain git to you.

If that already sounds scary, then don't worry. My goal today is to explain git in an approachable way that requires no prior knowledge of, or experience with, git.

By the end of this training, you should have a clear understanding of what git is, what it does, and how it does it. If that sounds like the kind of thing you would like to learn, then you've come to the right place.

Let's dive right in and learn about git.

## Chapter 2: What is git?

To understand what git is, let us start by explaining the problem it is trying to solve. The problem is that people don't always get it right the first time.

The content of this git training didn't spring from my mind fully formed. It started out as a list of bullet points of things I wanted to cover. Then it was gradually turned into a rough draft over numerous edits and rewrites. At some point, I also started to involve other people. I asked them to proofread or provide input, which led to more changes.

Somehow, we need to keep track of all these changes. And ensure that at any time we can go back to an earlier version.
Or allow multiple people to work on the same text without overwriting each other's changes.

If you've ever seen a folder with a file listing like this, you are already familiar with this problem:

- `git-draft.md`
- `git-draft.02.md`
- `git-03.md`
- `git-03_comments-by-serge.md`
- `git-good.md`
- `git-good_final.md`

This is not an efficient way to keep track of different versions of a single file. Let alone when we have many different files, with different collaborators working on them.

What we need is some sort of **system** to **control** all these different **versions**. Thankfully, such systems exist. Software that is created specifically for this task is called a **version control system**.

Git is such a version control system, but there are others too. So let's meet a few of them.

## Chapter 3: Version control systems

One of the first version control systems (or VCS) was `sccs`, which stands for _Source Code Control System_. From its name, we can learn that the origins of version control systems can be traced back to software developers. They were the first group of people who not only faced this problem of working together on a bunch of files, but also had the means to come up with a way to make it more efficient.

SCCS was first released in 1973, which most likely means that version control systems have been around for longer than you've been alive. Almost 10 years later, in 1982, RCS was released. It stands for _Revision Control System_ and to this day, it is still maintained.

Where RCS was intended to be used locally on a single computer system, new systems emerged that relied on a centralized repository to allow people to collaborate from different systems. The two most significant members of this second generation of version control systems were CVS (Concurrent Versions System) and Subversion.

For most people, these systems were good enough, and the market for version control systems stalled somewhat, as it was considered a solved problem.
Except, not all software projects are the same size. Some people were collaborating on so many files with so many different people that the second generation of tools was not good enough for them.

One such project was the Linux kernel. Started in 1991 as a hobby project by a Finnish student named Linus Torvalds, by 2002 the Linux kernel underpinned a multibillion-dollar Linux market spearheaded by vendors such as Red Hat, SUSE, and early adopters like IBM.

The Linux kernel itself though was and is an open source project. And while more and more people worked on the kernel professionally, they were spread out not only geographically but also across many different companies. Keeping track of all the changes in the kernel was causing friction.

So in 2002, Linus Torvalds made a decision that would send shockwaves through the open source world. He unilaterally announced that the Linux kernel would switch to BitKeeper as its version control system. BitKeeper used a more innovative approach to version control, and did not rely on a central repository.

The announcement was controversial because BitKeeper was a closed-source product that was only available under a commercial license. And while BitMover -- the company behind the BitKeeper product -- waived the license fee for Linux kernel developers, many kernel developers objected on principle to having to use a closed-source product to contribute to the open source Linux kernel.

This went on for a while until, in 2005, BitMover grew increasingly worried that the kernel developers would reverse engineer their technology. So they imposed further restrictions which made it impossible for kernel developers to use their product.

Faced with this dilemma, and in a move that would forever cement his reputation as an exceptionally gifted software engineer, Linus Torvalds decided to take matters into his own hands. He sat down and over the course of a couple of days wrote his own version control system: git.
A few years later, many major open source projects had migrated from Subversion to git, and sites like GitHub and GitLab sprang up to provide centralized git hosting.

Today, git is a household name among developers, as well as the de facto standard version control system on the planet. It's a remarkable success story with many parallels to Linux itself. Both are not only free for people to use, but their excellent technical foundations mean they have taken the world by storm.

## Chapter 4: Git won't fall out of the sky

Knowing the history of git, and its origins as the version control system for the Linux kernel, goes a long way to explain one of its more glaring shortcomings: Why does it seem so damn hard to use?

The answer is, of course, that Linux kernel developers are rather comfortable with all this complexity. They know exactly what git does under the hood, and as a result all its numerous commands make sense to them.

People who've mastered git are like airline pilots. To the layperson, all those dials and buttons in the cockpit seem like an impenetrable wall of confusion. Without any insight into how an airplane works and what keeps it in the air, trying to learn all these buttons is going to be frustrating at best.

If you'd like to learn how to fly, the smart way to go about it is to first understand what keeps a plane in the air. Likewise, if we want to learn git, the smart way to do it is to first understand how it keeps track of changes.

So let's start there. And keep in mind that unlike airplanes, git won't fall out of the sky when we make a mistake.

## Chapter 5: Directed Acyclic Graph

Git is built on the combination of two concepts, and you're probably already somewhat familiar with both of them.

The first concept is the so-called DAG, which stands for _Directed Acyclic Graph_.

A _graph_ in computer science and mathematics alike is a structure in which we can store not only information, but also relationships between that information.
You may have heard of Facebook's _social graph_, which holds information about Facebook's users, but also information about the relationships between those users.

Alice, Bob, Tony, Jim, and Sandra are all Facebook users. In addition, Alice is a friend of Bob. Bob's father is Tony. Tony works at McDonald's. Jim and Sandra also work at McDonald's.

We call this sort of data structure a _graph_. The users themselves are the _nodes_ of the graph. Each node holds the data for one user. The relationships between users are the _edges_ of the graph.

If we visualize this structure, the users or nodes of the graph would be represented by points or little circles. The relationships between the users, or edges of the graph, would be lines that we draw between the users to show how they are connected to each other.

![A graph](graph.png 'Example of a graph')

Git stores its data in a graph structure, but not one like Facebook's social graph where connections can go all over the place and in all directions. Instead, it uses a Directed Acyclic Graph or DAG, which imposes two additional constraints on the graph.

**Directed** means that relationships or edges are one-way only. In Facebook's graph, Alice is a friend of Bob, and Bob can also be a friend of Alice. This makes the relationship or edge between them bidirectional. In other words, it's like a two-way street. In a directed graph like the one git uses, this is not allowed. The edges only ever go in one direction. Like a river.

**Acyclic** means that there can be no loops in the graph. In Facebook's graph, Alice is a friend of Bob. If Bob is a friend of Jim and Jim in turn is a friend of Alice, this creates a loop. Like a roundabout. In an _acyclic graph_ like the one git uses, this is not allowed. You can create as many relationships or edges as you want. But when they re-converge, they can only do so downstream, following the direction of the graph. Like a river.
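To make the river picture a bit more concrete, here is a minimal sketch in shell. It has nothing to do with how git stores its data; it only feeds a handful of made-up edges (the names `source`, `fork`, `left`, `right`, and `delta` are purely illustrative) to the standard `tsort` utility, which lists the nodes of a directed graph in upstream-to-downstream order. Such an order can only exist when the graph has no loops.

```bash
# Each line is one edge, written as "upstream downstream".
edges='source fork
fork left
fork right
left delta
right delta'

# tsort prints the nodes so that every edge points downstream.
order=$(printf '%s\n' "$edges" | tsort)
printf '%s\n' "$order"

# The river has exactly one starting point and one end point here.
first=$(printf '%s\n' "$order" | head -n 1)
last=$(printf '%s\n' "$order" | tail -n 1)
echo "the river starts at $first and ends at $last"
```

If you were to add an edge that flows back upstream, such as `delta source`, the graph would no longer be acyclic, and `tsort` will typically complain that its input contains a loop (the exact wording differs between systems).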
![A DAG](dag.png 'Example of a DAG')

So to summarize, a DAG or Directed Acyclic Graph is a graph where edges go in one direction only (directed), and no loops are allowed (acyclic).

Like a river, a DAG can split into branches. Each of those branches can further split, or they can join another branch further downstream. But no matter how intricate our river delta gets, we can never go backwards. We can never branch off and then somehow reconnect to a point before the one where we branched off and form a loop.

Water cannot run uphill. If you can remember that, you know what a DAG is.

And once you know what a DAG is, it's easier to think about your data in git. All your changes are right there, each version of your work represented by a node in the graph.

The question that remains is, how does git keep track of the edges, or the relationships between the nodes? So let's look at that in the next chapter.

## Chapter 6: Checksums

In the previous chapter, I mentioned that git is built on the combination of two concepts. The first was the Directed Acyclic Graph or DAG. The second is checksums.

A checksum is a way to reduce an arbitrary amount of data to a smaller amount of data that can still uniquely identify it. If that sounds overly complicated, don't despair because you are already familiar with a perfect metaphor: The fingerprint.

The data stored in a fingerprint can never possibly contain all the data that makes you you. But that's not its purpose. Instead, your fingerprint behaves as a checksum. Which means that we only need to verify the fingerprint to know that it's you.

In computer science these fingerprints or checksums are calculated by a type of cryptographic function that we call a hash function. For this reason, checksums are often referred to as hashes.

You may have already heard of some of the more well-known hashing methods, such as MD5 or SHA1. The latter -- SHA1 -- is the hashing method git uses under the hood.
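Once git is installed (Chapter 7 covers that), you can try the fingerprint idea for yourself. The small sketch below uses `git hash-object --stdin`, a real git command that prints the checksum git would give to whatever it reads from standard input. The sample sentences are only an illustration.

```bash
# Same input, same fingerprint: the checksum depends only on the data.
a=$(printf 'Hello git\n' | git hash-object --stdin)
b=$(printf 'Hello git\n' | git hash-object --stdin)

# Change a single character and we get a different fingerprint.
c=$(printf 'Hello git!\n' | git hash-object --stdin)

echo "$a"
echo "$b"
echo "$c"
```

The first two lines of output should be identical, and the third should be different. Each one is a string of 40 hexadecimal characters, no matter how short or long the input was.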
Git relies extensively on these checksums. So much so that each commit object -- we'll talk about what exactly a commit or commit object is later, for now let's just agree that the nodes in our graph are called commits in git parlance -- so each commit has a checksum and this checksum is used as the commit ID. It uniquely identifies the commit.

Because of this checksum, you can never ever have two commits with the same ID in git. If you had two commits with the same ID it means they are identical in every way. And so they are not two commits, but just the same commit.

So how does it work exactly? Well, each time we _commit_ data to git and create a commit object in the process, git will make a checksum of the _commit object_ which will end up being a node in our graph.

The following data is included in the commit object, which means it is used to calculate the checksum:

- The data itself
- The author of the commit
- The date of the commit
- The log message that goes with the commit
- The checksum of the previous commit

So with the exception of the very first commit -- which is a bit like the source of our river and the only node in our graph that does not have a direct ancestor -- each commit has a reference to the commit it is based on. This reference forms the relationship in our graph. It says, this commit right here follows that commit over there with this ID.

Including the ID or checksum of the parent commit provides strong protection against data tampering. If any commit object in our DAG is changed, its checksum and thus its ID will change. And all commits that stem from it will have a parent commit ID that no longer matches. If we were to go in and change that, it would in turn change the ID of that commit, and then the next one would break, and so on and so forth.

In other words, all of these commits are chained together with cryptographic checksums, which makes it practically impossible to tamper with them without anyone noticing.
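The chaining described above can be imitated with a toy example. To be clear, this is not git's real commit format: the helper `make_id` and the messages are made up for illustration, and each "commit ID" is simply the checksum of a message together with the ID of its parent. It borrows `git hash-object --stdin` only as a convenient way to compute a checksum.

```bash
# Toy "commit": its ID is the checksum of the parent's ID plus a message.
make_id() {
  # $1 = parent ID (empty for the very first commit), $2 = message
  printf 'parent %s\n%s\n' "$1" "$2" | git hash-object --stdin
}

first=$(make_id "" "first draft")
second=$(make_id "$first" "second draft")
third=$(make_id "$second" "final version")

# Tamper with the first commit only, then recompute its descendants.
forged_first=$(make_id "" "first draft (edited)")
forged_second=$(make_id "$forged_first" "second draft")
forged_third=$(make_id "$forged_second" "final version")

echo "original: $first $second $third"
echo "tampered: $forged_first $forged_second $forged_third"
```

Even though the second and third messages were left untouched, all three IDs on the second line should differ from those on the first. Changing one link in the chain changes every link after it.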
If at this point a light goes on in your brain and you think _hey, haven't I heard this before somewhere?_, then yes, you most likely have heard about this sort of immutable ledger, because a chain of checksums like this is also the core idea that underpins the blockchain.

At this point, I feel it's worth pointing out -- for the crypto-bros out there -- that Satoshi Nakamoto's original bitcoin paper was published at the end of October 2008. As we learned in chapter 3, Linus Torvalds wrote git in 2005, more than 3 years prior to that. Which explains why some people think Linus is Satoshi. But he's not.

Enough about blockchain. While it's a useful crutch to explain how different commits are linked together in git, it would be a distraction to talk about it any further. Especially since we're finally getting to the good stuff: Let's start using git in the next chapter.

## Chapter 7: Installing git

Before we can gain some hands-on experience, we should make sure git is installed on our system.

If you're on Windows, you should [download the git release for Windows from the git website](https://git-scm.com/download/win), which is git-scm.com.

If you use macOS, you can [download the git release for macOS](https://git-scm.com/download/mac) from the same website. Or, you can also install the Xcode command-line tools from Apple, as they include git.

If you run Linux, chances are git is already installed on your system. If not, you can install it with your system's package manager. Be it through `apt install git` on Debian-based systems such as Ubuntu. Or through `dnf install git` (or `yum install git` on older releases) on Red Hat-based systems such as Fedora.

In any case, installing git should be quick and painless. To verify it worked, open a terminal window and type `git`. If you get a bunch of info, we're good to go.

## Chapter 8: git init

The first thing you'll notice as we start using git is that it does not require any sort of central component or server.
Those of you who have trouble distinguishing between git itself and the popular git hosting services such as GitHub or GitLab should take a moment to appreciate this. You don't need anything else to use git. You can use it on your own, without collaborating with anyone.

All you need to do is open up a terminal window. And in the directory or folder where you want to keep track of your changes or versions, you type:

```bash
git init
Initialized empty Git repository in /Users/joost/git-training/.git/
```

Congratulations, you have just created a **git repository**. An empty repository for now, but a git repository nonetheless.

The English dictionary tells us that a repository is _a place where things are stored_. In git parlance, we use the word _repository_ to refer to the top-level folder where git is doing its magic of keeping track of our files. In it, we can create as many files or subfolders as we like, and they are all inside our repository. If, however, we go up one level, we are outside of our repository.

## Chapter 9: The .git folder

In the top-level folder of our repository, git has created a `.git` subfolder. This folder is where git will write all of the data in our graph. It's where it will store metadata, and anything else that is required for git to do what it does.

There is no database, there is no server, it's all just a bunch of files in this mysterious `.git` folder.

When working with git, you should **never** need to venture into this folder. Doing so is not only unnecessary, it may also irreparably mess up your repository. But, out of curiosity, let's have a look anyway.

```bash
ls -1 .git
HEAD
config
description
hooks
info
objects
refs
```

If you open this folder, you'll see a bunch of files and folders, the most important of which are:

- The `HEAD` file (all uppercase) keeps a reference to where we are right now. If we think of our graph as a river, it is the equivalent of a **You are here** marker on a map of that river.
- The `objects` folder is where git will store our commit objects. It contains two subfolders, `objects/info` and `objects/pack`. Both of them are empty right now, but that will change soon enough.
- The `refs` folder is where git will store info about the various ways in which we decided to branch and split our river of data or graph. It also contains two subfolders, `refs/heads` and `refs/tags`. Both of which are also empty for now.

Let's not worry too much about this structure. Once again, you rarely if ever need to venture into the `.git` folder. However, seeing how its content changes when we run various commands can help us understand what git is doing behind the scenes. So we will refer to these files and folders from time to time.

## Chapter 10: git status

For now, let's see what our current status is. To do so, type `git status`.

```bash
git status
On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)
```

When we enter this command, git will tell us what the current status is. For now git tells us that:

- We are on branch main. Each time we split our river of git data, this creates a branch. The main branch is the one we start from. The source of our river, so to speak.
- There are no commits yet
- There is nothing to commit. But we could create files and use `git add` to track them

Ok, not much going on here, but git hints that we should use `git add` to start tracking files. So let's follow its advice in the next chapter.

## Chapter 11: git add

The `git add` command is how we add our data to git. Which is why it's called `git add`. Under the hood, this data will be stored in the `.git` folder, ready to be added to the DAG later.

To add data, we first need some data. So let's create a file named `hello.md` and add a line of text in it that says _Hello git_:

```md title="hello.md"
Hello git
```

Now, if we run `git status` again, the output will be different.
```bash
git status
On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        hello.md

nothing added to commit but untracked files present (use "git add" to track)
```

Git will still tell us that we are on branch main and that there are no commits. But this time it will tell us that there are _untracked files_. Specifically _hello.md_.

So git watches our repository and it knows there's a file there we're not keeping track of. It also hints once again that we can start tracking this file with the `git add` command.

So, let's do as it says. In our terminal, we type `git add hello.md`.

```bash
git add hello.md
```

Ok, that was a bit underwhelming because nothing happened. Git didn't say anything and we don't even know whether it did anything. Let's run `git status` again to see what's changed.

```bash
git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   hello.md
```

Hey, this is new. Git now tells us that there are changes to be committed. It knows that there is a new file named `hello.md`. It also tells us what command to run to _unstage_ this file, which is some nice foreshadowing for the next chapter.

But before we get to that, let's take another look at the `.git` folder where git keeps our data.

```bash
ls .git/objects
0d info pack
ls .git/objects/0d
ec2239efc0bbfabe4078f5357705ca93b5475e
file .git/objects/0d/ec2239efc0bbfabe4078f5357705ca93b5475e
.git/objects/0d/ec2239efc0bbfabe4078f5357705ca93b5475e: zlib compressed data
```

If you look in the `.git/objects` folder you should see that it has changed. A new folder and file appeared which hold a bunch of compressed data.

The reason I'm asking you to go digging through these files is that this is an important thing that many people, even those rather familiar with git, don't realize. And that is that **git add writes data**. That's right.
In the everyday workflow, the moment you add your data to git is when you run the `git add` command. The other commands we'll cover deal with metadata, or how to structure the graph and create relationships, branches, and so on. But putting your data in git happens via the `git add` command.

Now, let's see where we added this data, shall we? Because -- spoiler alert -- it was not added to the graph.

## Chapter 12: The staging area

When we think about how data is stored in git, there are essentially 3 things that come into play.

![Git layers](git-layers.png)

At the lowest level we have **the file system**. At the end of the day, git is just a bunch of files on disk, and all it does is write to those files. When git refers to our own data on disk, not its internal metadata, but the files and folders we are looking to keep track of, git will refer to this as the **working directory**. So when you see that, just think _oh right, the files on my disk right now_.

At the top level we have the **commit graph**: the DAG in which git keeps track of our data.

In between these two sits the **staging area**, which git's own documentation also calls the **index**. This is where git stores and prepares data before adding it to our graph.

As we saw in the previous chapter, each time we use the `git add` command, git writes our data. Specifically, it takes the data on disk and copies it to the staging area. It will remain there until we _commit_ it.

Just like `git add` is how we move data from disk to the staging area, `git commit` is how we move data from the staging area permanently into the DAG, git's graph.

Understanding how data moves between these layers is crucial to understanding git. For example, what if we add a file to the staging area with `git add`, and then after adding the file, we make a change to it? What will happen?

If you can guess, great. If not, let's try it out. Let's run `git status` again before we do anything to make it easy to compare.
```bash
git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   hello.md
```

Git tells us that we're on branch main, that there are no commits yet, and that there are changes to be committed. Specifically a new file named `hello.md`.

Now let's open this `hello.md` file and add some more text to it. Let's change the line that says `Hello git` to `Hello git. How are you?`. After saving the file, we run `git status` again.

```bash
git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   hello.md

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   hello.md
```

This time, git still tells us that we're on branch main, and that there are no commits yet. It also reminds us that the `hello.md` file is staged to be committed. But in addition, it now tells us that there are changes that are not staged for commit. Changes to the `hello.md` file.

So, to recap. When we make changes to a file in our working directory, git will notice this. If we **add** this file with `git add`, git will copy our file to the staging area. If we then make further changes, git will notice that the file in our working directory has changed again, and will once again inform us about it. However, the version of the file that we added with `git add` earlier is still in the staging area.

By now you should have learned that `git add` is how we tell git to write our data to the staging area. But that's only half of the work. To make sure our data is added to the DAG, we need to commit. Which is what we'll do in the next chapter.

## Chapter 13: git commit

Now that we have covered the `git add` command, and know about the staging area, the next step on our learning path is the `git commit` command.
The `git commit` command is how we tell git to take the data that is in our staging area, and add it to the DAG.

To do so, git will create a commit object and add _labels_ to it. It will calculate a checksum of the commit object, which becomes the commit ID. For every commit after the very first one, the commit object also records the ID of its parent commit, which is what links the two in the DAG.

Git will also move the `HEAD` label (remember, `HEAD` is the equivalent of a **you are here** marker) to point to our new commit. It will also move the `main` label to our new commit to indicate that this is the tip of the `main` branch.

But don't take my word for it, let's try it out by typing `git commit`. When we do so, git will open an editor to allow us to write the commit message. We'll talk about writing good commit messages later, for now let's just write _My first commit_ and then save and close the file.

```bash
git commit
[main (root-commit) 4506faf] My first commit
 1 file changed, 1 insertion(+)
 create mode 100644 hello.md
```

Git will show us some output, including the branch we are on (main), the first couple of characters of the commit ID or checksum, as well as some other info.

If you pay close attention, you will see that right after the branch name it says **root-commit**. That's because the commit we just added to the DAG is special. It is -- and will forever be -- the only commit in our entire DAG that does not have a parent commit. This root commit is the source of our river of data from which everything else will spring.

## Chapter 14: git log

We've already learned that the commit ID is very important in git. Later -- when you are a git guru and ask it to do advanced stuff -- you will often need to specify the commit ID.

Git keeps a log of all commits, and you can ask it to show this log with the `git log` command.
```bash
git log
commit 4506fafad7b70ff2c44d7900d457f9a65133f7ed (HEAD -> main)
Author: Joost De Cock
Date:   Tue Apr 4 15:32:20 2023 +0200

    My first commit
```

When we run `git log` it will show us a list of all commits starting with our most recent commit, all the way back to the root commit. It will include the commit ID as well as the author, date, and log message. In other words, the log contains all the metadata.

That's it. That's the entire chapter. There are more things `git log` can do but for now I just wanted to introduce the command, as we will be using it in the next chapter, when we talk about labels in git.

## Chapter 15: Labels in git

Our DAG now consists of a single node. There is exactly one commit, and it is not related to any other commits. But it does have labels.

Labels are how git keeps track of different branches, which is something we'll take a closer look at in a later chapter. For now, we have a single branch. It's called `main`, which is the default branch in git. And while one single branch is not very exciting, it is all we need to understand how labels work.

To see the labels git uses, let's ask git to show us the commit log with `git log`.

```bash
git log
commit 4506fafad7b70ff2c44d7900d457f9a65133f7ed (HEAD -> main)
Author: Joost De Cock
Date:   Tue Apr 4 15:32:20 2023 +0200

    My first commit
```

Apart from the metadata about the commits themselves, git will also show the various labels we are currently using. If you look at the most recent commit ID, you will see that it is followed by information between parentheses. First it will say `HEAD` and then a little arrow pointing to `main`.

Remember in chapter 9 where we went spelunking in the `.git` folder, we learned that `HEAD` is like a _you are here_ marker. In other words, git will put the `HEAD` label on whatever commit it considers to be where we are right now. So each commit we make will always become a child of whatever commit the `HEAD` label is on.
While `git log` is certainly the user-friendly way to retrieve this information, we can also figure out where `HEAD` is pointing by looking into the `.git` folder. If you look at the contents of the `.git/HEAD` file, you will see it holds a reference to `refs/heads/main`.

```bash
cat .git/HEAD
ref: refs/heads/main
```

If in turn you look into the `.git/refs/heads/main` file, you will see that it holds the ID of our root commit.

```bash
cat .git/refs/heads/main
4506fafad7b70ff2c44d7900d457f9a65133f7ed
```

In other words, `HEAD` points to `refs/heads/main`, which points to our commit. So git knows that the `HEAD` label and the `main` label are both on this root commit.

Why this matters will become clearer when we talk about branching. For now, what you should know is that each branch has its own label, which should be on the last commit made on that branch. The `HEAD` label is special, in that it always points to the commit that will become the parent of the next commit we'll make.

## Chapter 16: git show

The `git show` command will show us exactly what is included in any given commit. We've already made our first commit, the so-called root commit, but perhaps it was Friday evening, we logged off, and now we're back on Monday morning and we can't exactly remember where we left things.

The first thing to do in this scenario is to run `git status`.

```bash
git status
On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   hello.md

no changes added to commit (use "git add" and/or "git commit -a")
```

Git tells us we're on the main branch, and that we have changes that are not staged for commit. Perhaps you have a good memory and remember that we changed the contents of our file from `Hello git` to `Hello git. How are you?`. But if you're anything like me, you don't remember this.
And so you'd like to see what the heck it was that you committed on Friday evening.

If we want to know that, we can just ask git to show us what's actually in this commit. To do so, we use the `git show` command followed by the ID of the commit we want to see.

We don't actually have to include the entire ID. All we need is the first 4 characters of the ID. If later we have plenty of commits and there are multiple commit IDs that start with these same 4 characters, git will show us a list of all matching commits and ask us to be more specific. But for now, with only one commit, 4 characters is plenty.

```bash
git show 4506
commit 4506fafad7b70ff2c44d7900d457f9a65133f7ed (HEAD -> main)
Author: Joost De Cock
Date:   Tue Apr 4 15:32:20 2023 +0200

    My first commit

diff --git a/hello.md b/hello.md
new file mode 100644
index 0000000..0dec223
--- /dev/null
+++ b/hello.md
@@ -0,0 +1 @@
+Hello git
```

Just as with the `git log` command, git will show us all the metadata of the commit. But this time around, it will also show us a diff. In other words, what exactly this commit changed. And we can see that this was a new file and that its contents are `Hello git`.

What we committed on Friday evening was what was in the staging area at that time. Which was our file with `Hello git` in it. And not the current version on disk which has `Hello git. How are you?` in it.

If we want to add this updated version of the file to the repository, we should first add it to the staging area with `git add` and then add it to the DAG with `git commit`.

```bash
git add hello.md
git commit -m "My second commit"
```

When you run the `git commit` command on its own, git will open an editor to let you enter the commit message. If you'd like, you can instead specify the commit message on the command line with the `-m` flag, as we did here, and then git will just use that.

If we now run `git status` again, git will tell us there are no changes. It will say something like _nothing to commit, working tree clean_.
And when you hear _working tree_ you should just think _working directory_. Git is telling that the files that are in our working directory hold the exact same data as what is stored in the DAG. ```bash git status On branch main nothing to commit, working tree clean ``` If we run `git log` we now see two commits. ```bash git log commit 036776b2794a9ad3e21f8da83e6cfeca4d9fedb6 (HEAD -> main) Author: Joost De Cock Date: Tue Apr 4 16:22:48 2023 +0200 My second commit commit 4506fafad7b70ff2c44d7900d457f9a65133f7ed Author: Joost De Cock Date: Tue Apr 4 15:32:20 2023 +0200 My first commit ``` If we run `git show` with the first 4 characters of the second commit, we can see exactly what was changed. ```bash git show 0367 commit 036776b2794a9ad3e21f8da83e6cfeca4d9fedb6 (HEAD -> main) Author: Joost De Cock Date: Tue Apr 4 16:22:48 2023 +0200 My second commit diff --git a/hello.md b/hello.md index 0dec223..d6a72bb 100644 --- a/hello.md +++ b/hello.md @@ -1 +1 @@ -Hello git +Hello git. How are you? ``` You probably won't be using the `git show` command that often. However, it's good to know that once a commit is added to the DAG you can identify it with its ID and ask git to tell you exactly what happened in this commit with the `git show` command. ## Chapter 17: Branching in git Before we dive into branching in git, which is where things become really interesting, let's do a quick recap of the most important things we have learned so far: - We know that `git init` is how we initialize an empty repository - We know that `git add` is how we copy files from our file system to the staging area which is where git prepares them so they are ready to be committed - We know that `git commit` is how we take everything that's currently in the staging area and add it to the DAG - We know that git uses the `HEAD` label to keep track of where we are, and there's also a label for each branch Alright, so far so good. 
Now let's see how we can use what we've learned to understand what git does when we start creating additional branches.

Note that we already have a branch. Everything needs to be on _some_ branch, so git starts us off with a default branch which is called `main`. On this default branch, we have made two commits so far.

Before we look at how we can create a new branch, we should probably pause for a moment to make sure we understand why you would want to make a branch in the first place. Remember chapter two, where we talked about why we need version control systems? Specifically this list of files:

- `git-draft.md`
- `git-draft.02.md`
- `git-03.md`
- `git-03_comments-by-serge.md`
- `git-good.md`
- `git-good_final.md`

If we were to manage this in git instead, the first couple of drafts would probably just be additional commits on the same branch. But then there's this file with the `_comments-by-serge` suffix, which probably means that this was a colleague making changes to a file. Well, this would be a good candidate to go on a different branch. Because **branches in git are all about isolating your work**.

If you are working on your own on something that has a relatively linear progression from initial idea to final outcome, you may only need one single branch. But if you are working on things that progress at different speeds or need to be kept apart, you will find that branches are going to be a life-saver.

![Branching in git](git-branches.png)

As a practical example, imagine that you are maintaining a website. The production code, the one that is deployed on the web server, is in the `main` branch. Last week you started working on a new feature: the website will now also have a dark mode. However, you were smart, so rather than do this in the main branch, you created a so-called _feature branch_ for this. Let's say you named it `dark-mode`. Now your boss comes in and points out a small typo on the home page.
It's not a big deal, but your boss is a bit of a stickler for grammar, so they want you to drop what you're doing and fix it now.

If you had been doing your dark mode work on the `main` branch, you would be in a real pickle right now. You would have mixed your new dark mode work with the production code, so fixing the typo would have to wait until the dark mode was finished. Or you'd have to somehow undo the work you did so far, or at least find a way to disentangle those changes from what was there before.

Don't let this happen to you. Embrace branching in git. Branches are not hard to understand, and we'll show you exactly how to make them in the next chapter.

## Chapter 18: git branch

To work with branches in git, we use the `git branch` command. If we run it without any additional info, git will show us a list of current branches. The active branch will have an asterisk in front of it.

```bash
git branch
* main
```

We only have one branch for now, the `main` branch. If we wanted to create another branch, we could do so by specifying its name when running the `git branch` command. So if we run `git branch example`, it will create a new `example` branch.

```bash
git branch example
git branch
  example
* main
```

If we run `git branch` again, we can see that the `example` branch was created. We also see that the current branch is still the `main` branch. In other words, the `git branch` command only creates the branch. It does nothing else.

If we want to switch the active branch from `main` to our new `example` branch, there's a command for that too, and it's `git switch`. We'll use that in the next chapter, but first let's remove our example branch again. To do so, use the `-d` flag (for delete) followed by the branch name. So the command to remove the `example` branch is `git branch -d example`.

```bash
git branch -d example
Deleted branch example (was 036776b).
```

Poof, gone.
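If you'd like to convince yourself that `git branch` really only adds a label, here is a minimal sketch you can run in a scratch location. It assumes git is installed; the temporary repository, the `Example` identity and the empty commit are stand-ins I made up so the script does not touch your real work.

```bash
set -e

# Work in a throwaway repository (an assumption of this sketch, not part of the training repo).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the default branch is called main
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "My first commit"

# Creating a branch adds a new label on the commit we are currently on.
git branch example
example_commit=$(git rev-parse example)
main_commit=$(git rev-parse main)

# The active branch has not changed.
active_branch=$(git symbolic-ref --short HEAD)

# Deleting the branch removes the label again.
git branch -q -d example
left_over=$(git branch --list example)

echo "example pointed at: $example_commit"
echo "main points at:     $main_commit"
echo "active branch:      $active_branch"
```

Both labels report the same commit ID, and the active branch is still `main` throughout.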
Now let's look at some more efficient ways to not only create a branch, but also make it active.

## Chapter 19: git switch

The `git switch` command switches between branches. In other words, it changes which branch is currently active. In git, the active branch is whatever the `HEAD` label points to, so when we use the `git switch` command, git is typically just moving the `HEAD` label to a different branch.

But `git switch` can also create a new branch. For that, you should pass it the `-c` flag (for create) followed by the branch name. So if we run `git switch -c my-feature`, git will not only create the `my-feature` branch, it will also make it active by moving the `HEAD` label to it.

```bash
git switch -c my-feature
Switched to a new branch 'my-feature'
git branch
  main
* my-feature
```

If you're curious, you should look at the contents of `.git/HEAD` and you'll find that it now contains a reference to `refs/heads/my-feature`.

When creating a branch, all git has done for now is create a new label with the `my-feature` branch name, and add it to the `HEAD` commit. We can verify this with the `git log` command. Where previously only the `main` label was on this commit, it now carries both the `main` and the `my-feature` labels, with `HEAD` pointing to `my-feature`.

```bash
git log
commit 036776b2794a9ad3e21f8da83e6cfeca4d9fedb6 (HEAD -> my-feature, main)
Author: Joost De Cock
Date:   Tue Apr 4 16:22:48 2023 +0200

    My second commit

commit 4506fafad7b70ff2c44d7900d457f9a65133f7ed
Author: Joost De Cock
Date:   Tue Apr 4 15:32:20 2023 +0200

    My first commit
```

That's because `HEAD` sits on the commit that is the exact point where we decided to branch off. In other words, this commit now marks a point where our river splits in two and each branch can go its own way.

To illustrate this point, let's add a new file called `feature.md` and add some data into it. Let's add a line that says `This is a new feature`.
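If you'd rather not open an editor for this, one way to create the file from the command line is with `echo`. This is just a sketch of that approach; it assumes your shell's current directory is the root of the repository.

```bash
# Write a single line into feature.md, creating the file if it does not exist yet.
echo "This is a new feature" > feature.md

# Show the contents to confirm what ended up on disk.
cat feature.md
```

Note that a single `>` overwrites the file, while `>>` would append to it.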
When we run `git status`, git will tell us that there's a new file that is not currently being tracked, and it suggests using `git add` to track it.

```bash
git status
On branch my-feature
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        feature.md

nothing added to commit but untracked files present (use "git add" to track)
```

Which is sweet of git, but we know that by now. So we add the file to the staging area with `git add feature.md` and then commit it with `git commit -m "Working on a new feature"`.

```bash
git add feature.md
git commit -m "Working on a new feature"
[my-feature cf32fd5] Working on a new feature
 1 file changed, 1 insertion(+)
 create mode 100644 feature.md
```

If we now check the commit log with `git log`, we not only see our new commit, we also see that for the very first time, not all our labels are on the same commit. The `HEAD` and `my-feature` labels are on the commit we just made. But the `main` label is still on the previous commit.

![A branch forms](git-branch1.png)

If we add and commit another change -- say that we update our file to read `Started working on a new feature` rather than just `This is a new feature` -- both the `HEAD` and `my-feature` labels would move to the new commit as this is now the new tip of the `my-feature` branch. To clarify why we made this change, we'll use `Manage expectations through better phrasing` as our commit message. Because why not.

```bash
echo "Started working on a new feature." > feature.md
git add feature.md
git commit -m "Manage expectations through better phrasing"
```

If we run `git log` again, we will see that our latest commit is added and has both the `HEAD` and `my-feature` labels connected to it. The `main` label meanwhile is falling further behind. Our `my-feature` branch is now two commits _ahead_ of the `main` branch.
```bash git log commit 402793d176388b5d5da5f257eaa41b2eb4a19e54 (HEAD -> my-feature) Author: Joost De Cock Date: Wed Apr 5 08:44:20 2023 +0200 Manage expectations through better phrasing commit cf32fd562bb63b64361642a293d312b4d2449877 Author: Joost De Cock Date: Tue Apr 4 17:40:02 2023 +0200 Working on a new feature commit 036776b2794a9ad3e21f8da83e6cfeca4d9fedb6 (main) Author: Joost De Cock Date: Tue Apr 4 16:22:48 2023 +0200 My second commit commit 4506fafad7b70ff2c44d7900d457f9a65133f7ed Author: Joost De Cock Date: Tue Apr 4 15:32:20 2023 +0200 My first commit ``` ![A branch extends](git-branch2.png) ## Chapter 20: git checkout In the previous chapter, we used the `git switch` command to create a branch and _switch_ to it, or in other words, make it active by moving the `HEAD` label to the tip of this branch. We already mentioned that git only does a couple of things and the various commands are typically just ways to combine those different things. And the `git switch` command is a good example to illustrate this. As we've learned in chapter 18, we can use `git branch` to create a branch. However, we also learned that this does not make that branch _active_. In other words, it does not move the `HEAD` label to it. Which is why `git switch` is handy because it does that for us. But `git switch` is not special. All it does is combine git's basic operations in a way that saves us some typing. In the case of creating a branch and _switching_ to it, we can accomplish the same by executing 2 commands in a row. First, we run `git branch my-feature` to create the branch. Then we run `git checkout my-feature` to make the branch active. It is that second command, `git checkout` that we're going to talk about in this chapter because it's one of git's core functionalities that you should really understand. 
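To see the two steps side by side, here is a small sketch that creates a branch with `git branch`, then makes it active with `git checkout`, and records where `HEAD` points after each step. The throwaway repository, the `Example` identity and the empty commit are assumptions of the sketch, there only to keep it self-contained.

```bash
set -e

# Throwaway repository so we don't disturb the training repo (an assumption of this sketch).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the default branch is called main
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "My first commit"

# Step 1: create the branch. HEAD stays on main.
git branch my-feature
head_after_branch=$(git symbolic-ref HEAD)

# Step 2: check the branch out. This is what moves HEAD.
git checkout -q my-feature
head_after_checkout=$(git symbolic-ref HEAD)

echo "after git branch:   $head_after_branch"
echo "after git checkout: $head_after_checkout"
```

The first line should report `refs/heads/main` and the second `refs/heads/my-feature`, which is the same end state `git switch -c my-feature` gave us in one go.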
![Getting data out of the DAG](git-layers.png)

In chapter 12, when we learned about the staging area, we learned that `git add` adds things to the staging area, while `git commit` adds them to the DAG. But so far we have only learned how to _add_ data to git. The question of how to get it back out hasn't come up yet.

The `git checkout` command reads data from the DAG and puts it on our filesystem. There is no staging area when we read from the DAG. Only when we write does the staging area come into play. So whenever we want to go the other way, and have our local file system replicate a particular commit in our DAG, we use the `git checkout` command.

We're at a particularly good point to illustrate this because we're currently on the `my-feature` branch which is 2 commits ahead of the `main` branch. Furthermore, during those two commits, we created a new file called `feature.md`. No such file exists in the main branch. So before we do anything, let's do a quick `ls` to see what files are currently on our file system.

```bash
ls
feature.md hello.md
```

As expected, we have a `feature.md` file and a `hello.md` file. And yes, the `.git` folder is also there, but that's a hidden folder that we only know about because we're quickly turning into git wizards here. We won't take it into account.

Alright, so now let's make the `main` branch active by issuing the command `git checkout main`.

```bash
git checkout main
Switched to branch 'main'
```

Git will tell us something like _Switched to branch main_, which is nice of git and tells us that the `HEAD` label is now on the `main` branch. But moving labels is not the only thing git has done. If you run `ls` again, you will see that the `feature.md` file is gone.

```bash
ls
hello.md
```

All that we're left with is our `hello.md` file. On one hand, it might seem scary that things can just disappear like that. On the other hand, when we think about it, we've asked git to go back to the `main` branch.
And the tip of the `main` branch never had this file to begin with. So git reads from the DAG and makes sure that our filesystem is exactly like it was when we made that last commit on the `main` branch.

If we want to go back to the `my-feature` branch, we can do so with the `git checkout my-feature` command. However, let's apply some of what we've learned here and just use `git switch my-feature` instead.

```bash
git switch my-feature
Switched to branch 'my-feature'
```

Sure enough, git has _switched to_ or activated the `my-feature` branch again by moving the `HEAD` label to it. And if we run `ls` again, we once again have two files, `hello.md` and `feature.md`.

```bash
ls
feature.md hello.md
```

So, when we use `git switch` here, it does the same work as `git checkout` under the hood: it reads data from the DAG and restores the file system to the state it was in at that point.

The `git checkout` command can not only check out branches. You can also check out a specific commit -- by passing it a commit ID -- or a tag, which is something we haven't talked about yet, but we will later. For now, think of a tag as a label that does not need to be on the tip of a branch, but can go anywhere.

One bonus feature that `git checkout` has up its sleeve is that it can also create branches. To do so, use the `-b` flag followed by the branch name. So when we used

```bash
git switch -c my-feature
```

earlier to create a branch with `git switch` and its `-c` flag (for create), we could also have run

```bash
git checkout -b my-feature
```

instead. The result would have been exactly the same. But ultimately, creating a branch is always the same basic operation that `git branch` performs. These other commands just re-use that trick and then move `HEAD` for us.

Why is the flag to create a new branch `-c` when we use `git switch` and `-b` when we use `git checkout`? I don't know. But what I do know is that this is part of why people get frustrated with git.
Because yes, it is not easy to remember all of the commands and their feature flags. But if we start to understand what git is doing under the hood, then it doesn't really matter all that much. When you want to create a branch, you can do so with whatever command you like best. The choice is yours.

## Chapter 21: Merging in git

When we first discussed branching in git, we said that using branches is all about _isolating our work_. And -- just to be clear on this -- this is true. That's why we use branches.

However, isolation is almost always a temporary state. We don't want to isolate our work in perpetuity. We want to go on a little journey to work on something without having to worry about any other changes. But when we're ready for it, we'd like to come back and contribute the fruits of our labor somehow.

In git, we call this _merging_ and it is the exact opposite of _branching_. When we branch, our river splits in two. When we merge, we rejoin two branches of our river so that they come together again.

![Branching and merging in git](git-branch-merge.png)

Depending on how much our branches diverged, merging them back together can be anything from straightforward to pretty complicated. Git needs to somehow figure out how to reconcile all of the differences between the two branches we are merging, and land on a situation that encapsulates all changes in both branches.

Quite often, git will figure it out on its own. Sometimes though, it won't be able to, and will rely on us to tell it what to do. Before we get to that, let's start with some simpler examples of merging in the next chapter.

## Chapter 22: git merge

To merge branches in git, we use the `git merge` command. The `git merge` command will merge whatever branch we ask it to into the branch that has the `HEAD` label. Remember that the `HEAD` label is the equivalent of a _you are here_ marker on our DAG.
So if we want to merge branch `my-feature` into branch `main`, then we should first switch to branch `main` so that the `HEAD` label is on the `main` branch. Now if we would run the `git merge my-feature` command, git would merge the `my-feature` branch into wherever `HEAD` is, which is the `main` branch because we made sure of that. If that all sounds a bit confusing, just keep in mind that by default `git merge` only expects one single argument, the name of the branch you want to merge. So where should git merge that branch into? Well, into whatever branch we're on right now. And `HEAD` always points to whatever branch we're on right now. In the next chapters, we'll look at some examples and different merging scenarios. ## Chapter 23: Fast-forward merging The simplest kind of merge git can perform is a so-called fast-forward merge. A fast-forward merge can only occur when one of our two branches has seen no changes since the moment we branched. As it happens, this is the exact scenario we are in right now. We created a new `my-feature` branch and have added 2 commits to it. However, our `main` branch -- the one we branched off from to create our `my-feature` branch -- all this time has just been sitting there. Nothing has changed, nobody has added any commits to the `main` branch. So now, if we switch to the main branch and ask git to merge the `my-feature` branch, all git really has to do is move the `HEAD` and `main` labels to the tip of the `my-feature` branch. It doesn't even have to create a merge commit. All it needs to do is move a bunch of labels, because these branches never went in different directions. One went ahead and got 2 new commits, while the other just sat there. And it can now just catch up. ![Fast-forward merging in git](git-ff-merge.png) To try this ourselves, we should first switch to the `main` branch using the `git switch main` command. Then, we can merge the `my-feature` branch with the `git merge my-feature` command. 
```bash
git switch main
Switched to branch 'main'
git merge my-feature
Updating 036776b..402793d
Fast-forward
 feature.md | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 feature.md
```

Git will merge everything, and even tell us it's doing a fast-forward merge. If we look at our commit log with `git log`, we can see that no commits have been added to the log. The most recent commit is still the one with the `Manage expectations through better phrasing` commit message, which was the second commit we did on the `my-feature` branch.

```bash
git log
commit 402793d176388b5d5da5f257eaa41b2eb4a19e54 (HEAD -> main, my-feature)
Author: Joost De Cock
Date:   Wed Apr 5 08:44:20 2023 +0200

    Manage expectations through better phrasing

commit cf32fd562bb63b64361642a293d312b4d2449877
Author: Joost De Cock
Date:   Tue Apr 4 17:40:02 2023 +0200

    Working on a new feature

commit 036776b2794a9ad3e21f8da83e6cfeca4d9fedb6
Author: Joost De Cock
Date:   Tue Apr 4 16:22:48 2023 +0200

    My second commit

commit 4506fafad7b70ff2c44d7900d457f9a65133f7ed
Author: Joost De Cock
Date:   Tue Apr 4 15:32:20 2023 +0200

    My first commit
```

This confirms that all git had to do to merge this was move labels around. But that's not the only thing git did. If we run `ls` again, we will find that the `feature.md` file is now present. So git not only moved labels, it also did the work of `git checkout` under the hood to make sure that our current working directory is in sync with what is stored in the DAG.

```bash
ls
feature.md hello.md
```

Every time we move `HEAD`, either explicitly or as a side effect of what we asked git to do (such as merging a branch in this case), git will ensure that our current folder is kept in sync with what the DAG says should be there.

## Chapter 24: 3-way merging

A 3-way merge in git is the default way of merging.
Default in the sense that all other merges either can only happen under specific circumstances -- like the fast-forward scenario we discussed in the previous chapter -- or you need to tell git explicitly that you want it to do some other type of merge. If you just tell git to merge, it will check whether a fast-forward merge is possible, and if not, it will do a 3-way merge.

So that raises the question: _What is a 3-way merge?_ And arguably a more interesting question: _Why is it called a 3-way merge?_ Is it because [with a honey in the middle there's some leeway](https://www.youtube.com/watch?v=Pi7gwX7rjOw)?

Sadly, no. It is called a 3-way merge because git compares 3 commits to work out the result: the most recent commit of each of the two branches (the tips of the branches) and the commit where the two branches diverged (their common ancestor). The outcome is stored in a _merge commit_, which is a special commit git will create and that will have the two tips as its parents.

![A 3-way merge in git](git-3way-merge.png)

When we ask git to merge something, we will immediately know whether git is using a fast-forward merge or a 3-way merge. That is because in a fast-forward merge git does not need to add a commit. It just moves labels. So it will do the merge and that's the end of it.

However, if a fast-forward merge is not possible, git will need to create a merge commit. And when we commit we need a commit message for the log. So the moment we ask git to merge and it needs to do a 3-way merge, it will prompt us for the commit message, which tells us that this will be a 3-way merge.

To trigger a 3-way merge in our example repository, we first need to make sure that our two branches each have changes or commits on them that are not on the other branch. We are currently on the main branch. But we could run

```bash
git switch main
```

to make sure we are.

```bash
git switch main
Already on 'main'
```

Now let's add an extra line to our `hello.md` file that says _Added in main._. ```bash echo " Added in main."
>> hello.md ``` If we run `git status` git will tell us that there are changes to the `hello.md` file and suggest that perhaps we should stage them. ```bash git status On branch main Changes not staged for commit: (use "git add ..." to update what will be committed) (use "git restore ..." to discard changes in working directory) modified: hello.md no changes added to commit (use "git add" and/or "git commit -a") ``` But we already knew that so let's add them to the staging area with ```bash git add hello.md ``` Then, we can commit with ```bash git commit -m "Commit on the main branch" ``` Now that we've added a commit to our `main` branch, let's do the same on our `my-feature` branch. First, we switch to the branch with ```bash git switch my-feature ``` Next let's add an extra line to our `feature.md` file that says _Added in my-feature._. ```bash echo " Added in my-feature." >> feature.md ``` Let's add and commit this change too: ```bash git add feature.md git commit -m "Commit on the my-feature branch" ``` Ok, we now have two branches that each have a commit on them that the other branch does not have. This scenario cannot be merged with a fast-forward merge. As a matter of fact, if we run `git log` now, we see something that is interesting but at this point should not be surprising. 
```bash
git log
commit 666ef4596af22ed63ba9d66e2627b991cb155197 (HEAD -> my-feature)
Author: Joost De Cock
Date:   Wed Apr 5 12:16:59 2023 +0200

    Commit on the my-feature branch

commit 402793d176388b5d5da5f257eaa41b2eb4a19e54
Author: Joost De Cock
Date:   Wed Apr 5 08:44:20 2023 +0200

    Manage expectations through better phrasing

commit cf32fd562bb63b64361642a293d312b4d2449877
Author: Joost De Cock
Date:   Tue Apr 4 17:40:02 2023 +0200

    Working on a new feature

commit 036776b2794a9ad3e21f8da83e6cfeca4d9fedb6
Author: Joost De Cock
Date:   Tue Apr 4 16:22:48 2023 +0200

    My second commit

commit 4506fafad7b70ff2c44d7900d457f9a65133f7ed
Author: Joost De Cock
Date:   Tue Apr 4 15:32:20 2023 +0200

    My first commit
```

Remember, we are currently on the `my-feature` branch. And sure enough, the commit log tells us that both the `HEAD` and `my-feature` labels are on our most recent commit. However, nowhere in the commit log can we see the `main` label. It's as if it does not exist.

Of course it does exist. But it is not shown because by default, `git log` will look at the DAG and follow a trail from where `HEAD` is to its parent commit, and then that commit's parent, and so on, essentially paddling upstream in our DAG river. And so it will never come across the most recent commit on the `main` branch. But if we do `git log --all`, git will just show us all commits.
```bash git log --all commit 666ef4596af22ed63ba9d66e2627b991cb155197 (HEAD -> my-feature) Author: Joost De Cock Date: Wed Apr 5 12:16:59 2023 +0200 Commit on the my-feature branch commit 6a60eec85b16726b34ff0a8768c8d9e3a670c7d2 (main) Author: Joost De Cock Date: Wed Apr 5 12:13:37 2023 +0200 Commit on the main branch commit 402793d176388b5d5da5f257eaa41b2eb4a19e54 Author: Joost De Cock Date: Wed Apr 5 08:44:20 2023 +0200 Manage expectations through better phrasing commit cf32fd562bb63b64361642a293d312b4d2449877 Author: Joost De Cock Date: Tue Apr 4 17:40:02 2023 +0200 Working on a new feature commit 036776b2794a9ad3e21f8da83e6cfeca4d9fedb6 Author: Joost De Cock Date: Tue Apr 4 16:22:48 2023 +0200 My second commit commit 4506fafad7b70ff2c44d7900d457f9a65133f7ed Author: Joost De Cock Date: Tue Apr 4 15:32:20 2023 +0200 My first commit ``` Alright, now that we've established that this situation cannot be merged with a fast-forward merge, let's switch to the `main` branch and ask git to merge the `my-feature` branch. ```bash git switch main Switched to branch 'main' git merge my-feature Merge made by the 'ort' strategy. feature.md | 3 +++ 1 file changed, 3 insertions(+) ``` Sure enough, git will prompt us for a commit message, although it's being helpful and has already provided a default message for us saying _Merge branch 'my-feature'_. If we inspect the commit log with `git log` we see that we once again have all labels in the log. 
```bash git log commit 147cc6189ebeba3315daf7ad2b2e4e719eb8a21f (HEAD -> main) Merge: 6a60eec 666ef45 Author: Joost De Cock Date: Wed Apr 5 12:31:42 2023 +0200 Merge branch 'my-feature' commit 666ef4596af22ed63ba9d66e2627b991cb155197 (my-feature) Author: Joost De Cock Date: Wed Apr 5 12:16:59 2023 +0200 Commit on the my-feature branch commit 6a60eec85b16726b34ff0a8768c8d9e3a670c7d2 Author: Joost De Cock Date: Wed Apr 5 12:13:37 2023 +0200 Commit on the main branch commit 402793d176388b5d5da5f257eaa41b2eb4a19e54 Author: Joost De Cock Date: Wed Apr 5 08:44:20 2023 +0200 Manage expectations through better phrasing commit cf32fd562bb63b64361642a293d312b4d2449877 Author: Joost De Cock Date: Tue Apr 4 17:40:02 2023 +0200 Working on a new feature commit 036776b2794a9ad3e21f8da83e6cfeca4d9fedb6 Author: Joost De Cock Date: Tue Apr 4 16:22:48 2023 +0200 My second commit commit 4506fafad7b70ff2c44d7900d457f9a65133f7ed Author: Joost De Cock Date: Tue Apr 4 15:32:20 2023 +0200 My first commit ``` However, there's some interesting things to take note of here. For one thing, the `HEAD` and `main` labels are now on the merge commit, which is the one git created. But the `my-feature` label remains on the last commit in the `my-feature` branch. This is because we merge the `my-feature` branch **into** the `main` branch. In other words, no changes whatsoever were made to the `my-feature` branch, the only changes -- the new merge commit -- were made on the `main` branch because that's the one we're merging into. Let's have a look at this _merge commit_ that git created. We have its ID right there in the log, so we can use `git show` to show it in detail. ```bash git show 147cc commit 147cc6189ebeba3315daf7ad2b2e4e719eb8a21f (HEAD -> main) Merge: 6a60eec 666ef45 Author: Joost De Cock Date: Wed Apr 5 12:31:42 2023 +0200 Merge branch 'my-feature' ``` What we can see is that there are no real changes in this commit. 
It's essentially an empty commit: it has a log message, an author, and a date, but it did not record any changes of its own. What it does include is the IDs of the commits that it merged. If we check these IDs against our log, we can see that they were, at the time of the merge, the most recent commits on each of the merged branches. In other words, these two commits are the parents of the merge commit, and that is how the merge commit ties the two branches back together.

Some people -- let's call them git purists -- do not like this kind of empty merge commit. Which is why git also provides different ways to merge things. We'll have a look at such an alternative merging strategy in the next chapter.

## Chapter 25: Squash merging

A squash merge is a merging strategy where rather than try to merge a bunch of new commits from one branch into another, git will instead take this bunch of new commits, and stage them as a single ready-to-go commit that will have the same effect. It is essentially telling git _Hey git, I did a bunch of work here in this branch, now can we pretend I did all of that in one sitting and just make it a single commit as if there was never a branch at all_.

An example will make this more clear, but before doing so, let me quickly go back to the point before our merge. Yes, you can do that. No, I won't show you how until a later chapter. For now, let me just quickly do that.

Ok, we now have two branches again, and they are not merged. Each of them has one commit on them that the other does not have.

To make this example more meaningful, we are now going to add two more commits to the `my-feature` branch. ```bash git switch my-feature Switched to branch 'my-feature' echo " This is extra commit 1." >> feature.md git add feature.md git commit -m "Extra commit 1" [my-feature 4b23e6e] Extra commit 1 1 file changed, 2 insertions(+) echo " This is extra commit 2."
>> feature.md git add feature.md git commit -m "Extra commit 2" [my-feature d987de0] Extra commit 2 1 file changed, 2 insertions(+) ``` If we check the commit log with `git log` we can see that we now have 3 commits in our `my-feature` branch that are not in the `main` branch. ```bash git log commit d987de06e624d0ffaf23678f317b97c85dd10989 (HEAD -> my-feature) Author: Joost De Cock Date: Wed Apr 5 12:51:35 2023 +0200 Extra commit 2 commit 4b23e6e68f510f7ff12a8aa83ec879475301854a Author: Joost De Cock Date: Wed Apr 5 12:51:11 2023 +0200 Extra commit 1 commit 666ef4596af22ed63ba9d66e2627b991cb155197 Author: Joost De Cock Date: Wed Apr 5 12:16:59 2023 +0200 Commit on the my-feature branch commit 402793d176388b5d5da5f257eaa41b2eb4a19e54 Author: Joost De Cock Date: Wed Apr 5 08:44:20 2023 +0200 Manage expectations through better phrasing commit cf32fd562bb63b64361642a293d312b4d2449877 Author: Joost De Cock Date: Tue Apr 4 17:40:02 2023 +0200 Working on a new feature commit 036776b2794a9ad3e21f8da83e6cfeca4d9fedb6 Author: Joost De Cock Date: Tue Apr 4 16:22:48 2023 +0200 My second commit commit 4506fafad7b70ff2c44d7900d457f9a65133f7ed Author: Joost De Cock Date: Tue Apr 4 15:32:20 2023 +0200 My first commit ``` Next, we will ask git to squash-merge these commits into the `main` branch. To do so, we first switch to the `main` branch, and then use the `--squash` flag in our merge command to tell git we want to squash-merge. ```bash git switch main Switched to branch 'main' git merge --squash my-feature Squash commit -- not updating HEAD Automatic merge went well; stopped before committing as requested ``` Git is being explicit here and telling us that it did not update `HEAD` and it did not commit, as requested. So what did it do. Well, if we run `git status` we see that it has staged changes to be committed. ```bash git status On branch main Changes to be committed: (use "git restore --staged ..." 
to unstage) modified: feature.md ``` And if we run `git log` we can see that `HEAD` is still on the most recent commit of the `main` branch. Nothing has been changed by the merge. ```bash git log commit 6a60eec85b16726b34ff0a8768c8d9e3a670c7d2 (HEAD -> main) Author: Joost De Cock Date: Wed Apr 5 12:13:37 2023 +0200 Commit on the main branch commit 402793d176388b5d5da5f257eaa41b2eb4a19e54 Author: Joost De Cock Date: Wed Apr 5 08:44:20 2023 +0200 Manage expectations through better phrasing commit cf32fd562bb63b64361642a293d312b4d2449877 Author: Joost De Cock Date: Tue Apr 4 17:40:02 2023 +0200 Working on a new feature commit 036776b2794a9ad3e21f8da83e6cfeca4d9fedb6 Author: Joost De Cock Date: Tue Apr 4 16:22:48 2023 +0200 My second commit commit 4506fafad7b70ff2c44d7900d457f9a65133f7ed Author: Joost De Cock Date: Tue Apr 4 15:32:20 2023 +0200 My first commit ``` That is, of course, because technically, we haven't really merged anything. Git has prepared the staging area in such a way that when we commit this, it will have the same effect as merging our feature branch. But no merge ever occurred. There is no empty merge commit. It looks as if all the work in the `my-feature` branch was done in one regular commit on the `main` branch. Some people prefer this way of merging. If you don't have a personal preference, you can mostly forget about squash-merging. But it's good to know the option is there should you ever feel like you'd want to use it. ## Chapter 26: git diff In this chapter, we are going to look at how git can help us compare different versions of our files. The way to do that is with the `git diff` command -- which you should write with double `f` because it stands for _difference_. By default, the command will compare your working directory -- that is the files on your file system right now -- with the staging area. 
We don't have any changes right now, which we can confirm by running `git status`, so if we were to run `git diff` right now, it would not give us any info.

```bash
git status
On branch main
nothing to commit, working tree clean
git diff
```

So let's quickly make a change by opening the `feature.md` file and changing the `This is extra commit 2.` line to `This is extra commit 3.`. If we now run `git status`, git will tell us that there are changes in `feature.md` that have not been staged. Ok, good to know. But what _exactly_ has changed? If we run `git diff` git will tell us.

```bash
git diff
diff --git a/feature.md b/feature.md
index 5a862ad..b65b93d 100644
--- a/feature.md
+++ b/feature.md
@@ -5,4 +5,4 @@ Added in my-feature.
 
 This is extra commit 1.
 
-This is extra commit 2.
+This is extra commit 3.
```

The output is formatted like the `diff` command on Unix and Linux systems. If you've never heard of diff, then this will take some getting used to, but it's not that hard to figure out. The diff does not show the entire file, only the differences. Lines preceded by a `-` sign have been removed, whereas lines preceded by a `+` sign were added.

As I mentioned, by default `git diff` will compare the working directory with the staging area. If you would instead like to compare the staging area to the DAG (specifically, to `HEAD`), pass it the `--cached` flag. Why cached? Because in git the staging area is also referred to as the cache.

If we run `git diff --cached` now, we will get nothing, because we have not staged anything, so there is no difference between the staging area and the DAG.

```bash
git diff --cached
```

However, if we were to stage a change, the results would be different. So if we run

```bash
git add feature.md
```

we now have changes in our staging area. Let's first run `git diff` again:

```bash
git diff
```

We get no output, because there are no changes between our working copy and the staging area (or cache).
If we run `git diff --cached` on the other hand, we will once again see a diff of the changes we've made. But now those changes are between the staging area and `HEAD`.

```bash
git diff --cached
diff --git a/feature.md b/feature.md
index 5a862ad..b65b93d 100644
--- a/feature.md
+++ b/feature.md
@@ -5,4 +5,4 @@ Added in my-feature.
 
 This is extra commit 1.
 
-This is extra commit 2.
+This is extra commit 3.
```

If you have changed many files, you can limit the scope of the command by including a file or folder name, like `git diff feature.md`. You can also compare between branches, or even commits. Check the output of `git diff --help` if you're curious about all possibilities.

## Chapter 27: Git and the network

So far, all of our work has been done in our very own repository that only exists on our computer. That's great, I actually use this often myself when I'm just looking to avoid losing changes or to keep track of things.

However, the more common use case is that we are collaborating with others: we are working on something together with friends or colleagues and we want to share our changes with them.

Fueled by the rise of git hosting sites like [GitHub](https://github.com) and [GitLab](https://gitlab.com/) this scenario has become so popular that today many people don't fully comprehend the difference between, let's say, _git_ and _github_. Not you of course. You're on chapter 27 and are probably eager to find out how we get git to talk to the network.

The first thing to know is that git will only ever talk to the network when you tell it to. That's perhaps something to appreciate for a moment in today's world of cloud services, subscriptions, telemetry, and so on. Git will not do any networking unless you ask it to.

So how do you ask it? Well, these are the relevant commands:

- First up is `git clone` which you can think of as the networked version of `git init`.
- Second is `git fetch` which downloads remote data but makes no local changes.
- As an alternative, there is `git pull` which also downloads, but merges changes locally.
- And finally there's `git push` which does the opposite and pushes our local changes to the remote server.

Let's look at each of these in detail over the next 4 chapters.

## Chapter 28: git clone

If you've ever used git before, chances are `git clone` was the very first command you used. That is because unlike `git init` which creates a repository locally, `git clone` will set up a local copy of a pre-existing repository that exists _somewhere else_.

This _somewhere else_ can be many different things. It can be another folder on your computer, a shared drive or network mount, a remote location that you access over SSH or another tunnel, or, in the most common scenario, a git hosting service like GitHub or GitLab. No matter where we are cloning from, git refers to the source repository that we are cloning from as the **remote**.

To make this all a bit more hands-on, let's practice by cloning a repository from GitHub. There are, of course, millions of repositories on GitHub but I have set up a repository for this purpose, so let's use that one. It will be our remote, and you can find it at [github.com/joostdecock/git-training/](https://github.com/joostdecock/git-training/).

Git can use several protocols to talk to the remote. When cloning a repository from GitHub, the very first choice we have to make is which protocol we want to use, since this will influence the URL that we have to pass to the `git clone` command.

The URL can be found on the repository page of the hosting service. GitHub has a big green **Code** button, whereas GitLab has a big blue **Clone** button. Both of them give you a drop-down that lists the URLs to clone with either SSH or HTTPS.

When possible, you should always pick SSH. It has a number of benefits, and it's what we'll use in the examples below.
However, you should know that you need to [set up your SSH keys to do so](https://docs.github.com/en/authentication/connecting-to-github-with-ssh). Check the documentation of your git hosting provider of choice for more details.

Alright, so to clone a repository, we run `git clone` followed by the URL. In our case, to clone with SSH we run:

```bash
git clone git@github.com:joostdecock/git-training.git
Cloning into 'git-training'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (3/3), done.
```

If we don't specify anything else, git will create a folder in the current directory that has the same name as the repository we are cloning, `git-training` in this case. But if we want to use a different name, we can specify it after the URL:

```bash
git clone git@github.com:joostdecock/git-training.git other-name
Cloning into 'other-name'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (3/3), done.
```

Congratulations, you have cloned your first repository. If you enter the directory, you will find the familiar `.git` folder there that holds all of git's internals.

This repository will behave just like the local one we created earlier. But there are subtle differences that can tell you this repository was cloned from a remote repository. One place you will see a difference is when you run `git log`.

```bash
git log
commit 15e0732ee970875938bf26c78b5522958cdc1d0c (HEAD -> main, origin/main, origin/HEAD)
Author: Joost De Cock
Date:   Wed Apr 5 16:58:09 2023 +0200

    Initial commit
```

We can see the `HEAD` and `main` labels on the most recent commit, as expected, but in addition there are two other labels: `origin/HEAD` and `origin/main`.
The location of the `origin/HEAD` and `origin/main` labels indicates where the `HEAD` and `main` labels are in the origin repository. Or more accurately, where they were last time git talked to the origin.

Another way you can confirm that this repository was cloned from a remote is by running `git remote`:

```bash
git remote
origin
```

It will tell you `origin` which isn't all that useful. But if we add the `-v` flag (for verbose) it will give us a bit more info:

```bash
git remote -v
origin  git@github.com:joostdecock/git-training.git (fetch)
origin  git@github.com:joostdecock/git-training.git (push)
```

We can see that git has not one but 2 URLs for our origin. One to **fetch** and one to **push**. So let's look at what _fetch_ is all about in the next chapter.

## Chapter 29: git fetch

The `git fetch` command is like a careful version of `git pull`. When you run `git fetch`, it will connect to the origin and download new or updated data, but it won't touch your local branches or your working directory.

To fully appreciate what exactly `git fetch` does would lead us too far down a rabbit hole. But essentially it is a non-intrusive version of `git pull`. So it will download, it will make sure everything is available locally and move the `origin/main` label to where `main` now is on the remote, but it won't actually change any of your own work.

If you want the changes to be applied, you should merge them explicitly. To do so, you tell `git merge` to merge the branch as it exists on the remote, which is written as the name of the remote and the name of the branch separated by a slash: `origin/main`.

If I make a change to the repository on GitHub, you can see that `git fetch` will download a bunch of data, and `git merge origin/main` will then merge it.

```bash
git fetch
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), 682 bytes | 113.00 KiB/s, done.
From github.com:joostdecock/git-training
   15e0732..8e13092  main       -> origin/main
git merge origin/main
Updating 15e0732..8e13092
Fast-forward
 README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
```

Using `git fetch` is the careful approach because you first download the remote changes, and then merge them. This gives you the option to take a moment between steps to inspect what has changed and decide whether you want these changes to be added to your local branch or not.

But realistically, you're most often going to use `git pull` instead. And as it happens, that's what we'll be looking at in the next chapter.

## Chapter 30: git pull

The `git pull` command is the braver way to update from a remote. It will not only download changes, it will also merge them so that your local repository is in sync with the remote.

If I make another change on the remote, we can see that running `git pull` will download and merge in one fell swoop.

```bash
git pull
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), 688 bytes | 114.00 KiB/s, done.
From github.com:joostdecock/git-training
   8e13092..a2bbbde  main       -> origin/main
Updating 8e13092..a2bbbde
Fast-forward
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
```

Because `git pull` does the downloading and merging for you, it's quicker than running `git fetch` followed by `git merge`. The downside is that you cannot press pause before merging and make sure that you actually want these changes.

As such, `git pull` is best used when you trust the remote and the people who have write access to it. If, on the other hand, you are not so sure everything in the remote is kosher, use `git fetch` instead.

## Chapter 31: git push

The last of the git networking commands is `git push`.
It does the opposite of `git pull`, which downloads changes from the remote and ensures your local repository is in sync with the remote. In contrast, `git push` uploads your changes to the remote, and ensures that the remote is in sync with your local repository.

To try this out, we will first make a change to the `README.md` file, then add it to the staging area, and finally commit it.

```bash
echo " - This is change 3" >> README.md
git add README.md
git commit -m "change 3"
```

Now if we run `git log` we will see that our `main` branch is one commit ahead of `origin/main`.

```bash
git log
commit 1c5b1fc6d687c985341fb05c4b54252216cfa7bf (HEAD -> main)
Author: Joost De Cock
Date:   Wed Apr 5 18:21:44 2023 +0200

    change 3

commit a2bbbde96deb36c70c772dda06279b87c345e43b (origin/main, origin/HEAD)
Author: Joost De Cock
Date:   Wed Apr 5 18:16:02 2023 +0200

    Update README.md

commit 8e130929ab044aa3616821f46d67927ea4673ab5
Author: Joost De Cock
Date:   Wed Apr 5 18:14:09 2023 +0200

    Update README.md

commit 15e0732ee970875938bf26c78b5522958cdc1d0c
Author: Joost De Cock
Date:   Wed Apr 5 16:58:09 2023 +0200

    Initial commit
```

So, let's bring origin up to date with our local change by running `git push`.

```bash
git push
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 8 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 281 bytes | 281.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To github.com:joostdecock/git-training.git
   a2bbbde..1c5b1fc  main -> main
```

And sure enough, if we run `git log` again, we can see that the `origin/main` label is now on the most recent commit, which shows that the `main` branch of the remote was updated.
```bash
git log
commit 1c5b1fc6d687c985341fb05c4b54252216cfa7bf (HEAD -> main, origin/main, origin/HEAD)
Author: Joost De Cock
Date:   Wed Apr 5 18:21:44 2023 +0200

    change 3

commit a2bbbde96deb36c70c772dda06279b87c345e43b
Author: Joost De Cock
Date:   Wed Apr 5 18:16:02 2023 +0200

    Update README.md

commit 8e130929ab044aa3616821f46d67927ea4673ab5
Author: Joost De Cock
Date:   Wed Apr 5 18:14:09 2023 +0200

    Update README.md

commit 15e0732ee970875938bf26c78b5522958cdc1d0c
Author: Joost De Cock
Date:   Wed Apr 5 16:58:09 2023 +0200

    Initial commit
```

## Chapter 32: Amending the most recent commit

Mistakes happen. Fortunately, git has various ways that you can fix mistakes big and small after the fact.

A common mistake is a typo in the commit message. Or perhaps you forgot to add one particular file to the staging area before committing. In a situation like this, `git commit --amend` is your friend. It allows you to update the most recent commit. Or more accurately, the commit that currently has the `HEAD` label on it.

To illustrate this, I've initialized a brand new repository with `git init` and I have added and committed one file. If we look at the commit log, we can see only one commit.

```bash
git log
commit 7594ef965914a5437d6101eb5f707a47022640c7 (HEAD -> main)
Author: Joost De Cock
Date:   Wed Apr 5 18:33:55 2023 +0200

    My firts commit
```

However, and this is a bit embarrassing, for the commit message I wrote _My firts commit_ when it should have been _My first commit_. This is not that big a deal perhaps, but I don't want my first commit to forever be plagued by a typo in the commit message. Fortunately, I can amend it by running:

```bash
git commit --amend -m "My first commit"
```

I can choose to pass the new commit message in the command, using the `-m` flag, or I can just let git open an editor for me to write the commit message. Whichever way we decide to go, if we run `git log` again, we can see that the commit message has been updated, and our typo is fixed.
```bash
git log
commit 148e3d998ad516d75821b2e192281e7d47f2cdb3 (HEAD -> main)
Author: Joost De Cock
Date:   Wed Apr 5 18:33:55 2023 +0200

    My first commit
```

We can also see that the date has not changed. That is because `git log` shows the date on which the commit was originally authored, and amending keeps that date. So at first glance, it looks as if we simply updated the existing commit with a new commit message.

However, if we pay even closer attention, we can see that the commit message is not the only thing that has changed. The commit ID has also changed. Which should not come as a surprise because in chapter 6 we learned that git uses the commit data, the author, the date, and the log message as input to create the commit checksum. So when we change the commit's log message, the commit checksum will also change. There is no way around that. Under the hood, git did not edit the old commit in place: it created a new commit with the new message and moved the `main` and `HEAD` labels to it.

This brings us to an important point to take into consideration whenever you are tempted to go and change something about git's history. Important enough to warrant its own chapter.

## Chapter 33: A warning about rewriting history

In the previous chapter, we got our first taste of how we can _rewrite history_ in git. In the next chapters, we'll see more ways that we can go back and make changes to the DAG, the structured data where git keeps all our work.

However, there's an important caveat that you should keep in mind whenever you want to change git's history. And that is that **checksums don't lie**.

You can go back in git's history and change things. That's not a problem. But keep in mind that when you change either the commit data, the author, the date, or the log message, the commit checksum/ID will change.

Why does this matter? Well, it may very well not. As long as the history you are changing only exists in your local copy -- in other words, on your computer -- things will be fine. But if you are rewriting a shared history, for example by cloning a repository, then rewriting a bunch of its history and then pushing back those changes, things will not end well.
Because now you and other contributors will have a different idea of what the git history is, and the entire DAG will unravel.

So, as a rule of thumb, **make sure to only ever rewrite your own history and never rewrite any history that you have shared with others**.

With that warning out of the way, let's look at some more ways we can rewrite our own history.

## Chapter 34: git reset

The `git reset` command allows you to reset to an earlier state of the DAG or -- in its more gentle mode with the `--soft` flag -- merely move the `HEAD` label to a different spot than the tip of a branch.

If you do not specify what to reset to, git assumes you want `HEAD`. So if you run this command without any arguments, it will reset to the current `HEAD`.

To make that a bit more tangible, imagine you have a git repository. If you make some changes to a file, and then add it to the staging area with `git add`, you now have things in the staging area that are not in `HEAD`. If you run `git reset` at this moment, the changes will be removed from the staging area. However, the files on disk will keep their changes. Git will only concern itself with the DAG and staging area.

```bash
git status
On branch main
nothing to commit, working tree clean
echo "changed" >> readme.md
git status
On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   readme.md

no changes added to commit (use "git add" and/or "git commit -a")
git add readme.md
git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   readme.md

git reset
Unstaged changes after reset:
M       readme.md
git status
On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   readme.md

no changes added to commit (use "git add" and/or "git commit -a")
```

But the use of `git reset` is not limited to clearing out the staging area. You can also reset to an earlier state of the DAG, either by referencing a specific commit or -- as it's used rather often -- by telling it how many steps to _go back_ from `HEAD`.

This should be easier to understand when we use an example. Let's say you are working on solving a bug. You've created a branch for this, and you've finally fixed the bug and would now like to submit your fix for somebody else to merge.

However, you didn't fully understand the bug at first, and you tried to fix it three times without success, before finally nailing it on your fourth attempt. So now you have these 4 commits that document how it took you repeated attempts to find and fix this bug. Which is perhaps not a problem, but maybe you would just feel better if the commit log showed a single commit where you went in like a ninja and fixed the bug without touching anything else.

For this sort of ninja-level git log, you have two options: You either get everything perfect from the first attempt. Or -- if you are merely human like myself -- you learn to use `git reset` and simply rewrite history and cast yourself in the leading role of ninja git master.

Let's say it took us 4 commits to fix this bug. We've made changes, used `git add` and then `git commit` 4 times in a row, and now we've finally got it right. At this point, our commit log will have these 4 recent commits at the top, with the `HEAD` label pointing to the most recent one.
```bash
commit 33f7eeee7e842cd615096e2670c218d580a1e7af (HEAD -> main)
Author: Joost De Cock
Date:   Tue Apr 11 08:51:14 2023 +0200

    bugfix commit 4

commit 51f2e1834332415c15bcf586d48a329b02c47534
Author: Joost De Cock
Date:   Tue Apr 11 08:51:10 2023 +0200

    bugfix commit 3

commit 68b7b9e925bc1bf0cad512c4ec2f1150359bc33b
Author: Joost De Cock
Date:   Tue Apr 11 08:51:06 2023 +0200

    bugfix commit 2

commit 6ed8335d1b67843da8920be539314eac7ee277a3
Author: Joost De Cock
Date:   Tue Apr 11 08:50:51 2023 +0200

    bugfix commit 1

commit 148e3d998ad516d75821b2e192281e7d47f2cdb3
Author: Joost De Cock
Date:   Wed Apr 5 18:33:55 2023 +0200

    My first commit
```

Now if we use `git reset --soft HEAD~4` we are telling git to soft reset `HEAD` to 4 commits earlier.

```bash
git reset --soft HEAD~4
git log
commit 148e3d998ad516d75821b2e192281e7d47f2cdb3 (HEAD -> main)
Author: Joost De Cock
Date:   Wed Apr 5 18:33:55 2023 +0200

    My first commit

git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   readme.md
```

The effect of this is that the most recent 4 commits are undone. They are removed from the history of our branch, but the result of their combined changes is preserved in the staging area. Which means that we can now commit what's in the staging area, and this commit will hold the work done in the 4 commits we just reset or rolled back. In the commit log though, it will forever appear that we did all of this work in one single commit.

```bash
git commit -m "Fixed a bug like a boss"
[main 7645b32] Fixed a bug like a boss
 1 file changed, 4 insertions(+)
git log
commit 7645b322297796f82de5dac44a2c9c1be8e0d7dd (HEAD -> main)
Author: Joost De Cock
Date:   Tue Apr 11 08:56:37 2023 +0200

    Fixed a bug like a boss

commit 148e3d998ad516d75821b2e192281e7d47f2cdb3
Author: Joost De Cock
Date:   Wed Apr 5 18:33:55 2023 +0200

    My first commit
```

You probably noticed that I used the `--soft` flag after the `git reset` command.
Which raises the question _what is a soft reset, and is there also a hard reset?_. The answer is yes. Let's look at the various types of resets in the next chapter.

## Chapter 35: Soft, mixed, and hard resets in git

The default mode of `git reset` is to do a so-called _mixed_ reset, which personally I think should be called a _firm_ reset because it's in between a soft and a hard reset.

A soft reset will reset changes from the DAG but will leave them in the staging area.

A mixed reset will reset changes from the DAG and the staging area, but will keep the changes in the working directory (as in, the files on your disk).

A hard reset will reset changes from the DAG and from the staging area and from the working directory. In other words, a hard reset will discard your work, and any changes that were never committed cannot be retrieved.

For this reason, you should go with a `--soft` reset if you would like to keep your changes in the staging area. Typically this means you want to commit them again because you are simply bundling some commits into one.

You should use a `--mixed` reset -- which is the default, so you don't have to specify it -- if you want the commits undone and their changes also removed from the staging area. Typically this means you want to _undo_ commits entirely, perhaps because you made some changes that in retrospect were not a good idea.

You should only ever use a `--hard` reset if you know what you are doing, you are not afraid of losing your work, or you learn best by suffering.

## Chapter 36: git tag

By now we've learned how git uses labels to reference specific commits. We've also learned that the `HEAD` label is special because it acts like a _you are here_ marker on our DAG telling us where we are at any moment. But git also creates labels for each branch, and even adds labels for remote branches.

You can leverage the same functionality to add your own labels. This can be done with the `git tag` command which adds a tag to a commit.
Such a tag is a label created by you rather than one that git manages internally. It also will never move, unlike the way git automatically moves its internal labels to keep track of things.

Tagging commits like this is a way to sort of _bookmark_ a commit so you can refer to it in a simpler way than by using its checksum ID. One use case where tagging is used extensively is to track releases throughout the lifecycle of a project.

To tag a commit, you use the `git tag` command followed by the tag name. For example, if you are releasing version 1.1.0 of the software you are working on, you could tag the latest commit with `git tag v1.1.0`.

```bash
git tag v1.1.0
git log
commit 7645b322297796f82de5dac44a2c9c1be8e0d7dd (HEAD -> main, tag: v1.1.0)
Author: Joost De Cock
Date:   Tue Apr 11 08:56:37 2023 +0200

    Fixed a bug like a boss

commit 148e3d998ad516d75821b2e192281e7d47f2cdb3
Author: Joost De Cock
Date:   Wed Apr 5 18:33:55 2023 +0200

    My first commit
```

Now at any moment in time, you can easily restore your working directory to the state it was in when version 1.1.0 came out by running `git checkout v1.1.0`. If you do so right away, nothing special will happen. But if you do it further down the line when more commits have been added, you will find that git freaks out a bit because you are now in a _detached HEAD_ state.

We will cover what exactly such a _detached HEAD_ state is in a later chapter. For now, let's just agree that it sounds equal parts scary and funny.

## Chapter 37: git stash

Earlier, in chapter 12, we learned that there are 3 areas to keep in mind when working with git:

- The DAG
- The staging area (which git also calls the index)
- The working directory, in other words, the files on our disk

Since you're a bit of a git pro by now, it's probably time to let you know that there is a 4th area where you can put things, called the `stash`.

The stash is like a little pocket dimension where you can temporarily put what you are currently working on.
It's implemented like a stack, so you can push several states to it.

This comes in handy when you are working on something and you have a bunch of staged files or local changes, and now you suddenly have to switch to a different branch to work on some urgent bugfix, for example. The problem is that you are probably not ready to commit your current work because it's sort of halfway finished.

In a scenario like this, you can use `git stash` to bundle up your work in progress and put it on the stash, which is a stack-like structure that you can push data to. When you run `git stash` it has the same effect as

```bash
git stash push
```

In other words, push is its default behavior.

Here too, git is merely recycling the things it does well. When you run this command, git will essentially add your changes to a commit object. But instead of adding that commit to your branch, it will push this commit object on the stash stack. After doing this, it will ensure that your working directory is _clean_ again, in other words in sync with `HEAD`.

If, at any moment, potentially after `HEAD` points to a new commit, you want to re-apply these changes, you can do so with

```
git stash apply
```

To see the different entries on the stash stack, you can run:

```
git stash list
stash@{0}: WIP on main: e127d62 Merge branch 'mathieu-add-iocs' into 'main'
stash@{1}: On main: image-src-plugin
stash@{2}: WIP on main: 0f96791 Feat: Easy export of the oauth token
stash@{3}: WIP on main: 9918d55 fix: Delay gitlab API action until localstorage is ready
```

You will notice that by default, git will use the commit message of the commit `HEAD` is pointing to in order to identify this entry in the stash stack.
If you'd rather specify your own message, you can do so with the `-m` switch: ```bash git stash push -m "Halfway through working on the scroll bug" ``` The `git stash` command provides a handle little storage area where you can put your half-finished work when you have to switch from one task to another in git. You can learn more by running `git stash --help`. ## Chapter 38: The .gitignore file By default, git will look for any changes in the folder holding your repository and eagerly nudge you to add and commit them. But sometimes, that's not what you want. You typically want to keep track of only those files that matter, and not things like dependencies, build artifacts, error logs, or those pesky `.DS_Store` files on mac. Fortunately, git has a standard way to tell it to _ignore_ certain files or folders and that is through a `.gitignore` file. A `.gitignore` file is typically added to the top-level folder of a repository although it's worth pointing out that you can also add one in a subfolder. The file is a simple text file where each line holds a name or pattern of files or folders to match. Those matching files will then be ignored. For example, if you're doing NodeJS development on MacOS you should probably at least have the following in your `.gitignore` file: ``` # Keep dependencies out of the repository node_modules # Don't track debug logs npm-debug.log* # Ugh Mac, you are the worst .DS_Store ``` You should then add and commit this file so that others collaborating with you can also benefit from it. You can add comments to this file by starting the line with the `#` sign. For more details, run `man gitignore`. ## Chapter 39: You've detached HEAD, now what? If at any time you checkout something that is not the tip of a branch -- for example an older tag or commit ID -- you will get a message from git that reads something like: ```bash You are in 'detached HEAD' state. 
```

The reason git gets nervous is that you have placed `HEAD` in the middle of the DAG somewhere and you are not on the tip of any branch. As a result, if you were to make changes now and commit them, such a commit would be accessible by its checksum ID only.

Remember that git uses labels internally to keep track of where things are, and each branch has a label that points to its most recent commit, in other words, the tip of the branch. You've now moved the `HEAD` label to a commit that does not have any branch label on it. So from git's point of view, you are not currently on a branch, and if you were to make changes here, they would be added to the DAG, but without a label you could only ever reference them by their internal ID.

To get out of this situation, you have two options. You can check out any other branch, which will move the `HEAD` label to the tip of that branch, which in turn means that `HEAD` is no longer detached. Or you can create a branch where you are right now, which means `HEAD` will be on the tip of your newly created branch and thus will also no longer be detached.

Whichever option you choose, a _detached HEAD_ state is not a good place to make changes. So if you want to just have a look around without changing anything, that's fine. But if you plan to add commits, you should really start by creating a branch first.

## Chapter 40: References in git

A _reference_ in git is an umbrella term for anything that points to a given commit in the DAG. A reference can be a commit ID, a label like `HEAD`, a tag, a branch name, or a remote branch such as `origin/main`. In most git commands, these various types of references are interchangeable.

For example, when you use `git checkout`, it expects a reference. So it can be any of these.

It is not super important to know exactly what a reference is. But it can help you understand why git commands can take various types of input. It's because under the hood, they are all references.

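
To see this interchangeability for yourself, here is a small sketch that asks git to resolve several kinds of references with `git rev-parse`. It assumes a POSIX shell and a throwaway repository in a temporary directory. The tag name `v1`, the file name, and the placeholder identity are made up for illustration.

```shell
#!/bin/sh
# Sketch only: everything happens in a throwaway repository.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the first branch "main" on any git version
git config user.name "Example User"       # placeholder identity, local to this repo
git config user.email "user@example.com"

echo "hello" > readme.md                  # hypothetical file
git add readme.md
git commit -q -m "Add readme"
git tag v1                                 # hypothetical tag: a plain label on the commit

# Ask git which commit each kind of reference points to.
by_head=$(git rev-parse HEAD)
by_branch=$(git rev-parse main)
by_tag=$(git rev-parse v1)
by_id=$(git rev-parse "$(git rev-parse --short HEAD)")   # an abbreviated commit ID

printf '%s\n' "$by_head" "$by_branch" "$by_tag" "$by_id"
```

All four lines of output show the same commit ID, because in this one-commit repository every one of these references points to the same commit.
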
## Chapter 41: Objects in git

We've talked about the DAG in git, and how various commits are linked together. But we have not really delved into how git keeps track of things under the hood. In other words, how is all this information stored in the `.git` folder?

Git provides a _content-addressable filesystem_. Which is a fancy term to throw around, but really just means that git acts as a big key-value store. You give it something to store, and it will hand you back a key to retrieve it with. This key is, of course, the checksum that we've mentioned earlier. We already covered that git uses this to store commits.

We've also mentioned that git really only knows how to do a few tricks, but manages to combine them in various ways to provide a lot of functionality. The same is true here. The way commits are stored is not unique to commits. Git has 4 types of objects that it stores, and commits are only one of them.

So-called **blob-objects** are what store your actual data. If you add a file to git, the contents of that file will go in a _blob-object_. You give it the file contents, you get an ID in return. Done. This has the nice side effect that identical file contents are only ever stored once in git. Let's say you keep your documentation in git -- which would be a smart thing to do -- and you use an image to clarify something on page A and the same image on page B. Even if you stored that image on-disk in two locations in your repository, git will only create one blob-object for it in the `.git` folder, because the checksum of both images is the same, so there is only one key and only one value to retrieve for that key.

Another object type in git is the **tree-object**. A tree-object addresses some shortcomings of the blob-objects. For example, we need to be able to store the filename somehow, which is different from the file contents. And if we add a bunch of files together, we need to keep track that these files belong together.
This sort of information is stored in _tree-objects_ in git.

Then, there are the **commit-objects**. These are, of course, the objects we've been paying most attention to so far. In chapter 6 we explained that a commit object holds the data itself, the author, the date, the log message, and the checksum of the parent commit. Well, when we wrote _the commit data itself_, what that means under the hood is the ID of the _tree-object_ that holds the information about the data stored for this commit.

The last type of object that git uses is for _annotated tags_. We've seen before how you can attach your own label to any commit with the `git tag` command. What we didn't get into is that you can add more info to the tag, such as a message, or you can even cryptographically sign a tag. Git needs a place to store this additional information, and so there is an object type to specifically deal with this. However, the use of `git tag` that we demonstrated will only create the label, and not create an object in git.

## Chapter 42: Commit message structure

Writing good commit messages in git is a bit of an art. Some people have strong feelings on the matter, others see commit messages as a nuisance and put in whatever just to make git happy. There have been efforts to create rules for commit messages that people should adhere to, such as the _Conventional Commits_ specification, which you can read about at [conventionalcommits.org](https://www.conventionalcommits.org/). But at the end of the day, how you write your commit messages depends a lot on context. Is it a project you work on alone, or do you work together with others, and so on.

However, one thing that you should know about is how git treats different parts of the commit message differently, and how that impacts how you should write the commit message.
You see, each time git launches an editor to write the commit message, that message is potentially made up of two parts:

- The first line, which should be followed by an empty line
- The rest of the message

The first line should be short, and should hold a terse summary of why the commit was made. Then, you can add more info by leaving a blank line, followed by a lengthy message going into the fine details of the how/what/why of your commit.

You don't have to do this. If one line is all you need, then that's fine. Just keep in mind that if you want to provide more info, you should split it up between a terse summary on the first line, and the rest of the info starting from line 3.

## Chapter 43: Dealing with merge conflicts

While the _detached HEAD_ state you might find yourself in may sound scary, the situation that most git users would like to avoid is having to deal with a merge conflict.

A merge conflict occurs when we attempt to merge branches, and git is not able to figure out how to merge all changes without losing some information. This is almost always the result of two commits in different branches changing not only the same file, but the same line within that file.

At this point, git will attempt to do as much as it can to resolve the situation. But for those files where it can't figure out what to do on its own, it will ask us to sort it out ourselves. Which really isn't that big a deal for us, because by now we have a good idea of how git works. But for the casual user who suddenly finds themselves with a half-completed merge, merge conflicts can be rather dispiriting.

On the plus side, learning to deal with and solve merge conflicts will cement your reputation as a git guru. So for practice, let's cause a merge conflict and see how we can resolve it.

For this scenario, we are going to create a new git repository (with `git init`) and quickly create a merge conflict by:

- Adding and committing a `conflict.md` file that holds:

  ```md title="conflict.md"
  I will cause murge conflict
  I will not
  ```

- Creating a new branch named `notmain`
- On the `notmain` branch, update the `conflict.md` file so that it holds:

  ```md title="conflict.md"
  I will cause a merge conflict
  I will not
  ```

  and then add and commit that change.

- On the `main` branch, update the `conflict.md` file so that it holds:

  ```md title="conflict.md"
  I will cause merge conflict
  I will not
  ```

  and then add and commit that change too.

- Finally, while still on the `main` branch, we attempt to merge with the `git merge notmain` command.

Spoiler alert, it won't work.

```bash
git merge notmain
Auto-merging conflict.md
CONFLICT (content): Merge conflict in conflict.md
Automatic merge failed; fix conflicts and then commit the result.
```

Git is asking us to _fix conflicts and then commit the result_. If we run `git status` at this point, it will also say that there is a merge conflict and ask us to either fix the conflict and then run `git commit`, or abort the merge with `git merge --abort`.

```bash
git status
On branch main
You have unmerged paths.
  (fix conflicts and run "git commit")
  (use "git merge --abort" to abort the merge)

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   conflict.md

no changes added to commit (use "git add" and/or "git commit -a")
```

So let's first take a moment to appreciate that while we now have a merge conflict on our hands, git already told us that we can simply back out of this situation by aborting the merge with `git merge --abort`. Which is nice of git, but we're not scared by a little merge conflict.

Instead, let's fix the conflict. And to do so, we essentially have 3 options:

- Option 1 is to pick the changes in the `main` branch as the winner, and discard the changes in the `notmain` branch.
- Option 2 is the opposite of that: pick the changes in the `notmain` branch as the winner, and discard the changes in the `main` branch.
- Option 3 is to investigate the conflict in more detail and demonstrate that for now we are still smarter than git and can find a solution that keeps all changes.

The third option is almost always what you want, but if you are certain the changes in one branch can be discarded, you can just load the latest version of the file from the winning branch. By now you should know that `git checkout` is the only command to get data out of the DAG, so if we want the `main` branch to win, we could solve this merge conflict with:

```bash
git checkout main conflict.md
```

Or, if you wanted to keep the version in the `notmain` branch, you could run

```bash
git checkout notmain conflict.md
```

instead.

If we run `git status` after this, git will inform us that all conflicts are resolved. However, we still need to commit to finalize the merge. That's because we're doing a 3-way merge here, and so the merge is not complete until the merge commit happens.

```bash
git status
On branch main
All conflicts fixed but you are still merging.
  (use "git commit" to conclude merge)
```

However, we took the easy way out here, and that's no fun. So let's abort our efforts to merge this here with

```bash
git merge --abort
```

And just like that we are back to the point before we triggered our merge conflict. So let's simply try to merge again, which will land us in the same situation. To do so, we run

```bash
git merge notmain
Auto-merging conflict.md
CONFLICT (content): Merge conflict in conflict.md
Automatic merge failed; fix conflicts and then commit the result.
```

Sure enough, git reliably drops us back into the same merge conflict state we were in before.

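
If you'd like to replay the scenario so far in one go, the steps can be sketched as a script. It assumes a POSIX shell and a throwaway repository in a temporary directory. The commit messages and the placeholder identity are made up for illustration, and `git diff --name-only --diff-filter=U` is used here as one way to list the files that are still unmerged.

```shell
#!/bin/sh
# Sketch only: everything happens in a throwaway repository.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # call the first branch "main" on any git version
git config user.name "Example User"       # placeholder identity, local to this repo
git config user.email "user@example.com"

printf 'I will cause murge conflict\nI will not\n' > conflict.md
git add conflict.md
git commit -q -m "Add conflict.md"

git checkout -q -b notmain
printf 'I will cause a merge conflict\nI will not\n' > conflict.md
git commit -q -am "Reword first line on notmain"

git checkout -q main
printf 'I will cause merge conflict\nI will not\n' > conflict.md
git commit -q -am "Reword first line on main"

# The merge is expected to fail, so test it inside `if` to keep `set -e` happy.
if git merge notmain >/dev/null 2>&1; then first_try=merged; else first_try=conflict; fi

git merge --abort                          # back out of the half-completed merge
after_abort=$(git status --porcelain)      # empty: clean working directory again

# Merging again lands us in the same conflict state.
if git merge notmain >/dev/null 2>&1; then second_try=merged; else second_try=conflict; fi
unmerged=$(git diff --name-only --diff-filter=U)   # files that still need resolving

echo "first: $first_try, second: $second_try, unmerged: $unmerged"
```

The script ends in the same half-completed merge as the walkthrough, so you can continue from there in the temporary repository.
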
This time around, let's look into the contents of `conflict.md`:

```md title=conflict.md
<<<<<<< HEAD
I will cause merge conflict
=======
I will cause a merge conflict
>>>>>>> notmain
I will not
```

If you do this, you will see that git has included the changes from both branches into the file, and used less-than (`<<<<<<<`), equal (`=======`) and greater-than (`>>>>>>>`) symbols to mark which line belongs to which branch.

In our case, only the first line has a conflict, the rest of the file (which is only one line) does not. However, it's possible that there are multiple conflicts in a single file, so you should search the file for `=======` just to make sure.

For each merge conflict, you need to make a choice of how to reconcile the changes from the different branches, and then update the file, making sure to remove the extra markup git added. We also are not limited to picking one option over the other. We are using the same git functionality as before, so anything goes. In our example, let's update the first line to read `I am no longer a merge conflict` and remove the markup git added.

```md title=conflict.md
I am no longer a merge conflict
I will not
```

After you've resolved all conflicts -- whether it is through looking into the file and implementing your own solution, or by picking one branch's version over the other -- you should add the file or files that had the merge conflict in them, and then commit them. In our example, we run

```bash
git add conflict.md
git commit
```

And with that, we have successfully resolved the merge conflict, and merged the two branches.

Git put everything that it could figure out on its own in the staging area, asking us only to take care of those files where it could not figure out how to merge them. So we updated that file, and after adding it to the staging area, we could complete the merge by doing the merge commit. Nothing we used is new to us.
If you know how git does what it does, merge conflicts should no longer scare you.

## Chapter 44: Tips and best practices

Now that you know everything you need to know about git and then some, here are some tips and best practices to not only make working with git more enjoyable, but also facilitate working with others.

- Branches are free. Use them.
- Make many small commits, rather than one massive commit.
- Adding things to the staging area is a good way to do a _soft-save_ before you're ready to commit.
- Write commit messages that focus on why you did something, not what you did.
- Update your prompt to show what branch you are on. You can [download a script for this from the git project's repository on GitHub](https://github.com/git/git/blob/master/contrib/completion/git-prompt.sh).
- Use `.gitignore` to keep files that should not be subject to version control out of your repository.
- Use the inline documentation. There's loads of it. Use `git <command> --help` to access it.

## Chapter 45: Where to go from here

We've covered a lot of ground, and hopefully you'll walk away from this series with a good understanding of git's basic principles, as well as some hands-on examples and commands to guide you through the most common use-cases.

That being said, there's a lot to git that we did not cover yet, or that we did not cover in detail. Things like rebasing or reflog, cherry-picking or the infamous octopus merge. Thankfully, git has a ton of inline documentation, and there's a wealth of information out there on the internet for when you want to learn about the more advanced corners of git. There's also a bunch of GUI tools that can help you visualize the git DAG, such as [gitx](https://github.com/gitx/gitx/releases) or [gitkraken](https://www.gitkraken.com/jc).

My goal throughout this series was not to provide you with the ultimate git training. Instead, I wanted to show you that when it comes to git, there's nothing to be afraid of.
If you've made it this far, I am cautiously optimistic that it worked. Which is great news, because I really believe that my life is better because of git. So hopefully after all this, your life will be better too.