I believe many software engineers have had the urge to build their own text editor at some point in their career, and here’s mine. I have never solo’ed a project this large before, and I have learned a lot in the process. This series of blog posts is to document the challenges and design decisions I made while developing Vvvim.

Source code available on GitHub.

Table of Contents

  1. Background
  2. Frontend-Backend Separation
  3. Designing a Buffer Data Structure – Nested Gap Buffer
  4. Wrestling with Bazel
    1. Including External Dependency from GitHub
    2. Including Local Dependency
  5. Epilogue
  6. Takeaways

Background

Recently I’ve been picking up C++ as I am joining a new team primarily using C++ in a month. In the past, I coded in Java and Python professionally, and C is my “mother tongue”. I learned a bit of C++ in my freshman and sophomore year, so I know the basics of STLs, operator overloading, etc. But despite all this, any serious C++ codebase still felt like navigating through an alien library.

The idea started when I saw the tutorial Build Your Own Text Editor which is a step-by-step tutorial that teaches you how to build a text editor in about 1,000 lines of C without dependency.I followed along while trying to make the code as idiomatic C++ as possible. But after a while of wrestling with weird terminal issues and questionable design decisions, I gave up and gave ncurses a shot. It turns out much better than what I expected, for example, the code for entering raw with termios.h (original tutorial):

if (tcgetattr(STDIN_FILENO, &orig_termios) == -1) die("tcgetattr");
atexit(disableRawMode);
struct termios raw = orig_termios;
raw.c_iflag &= ~(BRKINT | ICRNL | INPCK | ISTRIP | IXON);
raw.c_oflag &= ~(OPOST);
raw.c_cflag |= (CS8);
raw.c_lflag &= ~(ECHO | ICANON | IEXTEN | ISIG);
raw.c_cc[VMIN] = 0;
raw.c_cc[VTIME] = 1;
if (tcsetattr(STDIN_FILENO, TCSAFLUSH, &raw) == -1) die("tcsetattr");

The same code with ncurses:

initscr();
raw();                // Disable line buffering, also intercepting C-z etc
noecho();             // Don't echo on getch()
nonl();               // Capture C-m = 13 instead of 10
keypad(stdscr, true); // Handle arrow keys, Fn keys and some more

Also displaying things with move() and printw() is significantly easier than keeping track of an append buffer, while doing all the cursor management.

So the journey begins.

Frontend-Backend Separation

Switching to ncurses only took a surprisingly short amount of time, and it was time for making my first design decision: how should I separate editor and terminal logic? Ideally, I would like the frontend and backend to be separate so that I could build a Qt UI or even a web UI in the future (probably not but one can dream). At this point, I can:

  • Option A (Template pattern): Define an abstract Editor class with concrete methods for editor logic, and virtual methods for handling UI (e.g. ReadKey(), MoveCursor()) Then derive a TerminalEditor class which implements these UI-related methods.

  • Option B (Composition over inheritance): Define a UserInterface interface (sorry) which defines all the UI-related functions and let TerminalUserInterface implement it. Then define a concrete Editor class with a UserInterface ui_ member variable.

I initially went with A, but later decided B is the better design, for a few reasons:

  • Editor “has-a” user interface, it feels more natural this way

  • I can potentially create a mock of user interface, and test editor logic with it

  • It forces me to separate editor logic from frontend logic, e.g. with option A I can totally conveniently handle cursor movements in the derived class, but that leads to a lot of duplicated code when there are multiple frontends

But option B also has its issues: the UI needs internal states of the editor at unpredictable times. For example, Editor calls ui_->ModeSwitched(new_mode) after a mode switch to notify the frontend so it can:

  1. Redraw the status bar
  2. Change cursor shape
  3. Or whatever it needs to do

But redrawing the status bar also needs the current cursor position, so one can either:

  1. Attach the entire editor state to each ModeSwitched() call, or
  2. Let the frontend store a copy of all the editor states

Both approaches sound unreasonable and involve unnecessary duplication, so I took the easy way out: letting UserInterface store a pointer to Editor, and making UserInterface a friend of Editor. However this creates a circular dependency between Editor and UserInterface, but thanks to StackOverflow, two forward declarations and some Bazel wrestling later, I got it working.

Designing a Buffer Data Structure – Nested Gap Buffer

Next up is the buffer. Just for fun, I decided not to consult any external source on how to build a buffer data structure for text editing and treated this like a whiteboard interview. In hindsight, the data structure that I came up with looks kind of like a Gap Buffer, except it is a normal gap buffer of characters inside a gap buffer of lines. I named it Nested Gap Buffer.

For comparison, a normal gap buffer looks like:

// private variables
std::string head_;
std::string tail_;

std::string GapBuffer::ToString() {
    return head_ + std::string(tail_.rbegin(), tail_.rend()); // head_ + reverse(tail_)
}

A nested gap buffer looks like:

std::vector<std::string> lines_head_;
std::string head_;
std::string tail_;
std::vector<std::string> lines_tail_;

And the ToString would be (the C++ is too long, allow me to use Haskell for the demo):

intercalate "\n" (lines_head_ ++ [head_ ++ reverse(tail_)] ++ reverse(lines_tail_))

The logic of all other operations are quite straight-forward, just moving strings and characters around to move the gap either vertically or horizontally.

The reason to use nested gap buffer over gap buffer is that, nested gap buffer also allows efficient operations on lines (e.g. random access, insert, delete). With std::move and std::move_iterator minimizing the cost of throwing strings around, the complexity of all operations are negligible:

  • Both Insert() and Delete() have O(1) time complexity
  • GetLineView(int line_number) runs in O(1) time if not getting the current line, which needs a merge which has a time complexity of O(c), where c = number of character in line.
  • MoveInsertionPointTo(int row, int col) runs in O(r + c) time where r is the number of rows

Wrestling with Bazel

Then I need to test the buffer that I just implemented. Step one: get GoogleTest. Here comes my greatest fear – adding external dependency in Bazel when you don’t have a Googler badge on your belt or an infrastructure team behind your back.

<rant>

🎵 Tell me true, tell me why, is http_archive deprecated? Is it for this that Inbox died? 🎵

Deprecated. load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive") for a drop-in replacement.

What’s the recommended way to do this now? Even the official tutorial for including GoogleTest suggests using the “drop-in replacement”. And why does your recently open-sourced project still use it? And why do I need to write a gtest.BUILD telling Bazel how to build GoogleTest when GoogleTest itself uses Bazel?

</rant>

But anyways, it seems like using http_archive is still the easiest and usual way to include external C++ dependencies. Here’s what I ended up using:

Including External Dependency from GitHub

In WORKSPACE:

http_archive(
     name = "com_google_googletest",
     urls = ["https://github.com/google/googletest/archive/b6cd405286ed8635ece71c72f118e659f4ade3fb.zip"],  # 2019-01-07
     strip_prefix = "googletest-b6cd405286ed8635ece71c72f118e659f4ade3fb",
     sha256 = "ff7a82736e158c077e76188232eac77913a15dac0b22508c390ab3f88e6d6d86",
)

http_archive(
    name = "com_google_absl",
    strip_prefix = "abseil-cpp-master",
    urls = ["https://github.com/abseil/abseil-cpp/archive/master.zip"],
)

Then depend on external libraries:

cc_library(
    name = "nested_gap_buffer",
    srcs = ["nested_gap_buffer.cc"],
    hdrs = ["nested_gap_buffer.h"],
    deps = [
        "@com_google_absl//absl/strings",
    ]
)

Including Local Dependency

And there’s another dependency – ncurses. Here’s how I included it:

In WORKSPACE:

new_local_repository(
    name = "ncurses",
    path = "/usr/local/opt/ncurses", # install with homebrew
    build_file = "ncurses.BUILD",
)

In a new file ncurses.BUILD beside WORKSPACE:

cc_library(
   name = "main",
   srcs = ["lib/libncurses.a"],
   visibility = ["//visibility:public"],
)

Then depend on ncurses with "@ncurses//:main".

Epilogue

After testing the buffer and hooking it up to the editor, I guess this is the point where I should call it an MVP, even though it supports nothing other than moving around with hjkl and entering/exiting insert mode with i and C-c, but it worked.

In the next part, I’ll talk about building a key mapping layer, and some other challenges.

Takeaways

  • Composition over inheritance
  • How to wrestle with Bazel
  • C++ magic like std::move and std::move_iterator

Next episode.