Strings in Go's runtime
Motivation
Consider this an explainer to Rob Pike’s blog post on Go’s strings. Written with Go programmers in mind.
In the blog post, he says without much evidence that Go’s strings are implemented as slices. I guess it wasn’t oversight on his part but proving that point wasn’t the goal of the blog post. Secondly anyone who wanted to find out for themselves had Go’s code at their disposal to explore. That’s what I did. And so in this post I try my best to explain how close Go’s implementation of strings is to its implementation of slices.
I don’t discuss string encoding. That’s not in the runtime, the thing I explore. But I discuss
common string operations that mimic slices such as length of a string (len("go")
), concatenation
("go" + "lang"
), indexing ("golang"[0]
), and slicing ("golang"[0:2]
). To be fair, indexing and
slices are operations in their own rights, which means their availability on strings has nothing (or
very little) to do with the nature of strings. This is not entirely true, but please accept it, as
the truth will take us into the Go compiler through the underlying types of so-called basic types
and back. Secondly, I don’t write this post under oath.
The Nature of Strings
I’m yet to come across a programming language where strings have different underlying memory
structure: the contiguous slots of memory. What this means is that the bytes of a string sit next to
each other in memory, with nothing in between them. That is, if you used the famous 12-byte string,
hello, world
in your program, and got the chance to inspect them in memory, you’d find them
sitting in the same row, each byte (or character) followed immediately by the other, with no empty
spaces or foreign bytes in between them. As far as I know, Go doesn’t deviate from this wisdom.
But this is all talk of memory, the physical stuff. When we’ve reached this point all things are the same and the differences between programming languages are effectively erased. So let’s back up one level into runtimes, and here we see how the different languages go about their businesses, and this is where we find the details of how Go implements its string data type. Luckily a significant part of the Go runtime is written in Go, with, God bless the authors, extensive comments explaining both the obvious and not so obvious implementation details. The runtime’s implementation of string can be found here on GitHub as at the time of writing. Let’s walk about it.
Go’s String
In the Go runtime, a string is a stringStruct
type:
It consists of str
, a pointer to the block of memory where the actual bytes are located and len
,
which is the length of the string. Since strings are immutable, these never change.
Creating a New String
The function tasked with making new strings in the runtime is called rawstring
. Here’s how it’s
implemented as at the time of writing (comments in code are mine):
rawstring
returns a string and a byte slice where the actual bytes of the string should be stored,
and it is this byte slice that will be used in all operations on the string. We can safely call them
data ([]byte
) and metadata (stringStruct
).
But this is not the end of it. And perhaps this is the only time you have access to the real
non-zeroed byte slice behind the string. In fact, the comment on rawstring
instructs the caller to
only use the byte slice once (to write contents of the string) and then drop it. The rest of the
time, the string struct should be good enough.
Knowing this, let’s look at how some common string operations are implemented. It will also make sense to us why good old concatenation isn’t a recommended way to build large strings.
Common String Operations
Length (len("go")
)
Since strings are immutable, the length of a string stays constant. Even better we know it by the
time we’re storing the string, and that’s what we store in stringStruct
’s len
field. Thus,
requests for the length of a string are take the same amount of time regardless of the size of the
string. In Big-O terms, it’s a constant time operation.
Concatenation ("go" + "lang"
)
It’s a simple process. Go first determines the length of the resultant string by summing lengths of all the strings you want to concatenate. Then it requests that size of contiguous memory block. There’s an optimization check and more importantly a safety check. The safety check ensures that the resultant string won’t have a length that exceeds Go’s maximum integer value.
Then the next step of the process begins. The bytes of the individual strings are copied one after
another into the new string. That is, bytes in different locations in memory are copied into a new
location. It’s unpleasant work, and should be avoided where possible. Hence the recommendation to
use strings.Builder
instead since it minimizes memory copying. It’s the closest
we’ll come to mutable strings.
Indexing ("golang"[0]
)
Go’s indexing operator is written as [index]
, where index is an integer. As at the time of writing
it was available on arrays, slices, strings, and some pointers.
What arrays, slices, and strings have a common underlying type. In physical memory terms it’s a
contiguous block of memory. In Go parlance, an array. For strings this is the byte slice that was
returned by rawstring
, this is where the string’s contents are stored. And that’s what we index
on. Goes without saying that the “some pointers” I mentioned above as compatible with the indexing
operator are those with array underlying type.
Note that it’s the same syntax on maps but with different behavior. With maps the type of the key determines the type of the value in between the brackets.
Slicing ("golang"[0:2]
)
The slice operator has the same compatibility as the index operator: operand must have an array underlying type. Thus it works on the same set of types: arrays, slices, strings, and some pointers.
On strings there’s a caveat. The full slice operator is [low:high:capacity]
. In one go it allows
you to create a slice and set the capacity of its underlying array. But remember strings are
immutable and so there will never be a need to have an underlying array bigger that what’s really
needed for the string’s contents. Hence the 3-slice operator doesn’t exist for strings.
The strings
and strconv
packages
Go provides the strings
and strconv
packages for dealing
with strings. I already mentioned the more efficient Builder
for building large strings. It’s
provided by the strings
package. There’s other niceties in there. Together they provide tuned
functions for string transformations, conversions, comparisons, search and replace, etc. Check them
out before you build your own.
Source of Confusion
cap(slice)
vs cap(string)
The built-in function cap
returns the capacity of a slice’s underlying array. Throughout the life
of a slice the underlying array’s capacity is allowed to keep changing. Usually it grows to
accommodate new elements. If a string is a slice, why doesn’t it respond to a cap
inquiry? the
question goes. The simple answer: Go strings are immutable. That is, it will never grow or shrink in
size, which in turn means that if cap
were implemented it would be the same as len
.
Got comments or corrections for factual errors? There’s a Hacker News thread for that.