Strings in Grace

The lack of a ‘native’ string type is one of the major gripes people have had with C and C++. I’ve basically grown up on C and I’ve walked that line. At the same time, when I looked at the string abstractions as they were implemented in class libraries and other languages, I realized that some things that were easy to do if you treated strings as arrays of characters in the way C did, were harder to accomplish if you treated them as types. To wit:

const char *hwld = "Hello, world.";

/* Insanely quick copy of a substring */
const char *wld = hwld + 7;

The C approach — having this mutable array as a working area — makes it relatively easy to hack and split strings into smaller parts and work with them as first class strings that can be used as arguments for other functions. All without ever copying a byte to a new object.
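To make the NUL-byte trick concrete, here is a minimal sketch of splitting a buffer in place. The helper name cut_in_place is hypothetical (it is neither a C library function nor part of Grace), and the buffer has to be a writable array, not a string literal.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper, not part of the C library or of Grace.
// Cuts `buf` at the first occurrence of `sep` by overwriting the separator
// with a NUL byte. Returns a pointer to the remainder, which still lives
// inside the same buffer, or nullptr if `sep` does not occur.
char *cut_in_place (char *buf, char sep)
{
    assert (buf != nullptr);
    if (sep == '\0') return nullptr;   // cutting at the terminator makes no sense

    char *p = std::strchr (buf, sep);
    if (p == nullptr) return nullptr;

    *p = '\0';      // the left part now ends here
    return p + 1;   // the right part starts just past the old separator
}
```

Given char buf[] = "Hello, world."; a call to cut_in_place (buf, ' ') leaves buf reading "Hello," and returns a pointer to "world." at buf + 7. Both halves are ordinary C strings, and no byte was copied anywhere.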

The Grace string class has grown a lot of nice features that make it easier to forget the feeling of loss that accompanies the sudden inability to cut up strings by spraypainting them with NUL bytes. With methods like string::left, string::mid, string::cutat and the strutil::split family of functions, a lot of splicing joy can be had for all.

The storage behind a string object uses a reference count and copy-on-write to deal with assignments and mutations. Assignments can be a real pain in the context of strings, which is why a number of languages recognize the concept of immutable strings. These languages make you go on string building expeditions to dynamically compose new strings, but they do this to allow for assignment without copying; if both the sending and the receiving object guarantee not to alter the data, it is safe to let them point to the same memory location. Copy-on-write satisfies this same ‘zero copy’ approach to assignment, but allows strings to remain mutable. The cost of copying is at best prevented, at worst delayed until the moment of mutation.
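The reference-count-plus-copy-on-write idea can be sketched in a few lines. This is an illustration only, not Grace's actual implementation: the class name cowstring and its methods are made up for the example, std::shared_ptr stands in for the reference count, and the sketch is not thread-safe.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Illustrative only: `cowstring` is a hypothetical class, not Grace's string.
class cowstring
{
public:
    cowstring (const char *text)
        : data (std::make_shared<std::string> (text)) {}

    // Copy construction and assignment are compiler-generated. They copy
    // only the shared_ptr, which bumps the reference count and leaves the
    // character data alone.

    // A mutation: make sure we own the data before touching it.
    void append (const char *text)
    {
        detach ();
        data->append (text);
    }

    const std::string &str () const { return *data; }

    // True if both objects still point at the same back-end memory.
    bool shares_with (const cowstring &other) const
    {
        return data == other.data;
    }

private:
    // The copy-on-write step: clone the data only if someone else uses it.
    void detach ()
    {
        if (data.use_count () > 1)
            data = std::make_shared<std::string> (*data);
    }

    std::shared_ptr<std::string> data;
};
```

After cowstring b = a; both objects share one buffer. The first b.append (...) pays for the copy, and a never notices. A string that is never shared never copies at all.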

With all the hacking, cutting and copying kung-fu under its belt, the Grace string class made me stop missing the C array approach, except for one thing: each time you split up a string, god killed a kitten, or rather, data got copied. This week, the string class got a major overhaul: next to the pointer to the copy-on-write back-end memory that each individual string carries, it now keeps track of an offset. So now, when you do this:

string s1 = "Hello, world.";
string s2 = s1.cutat (' ');

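what actually happens is that no character data moves at all: both strings keep pointing at the same back-end memory, each with its own offset into it.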

That means even less copying. Some unexpected things now trigger a copy-on-write, though. The saddest one is conversion to a C-compatible string. Since C expects a terminating NUL character, a sub-string that ends in the middle of the shared memory has to write one there, and that mutation triggers a copy-on-write.
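The offset scheme and its C-string caveat can be sketched as follows. Everything here is an assumption made for illustration: the class name slice, its members, and the cutat semantics (return the part before the separator, keep the part after it in the original) are not taken from Grace's source, and the sketch leaves out mutation entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Illustrative only: `slice` is a hypothetical class, not Grace's string.
class slice
{
public:
    slice (const char *text)
        : buf (std::make_shared<std::string> (text)),
          off (0), len (buf->size ()) {}

    // Assumed semantics: return the part before the first `sep` and keep
    // the part after it in *this. Neither half copies character data.
    slice cutat (char sep)
    {
        std::size_t pos = buf->find (sep, off);
        if (pos == std::string::npos || pos >= off + len)
            return slice (buf, off, 0);   // not found: empty result

        slice left (buf, off, pos - off);
        len -= (pos + 1 - off);           // drop left part and separator
        off = pos + 1;
        return left;
    }

    // C wants a NUL right after the last character. Only a slice that runs
    // to the end of the buffer already has one; any other slice has to
    // move to a private copy first.
    const char *c_str ()
    {
        if (off + len != buf->size ())
        {
            buf = std::make_shared<std::string> (*buf, off, len);
            off = 0;
        }
        return buf->c_str () + off;
    }

    std::string str () const { return buf->substr (off, len); }
    bool shares_with (const slice &o) const { return buf == o.buf; }

private:
    slice (std::shared_ptr<std::string> b, std::size_t o, std::size_t l)
        : buf (std::move (b)), off (o), len (l) {}

    std::shared_ptr<std::string> buf;   // shared back-end memory
    std::size_t off;                    // where this string starts in it
    std::size_t len;                    // how many characters it covers
};
```

With slice s1 = "Hello, world."; and slice s2 = s1.cutat (' '); both objects share one buffer. Calling s1.c_str () is free in this sketch, because "world." runs up to the buffer's own terminator, while s2.c_str () has to copy "Hello," into fresh memory to get a NUL behind it.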

