Ivor Horton and Peter Van Weert, Beginning C++17, https://doi.org/10.1007/978-1-4842-3366-5_7

7. Working with Strings

Ivor Horton¹ and Peter Van Weert²

(1) Stratford-upon-Avon, Warwickshire, UK

(2) Kessel-Lo, Belgium

This chapter is about handling textual data much more effectively and safely than is possible with a C-style string stored in an array of char elements.

In this chapter, you’ll learn

How to create variables of type string
What operations are available with objects of type string and how you use them
How to chain together various bits and pieces to form one single string
How you can search a string for a specific character or a substring
How you can modify an existing string
How to convert a string such as "3.1415" into the corresponding number
How to use streams and stream manipulators for advanced string formatting
How you can work with strings containing Unicode characters
What a raw string literal is

A Better Class of String

You’ve seen how you can use an array of elements of type char to store a null-terminated (C-style) string. The cstring header provides a wide range of functions for working with C-style strings including capabilities for joining strings, searching a string, and comparing strings. All these operations depend on the null character being present to mark the end of a string. If it is missing or gets overwritten, many of these functions will march happily through memory beyond the end of a string until a null character is found at some point or some catastrophe stops the process. Even if your process survives, it often results in memory being arbitrarily overwritten. And once that happens, all bets are off! Using C-style strings is therefore inherently unsafe and represents a serious security risk. Fortunately, there’s a better alternative.

The string header of the C++ Standard Library defines the std::string type, which is much easier to use than a null-terminated string. The string type is defined by a class (or to be more precise, a class template), so it isn’t one of the fundamental types. Type string is a compound type, which is a type that’s a composite of several data items that are ultimately defined in terms of fundamental types of data. Next to the characters that make up the string it represents, a string object contains other data as well, such as the number of characters in the string. Because the string type is defined in the string header, you must include this header when you’re using string objects. The string type name is defined within the std namespace, so you’d need a using declaration to use the type name in its unqualified form. We’ll start by explaining how you create string objects.

Defining string Objects

An object of type string contains a sequence of characters of type char, which can be empty. This statement defines a variable of type string that contains an empty string:

std::string empty; // An empty string

This statement defines a string object that you refer to using the name empty. In this case, empty contains a string that has no characters and so it has zero length.

You can initialize a string object with a string literal when you define it:

std::string proverb {"Many a mickle makes a muckle."};

proverb is a string object that contains a copy of the string literal shown in the initializer. Internally, the character array encapsulated by a string object is always terminated by a null character as well. This is done to ensure compatibility with the numerous existing functions that expect C-style strings.

Note

You can convert a std::string object to a C-style string using two similar methods. The first is by calling its c_str() member function (short for C-string):

const char* proverb_c_str = proverb.c_str();

This conversion results in a C-string of type const char*. Because it’s const, this pointer cannot be used to modify the characters of the string, only to access them. Your second option is the string’s data() function, which starting from C++17 evaluates to a non-const char* pointer¹ (prior to C++17, data() resulted in a const char* pointer as well):

char* proverb_data = proverb.data();

You should convert to C-style strings only when calling legacy C-style functions. In your own code, we of course recommend you consistently use std::string objects because these are far safer and more convenient than plain char arrays.

All std::string functions, though, are defined in such a way that you normally never need to worry about the terminating null character anymore. For instance, you can obtain the length of the string for a string object using its length() function, which takes no arguments. This length never includes the string termination character:

std::cout << proverb.length(); // Outputs 29

This statement calls the length() function for the proverb object and outputs the value it returns to cout. The record of the string length is guaranteed to be maintained by the object itself. That is, to find out the length of the encapsulated string, the string object does not have to traverse the entire string looking for the terminating null character. When you append one or more characters, the length is increased automatically by the appropriate amount and decreased if you remove characters.
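The following minimal sketch illustrates the points above: length() excludes the terminating null character, c_str() still yields a null-terminated array, and the recorded length follows every edit. The function name check_length_tracking is purely illustrative and not part of the chapter's examples.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: verifies that the recorded length tracks edits.
void check_length_tracking()
{
  std::string proverb {"Many a mickle makes a muckle."};
  assert(proverb.length() == 29);               // Terminator is not counted
  assert(std::strlen(proverb.c_str()) == 29);   // c_str() is null-terminated

  proverb += '!';                               // Append one character
  assert(proverb.length() == 30);

  proverb.pop_back();                           // Remove the last character
  assert(proverb.length() == 29);
}
```

Note that std::strlen() has to walk the array to find the null character, whereas length() simply reports the value stored in the object.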

There are some other possibilities for initializing a string object. You can use an initial sequence from a string literal, for instance:

std::string part_literal { "Least said soonest mended.", 5 }; // "Least"

The second initializer in the list specifies the length of the sequence from the first initializer to be used to initialize the part_literal object.

You can’t initialize a string object with a single character between single quotes—you must use a string literal between double quotes, even when it’s just one character. However, you can initialize a string with any number of instances of a given character. You can define and initialize a sleepy time string object like this:

std::string sleeping(6, 'z');

The string object, sleeping, will contain "zzzzzz". The string length will be 6. If you want to define a string object that’s more suited to a light sleeper, you could write this:

std::string light_sleeper(1, 'z');

This initializes light_sleeper with the string literal "z".

Caution

To initialize a string with repeated character values, you must not use curly braces like this:

std::string sleeping{6, 'z'};

This curly braces syntax does compile but certainly won’t do what you expect. In our example, the literal 6 would be interpreted as a character code, meaning sleeping would be initialized to an obscure two-character string (the character with code 6 followed by 'z') instead of the intended "zzzzzz". If you recall, you already encountered an analogous quirk of C++’s near-uniform initialization syntax with std::vector<> in the previous chapter.
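If you want to see the difference for yourself, a small sketch along these lines should do. The function name check_repeat_initialization is our own invention for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: contrasts parentheses with braces for (count, character).
void check_repeat_initialization()
{
  std::string with_parentheses(6, 'z');   // Repeat count followed by character
  std::string with_braces{6, 'z'};        // A list of two char values

  assert(with_parentheses == "zzzzzz");
  assert(with_braces.length() == 2);
  assert(with_braces[0] == 6);            // The character with code 6
  assert(with_braces[1] == 'z');
}
```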

A further option is to use an existing string object to provide the initial value. Given that you’ve defined proverb previously, you can define another object based on that:

std::string sentence {proverb};

The sentence object will be initialized with a copy of the string that proverb contains, so it too will contain "Many a mickle makes a muckle." and have a length of 29.

You can reference characters within a string object using an index value starting from 0, just like an array. You can use a pair of index values to identify part of an existing string and use that to initialize a new string object. Here’s an example:

std::string phrase {proverb, 0, 13}; // Initialize with 13 characters starting at index 0

Figure 7-1 illustrates this process.

Figure 7-1. Creating a new string from part of an existing string

The first element in the braced initializer is the source of the initializing string. The second is the index of the character in proverb that begins the initializing substring, and the third initializer in the list is the number of characters in the substring. Thus, phrase will contain "Many a mickle".

Caution

The third entry in the {proverb, 0, 13} initializer, 13, is the length of the substring, not an index indicating the last (or one past the last) character of the substring. So, to extract, for instance, the substring "mickle", you should use the initializer {proverb, 7, 6} and not {proverb, 7, 13}. This is a common source of confusion and bugs, especially for beginning C++ developers who have past experience in languages such as JavaScript or Java where substrings are commonly designated using start and end indices.
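To make the pitfall concrete, this sketch compares the correct initializer with the mistaken one. The name check_substring_initialization is hypothetical and only serves the illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the third initializer is a length, not an end index.
void check_substring_initialization()
{
  const std::string proverb {"Many a mickle makes a muckle."};

  const std::string by_length {proverb, 7, 6};    // 6 characters from index 7
  const std::string mistaken  {proverb, 7, 13};   // 13 characters from index 7

  assert(by_length == "mickle");
  assert(mistaken == "mickle makes ");            // Not "mickle"!
}
```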

To show which substring is created, you can insert the phrase object in the output stream, cout:

std::cout << phrase << std::endl;

Thus, you can output string objects just like C-style strings. Extraction from cin is also supported for string objects:

std::string name;

std::cout << "enter your name: ";

std::cin >> name; // Pressing Enter ends input

This reads characters up to the first whitespace character, which ends the input process. Whatever was read is stored in the string object, name. You cannot enter text with embedded spaces with this process. Of course, reading entire phrases complete with spaces is possible as well, just not with >>. We’ll explain how you do this later.

To summarize, we have described six options for defining and initializing a string object; the following comments identify the initializing string in each case:

No initializer (or empty braces, {}):

std::string empty; // The string ""
An initializer containing a string literal:

std::string proverb{ "Many a mickle makes a muckle." }; // The given literal
An initializer containing an existing string object:

std::string sentence{ proverb }; // Duplicates proverb
An initializer containing a string literal followed by the length of the sequence in the literal to be used to initialize the string object:

std::string part_literal{ "Least said soonest mended.", 5 }; // "Least"
An initializer containing a repeat count followed by the character literal that is to be repeated in the string that initializes the string object (mind the round parentheses!):

std::string open_wide(5, 'a'); // "aaaaa"
An initializer containing an existing string object, an index specifying the start of the substring, and the length of the substring:

std::string phrase{proverb, 5, 8}; // "a mickle"

Operations with String Objects

Many operations with string objects are supported. Perhaps the simplest is assignment. You can assign a string literal or another string object to a string object. Here’s an example:

std::string adjective {"hornswoggling"}; // Defines adjective

std::string word {"rubbish"}; // Defines word

word = adjective; // Modifies word

adjective = "twotiming"; // Modifies adjective

The third statement assigns the value of adjective, which is "hornswoggling", to word, so "rubbish" is replaced. The last statement assigns the literal "twotiming" to adjective, so the original value "hornswoggling" is replaced. Thus, after executing these statements, word will contain "hornswoggling", and adjective will contain "twotiming".

Concatenating Strings

You can join strings using the addition operator; the technical term for this is concatenation. You can concatenate the objects defined earlier like this:

std::string description {adjective + " " + word + " whippersnapper"};

After executing this statement, the description object will contain the string "twotiming hornswoggling whippersnapper". You can see that you can concatenate string literals with string objects using the + operator. This is because the + operator has been redefined to have a special meaning with string objects. When one operand is a string object and the other operand is either another string object or a string literal, the result of the + operation is a new string object containing the two strings joined together.

Note that you can’t concatenate two string literals using the + operator. One of the two operands of the + operator must always be an object of type string. The following statement, for example, won’t compile:

std::string description {" whippersnapper" + " " + word}; // Wrong!!

The problem is that the compiler will try to evaluate the initializer value as follows:

std::string description {(" whippersnapper" + " ") + word}; // Wrong!!

In other words, the first expression that it evaluates is (" whippersnapper" + " "), and the + operator doesn’t work with both operands as two string literals. The good news is that you have at least five ways around this:

Naturally, you can write the first two string literals as a single string literal: {" whippersnapper " + word}.
You can omit the + between the two literals: {" whippersnapper" " " + word}. Two or more string literals in sequence will be concatenated into a single literal by the compiler.
You can introduce parentheses: {" whippersnapper" + (" " + word)}. The expression between parentheses that joins " " with word is then evaluated first to produce a string object, which can subsequently be joined to the first literal using the + operator.
You can turn one or both of the literals into a std::string object using the familiar initialization syntax: {std::string{" whippersnapper"} + " " + word}.
You can turn one or both of the literals into a std::string object by adding the suffix s to the literal, such as in {" whippersnapper"s + " " + word}. For this to work, you first have to add a using namespace std::string_literals; directive. You can add this directive either at the beginning of your source file or locally inside your function. Once this directive is in scope, appending the letter s to a string literal turns it into a std::string object, much like, for instance, adding u to an integer literal creates an unsigned integer.
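The five workarounds listed above can be sketched side by side as follows. Each expression should produce the same string. The function name check_literal_workarounds is made up for this illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: five ways to avoid adding two string literals.
void check_literal_workarounds()
{
  using namespace std::string_literals;           // Enables the s suffix

  const std::string word {"hornswoggling"};
  const std::string expected {" whippersnapper hornswoggling"};

  assert(" whippersnapper " + word == expected);                    // One literal
  assert(" whippersnapper" " " + word == expected);                 // Adjacent literals
  assert(" whippersnapper" + (" " + word) == expected);             // Parentheses
  assert(std::string{" whippersnapper"} + " " + word == expected);  // Explicit object
  assert(" whippersnapper"s + " " + word == expected);              // Literal suffix
}
```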

That’s enough theory for the moment. It’s time for a bit of practice. This program reads your first and second names from the keyboard:

// Ex7_01.cpp

// Concatenating strings

#include <iostream>

#include <string>

int main()

{

std::string first; // Stores the first name

std::string second; // Stores the second name

std::cout << "Enter your first name: ";

std::cin >> first; // Read first name

std::cout << "Enter your second name: ";

std::cin >> second; // Read second name

std::string sentence {"Your full name is "}; // Create basic sentence

sentence += first + " " + second + "."; // Augment with names

std::cout << sentence << std::endl; // Output the sentence

std::cout << "The string contains " // Output its length

<< sentence.length() << " characters." << std::endl;

}

Here’s some sample output:

Enter your first name: Phil

Enter your second name: McCavity

Your full name is Phil McCavity.

The string contains 32 characters.

After defining two empty string objects, first and second, the program prompts for input of a first name and then a second name. The input operations will read anything up to the first whitespace character. So, if your name consists of multiple parts, say Van Weert, this program won’t let you enter it. If you do enter, for instance, Van Weert for the second name, the >> operator will only extract the Van part from the stream. You’ll learn how you can read a string that includes whitespace later in this chapter.

After getting the names, you create another string object that is initialized with a string literal. The sentence object is concatenated with the string object that results from the right operand of the += assignment operator:

sentence += first + " " + second + "."; // Augment with names

The right operand concatenates first with the literal " ", then second is appended to that result, and finally the literal "." is appended to that to produce the final result that is concatenated with the left operand of the += operator. This statement demonstrates that the += operator also works with objects of type string in a similar way to the basic types. The statement is equivalent to this statement:

sentence = sentence + (first + " " + second + "."); // Augment with names

Finally, the program uses the stream insertion operator to output the contents of sentence and the length of the string it contains.

Tip

The append() function of a std::string object is an alternative for the += operator. Using this, you could write the previous example as follows:

sentence.append(first).append(" ").append(second).append(".");

In its basic form, append() is not all that interesting—unless you enjoy typing, that is, or if the + key on your keyboard is broken. But of course there’s more to it than that. The append() function is more flexible than += because it allows, for instance, the concatenation of substrings, or repeated characters:

std::string compliment("~~~ What a beautiful name... ~~~");

sentence.append(compliment, 3, 22); // Appends " What a beautiful name"

sentence.append(3, '!'); // Appends "!!!"
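As a self-contained sketch of these two append() overloads, consider the following. The helper names praise and check_append are hypothetical, and the compliment is written with plain ASCII tilde characters so that each character occupies exactly one char.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: appends part of a compliment and some emphasis.
std::string praise(std::string sentence)
{
  const std::string compliment {"~~~ What a beautiful name... ~~~"};
  sentence.append(compliment, 3, 22);   // 22 characters starting at index 3
  sentence.append(3, '!');              // Three exclamation marks
  return sentence;
}

void check_append()
{
  assert(praise("Hello.") == "Hello. What a beautiful name!!!");
}
```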

Concatenating Strings and Characters

Next to two string objects, or a string object and a string literal, you can also concatenate a string object and a single character. The string concatenation in Ex7_01, for example, could also be expressed as follows (see also Ex7_01A.cpp):

sentence += first + ' ' + second + '.';

Another option, just to illustrate the possibilities, is to use the following two statements:

sentence += first + ' ' + second;

sentence += '.';

What you cannot do, though, as before, is concatenate two individual characters. One of the operands to the + operator should always be a string object.

To observe an additional pitfall of adding characters together, you could replace the concatenation in Ex7_01 with this variant:

sentence += second;

sentence += ',' + ' ';

sentence += first + '.';

Surprisingly, perhaps, this code does compile. But a possible session might then go as follows:

Enter your first name: Phil

Enter your second name: McCavity

Your full name is McCavityLPhil.

The string contains 32 characters.

Of particular interest is the third line in this session; notice the comma and space characters between McCavity and Phil have somehow mysteriously fused into a single capital letter L. The reason this happens is that the compiler does not concatenate two characters; instead, it adds the character codes for the two characters together. Nearly all compilers use ASCII codes for the basic Latin characters (ASCII encoding was explained in Chapter 1). The ASCII code for ',' is 44, and that of the ' ' character is 32. Their sum, 32 + 44, therefore equals 76, which happens to be the ASCII code for the capital letter 'L'.

Notice that this example would’ve worked fine if you had written it as follows:

sentence += second + ',' + ' ' + first;

The reason, analogous to before, is that the compiler would evaluate this statement from left to right, as if the following parentheses were present:

sentence += ((second + ',') + ' ') + first;

With this statement, one of the two concatenation operands is thus always a std::string. Confusing? Perhaps a bit. The general rule with std::string concatenation is easy enough, though: concatenation is evaluated left to right and will work correctly only as long as one of the operands of the concatenation operator, +, is a std::string object.
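The sketch below shows both outcomes next to each other. It assumes an ASCII-compatible character set, as the discussion above does, and the name check_character_addition is ours.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Adding two char values produces an int, not a string.
static_assert(std::is_same_v<decltype(',' + ' '), int>);

// Hypothetical helper: contrasts character addition with concatenation.
void check_character_addition()
{
  const std::string first {"Phil"};
  const std::string second {"McCavity"};

  std::string fused {second};
  fused += ',' + ' ';                   // Appends the character with code 44 + 32
  assert(fused == "McCavityL");         // 76 is 'L' in ASCII

  // Left to right: every + has a std::string as its left operand
  assert(second + ',' + ' ' + first == "McCavity, Phil");
}
```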

Note

Up to this point, we have always used literals to initialize or concatenate with string objects—either string literals or character literals. Everywhere we used string literals, you can of course also use any other form of C-style string: char[] arrays, char* variables, or any expression that evaluates to either of these types. Similarly, all expressions involving character literals will work just as well with any expression that results in a value of type char.

Concatenating Strings and Numbers

An important limitation in C++ is that you can only concatenate std::string objects with either strings or characters. Concatenation with most other types, such as a double, will generally fail to compile:

const std::string result_string{ "The result equals: "};

double result = 3.1415;

std::cout << (result_string + result) << std::endl; // Compiler error!

Worse, such concatenations might even compile, even though they’ll of course never produce the desired result. Any given number will again be treated as a character code, like in the following example (the ASCII code for the letter 'E' is 69):

std::string song_title { "Summer of '" };

song_title += 69;

std::cout << song_title << std::endl; // Summer of 'E

This limitation might frustrate you at first, especially if you’re used to working with strings in, for instance, Java or C#. In those languages, the compiler implicitly converts values of any type to strings. Not so in C++: in C++, you have to explicitly convert these values to strings yourself. There are several ways you might accomplish this. For values of fundamental numeric types, the easiest by far is to use the std::to_string() family of functions, defined in the string header:

const std::string result_string{ "The result equals: "};

double result = 3.1415;

std::cout << (result_string + std::to_string(result)) << std::endl;

std::string song_title { "Summer of '" };

song_title += std::to_string(69);

std::cout << song_title << std::endl; // Summer of '69
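Here is a minimal sketch that puts the two behaviors side by side for an integer. The names make_title and check_to_string are hypothetical, and the "E" result assumes ASCII. We deliberately make no claim about the exact text std::to_string() produces for a double, because the number of decimals it shows depends on the language standard your library implements.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a title from a number via std::to_string().
std::string make_title(int year)
{
  return "Summer of '" + std::to_string(year);
}

void check_to_string()
{
  assert(make_title(69) == "Summer of '69");

  std::string accidental {"Summer of '"};
  accidental += 69;                       // Treated as a character code
  assert(accidental == "Summer of 'E");   // 69 is 'E' in ASCII
}
```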

Accessing Characters in a String

You refer to a particular character in a string by using an index value between square brackets, just as you do with a character array. The first character in a string object has the index value 0. You could refer to the third character in sentence, for example, as sentence[2]. You can use such an expression on the left of the assignment operator, so you can replace individual characters as well as access them. The following loop changes all the characters in sentence to uppercase:

for (size_t i {}; i < sentence.length(); ++i)

sentence[i] = std::toupper(sentence[i]);

This loop applies the toupper() function to each character in the string in turn and stores the result in the same position in the string. The index value for the first character is 0, and the index value for the last character is one less than the length of the string, so the loop continues as long as i < sentence.length() is true.

A string object is a range, so you could also do this with a range-based for loop:

for (auto& ch : sentence)

ch = std::toupper(ch);

Specifying ch as a reference type allows the character in the string to be modified within the loop. This loop and the previous loop require the cctype header to be included to compile.
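One detail worth knowing is that the cctype functions expect a character value that is representable as unsigned char. For the basic Latin letters used in this chapter this makes no difference, but a slightly more defensive sketch of the range-based loop could look like this. The names to_upper_copy and check_to_upper_copy are our own.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: returns an uppercase copy of its argument.
std::string to_upper_copy(std::string text)
{
  for (auto& ch : text)
  {
    // Convert to unsigned char first so that negative char values are safe
    ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
  }
  return text;
}

void check_to_upper_copy()
{
  assert(to_upper_copy("Many a mickle") == "MANY A MICKLE");
}
```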

You can exercise this array-style access method in a version of Ex5_11.cpp that determined the number of vowels and consonants in a string. The new version will use a string object. It will also demonstrate that you can use the getline() function to read a line of text that includes spaces:

// Ex7_02.cpp

// Accessing characters in a string

#include <iostream>

#include <string>

#include <cctype>

int main()

{

std::string text; // Stores the input

std::cout << "Enter a line of text:\n";

std::getline(std::cin, text); // Read a line including spaces

unsigned vowels {}; // Count of vowels

unsigned consonants {}; // Count of consonants

for (size_t i {}; i < text.length(); ++i)

{

if (std::isalpha(text[i])) // Check for a letter

{

switch (std::tolower(text[i])) // Convert to lowercase

{

case 'a': case 'e': case 'i': case 'o': case 'u':

++vowels;

break;

default:

++consonants;

break;

} // End of switch

} // End of if

} // End of for

std::cout << "Your input contained " << vowels << " vowels and "

<< consonants << " consonants." << std::endl;

}

Here’s an example of the output:

Enter a line of text:

A nod is as good as a wink to a blind horse.

Your input contained 14 vowels and 18 consonants.

The text object contains an empty string initially. You read a line from the keyboard into text using the getline() function. This version of getline() is declared in the string header; the versions of getline() that you have used previously were declared in the iostream header. This version reads characters from the stream specified by the first argument, cin in this case, until a newline character is read, and the result is stored in the string object specified by the second argument, which is text in this case. This time you don’t need to worry about how many characters are in the input. The string object will automatically accommodate however many characters are entered, and the length will be recorded in the object.

You can change the delimiter that signals the end of the input by using a version of getline() with a third argument that specifies the new delimiter for the end of the input:

std::getline(std::cin, text, '#');

This reads characters until a '#' character is read. Because newline doesn’t signal the end of input in this case, you can enter as many lines of input as you like, and they’ll all be combined into a single string. Any newline characters that were entered will be present in the string.
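You can try this without typing anything by reading from a string stream instead of cin. The sketch below assumes the std::istringstream type from the sstream header, which behaves like cin for our purposes. The name check_getline_delimiter is hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: reads up to a '#' delimiter across several lines.
void check_getline_delimiter()
{
  std::istringstream input {"first line\nsecond line#ignored"};
  std::string text;

  std::getline(input, text, '#');             // Stops at '#', not at '\n'
  assert(text == "first line\nsecond line");  // The newline is kept in the string

  std::getline(input, text);                  // The '#' itself was discarded
  assert(text == "ignored");
}
```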

You count the vowels and consonants in much the same way as in Ex5_11.cpp, using a for loop. Naturally, you could also use a range-based for loop instead:

for (const auto ch : text)

{

if (std::isalpha(ch)) // Check for a letter

{

switch (std::tolower(ch)) // Convert to lowercase

{

...

This code, available in Ex7_02A.cpp, is simpler and easier to understand than the original. The major advantage of using a string object in this example compared to Ex5_11.cpp, though, remains the fact that you don’t need to worry about the length of the string that is entered.

Accessing Substrings

You can extract a substring from a string object using its substr() function. The function accepts up to two arguments. The first is the index position where the substring starts, and the second is the number of characters in the substring. The function returns the substring as a string object. Here’s an example:

std::string phrase {"The higher the fewer."};

std::string word1 {phrase.substr(4, 6)}; // "higher"

This extracts the six-character substring from phrase that starts at index position 4, so word1 will contain "higher" after the second statement executes. If the length you specify for the substring overruns the end of the string object, then the substr() function just returns an object containing the characters up to the end of the string. The following statement demonstrates this behavior:

std::string word2 {phrase.substr(4, 100)}; // "higher the fewer."

Of course, there aren’t 100 characters in phrase, let alone in a substring. In this case, the result will be that word2 will contain the substring from index position 4 to the end, which is "higher the fewer.". You could obtain the same result by omitting the length argument and just supplying the first argument that specifies the index of the first character of the substring:

std::string word {phrase.substr(4)}; // "higher the fewer."

This version of substr() also returns the substring from index position 4 to the end. If you omit both arguments to substr(), the whole of phrase will be selected as the substring.

If you specify a starting index for a substring that is outside the valid range for the string object, an exception of type std::out_of_range will be thrown, and your program will terminate abnormally—unless you’ve implemented some code to handle the exception. You don’t know how to do that yet, but we’ll discuss exceptions and how to handle them in Chapter 15.

Caution

As before, substrings are always specified using their begin index and length, not using their begin and end indexes. Keep this in mind, especially when migrating from languages such as JavaScript or Java!
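The behavior of substr() described in this section can be summarized in one sketch. It uses a try/catch block, which is only explained in Chapter 15, so treat that part as a preview. The name check_substr is ours.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: exercises the different ways of calling substr().
void check_substr()
{
  const std::string phrase {"The higher the fewer."};

  assert(phrase.substr(4, 6) == "higher");
  assert(phrase.substr(4, 100) == "higher the fewer.");  // Length is clamped
  assert(phrase.substr(4) == "higher the fewer.");       // Up to the end
  assert(phrase.substr() == phrase);                     // The whole string
  assert(phrase.substr(phrase.length()).empty());        // Index == length is allowed

  bool thrown {false};
  try
  {
    static_cast<void>(phrase.substr(phrase.length() + 1));  // Index beyond the end
  }
  catch (const std::out_of_range&)
  {
    thrown = true;
  }
  assert(thrown);
}
```

Note that a starting index equal to the length of the string is still valid and yields an empty string. Only an index beyond that throws.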

Comparing Strings

In example Ex7_02 you used an index to access individual characters in a string object for comparison purposes. When you access a character using an index, the result is of type char, so you can use the comparison operators to compare individual characters. You can also compare entire string objects using any of the comparison operators. These are the comparison operators you can use:

> >= < <= == !=

You can use these to compare two objects of type string or to compare a string object with a string literal or C-style string. The operands are compared character by character until either a pair of corresponding characters contains different characters or the end of either or both operands is reached. When a pair of characters differs, numerical comparison of the character codes determines which of the strings has the lesser value. If no differing character pairs are found and the strings are of different lengths, the shorter string is “less than” the longer string. Two strings are equal if they contain the same number of characters and all corresponding character codes are equal. Because you’re comparing character codes, the comparisons are obviously going to be case sensitive.

The technical term for this string comparison algorithm is lexicographical comparison, which is just a fancy way of saying that strings are ordered in the same manner as they are in a dictionary.
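A few comparisons, sketched under the assumption of an ASCII-compatible character set, show these rules in action. The name check_comparisons is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: demonstrates lexicographical comparison of strings.
void check_comparisons()
{
  const std::string age {"age"};
  const std::string beauty {"beauty"};

  assert(age < beauty);                       // 'a' comes before 'b'
  assert(std::string{"app"} < "apple");       // A shorter prefix is "less than"
  assert(std::string{"Zebediah"} < "adam");   // 'Z' (90) precedes 'a' (97) in ASCII
  assert(age == "age");                       // Same length, same characters
  assert(age != "Age");                       // Comparisons are case sensitive
}
```

The third comparison is the one place where the dictionary analogy breaks down: because character codes are compared, every uppercase letter sorts before every lowercase letter.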

You could compare two string objects using this if statement:

std::string word1 {"age"};

std::string word2 {"beauty"};

if (word1 < word2)

std::cout << word1 << " comes before " << word2 << '.' << std::endl;

else

std::cout << word2 << " comes before " << word1 << '.' << std::endl;

Executing these statements will result in the following output:

age comes before beauty.

This shows that the old saying must be true. The preceding code looks like a good candidate for using the conditional operator. You can produce a similar result with the following statement:

std::cout << word1 << (word1 < word2? " comes " : " does not come ")

<< "before " << word2 << '.' << std::endl;

Let’s compare strings in a working example. This program reads any number of names and sorts them into ascending sequence:

// Ex7_03.cpp

// Comparing strings

#include <iostream> // For stream I/O

#include <iomanip> // For stream manipulators

#include <string> // For the string type

#include <cctype> // For character conversion

#include <vector> // For the vector container

int main()

{

std::vector<std::string> names; // Vector of names

std::string input_name; // Stores a name

char answer {}; // Response to a prompt

do

{

std::cout << "Enter a name: ";

std::cin >> input_name; // Read a name and...

names.push_back(input_name); // ...add it to the vector

std::cout << "Do you want to enter another name? (y/n): ";

std::cin >> answer;

} while (std::tolower(answer) == 'y');

// Sort the names in ascending sequence

bool sorted {};

do

{

sorted = true; // remains true when names are sorted

for (size_t i {1}; i < names.size(); ++i)

{

if (names[i-1] > names[i])

{ // Out of order - so swap names

names[i].swap(names[i-1]);

sorted = false;

}

}

} while (!sorted);

// Find the length of the longest name

size_t max_length{};

for (const auto& name : names)

if (max_length < name.length())

max_length = name.length();

// Output the sorted names 5 to a line

const size_t field_width = max_length + 2;

size_t count {};

std::cout <<"In ascending sequence the names you entered are:\n";

for (const auto& name : names)

{

std::cout << std::setw(field_width) << name;

if (!(++count % 5)) std::cout << std::endl;

}

std::cout << std::endl;

}

Here’s some sample output:

Enter a name: Zebediah