Chapter 3. Literals

In C source code, a literal is a token that denotes a fixed value, which may be an integer, a floating-point number, a character, or a string. A literal’s type is determined by its value and its notation.

The literals discussed here are different from compound literals, which were introduced in the C99 standard. Compound literals are ordinary modifiable objects, similar to variables. For a full description of compound literals and the special operator used to create them, see Chapter 5.

Integer Constants

An integer constant can be expressed as an ordinary decimal numeral, or as a numeral in octal or hexadecimal notation. Octal and hexadecimal notation are indicated by a prefix; a decimal constant has none.

A decimal constant begins with a nonzero digit. For example, 255 is the decimal constant for the base-10 value 255.

A number that begins with a leading zero is interpreted as an octal constant. Octal (or base eight) notation uses only the digits from 0 to 7. For example, 047 is a valid octal constant representing 4 × 8 + 7, and is equivalent to the decimal constant 39. The decimal constant 255 is equal to the octal constant 0377.

A hexadecimal constant begins with the prefix 0x or 0X. The hexadecimal digits A to F can be upper- or lowercase. For example, 0xff, 0Xff, 0xFF, and 0XFF represent the same hexadecimal constant, which is equivalent to the decimal constant 255.
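As a quick check of the three notations, the following sketch writes the same value with each prefix and compares the results. The function name same_value_in_three_notations is hypothetical, chosen only for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper, not part of any library: writes one value in
 * decimal, octal, and hexadecimal notation and reports whether the
 * three constants compare equal. */
int same_value_in_three_notations(void)
{
    int dec = 255;    /* decimal: begins with a nonzero digit */
    int oct = 0377;   /* octal: begins with a leading zero    */
    int hex = 0xFF;   /* hexadecimal: begins with 0x or 0X    */

    printf("%d %d %d\n", dec, oct, hex);   /* prints "255 255 255" */
    return dec == oct && oct == hex;       /* 1 if all three agree */
}
```

One consequence worth keeping in mind is that a leading zero changes the value of a numeral: 010 denotes eight, not ten.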

Because the integer constants you define will eventually be used in expressions and declarations, their type is important. The type of a constant is determined at the same time as its value is defined. Integer constants such as the examples just mentioned usually have the type int. However, if the value of an integer constant is outside the range of the type int, then it must have a bigger type. In this case, the compiler assigns it the first type in a hierarchy that is large enough to represent the value. For decimal constants, the type hierarchy is:

int, long, long long

For octal and hexadecimal constants, the type hierarchy is:

int, unsigned int, long, unsigned long, long long, unsigned long long

For example, on a 16-bit system, the decimal constant 50000 has the type long, as the greatest possible int value is 32,767, or 2¹⁵ − 1.

You can also influence the types of constants in your programs explicitly by using suffixes. A constant with the suffix l or L has the type long (or a larger type if necessary, in accordance with the hierarchies just mentioned). Similarly, a constant with the suffix ll or LL has at least the type long long. The suffix u or U can be used to ensure that the constant has an unsigned type. The long and unsigned suffixes can be combined. Table 3-1 gives a few examples.

Table 3-1. Examples of constants with suffixes
Integer constant	Type
`0x200`	`int`
`512U`	`unsigned int`
`0L`	`long`
`0Xf0fUL`	`unsigned long`
`0777LL`	`long long`
`0xAAAllu`	`unsigned long long`
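The rows of Table 3-1 can be checked by the compiler itself. The sketch below assumes a C11 compiler, because it relies on the _Generic selection; the macro IS_TYPE and the function count_confirmed_rows are hypothetical names introduced only for this example.

```c
#include <stdio.h>

/* Hypothetical macro (requires C11 _Generic): yields 1 if the
 * expression x has exactly the type T, and 0 otherwise. */
#define IS_TYPE(x, T) _Generic((x), T: 1, default: 0)

/* Hypothetical helper: counts how many rows of Table 3-1 the
 * compiler confirms. */
int count_confirmed_rows(void)
{
    int n = 0;

    n += IS_TYPE(0x200,    int);
    n += IS_TYPE(512U,     unsigned int);
    n += IS_TYPE(0L,       long);
    n += IS_TYPE(0Xf0fUL,  unsigned long);
    n += IS_TYPE(0777LL,   long long);
    n += IS_TYPE(0xAAAllu, unsigned long long);

    printf("%d of 6 rows confirmed\n", n);
    return n;
}
```

The values in the table are small enough to fit in every one of these types, so the result should not depend on the widths of int and long on a given system.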

Floating-Point Constants

Floating-point constants can be written either in decimal or in hexadecimal notation. These notations are described in the next two sections.

Decimal Floating-Point Constants

An ordinary floating-point constant consists of a sequence of decimal digits containing a decimal point. You may also multiply the value by a power of 10, as in scientific notation: the power of 10 is represented simply by an exponent, introduced by the letter e or E. A floating-point constant that contains an exponent does not need to have a decimal point. Table 3-2 gives a few examples of decimal floating-point constants.

Table 3-2. Examples of decimal floating-point constants
Floating-point constant	Value
`10.0`	`10`
`2.34E5`	`2.34 × 10⁵`
`67e-12`	`67.0 × 10⁻¹²`

The decimal point can also be the first or last character. Thus, 10. and .234E6 are permissible numerals. However, the numeral 10 with no decimal point would be an integer constant, not a floating-point constant.

The default type of a floating-point constant is double. You can also append the suffix F or f to assign a constant the type float, or the suffix L or l to give a constant the type long double, as this example shows:

float  f_var = 123.456F;              // Initialize a float variable.

long double ld_var = f_var * 987E7L;  // Initialize a long double
                                      // variable with the product of
                                      // a multiplication performed
                                      // with long double precision.
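The types described above can be verified at compile time. This is a minimal sketch, assuming a C11 compiler for _Generic; IS_TYPE and decimal_float_constant_types are hypothetical names used only here.

```c
/* Hypothetical macro (requires C11 _Generic): yields 1 if the
 * expression x has exactly the type T, and 0 otherwise. */
#define IS_TYPE(x, T) _Generic((x), T: 1, default: 0)

/* Hypothetical helper: returns 1 if every constant has the type
 * that the text leads us to expect. */
int decimal_float_constant_types(void)
{
    float f_var = 123.456F;

    return IS_TYPE(10.0,     double)        /* no suffix: double      */
        && IS_TYPE(10.,      double)        /* trailing decimal point */
        && IS_TYPE(.234E6,   double)        /* leading decimal point  */
        && IS_TYPE(67e-12,   double)        /* exponent, no point     */
        && IS_TYPE(123.456F, float)         /* suffix F: float        */
        && IS_TYPE(987E7L,   long double)   /* suffix L: long double  */
        && IS_TYPE(10,       int)           /* no point: an integer   */
        && IS_TYPE(f_var * 987E7L, long double);  /* product type     */
}
```

The last line reflects the usual arithmetic conversions: multiplying a float by a long double constant yields a long double result.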

Hexadecimal Floating-Point Constants

The C99 standard introduced hexadecimal floating-point constants, which have a key advantage over decimal floating-point numerals: if you specify a constant value in hexadecimal notation, it can be stored in the computer’s binary floating-point format exactly, with no rounding error, whereas values that are “round numbers” in decimal notation—like 0.1—may be repeating fractions in binary, and have to be rounded for representation in the internal format. (For an example of rounding with floating-point numbers, see Example 2-2.)

A hexadecimal floating-point constant consists of the prefix 0x or 0X, a sequence of hexadecimal digits with an optional decimal point (which perhaps we ought to call a “hexadecimal point” in this case), and an exponent to base two. The exponent is a decimal numeral introduced by the letter p or P. For example, the constant 0xa.fP-10 is equal to the number (10 + 15/16) × 2⁻¹⁰ (not 2⁻¹⁶) in decimal notation. Equivalent ways of writing the same constant value are 0xA.Fp-10, 0x5.78p-9, 0xAFp-14, and 0x.02BCp0. Each difference of 1 in the exponent multiplies or divides the hexadecimal fraction by a factor of 2, and each shift of the hexadecimal point by one place corresponds to a factor (or divisor) of 16, or 2⁴.
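The equivalences listed above can be confirmed with exact comparisons, since each of these constants is representable without rounding in a binary floating-point format. The function name hex_float_forms_agree is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if all five spellings denote the
 * same value, namely (10 + 15/16) * 2^-10. */
int hex_float_forms_agree(void)
{
    double a = 0xa.fP-10;
    double b = 0xA.Fp-10;
    double c = 0x5.78p-9;    /* fraction halved, exponent one larger  */
    double d = 0xAFp-14;     /* point moved right one place: 2^4      */
    double e = 0x.02BCp0;    /* everything folded into the fraction   */

    printf("%a = %.12f\n", a, a);   /* %a prints in hexadecimal form  */
    return a == b && b == c && c == d && d == e
        && a == (10 + 15.0 / 16) / 1024;
}
```

The exact text that %a produces may differ between C libraries, so the sketch does not rely on it.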

In hexadecimal floating-point constants, you must include the exponent, even if its value is zero. This step is necessary in order to distinguish the type suffix F (after the exponent) from the hexadecimal digit F (to the left of the exponent). For example, if the exponent were not required, the constant 0x1.0F could represent either the number 1.0 with type float, or the number 1 + 15/256 with the default type double.

Like decimal floating-point constants, hexadecimal floating-point constants also have the default type double. Append the suffix F or f to assign a constant the type float, or the suffix L or l to give it the type long double.
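To illustrate how the mandatory exponent removes the ambiguity of the letter F, the following sketch compares a constant with the suffix F against one in which F is a hexadecimal digit. It assumes a C11 compiler for _Generic; IS_TYPE and exponent_disambiguates_f are hypothetical names.

```c
/* Hypothetical macro (requires C11 _Generic): yields 1 if the
 * expression x has exactly the type T, and 0 otherwise. */
#define IS_TYPE(x, T) _Generic((x), T: 1, default: 0)

/* Hypothetical helper: returns 1 if the position of F relative to
 * the exponent decides between "suffix" and "digit". */
int exponent_disambiguates_f(void)
{
    return IS_TYPE(0x1.0p0F, float)           /* F after exponent: suffix */
        && 0x1.0p0F == 1.0F
        && IS_TYPE(0x1.0Fp0, double)          /* F before exponent: digit */
        && 0x1.0Fp0 == 1.0 + 15.0 / 256
        && IS_TYPE(0x1p0L, long double);      /* suffix L: long double    */
}
```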

Character Constants

A character constant consists of one or more characters enclosed in single quotation marks. Here are some examples:

'a'   'XY'   '0'   '*'

All the characters of the source character set are permissible in character constants, except the single quotation mark ', the backslash \, and the newline character. To represent these characters, you must use escape sequences:

'\''   '\\'   '\n'

In the fifth translation phase (see “How the C Compiler Works”), characters and escape sequences in character constants are converted into the corresponding characters of the execution character set. All the escape sequences that are permitted in character constants are described in “Escape Sequences”.

Wide-character constants are character constants defined with one of the prefixes L, u, or U. They have a different type and value range from character constants defined without a prefix.

Types and Values of Character Constants

Character constants that are not wide characters have the type int. If a character constant contains one character which can be represented in a single byte in the execution character set, then its value is the character code of that character. For example, the constant 'a' in ASCII or ISO 8859-1 encoding has the decimal value 97. In all other cases, and in particular if a character constant contains more than one character, the value of a character constant can vary from one compiler to another.
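A small sketch can make the type of a character constant visible by comparing sizes. The function name char_constant_has_int_size is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the constant 'a' occupies as
 * many bytes as an int, while a char object occupies one byte. */
int char_constant_has_int_size(void)
{
    char ch = 'a';   /* the int value of 'a' is converted to char */

    printf("sizeof 'a' = %zu, sizeof ch = %zu\n", sizeof 'a', sizeof ch);
    return sizeof 'a' == sizeof(int) && sizeof ch == 1;
}
```

This holds for C only; in C++, a character constant such as 'a' has the type char.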

The following code fragment tests whether the character read is a digit between 1 and 5, inclusive:

#include <stdio.h>
int c = 0;

/* ... */

c = getchar();                          // Read a character.
if ( c != EOF && c > '0' && c < '6' )   // Compare input to character
                                        // constants.
{
  /* This block is executed if the user entered a digit from 1 to 5. */
}

If the type char is signed, then the value of a character constant can also be negative, because the constant’s value is the result of a type conversion of the character code from char to int. For example, ISO 8859-1 is a commonly used 8-bit character set, also known as the ISO Latin 1 or ANSI character set. In this character set, the currency symbol for pounds sterling, £, is coded as hexadecimal A3:

int c = '\xA3';                       // Symbol for pounds sterling
printf("Character: %c     Code: %d\n", c, c);

If the execution character set is ISO 8859-1, and the type char is signed, then the printf statement in the preceding example generates the following output:

Character: £     Code: -93
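When a program needs the character code as a non-negative number regardless of whether char is signed, one common approach is to convert the value to unsigned char first. The helper char_code below is a hypothetical name for such a conversion, offered as a sketch rather than as the method the text prescribes.

```c
/* Hypothetical helper: returns the code of ch as a non-negative
 * value, whether or not the type char is signed. */
int char_code(char ch)
{
    return (unsigned char)ch;
}
```

With this helper, char_code('\xA3') yields 163 (hexadecimal A3) on systems with signed and unsigned char alike. The same conversion is advisable before passing a char value to the character classification functions declared in ctype.h, which expect either EOF or a value representable as unsigned char.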

In a program that uses characters that are not representable in a single byte, you can use wide-character constants. A wide-character constant is written with one of the prefixes L, u, or U. The prefix determines the type of the character constant, as shown in Table 3-3.

Table 3-3. The types of character constants
Prefix	Examples	Type
none	`'a'` `'\t'`	`int`
`L`	`L'a'` `L'\u0100'`	`wchar_t` (defined in stddef.h)
`u`	`u'a'` `u'\x3b3'`	`char16_t` (defined in uchar.h)
`U`	`U'a'` `U'\u27FA'`	`char32_t` (defined in uchar.h)

The value of a wide-character constant that contains a single multibyte character which is representable in the execution character set is the code of the corresponding wide character. That is the value that would be returned for that multibyte character by the standard function mbtowc() (“multibyte to wide character”), or by mbrtoc16() or mbrtoc32(), depending on the type of the wide-character constant.

The Unicode types char16_t and char32_t, and the corresponding conversion functions, were introduced in the C11 standard. Characters of the type char16_t are encoded in UTF-16 if the macro __STDC_UTF_16__ is defined in the given implementation. Similarly, characters of the type char32_t are encoded in UTF-32 if the implementation defines the macro __STDC_UTF_32__.

Tip

The value of a character constant containing several characters, such as L'xy', is not specified. To ensure portability, make sure your programs do not depend on such a character constant having a specific value.

Escape Sequences

An escape sequence begins with a backslash \, and represents a single character. Escape sequences allow you to represent any character in character constants and string literals, including nonprintable characters and characters that otherwise have a special meaning, such as ' and ". Table 3-4 lists the escape sequences recognized in C.

Table 3-4. Escape sequences
Escape sequence	Character value	Action on output device
`\'`	A single quotation mark (')	Prints the character
`\"`	A double quotation mark (`"`)	Prints the character
`\?`	A question mark (?)	Prints the character
`\\`	A backslash character (\)	Prints the character
`\a`	Alert	Generates an audible or visible signal
`\b`	Backspace	Moves the active position back one character
`\f`	Form feed	Moves the active position to the beginning of the next page
`\n`	Newline	Moves the active position to the beginning of the next line
`\r`	Carriage return	Moves the active position to the beginning of the current line
`\t`	Horizontal tab	Moves the active position to the next horizontal tab stop
`\v`	Vertical tab	Moves the active position to the next vertical tab stop
`\o`, `\oo`, or `\ooo` (where `o` is an octal digit)	The character with the given octal code	Prints the character
`\xh[h…]` (where `h` is a hexadecimal digit)	The character with the given hexadecimal code	Prints the character
`\uhhhh` or `\Uhhhhhhhh` (where `h` is a hexadecimal digit)	The character with the given universal character name	Prints the character

In the table, the active position refers to the position at which the output device prints the next output character, such as the position of the cursor on a console display. The behavior of the output device is not defined in the following cases: if the escape sequence \b (backspace) occurs at the beginning of a line; if \t (tab) occurs at the end of a line; or if \v (vertical tab) occurs at the end of a page.

As Table 3-4 shows, universal character names are also considered escape sequences. Universal character names allow you to specify any character in the extended character set, regardless of the encoding used. See “Universal Character Names” for more information.

You can also specify any character code in the value range of the type unsigned char—or any wide-character code in the value range of wchar_t—using the octal and hexadecimal escape sequences, as shown in Table 3-5.

Table 3-5. Examples of octal and hexadecimal escape sequences
Octal	Hexadecimal	Description
`'\0'`	`'\x0'`	The null character
`'\033'` `'\33'`	`'\x1B'`	The control character `ESC` (“escape”)
`'\376'`	`'\xfe'`	The character with the decimal code 254
`'\417'`	`'\x10f'`	Illegal, as the numeric value is beyond the range of the type `unsigned char`
`L'\417'`	`L'\x10f'`	That’s better! It’s now a wide-character constant; the type is `wchar_t`
-	`L'\xF82'`	Another wide-character constant
-	`U'\x222B'`	A wide-character constant with the type `char32_t`

There is no equivalent octal notation for the last two constants in the table because octal escape sequences cannot have more than three octal digits. For the same reason, the wide-character constant L'\3702' consists of two characters: L'\370' and L'2'.
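The equivalences in Table 3-5 and the three-digit limit on octal escape sequences can be checked with a short sketch. The function names escape_forms_agree and elements_in_3702 are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the octal and hexadecimal
 * spellings in Table 3-5 denote the same character codes. */
int escape_forms_agree(void)
{
    return '\0'   == '\x0'                 /* null character          */
        && '\033' == '\x1B'                /* ESC, decimal code 27    */
        && '\33'  == 27
        && '\376' == '\xfe'
        && (unsigned char)'\376' == 254;   /* code as unsigned value  */
}

/* Hypothetical helper: number of array elements in L"\3702",
 * including the terminating null wide character. */
size_t elements_in_3702(void)
{
    return sizeof L"\3702" / sizeof(wchar_t);
}
```

The second function returns 3: the escape sequence \370 ends after three octal digits, so the literal holds that character, the character 2, and the terminator.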

String Literals

A string literal consists of a sequence of characters (and/or escape sequences) enclosed in double quotation marks. For example:

"Hello world!\n"

The individual characters of a string literal are governed by the same rules described for the values of characters in character constants. String literals may contain all the multibyte characters of the source character set. The only exceptions are the double quotation mark ", the backslash \, and the newline character, which must be represented by escape sequences. For example, each backslash character in Windows directory paths must be written as \\. The following printf statement first produces an alert tone, and then indicates a documentation directory in quotation marks, substituting the string stored in the array doc_path for the conversion specification %s:

char doc_path[128] = ".\\share\\doc";    // That is, ".\share\doc"
printf("\aSee the documentation in the directory \"%s\"\n", doc_path);

A string literal is a static array of char that contains character codes followed by a string terminator, the null character \0 (see also Chapter 8). The empty string "" occupies exactly one byte in memory, which holds the terminating null character. Characters that cannot be represented in one byte are stored as multibyte characters.
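The storage requirements just described can be observed with sizeof, which counts the terminating null character, and strlen, which does not. The function names below are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers: sizes in bytes of two string literals. */
size_t bytes_in_empty_string(void) { return sizeof ""; }
size_t bytes_in_hello(void)        { return sizeof "Hello world!\n"; }

/* Hypothetical helper: length without the terminating null character. */
size_t length_of_hello(void)       { return strlen("Hello world!\n"); }
```

Here the empty string occupies 1 byte, while "Hello world!\n" occupies 14 bytes for a string length of 13, because the escape sequence \n counts as a single character.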

As illustrated in the previous example, you can use a string literal to initialize a char array. A string literal can also be used to initialize a pointer to char:

char *pStr = "Hello, world!";     // pStr points to the first
                                  // character, 'H'

In such an initializer, the string literal represents the address of its first element, just as an array name would.

In Example 3-1, the array error_msg contains three pointers to char, each of which is assigned the address of the first character of a string literal.

Example 3-1. Sample function error_exit()

#include <stdlib.h>
#include <stdio.h>
void error_exit(unsigned int error_n)  // Print a last error message
{                                      // and exit the program.
  char * error_msg[] = { "Unknown error code.\n",
                         "Insufficient memory.\n",
                         "Illegal memory access.\n" };
  unsigned int arr_len = sizeof(error_msg)/sizeof(char *);

  if ( error_n >= arr_len )
     error_n = 0;
  fputs( error_msg[error_n], stderr );
  exit(1);
}

The C11 standard provides a new prefix, u8, which allows you to define a UTF-8 string literal. The multibyte characters in the char array defined by a UTF-8 string literal are encoded in UTF-8. A string literal of the form u8"…" is thus no different from a string literal without a prefix if the implementation’s default encoding for multibyte characters is UTF-8.

As with wide-character constants, you can also write string literals as strings of wide characters by using one of the prefixes L, u, or U. In this way, you define what is called a wide-string literal, which yields an array of wide characters ending with a character with the value 0. The prefix determines the array elements’ type.

The following wide-string literal is defined using the prefix L:

L"Here's a wide-string literal."

This expression defines a static null-terminated array of elements of the type wchar_t. The array is initialized by converting the multibyte characters in the string literal to wide characters in the same way as the standard function mbstowcs() (“multibyte string to wide-character string”) would do.

The prefixes u and U, introduced in C11, yield a static array of wide characters of the type char16_t or char32_t. The multibyte characters in these wide-string literals are implicitly converted to wide characters by successive calls to the function mbrtoc16() or mbrtoc32().

If a multibyte character or an escape sequence in a string literal is not representable in the execution character set, the value of the string literal is not specified—that is, it depends on the compiler.

In the following example, \u03b1 is the universal name for the character α, and wprintf() is the wide-character version of the printf function, which formats and prints a string of wide characters:

double angle_alpha = 90.0/3;
wprintf( L"Angle \u03b1 measures %lf degrees.\n", angle_alpha );

The compiler concatenates any adjacent string literals (that is, those that are separated only by whitespace) into a single string. As the following example illustrates, this concatenation also makes it simple to break up a string into several lines for readability:

#define PRG_NAME "EasyLine"
char msg[ ] = "The installation of " PRG_NAME
              " is now complete.";

If any of the string literals involved has a prefix, then the resulting string is treated as a string literal with that prefix. Whether string literals with different prefixes can be concatenated depends on the compiler.

Another way to break a string literal into several lines is to end a line with a backslash, as in this example:

char info[ ] =
"This is a string literal broken up into\
  several source code lines.\nNow one more line:\n\
 that's enough, the string ends here.";

The string continues at the beginning of the next line: any spaces at the left margin, such as the spaces before several in the preceding example, are part of the string literal. Furthermore, the string literal defined here contains exactly two newline characters, namely the two that are explicitly written as \n: one immediately before Now, and one just ahead of that's, separated from it only by the leading space. The line breaks that follow the backslashes are not part of the string.
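The two ways of breaking a string across source lines can be compared directly. In this sketch, the names spliced, joined, and spliced_equals_joined are hypothetical; note that the continuation line deliberately starts in the first column so that no extra spaces enter the string.

```c
#include <string.h>

/* One literal continued with a backslash at the end of the line.
 * The continuation starts at the left margin on purpose. */
static const char spliced[] = "one, \
two";

/* Two adjacent literals, concatenated by the compiler. Here the
 * indentation lies outside the quotation marks and is ignored. */
static const char joined[] = "one, "
                             "two";

/* Hypothetical helper: returns 1 if both arrays hold "one, two". */
int spliced_equals_joined(void)
{
    return strcmp(spliced, joined) == 0
        && strcmp(spliced, "one, two") == 0;
}
```

Adjacent literals are usually easier to maintain, because indenting the source code cannot change the contents of the string.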

The compiler interprets escape sequences before concatenating adjacent strings (see “The C Compiler’s Translation Phases”). As a result, the following two string literals form one wide-character string that begins with the two characters L'\xA7' and L'2':

L"\xA7" L"2 et cetera"

However, if the string is written in one piece as L"\xA72 et cetera", then the first character in the string is the wide character L'\xA72'.
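The difference between the two spellings can be observed by inspecting the first elements of each array. The function names in this sketch are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: the escape sequence ends where the first
 * literal ends, so the string begins with 0xA7 followed by '2'. */
int split_literal_starts_with_A7(void)
{
    const wchar_t *s = L"\xA7" L"2 et cetera";
    return s[0] == 0xA7 && s[1] == L'2';
}

/* Hypothetical helper: written in one piece, the hexadecimal escape
 * sequence absorbs the digit 2, so the string begins with 0xA72. */
int single_literal_starts_with_A72(void)
{
    const wchar_t *s = L"\xA72 et cetera";
    return s[0] == 0xA72 && s[1] == L' ';
}
```

Splitting a literal immediately after a hexadecimal escape sequence is therefore a practical way to stop the sequence from consuming the hexadecimal digits that follow it.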

Although C does not strictly prohibit modifying string literals, you should not attempt to do so. In the following example, the second statement is an attempt to replace the first character of a string:

char *p = "house";         // Initialize a pointer to char.
*p = 'm';                  // This is *not* a good idea!

This statement is not portable, and causes a runtime error on some systems. For one thing, the compiler, treating the string literal as a constant, may place it in read-only memory, in which case the attempted write operation causes a fault. For another, if two or more identical string literals are used in the program, the compiler may store them at the same location, so that modifying one causes unexpected results when you access another.

However, if you use a string literal to initialize an array variable, you can then modify the contents of the array:

char s[] = "house";      // Initialize an array of char.
s[0] = 'm';              // Now the array contains the string "mouse".

In the same way, arrays whose elements have the type wchar_t, char16_t, or char32_t can be initialized using an appropriate wide-string literal.
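The following sketch modifies two arrays that were initialized from string literals, one of char and one of wchar_t. The function name modify_array_copies is hypothetical.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: each array holds its own modifiable copy of
 * the literal's characters, so the assignments are well defined. */
int modify_array_copies(void)
{
    char    s[]  = "house";
    wchar_t ws[] = L"house";

    s[0]  = 'm';     /* the array now contains "mouse"          */
    ws[0] = L'm';    /* likewise for the wide-character array   */

    return strcmp(s, "mouse") == 0 && wcscmp(ws, L"mouse") == 0;
}
```

One common defensive habit, offered here as a suggestion rather than a rule from the text, is to declare pointers to string literals as const char *, so that the compiler rejects an accidental write through the pointer.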
