
Strings

Introduction

Just as a group of integers can be stored in an integer array, a group of characters can be stored in a character array. A string constant is a one-dimensional array of characters terminated by a null character (‘\0’).

Each character in the array occupies one byte of memory. ‘\0’ may look like two characters, but it is actually only one character, with the \ indicating that what follows it is something special. Note that ‘\0’ and ‘0’ are not the same: the ASCII value of ‘\0’ is 0, whereas the ASCII value of ‘0’ is 48. This section often calls ‘\0’ the NULL character; it should not be confused with the NULL pointer macro.

Generally, string characters are selected only from the printable character set. However, nothing in C prevents any character other than the null character from being used in a string. In fact, it is quite common to use formatting characters, such as tabs, in strings. Before jumping into this section, please go through the ‘Arrays’ section if you haven’t yet.

String literals

A string literal, also known as a string constant, is a sequence of characters enclosed in double quotes. For example, each of the following is a string literal:

"Hello world"
"Quants"
"Welcome to C language!"

When string literals are used in a program, C automatically creates an array of characters, initializes it to a null-delimited string, and stores it, remembering its address. It does all this because we use the double quotes that immediately identify the data as a string value.

Declaration and initialisation of a string

Declaring and initializing a string is similar to any other array. Some of the ways of doing it are as follows:

Syntax:

char string_id[size] = {'Q', 'u', 'a', 'n', 't', 's', '\0'};
char string_id[size] = {'Q', 'u', 'a', 'n', 't', 's'};
char string_id[size] = "Quants";
char string_id[size] = "Quants\0";

Where,

  1. string_id is any valid identifier for the string.
  2. size is a non-zero positive integer denoting the size of the string.

Note:

  1. The strings can also be initialized at the time of declaration using a string literal.
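  2. The size value is optional in this case, as the initialisation happens at the time of declaration. When given, it should be at least 1 plus the length of the string to be stored in the array, to leave room for the null terminator at the end.
  3. In the 4th example the explicit ‘\0’ is redundant, because the compiler appends a null character to every string literal. In the brace-enclosed forms (1st and 2nd examples) nothing is appended automatically: the string is terminated only if ‘\0’ is written explicitly or if size is larger than the number of initializers, in which case the remaining elements are set to zero. Writing the ‘\0’ explicitly in a brace-enclosed list is therefore advised.
  4. If the size value is smaller than the number of characters being initialized, the compiler reports a diagnostic (GCC, for example, warns about “excess elements in array initializer” for a brace-enclosed list) and at most ‘size‘ characters end up in the array. If size equals the length of a string literal exactly, C accepts the initialisation silently but stores no null terminator, so the array does not hold a valid string.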
  5. It is important to remember that strings cannot be initialized with string literals in the following way:
char string[10];
string = "Quants"; // Error
  6. Note that you cannot copy the contents of one string to another using the assignment operator. The string library provides functions for this, which we shall see at a later point.
char string[10] = "Quants";
char str[10];
str = string; // Error
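As a small sketch of note 2 (the function names below are illustrative and not part of any library), comparing sizeof with strlen() shows the extra byte reserved for the terminator when the size is left for the compiler to deduce:

```c
#include <stddef.h>
#include <string.h>

/* The size is omitted, so the compiler deduces it from the literal:
   6 visible characters plus 1 byte for the terminating '\0'. */
static const char quants[] = "Quants";

/* Bytes occupied by the array, terminator included. */
size_t quants_storage_size(void)
{
    return sizeof quants;
}

/* Characters before the terminator. */
size_t quants_length(void)
{
    return strlen(quants);
}
```

Here quants_storage_size() yields 7 while quants_length() yields 6.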

Significance of NULL delimiter

The string in C is not a data type but a data structure. This means that its implementation is logical and not physical. The physical structure is the array in which the string is stored. Since the string is a variable-length structure, we need to identify the logical end of the data within the physical structure.

The terminating null (‘\0’) is important also because it is the only way the functions that work with a string can know where the string ends. In fact, a string not terminated by a ‘\0’ is not really a string, but merely a collection of characters.
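The difference between the physical array and the logical string can be sketched as follows. The function name and the buffer contents are made up for illustration; the point is only that strlen() stops at the first ‘\0’ it meets:

```c
#include <stddef.h>
#include <string.h>

/* Reports the logical length of a 10-byte buffer that physically holds
   "Hi", then a '\0', then "there". */
size_t logical_length_demo(void)
{
    char buffer[10] = { 'H', 'i', '\0', 't', 'h', 'e', 'r', 'e' };

    /* strlen() stops at the first '\0', so "there" is never counted. */
    return strlen(buffer);
}
```

The function returns 2, even though eight of the ten bytes were explicitly initialised.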

Strings and characters

A character can be stored in two ways: as a character literal or as a string literal. A character literal is written in single quotes. A character occupies a single byte of memory, whereas a string containing a single character requires two bytes: one for the character and one for the null delimiter. Hence, ‘a’ is a character, “a” is a string and “” is an empty string.

A character can be copied from one location (or variable) to another using an assignment operator but for strings, we would need library functions which we shall see later in this section.
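Furthermore, a character is nothing but a small integer (one byte) whose value is the character’s code, which on most systems is its ASCII value; whether plain char is signed or unsigned is implementation-defined. The ASCII table describes the mapping: for example, ‘A’ is 65, ‘a’ is 97 and ‘0’ is 48.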

Another important difference between string and character is how we represent the absence of data. A character cannot be empty. It can hold a null character or a space but simply cannot be empty. Hence ‘’ doesn’t make sense in C. On the other hand, strings can be empty. A string that contains no data consists of only a delimiter.
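A minimal sketch of these points, using helper names invented for this example: an empty string still occupies one byte, and a one-character string occupies two.

```c
#include <stdbool.h>
#include <stddef.h>

/* A string is empty when its very first byte is the terminator. */
bool is_empty_string(const char *s)
{
    return s[0] == '\0';
}

/* sizeof applied to a string literal counts the terminator as well. */
size_t size_of_one_char_string(void)
{
    return sizeof "a";   /* 'a' plus '\0' */
}

size_t size_of_empty_string(void)
{
    return sizeof "";    /* only '\0' */
}
```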

Referencing string literals

Since a string is nothing but a character array with a null character as its last element, the individual characters of a string can be accessed just as array elements are. The following are valid ways to refer to characters in the string:

string[0];     // First character
*(string + 1); // Second character
*(&string[2]); // Third character

Notice that the 3rd way is just referencing and dereferencing the address. The above three methods can be illustrated in the following example:

#include <stdio.h>
int main( ) 
{
     char string[10] = "ABC";
     printf("%c", string[0]);     // First character
     printf("%c", *(string + 1)); // Second character
     printf("%c", *(&string[2])); // Third character
     return 0;
}

Iterating through characters of a string

A string can be iterated one character at a time until the null terminator is reached. Using a for loop:

char string[10] = "ABC";
for (int i = 0 ; string[i] != '\0' ; i++)
    printf("%c", string[i]);

Using while loop:

char string[10] = "ABC";
int i = 0 ;
while ( string[i] != '\0') {
    printf("%c", string[i]);
    i++;
}
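The same traversal can also be written with a pointer instead of an index. The sketch below is one possible variation (count_uppercase is a name chosen for this example): it walks the string until it reaches ‘\0’ and counts the uppercase letters along the way.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts uppercase letters by advancing a pointer until it reaches '\0'. */
size_t count_uppercase(const char *s)
{
    size_t count = 0;

    for (const char *p = s; *p != '\0'; p++) {
        /* Cast to unsigned char: isupper() is undefined for negative values. */
        if (isupper((unsigned char)*p))
            count++;
    }
    return count;
}
```

For “Hello World” the function returns 2.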

String Input/Output Functions

scanf( )

  • scanf( ) is a function that reads data from stdin (i.e, the standard input stream, which is usually the keyboard, unless redirected) and then writes the results into the arguments given.
  • The entered data is formatted as integers or floating-point numbers etc
  • Syntax:
int scanf(const char *format, ...);
  • The ‘format’ is a string in itself and contains conversion code. In this case for string, the conversion code is s. The simplest of examples of this is as follows:
char str[10];
scanf("%s", str);
  • With the s conversion code, scanf( ) skips all leading whitespace in the input and then reads characters until it meets the next whitespace character. The characters from the first non-whitespace character up to (but not including) that whitespace are stored, followed by a null character, at the memory location pointed to by the argument.
  • For example, if “ Hello World ” is entered, the leading whitespace is ignored and only “Hello” is stored in str, followed by a null character. The rest of the input is left in the input stream.
  • To discard the rest of the line from the input stream, some implementations let you call fflush( ) with stdin as a parameter. The C standard, however, defines fflush( ) only for output streams, so the call below has undefined behaviour and is not portable; the macro shown next is the safer choice:
char str[10];
scanf("%s", str);
fflush(stdin); // non-portable: undefined behaviour in standard C
  • A macro that discards the rest of the current input line is illustrated as follows:
#define FLUSH do { int c; while ((c = getchar()) != '\n' && c != EOF) { } } while (0)
char str[10];
scanf("%s", str);
FLUSH;
  • We can protect against the user entering too much data by using width in the field specification. Width specifies the maximum number of characters to be read. The modified scanf statement is shown below:
scanf("%9s", str);
  • Any characters entered beyond the first 9 in the above case remain in the input stream and are not stored in str. Hence it is recommended to discard the rest of the line after using scanf().
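The behaviour described above can be tried without typing at the keyboard by using sscanf(), which accepts the same format strings as scanf() but reads from a string. The helper below is only a sketch; first_word is a name made up for this example.

```c
#include <stdio.h>

/* Reads the first whitespace-delimited word of `input` into `word`,
   which must have room for at least 10 bytes (9 characters + '\0').
   Returns 1 on success, or EOF if the input holds no word at all. */
int first_word(const char *input, char word[10])
{
    /* The width 9 stops the conversion before the array can overflow. */
    return sscanf(input, "%9s", word);
}
```

Given “  Hello World ”, the helper stores “Hello”. Given a 20-letter word, it stores only the first 9 letters.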

The scan set conversion code ([…])

  • The scan set conversion specification consists of the open bracket ([), followed by the edit characters, and terminated by the closing bracket (]).
  • The characters in the scan set identify the valid characters that are to be allowed in the string. All characters except the close bracket can be included in the set.
  • Edited conversion reads the input stream as a string. Each character read by scanf() is compared against the scan set. If the character just read is in the scan set, it is placed in the string and the scan continues.
  • The read stops when any of the following conditions is met:
    • A character that is not in the scan set is read. If this happens on the very first character, the conversion fails: nothing is stored in the string and scanf() returns without counting this item as converted.
    • An end-of-file is detected.
    • A field width specification is included and the maximum number of characters has been read.
  • Scan set doesn’t skip the leading whitespace. Leading whitespace is either put into the string being read when the scan set contains the corresponding whitespace character or stops the conversion when it is not.
  • The non-matching character remains in the input stream for the next read operation. Hence it is advised to flush after using the scan set conversion code.
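  • Example: A scanf() statement to read a string containing only digits, commas, periods, the minus sign, and a dollar sign, where the maximum number of characters in the resulting string is 10. The minus sign is placed last in the scan set because a ‘-’ between two other characters may be treated as a range by some implementations, and str must have room for at least 11 characters. The format string for this operation would be:
scanf("%10[0123456789.,$-]", str);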
  • Sometimes it is easier to specify what is not to be included in the scan set rather than what is valid. For instance, suppose that we want to read a whole line. We can do this by stating that all characters except the newline (\n) are valid. To specify invalid characters, we start the scan set with the caret (^) symbol. The caret is the negation symbol and in effect says that the following characters are not allowed in the string. To read a line, we would code the scanf as shown below.
scanf("%[^\n]", line);
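Both kinds of scan set can be experimented with through sscanf(), which takes the same format strings but reads from a string. The two helpers below are a sketch with invented names; the buffer sizes (11 and 81 bytes) are assumptions chosen to match the widths in the format strings.

```c
#include <stdio.h>

/* Extracts a leading run of digits, commas, periods, '$' and '-' from
   `input` into `amount` (at least 11 bytes). Returns 1 if at least one
   character matched, 0 if the first character was not in the scan set. */
int read_amount(const char *input, char amount[11])
{
    /* '-' is placed last so that it is taken literally, not as a range. */
    return sscanf(input, "%10[0123456789.,$-]", amount);
}

/* Copies everything up to (not including) the first newline into `line`
   (at least 81 bytes). The width guards against overlong lines. */
int read_line(const char *input, char line[81])
{
    return sscanf(input, "%80[^\n]", line);
}
```

Note that read_line() keeps leading spaces, since a scan set does not skip whitespace.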

printf()

  • C has four options of interest when we write strings using printf(): the left-justify flag, width, precision, and size. The left-justify flag (-) and the width are almost always used together.
  • Justification flag
    • The justification flag (-) is used to left justify the output. It has meaning only when the width is also specified, and then only if the length of the string is less than the format width. Using the justification flag results in the output being left-justified, as shown below. If no flag is used, the justification is right.
printf("|%-30s|\n", "This is the string");

○ Output:

|This is the string            |
  • Minimum width
    • The width sets the minimum size of the string in the output. If it is used without a flag, the string is printed right-justified as shown below.
printf("|%30s|\n", "This is the string");

○ Output

|            This is the string|
  • Precision
    • C also uses the precision option to set the maximum number of characters that will be written. In the following example, we set the maximum characters to one less than the width to ensure space between the string and the next column.
printf("|%-15.14s|", "12345678901234567890");

○ Output:

|12345678901234 |
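The same conversion specifications work with snprintf(), which writes into a character array instead of the screen. The sketch below (format_cell is a hypothetical helper) combines the left-justify flag, a width of 15 and a precision of 14:

```c
#include <stdio.h>

/* Formats `s` into a left-justified 15-column cell that shows at most
   14 characters of the string. `out` needs at least 18 bytes:
   15 columns, 2 bars and the terminating '\0'. */
void format_cell(char *out, size_t cap, const char *s)
{
    snprintf(out, cap, "|%-15.14s|", s);
}
```

A long string is cut to 14 characters and a short one is padded with spaces on the right, so every cell is 17 characters wide including the bars.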

Arrays of strings

Ragged/Jagged array

Before moving into arrays of strings, one needs to understand what a jagged (or ragged) array is. A jagged array is a 2-dimensional array in which the rows may have different numbers of columns.

Arrays of strings as Ragged array

It is obvious that an array of strings will mostly be a ragged array as it’s extremely unlikely that the strings in the array will have the same length. This can be illustrated in the following figure:

There are many ways to declare and initialize an array of strings. Some of them are as follows:

Consider the following program:

#include <stdio.h>
int main() 
{
    char* days_of_week[7];
    days_of_week[0] = "Monday";
    days_of_week[1] = "Tuesday";
    days_of_week[2] = "Wednesday";
    days_of_week[3] = "Thursday";
    days_of_week[4] = "Friday";
    days_of_week[5] = "Saturday";
    days_of_week[6] = "Sunday";
    for(int i = 0 ; i < 7 ; i++)
    printf("%s\n", days_of_week[i]);
    return 0;
}

The output of the above program:

Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
Sunday

Points to note from the above program:

  • The string array days_of_week[] is an array of character pointers.
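  • The array elements are initialized to point to string literals, which have static storage and are typically placed in a read-only segment of the program. These strings must not be edited: modifying a string literal is undefined behaviour in C and, on many systems, makes the process crash with a segmentation fault.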


The previous example program showed how to create a string array whose strings are read-only because they are string literals. Please note that the array itself is not read-only.

At any given point, an index of the array can point to a different string but the string it is pointing to will be read-only.

At times, we need to store a string in the array which we can edit. The following program shows how to do the same:

#include <stdio.h>
#include <string.h>
int main()
{
    char days_of_week[7][10];
    strcpy(days_of_week[0], "Monday");
    strcpy(days_of_week[1], "Tuesday");
    strcpy(days_of_week[2], "Wednesday");
    strcpy(days_of_week[3], "Thursday");
    strcpy(days_of_week[4], "Friday");
    strcpy(days_of_week[5], "Saturday");
    strcpy(days_of_week[6], "Sunday");
    for(int i = 0 ; i < 7 ; i++)
    printf("%s\n", days_of_week[i]);
    return 0;
}

The output of the above program is:

Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
Sunday

Points to note from the above program:

  • Memory allocation for all the characters to be initialised is done during declaration.
  • The strings are copied (using strcpy( ) library function which we shall see in the next section) to the memory locations of the array, unlike the previous example where only the pointers to the strings were stored in a single dimensional array
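Because every row of the two-dimensional array is writable storage, the stored strings can be edited in place. The following sketch uses a made-up helper, abbreviate(), and assumes rows of 10 bytes as in the program above:

```c
#include <stddef.h>

/* Truncates every row to its first three characters by writing a new
   terminator. This only works because each row is a writable 10-byte
   array; doing the same through a pointer to a string literal would be
   undefined behaviour. */
void abbreviate(char rows[][10], size_t count)
{
    for (size_t i = 0; i < count; i++)
        rows[i][3] = '\0';
}
```

Applied to rows holding “Monday” and “Tuesday”, this leaves “Mon” and “Tue”.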

String Manipulation Functions

  • A string in C is not a primitive data type, hence one cannot use operators such as ‘+’ (presumably to concatenate two strings) or ‘=’ (presumably to copy one string to a memory location). To do all of these tasks, there is a standard string library defined in C.
  • The following are some of the string manipulation functions defined in string.h:

String length

  • The string length function returns the length of a string passed as an argument to it. The length of a string is defined as the number of characters in the string till the first occurrence of the null character in the string, excluding the null character.
  • The prototype of the string length function is as follows:
    • size_t strlen(const char *s);
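  • strlen( ) is the function name and the return type is size_t, an unsigned integer type wide enough to hold the size of any object (often unsigned long on 64-bit systems). A size_t value is printed with the %zu conversion code.
  • The following program demonstrates how strlen() might be implemented without using the library function. The function is named my_strlen() here because a program must not redefine a standard library function:
#include <stdio.h>
size_t my_strlen(const char *str)
{
    size_t len = 0;
    while (*str != '\0')
    {
        str++;
        len++;
    }
    return len;
}
int main()
{
    char str[100];
    scanf("%99s", str); // at most 99 characters plus the terminator
    printf("%zu is the length of the string.", my_strlen(str));
    return 0;
}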

The output of the above program is as follows:

Quantmasters
12 is the length of the string.

The following program illustrates library function strlen()

#include <stdio.h>
#include <string.h>
int main()
{
    char str[100];
    scanf("%99s", str); // at most 99 characters plus the terminator
    printf("%zu is the length of the string.", strlen(str));
    return 0;
}

Output:

Quantmasters
12 is the length of the string.

String Copy

  • Often we need to copy the contents of one string variable to another. As C does not allow the assignment operator to be used on strings to achieve a deep copy, the string copy functions strcpy() and strncpy() come in handy.
  • The function signature of strcpy() is as follows:
char *strcpy(char *dest, const char *src);
  • The strcpy() function copies the string pointed to by src, including the terminating null byte (‘\0’), to the string pointed to by dest.
  • The strncpy() function, whose signature is char *strncpy(char *dest, const char *src, size_t n), is similar, except that at most n bytes of src are copied. Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated.
  • If the length of src is less than n, strncpy() writes additional null bytes to dest to ensure that a total of n bytes are written.
  • The strcpy() and strncpy() functions return a pointer to the destination string dest.
  • The following program illustrates the use of library function strcpy():
#include <stdio.h>
#include <string.h>
int main()
{
    char src[100], dest[100];
    scanf("%99s", src); // the width keeps the input within src
    printf("%s is the dest string.", strcpy(dest, src));
    return 0;
}

Output:

HelloHi
HelloHi is the dest string.

The following program illustrates what a simple implementation of strcpy() might be. It is named my_strcpy() to avoid clashing with the library function:

#include <stdio.h>
char *
my_strcpy(char *dest, const char *src)
{
    size_t i;
    for (i = 0; src[i] != '\0'; i++)
        dest[i] = src[i];
    dest[i] = '\0'; // copy the terminating null byte as well
    return dest;
}
int main()
{
    char src[100], dest[100];
    scanf("%99s", src);
    printf("%s is the dest string.", my_strcpy(dest, src));
    return 0;
}

Output

HelloHi
HelloHi is the dest string.

The following program illustrates what a simple implementation of strncpy() might be. It is named my_strncpy() to avoid clashing with the library function:

#include <stdio.h>
char *
my_strncpy(char *dest, const char *src, size_t n)
{
    size_t i;
    for (i = 0; i < n && src[i] != '\0'; i++)
        dest[i] = src[i];
    for ( ; i < n; i++)
        dest[i] = '\0'; // pad the rest of the n bytes with null bytes
    return dest;
}
int main()
{
    char src[100], dest[10] = {'\0'};
    scanf("%99s", src);
    // Only 9 bytes are copied, so dest[9] keeps its initial '\0'.
    printf("%s is the dest string.", my_strncpy(dest, src, 9));
    return 0;
}

The output of the above programs is as follows

HelloHi
HelloHi is the dest string.

Things to note while using string copy functions in C:

Overlapping strings

  • It is important to note that the src and dest strings should not overlap. That is, src should not point to any character of dest and dest should not point to any character of src. A classic example of string overlap is a call such as strcpy(buf + 2, buf), where buf is a character array holding a string.
  • Here dest and src point into the same string but at two different indices. This is an overlap, and the string copy functions should not be applied to such strings.
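When the source and destination really do overlap, the standard function memmove() is the usual alternative, because it is specified to work for overlapping regions. A minimal sketch, with shift_right as a name invented for this example:

```c
#include <string.h>

/* Shifts the string in `buf` right by `by` positions. Source and
   destination overlap, so memmove() is used instead of strcpy().
   The caller must ensure buf has room for strlen(buf) + by + 1 bytes. */
void shift_right(char *buf, size_t by)
{
    memmove(buf + by, buf, strlen(buf) + 1);  /* +1 copies the '\0' too */
}
```

The first `by` bytes of buf are left as they were, so shifting “Hello” by 2 leaves “HeHello” in the buffer.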

Size of dest string

  • The dest array should be large enough to accommodate the src string together with its terminating null byte.

Buffer overflow

  • One should use strcpy() only when it is guaranteed that the length of the src string is less than the capacity of dest, so that the terminating null byte also fits. If not, the copy overflows the buffer; attackers can exploit such overflows to overwrite important data on the stack, such as a return address, and redirect execution to malicious code.
  • To limit the copy, one can use strncpy() and pass the capacity of dest as the argument ‘n’, keeping in mind that the result may then lack a terminating null byte (see below).

What to use when?

  • Some programmers consider strncpy() to be inefficient and error-prone. If the programmer knows (i.e., includes code to test!) that the size of dest is greater than the length of src, then strcpy() can be used.
  • One valid (and intended) use of strncpy() is to copy a C string to a fixed-length buffer while ensuring both that the buffer is not overflowed and that unused bytes in the target buffer are zeroed out (perhaps to prevent information leaks if the buffer is to be written to media or transmitted to another process via an interprocess communication technique).
  • If there is no terminating null byte in the first n bytes of src, strncpy() produces an unterminated string in dest.
  • If buf has length buflen, you can force termination using something like the following:
strncpy(buf, str, buflen - 1);
if (buflen > 0)
buf[buflen - 1]= '\0';
  • Of course, the above technique ignores the fact that, if src contains more than buflen – 1 bytes, information is lost in the copying to dest.
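The forced-termination idiom can be wrapped in a helper that also reports whether information was lost. This is only a sketch; copy_truncated is a hypothetical name, not a library function.

```c
#include <stdbool.h>
#include <string.h>

/* Copies src into buf (capacity buflen), always terminating the result.
   Returns true if src did not fit and was truncated. */
bool copy_truncated(char *buf, size_t buflen, const char *src)
{
    if (buflen == 0)
        return true;               /* no room even for the terminator */

    strncpy(buf, src, buflen - 1);
    buf[buflen - 1] = '\0';        /* force termination */
    return strlen(src) >= buflen;  /* true when characters were dropped */
}
```

With a 5-byte buffer, “Hi” is copied whole, while “HelloHi” is cut to “Hell” and the function returns true.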

strlcpy()

  • Some systems (the BSDs, Solaris, and others) provide the following function:
size_t strlcpy(char *dest, const char *src, size_t size);
  • This function is similar to strncpy(), but it copies at most size-1 bytes to dest, always adds a terminating null byte, and does not pad the target with (further) null bytes.
  • This function fixes some of the problems of strcpy() and strncpy(), but the caller must still handle the possibility of data loss if the size is too small.
  • The return value of the function is the length of src, which allows truncation to be easily detected: if the return value is greater than or equal to size, truncation occurred.
  • If loss of data matters, the caller must either check the arguments before the call or test the function return value.
  • strlcpy() was long absent from glibc and from POSIX; it was added in glibc 2.38 and standardized in POSIX.1-2024. On older Linux systems it is available via the libbsd library.
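On systems without strlcpy(), the behaviour described above can be approximated by a small helper. The function below is a stand-in written for this section under the name copy_bounded; it follows the description given here and is not the BSD source code.

```c
#include <stddef.h>
#include <string.h>

/* Copies at most size - 1 bytes of src to dest, always terminates dest
   (when size > 0), does not pad, and returns the length of src. */
size_t copy_bounded(char *dest, const char *src, size_t size)
{
    size_t src_len = strlen(src);

    if (size > 0) {
        size_t n = (src_len < size - 1) ? src_len : size - 1;
        memcpy(dest, src, n);
        dest[n] = '\0';            /* always terminated, no padding */
    }
    return src_len;                /* a value >= size means truncation */
}
```

The caller detects truncation by comparing the return value with the size that was passed in.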

String compare

C provides two functions to compare two strings: strcmp() and strncmp(). Both are declared in the header file string.h.

strcmp()

  • Function signature:
int strcmp(const char *s1, const char *s2);
  • The strcmp() function performs case sensitive comparison of the two strings s1 and s2. It returns an integer less than, equal to, or greater than zero if s1 is found, respectively, to be less than, to match, or be greater than s2.
  • The return value of this function depends on the comparison operation and the following table illustrates the 3 possibilities:
Return value                 Case
0                            s1 and s2 are equal
Negative integer             s1 is less than s2
Positive non-zero integer    s1 is greater than s2

The following program illustrates strcmp():

#include <stdio.h>
#include <string.h>
int main()
{
    char str1[100], str2[100];
    printf("Enter first string: ");
    scanf("%99s", str1);
    printf("Enter second string: ");
    scanf("%99s", str2);
    int i = strcmp(str1, str2);
    if (i == 0)
         printf("%s == %s\n", str1, str2);
    else if (i < 0) // any negative value means str1 sorts first
         printf("%s < %s\n", str1, str2);
    else
         printf("%s > %s\n", str1, str2);
    return 0;
}

Output: Run #1

Enter first string: hello
Enter second string: Hello
hello > Hello

Output: Run #2

Enter first string: Hello
Enter second string: Hello
Hello == Hello

Output: Run #3

Enter first string: Hello
Enter second string: hello
Hello < hello
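A common practical use of strcmp() is as the ordering rule for qsort(). The sketch below sorts an array of string pointers; compare_strings and sort_strings are names chosen for this example.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() hands the comparator pointers to the array elements, and each
   element is itself a `const char *`, hence the double indirection. */
static int compare_strings(const void *a, const void *b)
{
    const char *const *left = a;
    const char *const *right = b;
    return strcmp(*left, *right);
}

/* Sorts an array of string pointers into strcmp() order. */
void sort_strings(const char *items[], size_t count)
{
    qsort(items, count, sizeof items[0], compare_strings);
}
```

Because the comparison is case sensitive and uppercase letters have smaller ASCII values, “Apple” sorts before “apple”, which in turn sorts before “pear”.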

strncmp()

  • The strncmp() function is similar, except it compares only the first (at most) n bytes of s1 and s2. The signature is as follows:
int strncmp(const char *s1, const char *s2, size_t n);
  • The program below is adapted from the Linux man page for strcmp(). It calls strcmp() when given two arguments and strncmp() when given three. The sample runs after the program show strcmp() first:
/* string_comp.c
Licensed under GNU General Public License v2 or later.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int
main(int argc, char *argv[])
{
    int res;
    if (argc < 3)
    {
        fprintf(stderr, "Usage: %s <str1> <str2> [<len>]\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    /* Two strings: compare them fully. A third argument limits the length. */
    if (argc == 3)
        res = strcmp(argv[1], argv[2]);
    else
        res = strncmp(argv[1], argv[2], atoi(argv[3]));
    if (res == 0) {
        printf("<str1> and <str2> are equal");
        if (argc > 3)
            printf(" in the first %d bytes", atoi(argv[3]));
        printf("\n");
    } else if (res < 0) {
        printf("<str1> is less than <str2> (%d)\n", res);
    } else {
        printf("<str1> is greater than <str2> (%d)\n", res);
    }
    exit(EXIT_SUCCESS);
}

Output

$ ./string_comp ABC ABC
<str1> and <str2> are equal
$ ./string_comp ABC AB # 'C' is ASCII 67; 'C' - '\0' = 67
<str1> is greater than <str2> (67)
$ ./string_comp ABA ABZ # 'A' is ASCII 65; 'Z' is ASCII 90
<str1> is less than <str2> (-25)
$ ./string_comp ABJ ABC
<str1> is greater than <str2> (7)

And then some examples using strncmp():

$ ./string_comp ABC AB 3
<str1> is greater than <str2> (67)
$ ./string_comp ABC AB 2
<str1> and <str2> are equal in the first 2 bytes

String concatenate: strcat()

The strcat() function appends the src string to the dest string, overwriting the terminating null byte (‘\0’) at the end of dest, and then adds a terminating null byte. Hence the resulting string in dest is always null-terminated.

The strings may not overlap, and the dest string must have enough space for the result. The strcat() function returns a pointer to the resulting string dest. If dest is not large enough, program behaviour is unpredictable. The function signature is as follows:

char *strcat(char *dest, const char *src);

Demonstration of strcat():

#include <stdio.h>
#include <string.h>
int main() 
{
    char *src = "World";
    char dest[20];
    strcpy(dest, "Hello ");
    printf("%s", strcat(dest, src)); // strcat() returns dest
    return 0;
}

Output:

Hello World

String concatenate: strncat()

The strncat() function is similar, except that
● it will use at most n bytes from src
● src does not need to be null-terminated if it contains n or more bytes.
The function signature is as follows:

char *strncat(char *restrict dest, const char *restrict src, size_t n);

If src contains n or more bytes, strncat() writes n+1 bytes to dest (n from src plus the terminating null byte). Therefore, the size of dest must be at least strlen(dest)+n+1. A simple implementation of strncat() might be:

char *
strncat(char *dest, const char *src, size_t n) 
{
    size_t dest_len = strlen(dest);
    size_t i;
    for (i = 0 ; i < n && src[i] != '\0' ; i++)
    dest[dest_len + i] = src[i];
    dest[dest_len + i] = '\0';
    return dest;
}
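Since strncat() limits the number of bytes taken from src, not the total size of dest, the caller has to work out how much room is left. One way to do that is sketched below; append_bounded is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Appends as much of src as fits in dest, whose total capacity is cap.
   Assumes dest already holds a terminated string shorter than cap.
   Returns the number of characters actually appended. */
size_t append_bounded(char *dest, size_t cap, const char *src)
{
    size_t used = strlen(dest);
    size_t room = cap - used - 1;  /* keep one byte for the '\0' */

    strncat(dest, src, room);
    return strlen(dest) - used;
}
```

With a 10-byte dest that already holds “Hello ”, only three more characters fit, so appending “World” gives “Hello Wor”.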

Other important string manipulation functions

Exercises

  1. Write a function that accepts a string (a pointer to a character) and deletes the last character by moving the null character 1 position to the left.
  2. Write a function that accepts a string (a pointer to a character) and deletes all the trailing spaces at the end of the string. Make sure that the resultant string is terminated with the null character.
  3. Write a function that accepts a string (a pointer to a character) and deletes all the leading spaces.
  4. Write a function that returns the number of times a character is found in a string. The function has two parameters: the first parameter is a pointer to a string and the second parameter is the character to be counted.
  5. Write a function that inserts a string into another string at a specified position. It returns a positive number if it is successful or zero if it has any problems, such as an insertion location greater than the length of the receiving string. The first parameter is the receiving string. The second parameter is the string to be inserted. And the third parameter is the index of the insertion position in the first string.
  6. Write a program that extracts part of the given string from the specified position. For example, if the string is “Working with strings is fun”, then if from position 4, four characters are to be extracted, then the program should print the string as “king”. If the number of characters to be extracted is 0, then the program should print the entire string from the specified position.
  7. Write a program that converts a string like “123” to an integer 123.
  8. Write a program that generates and prints the Fibonacci words of order 0 through 5.
    • f(0) = “a”
    • f(1) = “b”
    • f(2) = “ba”
    • f(3) = “bab”
    • f(4) = “babba”
  9. To uniquely identify a book, a 10 digit ISBN (International Standard Book Number) is used. The rightmost digit is a checksum digit. This digit is determined from the other 9 digits using the condition that d1 + 2d2 + 3d3 + … + 10d10 must be a multiple of 11 (where di denotes the ith digit from the right). The checksum digit d1 can be any value from 0 to 10; the ISBN convention is to use the value X to denote 10. Write a program that receives a 10 digit ISBN, computes the checksum, and reports whether the ISBN is correct or not.
  10. A credit card number is usually a 16 digit number. A valid credit card number satisfies a rule explained below with the help of a dummy credit card number: 4567 1234 5678 9129. Start with the second digit from the right and, moving left, multiply every other digit by 2. Reading left to right, this gives:
8 12 2 6 10 14 18 4
Then subtract 9 from any number that is 10 or larger. Thus we get:
8 3 2 6 1 5 9 4
Add them all up to get 38.
Add all the other digits to get 42.
The sum of 38 and 42 is 80. Since 80 is divisible by 10, the credit card number is valid.
Write a program that receives a credit card number and checks using the above rule whether the credit card number is valid.
