Structures and Unions

Structures

A structure is a user-defined data type that groups items of possibly different types into a single type. Structures are very handy for modelling real-world problems. A variable of a primary data type holds a single value of one type, and an array holds a set of values of the same type, but a structure can hold a collection of values of different data types, including other structures.

Structures: The need

Before diving deep into structures, it helps to understand why they are needed in the first place. Consider a program where you are trying to keep track of book data. Each book has the following attributes associated with it:

  1. Book name (string)
  2. Author (string)
  3. Publisher (string)
  4. Genre (string)
  5. The number of pages (unsigned int)
  6. Price (float)

One way to solve this is to have 6 different arrays, 1 for each attribute. The attributes associated with a book would have a common index value across the arrays. That is, if a book’s name is stored in the 4th index of the name[] array, its author’s name would be saved in the 4th index of the author[] array and so forth.

There are certain challenges which come with this approach:

  1. If one has to sort the values with respect to the name, the other arrays should also trace the sorting movements of the name[] array. Maintaining this tracing will become cumbersome and tedious. Even after accomplishing this, the code readability would take a really bad hit.
  2. It is an unwieldy approach that obscures the fact that you are dealing with a group of characteristics related to a single entity—the book.
  3. The program becomes more difficult to handle as the number of items relating to the book goes on increasing. If an extra attribute, say the ISBN number is to be added, keeping track of it would become a nightmare.

The C programming language gives a powerful tool to handle such types of problems: Structures.

Structures: The declaration

A structure contains a number of members grouped together. These members may or may not be of the same type. The syntax to declare a structure is as follows:

struct book {
    char name[100];
    char author[100];
    char genre[100];
    unsigned int pages;
    float price;
};

The following syntax shows how to declare a variable of the struct book type:

struct book first_book;

The structure variables can be declared in the struct declaration statement itself. It can be done as follows:

struct book {
    char name[100];
    char author[100];
    char genre[100];
    unsigned int pages;
    float price;
} first_book;

Here first_book becomes a variable of type struct book. Where its memory lives depends on where it is defined: a variable defined at file scope has static storage, while one defined inside a function is normally placed on the stack. As a first estimate, the memory needed equals the sum of the sizes of the individual members of the struct. In this case, it is as follows:

  1. name[100] -> 100 bytes
  2. author[100] -> 100 bytes
  3. genre[100] -> 100 bytes
  4. pages -> 4 bytes (assuming a platform where unsigned int is 4 bytes, which is typical of both 32-bit and 64-bit machines)
  5. price -> 4 bytes

Total: 308 bytes.

In practice, the memory is reserved as soon as the variable is defined, whether or not any data has been stored in it. The actual size can also be larger than this sum: the compiler is not allowed to reorder members, but it may insert padding between or after them to satisfy alignment requirements, so the order of the fields does matter. More on structure padding in the later part of the notes.

Another way to declare the structures and structure variables is using the typedef keyword. This can be done as follows:

// Structure declaration
typedef struct {
    char name[100];
    char author[100];
    char genre[100];
    unsigned int pages;
    float price;
} book;

// Structure variable declaration
book first_book;

Note the following points while declaring a structure type:

  1. The closing brace in the structure type declaration must be followed by a semicolon.
  2. Structure type declaration does not tell the compiler to reserve any space in memory. All a structure declaration does is, it defines the ‘form’ of the structure.
  3. Usually structure type declaration appears at the top of the source code file, before any variables or functions are defined. In very large programs they are usually put in a separate header file, and the file is included (using the preprocessor directive #include) in whichever program we want to use this structure type.

Accessing and modifying structure elements

The syntax to access each member of the structure from the variable first_book is to use the dot operator. It is demonstrated as follows:

// Dot operator
first_book.pages = 509;

If the variable is a pointer to the struct, the following is the syntax to access member elements:

// Arrow operator
struct book *book_ptr = &first_book;
book_ptr->pages = 509;

The following program illustrates how to use the struct by initializing it and fetching the values.

#include <stdio.h>
#include <string.h>

struct book {
    char name[100];
    char author[100];
    char genre[100];
    unsigned int pages;
    float price;
} first_book;

int main() {
    strcpy(first_book.name, "C by Quantmasters");
    strcpy(first_book.author, "Prajwal");
    strcpy(first_book.genre, "Programming");
    first_book.pages = 509;
    first_book.price = 2000;

    printf("Name    : %s\n", first_book.name);
    printf("Author  : %s\n", first_book.author);
    printf("Genre   : %s\n", first_book.genre);
    printf("Pages   : %u\n", first_book.pages);
    printf("Price   : %.2f", first_book.price);
    return 0;
}

Output:

Name    : C by Quantmasters
Author  : Prajwal
Genre   : Programming
Pages   : 509
Price   : 2000.00

A structure can be initialized at the time of declaration just like an array initialization. It is illustrated as follows:

book first_book = {"C by Quantmasters", "Prajwal", "Programming", 509, 2000};

How Structure Elements are Stored

Whatever the members of a structure are, they are stored in one contiguous block of memory in the order of their declaration, possibly with padding bytes between them. The following program illustrates this:

#include <stdio.h>
#include <string.h>

typedef struct {
    char name[100];
    char author[100];
    char genre[100];
    unsigned int pages;
    float price;
} book;

book first_book;

int main()
{
    strcpy(first_book.name, "C by Quantmasters");
    strcpy(first_book.author, "Prajwal");
    strcpy(first_book.genre, "Programming");
    first_book.pages = 509;
    first_book.price = 2000;

    // %p expects a void pointer, hence the casts
    printf("Address of Name    : %p\n", (void *) first_book.name);
    printf("Address of Author  : %p\n", (void *) first_book.author);
    printf("Address of Genre   : %p\n", (void *) first_book.genre);
    printf("Address of Pages   : %p\n", (void *) &first_book.pages);
    printf("Address of Price   : %p", (void *) &first_book.price);
    return 0;

}

Output:

Address of Name    : 0x55f1b2e10040
Address of Author  : 0x55f1b2e100a4
Address of Genre   : 0x55f1b2e10108
Address of Pages   : 0x55f1b2e1016c
Address of Price   : 0x55f1b2e10170

A pictorial explanation of the above output is as follows:

+----------------+
| 0x55f1b2e10040 |
|           +100 | // Decimal: 100 ; Hex: 64 (The size of name[])
+----------------+
| 0x55f1b2e100a4 |
|           +100 | // Decimal: 100 ; Hex: 64 (The size of author[])
+----------------+
| 0x55f1b2e10108 |
|           +100 | // Decimal: 100 ; Hex: 64 (The size of genre[])
+----------------+
| 0x55f1b2e1016c |
|            + 4 | // Decimal: 4 ; Hex: 4 (The size of pages)
+----------------+
| 0x55f1b2e10170 |
|            + 4 | // Decimal: 4 ; Hex: 4 (The size of price)
+----------------+

Structure padding

Structure padding is the insertion of one or more unused bytes between the members of a structure (or after the last one) so that each member is properly aligned in memory. Consider the following program:

#include <stdio.h>

typedef struct {
   char a;  
   char b;  
   int c;  
} student;

int main() {
    printf("%zu", sizeof(student));   // %zu is the conversion for size_t
    return 0;
}

When we create a variable of this structure, then the contiguous memory will be allocated to the structure members. First, the memory will be allocated to the ‘a’ variable, then ‘b’ variable, and then ‘c’ variable.

It is natural to expect the above program to print 6 (2 bytes for the two characters and 4 bytes for the integer). On typical compilers where int is 4 bytes, however, it prints 8. This is due to structure padding.

The processor does not usually read memory 1 byte at a time. It reads 1 word at a time. The size of 1 word varies from processor to processor: on a 32-bit processor 1 word is equal to 4 bytes, whereas on a 64-bit processor 1 word is 8 bytes.

The problem is that in one CPU cycle on a 32-bit machine, one byte of char a, one byte of char b, and 2 bytes of int c can be accessed (a total of 4 bytes). We will not face any problem while accessing the char a and char b as both the variables can be accessed in one CPU cycle, but we will face the problem when we access the int c variable as 2 CPU cycles are required to access the value of the ‘c’ variable. In the first CPU cycle, the first two bytes are accessed, and in the second cycle, the other two bytes are accessed. The following picture illustrates the memory layout without structure padding:

+--------+-----------------+-----------------+-----------------+-----------------+
| word-1 | 1 byte of 'a'   | 1 byte of 'b'   | 1st byte of 'c' | 2nd byte of 'c' |
+--------+-----------------+-----------------+-----------------+-----------------+
| word-2 | 3rd byte of 'c' | 4th byte of 'c' |
+--------+-----------------+-----------------+

With this layout, accessing the variable ‘c’ takes two cycles, although a 4-byte value could be fetched in a single cycle if it did not straddle two words. This is an unnecessary waste of CPU cycles. This is where structure padding comes into the picture and reduces the number of CPU cycles. The structure padding is done automatically by the compiler.

Structure padding moves the first half of variable ‘c’ from word-1 to word-2, and the 3rd and 4th bytes of the first word are left unused. This is shown in the following diagram:

+--------+-----------------+-----------------+-----------------+-----------------+
| word-1 | 1 byte of 'a'   | 1 byte of 'b'   | padding         | padding         |
+--------+-----------------+-----------------+-----------------+-----------------+
| word-2 | 1st byte of 'c' | 2nd byte of 'c' | 3rd byte of 'c' | 4th byte of 'c' |
+--------+-----------------+-----------------+-----------------+-----------------+

Although the memory used increases by 2 bytes, the variable ‘c’ can now be read in a single access instead of two.

Array of structures

We have seen two powerful data types provided by C: arrays and structures. It is possible to get the best of both by creating an array of structures. In the books example we have seen earlier, maintaining a separate variable for each book does not scale. An array of structures helps in maintaining records of many books easily.

Declaration

The syntax of static array of structures declaration is as follows:

struct book {
	char name[100];
	char author[100];
	char genre[100];
	unsigned int pages;
	float price;
};
struct book books[100];
/********** OR **********/
// Structure declaration
typedef struct {
	char name[100];
	char author[100];
	char genre[100];
	unsigned int pages;
	float price;
} book;
// Structure variable declaration
book books[100];

The following program shows dynamic array of structures declaration:

#include <stdio.h>
#include <stdlib.h>
typedef struct {
	char name[100];
	char author[100];
	char genre[100];
	unsigned int pages;
	float price;
} book;
int main() {
    size_t n_books = 100;
    // Room for n_books structures; malloc returns NULL on failure
    book *books = malloc(n_books * sizeof(book));
    if (books == NULL) {
        return 1;
    }
    free(books);
    return 0;
}

This provides space in memory for 100 structures of the type book.

Accessing elements

Accessing an element of an array of structures is the same as accessing an element of any other array. It can be done using subscript notation or using pointers. The syntax is as follows:

  1. Access using subscript notation:
struct_var_name[index].member;
  2. Access using pointer notation:
(struct_var_name + index)->member;

Both of these ways are illustrated in the following program:

#include <stdio.h>
#include <stdlib.h>
typedef struct {
        char name[100];
        char author[100];
        char genre[100];
        unsigned int pages;
        float price;
} book;
int main() {
    size_t n_books;
    printf("Enter the number of books: ");
    // %zu is the conversion for size_t; stop if no positive count was read
    if (scanf("%zu", &n_books) != 1 || n_books == 0) {
        return 1;
    }
    book books[n_books];
    // Dynamic array:
    // book *books = malloc(n_books * sizeof(book));
    // Accessing elements through subscript notation
    for (size_t i = 0 ; i < n_books ; i++) {
        getchar();   // consume the newline left by the previous input
        printf("Enter the name of the book: ");
        scanf("%99[^\n]", books[i].name);   // at most 99 characters plus '\0'
        getchar();
        printf("Enter the author of the book: ");
        scanf("%99[^\n]", books[i].author);
        getchar();
        printf("Enter the genre of the book: ");
        scanf("%99[^\n]", books[i].genre);
        getchar();
        printf("Enter the number of pages in the book: ");
        scanf("%u", &books[i].pages);
        printf("Enter the price of the book: ");
        scanf("%f", &books[i].price);
    }
    // Accessing elements through pointer notation
    for (size_t i = 0 ; i < n_books ; i++) {
        printf("Name: %s\nAuthor: %s\nGenre: %s\nPages: %u\nPrice: %.2f\n"
                                        , (books+i)->name
                                        , (books+i)->author
                                        , (books+i)->genre
                                        , (books+i)->pages
                                        , (books+i)->price);
        printf("------------------------------------------------------\n");
    }
    return 0;
}

Output:

Enter the number of books: 2
Enter the name of the book: C by QM
Enter the author of the book: Prajwal
Enter the genre of the book: Programming
Enter the number of pages in the book: 508
Enter the price of the book: 600
Enter the name of the book: Python by QM
Enter the author of the book: Prateek
Enter the genre of the book: Programming
Enter the number of pages in the book: 510
Enter the price of the book: 800
Name: C by QM
Author: Prajwal
Genre: Programming
Pages: 508
Price: 600.00
------------------------------------------------------
Name: Python by QM
Author: Prateek
Genre: Programming
Pages: 510
Price: 800.00
------------------------------------------------------

Points to note

Copying of structures

  • Copying a structure from one variable to another is as simple as using an assignment operator.
struct_1 = struct_2;
  • A structure assignment copies every member by value, including array members, which is something plain arrays cannot do with the assignment operator. Hence, all the members of struct_2 are copied into the separate memory allocated to struct_1. The one caveat is pointer members: only the address is copied, so both structures then point to the same data.

Nested structures

  • It is possible to have nested structures. Consider the following example:
#include <stdio.h>
int main() {
    typedef struct
    {
        char phone[15] ;
        char city[25] ;
        unsigned int pin ;
    } address ;
    typedef struct
    {
        char name[25] ;
        address a ;
    } employee ;
    employee e = { "Prajwal", "+91 9876543210", "Bengaluru", 501 };
    printf ( "Name = %s\nPhone = %s\n", e.name, e.a.phone ) ;
    printf ( "City = %s\nPin = %u", e.a.city, e.a.pin ) ;
}
  • There are two important points to be noticed:
    • The initialisation at the declaration of ‘e’ is continuous for the elements of the nested structure too. It is more intuitive to initialize it with nested braces, which is also valid:
employee e = { "Prajwal", {"+91 9876543210", "Bengaluru", 501} };
    • The way to access elements of the nested structure is also pretty intuitive:
parent_structure.child_structure.element_of_child_structure

Pointers to structures

  • Just like pointers to other data types, there can be a pointer to a structure as well. This is used extensively in practice because passing or storing a pointer avoids copying the whole structure.
  • Consider the following statements:
struct address {
    char name[100];
    unsigned int door_no;
    char street[100];
    char locality[100];
    char city[100];
    char state[100];
    int pin;
};
struct address addr = {"Prajwal", 58, "street name", "locality name", "B'lore", "KA", 123456};
struct address *ptr_addr;	// Declaration
ptr_addr = &addr;			// Initialisation

Passing structure to functions

  • Passing a structure to functions is one of the most common operations in Kernel and Network programming.
  • Passing a structure to a function is no different than passing any other variable of basic data types to a function in C.
  • Consider the following program:
#include <stdio.h>

struct address {
    char name[100];
    unsigned int door_no;
    char street[100];
    char locality[100];
    char city[100];
    char state[100];
    int pin;
};

void display(struct address addr);

int main()
{
    struct address addr = {"Prajwal", 58, "street name", "locality name", "B'lore", "KA", 123456};
    display(addr);
    return 0;
}

void display(struct address addr) {
    printf("Name        : %s\n", addr.name);
    printf("Door number : %u\n", addr.door_no);
    printf("Street      : %s\n", addr.street);
    printf("Locality    : %s\n", addr.locality);
    printf("City        : %s\n", addr.city);
    printf("State       : %s\n", addr.state);
    printf("Pin         : %d"  , addr.pin);
}

Output:

Name        : Prajwal
Door number : 58
Street      : street name
Locality    : locality name
City        : B'lore
State       : KA
Pin         : 123456
  • Points to note from the above program
    • The structure definition is in global scope. This is important because the definition should be visible to the display() function as well.
    • The structure addr in the parameter list of display() is a copy of the structure passed from the calling function. Hence any changes made to addr inside display() are local to display(). If the changes made to addr in display() should be reflected in main() as well, the programmer can pass the address of addr from main() and make the parameter addr a pointer in display().
    • In fact, passing the address of a structure to a function that then populates the structure is a common pattern in kernel and network programming, as it lets a function return complex data and multiple values in a single call.

Exercises

  1. Create a structure containing the following elements relating to a student:
    • Roll number
    • Name
    • Department
    • Course
    • Year of joining

    Assume that there are not more than 100 students in the college.

    1. Write a function to read the year and print names of all students who joined in the given year.
    2. Write a function to read the roll number of the student and print the data.
    3. Write a function to read the department and print the names and roll numbers of all the students in that department.
    4. Write a function to sort the array of structures with respect to the attribute name.

  2. Create a structure to store the following data of customers in a bank:
    • Account number
    • Name
    • Balance in account

    Assume a maximum of 200 customers in the bank.

    1. Write a function to print the Account number and name of each customer with balance below Rs. 100.
    2. Write a function which reads account number, amount and debits the specified amount from the corresponding account and returns the updated balance. Print the message, “The balance is insufficient for the specified withdrawal” if the amount requested is greater than the balance.
    3. Write a function which reads account number, amount and credits the account with the specified amount and returns the updated balance.

Unions

A union is a construct that allows the same memory to be shared by different types of data. A union follows the same declaration syntax as a structure. In fact, with the exception of the keywords struct and union, the formats are the same.

This raises the question: why are unions necessary at all? Suppose you want to store a 2-digit number. There are two ways of doing it: by declaring a variable of type short int, or by declaring a character array of size 2. The requirement is to use either one of these representations at a time, never both. This is where the need for a union arises. A union allocates enough memory for its largest member.

The declaration syntax of union is as follows:

union efficient_num {
	short num;
	char num_str[2];
};

It is important to understand the differences between Union and Structure. The following table summarizes the same:

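+--------------------------------------------+--------------------------------------------+
| struct                                     | union                                      |
+--------------------------------------------+--------------------------------------------+
| Keyword used to define: struct             | Keyword used to define: union              |
+--------------------------------------------+--------------------------------------------+
| The size of a structure is equal to the    | The size of a union is equal to the size   |
| sum of the sizes of each of its members    | of the biggest element in the union.       |
| and padding if any.                        |                                            |
+--------------------------------------------+--------------------------------------------+
| Used when more than one of the member      | Used when any one of the member elements   |
| elements are going to be used to           | is going to be used.                       |
| initialize and store data.                 |                                            |
+--------------------------------------------+--------------------------------------------+
| Altering one data member will not affect   | Altering one data member might affect the  |
| the value of another data member.          | value of another data member.              |
+--------------------------------------------+--------------------------------------------+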

Referencing Unions

The rules for referencing union elements are the same as those for structures. To reference individual elements within the union, we can use the dot operator. When union is referenced through a pointer, the arrow operator is to be used. Considering the previous example of union efficient_num, the following are the ways to access its elements considering the variable is named short_var:

short_var.num;
short_var.num_str[0];

Initializers

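Without a designated initializer, only the first member declared in the union can be initialized when the variable is defined. The other members are given values by assignment or by reading values into the union. (C99 and later also accept a designated initializer such as {.num_str = {'1', '6'}}, which names the member to initialize.) When initializing a union, the value should be enclosed in a set of braces, even if there is only one value.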

The following snippet summarizes the concept:

union efficient_num short_var;
short_var.num = 16;
// or
union efficient_num short_var = {16};

Data overflow

By now, it must be clear that the memory allocated to a union is determined by its largest member. Since every member shares that memory, what do the other members contain after one of them has been assigned a value? Let’s find out using the following program:

#include <stdio.h>
union efficient_num {
    short num;
    char num_str[2];
};

int main()
{
    union efficient_num short_var;
    short_var.num = 16706;
    printf("num       : %hd\n", short_var.num);
    printf("num_str[0]: %c\n", short_var.num_str[0]);
    printf("num_str[1]: %c", short_var.num_str[1]);
    return 0;
}

The output of the above program is as follows:

num       : 16706
num_str[0]: B
num_str[1]: A

Notice that num_str[0] holds ‘B’ and num_str[1] holds ‘A’, even though the program never assigned to num_str. This happens because a union is allocated a single block of memory that all its members share, unlike a structure, where each member has its own dedicated memory location. Hence both num and num_str in union efficient_num start at the same memory location. This can be shown using the following program:

#include <stdio.h>
union efficient_num {
    short num;
    char num_str[2];
};

int main()
{
    union efficient_num short_var;
    short_var.num = 16706;
    printf("Address of num    : %p\n", (void *) &short_var.num);
    printf("Address of num_str: %p\n", (void *) &short_var.num_str);
    return 0;
}

The output of the above program is as follows:

Address of num    : 0x7ffc01c9fbe0
Address of num_str: 0x7ffc01c9fbe0

So, it can be concluded that both members of the union start at the same address. Now, when num is set to 16706 (0x4142), the binary pattern stored is: 0100 0001 0100 0010

The size of short num is 2 bytes here and the size of a character is 1 byte. On a little-endian machine the least significant byte is stored at the lowest address, so num_str[0] holds the low-order byte of num and num_str[1] holds the high-order byte. Hence:

num = 0100 0001 0100 0010
num_str[0]  = 0100 0010
num_str[1]  = 0100 0001

Notice that the byte in num_str[0] is 66, which is the ASCII value of the character ‘B’, and the byte in num_str[1] is 65, which is the ASCII value of the character ‘A’. Hence the output is num_str[0]: B, num_str[1]: A. On a big-endian machine the two characters would be swapped.
