Categories
Computer Science / Information Technology Language: C++

Range based for-loop

While C++ supports all the loops that C supports (for, while and do-while), C++ provides another loop called a range-based for loop.

The idea of the range-based for loop is to loop through a collection of elements and be able to easily access each element without having to worry about the length of the collection or incrementing and decrementing the loop counter. The syntax is as follows:

for (var_type var_name: collection) {
	statement_1;
	statement_2;
	…
	statement_n;
}
// Or
for (var_type var_name: collection)
	statement_1;

Here, var_type is the data type of the elements in the collection. This loop is entry controlled, and the minimum number of iterations is 0, which happens when the collection is empty. In each iteration, the next value in the collection is copied into var_name, which can then be used inside the loop.

Example:

#include <iostream>
using namespace std;
int main() {
    int num[] = {1, 2, 3, 4};
    for (int i: num)
        cout << i << " ";
    return 0;
}

Output: 1 2 3 4

Use of auto keyword

We do not have to explicitly provide the type of the variable in the range-based for loop. Instead, we can use the auto keyword. This tells the C++ compiler to deduce the type of the collection elements itself. Consider the following example:

#include <iostream>
using namespace std;
int main() {
    int num[] = {1, 2, 3, 4};
    for (auto i: num)
        cout << i << " ";
    return 0;
}

Output: 1 2 3 4

Here the compiler figures out that the variable i has to iterate through a collection of integer elements and hence implicitly considers the data type of i as int.

Iterate through the initialiser list

The collection in a range-based for loop does not have to be a named variable. The loop can also iterate directly over an initialiser list or a string literal, as the following examples demonstrate:

Example 1
#include <initializer_list>
#include <iostream>
using namespace std;
int main() {
    for (auto i: {1, 2, 3, 4})
        cout << i << " ";
    return 0;
}

Output: 1 2 3 4

Example 2
#include <iostream>
using namespace std;
int main() {
    for (auto i: "String")
        cout << i << " ";
    return 0;
}

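Output: S t r i n g (the loop also visits the terminating null character of the string literal, which is usually invisible when printed)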


Dynamic memory allocation in C++

Although C++ provides classes such as vectors, knowledge of dynamic memory allocation is still necessary, because vectors use it in the background and hide these details from the programmer. Dynamic memory allocation in C++ is relatively simpler than in C: as opposed to C’s 4 memory-handling functions (malloc(), calloc(), realloc() and free()), C++ has only 2 keywords (new and delete) to deal with dynamic memory allocation.

Memory handling using new and delete

The new keyword in C++ is conceptually similar to the malloc() function in C, except that it also constructs the object it allocates. It allocates memory from the heap and returns a pointer to the start of the newly allocated memory.

The delete keyword deallocates memory that was allocated on the heap by the new keyword. It is always recommended to deallocate memory after use. Otherwise the memory leaks, and a long-running application may eventually run out of memory and crash.

For a single entity

  1. The syntax for allocation:
<data_type> *variable_name = new <data_type>;
// Where the data_type is user-defined or C++ built-in.
  2. Explanation:
    • This allocates a single block of memory in the heap, large enough to store one element of the given data_type.
    • For built-in types, the allocated memory is not initialised by default and holds an indeterminate (garbage) value. Hence it is always safer to initialise the data explicitly, for example with new <data_type>{}.
  3. The syntax for deallocation:
delete variable_name;
  4. Example
#include <iostream>
#include <string>
using namespace std;
int main() {
    int *ptr_val = new int;
    cout << ptr_val;
    delete ptr_val;
    return 0;
}

Output: 0x562f303f7eb0 (an address; the exact value differs from run to run)

Allocate storage for an array

  1. The syntax for allocation:
<data_type> *variable_name = new <data_type>[size];

Where,

  • The data_type is user-defined or C++ built-in.
  • The size is the number of elements to be allocated.
  2. Explanation:
    • This allocates a contiguous block of memory in the heap, large enough to hold size elements.
    • For built-in types, the allocated elements are not initialised by default and hold indeterminate (garbage) values. Hence it is always safer to initialise the data explicitly.
  3. The syntax for deallocation:
delete [] variable_name;
  4. Example
#include <cstddef>
#include <iostream>
using namespace std;
// Named array_size because a global called size can clash with std::size in C++17.
const size_t array_size = 4;
int main() {
    int *ptr_val = new int[array_size];
    int val = 1;
    for (size_t i = 0 ; i < array_size ; i++)
        ptr_val[i] = val++;
    for (size_t i = 0 ; i < array_size ; i++)
        cout << ptr_val[i] << " ";
    delete [] ptr_val;
    return 0;
}

Output: 1 2 3 4


Vectors

Vectors are sequence containers that store data of the same type. Vectors represent arrays that can change in size. Just like arrays, vectors use contiguous storage locations for their elements, which means that their elements can also be accessed using offsets on regular pointers to their elements, and just as efficiently as in arrays. But unlike arrays, their size can change dynamically, with their storage being handled automatically by the container.

Internally, vectors use a dynamically allocated array to store their elements. This array may need to be reallocated in order to grow in size when new elements are inserted, which implies allocating a new array and moving all elements to it. This is a relatively expensive task in terms of processing time, and thus, vectors do not reallocate each time an element is added to the container. Instead, vector containers may allocate some extra storage to accommodate for possible growth, and thus the container may have an actual capacity greater than the storage strictly needed to contain its elements (i.e., its size). 

Libraries can implement different strategies for growth to balance between memory usage and reallocations, but in any case, reallocations should only happen at logarithmically growing intervals of size so that the insertion of individual elements at the end of the vector can be provided with amortized constant time complexity.

Therefore, compared to arrays, vectors consume more memory in exchange for the ability to manage storage and grow dynamically in an efficient way.

Declaration

To use a vector in a C++ program, one needs to include the vector library:

#include <vector>

The syntax of the declaration of a vector is as follows:

vector <data_type> vector_var;

Where, data_type can be any valid primary, secondary or user-defined data type permitted by C++. The data type can be a vector itself.

Example:

vector <int> whole_numbers;

The above example just declares the vector but the size of the vector is 0. The size of the vector can also be mentioned at the time of declaration. The syntax for the same is as follows:

vector <data_type> vector_var (size);

Where size should be strictly an unsigned integer.

Example:

vector <int> whole_numbers (10);

The advantage of mentioning the size at the time of declaration is that the memory for all the elements is allocated in a single step when the vector is created, rather than through repeated reallocations as the vector grows. In addition, all the 10 values of the vector whole_numbers in the above declaration will be initialised to 0. The vector class provides a method size() which returns the number of values the vector object holds.

Example 1
#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector <int> whole_numbers;
    cout << "The size of the vector 'whole_numbers' is "
         << whole_numbers.size();
    return 0;
}

Output: The size of the vector ‘whole_numbers’ is 0

Conclusion: If the size is not mentioned, the default size is 0

Example 2
#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector <int> whole_numbers (10);
    cout << "The size of the vector 'whole_numbers' is "
         << whole_numbers.size();
    return 0;
}

Output: The size of the vector ‘whole_numbers’ is 10

Conclusion: The vector gets declared and the mentioned size is allocated to the vector.

Example 3
#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector <int> whole_numbers (10);
    cout << "The size of the vector 'whole_numbers' is "
         << whole_numbers.size() << endl;
    cout << "The elements in the vector are: ";
    for (size_t i = 0 ; i < whole_numbers.size() ; i++)
        cout << whole_numbers[i] << " ";
    return 0;
}

Output:

The size of the vector 'whole_numbers' is 10
The elements in the vector are: 0 0 0 0 0 0 0 0 0 0

Conclusion: If just the size is mentioned, all the values in the vector will be initialized to 0

Initialising vectors

Using initializer lists

When the values to be present in the vector are known at the time of declaration, the following syntax can be used:

Syntax:

vector <data_type> vector_var {value_1, value_2,..., value_n};
// Or
vector <data_type> vector_var = {value_1, value_2,..., value_n};
// Or
vector <data_type> vector_var ({value_1, value_2,..., value_n});

Where the values being initialised are of the type data_type, or implicitly convertible to it without narrowing.

#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector <char> vowels = {'a', 'e', 'i', 'o', 'u'};
    for (size_t i = 0 ; i < vowels.size() ; i++)
        cout << vowels[i] << " ";
    return 0;
}

Output: a e i o u

Conclusion: Vector initialisation using initialiser lists

Using constructor of vector container

When the value to be present in all n indices of the vector is known, one can use the following syntax:

vector <data_type> vector_var (len, val);

The above syntax declares a vector named vector_var of size len and all the indices will be initialized to val.

Example:

vector <int> max_marks (10, 50);

The above syntax declares a vector named max_marks of size 10 and initializes all 10 indices to the value 50.

#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector <int> whole_numbers (10, 10);
    cout << "The size of the vector 'whole_numbers' is "
         << whole_numbers.size() << endl;
    cout << "The elements in the vector are: ";
    for (size_t i = 0 ; i < whole_numbers.size() ; i++)
        cout << whole_numbers[i] << " ";
    return 0;
}

Output:

The size of the vector 'whole_numbers' is 10
The elements in the vector are: 10 10 10 10 10 10 10 10 10 10

Conclusion: If the size and the value is mentioned, all the values in the vector will be initialized accordingly.

Using another vector

A new vector can be initialised with another vector. All the values of the source vector will be deep copied to the newly declared vector. The following is the syntax:

vector <data_type> vector_var (src_vector);

Where the src_vector strictly contains the values of the same data_type as that of the new vector. Any changes made to vector_var are local to vector_var and don’t reflect in src_vector.

Accessing values of a vector

Using subscripts – [ ]

Accessing vector elements could be done similar to accessing array elements – through subscripts. The syntax to do so is as follows:

vector_var[index];

Where the index is an unsigned integer between 0 and size - 1 inclusive.

Element access using subscripts provides no bounds checking. That is, the programmer can mention an index value greater than size - 1. Doing so is undefined behaviour: the program may return a garbage value or crash.

#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector <int> whole_numbers (10, 10);
    cout << whole_numbers[11];
    return 0;
}

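Output: 0 (not guaranteed; reading past the end of the vector is undefined behaviour, so the value may differ or the program may crash)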

Conclusion: The vector element access using subscript notation doesn’t provide bounds checking.

Using at() method

The vector library provides at() method for vector objects to fetch/modify the values in the vector object. The syntax for the same is as follows:

vector_var.at(index);

This behaves like subscript access, but the at() method provides bounds checking. That is, the value of the index should be between 0 and size - 1 inclusive. Otherwise, the method throws a std::out_of_range exception.

#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector <int> whole_numbers (10, 10);
    cout << whole_numbers.at(11);
    return 0;
}

Output:

Error:
terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 11) >= this->size() (which is 10)

Conclusion: at() method provides bound checking.

Methods of vector class

C++ provides a rich library of methods to use with vectors. They are listed and described in the below table. For an example, consider a vector named num.

Method | Description | Return type | Parameters
size | Return size of the vector | size_t | None
max_size | Return the maximum size which the vector is capable of holding | size_t | None
resize | Change size | void | (new_size, initialisation_value); initialisation_value is optional, and by default new elements are value-initialised (0 for numeric types)
capacity | Return size of allocated storage capacity | size_t | None
empty | Test whether the vector is empty | bool | None
reserve | Requests that the vector capacity is at least enough to contain ‘n’ elements. | void | n
shrink_to_fit | Requests the container to reduce its capacity to fit its size. | void | None
at | Access element at the specified index | Reference to an element of the vector's data type | Index value between 0 and size – 1 inclusive
front | Access first element | Reference to an element of the vector's data type | None
back | Access last element | Reference to an element of the vector's data type | None
data | Returns a direct pointer to the memory array used internally by the vector to store its owned elements. | Pointer to vector element data type | None
assign | Assigns new contents to the vector, replacing its current contents, and modifying its size accordingly. | void | size_t n: the new size of the vector; value: to fill the vector with
push_back | Add element at the end | void | Element of same data type as that of the vector
pop_back | Delete the last element | void | None
insert | Insert elements | An iterator that points to the first of the newly inserted elements. | Iterator position and value to be inserted
erase | Removes from the vector either a single element (position) or a range of elements ([first_iterator, last_iterator)) | An iterator pointing to the new location of the element that followed the last element erased by the function call. | Iterator to the element to be deleted or 2 iterators: one to the beginning and the other to the end
swap | Exchanges the content of the container by the content of x, which is another vector object of the same type. Sizes may differ. | void | Reference to another vector x
clear | Removes all elements from the vector (which are destroyed), leaving the container with a size of 0. | void | None
emplace | Construct and insert element | Iterator to the new element inserted | Iterator position to insert the new value and the new value
emplace_back | Construct and insert an element at the end | void (a reference to the new element since C++17) | The new value to insert
Vector class methods

2D Vectors

It is possible to create multi-dimensional vectors in C++. The syntax of the declaration of a 2D vector is as follows:

vector <vector <data_type>> vector_var;

Exercises

  1. Eleven integer values are entered from the keyboard. The first 10 are to be stored in a vector. Search for the 11th integer in the vector. Write a program to display the number of times it appears in the vector.
  2. Read the size of the vector and declare the vector with the size entered. Read the vector elements and also read a key value. Display the number of times the key has appeared in the vector.
  3. Implement the Selection Sort, Bubble Sort and Insertion sort algorithms on a set of 10 numbers using vectors.
  4. Read the size of the vector and declare the vector with the size entered. Read the vector elements and sort the vector using selection sort, bubble sort and insertion sort.
  5. Implement the Sieve of Eratosthenes and print the prime numbers up to 100. The algorithm is as follows:
    1. Fill a vector named values, of size 100, with numbers from 1 to 100.
    2. Set the first entry (1) to zero, as 1 is not a prime number. Starting with the second entry in the vector, set all its multiples (but not the entry itself) to zero.
    3. Proceed to the next non-zero element and set all its multiples to zero.
    4. Repeat step 3 till you have set the multiples of all the non-zero elements to zero.
    5. After step 4, all the non-zero entries left in the vector are prime numbers, so print out these numbers.
  6. Write a program to copy the contents of one vector into another in reverse order.
  7. Write a program to check if a vector is symmetric. That is, in a vector array_id check if array_id[0] = array_id[n-1], array_id[1] = array_id[n-2] and so on, where n is the size of the vector. Statically initialize the vector or take user input.
  8. Find the smallest number, the largest number, and an average of values in a vector of floating-point numbers.
  9. Write a function to find the norm of a matrix. The norm is defined as the square root of the sum of squares of all elements in the matrix.
  10. Write a program to read 10 coordinates. Each coordinate contains 2 floating-point values X and Y. Print the two farthest points and 2 nearest points in the vector.


Fundamentals of a C++ program

Overview

These notes focus on C++17, which is one of the latest and most widely used versions of C++ in the industry. The prerequisite for this set of documents is a good knowledge of the C programming language, whose notes are available in the previous section. This set of notes on C++ focuses only on what C++ offers on top of what C already offers.

The coding examples in these notes should be compiled and run with the standard GNU G++ compiler. One can use various IDEs and editors such as VS Code, vi, Codelite etc.

Most C programs can be compiled with a C++ compiler, but not the other way round. C++ supports nearly all the features of C, and these can be used in C++ programs.

Structure of a C++ program

C++ has ~90 keywords. The more keywords a language has, the more complex its grammar. However, some of the keywords are seldom used. One need not memorise all the keywords, as there is online documentation (https://en.cppreference.com/) for things you seldom need. But it is essential to know the subset that is most commonly used. The set of keywords in C++ is as follows (the list also includes keywords added in C++20 and in technical specifications, which are not part of C++17):

alignas, alignof, and, and_eq, asm, atomic_cancel, atomic_commit, atomic_noexcept, auto
bitand, bitor, bool, break
case, catch, char, char8_t, char16_t, char32_t, class, compl, concept, const, consteval, constexpr, constinit, const_cast, continue, co_await, co_return, co_yield
decltype, default, delete, do, double, dynamic_cast
else, enum, explicit, export, extern
false, float, for, friend
goto
if, inline, int
long
mutable
namespace, new, noexcept, not, not_eq, nullptr
operator, or, or_eq
private, protected, public
reflexpr, register, reinterpret_cast, requires, return
short, signed, sizeof, static, static_assert, static_cast, struct, switch, synchronized
template, this, thread_local, throw, true, try, typedef, typeid, typename
union, unsigned, using
virtual, void, volatile
wchar_t, while
xor, xor_eq
C++ keywords

Preprocessor directives

A preprocessor, also called a precompiler, is a program that processes the source code before the compiler gets the code. It looks for preprocessor directives and executes them. Preprocessor directives start with # (pound symbol/hash symbol). Examples of preprocessor directives are as follows:

#include <header_file.h>

In the C/C++ programming languages, the #include directive tells the preprocessor to insert the contents of another file into the source code at the point where the #include directive is found. Include directives are typically used to include the header files for C functions that are held outside of the current source file.

For example, we use math functions such as pow and log, which we call from the source code we write. These functions are declared in the standard header files. By including math.h using this directive, the declarations are fetched from the header file and put into the source code before the compilation phase. The definitions themselves live in the standard library and are combined with the object code at link time to produce the binary.

#include "header_file.h"

When certain functions we call in our source code are not present in the standard library provided by C/C++, we can declare them in a separate header file and write the corresponding definitions in a C/C++ source file. This header needs to be included in the source code we write. User-defined header files are included using double quotes, in contrast to the angle brackets used for the built-in header files.

#if, #elif, #else, #endif

The #if directive, with the #elif, #else, and #endif directives, controls the compilation of portions of a source file. If the expression written (after the #if) has a nonzero value, the line group immediately following the #if directive is kept in the translation unit.

Each #if directive in a source file must be matched by a closing #endif directive. Any number of #elif directives can appear between the #if and #endif directives, but at most one #else directive is allowed. The #else directive is optional and, if present, must be the last directive before #endif.

The #if, #elif, #else, and #endif directives can nest in the text portions of other #if directives. Each nested #else, #elif, or #endif directive belongs to the closest preceding #if directive.

All conditional-compilation directives, such as #if and #ifdef, must match a closing #endif directive before the end of the file. Otherwise, an error message is generated. When conditional-compilation directives are contained in include files, they must satisfy the same conditions: There must be no unmatched conditional-compilation directives at the end of the include file.

Macro replacement is done within the part of the line that follows an #elif command, so a macro call can be used in the constant expression. The preprocessor selects one of the given occurrences of text for further processing. A block specified in the text can be any sequence of text. It can occupy more than one line. Usually, the text is program text that has meaning to the compiler or the preprocessor.

The preprocessor processes the selected text and passes it to the compiler. If the text contains preprocessor directives, the preprocessor carries out those directives. Only text blocks selected by the preprocessor are compiled.

The preprocessor selects a single text item by evaluating the constant expression following each #if or #elif directive until it finds a true (non-zero) constant expression. It selects all text (including other preprocessor directives beginning with #) up to its associated #elif, #else, or #endif.

If all occurrences of the constant expression are false, or if no #elif directives appear, the preprocessor selects the text block after the #else clause. When there’s no #else clause, and all instances of the constant expression in the #if block are false, no text block is selected.

The constant expression is an integer constant expression with these additional restrictions:

  1. Expressions must have an integral type and can include only integer constants, character constants, and the defined operator.
  2. The expression can’t use sizeof or a type-cast operator.
  3. The target environment may be unable to represent all ranges of integers.
  4. The translation represents type int the same way as type long, and unsigned int the same way as unsigned long.
  5. The translator can translate character constants to a set of code values different from the set for the target environment. To determine the properties of the target environment, use an app built for that environment to check the values of the LIMITS.H macros.
  6. The expression must not query the environment and must remain insulated from implementation details on the target computer.

#ifdef

In the C and C++ programming languages, the #ifdef directive allows for conditional compilation. The preprocessor determines whether the provided macro exists before including the subsequent code in the compilation process. If the definition of the macro exists, the block inside #ifdef is sent to the translator; otherwise it is skipped.

Syntax:

#ifdef macro_name
    // code compiled only when macro_name is defined
#endif

The macro must be defined for the preprocessor to include the enclosed source code in the compiled application. Note that the #ifdef directive must be closed by an #endif directive.

Example:

#include <stdio.h>

#define DEFINED 1

int main()
{
    #ifdef DEFINED
    printf("DEFINED!");
    #endif
    return 0;
}

Output: DEFINED!

A common use for the #ifdef directive is to enable the insertion of platform-specific source code into a program.

#ifndef

In the C and C++ programming languages, the #ifndef directive allows for conditional compilation. The preprocessor determines whether the provided macro does not exist before including the subsequent code in the compilation process. If the definition of the macro does not exist, the block inside #ifndef is sent to the translator; otherwise it is skipped.

Syntax:

#ifndef macro_name
    // code compiled only when macro_name is not defined
#endif

The macro must not be defined for the preprocessor to include the enclosed source code in the compiled application. Note that the #ifndef directive must be closed by an #endif directive.

Example:

#include <stdio.h>

// #define DEFINED 0

int main()
{
    #ifndef DEFINED
    printf("NOT DEFINED!");
    #endif
    return 0;
}

Output: NOT DEFINED!

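A common use for the #ifndef directive is the include guard, which prevents the contents of a header file from being processed more than once in the same source file. Like #ifdef, it can also be used to enable the insertion of platform-specific source code into a program.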

#define

The #define directive allows the definition of macros within your source code. These macro definitions allow constant values to be declared for use throughout your code. All the occurrences of the constant's name throughout the source code are replaced with the defined constant value.

Macro definitions are not variables and cannot be changed in the program code. This syntax is generally used while creating constants that represent numbers, strings or expressions.

Syntax

#define CONSTANT value
// OR
#define CONSTANT (expression)
  • CONSTANT is the name of the constant. It is a common practice to define the constants in all uppercase but there is no explicit rule that this has to be done.
  • value is the value of the constant.
  • expression: The expression whose value is assigned to the constant. The expression must be enclosed in parentheses if it contains operators.

Note that the semicolon character should not be put at the end of #define statements. This is a common mistake.

Examples:

#include <iostream>
#define COMPANY "Quantmasters"
int main()
{
    std::cout << COMPANY << " is an ed-tech company";
    return 0;
}

Output: Quantmasters is an ed-tech company

#include <iostream>
#define NUMBER (10/2)

int main()
{
    std::cout << "10 / 2 = " << NUMBER;
    return 0;
}

Output: 10 / 2 = 5

#undef

The #undef directive tells the preprocessor to remove all definitions for the specified macro. A macro can be redefined after it has been removed by the #undef directive. Once a macro is undefined, an #ifdef directive on that macro will evaluate as false.

Syntax

#undef macro_definition

Example:

#include <stdio.h>

#define DEFINED 1

#undef DEFINED

int main()
{
    #ifdef DEFINED
        printf("DEFINED");
    #endif
    #ifndef DEFINED
        printf("NOT DEFINED");
    #endif
    return 0;
}

Output: NOT DEFINED

In this example, the DEFINED macro is first defined with a value of 1 and then undefined using the #undef directive. Since the macro no longer exists, the statement #ifdef DEFINED evaluates to false. This causes the subsequent printf function to be skipped.

#line

The #line directive tells the preprocessor to set the compiler’s reported values for the line number and filename to a given line number and filename.

Syntax

#line digit-sequence ["filename"]

Where filename is an optional attribute and digit-sequence is strictly an integer value

Examples:

#include <stdio.h>	

int main()
{	
	printf("Line: %d\n",__LINE__);	// printing line number, line 5

// resetting line to 23, although the next line would otherwise be line 9.
	#line 23
	printf("Line: %d\n",__LINE__);	// printing line number
	
	printf( "Line: %d, File: %s\n", __LINE__, __FILE__ );
	// now we use line to reset filename to "new_filename.c"
	// line number is set to 83
	#line 83 "new_filename.c"
	printf( "Line: %d, File: %s\n", __LINE__, __FILE__ );
	return 0;
}

Output:

Line: 5
Line: 23
Line: 25, File: main.cpp
Line: 83, File: new_filename.c

#error

The #error directive causes preprocessing to stop at the location where the directive is encountered. Information following the #error directive is output as a message prior to stopping preprocessing.

Syntax

#error message

Examples:

#include <stdio.h>
#define NUM 50
int main()
{
    #ifndef NUM
        #error NUM not defined
    #endif
    return 0;
}

The above code doesn’t produce any output.

#include <stdio.h>
#define NUM 50
int main()
{
    #ifdef NUM
        #error NUM defined
    #endif
    return 0;
}

Output:

main.c: In function ‘main’:
main.c:6:10: error: #error NUM defined
    6 |         #error NUM defined

#pragma

This directive is a special purpose directive and is used to turn on or off some features. These types of directives are compiler-specific i.e., they vary from compiler to compiler.

#warning

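The #warning directive is similar to the #error directive but does not result in the cancellation of preprocessing. Information following the #warning directive is output as a message, after which preprocessing continues. Note that #warning became part of the C++ standard only in C++23; before that it is a compiler extension, although it is supported by GCC among others.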

Syntax

#warning message

Examples:

#include <stdio.h>
#define NUM 50
int main()
{
    #ifndef NUM
        #warning NUM not defined
    #endif
    return 0;
}

The above code doesn’t produce any output.

#include <stdio.h>
#define NUM 50
int main()
{
    #ifdef NUM
        #warning NUM defined
    #endif
    return 0;
}

Output:

main.c: In function ‘main’:
main.c:6:10: warning: #warning NUM defined [-Wcpp]
    6 |         #warning NUM defined
Figure: Overview of preprocessor directives (mind map)

One of the most used of the above directives is #include. The preprocessor sees the #include statement and replaces it with the contents of the header file that it refers to. The header files usually contain the prototypes and the signatures of the functions the program will use. The preprocessor then recursively processes the inserted content as well, so that any headers included by that header file are also brought in. The output of this process is a single file that contains the program the programmer has written along with the declarations of the functions used in the program. This makes it easy for the compiler to do its job.

Sometimes, the programmer might want to compile code to generate platform-dependent binaries, for example for Windows or macOS. In this case, the code has to look for the libraries supporting the corresponding operating system. This is called conditional compilation. It is widely used in the context of cross-compilation, where the compilation happens on one machine and the generated binaries are executed on another machine. To achieve this, preprocessor directives such as #if, #elif, #else etc. are extensively used.

It is important to note that the C++ preprocessor does not understand C++. It simply follows the preprocessor directives and gets the source code ready for the compiler. The compiler is the program that understands C++.

Comments

Comments are human-readable explanations that help programmers understand what a piece of code does. The comments in C++ source code are the same as those in C: a double forward-slash (//) for single-line comments, and /* and */ to enclose multiline comments.

The main() function

Every C++ program must have one and only one main() function. A program may consist of n files, but main() must be defined in exactly one of them. When a C++ program executes, main() is the entry point that is called once the operating system has started the program. The logic present in main() is executed and the value returned by main() is passed back to the operating system. Conventionally, a return value of 0 means the program has executed successfully. Any other value signals an error, and a program can document a table of error codes to indicate what went wrong.

There are 2 versions of main(), both of which are accepted as a standard version of main(). They are as follows:

int main() {
    // code
    return 0;
}
int main(int argc, char *argv[]) {
    // code
    return 0;
}

The first version of the main() is mostly used. The second version is used to receive command-line arguments from the operating system. The second version expects 2 pieces of information from the operating system:

  1. Argument count – Count of the arguments (argc).
  2. Argument vector – An array of pointers to C strings, one per argument (argv).

The handling and usage of command-line arguments will be dealt with in a later part of the notes. Note that main() should always return an integer.

Namespace

As the C++ programs we write get more complex, they combine code from several sources: the C++ standard library, libraries written by third-party developers and, of course, our own code. A variable or a function may be defined in one library, and a variable or a function with the same name may be defined in another. This causes a conflict in which the compiler does not know which variable or function is meant when the name is used. This is called a naming conflict.

C++ namespace is a feature which acts as a container that groups the code entities. If a programmer wants to use a variable or a function from a particular namespace, one can use the scope resolution operator (::). The syntax to do so is as follows:

<namespace>::<variable/function>

One might find it tedious to use this syntax every single time a variable has to be called or used. C++ provides a workaround to do that. One can use the using namespace directive at the beginning of the program to use a particular namespace. But note that, once the using namespace directive is used, it brings all the variables and functions which are defined in that namespace. This might lead to conflict too.

C++ provides a solution to this problem too. The programmer can mention the specific coding entities being used in the program at the beginning of the program. Only those entities will be imported. The syntax to do this is as follows:

using <namespace>::<entity>;

First C++ program

In this topic, we shall see our first C++ program using the concepts we have learned so far. Consider the following program:

#include <iostream>
using namespace std;
int main() {
    return 0;
}

This program neither performs any operations nor prints anything. It simply includes the iostream header, uses the using namespace directive to let the compiler know that it is going to use entities from the std namespace (though it uses none of them), and defines the main() function, which just returns 0 and exits.

Basic input-output using cin and cout

cin, cout, cerr and clog are defined in the C++ standard library. To use them, one must include the iostream header. C++ uses a stream abstraction to handle IO with devices such as the keyboard and the console. cout is an output stream that defaults to the console/screen (standard output). cerr and clog are also output streams; both default to standard error, with cerr unbuffered and clog buffered. cin is an input stream that defaults to the keyboard (standard input).

The insertion operator (<<) and the extraction operator (>>) are used with output and input streams respectively.

The insertion operator

The insertion operator inserts the value of its right operand into the stream given as its left operand. Consider the following statement:

std::cout << variable_1;

In this case, the insertion operator inserts the value of variable_1 into the cout output stream. As cout defaults to the console, this statement displays the value of variable_1 on the screen.

Since we are using the stream abstraction, we can chain multiple insertions in the same statement, which keeps basic IO simple. An example of this is as follows:

std::cout << "The value in variable_1 is: " << variable_1;

The insertion operator does not automatically add linebreaks to move the cursor to the next line on the console. But it can be achieved in two ways which are depicted in the following examples:

std::cout << variable_1 << "\n";
// Or
std::cout << variable_1 << std::endl;

The end-line manipulator (endl) writes a newline character and also flushes the stream, whereas the newline character \n only moves the cursor to the next line.

Examples: Insertion operator
#include <iostream>
using namespace std;
int main() {
    cout << "Hello World";
    return 0;
}

Output: Hello World

#include <iostream>
using namespace std;
int main() {
    cout << "Hello";
    cout << "World";
    return 0;
}

Output: HelloWorld

Conclusion: The insertion operator does not add any extra characters such as a space or a newline.

The extraction operator

This operator extracts the information from the operand to the left and stores this information in the operand to the right. Consider the following example:

std::cin >> variable_2;

In this case, the value from cin, which by default is keyboard, is taken and stored in variable_2. The way in which the information is interpreted is based on the type of the variable. If the data type of variable_2 is an integer, the input value is converted to an integer and then passed onto variable_2. If the variable is float, the value is converted to float.

The extraction operator can be chained to read multiple values in a single line. This is illustrated in the following example:

std::cin >> variable_2 >> variable_3;

The above statement reads a value from the keyboard, converts it to the data type of variable_2 and stores it in variable_2. The same process is then repeated for variable_3. The extractions are therefore performed from left to right.

Note that the extraction operation can fail if the value entered cannot be interpreted as the data type of the target variable on the right side. For example, if the variable is an integer and the user enters “Prajwal”, which is a string, the operation fails.

Example 1
#include <iostream>
#include <string>
using namespace std;
int main() {
    int variable_1;
    cin >> variable_1;
    cout << "variable_1: " << variable_1 << endl;
    string variable_2;
    cin >> variable_2;
    cout << "variable_2: " << variable_2;
    return 0;
}

Output

1
variable_1: 1
asdf
variable_2: asdf

Conclusion: The input value is interpreted based on the data type of the variable in which it is going to be stored. Hence the input value for variable_1 was an integer and for variable_2, it was interpreted as a string.

Example 2
#include <iostream>
#include <string>
using namespace std;
int main() {
    int variable_1;
    string variable_2;
    cin >> variable_1 >> variable_2;
    cout << "variable_1: " << variable_1 << endl;
    cout << "variable_2: " << variable_2;
    return 0;
}

Output

1
asdf
variable_1: 1
variable_2: asdf

Conclusion: Multiple extractions can be chained in a single cin statement, and the values are read in order.

C++ primitive data types

The computer stores information in binary representation, and the size of a primitive data type is expressed in bits. The number of unique values a data type can store grows exponentially with the number of bits allocated to it (n bits can represent 2^n values), while the memory it needs grows with the number of bits. Hence it is important to choose a data type that suits the application. Size and precision are compiler dependent. The climits header provides macros that describe the sizes and limits of the integer types for the compiler the program is being compiled with.

Size in bits    Number of unique values representable
1               2^1
2               2^2
4               2^4
8               2^8
16              2^16

Character types

Character types are used to represent characters such as the ones in the ASCII table. A character is often represented in 8 bits (1 byte). From the above table, it is clear that one can represent a maximum of 2^8 = 256 distinct characters using 8 bits. However, some languages such as Mandarin have thousands of characters, which cannot be represented in just 8 bits. In order to support these languages, C++ supports wider character types. The following table describes some of the character types C++ supports:

Type name    Size/Precision
char         Exactly 1 byte
char16_t     16 bits
char32_t     32 bits
wchar_t      Can represent the largest available character set

Unicode is a common standard used to represent multiple character sets in any language.

Integer types

This is used to represent whole numbers, both signed and unsigned. There are many versions of this data type. The following table shows the C++ integer data types for both signed and unsigned integers.

Type name                Minimum size   climits macros          Minimum range
signed short int         16 bits        SHRT_MIN / SHRT_MAX     [−32,767, +32,767]
signed int               16 bits        INT_MIN / INT_MAX       [−32,767, +32,767]
signed long int          32 bits        LONG_MIN / LONG_MAX     [−2,147,483,647, +2,147,483,647]
signed long long int     64 bits        LLONG_MIN / LLONG_MAX   [−9,223,372,036,854,775,807, +9,223,372,036,854,775,807]
unsigned short int       16 bits        0 / USHRT_MAX           [0, 65,535]
unsigned int             16 bits        0 / UINT_MAX            [0, 65,535]
unsigned long int        32 bits        0 / ULONG_MAX           [0, 4,294,967,295]
unsigned long long int   64 bits        0 / ULLONG_MAX          [0, 18,446,744,073,709,551,615]

The sizes and ranges listed above are the minimums that the standard guarantees. Actual sizes are compiler dependent; for example, int is 32 bits on most current desktop compilers.

In addition to this, it is possible to store both signed and unsigned integers in character data type. This capability of C/C++ is often exploited to efficiently store integers of a shorter range.

Floating-point types

The usual method used by computers to represent real numbers is floating-point notation. There are many varieties of floating-point notation and each has individual characteristics. The key concept is that a real number is represented by a number called the mantissa, times a base raised to an integer power called the exponent. The base is usually fixed, and the mantissa and the exponent vary to represent different real numbers. For example, if the base is fixed at 10, the number 123.45 could be represented as 12345 × 10^-2. The mantissa is 12345, and the exponent is -2. Other possible representations are 0.12345 × 10^3 and 123.45 × 10^0. We choose the representation in which the mantissa is an integer with no trailing 0s.

In the floating-point notation used in this illustration, a real number is represented by a 32-bit value consisting of a 24-bit mantissa followed by an 8-bit exponent. The base is fixed at 10. Both the mantissa and the exponent are two's complement binary integers. For example, the 24-bit binary representation of 12345 is 0000 0000 0011 0000 0011 1001, and the 8-bit two's complement binary representation of -2 is 1111 1110; the representation of 123.45 is therefore 0000 0000 0011 0000 0011 1001 1111 1110. Note that this is a simplified scheme for illustration; most C++ implementations store float and double in the IEEE 754 binary formats, which use base 2.

The advantage of floating-point notation is that it can be used to represent numbers with extremely large or extremely small absolute values. The floating point has 3 types: float, double and long double. The following table describes these types:

Type          Precision           Range
float         7 decimal digits    1.2 × 10^-38 to 3.4 × 10^38
double        15 decimal digits   2.2 × 10^-308 to 1.8 × 10^308
long double   19 decimal digits   3.3 × 10^-4932 to 1.2 × 10^4932

Boolean type

The boolean data type (bool) is used to represent true or false values. In C++, 0 is treated as false and any non-zero value as true. C++ also provides the keywords true and false for use with the boolean data type. A bool usually occupies 8 bits (1 byte) of memory.

Declaring and using variables

Apart from using the assignment operator to initialise identifiers with constants, C++ provides one more way to do the same job:

int variable_1 {100};

does the same job as

int variable_1 = 100;

Pointers

A pointer is a variable whose value is the address of another variable, i.e., the direct address of the memory location.

Pointer Notation

Consider the declaration,

int i = 3 ;

This declaration tells the C compiler to:

  1. Reserve space in memory to hold the integer value.
  2. Associate the name ‘i’ with this memory location.
  3. Store the value 3 at this location.

We may represent i’s location in memory by the following memory map.

We see that the computer has selected memory location 65524 as the place to store the value 3. The location number 65524 is not a number to be relied upon, because some other time the computer may choose a different location for storing the value 3. The important point is, i’s address in memory is a number. We can print this address number through the following program:

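#include <stdio.h>
int main( )
{
   int i = 3 ;
   /* %u matches the small illustrative address used in this example;
      portable code prints (void *) &i with %p */
   printf ( "\nAddress of i = %u", &i ) ;
   printf ( "\nValue of i = %d", i ) ;
   return 0 ;
}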

The output of the above program would be:

Address of i = 65524

Value of i = 3
  • The ‘&’ used in printf( ) statement is C’s ‘address of’ operator. It returns the address of the variable ‘i’, which in this case happens to be 65524.
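  • As the address cannot be negative, it is printed out using %u, which is a format specifier for printing an unsigned integer. This suits the small address shown here; on platforms where pointers are wider than an unsigned int, the portable choice is %p with the address cast to void *.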
  • The other pointer operator available in C is ‘*’, called the ‘value at address’ operator. It gives the value stored at a particular address. The ‘value at address’ operator is also called the ‘indirection’ operator. Observe the output of the following program:
#include <stdio.h>
int main()
{
    int i = 3 ;
    printf ( "\nAddress of i = %u", &i ) ;
    printf ( "\nValue of i = %d", i ) ;
    printf ( "\nValue of i = %d", *( &i ) ) ;
    return 0;
}

The output of the above program would be:

Address of i = 65524
Value of i = 3
Value of i = 3

Note that printing the value of *( &i ) is the same as printing the value of i. The expression &i gives the address of the variable i. This address can be collected in a variable, by saying,

j = &i ;

But remember that j is not an ordinary variable like any other integer variable. It is a variable that contains the address of another variable (i in this case). Since j is a variable the compiler must provide it space in the memory. Once again, the following memory map would illustrate the contents of i and j.

As you can see, i’s value is 3 and j’s value is i’s address. But wait, we can’t use j in a program without declaring it. And since j is a variable that contains the address of ‘i’, it is declared as,

int *j ;

Like any variable or constant, you must declare a pointer before using it to store any variable address. The general form of a pointer variable declaration is −

type *var-name;

Here, type is the pointer’s base type; it must be a valid C data type, and var-name is the name of the pointer variable. The asterisk * used to declare a pointer is the same asterisk used for multiplication. However, in this statement, the asterisk is being used to designate a variable as a pointer. Take a look at some of the valid pointer declarations −

int *ip;     /* pointer to an integer */
double *dp;  /* pointer to a double */
float *fp;   /* pointer to a float */
char *ch;    /* pointer to a character */

The value of every pointer, whether it points to an integer, a float, a character or anything else, is of the same kind: a number that represents a memory address, usually displayed in hexadecimal. The only difference between pointers of different data types is the data type of the variable or constant that the pointer points to.

How to Use Pointers?

There are a few important operations, which we will do with the help of pointers very frequently.

  1. We define a pointer variable.
  2. Assign the address of a variable to the pointer.
  3. Access the value at the address available in the pointer variable, also called dereferencing the pointer.

This is done by using the unary operator * that returns the value of the variable located at the address specified by its operand. The following example makes use of these operations −

#include <stdio.h>
int main ()
{
    int var = 20; /* actual variable declaration */
    int *ip; /* pointer variable declaration */

    ip = &var; /* store address of var in pointer variable*/

    printf("Address of var variable: %p\n", (void *) &var );

    /* address stored in pointer variable */
    printf("Address stored in ip variable: %p\n", (void *) ip );

    /* access the value using the pointer */
    printf("Value of *ip variable: %d\n", *ip );

    return 0;
}

When the above code is compiled and executed, it produces a result similar to the following (the exact address, and the way %p formats it, vary between systems) −

Address of var variable: 0xbffd8b3c
Address stored in ip variable: 0xbffd8b3c
Value of *ip variable: 20

The assignment operation (=) between two pointers makes them point to the same pointee. It’s a simple rule for a potentially complex situation, so it is worth repeating: assigning one pointer to another makes them point to the same thing. Consider the following example:

#include <stdio.h>
int main() {
    int num = 42;
    int *numPtr, *second;
    numPtr = &num;
    second = numPtr; // Same as: second = &num;
    printf("Value of num: %d\n", num);
    printf("Address of num: %p\n", (void *) &num);
    printf("Value of numPtr: %p\n", (void *) numPtr);
    printf("Value numPtr pointing to: %d\n", *numPtr);
    printf("Value of second: %p\n", (void *) second);
    printf("Value second pointing to: %d\n", *second);
    return 0;
}

Output (the address varies between runs and systems)

Value of num: 42
Address of num: 0x6a6e55f4
Value of numPtr: 0x6a6e55f4
Value numPtr pointing to: 42
Value of second: 0x6a6e55f4
Value second pointing to: 42

The above program can be summed up with the following block diagram:

Tip: Make drawings. Memory drawings are the key to thinking about pointer code.
When you are looking at code, thinking about how it will use memory at run time, make a quick drawing to work out your ideas. That’s the way to do it.

Sharing

Two pointers referring to the same memory address are said to be “sharing”. That two or more entities can cooperatively share a single memory structure is a key advantage of pointers in all computer languages. Sharing can be used to provide efficient communication between parts of a program.

Pointer Arithmetic

A pointer in C is an address, which is a numeric value. Therefore, you can perform arithmetic operations on a pointer just as you can on a numeric value. There are four arithmetic operators that can be used on pointers: ++, --, +, and -. Before jumping into the examples, please make sure you have a basic idea of hexadecimal values and arithmetic operations on them, as addresses are usually displayed as hexadecimal values. Consider the following program, compiled for a platform where the size of an integer is 4 bytes:

#include <stdio.h>
int main() 
{
    int i = 5;
    int* ptr = &i;
    printf ("%p\n", ptr);
    return 0;
}

The output is:

0x7ffd58311d7c

In the above simple program, the address of an integer value is assigned to a pointer. And the address stored in the pointer is being printed. Now, consider adding 1 to the pointer directly and also by using post increment, which is done in the following program:

#include <stdio.h>
int main()
{
    int i = 5;
    int* ptr = &i;
    printf ("ptr : %p\n", ptr);
    ptr = ptr + 1;
    printf ("ptr + 1 : %p\n", ptr);
    ptr++;
    printf ("ptr++ : %p\n", ptr);
    return 0;
}

The output (pay attention to the last two characters of each address):

ptr : 0x7ffd58311d7c
ptr + 1 : 0x7ffd58311d80
ptr++ : 0x7ffd58311d84

One may notice that the address printed for ptr + 1 ends in 80 and not 7d. Consider another example where we attempt to add 1 to a pointer pointing to a value of data type double:

#include <stdio.h>
int main()
{
    double i = 5;
    double* ptr = &i;
    printf("Size of double: %zu\n", sizeof(double));
    printf ("ptr : %p\n", ptr);
    ptr = ptr + 1;
    printf ("ptr + 1 : %p\n", ptr);
    ptr++;
    printf ("ptr++ : %p\n", ptr);
    return 0;
}

The output is shown below (again, pay attention to the last two characters of each address). It is important to remember that the addition is happening to hex values and not decimal values.

Size of double: 8
ptr : 0x7ffe92406338
ptr + 1 : 0x7ffe92406340
ptr++ : 0x7ffe92406348

Points to note from the above two examples:

  • One can add integer values to a pointer either by using the ‘+’ operator or by post (and pre) incrementation.

Expression   Data type ptr points to   Size in bytes   Operation (hex addition)   Result
ptr + 1      int                       4               7c + 4                     80
ptr++        int                       4               80 + 4                     84
ptr + 1      double                    8               38 + 8                     40
ptr++        double                    8               40 + 8                     48

  • Hence, it can be concluded that the compiler scales the value being added by the size of the data type the pointer points to. That is, if the value added is 1, the address increases by size * 1; if it is 2, the address increases by size * 2.

Pointer arithmetic is extensively used in arrays. It is further explored in the ‘Arrays’ section.

Comparison of pointers

Consider the following example:

#include <stdio.h>
int main()
{
    double i = 5;
    double* ptr_1 = &i;
    double* ptr_2 = &i; // Same as double* ptr_2 = ptr_1
    if (ptr_1 == ptr_2)
        printf("ptr_1 and ptr_2 are pointing to same location");
    return 0;
}

Output:

ptr_1 and ptr_2 are pointing to same location

From the above example, it is clear that the equality comparison operator applied to two pointers checks whether the pointers are pointing to the same memory location. A comparison operator is also often used to check whether a pointer is a NULL pointer (discussed in the ‘NULL Pointers’ subsection).

Caution

It is very important to realize that, after the arithmetic in the above examples, the pointers are pointing to an unallocated area or possibly to an area allocated to some other variable of the program. Dereferencing a pointer in the above examples and similar scenarios is undefined behaviour and often results in a segmentation fault.
Do not attempt the following operations on pointers; they are not valid and the compiler rejects them:

  1. Addition of two pointers
  2. Multiplication of a pointer with a constant
  3. Division of a pointer with a constant

Exercise: Write a program which decrements ptr values pointing to various other data types, using arithmetic ‘-’ operator and pre and post decrement unary operators. Analyze the behaviour of the pointers in the program.

Shallow and Deep Copying

In particular, sharing can enable communication between two functions. One function passes a pointer to the value of interest to another function. Both functions can access the value of interest, but the value of interest itself is not copied. This communication is called “shallow” since instead of making and sending a (large) copy of the value of interest, a simple pointer is sent.

The alternative where a complete copy is made and sent is known as a “deep” copy. Deep copies are simpler in a way since each function can change its copy without interfering with the other copy, but deep copies run slower because of all the copying. This topic is explained in detail with examples under “Heaps” in the “Advanced topics in Pointers” section.

NULL Pointers

A pointer that is assigned NULL is called a null pointer. It is always a good practice to assign a NULL value to a pointer variable in case you do not have an exact address to be assigned. This is done at the time of variable declaration.

NULL is a macro for a null pointer constant with a value of zero, defined in several standard headers such as stdio.h and stddef.h. Consider the following program −

#include <stdio.h>
int main ()
{
    int *ptr = NULL;
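    printf("The value of ptr is : %p\n", (void *) ptr );
    return 0;
}

Output (on Linux with glibc; the text printed by %p for a null pointer is implementation-defined):

The value of ptr is : (nil)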

In most operating systems, programs are not permitted to access memory at address 0 because that memory is reserved by the operating system. The memory address 0 therefore has a special significance: it signals that the pointer is not intended to point to an accessible memory location. By convention, if a pointer contains the null (zero) value, it is assumed to point to nothing.

To check for a null pointer, you can use an ‘if’ statement as follows −

if(ptr) /* succeeds if ptr is not null */
if(!ptr) /* succeeds if ptr is null */

Void pointers

A void pointer is a pointer that has no data type associated with it. It can hold the address of a variable of any type and can be typecast to any pointer type. It is also called a generic pointer. It is declared using the keyword void.

#include <stdio.h>
int main()
{
    void *p = NULL; // void pointer assigned to NULL
    // the size of p is platform dependent
    printf("The size of pointer is:%zu\n", sizeof(p));
    return 0;
}

Important Points:

  • void pointers cannot be dereferenced directly. It can however be done using type casting the void pointer. The following example illustrates the rule:
#include<stdlib.h>
#include<stdio.h>
int main()
{
    int x = 4;
    void *ptr = &x;
    
    // (int*)ptr typecasts the void pointer to an integer pointer
    // *((int*)ptr) dereferences the typecasted void pointer

    printf("Integer variable is = %d", *( (int*) ptr) );
    // Similarly, the same void pointer can hold the address of a float variable
    float y = 3.6;
    ptr = &y;
    printf("\nFloat variable is= %f", *( (float*) ptr) );
    
    return 0;
}
  • In the above example, it is clear that one needs to typecast the pointer to the desired data type and then use it accordingly. The syntax to do this is as follows:
○ Typecast:
     (target_data_type *) pointer_variable
○ Dereference:
    * ((target_data_type *) pointer_variable)
  • Pointer arithmetic is not possible on void pointers in standard C, because void has no size and the compiler cannot tell by how many bytes the pointer should move.

Wild/Bad/Uninitialized pointers

Wild pointers are also called uninitialized pointers or bad pointers. They point to some arbitrary memory location and may cause a program to crash or behave badly.

Bad pointers are very common. In fact, every pointer starts out with a bad value. Correct code overwrites the bad value with a correct reference to a pointee, and thereafter the pointer works fine. There is nothing automatic that gives a pointer a valid initialisation.

The following program illustrates an example of wild pointers. This might produce an unpredictable output.

#include <stdio.h>
int main() {
    int *p; //wild pointer
    printf("\n%d",*p);
    return 0;
}

A very common bad pointer example is as follows:

void BadPointer() {
    int* p;
    // allocate the pointer, but not the pointee
    *p = 42;
    // this dereference is a serious runtime error
}

As one can see in the above program, a pointer is declared but left uninitialized, and an attempt is made to dereference it, which is likely to crash the program.

The bad code will compile fine, but at run-time each dereference of a bad pointer will corrupt memory in some way. The program will crash sooner or later. It is up to the programmer to ensure that each pointer is assigned a pointee before it is used.

Dangling pointer

To understand what dangling pointers are, one first needs to understand dynamic memory allocation and the corresponding functions such as malloc( ), calloc( ), realloc( ) and, most importantly, free( ). These functions are discussed in detail in “Memory allocation functions” in the “Memory allocation” subsection of the “Arrays” section.

A pointer that points to a memory location that has been freed or is no longer valid is a dangling pointer. Predominantly there are 3 scenarios that can give rise to dangling pointers.

  1. When allocated memory is freed.
#include <stdlib.h>
#include <stdio.h>
int main()
{
      int *ptr = (int *)malloc(sizeof(int));

      // After below free call, ptr becomes a
      // dangling pointer
      free(ptr);

      // No more a dangling pointer
      ptr = NULL; 
}
  2. Returning the address of a non-static local variable from a function.
#include<stdio.h>
char *foo()
{
    // ch is a local non-static variable that goes out
    // of scope once foo returns control to caller
    char ch = 'a';
    return &ch;
}
int main()
{
    char *ptr = foo();
    // ptr points to something which is not
    // valid anymore
    printf("%c", *ptr);
    return 0;
}
  3. This type is an extension of the 2nd type. It occurs when a pointer holds the address of a local variable and we try to access the pointed-to value outside the scope of that variable.
int main()
{
    float *ptr;
    .....
    .....
    {
        float marks = 90.7;
        ptr = &marks;
    }
    .....
    // Here ptr is dangling pointer
}

Dangling pointers are one of the leading causes of segmentation faults which cause programs to crash erratically. These are one of the most non-descriptive errors and sometimes can get very difficult to trace in a large program without any specialized tools such as GNU GDB (GNU Debugger).

Hence, it is always recommended to make the pointer a NULL pointer soon after the deallocation of the memory to avoid dangling pointers. It is also recommended to resolve all the compiler warnings, as compilers can often detect some cases of dangling pointers.

Advanced topics in Pointers

Memory layouts in C

A typical memory representation of a C program consists of the following sections. A pictorial representation:

  1. Text segment (i.e. instructions)
    • A text segment, also known as a code segment or simply as text, is one of the sections of a program in an object file or in memory, which contains executable instructions.
    • This raises a question: what is an object file? An object file is the output of the assembler, one of the stages of the compilation toolchain. It contains machine code for a specific architecture that is not directly executable on its own: it is relocatable, meaning its final addresses are fixed only when the linker combines it with other object files and libraries into an executable. To know more about this, please read about the phases of the compilation of a C program.
    • As a memory region, a text segment may be placed below the heap or stack in order to prevent heaps and stack overflows from overwriting it.
    • This segment is usually write-protected so as to prevent accidental modification of the code. It is also sharable, so that only a single copy needs to be in memory for frequently executed programs such as shells or debuggers, which improves memory utilization. Sharing also helps when a process spawns multiple processes or threads.
  2. Initialized data segment
    • This segment is a portion of the virtual address space of a program, which contains the global, static and extern variables that are initialized by the programmer.
#include <stdio.h>
/* global variable stored in the Initialized Data Segment in the
read-write area */
char c[] = "Quant Masters";
/* const global variable, typically stored in the read-only area */
const char s[] = "C Programming";
int main()
{
    static int i=11; /* static variable stored in Initialized Data
    Segment*/
    return 0;
}
  3. Uninitialized data segment (bss)
    • Data in this segment is initialized to arithmetic 0 before the program starts executing. Uninitialized data starts at the end of the data segment and contains all global variables and static variables that are initialized to 0 or do not have explicit initialization in the source code.
#include <stdio.h>
char c; /* Uninitialized variable stored in bss*/
int main()
{
   static int i; /* Uninitialized static variable stored
   in bss */
   return 0;
}
  4. Heap
    • To understand heap and its uses, one must have the basic knowledge of dynamic memory allocation which is discussed in the Arrays chapter.
    • This is the region of memory where dynamic allocation happens. All the memory handed out by malloc, calloc and realloc comes from the heap. Heaps are especially efficient when arrays are to be passed among multiple functions, as only the pointers are copied into the function stack (shallow copy). Consider the following example:
#include <stdio.h>
#include <stdlib.h>
#define SIZE 5
void foo(int *array)
{
    for (int i = 0 ; i < SIZE ; i++)
        array[i] = i * 10;
}
int main()
{
    int *array = (int *) malloc (sizeof(int) * SIZE);
    if (array == NULL)
        return 1; /* allocation failed */
    for (int i = 0 ; i < SIZE ; i++)
        array[i] = i;
    foo (array);
    for (int i = 0 ; i < SIZE ; i++)
        printf("%d ", array[i]);
    free (array); /* release the heap memory */
    return 0;
}
  • The output of the above function:
0 10 20 30 40
  • In the above example, array points to a dynamically allocated block on the heap, as malloc is being used. The array elements are then initialized to their corresponding index values. After the initialization, the array is passed to foo as an argument. Here it is important to note that calling foo doesn’t create a deep copy of the array by allocating a new block of memory on the heap or stack. Instead, a new pointer is created in the stack frame of foo (see the advanced section of the functions chapter to learn more about stack frames) and is made to point to the same memory location in the heap. Hence the modification made in the function foo( ) persists even after control returns to the main( ) function. The following example prints the addresses of the array elements, which should make this point clearer.
#include <stdio.h>
#include <stdlib.h>
#define SIZE 5
void foo(int *array)
{
    printf("In foo function:\n");
    for (int i = 0 ; i < SIZE ; i++)
        printf("Address of array[%d]: %p\n", i, (void *) &array[i]);
}
int main()
{
    int *array = (int *) malloc (sizeof(int) * SIZE);
    if (array == NULL)
        return 1; /* allocation failed */
    for (int i = 0 ; i < SIZE ; i++)
        array[i] = i;
    foo (array);
    printf("In main function:\n");
    for (int i = 0 ; i < SIZE ; i++)
        printf("Address of array[%d]: %p\n", i, (void *) &array[i]);
    free (array); /* release the heap memory */
    return 0;
}
  • The output of the above program:
In foo function:
Address of array[0]: 0x559b11c572a0
Address of array[1]: 0x559b11c572a4
Address of array[2]: 0x559b11c572a8
Address of array[3]: 0x559b11c572ac
Address of array[4]: 0x559b11c572b0
In main function:
Address of array[0]: 0x559b11c572a0
Address of array[1]: 0x559b11c572a4
Address of array[2]: 0x559b11c572a8
Address of array[3]: 0x559b11c572ac
Address of array[4]: 0x559b11c572b0
  • As the addresses are identical in both functions, it can be concluded that the pointers in foo( ) and main( ) point to the same memory location.
  • It is very important to note that the heap grows upwards. The direction of the arrow in the block diagram should make this point obvious.
  1. Stack
    • The stack segment focuses on storing the contents of a function call. It consists of stack frames where each frame corresponds to a function call.
    • When a function is called, a dedicated stack frame is created and pushed on top of the stack segment. The stack frame just below it belongs to the function that called the currently executing function, and so forth. The bottom of the stack essentially should be the stack frame containing the main( ) function call. This frame is popped once the execution of the function is completed.
    • A stack frame stores the following data:
      • Local variables used in the function. Includes the received parameters from the calling function.
      • The return address of the instruction in the caller function that is to be executed after the function call is over.
      • Saved copies of registers modified by subprograms that could need restoration. This is done mainly for optimisation and keeping track of register contents.
    • Consider that the main( ) function calls a function foo( ) and passes an integer as an argument.
1. void foo(int i)
2. {
3.      printf("%d", i);
4. }
5. int main()
6. {
7.      foo(3);
8.      printf("Hello World");
9. }
  • The following is the sequence of execution:
    • The program starts execution from the main function. A stack frame for the main( ) function call is created and pushed onto the stack segment.
    • The first line in the main function calls a function called foo( ). This results in the transfer of control to the foo function at line #1. Along with this, a stack frame for the foo( ) function call is also created. This contains the parameter value ‘i’ and the return address where control has to resume once foo has completed execution. In this example, it is line #8. This newly created stack frame is pushed onto the stack and control continues executing the statements in foo( ).
    • Once the value of ‘i’ is printed, the control refers to the stack frame of foo( ), which is at the top of the stack and determines where the control should start executing in the main( ) function. That is from line #8. Hence the control starts executing from line #8 and not from the beginning of the main( ) function.
  • Note that a stack frame is created for each function call. This means that there could be multiple stack frames of the same function in the stack segment in the case of recursion. This is one reason why recursion tends to be less efficient than an iterative alternative of the same logic: every recursive call adds the cost of creating and later removing a frame.
  • The stack and heap are traditionally located at opposite ends of the process’s virtual address space.
  • Memory is a finite resource, and if too many functions are called before any of them returns, a program can run out of space on the stack. This usually happens only with misbehaving recursive functions, but it can also happen in ordinary programs running under very tight memory constraints. This scenario, where the process runs out of the memory dedicated to the stack, is called stack overflow.

Near pointers

A near pointer is a 16-bit pointer to an object contained in the current segment, be it code segment, data segment, stack segment, or extra segment. The compiler can generate code with a near pointer and does not have to concern itself with segment addressing, so using near pointers is the fastest, and generates the smallest code. The limitation is that you can only access 64kb of data at a time because that is the size of a segment – 64kb. A near pointer contains only the 16-bit offset of the object within the currently selected segment.

Far Pointers

A far pointer is a 32-bit pointer to an object anywhere in memory. In order to use it, the compiler must allocate a segment register (segment register is the one that points to the base of the current segment being addressed), load it with the segment portion of the pointer, and then reference memory using the offset portion of the pointer relative to the newly loaded segment register. This takes extra instructions and extra time, so it is the slowest and largest method of accessing memory, but it can access memory that is larger than 64kb, sometimes, such as when dealing with video memory, a needful thing. A far pointer contains a 16-bit segment part and a 16 bit offset part. Still, at any one instant of time, without “touching” segment registers, the program only has access to four 64kb chunks or segments of memory. If there is a 100kb object involved, code will need to be written to consider its segmentation, even with far pointers.

Now, segments overlap. Each segment is 64kb in length, but a new segment starts every 16 bytes, so each one overlaps the next and the prior by 65520 bytes. That means that most addresses in memory can be reached through up to 4096 different segment:offset pairs. The result is that the total addressable memory was only 1MB, and the total usable memory address space was 500kb to 600kb. That sounds odd, but Intel built it, Microsoft wrote it, and DOS/Windows 3.1 grew up around it. I still have that computer, and it still works just fine.

Huge pointers

The far pointer suffers because you cannot just add one to it and have it point to the next item in memory; you have to consider the segment:offset rules, because of the 16-bit offset issue. The huge pointer is a monolithic pointer to an item in a large chunk of memory, and it has no segment:offset boundaries to worry about.

Double Pointers

The concept of pointers can be further extended. Pointer, we know, is a variable that contains the address of another variable. Now this variable itself might be another pointer. Thus, we now have a pointer that contains another pointer’s address. The following example should make this point clear.

#include <stdio.h>
int main( )
{
    int i = 3, *j, **k ;
    j = &i ;
    k = &j ;
    /* %p is the conversion for addresses and expects a void * argument */
    printf ( "\nAddress of i = %p", (void *) &i ) ;
    printf ( "\nAddress of i = %p", (void *) j ) ;
    printf ( "\nAddress of i = %p", (void *) *k ) ;
    printf ( "\nAddress of j = %p", (void *) &j ) ;
    printf ( "\nAddress of j = %p", (void *) k ) ;
    printf ( "\nAddress of k = %p", (void *) &k ) ;
    printf ( "\nValue of j = %p", (void *) j ) ;
    printf ( "\nValue of k = %p", (void *) k ) ;
    printf ( "\nValue of i = %d", i ) ;
    printf ( "\nValue of i = %d", * ( &i ) ) ;
    printf ( "\nValue of i = %d", *j ) ;
    printf ( "\nValue of i = %d", **k ) ;
    return 0;
}

The output of the above program would be:

Address of i = 65524
Address of i = 65524
Address of i = 65524
Address of j = 65522
Address of j = 65522
Address of k = 65520
Value of j = 65524
Value of k = 65522
Value of i = 3
Value of i = 3
Value of i = 3
Value of i = 3

Remember that when you run this program, the addresses that get printed, and the notation they are printed in, will most likely differ from the ones shown above. However, with these addresses the relationship between i, j and k can be easily established.

Observe how the variables j and k have been declared,

int i, *j, **k ;

Here, i is an ordinary int, j is a pointer to an int (often called an integer pointer), whereas k is a pointer to an integer pointer. We can extend the above program still further by creating a pointer to a pointer to an integer pointer. In principle, you would agree that likewise there could exist a pointer to a pointer to a pointer to a pointer to a pointer.
There is no limit on how far we can go in extending this definition.

Function Pointers

A function pointer is a pointer that holds the address of a function. The ability of pointers to point to functions turns out to be an important and useful feature of C. This provides us with another way of executing functions in an order that may not be known at compile time and without using conditional statements.

Branch prediction is a technique whereby the processor guesses which of multiple possible execution sequences will be executed. Pipelining is a hardware technology commonly used to improve processor performance and is achieved by overlapping instruction execution.

One concern regarding the use of function pointers is their inefficiency. The processor may not be able to use branch prediction in conjunction with pipelining.

Declaring Function Pointers

  • Syntax:
return_type (*fptr_id)([dt1, dt2, ...]);
  • Where,
    • return_type is the return data type of the function whose address the pointer is intending to hold.
    • fptr_id is a valid identifier name for the function pointer.
    • dt1, dt2, … are optional. They are the data types of the parameters of the function whose address the pointer is intended to hold.
  • When function pointers are used, the programmer must be careful to ensure it is used properly because C does not check to see whether the correct parameters are passed.
  • A simple example of a function pointer which can point to a function that takes no parameters (void) and returns void is as follows:
void (*foo)(void);
  • Other valid examples of function pointers are:
int (*f1)(double);       // Accepts a double value as parameter and returns an int
void (*f2)(char*);       // Accepts a char pointer as parameter and returns void
double* (*f3)(int, int); // Accepts two integer values as parameters and returns a pointer to a double
  • One suggested naming convention for function pointers is to always begin their name with the prefix: fptr. This increases the readability and code maintainability.

Using a Function Pointer

  • Consider the following example:
#include <stdio.h>
int (*fptr1)(int);

int square(int num) {
    return num*num;
}
int main() {
    int n = 5;
    fptr1 = square;
    printf("%d squared is %d\n",n, fptr1(n));
    return 0;
}

Output:

5 squared is 25
  • Points to note from the above example:
    • The return type and the number and data types of the parameters match between the function square( ) and the pointer which points to it, fptr1. That is, both return an int and accept exactly one parameter of type int.
    • The function pointer fptr1 is declared before it is initialized.
    • Once fptr1 is initialized to point to square( ), it behaves as a replacement for the identifier ‘square’. That is, fptr1(n) and square(n) would yield the same results.
  • Another way of fptr1 initialization in the above example is using the address-of operator (&). It can be done as follows:
fptr1 = &square;
  • The address-of operator makes no difference here: a function name used in an expression is converted to a pointer to that function anyway, so both forms are equivalent.
  • Another way of using a function pointer is to use typedef. It is illustrated in the following example:
#include <stdio.h>
typedef int (*funPtr)(int);
int square(int num) 
{
    return num*num;
}
int main()
{
    int n = 5;
    funPtr fptr1;
    fptr1 = square;
    printf("%d squared is %d\n",n, fptr1(n));
    return 0;
}
  • The above yields the same results as its immediate previous example.
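  • Using typedef for function pointers can seem non-intuitive at first, because the new type name does not appear where it usually does. Usually, the typedef’s name is the declaration’s last element, whereas for a function pointer it sits inside the parentheses, next to the asterisk (funPtr in the above example).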

Passing Function Pointers

  • Passing function pointers to functions is quite intuitive. The following examples illustrate the same:
#include <stdio.h>
int multiply(int num1, int num2) 
{
   return num1 * num2;
}
int divide(int num1, int num2)
{
    return num1 / num2;
}
typedef int (*fptrOperate)(int, int);
int calculate(fptrOperate operation, int num1, int num2)
{
    return operation(num1, num2);
}
int main()
{
    int op1 = 10, op2 = 5;
    fptrOperate fptr1;
    fptr1 = multiply;
    printf("%d * %d = %d\n", op1, op2, calculate(fptr1, op1, op2));
    fptr1 = divide;
    printf("%d / %d = %d\n", op1, op2, calculate(fptr1, op1, op2));
    return 0;
}
  • The output of the above program is:
10 * 5 = 50
10 / 5 = 2
  • Points to note from the above example:
    • A function pointer can simply be declared and passed to functions just like any other variable.
    • A function pointer can point to different functions at different points during execution. In this example, fptr1 initially points to the multiply( ) function and later points to divide( ).
    • A function pointer type definition should be declared before the function prototype or definition which accepts the function type definition. In this example, the type definition fptrOperate is declared before calculate( ) which uses fptrOperate in its parameter list.

Returning Function Pointers

  • Returning a function pointer requires declaring the function’s return type as a function pointer. Consider the following example:
#include <stdio.h>
int multiply(int num1, int num2)
{
    return num1 * num2;
}

int divide(int num1, int num2) 
{
    return num1 / num2;
}
typedef int (*fptrOperate)(int, int);
fptrOperate select(char opcode) 
{
    switch(opcode) {
        case '*': return multiply;
        case '/': return divide;
        default : return NULL; /* unsupported operator */
    }
}
int evaluate(char opcode, int num1, int num2)
{
    fptrOperate operation = select(opcode);
    return operation(num1, num2);
}
int main()
{
    int op1 = 10, op2 = 5;
    printf("%d * %d = %d\n", op1, op2, evaluate('*', op1, op2));
    printf("%d / %d = %d\n", op1, op2, evaluate('/', op1, op2));
    return 0;
}

Output:

10 * 5 = 50
10 / 5 = 2
  • Structure of the above program:
  1. A type definition, fptrOperate, is made for a pointer to a function which accepts 2 integer values and returns an integer value.
  2. A function select( ) reads the operator and returns the function pointer of the corresponding function.
  3. Function evaluate( ) which:
    • Reads operator and operands and sends the operator to select( ).
    • Stores the function pointer returned by select( ) in ‘operation’.
    • Calls function pointed by ‘operation’ and passes the operands which it received as parameters (num1, num2).
    • Returns the value which is returned by the call from ‘operation’.
  4. Functions multiply( ) and divide( ) accept two integers and return an integer after performing multiplication and division operations on their parameters. Hence, returning a function pointer is no different from returning any other value from a function.

Using an Array of Function Pointers

Arrays of function pointers can be used to select the function to evaluate on the basis of some criteria. Declaring such an array is straightforward.

typedef int (*funPointer)(int, int);
funPointer fptr_array[64] = {NULL};

Alternatively, one can use the syntax used in the following snippet:

int (*fptr_array[64])(int, int) = {NULL};

Both of the above code snippets declare a function pointer array named fptr_array with the capacity to hold 64 function pointers, each one of which returns an integer and accepts 2 parameters, both of which are of type integer. The entire array is initialized to NULL.

The example given in ‘Returning Function Pointers’ can be rewritten using an array of function pointers as follows: (Note that this example includes addition, subtraction and modulus operation as well)

#include <stdio.h>
int add (int num1, int num2) {
    return num1 + num2;
}

int subtract (int num1, int num2) {
    return num1 - num2;
}

int multiply (int num1, int num2) {
    return num1 * num2;
}

int divide (int num1, int num2) {
    return num1 / num2;
}

int mod (int num1, int num2) {
    return num1 % num2;
}

typedef int (*fptrOperate) (int , int);

fptrOperate fptr_array[5] = {NULL};

void initFptrArray () {
    fptr_array[0] = add;
    fptr_array[1] = subtract;
    fptr_array[2] = multiply;
    fptr_array[3] = divide;
    fptr_array[4] = mod;
}

fptrOperate select (char opcode) {
    switch (opcode) {            
        case '+': return fptr_array[0];
        case '-': return fptr_array[1];
        case '*': return fptr_array[2];
        case '/': return fptr_array[3];
        case '%': return fptr_array[4];
        default : return NULL;
    }
}

int evaluate (char opcode, int num1, int num2) {
    fptrOperate operation = select (opcode);  
    return operation (num1, num2);
}

int main () {
    initFptrArray ();
    int op1 = 10, op2 = 5;
    printf ("%d + %d = %d\n", op1, op2, evaluate ('+', op1, op2));
    printf ("%d - %d = %d\n", op1, op2, evaluate ('-', op1, op2));
    printf ("%d * %d = %d\n", op1, op2, evaluate ('*', op1, op2));
    printf ("%d / %d = %d\n", op1, op2, evaluate ('/', op1, op2));
    printf ("%d %% %d = %d\n", op1, op2, evaluate ('%', op1, op2));
    return 0;
}
  • The above program produces the following output:
10 + 5 = 15
10 - 5 = 5
10 * 5 = 50
10 / 5 = 2
10 % 5 = 0
  • One may notice that the indexing scheme for the fptr_array is not so obvious and we need a select( ) function to identify and map the right index for the right operator. This can be eliminated using the operator characters as indices. For example, fptr_array[‘*’] should hold the function pointer to multiply( ).
  • This is done in the following example, which produces the same output as the previous example:
#include <stdio.h>
int add(int num1, int num2) {
    return num1 + num2;
}
int subtract(int num1, int num2) {
    return num1 - num2;
}
int multiply(int num1, int num2) {
    return num1 * num2;
}
int divide(int num1, int num2) {
    return num1 / num2;
}
int mod(int num1, int num2) {
    return num1 % num2;
}
typedef int (*fptrOperate)(int, int);
fptrOperate fptr_array[48] = {NULL};
void initFptrArray() {
    fptr_array['+'] = add;
    fptr_array['-'] = subtract;
    fptr_array['*'] = multiply;
    fptr_array['/'] = divide;
    fptr_array['%'] = mod;
}
int evaluate(char opcode, int num1, int num2) {
    fptrOperate operation = fptr_array[(int)opcode];
    return operation(num1, num2);
}

int main() {
    initFptrArray();
    int op1 = 10, op2 = 5;
    printf("%d + %d = %d\n", op1, op2, evaluate('+', op1, op2));
    printf("%d - %d = %d\n", op1, op2, evaluate('-', op1, op2));
    printf("%d * %d = %d\n", op1, op2, evaluate('*', op1, op2));
    printf("%d / %d = %d\n", op1, op2, evaluate('/', op1, op2));
    printf("%d %% %d = %d\n", op1, op2, evaluate('%', op1, op2));
    return 0;
}
  • This raises the question of why the size of fptr_array is 48. As we are subscripting fptr_array using the characters ‘+’, ‘-’, ‘*’, ‘/’ and ‘%’, the highest ASCII value in this set of characters is 47, which is that of ‘/’. Since array indices start at 0, the array needs at least 48 elements for index 47 to be valid.
  • This raises one more question: is it not inefficient to declare an array of size 48 and use just 5 of its locations? The array does occupy storage for all 48 pointers, but that amounts to only a few hundred bytes on typical systems, which is a small price for removing the select( ) function. Hence the memory efficiency is not significantly affected.

Comparing Function Pointers

Function pointers can be compared to one another using the equality and inequality operators. The following example illustrates the same:

#include <stdio.h>
int multiply(int num1, int num2) {
     return num1 * num2;
}
int divide(int num1, int num2) {
     return num1 / num2;
}
typedef int (*fptrOperate)(int, int);

int main() {
    fptrOperate fptr1 = multiply;
    if (fptr1 == multiply)
        printf("Pointing to multiply( )");
    if (fptr1 != multiply)
        printf("Not pointing to multiply( )");
    return 0;
}

The output of the above program is as follows:

Pointing to multiply( )

Casting Function Pointers

A pointer to one function can be cast to another type. This should be done with care since the runtime system does not verify that the parameters used by a function pointer are correct. It is also possible to cast a function pointer to a different function pointer and then back. The resulting pointer will be equal to the original pointer. The size of function pointers used is not necessarily the same. The following sequence illustrates this operation:

#include <stdio.h>
int multiply(int num1, int num2) {
        return num1 * num2;
}
int main() {
        typedef int (*fptr_1)(int);
        typedef int (*fptr_2)(int,int);
        fptr_2 fptrFirst = multiply;
        fptr_1 fptrSecond = (fptr_1)fptrFirst;
        fptrFirst = (fptr_2)fptrSecond;
        printf("%d", fptrFirst(5,6));
}

This sequence, when executed, will display 30 as its output.

The use of void* is not guaranteed to work with function pointers. That is, we should not assign a function pointer to void* as shown below:

void* pv = add;

However, when interchanging function pointers, it is common to see a “base” function pointer type as declared below. This declares fptrBase as a function pointer to a function, which is passed void and returns void:

typedef void (*fptrBase)(void);

The following sequence demonstrates the use of this base pointer, which duplicates the previous example:

fptrBase basePointer;
fptrFirst = multiply;
basePointer = (fptrBase)fptrFirst;
fptrFirst = (fptr_2)basePointer;
printf("%d",fptrFirst(5,6));

A base pointer is used as a placeholder to exchange function pointer values.

A word of warning: Function pointer casting is one of the most error-prone and vulnerable areas. It is always advised to use function pointers whose declaration matches the signature of the function they point to.

Categories
Computer Science / Information Technology Language: C

Strings

Introduction

The way a group of integers can be stored in an integer array, similarly, a group of characters can be stored in a character array. A string constant is a one-dimensional array of characters terminated by a null ( ‘\0’ ).

Each character in the array occupies one byte of memory. ‘\0’ may look like two characters, but it is actually only one character, with the \ indicating that what follows it is something special. Note that ‘\0’ and ‘0’ are not the same. The ASCII value of ‘\0’ is 0, whereas the ASCII value of ‘0’ is 48.

Generally, string characters are selected only from the printable character set. Nothing in C prevents any character other than the NULL from being used in a string. In fact, it is quite common to use formatting characters, such as tabs, in strings. Before jumping into this section, please go through the ‘Arrays’ section if you haven’t yet.

String literals

A string literal, also known as a string constant, is a sequence of characters enclosed in double-quotes. For example, each of the following is string literal:

"Hello world"
"Quants"
"Welcome to C language!"

When string literals are used in a program, C automatically creates an array of characters, initializes it to a null-delimited string, and stores it, remembering its address. It does all this because we use the double quotes that immediately identify the data as a string value.

Declaration and initialisation of a string

Declaring and initializing a string is similar to any other array. Some of the ways of doing it are as follows:

Syntax:

char string_id[size] = {'Q', 'u', 'a', 'n', 't', 's', '\0'};
char string_id[size] = {'Q', 'u', 'a', 'n', 't', 's'};
char string_id[size] = "Quants";
char string_id[size] = "Quants\0";

Where,

  1. string_id is any valid identifier for the string.
  2. size is a non-zero positive integer denoting the size of the string.

Note:

  1. The strings can also be initialized at the time of declaration using a string literal.
  2. size value is optional in this case as the initialisation is happening at the time of declaration. This should always be 1 plus the length of string planned to be stored in the array. This is to store the NULL delimiter at the end.
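  3. The null character (‘\0’) at the end of the 4th example is optional, as the compiler appends it implicitly to every string literal. In the brace-enclosed forms (1st and 2nd examples) nothing is appended; the 2nd example is still null-terminated only when size is larger than the number of characters listed, because the remaining elements are then set to 0. Hence it is advised to put the null character explicitly in the brace-enclosed form.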
  4. If the size value is smaller than the length of the string being initialized, the compiler will throw a warning saying “excess elements in array initializer” and initializes only the ‘size‘ number of characters into the array.
  5. It is important to remember that strings cannot be initialized with string literals in the following way:
char string[10];
string = "Quants"; // Error
  6. Note that you cannot copy the contents of one string to another using the assignment operator. To do that, there are string functions supported by C which we shall see at a later point.
char string[10] = "Quants";
char str[10];
str = string; // Error

Significance of NULL delimiter

The string in C is not a data type but a data structure. This means that its implementation is logical and not physical. The physical structure is the array in which the string is stored. Since the string is a variable-length structure, we need to identify the logical end of the data within the physical structure.

The terminating null (‘\0’) is important also because it is the only way the functions that work with a string can know where the string ends. In fact, a string not terminated by a ‘\0’ is not really a string, but merely a collection of characters.

Strings and characters

A character can be stored in 2 ways: as a character literal or as a string literal. To store in character literal, use single inverted commas. The character occupies a single memory location and a string containing a single character requires two bytes of memory location: 1 for the character and the other for the null delimiter. Hence, ‘a’ is a character, “a” is a string and “” is an empty string.

A character can be copied from one location (or variable) to another using an assignment operator but for strings, we would need library functions which we shall see later in this section.
Furthermore, a character is nothing but a small integer (one byte in size) which is mapped to the corresponding character in the ASCII table. The following ASCII table describes the mapping:

Another important difference between string and character is how we represent the absence of data. A character cannot be empty. It can hold a null character or a space but simply cannot be empty. Hence ‘’ doesn’t make sense in C. On the other hand, strings can be empty. A string that contains no data consists of only a delimiter.

Referencing string literals

As it is clear by now that the strings are nothing but a character array with null as the last character, it should be obvious to note that the individual characters in the string can also be accessed just the way we do in arrays. The following are the valid ways to refer to characters in
the string:

string[0];     // First character
*(string + 1); // Second character
*(&string[2]); // Third character

Notice that the 3rd way is just referencing and dereferencing the address. The above three methods can be illustrated in the following example:

#include <stdio.h>
int main( ) 
{
     char string[10] = "ABC";
     printf("%c", string[0]);     // First character
     printf("%c", *(string + 1)); // Second character
     printf("%c", *(&string[2])); // Third character
     return 0;
}

Iterating through characters of a string

The following are the ways to iterate through the string using for loop:

char string[10] = "ABC";
for (int i = 0 ; string[i] != '\0' ; i++)
    printf("%c", string[i]);

Using while loop:

char string[10] = "ABC";
int i = 0 ;
while ( string[i] != '\0') {
    printf("%c", string[i]);
    i++;
}

String Input/Output Functions

scanf( )

  • scanf( ) is a function that reads data from stdin (i.e, the standard input stream, which is usually the keyboard, unless redirected) and then writes the results into the arguments given.
  • The entered data is formatted as integers or floating-point numbers etc
  • Syntax:
int scanf(const char *format, ...);
  • The ‘format’ is a string in itself and contains conversion code. In this case for string, the conversion code is s. The simplest of examples of this is as follows:
char str[10];
scanf("%s", str);
  • scanf( ) ignores all the leading whitespace in the input and reads only until it finds the next whitespace. All the characters from the first non-whitespace character up to, but not including, that whitespace are put into the memory location pointed to by the pointer passed as an argument.
  • For example, if “ Hello World ” is entered, the leading whitespace is ignored and only “Hello” is stored in str, and a null character is also stored at the end of str. The rest of the input string is left in the input stream.
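  • To remove the rest of the input from the input stream, one can use fflush( ) or define a macro. On some implementations, calling fflush with stdin as a parameter flushes out all the residual input characters from the input stream. Note, however, that the C standard defines fflush only for output streams, so this is not portable and the macro shown next is the safer choice. The non-portable form is illustrated as follows:
char str[10];
scanf("%s", str);
fflush(stdin); /* non-portable: undefined behaviour in standard C */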
  • A macro to flush the input stream is illustrated as follows:
#define FLUSH do { int ch; while ((ch = getchar()) != '\n' && ch != EOF) ; } while (0)
char str[10];
scanf("%s", str);
FLUSH;
  • We can protect against the user entering too much data by using width in the field specification. Width specifies the maximum number of characters to be read. The modified scanf statement is shown below:
scanf("%9s", str);
  • Any number of characters entered beyond the first 9 characters in the above case will be stored in buffer but not stored to str. Hence it is a recommended practice to flush after using scanf().

The scan set conversion code ([…])

  • The scan set conversion specification consists of the open bracket ([), followed by the edit characters, and terminated by the closing
    bracket (]).
  • The characters in the scan set identify the valid characters, known as the scan set, that are to be allowed in the string. All characters except the close bracket can be included in the set.
  • Edited conversion reads the input stream as a string. Each character read by scanf() is compared against the scan set. If the character just read is in the scan set, it is placed in
    the string and the scan continues.
  • The read will stop when any of the following conditions is met:
    • A character that is not in the scan set is read. If this happens on the very first character, scanf() terminates the conversion without storing anything in the string.
    • An end-of-file is detected.
    • A field width specification is included and the maximum number of characters has been read.
  • Scan set doesn’t skip the leading whitespace. Leading whitespace is either put into the string being read when the scan set contains the corresponding whitespace character or stops the conversion when it is not.
  • The non-matching character remains in the input stream for the next read operation. Hence it is advised to flush after using the scan set conversion code.
  • Example: A scanf() statement to read a string containing only digits, commas, periods, the minus sign, and a dollar sign, where the maximum number of characters in the resulting string is 10. The minus sign is placed last because a hyphen between two other characters may be interpreted as a range on some implementations, and str must have room for at least 11 characters (10 plus the terminating null). The format string for this operation would be:
scanf("%10[0123456789.,$-]", str);
  • Sometimes it is easier to specify what is not to be included in the scan set rather than what is valid. For instance, suppose that we want to read a whole line. We can do this by stating that all characters except the newline (\n) are valid. To specify invalid characters, we start the scan set with the caret (^) symbol. The caret is the negation symbol and in effect says that the following characters are not allowed in the string. To read a line, we would code the scanf as shown below.
scanf("%[^\n]", line);

printf()

  • C has four options of interest when we write strings using these print functions: the left-justify flag, width, precision, and size. The left-justify flag (-) and the width are almost always used together.
  • Justification flag
    • The justification flag (-) is used to left justify the output. It has meaning only when the width is also specified, and then only if the length of the string is less than the format width. Using the justification flag results in the output being left-justified, as shown below. If no flag is used, the justification is right.
printf("|%-30s|\n", "This is the string");

○ Output:

|This is the string            |
  • Minimum width
    • The width sets the minimum size of the string in the output. If it is used without a flag, the string is printed right-justified as shown below.
printf("|%30s|\n", "This is the string");

○ Output

|            This is the string|
  • Precision
    • C also uses the precision option to set the maximum number of characters that will be written. In the following example, we set the maximum characters to one less than the width to ensure space between the string and the next column.
printf("|%-15.14s|", "12345678901234567890");

○ Output:

|12345678901234 |

Arrays of strings

Ragged/Jagged array

Before moving on to arrays of strings, one needs to understand what a jagged (or ragged) array is. A jagged array is a 2-dimensional array in which the rows may have different numbers of columns.

Arrays of strings as Ragged array

It is obvious that an array of strings will mostly be a ragged array as it’s extremely unlikely that the strings in the array will have the same length. This can be illustrated in the following figure:

There are many ways to declare and initialize an array of strings. Some of them are as follows:

Consider the following program:

#include <stdio.h>
int main() 
{
    char* days_of_week[7];
    days_of_week[0] = "Monday";
    days_of_week[1] = "Tuesday";
    days_of_week[2] = "Wednesday";
    days_of_week[3] = "Thursday";
    days_of_week[4] = "Friday";
    days_of_week[5] = "Saturday";
    days_of_week[6] = "Sunday";
    for(int i = 0 ; i < 7 ; i++)
    printf("%s\n", days_of_week[i]);
    return 0;
}

The output of the above program:

Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
Sunday

Points to note from the above program:

  • The string array days_of_week[] is an array of character pointers.
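  • The array elements are initialized with the addresses of string literals, which are typically stored in a read-only part of the program image (often described as the text segment). These strings are not editable. Attempting to modify one is undefined behaviour and commonly crashes the process with a segmentation fault.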


The previous example program showed how to create a string array where the strings are read-only, as they are string literals. Please note that the array itself is not read-only.

At any given point, an index of the array can point to a different string but the string it is pointing to will be read-only.

At times, we need to store a string in the array which we can edit. The following program shows how to do the same:

#include <stdio.h>
#include <string.h>
int main()
{
    char days_of_week[7][10];
    strcpy(days_of_week[0], "Monday");
    strcpy(days_of_week[1], "Tuesday");
    strcpy(days_of_week[2], "Wednesday");
    strcpy(days_of_week[3], "Thursday");
    strcpy(days_of_week[4], "Friday");
    strcpy(days_of_week[5], "Saturday");
    strcpy(days_of_week[6], "Sunday");
    for(int i = 0 ; i < 7 ; i++)
    printf("%s\n", days_of_week[i]);
    return 0;
}

The output of the above program is:

Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
Sunday

Points to note from the above program:

  • Memory allocation for all the characters to be initialised is done during declaration.
  • The strings are copied (using the strcpy( ) library function, which we shall see in the next section) to the memory locations of the array, unlike the previous example where only the pointers to the strings were stored in a single-dimensional array.

String Manipulation Functions

  • A string in C is not a primitive data type, hence one cannot use operators such as ‘+’ (presumably to concatenate two strings) or ‘=’ (presumably to copy one string to another memory location). To do all of these tasks, there is a standard string library defined in C.
  • The following are some of the string manipulation functions defined in string.h:

String length

  • The string length function returns the length of a string passed as an argument to it. The length of a string is defined as the number of characters in the string till the first occurrence of the null character in the string, excluding the null character.
  • The prototype of the string length function is as follows:
    • size_t strlen(const char *s);
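  • strlen( ) is the function name and the return type is size_t, which is an unsigned integer type (commonly unsigned long on 64-bit systems). It is printed with the %zu conversion.
  • The following program demonstrates how strlen() might be implemented without using the library function. The function is named my_strlen() here because the names of standard library functions are reserved:
#include <stdio.h>
size_t my_strlen(const char *str)
{
    size_t len = 0;
    while (*str != '\0')
    {
        str++;
        len++;
    }
    return len;
}
int main()
{
    char str[100];
    scanf("%99s", str);
    printf("%zu is the length of the string.", my_strlen(str));
    return 0;
}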

The output of the above program is as follows:

Quantmasters
12 is the length of the string.

The following program illustrates library function strlen()

#include <stdio.h>
#include <string.h>
int main()
{
    char str[100];
    scanf("%s", str);
    printf("%zu is the length of the string.", strlen(str));
    return 0;
}

Output:

Quantmasters
12 is the length of the string.

String Copy

  • Often we need to copy the contents of one string variable to another. As C does not allow the assignment operator to be used on arrays to copy their contents, the string copy functions strcpy() and strncpy() come in handy.
  • The function signatures of strcpy() and strncpy() are as follows:
char *strcpy(char *dest, const char *src);
char *strncpy(char *dest, const char *src, size_t n);
  • The strcpy() function copies the string pointed to by src, including the terminating null byte (‘\0’), to the string pointed to by dest.
  • The strncpy() function is similar, except that at most n bytes of src are copied. Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated.
  • If the length of src is less than n, strncpy() writes additional null bytes to dest to ensure that a total of n bytes are written.
  • The strcpy() and strncpy() functions return a pointer to the destination string dest.
  • The following program illustrates the use of library function strcpy():
#include <stdio.h>
#include <string.h>
int main()
{
    char src[100], dest[100];
    scanf("%s", src);
    printf("%s is the dest string.", strcpy(dest, src));
    return 0;
}

Output:

HelloHi
HelloHi is the dest string.

The following program illustrates what a simple implementation of strcpy() might be (named my_strcpy() here, since standard library names are reserved):

#include <stdio.h>
char *
my_strcpy(char *dest, const char *src)
{
    size_t i;
    for (i = 0; src[i] != '\0'; i++)
        dest[i] = src[i];
    dest[i] = '\0'; /* copy the terminating null byte as well */
    return dest;
}
int main()
{
    char src[100], dest[100];
    scanf("%99s", src);
    printf("%s is the dest string.", my_strcpy(dest, src));
    return 0;
}

Output

HelloHi
HelloHi is the dest string.

The following program illustrates what a simple implementation of strncpy() might be (named my_strncpy() here, since standard library names are reserved):

#include <stdio.h>
char *
my_strncpy(char *dest, const char *src, size_t n)
{
    size_t i;
    for (i = 0; i < n && src[i] != '\0'; i++)
        dest[i] = src[i];
    for ( ; i < n; i++)
        dest[i] = '\0';
    return dest;
}
int main()
{
    char src[100], dest[10] = {'\0'};
    scanf("%99s", src);
    printf("%s is the dest string.", my_strncpy(dest, src, 9));
    return 0;
}

The output of the above programs is as follows

HelloHi
HelloHi is the dest string.

Things to note while using string copy functions in C:

Overlapping strings

  • It is important to note that the src and dest strings should not overlap. That is, src should not point to any character of dest and dest should not point to any character of src.
  • A classic example of string overlap is a call in which dest and src point into the same string at two different indices. This is an overlap, and the string copy functions should not be applied to such strings.

Size of dest string

  • The dest string should be large enough to accommodate the src string, including its terminating null byte.

Buffer overflow

  • One should use strcpy() only when it is guaranteed that the src string, including its terminating null byte, fits within the capacity of dest. If not, the copy overflows the buffer; such overflows can be exploited to overwrite important data on the stack (a return address, for example) so that it points to malicious code.
  • To avoid this, one can use strncpy() and pass the capacity of dest as the argument ‘n’, taking care to null-terminate dest afterwards, as described below.

What to use when?

  • Some programmers consider strncpy() to be inefficient and error-prone. If the programmer knows (i.e., includes code to test!) that the size of dest is greater than the length of src, then strcpy() can be used.
  • One valid (and intended) use of strncpy() is to copy a C string to a fixed-length buffer while ensuring both that the buffer is not overflowed and that unused bytes in the target buffer are zeroed out (perhaps to prevent information leaks if the buffer is to be written to media or transmitted to another process via an interprocess communication technique).
  • If there is no terminating null byte in the first n bytes of src, strncpy() produces an unterminated string in dest.
  • If buf has length buflen, you can force termination using something like the following:
strncpy(buf, str, buflen - 1);
if (buflen > 0)
    buf[buflen - 1] = '\0';
  • Of course, the above technique ignores the fact that, if src contains more than buflen – 1 bytes, information is lost in the copying to dest.

strlcpy()

  • Some systems (the BSDs, Solaris, and others) provide the following function:
size_t strlcpy(char *dest, const char *src, size_t size);
  • This function is similar to strncpy(), but it copies at most size-1 bytes to dest, always adds a terminating null byte, and does not pad the target with (further) null bytes.
  • This function fixes some of the problems of strcpy() and strncpy(), but the caller must still handle the possibility of data loss if the size is too small.
  • The return value of the function is the length of src, which allows truncation to be easily detected: if the return value is greater than or equal to size, truncation occurred.
  • If loss of data matters, the caller must either check the arguments before the call or test the function return value.
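  • strlcpy() was historically absent from glibc and not standardized by POSIX, and on Linux it was available via the libbsd library. More recent releases have added it: glibc provides it from version 2.38, and it is part of POSIX.1-2024. On older systems, libbsd remains an option.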

String compare

C provides two functions to compare two strings: strcmp() and strncmp(). These are declared in the header file string.h.

strcmp()

  • Function signature:
int strcmp(const char *s1, const char *s2);
  • The strcmp() function performs case sensitive comparison of the two strings s1 and s2. It returns an integer less than, equal to, or greater than zero if s1 is found, respectively, to be less than, to match, or be greater than s2.
  • The return value of this function depends on the comparison operation and the following table illustrates the 3 possibilities:
Return value     | Case
0                | s1 and s2 are equal
Negative integer | s1 is less than s2
Positive integer | s1 is greater than s2

The following program illustrates strcmp():

#include <stdio.h>
#include <string.h>
int main()
{
    char str1[100], str2[100];
    printf("Enter first string: ");
    scanf("%s", str1);
    printf("Enter second string: ");
    scanf("%s", str2);
    int i = strcmp(str1, str2);
    if (i == 0)
         printf("%s == %s\n", str1, str2);
    else if (i < 0)
         printf("%s < %s\n", str1, str2);
    else
         printf("%s > %s\n", str1, str2);
    return 0;
}

Output: Run #1

Enter first string: hello
Enter second string: Hello
hello > Hello

Output: Run #2

Enter first string: Hello
Enter second string: Hello
Hello == Hello

Output: Run #3

Enter first string: Hello
Enter second string: hello
Hello < hello

strncmp()

  • The strncmp() function is similar, except it compares only the first (at most) n bytes of s1 and s2. The signature is as follows:
int strncmp(const char *s1, const char *s2, size_t n);
  • The program below is from the official man page of Linux. This demonstrates the operation of strcmp() (when given two arguments) and strncmp() (when given three arguments). First, some examples using strcmp():
/* string_comp.c
Licensed under GNU General Public License v2 or later.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int
main(int argc, char *argv[])
{
    int res;

    if (argc < 3) {
        fprintf(stderr, "Usage: %s <str1> <str2> [<len>]\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    if (argc == 3)
        res = strcmp(argv[1], argv[2]);
    else
        res = strncmp(argv[1], argv[2], atoi(argv[3]));

    if (res == 0) {
        printf("<str1> and <str2> are equal");
        if (argc > 3)
            printf(" in the first %d bytes\n", atoi(argv[3]));
        printf("\n");
    } else if (res < 0) {
        printf("<str1> is less than <str2> (%d)\n", res);
    } else {
        printf("<str1> is greater than <str2> (%d)\n", res);
    }

    exit(EXIT_SUCCESS);
}

Output

$ ./string_comp ABC ABC
<str1> and <str2> are equal
$ ./string_comp ABC AB # 'C' is ASCII 67; 'C' - '\0' = 67
<str1> is greater than <str2> (67)
$ ./string_comp ABA ABZ # 'A' is ASCII 65; 'Z' is ASCII 90
<str1> is less than <str2> (-25)
$ ./string_comp ABJ ABC
<str1> is greater than <str2> (7)

And then some examples using strncmp():

$ ./string_comp ABC AB 3
<str1> is greater than <str2> (67)
$ ./string_comp ABC AB 2
<str1> and <str2> are equal in the first 2 bytes

String concatenate: strcat()

The strcat() function appends the src string to the dest string, overwriting the terminating null byte (‘\0’) at the end of dest, and then adds a terminating null byte, so the resulting string in dest is always null-terminated.

The strings may not overlap, and the dest string must have enough space for the result. The strcat() function returns a pointer to the resulting string dest. If dest is not large enough, program behaviour is unpredictable. The function signature is as follows:

char *strcat(char *dest, const char *src);

Demonstration of strcat():

#include <stdio.h>
#include <string.h>
int main() 
{
    char *src = "World";
    char dest[20];
    strcpy(dest, "Hello ");
    printf("%s", strcat(dest, src)); // strcat() returns dest
    return 0;
}

Output:

Hello World

String concatenate: strncat()

The strncat() function is similar, except that
● it will use at most n bytes from src
● src does not need to be null-terminated if it contains n or more bytes.
The function signature is as follows:

char *strncat(char *restrict dest, const char *restrict src, size_t n);

If src contains n or more bytes, strncat() writes n+1 bytes to dest (n from src plus the terminating null byte). Therefore, the size of dest must be at least strlen(dest)+n+1. A simple implementation of strncat() might be:

char *
strncat(char *dest, const char *src, size_t n) 
{
    size_t dest_len = strlen(dest);
    size_t i;
    for (i = 0 ; i < n && src[i] != '\0' ; i++)
    dest[dest_len + i] = src[i];
    dest[dest_len + i] = '\0';
    return dest;
}

Other important string manipulation functions

Exercises

  1. Write a function that accepts a string (a pointer to a character) and deletes the last character by moving the null character 1 position to the left.
  2. Write a function that accepts a string (a pointer to a character) and deletes all the trailing spaces at the end of the string. Make sure that the resultant string is terminated with the null character.
  3. Write a function that accepts a string (a pointer to a character) and deletes all the leading spaces.
  4. Write a function that returns the number of times a character is found in a string. The function has two parameters: the first parameter is a pointer to a string and the second parameter is the character to be counted.
  5. Write a function that inserts a string into another string at a specified position. It returns a positive number if it is successful or zero if it has any problems, such as an insertion location greater than the length of the receiving string. The first parameter is the receiving string. The second parameter is the string to be inserted. And the third parameter is the index of the insertion position in the first string.
  6. Write a program that extracts part of the given string from the specified position. For example, if the string is “Working with strings is fun”, then if from position 4, four characters are to be extracted, then the program should print the string as “king”. If the number of characters to be extracted is 0, then the program should print the entire string from the specified position.
  7. Write a program that converts a string like “123” to an integer 123.
  8. Write a program that generates and prints the Fibonacci words of order 0 through 5.
    • f(0) = “a”
    • f(1) = “b”
    • f(2) = “ba”
    • f(3) = “bab”
    • f(4) = “babba”
  9. To uniquely identify a book, a 10 digit ISBN (international standard book number) is used. The rightmost digit is a checksum digit. This digit is determined from the other 9 digits using the condition that d1 + 2d2 + 3d3 + … + 10d10 must be a multiple of 11 (where di denotes the ith digit from the right). The checksum digit d1 can be any value from 0 to 10; the ISBN convention is to use the value x to denote 10. Write a program that receives a 10 digit integer, computes the checksum, and reports whether the ISBN number is correct or not.
  10. A credit card number is usually a 16 digit number. A valid credit card number should satisfy a rule explained below with the help of a dummy credit card number: 4567 1234 5678 9129. Start with the second digit from the right and, moving left, multiply every other digit by 2.

Then subtract 9 from any resulting number larger than 9. Thus we get:
8 3 2 6 1 5 9 4
Add them all up to get 38.
Add all the other digits to get 42.
The sum of 38 and 42 is 80. Since 80 is divisible by 10, the credit card number is valid.
Write a program that receives a credit card number and checks using the above rule whether the credit card number is valid

Categories
Computer Science / Information Technology Language: C

Arrays

Introduction

  • An array in C is defined as the collection of similar types of data items stored at contiguous memory locations.
  • Arrays are the derived data type in C programming language which can store the primitive type of data such as int, char, double, float, etc. It also can store the collection of derived data types, such as pointers, structure, etc.
  • The array is the simplest data structure where each data element can be randomly accessed by using its index number.
  • Arrays make it a lot easier to deal with a large number of similar types of values such as marks of a student or age of students etc.
  • An array is also known as a subscripted variable.

Basic operations on an Array

Declaration

Syntax:

data_type array_identifier[size];

Where,

  • data_type is any primitive, non-primitive or user-defined data type.
  • array_identifier is any valid C identifier.
  • size is a positive integer value denoting the size of the array. Example:
int seats[10]; // Declares an array named seats which can hold 10 integers.
float age[50]; // Declares an array named age which can hold 50 floating-point numbers.
char division[5]; // Declares an array named division which can hold 5 characters.

Initialisation

  • Indexing of an array in C starts from 0. That is, to access or modify the first element of an array, one has to use the following notation:
array_identifier[0] = value;
  • Note that the value should be of the same data type as that of the array.
  • Generalizing the above syntax:
array_identifier[index] = value;
  • Where, the index can attain any value from 0 to size – 1. And this notation of accessing values using an index enclosed in square brackets is called subscript notation.
  • Eg:
#include <stdio.h>
int main()
{
    int integers[5];
    integers[0] = 0;
    integers[1] = 1;
    integers[2] = 2;
    integers[3] = 3;
    integers[4] = 4;
    for(int i = 0 ; i < 5 ; i ++)
    printf("%d ", integers[i]);
    return 0;
}

output:

0 1 2 3 4
  • It can be noted from the example that array elements can be accessed through a loop as well. Similarly, one can initialize the array using a loop. It can be illustrated in the following example:
#include <stdio.h>
int main()
{
    int integers[5];
    for(int i =0 ; i < 5 ; i ++)
    integers[i] = i;
    for(int i =0 ; i < 5 ; i ++)
    printf("%d ", integers[i]);
    return 0;
}
  • In this particular case, we initialize the array elements to their corresponding indices. That is, integers[0] is 0, integers[1] is 1 and so forth. This produces the same output as the previous example.
  • It is good practice to use a preprocessor directive to define the size of the array. This helps a lot when we want to change the size of the array. It is illustrated in the following example:
#include <stdio.h>
#define SIZE 5
int main()
{
    int integers[SIZE];
    for(int i =0 ; i < SIZE ; i ++)
    integers[i] = i;
    for(int i =0 ; i < SIZE ; i ++)
    printf("%d ", integers[i]);
    return 0;
}
  • This produces the same output as previous examples but is much more maintainable. If one has to change the size of the array, a modification to the preprocessor directive takes care of the rest. This reduces the errors and makes debugging easier.
  • C also provides a way to declare and initialize an array in a single statement. The following is the syntax to achieve the same:
#include <stdio.h>
#define SIZE 5
int main()
{
    int integers[SIZE]={0, 1, 2, 3, 4};
    for(int i =0 ; i < SIZE ; i ++)
    printf("%d ", integers[i]);
    return 0;
}
  • This also produces the same output as the previous examples in the section. An advantage of using this method is that the programmer need not mention SIZE while declaring. The compiler automatically calculates the size, allocates memory and initializes the given values. This can be illustrated using the following example:
#include <stdio.h>
#define SIZE 5
int main()
{
    int integers[]={0, 1, 2, 3, 4};
    for(int i =0 ; i < SIZE ; i ++)
    printf("%d ", integers[i]);
    return 0;
}
  • This produces the same output as previous examples. That is:
0 1 2 3 4
  • Until the array elements are given specific values, they contain indeterminate (garbage) values, because an array declared inside a function is an auto variable by default (see Storage classes). Reading such values is not reliable, as the following program shows:
#include <stdio.h>
#define SIZE 5
int main()
{
    int integers[SIZE];
    for(int i =0 ; i < SIZE ; i ++)
    printf("%d ", integers[i]);
    return 0;
}
  • The above program produces a random garbage output such as this:
0 0 -498802560 21903 161117280
  • Also, note that trying to access an index outside the range 0 to size – 1 of the array is undefined behaviour and leads to unpredictable results. The program may print a garbage value or crash with a segmentation fault. The same is illustrated in the following example:
#include <stdio.h>
int main()
{
    int array[] = {1, 2, 3, 4, 5};
    printf("%d", array[6]);
    return 0;
}
  • In the above example, anything other than the values from 0 to 4 would result in an unpredictable output. It is the sole responsibility of the programmer to take care that the written program doesn’t go out of the bounds of the array. C language doesn’t assist in bounds checking.

Properties of an Array

  • Each element of an array is of the same data type and carries the same size, e.g., each element of an int array typically occupies 4 bytes.
  • Elements of the array are stored at contiguous memory locations where the first element is stored at the smallest memory location. Consider the following example:
array[] = {1, 2, 3, 4, 5};
  • The memory representation of the above array would be as follows assuming that the array is starting at 65508 and running on a 16-bit compiler:
  • Elements of the array can be randomly accessed since we can calculate the address of each element of the array with the given base address and the size of the data element.

Advantages of an Array

  1. Code optimisation: Less code to access the data.
  2. Ease of traversing: By using the for loop, we can retrieve the elements of an array easily.
  3. Ease of sorting: To sort the elements of the array, we need a few lines of code only.
  4. Random Access: We can access any element randomly using the array.

Passing Array Elements to a Function

  • Since an array is a collection of contiguous and independent elements, each with a dedicated address, it is possible to pass elements of an array individually to a function, either by value or by reference.
  • This is done by passing values or references to functions using subscript notation. The following example illustrates this:
#include <stdio.h>
void print_by_value(int value) 
{
    printf("%d ", value);
}

void print_by_reference(int* value) 
{
    printf("%d ", *value);
}
int main() 
{
    int array[] = {1, 2, 3, 4, 5};

    printf ("Print by value:\t\t");
    for (int i = 0 ; i < 5 ; i++)
    print_by_value (array[i]);
    
    printf ("\nPrint by reference:\t");
    for (int i = 0 ; i < 5 ; i++)
    print_by_reference (&array[i]);
    
    return 0;
}
  • The output of the above example is:
Print by value: 1 2 3 4 5
Print by reference: 1 2 3 4 5
  • Points to note from the above example:
    • The print_by_value( ) accepts an integer value and prints the same value. A local copy of the value being passed to the function is made in the function. Any changes made to the variable inside this function will not result in a change in the array in the main( ) function.
    • The print_by_reference( ) accepts the address of an integer value and stores it in a local pointer. Any changes made to the value being pointed by this pointer will result in a value change in the array of the main( ) function.

Pointers and Arrays

Before jumping into this section, please make sure you understand pointer arithmetic. Pointer arithmetic is discussed in detail in the ‘Pointers’ section. Till now we were accessing array elements using subscript notation. The elements can be accessed using pointers too. Consider the following example:

#include <stdio.h>
int main()
{
    int array[] = {1, 2, 3, 4, 5};
    int* ptr = &array[0];
    if (ptr == array)
    printf("%p\n%p", ptr, array);
    return 0;
}

Output:

0x7ffc0bd9e810
0x7ffc0bd9e810

A very important conclusion can be drawn from the above example. In most expressions, the array name is converted to a pointer to the first element of the array. To reinforce this fact, consider the following example:

#include <stdio.h>
int main()
{
     int array[] = {1, 2, 3, 4, 5};
     int* ptr = &array[0];
     printf("%d", ptr[2]);
     return 0;
}

The output of the above program is:

3

So now, it is a fact that any pointer to the first element of the array can be a replacement for the array variable used earlier. With this fact as the premise, consider the following example:

#include <stdio.h>
int main() 
{
    int array[] = {1, 2, 3, 4, 5};
    for (int i = 0 ; i < 5 ; i++)
    printf("%d ", *(array+i));

    return 0;
}

The output of the above program is:

1 2 3 4 5

The subscript notation is converted to pointer notation internally. i.e.,
array[i] = *(array + i)
So it is clear from the above example that we can access array elements through pointer notation. And it is also clear that a pointer when incremented always points to an immediately next location of its type.

Another way of traversing all the elements of an array using pointer notation is illustrated in the following example:

#include <stdio.h>
int main() 
{
    int array[] = {1, 2, 3, 4, 5};
    const int* traverse_ptr = &array[0];
    for (int i = 0 ; i < 5 ; i++, traverse_ptr++)
    printf("%d ", *traverse_ptr);
    return 0;
}

Distinction between a pointer and an array

After going through the above examples, one may jump to a conclusion that there is no difference between an array and a pointer but there is.

  • An array has memory allocated to it and a pointer may or may not.
  • &array yields the address of the whole array, which is numerically the same as the address of its first element; there is no separate array variable that stores that address. In contrast, &ptr gives the address of the ptr variable itself, which in turn stores the address of the first element of the array it points to.
  • In most expressions an array name is implicitly converted to a pointer to its first element. This conversion is called “array decay”. The operands of & and sizeof are exceptions, which is why &array and sizeof(array) refer to the whole array.
  • A common symptom of array decay is that, once an array has been passed to a function, sizeof applied to the parameter returns the size of a pointer rather than the size of the memory allocated to the array.
  • These differences can be noticed in the following example program:
#include <stdio.h>
int main() 
{
    int array[] = {1, 2, 3, 4, 5};
    int* ptr = &array[0];
    printf("Address of ptr: %p\n", (void *)&ptr);
    printf("Address of array: %p\n", (void *)&array);
    printf("sizeof of ptr: %zu\n", sizeof(ptr));
    printf("sizeof of array: %zu\n", sizeof(array));
    return 0;
}

The output of the above program on one 32-bit system is as follows (the addresses will vary):

Address of ptr: 0061FF08
Address of array: 0061FF0C
sizeof of ptr: 4
sizeof of array: 20

Best practices

  1. Making array elements write protected by using ‘const int*’ for the traversing pointer helps prevent accidental data modification while traversing.
  2. Accessing array elements by pointers was traditionally considered faster than accessing them by subscripts. Modern optimizing compilers usually generate equivalent code for both, so readability should guide the choice.
  3. Array elements should be accessed using pointers if the elements are to be accessed in a fixed order, say from beginning to end, or from end to beginning, or every alternate element or any such definite logic.
  4. Instead, it would be easier to access the elements using a subscript if there is no fixed logic in accessing the elements.

Passing an Entire Array to a Function

At times there arise situations such as array sorting, where we need to pass entire arrays to functions instead of individual elements.
The following table contains the statements showing 2 ways to pass an array as arguments and 2 ways to receive an array as a parameter in the called function.

Consider that the function being called is named ‘foo’. Note that the C language does not have explicit library functions to determine the size of the array being passed. So one needs to make size available to all functions by declaring it at global space or pass the size as another parameter as shown in the table:

Calling statement           Called function prototype
foo(array, size);           foo(int *array, int size)
or                          or
foo(&array[0], size);       foo(int array[], int size)

Notice that when an array is passed to a function, the array name is converted to a pointer to its first element. In both of the calling ways described in the above table, only the address of the first element of the array is passed, and both prototypes declare the same parameter type, int *. Once an array is passed to the function, the array can be traversed in both subscript notation and pointer notation.

One can choose pairs of any combination from the above table to pass the array. The following example illustrates all 4 possibilities from the above table.

#include <stdio.h>
void print_array_1 (int array[], int size) 
{
    for (int i = 0 ; i < size ; i++)
        printf("%d ", array[i]);
}
void print_array_2 (int* array, int size) 
{
    for (int i = 0 ; i < size ; i++)
        printf("%d ", *(array+i));
}
void print_array_3 (int array[], int size) 
{
    for (int i = 0 ; i < size ; i++)
        printf("%d ", array[i]);
}
void print_array_4 (int* array, int size) 
{
    for (int i = 0 ; i < size ; i++)
        printf("%d ", array[i]);
}
int main() 
{
    int array[] = {1, 2, 3, 4, 5};
    print_array_1 (array, 5);
    printf("\n");
    print_array_2 (array, 5);
    printf("\n");
    print_array_3 (&array[0], 5);
    printf("\n");
    print_array_4 (&array[0], 5);
    return 0;
}

Output:

1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5

Notice that in the print_array_2( ) function, the array is being traversed using pointer notation and there is no change in the output.

It is very important to note that any change made to the array in the called function is visible in the caller too, because the function receives the memory location of the array and not a copy of it. The arrays we saw in this section are often called ‘static arrays’ because their size is known at compile time. When such an array is declared inside a function, as in these examples, it is typically stored on the program stack.

Declaration

The syntax to declare a 2 dimensional array is:
data_type array_id[row_size][column_size];
Where,

  1. data_type is any valid primary or secondary data type supported by C.
  2. array_id is any valid identifier for the array.
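  3. row_size is the number of rows needed.
  4. column_size is the number of columns needed.
    Example: int array[2][3];
    The above array can be visualized as follows:

         0        1        2
    +--------+--------+--------+
0   | [0][0] | [0][1] | [0][2] |
    +--------+--------+--------+
1   | [1][0] | [1][1] | [1][2] |
    +--------+--------+--------+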

Points to note:

  • Indexing of row and column starts from 0 and ends at length-1. In the above example, the row index is in the range [0, 1] and the column index is in the range [0, 2].
  • Indexing of each cell in the two-dimensional array starts with the row index followed by the column index. So to address the 0th row 1st column, the index is [0][1].

Initialization

To assign a value to a memory location in the array, the syntax is:

array_id [i][j] = value;

Where

  1. array_id is the declared array.
  2. i is the row index, in the range 0 to row_size - 1 inclusive.
  3. j is the column index, in the range 0 to column_size - 1 inclusive.
  4. value is the value to be stored at the ith row and jth column of the array; it should be of the data type with which the array is declared.

Here is a sample program that stores roll numbers and marks obtained by a student side by side in a matrix.

#include <stdio.h>
int main( ) 
{
    int students[2][2] ;
    students[0][0] = 1;
    students[0][1] = 96;
    students[1][0] = 2;
    students[1][1] = 93;

    printf("Roll number: %d\tMarks: %d\n", students[0][0],
    students[0][1]);
    printf("Roll number: %d\tMarks: %d\n", students[1][0],
    students[1][1]);
    return 0;
}

The output of the above program is:

Roll number: 1 Marks: 96
Roll number: 2 Marks: 93

To understand the memory locations of the array in the above program, let us print the memory locations of the array:

#include <stdio.h>
int main( )
{
     int students[2][2] ;
     students[0][0] = 1;
     students[0][1] = 96;
     students[1][0] = 2;
     students[1][1] = 93;

     printf("%p\t%p\n", (void *) &students[0][0], (void *) &students[0][1]);
     printf("%p\t%p\n", (void *) &students[1][0], (void *) &students[1][1]);
     return 0;
}

The output (compare the last two hexadecimal digits of each address):

0x7ffc2873dac0 0x7ffc2873dac4
0x7ffc2873dac8 0x7ffc2873dacc

Consecutive addresses differ by 4 because the size of int is 4 bytes on the platform used here. By analyzing the above output, it can be concluded that a two-dimensional array is stored as one contiguous block of row size * column size elements, one row after another (row-major order), and is merely indexed by rows and columns.

In the previous example, each element was assigned individually, which quickly becomes tedious; often the values must instead be read from the user. This can be achieved by combining scanf with loops. The following example illustrates the same:

#include <stdio.h>
int main( )
{
    int students[2][2] ;
    for (int i = 0 ; i < 2 ; i++)
    {
        printf("Enter the roll number: ");
        scanf("%d", &students[i][0]);
        printf("Enter the marks: ");
        scanf("%d", &students[i][1]);
    }
    for (int i = 0 ; i < 2 ; i++)
        printf("Roll number: %d\tMarks: %d\n", students[i][0], students[i][1]);
    return 0;
}

The output of the above program is:

Enter the roll number: 1
Enter the marks: 93
Enter the roll number: 2
Enter the marks: 96
Roll number: 1 Marks: 93
Roll number: 2 Marks: 96

The above example employs a for loop to iterate over rows as there are only 2 columns. When there are numerous columns, one can use nested loops as done in the following example:

#include <stdio.h>
#define ROWS 4
#define COLS 6
int main( )
{
    int array[ROWS][COLS] ;
    printf("Enter %d values: ", ROWS * COLS);
    for (int i = 0 ; i < ROWS ; i++)
    {
        for (int j = 0 ; j < COLS ; j++)
        {
            scanf("%d", &array[i][j]);
        }
    }
    printf("\nEntered values:\n");
    for (int i = 0 ; i < ROWS ; i++)
    {
        for (int j = 0 ; j < COLS ; j++)
        {
            printf("%d ", array[i][j]);
        }
        printf("\n");
    }
    return 0;
}

The output of the above program is:

Enter 24 values: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

Entered values:
1 2 3 4 5 6
7 8 9 10 11 12
13 14 15 16 17 18
19 20 21 22 23 24

Declaration and initialisation

Declaration and initialization of an array can be done in a single statement. The syntax is as follows:

data_type array_id[row_size][column_size] = {
    {value_00, value_01, ..., value_0n},
    {value_10, value_11, ..., value_1n},
    …
    {value_m0, value_m1, ..., value_mn},
};

Where,

  1. data_type is the data type of the value that has to be stored in the array.
  2. array_id is the identifier of the array being declared and initialized.
  3. value_ij are values of the array, where 0 <= i < desired row size of the array and 0 <= j < desired column size of the array

Another way to declare and initialize is as follows:

data_type array_id[row_size][col_size] = {
    value_00, value_01, ..., value_0n,
    value_10, value_11, ..., value_1n,
    …
    value_m0, value_m1, ..., value_mn,
};

One may notice that only the nested braces are removed. In both methods the column size is mandatory; the row size may be omitted, in which case the compiler counts the rows from the initialiser. The second method often makes the compiler warn that braces are missing. Hence the first method is preferred.

The following example illustrates declaration and initialisation:

#include <stdio.h>
#define ROWS 3
#define COLS 2
int main( ) {
    int array[ROWS][COLS] = {{1, 2}, {3, 4}, {5, 6}};
    for (int i = 0 ; i < ROWS ; i++) {
        for (int j = 0 ; j < COLS ; j++) {
            printf("%d ", array[i][j]);
        }
        printf("\n");
    }
    return 0;
}

Output:

1 2
3 4
5 6

Please note that the row size in the declaration is optional, as the compiler can determine the number of rows from the initialisation part. The column size, however, is mandatory.

Jagged arrays

Jagged arrays are two-dimensional arrays where the column length is not constant throughout the rows. The following is a pictorial representation of a jagged array:

         0       1         2        3
    +--------+--------+
0   | [0][0] | [0][1] |
    +--------+--------+--------+
1   | [1][0] | [1][1] | [1][2] |
    +--------+--------+--------+
2   | [2][0] |
    +--------+--------+--------+--------+
3   | [3][0] | [3][1] | [3][2] | [3][3] |
    +--------+--------+--------+--------+

One might intuitively try to create a jagged array of the above sort by the method just discussed in the following way:

int array[][] = {{1, 2}, {3}, {5, 6}};

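Please note that this does not compile, because the column size is missing. Even with a column size, as in int array[][2], the compiler pads every short row with zeros, so all rows still have the same length. This is NOT the way to create jagged arrays in C. Jagged arrays can be created using dynamically allocated memory, which is discussed in the latter part of this section.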

Pointers and two-dimensional arrays

As mentioned earlier, two-dimensional arrays are nothing but a contiguous collection of single-dimensional arrays. The C language embodies this unusual but powerful capability that it can treat parts of arrays like arrays. More specifically, each row of a two-dimensional array can be thought of as a one-dimensional array. This is a very important fact if we wish to access array elements of a two-dimensional array using pointers.
Thus, the declaration,

int array_id[5][2] ;

can be thought of as setting up an array of 5 elements, each of which is a 1D (one-dimensional) array containing 2 integers. We refer to an element of a 1D array using a single subscript.

Similarly, if we can imagine array_id to be a one-dimensional array then we can refer to its zeroth element as array_id[0], the next element as array_id[1] and so on. More specifically, array_id[0] gives the address of the zeroth one-dimensional array, array_id[1] gives the address of the first one-dimensional array and so on. This fact can be demonstrated by the following program.

#include <stdio.h>
#define ROWS 4
#define COLS 2
int main( )
{
    int array_id[ROWS][COLS] = {
                                { 1234, 56 },
                                { 5678, 33 },
                                { 9012, 80 },
                                { 3456, 78 }
                            } ;
    for (int i = 0 ; i < ROWS ; i++ )
        printf ( "\nAddress of %d th 1-D array = %p", i, (void *) array_id[i]) ;

    return 0;
}

The output of the above program when run on a platform where int is 4 bytes:

Address of 0 th 1-D array = 0x7fffe85bda40
Address of 1 th 1-D array = 0x7fffe85bda48
Address of 2 th 1-D array = 0x7fffe85bda50
Address of 3 th 1-D array = 0x7fffe85bda58

Points to note from the above example:

  • Compiler figures out the number of rows (4) and columns (2) from the declaration and initialisation line of code.
  • The compiler allocates the necessary amount of memory (8 bytes for 2 integers in the first row) linearly for the first row (from 40 to 44 for the first element, 44 to 48 for the second element in the above example).
  • Once allocated memory for the first row, the compiler allocates memory for the second row contiguously next to the first row’s memory location (from 48 to 4c for the first element, 4c to 50 for the second element in the above example)
  • The above pattern is followed for the 3rd and 4th row as well.

Pointer access to elements in a 2D array

Similar to the one-dimensional array, there are 2 ways to access the elements of a 2D array: subscript notation and pointer notation. We have been using the subscript access method so far and in this section, we shall peek into the pointer access method.
Syntax:

*(*(array_id + row_index) + column_index)

The evolution of the above syntax from subscript notation is as follows:

array_id[row_index][column_index]
* ( array_id[row_index] + column_index )
* ( * ( array_id + row_index ) + column_index)

Notice that the pointer access method is just a multi-level dereferencing of array element addresses.

Consider the following example illustrating array element access using the pointer method:

#include <stdio.h>
#define ROWS 4
#define COLS 2
int main( )
{
    int array[ROWS][COLS] = {
                                { 1234, 56 },
                                { 1212, 33 },
                                { 1434, 80 },
                                { 1312, 78 }
                            } ;

    for (int i = 0 ; i < ROWS ; i++ )
    {
        for (int j = 0 ; j < COLS ; j++)
            printf ("\nAddress: %p Value: %d\t", (void *) (*(array + i) + j),
                    *(*(array + i) + j) );
        printf("\n");
    }
    return 0;
}

Output when run on a machine where int is 4 bytes:

Address: 0x7ffc2a646d30 Value: 1234
Address: 0x7ffc2a646d34 Value: 56

Address: 0x7ffc2a646d38 Value: 1212
Address: 0x7ffc2a646d3c Value: 33

Address: 0x7ffc2a646d40 Value: 1434
Address: 0x7ffc2a646d44 Value: 80

Address: 0x7ffc2a646d48 Value: 1312
Address: 0x7ffc2a646d4c Value: 78

Pointer to a 2D array

In this section, we shall explore how we can have a pointer to a 2-dimensional array. Consider the following example:

#include <stdio.h>
#define ROWS 4
#define COLS 2
int main( )
{
    int array[ROWS][COLS] = {
                                { 1234, 56 },
                                { 1212, 33 },
                                { 1434, 80 },
                                { 1312, 78 }
                            } ;

    int ( *p )[2] ;
    int i, j, *pint ;
    for ( i = 0 ; i <= 3 ; i++ )
    {
        p = &array[i] ;
        pint = (int *) p ;   /* cast needed: p is a pointer to an array of 2 ints */
        for ( j = 0 ; j <= 1 ; j++ )
            printf ( "%d ", *( pint + j ) ) ;
        printf ( "\n" ) ;
    }
    return 0;
}

The output:

1234 56
1212 33
1434 80
1312 78

Here p is a pointer to a single-dimensional array of two integers. Note that the parentheses in the declaration of ‘p’ are necessary. The absence of them would make ‘p’ an array of 2 integer pointers. In the outer for loop each time we store the address of a new one-dimensional array. Thus the first time through this loop p would contain the address of the zeroth 1-D array. This address is then assigned to an integer pointer ‘pint’. Lastly, in the inner ‘for’ loop using the pointer ‘pint’, we have printed the individual elements of the 1-D array to which p is pointing.

The entity pointer to an array is immensely useful when we need to pass a 2-D array to a function. This is discussed in the next section.

Passing 2-D Array to a Function

There are three ways in which we can pass a 2-D array to a function. These are illustrated in the following program.

#include <stdio.h>
#define ROWS 4
#define COLS 2
void display ( int *array)
{
    int i, j ;
    for ( i = 0 ; i < ROWS ; i++ )
    {
        for ( j = 0 ; j < COLS ; j++ )
            printf ( "%d ", * ( array + i * COLS + j ) ) ;
        printf ( "\n" ) ;
    }
    printf ( "\n" ) ;
}
void show ( int ( *array )[COLS] )
{
    int i, j ;
    int *p ;
    for ( i = 0 ; i < ROWS ; i++ )
    {
        p = * ( array + i ) ;   /* address of the first int of row i */
        for ( j = 0 ; j < COLS ; j++ )
            printf ( "%d ", * ( p + j ) ) ;
        printf ( "\n" ) ;
    }
    printf ( "\n" ) ;
}
void print ( int array[ ][COLS] ) 
{
    for ( int i = 0 ; i < ROWS ; i++ )
    {
        for ( int j = 0 ; j < COLS ; j++ )
            printf ( "%d ", array[i][j] ) ;
        printf ( "\n" ) ;
    }
    printf ( "\n" ) ;
}
int main( )
{
    int array[ROWS][COLS] = {
                            { 1234, 56 },
                            { 1212, 33 },
                            { 1434, 80 },
                            { 1312, 78 }
    } ;
    display ( &array[0][0] ) ;   /* base address as a plain int pointer */
    show(array);
    print(array);
    return 0;
}

The output of the above program is as follows:

1234 56
1212 33
1434 80
1312 78

1234 56
1212 33
1434 80
1312 78

1234 56
1212 33
1434 80
1312 78

display( )

In the display( ) function we have collected the base address of the 2-D array being passed to it in an ordinary int pointer. Then through the two for loops using the expression

* ( array + i * COLS + j )

we have reached the appropriate element in the array. Suppose i is equal to 2 and j is equal to 1, then we wish to reach the element array[2][1], which lies 2 * 2 + 1 = 5 elements past the base address. A more general formula for accessing each array element would be:

* ( base address + row no. * no. of columns + column no. )

show( )

In the show( ) function we have defined array to be a pointer to an array of 2 integers through the declaration:

int ( *array )[COLS] ; // Where COLS is 2

To begin with, array holds the base address of the zeroth 1-D array. This address is then assigned to ‘p’, which is an int pointer, and then using this pointer all elements of the zeroth 1-D array are accessed. Next time through the loop, when ‘i’ takes a value of 1, the expression array + i fetches the address of the first 1-D array. This is because array is a pointer to the zeroth 1-D array, and adding 1 to it gives us the address of the next 1-D array. This address is once again assigned to ‘p’ and using it all elements of the next 1-D array are accessed.

print( )

In the third function print( ), the declaration of the array looks like this:

int array[ ][COLS] ; // Where COLS is 2

This is the same as

int ( *array )[COLS],

where array is a pointer to an array of 2 integers. The only advantage is that we can now use the more familiar expression array[i][j] to access array elements. We could have used the same expression in the show( ) function as well.

An array can also hold pointers as its elements. Consider the following example of an array of integer pointers:

#include <stdio.h>
int main( ) 
{
    int *array[4] ; // array of integer pointers
    int p = 101, q = 102, r = 103, s = 104;
    array[0] = &p ;
    array[1] = &q ;
    array[2] = &r ;
    array[3] = &s ;

    printf ( "Address of p : %p\n", (void *) &p ) ;
    printf ( "Address of q : %p\n", (void *) &q ) ;
    printf ( "Address of r : %p\n", (void *) &r ) ;
    printf ( "Address of s : %p\n\n", (void *) &s ) ;

    for ( int i = 0 ; i <= 3 ; i++ ) 
    {
        printf ( "array[%d] : %p\n", i, (void *) array[i] ) ;
        printf ( "Value : %d\n\n", *array[i] );
    }
    return 0;
}

Output:

Address of p : 0x7fff0c6dd45c
Address of q : 0x7fff0c6dd460
Address of r : 0x7fff0c6dd464
Address of s : 0x7fff0c6dd468

array[0] : 0x7fff0c6dd45c
Value : 101

array[1] : 0x7fff0c6dd460
Value : 102

array[2] : 0x7fff0c6dd464
Value : 103

array[3] : 0x7fff0c6dd468
Value : 104

As one can observe, array contains the addresses of the isolated int variables p, q, r and s. The for loop in the program picks up the addresses present in the array and prints the values present at these addresses.

An array of pointers can even contain the addresses of other arrays. Consider the following example:

#include <stdio.h>
int main( )
{
    int a[ ] = { 0, 1, 2, 3, 4 } ;
    int *p[ ] = { a, a + 1, a + 2, a + 3, a + 4 } ;
    printf ( "%p %p %d", (void *) p, (void *) *p, * ( *p ) ) ;
    return 0;
}

Output:

0x7ffc5cf1dd10 0x7ffc5cf1dcf0 0

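Here p is the address of the pointer array itself, *p is its first entry (the address of a[0]), and *(*p) is the value stored there. Remember that a subarray of another array is also an array.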

Memory allocation

C gives two choices when a programmer wants to reserve memory for arrays. But before diving in, it is advised to go through “Memory layouts in C” under the “Advanced topics in Pointers” of the “Pointers” section.

  1. Static memory allocation
    1. Static memory allocation requires that the declaration and definition of memory be fully specified in the source program. The number of bytes reserved cannot be changed during runtime. This is the technique we have been using to allocate memory for variables, arrays, and pointers.
  2. Dynamic Memory allocation
    1. Dynamic memory allocation uses predefined functions to allocate and release memory for data while the program is running. It effectively postpones the data definition, but not the data declaration, to run time. It is important to note that we can refer to memory allocated in the heap only through a pointer.
      To use dynamic memory allocation, we use either standard data types or derived types that we have previously declared. Unlike static memory allocation, dynamic memory allocation has no identifier associated with it; it has only an address that must be used to access it.

Memory allocation functions.

There are 4 memory management functions:

  1. malloc ( )
  2. calloc ( )
  3. realloc ( )
  4. free ( )

The first three are used for memory allocation and the fourth is used to return memory when it is no longer needed. All 4 functions are found in the standard library file (stdlib.h).

Block memory allocation (malloc)

The malloc function allocates a block of memory that contains the number of bytes specified in its parameter. It returns a void pointer to the first byte of the allocated memory. The allocated memory is not initialized. The malloc ( ) function declaration is shown below:

void* malloc (size_t size);

The type size_t is defined in several header files including stdio.h, stddef.h and stdlib.h. It is an unsigned integer type and, being the type of the result of the sizeof operator, it is large enough to represent the size of any object.
To provide portability, the size specification in malloc’s actual parameter is generally computed using sizeof operator. For example, if we want to allocate an integer in the heap we code the call as shown below:

pInt = malloc (sizeof(int));

If the memory allocation fails, instead of returning a pointer to the first byte of allocated memory, malloc ( ) returns a NULL pointer. An attempt to allocate memory from the heap when memory is insufficient is known as overflow. It is always the best practice for the program to check for memory overflow.

There is one subtlety to note here. As mentioned earlier, malloc returns a void pointer but often we need a pointer of a specific type. In C, a void pointer is converted implicitly to any other object pointer type on assignment, so the cast is optional (it is required only if the code must also compile as C++). Some programmers still write the cast explicitly, which is done as follows:

pointer = (type *) malloc (size);

For example:

pInt = (int *) malloc (sizeof(int));

The malloc ( ) function has one more potential error. If we call malloc ( ) with a zero size, the results are unpredictable. It may return a NULL pointer, or it may return some other implementation-dependent value. Never call malloc with a zero size.

The following example shows the best way to use malloc ( ) to allocate a block of memory for an integer:

if (!(pInt = (int *) malloc (sizeof(int)))) {
    // No memory is available.
    exit(100);
}
// Memory available

Contiguous memory allocation (calloc)

The second memory allocation function, calloc, is primarily used to allocate memory for arrays. It differs from malloc in two ways: it takes the number of elements and the size of each element as separate parameters, and it sets every byte of the allocated memory to zero. The calloc function declaration is shown below:

void *calloc (size_t element_count, size_t element_size);

The result is the same for both malloc and calloc when overflow occurs and when a zero size is given. A sample calloc call is illustrated in the following example. Here we attempt to allocate memory for an array of 200 integers:

if (!(pInt = (int *) calloc (200, sizeof(int)))) {
   // No memory is available.
   exit(100);
}
// Memory available

Reallocation of memory (realloc)

The realloc ( ) function can be highly inefficient and therefore should be used advisedly. When given a pointer to a previously allocated block of memory, realloc ( ) changes the size of the block by deleting or extending the memory at the end of the block. If the memory cannot be extended because of other allocations, realloc allocates a completely new block, copies the existing memory allocation to the new allocation, and deletes the old allocation.
The programmer must ensure that any other pointers to the data are correctly changed.

The realloc function declaration is shown below:

void *realloc (void* ptr, size_t newSize);

Releasing memory (free)

C has no automatic garbage collection: memory allocated by the above 3 functions must be released explicitly by the programmer once it is no longer needed. This is done using the free( ) function. Calling free( ) with a null pointer is allowed and does nothing.
It is an error to:

  1. Pass free( ) a pointer to a location other than the first byte of an allocated block of memory.
  2. Pass free( ) a pointer that was not returned by malloc( ), calloc( ) or realloc( ).
  3. Free the same memory location more than once.

It is also an error to refer to memory after it is freed (dangling pointer). The function declaration statement for free is shown below:
void free(void* ptr);

Exercises

  1. Eleven integer values are entered from the keyboard. The first 10 are to be stored in an array. Search for the 11th integer in the array. Write a program to display the number of times it appears in the array.
  2. Read the size of the array and allocate an array dynamically using calloc. Read the array elements and also read a key value. Display the number of times the key has appeared in the array. Do not forget to free the allocated memory before exiting the program.
  3. Implement the Selection Sort, Bubble Sort and Insertion sort algorithms on a set of 10 numbers.
  4. Read the size of the array and allocate an array dynamically using calloc. Read the array elements and sort the array using selection sort, bubble sort and insertion sort. Do not forget to free the allocated memory before exiting the program.
  5. Implement the Sieve of Eratosthenes and print the prime numbers up to 100. The algorithm is as follows:
    1. Fill array values[100] with numbers from 1 to 100.
    2. Starting with the second entry in the array, set all its multiples to zero.
    3. Proceed to the next non-zero element and set all its multiples to zero.
    4. Repeat step 3 till you have set up the multiples of all the non-zero elements to zero.
    5. After step 4, all the non-zero entries left in the array would be prime numbers, so print out these numbers.
  6. Write a program to copy the contents of one array into another in reverse order.
  7. Write a program to check if an array is symmetric. That is, in an array array_id check if array_id[0] = array_id[n-1], array_id[1] = array_id[n-2] and so on, where n is the size of the array. Statically initialize the array or take user input.
  8. Find the smallest number, the largest number, and an average of values in a floating numbers array using pointers.
  9. Write a program that performs the following tasks:
    1. initialize an integer array of 10 elements in main( )
    2. pass the entire array to a function modify( )
    3. in modify( ) multiply each element of the array by 4
    4. return the control to main( ) and print the new array of elements in main( )
  10. Write a function to find the norm of a matrix. The norm is defined as the square root of the sum of squares of all elements in the matrix.
  11. Write a program to read 10 coordinates. Each coordinate contains 2 floating-point values X and Y. Print the two farthest points and the two nearest points in the array.

Loops

Oftentimes we find ourselves writing programs to do repetitive tasks. After all, the strength of a computer is that it can perform the same task millions of times a second without error. Loops play a very important role in achieving this objective. C provides three types of loops:

  1. for
  2. while
  3. do-while

These three types can be differentiated as entry controlled (for and while) and exit controlled (do-while), but more on this later.

For Loop

The for loop is the most popular way of using a loop instruction simply because of its ease. The syntax of the for loop is as follows:

for (initialisation ; condition ; updation)
{
      Statements to execute…
}

The following example illustrates a way to use for loop:

#include <stdio.h>
int main() 
{
    int i;
    for (i = 1 ; i <= 10 ; i++)
        printf("%d ", i);
    return 0;
}

The above program prints the first 10 natural numbers. The following steps elucidate how the for statement gets executed:

  • When the for statement is executed for the first time, the value of counter ‘i’ is set to an initial value 1.
  • Now the condition is checked. The condition set is i <= 10 which happens to be true for the first iteration.
  • Since the condition is satisfied, the body of the loop, i.e., printf() in this case, is executed.
  • Once the body of the loop is executed, the update part of the for statement is executed where i gets incremented by 1 and then the condition is checked.
  • These iterations go on until the condition fails (or the control encounters a ‘break’ statement which we shall see in a short while).

Points to note

  • The scope of the variable ‘i’ is the main() function.
  • For loop is entry controlled loop: the condition is checked at the beginning of the loop and not at the end
  • The minimum number of times the body of the loop gets executed is zero. It is the case where the condition fails in the very first iteration.
  • The for loop allows us to specify three things about a loop in a single line, hence it is convenient to use:
    • Loop counter declaration/initialisation.
    • Loop termination condition.
    • Incrementation of loop counter at the end of each iteration of the loop.
  • The following flow chart summarizes the discussion:
    [Figure: for loop flow diagram]

There is a cleaner way to write this, available since C99, which declares the counter inside the for statement itself:

#include <stdio.h>
int main()
{
    for (int i = 1 ; i <= 10 ; i++)
        printf("%d ", i);
    return 0;
}

Here, though the program produces the same output as earlier, the scope of the variable ‘i’ is just the for loop.
Similar to conditional statements such as if, if-else and the if-else-if ladder, the default body of any loop is the single statement immediately following it. In the above example, it’s the printf() statement. If multiple statements are to be executed inside the loop, one can enclose them in braces, as demonstrated in the following example:

#include <stdio.h>
int main() {
    for (int i = 1 ; i <= 10 ; i++)
    {
        printf("%d", i);
        printf("\n");
    }
    return 0;
}

It is important to note that the initialization, testing and incrementation part of a for loop can be replaced by any valid expression. For example:

int i = 0;
for (printf("Hello") ; i == 0 ; i++);

The above snippet will print “Hello” once, because the initialisation expression is evaluated only one time.

Multiple initialisations in for loop

The initialisation expression of the for loop can contain more than one expression, separated by commas. For example,

for ( i = 1, j = 2 ; j <= 10 ; j++ )

Multiple expressions can also be used in the incrementation expression of the for loop; i.e., you can increment (or decrement) two or more variables at the same time. However, only one expression is allowed in the test expression. This expression may contain several conditions linked together using logical operators.

The use of multiple expressions in the initialisation expression also demonstrates why semicolons are used to separate the three expressions in the for loop. If commas had been used, they could not also have been used to separate multiple expressions in the initialisation expression, without confusing the compiler.

While loops

Syntax:

Initialisation;
while (condition)
{
     statement1;
     statement2;
     …
     statementn;
     updation;
}

Where,

  1. The initialisation is the declaration-assignment of a loop counter variable.
  2. Condition is the counter itself or a boolean expression involving the counter.
  3. Updation is an operation to alter the value of the loop counter variable.
  4. Statements inside the block of while constituting the ‘body’ of the while loop.

Points to note:

  • The statements in the body of the loop keep getting executed as long as the condition remains true. When the condition is false, the control comes out of the loop and executes the first statement that follows the body of the while loop.
  • The condition can be replaced with any other valid expression; a non-zero value is treated as true and zero as false.
  • The default body of the while loop is the single statement following the loop statement. To override it, one must enclose the statements in a pair of braces as shown in the syntax.
  • Note that there is no semicolon after while (condition). If there is one, the loop body becomes an empty statement and the loop is likely to run forever, as the loop counter is never updated.
  • If the condition of the while loop is a comparison, make sure that the elements being compared are having the same range bounds and the size of the elements is also the same. For example,
int i = 0;
while (i <= 32768) 
{
    printf("%d ", i);
    i++;
}
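  • One might think that the above snippet would print integers from 0 to 32768. However, on a compiler with 16-bit ints the largest int value is 32767, so i can never exceed the limit. Incrementing i past 32767 overflows (formally undefined behaviour in C; in practice the value typically wraps around to -32768), which still satisfies the condition. Hence this goes into an infinite loop. With 32-bit ints the same problem appears when the limit is 2147483647.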
  • It is not recommended to modify a variable inside of the while condition but one can do it without any syntactical error as shown in the following example:
int i = 0;
while (++i <= 10)
printf("%d ", i);
  • In this example, i is pre-incremented in the condition of the while. This prints integers from 1 to 10 inclusive. This is not recommended as it is hard to read. One might easily assume that it prints from 0 to 10 inclusive by noticing that i is initialized to 0.
  • One more thing to notice in the previous example is that just by changing the pre-increment to a post-increment, the output would change from printing 1 to 10, to printing 1 to 11 inclusive.

The following flow chart should help in understanding the operation of while loop:

The following example prints the first 10 natural numbers using a while loop:

int n = 1;
while(n <= 10)
{
     printf("%d ", n);
     n++;
}

The while loop is widely used where a set of operations has to be repeated a fixed number of times, for example, to calculate the gross salaries of 10 different employees or to convert cm to mm 10 times.

Do-while loop

The loops discussed till now are ‘entry controlled loops’ which means the condition is checked at the entry before executing the body of the loop for the first time. Ideally, a do-while loop is used when one does not know the number of iterations needed and the minimum number of times the body of the loop is to be executed is 1. The syntax is as follows:

do {
   statement(s);
} 
while( condition );

Notice that for a do-while loop, the parentheses around the condition and the semicolon after them are necessary. This is an ‘exit controlled loop’ as the condition is checked at the end of the loop. As discussed earlier, the do-while loop executes the statements in its body at least once, whereas for the other loops the minimum number of iterations is 0. This is illustrated in the following 2 snippets with the same condition and body of the loop:

a. Using a do-while loop:

do {
    printf ( "Hello world") ;
} while ( 4 < 1 ) ;

Output: Hello world

b. Using while loop

while ( 4 < 1 )
    printf ( "Hello world") ;

The above loop produces no output.


The following example reads a number and prints its square, repeating for as long as the user wants:

int num;
char another;
do {
    printf ( "Enter a number " ) ;
    scanf  ( "%d", &num ) ;
    printf ( "square of %d is %d", num, num * num ) ;
    printf ( "\nWant to enter another number y/n " ) ;
    scanf  ( " %c", &another ) ;
} while ( another == 'y' ) ;

The same output can be achieved using while and for. But the code wouldn’t be as ‘elegant’ as do-while.

The following is the flow chart of the do-while loop:

Nested loops

Nested loops are common in programming. Having one loop inside the block of another loop comes in handy for many algorithmic applications such as sorting, reading matrix elements etc. It is perfectly common to place one loop inside the other to any level of nesting and with various combinations of all 3 types of loops.

Suppose one needs to print the following matrix:

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

It would be a terrible mess and a bad idea to do it without nested loops. The following snippet achieves the above output:

for (int i = 0 ; i < 4 ; i++)
{
    for (int j = 0 ; j < 4 ; j++)
        printf("%d%d ", i, j);
    printf("\n");
}

Notice that the default scope rule applies to each nested loop as well: by default, the body of a loop is the single statement that immediately follows it, and this can be overridden with a pair of braces.

The break statement

The break statement can be used with the loop to terminate the execution and continue with the execution of statements next to the loop. In other words, the break statement is used to ‘jump out of the loop’.

When a break is encountered inside any loop, control automatically passes to the first statement after the loop. A break is usually associated with an if conditional. Consider the following example which tells if the entered number is a prime number or not.

#include <stdio.h>
#include <math.h>
#include <stdbool.h>
int main()
{
    int n;
    bool isPrime = true;
    scanf("%d", &n);
    for (int i = 2 ; i <= sqrt(n) ; i++)
    {
        if (n % i == 0)
        {
            isPrime = false;
            break;
        }
    }
    printf("%s", isPrime ? "Prime" : "Not Prime");
    return 0;
}

In this program, the moment n % i evaluates to zero (i.e. n is exactly divisible by i), isPrime is set to false and the control breaks out of the for loop. There are two ways the control could have reached the statement after the for loop:

  1. It jumped out because the number proved to be not a prime.
  2. The loop came to an end because the value of i exceeded the square root of n.

When the loop terminates in the second case, it means that there was no number between 2 and the square root of n that could exactly divide n. That is, n is indeed a prime, isPrime is still true, and the program prints “Prime”. Notice that the statements above the break statement get executed, while any statements after the break inside the loop body are skipped. Note also that the program reports “Prime” for inputs smaller than 2, which are not prime; such inputs need a separate check.

The keyword break breaks the control only from the loop in which it is placed. Consider the following nested loop example:

#include <stdio.h>
int main()
{
    int i = 1 , j = 1;
    while ( i <= 10 ) {
        i++;
        while ( j <= 20 ) {
            j++;
            if ( j == 15 )
                break ;
            else
                printf ( "%d %d\n", i, j ) ;
        }
    }
    return 0;
}

Output:

2 2
2 3
2 4
2 5
2 6
2 7
2 8
2 9
2 10
2 11
2 12
2 13
2 14
3 16
3 17
3 18
3 19
3 20
3 21

In this program when j equals 15, the break takes the control outside the inner while only, since it is placed inside the inner while.

The continue statement

The continue statement skips the statements in the loop body that have not yet been executed in the current iteration and moves on to the next iteration. In a while loop, control passes to the condition at the top of the loop; in a for loop, it passes to the incrementation expression and then to the condition. A continue is usually associated with an if.

#include <stdio.h>
int main() 
{
    int i, j ;
    for ( i = 1 ; i <= 2 ; i++ ) 
    {
       for ( j = 1 ; j <= 2 ; j++ )
       {
           if ( i == j )
               continue ;
           printf ( "%d %d\n", i, j ) ;
       }
    }
    return 0;
}

Output

1 2
2 1

Note that when the value of i equals that of j, the continue statement takes the control to the for loop (inner) bypassing the rest of the statements pending execution in the for loop (inner).

A continue statement in the do-while loop sends the control straight to the test at the end of the loop and not to the beginning of the loop.

Exercises

  1. Write a program to print all prime numbers from 1 to 300. (Hint: Use nested loops, break and continue)
  2. Write a program to generate all combinations of 1, 2 and 3.
  3. According to a study, the approximate level of intelligence of a person can be calculated using the following formula:
    i = 2 + ( y + 0.5 x )
    Write a program, which will produce a table of values of i, y and x, where y varies from 1 to 6, and, for each value of y, x varies from 5.5 to 12.5 in steps of 0.5.
  4. Write a program to print the following pattern:
  5. Write a program to read an integer number and print its table in the following format:
    12 * 1 = 12
    12 * 2 = 24
    …
    12 * 10 = 120
  6. Write a program to print the following pattern:

    1
    2 3
    4 5 6
    7 8 9 10
  7. A machine is purchased which will produce an earning of Rs. 1000 per year while it lasts. The machine costs Rs. 6000 and will have a salvage value of Rs. 2000 when it is condemned. If 12% per annum can be earned on alternative investments, what would be the minimum life of the machine to make it a more attractive investment compared to alternative investments?
  8. Write a program in C to read x and the number of terms and display the sum of the series [ 1+x+x^2/2!+x^3/3!+….].
    Test data:
    x:3
    Terms:5
    Sum:16.375000


Case Control

Case-control statement – Switch

The control statement that allows us to choose among a number of alternatives is called a switch, or more correctly a switch-case-default, since these three keywords go together to make up the control statement. A switch is a composite statement that selects one of many distinct alternatives by comparing the value of a variable or an expression against a set of constants. The syntax is as follows:

switch ( integer expression )
{
    case constant 1 :
    do this ;
    case constant 2 :
    do this ;
    case constant 3 :
    do this ;
    default :
    do this ;
}

The integer expression following the keyword switch is any C expression that will yield an integer value. It could be an integer constant like 1, 2 or 3, or an expression that evaluates to an integer. The keyword case is followed by an integer or a character constant. Each constant in each case must be different from all the others. The “do this” lines in the above form of switch represent any valid C statement.

A default label is a special form of the case label. It is executed whenever none of the other case values matches the value in the switch expression. Note, however, that default is not mandatory.

The need for a break

switch (i)
 {
    case 1: printf("It is 1\n");
    case 2: printf("It is 2\n");
    case 3: printf("It is 3\n");
    default: printf("None of the above");
}

Assuming i is 1, the output of the above snippet is:
It is 1
It is 2
It is 3
None of the above

Clearly, this is not the desired output. The desired output is just “It is 1”. The program also executes the statements of cases 2 and 3 and of the default case. It is important to note that it is the responsibility of the programmer to break out of the switch block once the intended case is executed. The following snippet illustrates the same:

switch (i) 
{
     case 1: printf("It is 1\n"); break;
     case 2: printf("It is 2\n"); break;
     case 3: printf("It is 3\n"); break;
     default: printf("None of the above");
}

Note that there is no need for a break statement after the last statement (in this case, the default statement) since the control comes out of the switch anyway. The operation of the switch is shown below in the form of a flowchart for a better understanding.

Case-control flow diagram

Points to note

  1. Cases can be arranged in any order, not necessarily in increasing order as shown in the previous examples. It is, however, advisable to keep a logical order and to place the default case at the beginning or at the end for better readability. The following switch snippet is equally valid and does the same job:
switch (i) 
{
     case 3: printf("It is 3\n"); break;
     case 1: printf("It is 1\n"); break;
     default: printf("None of the above"); break;
     case 2: printf("It is 2\n");
}
  2. One can use char values in a case, as internally they are integer values (their codes in the ASCII table on most systems). The following snippet shows how this is done:
char ch = 'A';
switch (ch) 
{
    case 'A': printf("Apple\n"); break;
    case 'B': printf("Ball\n"); break;
    default: printf("None of the above"); break;
    case 'C': printf("Cat\n");
}

Output:
Apple

  3. char values in switch cases can be used interchangeably with their corresponding integer values (65 is the ASCII code of 'A'). This is illustrated in the following snippet:
char ch = 'A';
switch (ch) 
{
    case 65: printf("Apple\n"); break;
    case 'B': printf("Ball\n"); break;
    default: printf("None of the above"); break;
    case 'C': printf("Cat\n");
}
  4. At times we may want to execute a common set of statements for multiple cases. How this can be done is shown in the following example.
char ch = 'A';
switch (ch) 
{
    case 'a':
    case 'A': printf("Apple\n"); break;
    case 'b':
    case 'B': printf("Ball\n"); break;
    default : printf("None of the above"); break;
    case 'c':
    case 'C': printf("Cat\n");
}

In the above snippet, the output remains the same regardless of the case of the character. Here, we are making use of the fact that once a case is matched, the control simply falls through the following cases until it encounters a break statement.

  5. Even if there are multiple statements to be executed in each case there is no need to enclose them within a pair of braces (unlike if, and else).
  6. Every statement in a switch must belong to some case or the other. If a statement doesn’t belong to any case the compiler won’t report an error. However, the statement would never get executed. For example, in the following program the printf( ) never goes to work.
char ch = 'A';
switch (ch) 
{
    printf("Inside switch");
    case 'A': printf("Apple\n"); break;
    case 'B': printf("Ball\n"); break;
    default: printf("None of the above"); break;
    case 'C': printf("Cat\n");
}
  7. If we have no default case and none of the cases matches, then the program simply skips the entire switch and continues with the next instruction (if any) that follows the closing brace of the switch, so the switch has no effect.
  8. The disadvantage of the switch is that one cannot have a case in a switch which looks like case i <= 20 :

This is why the if family of control statements cannot be entirely replaced by a switch. Even a float is not allowed in a switch. The advantage of switch over if is that it leads to a more structured program and the level of indentation is manageable, more so if there are multiple statements within each case of a switch.

  9. A switch can work faster than an equivalent if-else ladder, because the compiler may generate a jump table for a switch during compilation. During execution, the program then simply refers to the jump table to decide which case should be executed, rather than testing the conditions one after another as an if-else ladder does. If, on the other hand, the conditions in the if-else are simple and few in number, the if-else may work out faster than the lookup mechanism of a switch; a switch with two cases, for example, need not be faster than an equivalent if-else. The actual outcome depends on the compiler and on the case values, so you as a programmer should decide which of the two to use in a given situation.
  10. We can check the value of any expression in a switch, provided it evaluates to an integer. Thus the following switch statements are legal:
switch ( i + j * k )
switch ( 23 + 45 % 4 * k )
switch ( a < 4 && b > 7 )

Expressions can also be used in cases provided they are constant expressions. Thus, case 3 + 7 is correct, however, case a + b is incorrect.

  11. The break statement when used in a switch takes the control outside the switch. However, a continue will not take the control to the beginning of the switch as one is likely to believe; it is meaningful only inside a loop, where it applies to the enclosing loop.
  12. In principle, a switch may occur within another, but in practice, it is rarely done. Such statements would be called nested switch statements.
  13. The switch statement is very useful while writing menu-driven programs.

Exercises

  1. Write a program that tests the value of an integer num1. If the value is 10, square num1. If it is 9, read a new value into num1. If it is 2 or 3, multiply num1 by 99 and print out the result.
  2. Write a function called day_of_week that, given an integer between 0 and 6, prints the corresponding day of the week. Assume that the first day of the week (0) is Sunday.
  3. Write a function called month_of_year that, given an integer between 1 and 12, prints the corresponding month of the year.
  4. Write a function called parkingCharge that, given the type of the vehicle (c for car, b for bus, t for truck) and the hours a vehicle spent in the parking lot, returns the parking charge based on the rates shown below:
    a. car : $2 per hour.
    b. bus : $3 per hour.
    c. truck : $4 per hour.

Bitwise Operators

So far, the smallest element in memory on which we have been able to operate is a byte. A bit, however, is the building block of any data. In this section, we shall see how to operate on bits. Being able to operate on a bit level can be very important in programming, especially when a program must interact directly with the hardware. This is because programming languages are byte-oriented, whereas hardware tends to be bit-oriented.

Before delving into bitwise operators, it is important to understand the bit numbering scheme in integers and characters.

Bit numbering scheme

Bits are numbered from zero onwards, increasing from right to left, so bit 0 is the rightmost (least significant) bit. This is shown below:

showbits()

This is a function which will be used throughout the section. We shall see the internal implementation of this function at a later part of the notes. For now, assume that the function prints the bits (binary format) of the number sent as an argument. For example:

int main() 
{
    for ( int j = 0 ; j <= 5 ; j++ ) {
        printf ( "\nDecimal %d is same as binary ", j ) ;
        showbits ( j ) ;
    }
    return 0;
}

Should produce the following output:

Decimal 0 is same as binary 0000 0000 0000 0000
Decimal 1 is same as binary 0000 0000 0000 0001
Decimal 2 is same as binary 0000 0000 0000 0010
Decimal 3 is same as binary 0000 0000 0000 0011
Decimal 4 is same as binary 0000 0000 0000 0100
Decimal 5 is same as binary 0000 0000 0000 0101

Bitwise operators

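+----------+------------------+
| Operator | Description      |
+----------+------------------+
|    ~     | One’s Complement |
|    >>    | Right shift      |
|    <<    | Left shift       |
|    &     | Bitwise AND      |
|    |     | Bitwise OR       |
|    ^     | Bitwise XOR      |
+----------+------------------+

Bitwise operators in C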

The above table lists one of C’s powerful features: a set of bit manipulation operators. These permit the programmer to access and manipulate individual bits within a piece of data. These operators can operate on the integer types (such as char, int and long), but not on float or double.

One’s Complement

On taking the one’s complement of a number, all 1’s present in the number are changed to 0’s and all 0’s are changed to 1’s. Examples:

  1. One’s complement of 1010 is 0101.
  2. One’s complement of 1111 is 0000.
  3. One’s complement of 65 is -66
    • The binary equivalent of 65 is 0000 0000 0100 0001.
    • One’s complement of the above binary is 1111 1111 1011 1110.
    • Negative numbers are stored in two’s complement form, and read as a two’s complement value the above bit pattern is -66.

A program example:

#include <stdio.h>
int main()
{
    int i = 5;
    while (i--) {
        printf ( "\nDecimal: %d\tBinary:\t", i ) ;
        showbits ( i ) ;
        printf ( "\nOne’s complement: ") ;
        showbits ( ~i ) ;
    }
    return 0;
}

Output:

Decimal: 4 Binary: 0000 0000 0000 0100
One’s complement: 1111 1111 1111 1011
Decimal: 3 Binary: 0000 0000 0000 0011
One’s complement: 1111 1111 1111 1100
Decimal: 2 Binary: 0000 0000 0000 0010
One’s complement: 1111 1111 1111 1101
Decimal: 1 Binary: 0000 0000 0000 0001
One’s complement: 1111 1111 1111 1110
Decimal: 0 Binary: 0000 0000 0000 0000
One’s complement: 1111 1111 1111 1111

Right shift operator

The right shift operator is represented by >>. It needs two operands. It shifts each bit in its left operand to the right. The number of places the bits are shifted depends on the integer following the operator (i.e. its right operand).

Example:

  1. val >> 2 would shift all bits in val two places to the right.
    • If val is 5, then:
      • Val initially contains 0000 0101
      • The right shift results in the bits moving towards the right: 0000 0001.
  2. val >> 5 would shift all bits 5 places to the right.

Points to note

  • Note that as the bits are shifted to the right, blanks are created on the left. These blanks must be filled somehow. For non-negative values they are filled with zeros.
  • Note that right shifting a non-negative number once is the same as dividing it by 2 and ignoring the remainder. Thus,
    • 64 >> 1 gives 32
    • 64 >> 2 gives 16
    • 128 >> 2 gives 32
    • 27 >> 1 gives 13
    • 49 >> 1 gives 24
  • In the expression a >> b, if b is negative the result is unpredictable (the C standard leaves the behaviour undefined).
  • If a is negative then its leftmost bit (sign bit) would be 1. On some computers, right shifting would result in extending the sign bit.
    • For example, if a contains -1, its binary representation would be 1111111111111111. Without sign extension, the operation a >> 4 would be 0000111111111111. With sign extension, the result would be 1111111111111111.

A program example:

#include <stdio.h>
int main() 
{
    int i = 1234;
    printf ( "\nDecimal %d is same as binary ", i ) ;
    showbits ( i ) ;
    for ( int j = 0 ; j <= 5 ; j++ ) {
        printf ( "\n%d right shift %d gives ", i, j ) ;
        showbits ( i >> j ) ;
    }
    return 0;
}

Output

Decimal 1234 is same as binary 0000 0100 1101 0010
1234 right shift 0 gives 0000 0100 1101 0010
1234 right shift 1 gives 0000 0010 0110 1001
1234 right shift 2 gives 0000 0001 0011 0100
1234 right shift 3 gives 0000 0000 1001 1010
1234 right shift 4 gives 0000 0000 0100 1101
1234 right shift 5 gives 0000 0000 0010 0110

Left shift operator

The left shift operator is represented by <<. It needs two operands. It shifts each bit in its left operand to the left. The number of places the bits are shifted depends on the integer following the operator (i.e. its right operand). Example:

  1. val << 2 would shift all bits in val two places to the left.
    • If val is 5, then:
      • Val initially contains 0000 0101
      • The left shift results in the bits moving towards the left: 0001 0100
  2. val << 5 would shift all bits 5 places to the left.

Points to note

  • Note that as the bits are shifted to the left, blanks are created on the right. These blanks must be filled somehow. They are filled with zeros.
  • Note that left shifting once is the same as multiplying by 2, provided no 1 bit is shifted out on the left. Thus,
    • 64 << 1 gives 128
    • 64 << 2 gives 256
  • In the expression a << b, if b is negative the result is unpredictable (the C standard leaves the behaviour undefined).

A program example:

#include <stdio.h>
int main() 
{
    int i = 1234;
    printf ( "\nDecimal %d is same as binary ", i ) ;
    showbits ( i ) ;
    for ( int j = 0 ; j <= 5 ; j++ )
    {
        printf ( "\n%d left shift %d gives ", i, j ) ;
        showbits ( i << j ) ;
    }
    return 0;
}

Output

Decimal 1234 is same as binary 0000 0100 1101 0010
1234 left shift 0 gives 0000 0100 1101 0010
1234 left shift 1 gives 0000 1001 1010 0100
1234 left shift 2 gives 0001 0011 0100 1000
1234 left shift 3 gives 0010 0110 1001 0000
1234 left shift 4 gives 0100 1101 0010 0000
1234 left shift 5 gives 1001 1010 0100 0000

Bitwise AND Operator

This operator is represented as & (not to be confused with &&, the logical AND operator). The & (ampersand) operator operates on two operands and is commutative, that is, the order of the operands doesn’t change the result. The two operands are compared on a bit-by-bit basis, so both must be of an integer type. The second operand is often called an AND mask. The & operator operates on a pair of bits to yield a resultant bit. The rules (also called the truth table) that decide the value of the resultant bit are shown below:

+-----------+-----------+--------+
| Operand 1 | Operand 2 | Result |
+-----------+-----------+--------+
|     0     |     0     |    0   |
|     0     |     1     |    0   |
|     1     |     0     |    0   |
|     1     |     1     |    1   |
+-----------+-----------+--------+

The easiest way to remember the truth table is that the result is 1 if and only if both the operands are 1; otherwise it is 0.

Example:

+-----------+---+---+---+---+---+---+---+---+
| Operand 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
+-----------+---+---+---+---+---+---+---+---+
| Operand 2 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 |
+-----------+---+---+---+---+---+---+---+---+
| Result    | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
+-----------+---+---+---+---+---+---+---+---+

Thus, it must be clear that the operation is being performed on individual bits, and the operation performed on one pair of bits is completely independent of the operation performed on the other pairs.

A typical use of the AND operator is to check whether a particular bit or a set of bits of an operand is ON or OFF. Consider an operand whose 0th bit needs to be inspected to see whether it is 1 or not. All we need to do is AND the operand with another operand (the bitmask) whose 0th bit alone is set to 1, that is 00000001. If the result is equal to the bitmask, then the operand has the 0th bit set to 1. If the result is 0, then the 0th bit of the operand is set to 0. It is illustrated as follows:

+----------+---+---+---+---+---+---+---+---+
| Operand  | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
+----------+---+---+---+---+---+---+---+---+
| Bit mask | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
+----------+---+---+---+---+---+---+---+---+
| Result   | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+----------+---+---+---+---+---+---+---+---+

As the result is 0, the 0th bit of the operand is set to 0. A program example:

#include <stdio.h>
int main() {
    int i = 65, j ;
    printf ( "i: %d", i ) ;
    j = i & 32 ;
/*
		    +---------------------------+
		    |   Binary Representation   |
+---------------+---+---+---+---+---+---+---+---+
|     i = 65    | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
+---------------+---+---+---+---+---+---+---+---+
| Bit mask = 32 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
+---------------+---+---+---+---+---+---+---+---+
|     Result    | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+---------------+---+---+---+---+---+---+---+---+
*/
    if ( j == 0 )
        printf ( "\nFifth bit is off" ) ;
    else
        printf ( "\nFifth bit is on" ) ;
    j = i & 64 ;
/*
		    +---------------------------+
		    |   Binary Representation   |
+---------------+---+---+---+---+---+---+---+---+
|     i = 65    | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
+---------------+---+---+---+---+---+---+---+---+
| Bit mask = 64 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
+---------------+---+---+---+---+---+---+---+---+
|     Result    | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
+---------------+---+---+---+---+---+---+---+---+
*/
    if ( j == 0 )
        printf ( "\nSixth bit is off" ) ;
    else
        printf ( "\nSixth bit is on" ) ;
    return 0;
}

Output:

i: 65
Fifth bit is off
Sixth bit is on

Bitwise AND can also be used to turn off a particular bit in a given number. To do this, all the bits in the bitmask need to be set to 1 except for the bit position which we need to set to 0 in the operand. For example, if the operand is 01000001 and we need to set the bit in the 0th position to 0, the bitmask should be 11111110. The AND operation of these two gives the desired result: 01000000.

Bitwise OR operator

Bitwise OR operator is represented as | (a pipe). The truth table of OR is as follows:

+-----------+-----------+--------+
| Operand 1 | Operand 2 | Result |
+-----------+-----------+--------+
|     0     |     0     |    0   |
|     0     |     1     |    1   |
|     1     |     0     |    1   |
|     1     |     1     |    1   |
+-----------+-----------+--------+

Notice that if any one of the operands of the OR operator is 1, the result is 1. Example:

+-----------+---+---+---+---+---+---+---+---+
| Operand 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
+-----------+---+---+---+---+---+---+---+---+
| Operand 2 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 |
+-----------+---+---+---+---+---+---+---+---+
| OR Result | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 |
+-----------+---+---+---+---+---+---+---+---+

The bitwise OR operator is usually used to put ON a particular bit in a number. To do this, all the bits in the second operand need to be set to 0 except for the bit position which we need to set to 1 in the first operand. For example, if the operand is 01000001 and we need to set the bit in position 1 (the second bit from the right) to 1, the bitmask should be 00000010. The OR operation of these two gives the desired result: 01000011.

Bitwise XOR operator

The XOR operator is represented as ^ and is also called the Exclusive OR operator. XOR returns 1 only if exactly one of the two bits is 1. The truth table for the XOR operator is given below.

+-----------+-----------+------------+
| Operand 1 | Operand 2 | XOR Result |
+-----------+-----------+------------+
|     0     |     0     |      0     |
|     0     |     1     |      1     |
|     1     |     0     |      1     |
|     1     |     1     |      0     |
+-----------+-----------+------------+

XOR operator is used to toggle a bit ON or OFF. A number XORed with another number twice gives the original number. This is shown in the following program.

#include <stdio.h>
int main() {
    int b = 45 ;
    b = b ^ 6 ;
/*
+-----------+---+---+---+---+---+---+---+
| Operand 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 |
+-----------+---+---+---+---+---+---+---+
| Operand 2 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
+-----------+---+---+---+---+---+---+---+
|XOR Result | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
+-----------+---+---+---+---+---+---+---+
*/
    printf ( "\n%d", b ) ; /* this will print 43 */
    b = b ^ 6 ;
    printf ( "\n%d", b ) ; /* this will print 45, the original number */
    return 0;
}

The showbits() function

We have been using the showbits() function throughout the text and now, as we have sufficient knowledge of bitwise operators, let’s have a look at the definition of this mighty function which is very useful and important:

void showbits(int number) 
{
    for (int i = 15 ; i >= 0 ; i--)
    {
        printf("%d", (number >> i) & 1);
        if (i % 4 == 0)
            printf(" ");
    }
}

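The given number is shifted right by i places, for i going from 15 down to 0, and a bitwise AND with 1 is performed on each result. This prints 1 if the bit at position i of number is set, else 0, so the 16 low-order bits are printed from left to right, with a space after every group of four. Trace the program for better understanding.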

Exercises

  1. Write a C program to count trailing zeros in a binary number.
  2. Write a C program to flip bits of a binary number using the bitwise operator.
  3. Write a C program to count total zeros and ones in a binary number.
  4. Write a C program to rotate bits of a given number.
  5. Modify showbits() function to return an integer containing the binary number of the input number.
  6. Write a C program to swap two numbers using the bitwise operator.
  7. Write a C program to check whether a number is even or odd using a bitwise operator.
  8. Write a C program to get the nth bit of a number.
