C – a very quick guide

Copyright © 2017–2023 J. M. Spivey
Revision as of 09:57, 11 March 2020 by Mike

[Adapted from Tanenbaum's 'Introduction to C' (Minix book, first edition).]

C was invented by Dennis Ritchie of AT&T Bell Laboratories to provide a high-level language in which Unix could be programmed. It is now widely used for many other applications as well. C is especially popular with systems programmers because it allows programs to be expressed simply and concisely. The definitive work describing C is The C Programming Language by Kernighan and Ritchie (1978, second ed. 19??).

In this guide we will attempt to provide enough of an introduction to C that someone who is familiar with high-level languages such as Pascal, PL/I, or Modula-2 will be able to understand most of the code in the course. Features of C not used in the course are not discussed here. Numerous subtle points are omitted. Emphasis is on reading C, not writing it.

Fundamentals of C

A C program is made up of a collection of procedures (often called functions, even when they do not return values). These procedures contain declarations, statements, and other elements that together tell the computer to do something. The following is a little procedure that declares three integer variables and assigns them all values.

void main(void) {        /* this is a comment */
     int i, j, k;        /* declaration of 3 integer variables */

     i = 10;             /* set i to 10 (decimal) */
     j = i + 13;         /* set j to i + 13 */
     k = j * j + 0xff;   /* set k to j * j + 0xff (hexadecimal) */
}

The procedure's name is main. It has no formal parameters, as indicated by the keyword void between the parentheses. Its body is enclosed within braces (curly brackets). This example shows that C has variables, and that these variables must be declared before being used. C also has statements: in this example, assignment statements. All statements must be terminated by semicolons (unlike Pascal, which uses semicolons between statements, not after them). Comments are started by the /* symbol and ended by the */ symbol, and may extend over multiple lines.

The procedure contains three constants. The constants 10 and 13 are ordinary decimal constants. The constant 0xff is a hexadecimal constant (equal to 255 decimal). Hexadecimal constants always begin with 0x. Both decimal and hexadecimal constants are commonly used in C. [We avoid octal constants, beginning with a leading zero.]
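As a quick check of the arithmetic, the same three assignments can be wrapped in a function that returns k. This is only a sketch: the name compute_k is an assumption made for illustration and is not part of the example above.

```c
/* Sketch: the assignments from the example, returning k so that it can be
   inspected.  The name compute_k is hypothetical. */
int compute_k(void) {
     int i, j, k;

     i = 10;             /* decimal constant */
     j = i + 13;         /* j becomes 23 */
     k = j * j + 0xff;   /* 23 * 23 + 255 = 784 */
     return k;
}
```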

Basic data types

C has two principal data types: integer and character, written int and char, respectively. There is no Boolean data type. Instead, integers are used, with 0 meaning false and anything else meaning true. C also has floating point types, but we shall not use them.

The type int may be qualified with the "adjectives" short, long, or unsigned, which determine the (compiler dependent) range of values. Most ARM compilers use 32-bit integers for int, 16-bit integers for short int, and 64-bit integers for long int. Unsigned integers on the ARM range from 0 to 2^32-1, rather than -2^31 to 2^31-1, as ordinary integers do. Characters are 8 bits.

[We do not use register.]

Some declarations are shown in the following.

int i;                    /* one integer */
short int z1, z2;         /* two short integers */
char c;                   /* one character */
unsigned short int k;     /* one unsigned short integer */
long flag_pole;           /* the 'int' may be omitted */

Conversions between types are allowed. For example, the statement

flag_pole = i;

is allowed even though i is an integer and flag_pole is a long. In many cases when converting between types it is necessary or useful to force one type to another. This can be done by putting the target type in parentheses in front of the expression to be converted, as in

p((long) i);

to convert the integer i to a long before passing it as a parameter to a procedure p, which expects a long.[1]

One thing to watch out for when converting between types is sign extension. When converting a character to an integer, some compilers treat characters as being signed, that is, from -128 to 127, whereas others treat them as being unsigned, that is, from 0 to 255. A statement like

i = c & 0xff;

converts c (a character) to an integer and then performs a Boolean AND (the ampersand) with the hexadecimal constant 0xff. The result is that the upper 24 bits are set to zero, effectively forcing c to be treated as an unsigned 8-bit quantity, in the range 0 to 255.
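The masking trick can be packaged as a small function. The sketch below assumes a hypothetical name, byte_value; it should give the same answer whether the compiler treats plain char as signed or unsigned.

```c
/* Sketch: convert a char to an int in the range 0 to 255, whether or not
   the compiler treats plain char as signed.  The name byte_value is
   hypothetical. */
int byte_value(char c) {
     return c & 0xff;    /* c is widened to int, then the upper bits are cleared */
}
```

For example, byte_value('A') is 65, and a character whose bit pattern is 0xe9 yields 0xe9 rather than a negative number.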

Constructed types

In this section we will look at four ways of building up more complex data types: arrays, structures, unions, and pointers. An array is a collection of items of the same type. All arrays in C start with element 0. The declaration

int a[10];

declares an array, a, with 10 integers, referred to as a[0] through a[9]. Two, three, and higher-dimensional arrays exist, but we will rarely use them.

A structure is a collection of variables, usually of different types. A structure in C is similar to a record in Pascal. The declaration

struct { int i; char c; } s;

declares s to be a structure containing two members, an integer i, and a character c. To assign the member i the value 6, one would write

s.i = 6;

where the dot operator indicates that a member is being selected from a structure.

A union is also a collection of members, except that at any one moment, it can only hold one of them. The declaration

union { int i; char c; } u;

means that u can hold either an integer or a character, but not both. The compiler must allocate enough space for a union to hold the largest member. Unions are only used in one place in the course (for the definition of a message as a union of several different structures).
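A message type of this kind might be sketched as follows. All the names here (msg_read, msg_exit, message, m_type, exit_roundtrip) are assumptions made for illustration; they are not the definitions used in the course.

```c
/* Sketch of a message built as a union of structures.  Every name in this
   block is hypothetical. */
struct msg_read { int fd; int nbytes; };
struct msg_exit { int status; };

struct message {
     int m_type;                   /* records which member of u is in use */
     union {
          struct msg_read r;       /* used by "read" messages */
          struct msg_exit x;       /* used by "exit" messages */
     } u;                          /* big enough for the larger of r and x */
};

/* Fill in an exit message and read the status back out of it. */
int exit_roundtrip(int status) {
     struct message m;

     m.m_type = 2;                 /* 2 stands for "exit" in this sketch */
     m.u.x.status = status;        /* only the x member is meaningful now */
     return m.u.x.status;
}
```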

Pointers are used to hold machine addresses in C. They are very heavily used. An asterisk is used to indicate a pointer in declarations. The declaration

int i, *pi, a[10], *b[10], **ppi;

declares an integer i, a pointer to an integer pi, an array a with 10 elements, an array b of 10 pointers to integers, and a pointer to a pointer to an integer ppi. The exact syntax rules for declarations combining arrays, pointers, and other types are somewhat complex. Fortunately, we shall use only simple declarations.

The following shows a declaration of an array, z, of structures, each of which has three members: an integer, i, a pointer to a character, cp, and a character, c.

struct table {            /* each structure is of type table */
     int i;               /* an integer */
     char *cp, c;         /* a pointer to a character and a character */
} z[20];                  /* this is an array of 20 structures */

Arrays of structures are common in operating systems programming. The name table is defined as the type of the structure, allowing struct table to be used in declarations to mean this structure. For example,

struct table *p;

declares p to be a pointer to a structure of type table. During execution, p might point, for example, to z[4] or to any of the other elements of z, all 20 of which are structures of type table.

To make p point to z[4], we would write

p = &z[4];

where the ampersand as a unary (monadic) operator means "take the address of what follows." To copy to the integer variable n the value of member i in the structure pointed to by p we would write

n = p->i;

Note that the arrow is used to access a member of a structure via a pointer. If we were to use z itself, we would use the dot operator:

n = z[4].i;

The difference is that z[4] is a structure, and the dot operator selects members from structures. With pointers, we are not selecting a member directly. The pointer must first be followed to find the structure; only then can a member be selected.
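The two forms of access can be seen side by side in the following sketch, which stores a value using the dot operator and reads it back through a pointer. The function name read_via_pointer is an assumption made for illustration.

```c
struct table {
     int i;
     char *cp, c;
} z[20];

/* Sketch: store with the dot operator, read back with the arrow operator.
   The name read_via_pointer is hypothetical. */
int read_via_pointer(int value) {
     struct table *p;

     z[4].i = value;     /* z[4] is a structure, so use the dot */
     p = &z[4];          /* p now points to z[4] */
     return p->i;        /* follow p, then select member i */
}
```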

It is sometimes convenient to give a name to a constructed type. For example,

typedef unsigned short int unshort;

defines unshort as the type of unsigned short integers. It can be used as though it were a basic type. For example,

unshort u1, *u2, u3[5];

declares an unsigned short integer, a pointer to an unsigned short integer, and an array of unsigned short integers.

Statements

Procedures in C contain declarations and statements. We have already seen the declarations, so now we will look at the statements. The assignment, if, and while statements are essentially the same as in other languages. The following shows some examples of them.

if (x < 0) x = 3;       /* a simple if statement */

if (x > y) {            /* a compound if statement */
     j = 2;
     k = j + 1;
}

if (x + 2 < y) {        /* an if-else statement */
     j = 2;
     k = j - 1;
} else {
     m = 0;
}

while (n > 0) {         /* a while statement */
     k = k + k;
     n = n - 1;
}

do {                    /* another kind of while statement */
     k = k + k;   
     n = n - 1;
} while (n > 0);

The only points worth making are that braces are used for grouping compound statements, and the while statement has two forms, the second of which is similar to Pascal's repeat statement.

C also has a for statement, but this is unlike the for statement in some other languages. It has the general form

for (initializer; condition; expression) statement;

The meaning of the statement is

initializer;
while (condition) {
     statement;
     expression;
}

As an example, consider the statement

for (i = 0; i < n; i = i + 1) a[i] = 0;

This statement sets the first n elements of a to zero. It starts out by initializing i to zero (outside the loop). Then it iterates as long as i < n, executing the assignment and incrementing i. The statement can, of course, be a compound statement enclosed in braces, rather than just a simple assignment, as is shown here.
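To make the correspondence with the while loop concrete, here is a sketch of the same job written both ways. The function names clear_for and clear_while are assumptions, as is the choice to pass the array as a pointer.

```c
/* Sketch: two hypothetical functions that both set the first n elements
   of a to zero. */
void clear_for(int *a, int n) {
     int i;

     for (i = 0; i < n; i = i + 1) a[i] = 0;
}

void clear_while(int *a, int n) {
     int i;

     i = 0;                   /* initializer */
     while (i < n) {          /* condition */
          a[i] = 0;           /* statement */
          i = i + 1;          /* expression */
     }
}
```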

C has a construction that is similar to Pascal's case statement. It is called a switch statement. The following is an example.
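A minimal sketch of such a statement, wrapped in a function so that it can be tried out. The function name days_in_month and the particular case values are assumptions chosen for illustration.

```c
/* Sketch of a switch statement.  The function name and the case values
   are hypothetical. */
int days_in_month(int month) {
     int days;

     switch (month) {
     case 2:
          days = 28;          /* ignoring leap years */
          break;
     case 4:
     case 6:
     case 9:
     case 11:                 /* cases 4, 6 and 9 share this clause */
          days = 30;
          break;
     default:                 /* any other value of month */
          days = 31;
     }
     return days;
}
```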


Depending on the value of the expression following the keyword switch, one clause or another is chosen. If the expression does not match any of the cases, the default clause is selected. If the expression does not match any of the cases and no default is present, control just continues with the next statement following the switch.

One thing to note is that after one of the cases has been executed, control just continues with the next one, unless a break statement is present. In practice, the break is almost always needed.

The break statement is also valid inside for and while loops, and when executed causes control to exit the loop. If the break statement is located in the innermost of a series of nested loops, only one level is exited.

A related statement is the continue statement, which does not exit the loop, but causes the current iteration to be terminated and the next iteration to start immediately. In effect, it is a jump back to the top of the loop.
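Both statements appear in the following sketch, which adds up the positive entries of an array, skipping negative ones and stopping at the first zero. The name sum_positive is an assumption made for illustration.

```c
/* Sketch: add up the positive entries of a, skipping negative ones and
   stopping at the first zero.  The name sum_positive is hypothetical. */
int sum_positive(int *a, int n) {
     int i, total;

     total = 0;
     for (i = 0; i < n; i = i + 1) {
          if (a[i] == 0) break;       /* leave the loop altogether */
          if (a[i] < 0) continue;     /* go on to the next iteration */
          total = total + a[i];
     }
     return total;
}
```

Note that in a for loop, continue still evaluates the expression i = i + 1 before the condition is tested again.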

C has procedures, which may be called with or without parameters. It is not permitted to pass arrays or procedures directly as parameters, but parameters can be pointers.

The name of an array, when written without a subscript, is taken to mean a pointer to the first element of the array, making it easy to pass an array pointer. Thus if a is the name of an array of any type, it can be passed to a procedure g by writing

g(a);

This rule holds only for arrays, not structures.
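Because g receives a pointer rather than a copy, it can change the caller's array. The sketch below gives g a made-up body to show this; the body and the name demo are assumptions.

```c
/* Sketch: g receives a pointer to the first element of the array.
   The body of g and the name demo are hypothetical. */
void g(int *a) {
     a[0] = 99;          /* changes the caller's array, not a copy */
}

int demo(void) {
     int a[10];

     a[0] = 1;
     g(a);               /* equivalent to g(&a[0]) */
     return a[0];        /* the change made by g is visible here */
}
```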

Procedures can return values by executing the return statement. This statement may provide an expression to be returned as the value of the procedure, but the caller may safely ignore it. If a procedure returns a value, the type of the value is written before the procedure name, as shown below. As with parameters, procedures may not return arrays, structures, or procedures, but may return pointers to them. This rule is designed to make the implementation efficient – all parameters and results always fit in a single machine word. Compilers that allow structures as parameters usually allow them as return values as well.

int sum(int i,              /* this procedure returns an integer */
          int j) {          /* formal parameters declared in parentheses */
     return i + j;          /* return the sum of the parameters */
}

C does not have any built-in input/output statements. I/O is done by calling library procedures, the most common of which is illustrated below:

printf("x = %d  y = %d  z = %x\n", x, y, z);

The first parameter is a string of characters between quotation marks (it is actually a character array). Any character that is not a percent is just printed as is. When a percent is encountered, the next parameter is printed, with the letter following the percent telling how to print it:

d – print as a decimal integer
u – print as an unsigned decimal integer
x – print as a hexadecimal integer
s – print as a string
c – print as a single character
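The conversions listed above can be tried out with sprintf, a standard relative of printf that stores its output in a character array instead of printing it. This is a sketch only: the function name format_all and its parameters are assumptions.

```c
#include <stdio.h>

/* Sketch: format one value with each of the conversions listed above.
   sprintf behaves like printf but writes into buf, which the caller must
   make large enough.  The name format_all is hypothetical. */
void format_all(char *buf, int d, unsigned u, unsigned x, char *s, char c) {
     sprintf(buf, "%d %u %x %s %c", d, u, x, s, c);
}
```

Calling format_all(buf, -3, 7, 255, "abc", 'z') leaves the string "-3 7 ff abc z" in buf.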

Expressions

Expressions are constructed by combining operands and operators. The arithmetic operators, such as + and -, and the relational operators, such as < and >, are similar to their counterparts in other languages. The % operator is used for modulo. It is worth noting that the equality operator is == and the not equals operator is !=. To see if a and b are equal, one can write

if (a == b) statement;

C also allows assignment and operators to be combined, so

a += 4;

means the same as

a = a + 4;

The other operators may also be combined in this way.

Operators are provided for manipulating the bits of a word. Both shifts and bitwise Boolean operations are allowed. The left and right shift operators are << and >> respectively. The bitwise Boolean operators &, | and ^ are AND, INCLUSIVE OR and EXCLUSIVE OR, respectively. If i has the value 0x1d, then the expression i & 0x6 has value 0x4 (hexadecimal). As another example, if i is 7, then

j = (i << 3) | 0xc;

assigns 0x3c to j.
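The two calculations can be checked with the sketch below, which also shows a typical use of shifting and masking together: pulling a field of bits out of a word. The names mask_example, shift_example and field are assumptions made for illustration.

```c
/* Sketch: the two examples from the text.  The names are hypothetical. */
int mask_example(void) {
     int i = 0x1d;            /* binary 11101 */
     return i & 0x6;          /* binary 00110, so the result is 00100 */
}

int shift_example(void) {
     int i = 7;
     return (i << 3) | 0xc;   /* 0x38 | 0x0c */
}

/* Return the width-bit field of w that starts at bit position pos.
   Assumes width is smaller than the number of bits in an unsigned. */
unsigned field(unsigned w, int pos, int width) {
     return (w >> pos) & ((1u << width) - 1);
}
```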

Another important group of operators is the unary operators, all of which take only one operand. As a unary operator, the ampersand takes the address of a variable. Thus &i has the value of the machine location at which i is located. If p is a pointer to an integer and i is an integer, the statement

p = &i;

computes the address of i and stores it in the variable p.

The opposite of taking the address of something (e.g., to put it in a pointer) is taking a pointer as input and computing the value of the thing pointed to. If we have just assigned the address of i to p, then *p has the same value as i. In other words, as a unary operator, the asterisk is followed by a pointer (or an expression yielding a pointer), and yields the value of the item pointed to. If i has the value 6, then the statement

j = *p;

will assign 6 to j.
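The asterisk may also appear on the left of an assignment, in which case the value is stored in the item pointed to. A sketch that uses both operators to exchange two integers follows; the name swap is an assumption.

```c
/* Sketch: exchange two integers through pointers.  The name swap is
   hypothetical. */
void swap(int *p, int *q) {
     int t;

     t = *p;             /* fetch the value of the item p points to */
     *p = *q;            /* store into the item p points to */
     *q = t;
}
```

A caller with integer variables i and j would write swap(&i, &j).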

The ! operator returns 0 if its operand is nonzero, and 1 if its operand is 0. It is primarily used in if statements, for example

if (!x) k = 8;

checks the value of x. If x is zero (false), k is assigned the value 8. In effect, the ! operator negates the condition following it, just as the not operator does in Pascal.

The ~ operator is the bitwise complement operator. Each 0 bit in its operand becomes a 1 and each 1 becomes a 0. In fact, this is the ones complement of the operand.

The sizeof operator tells how big its operand is, in bytes. If applied to an array of 20 integers, a, on a machine with 4-byte integers, for example, sizeof a will have the value 80. When applied to a structure, it tells how big the structure is.
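Since the size of an array is the number of elements times the size of one element, dividing one size by the other recovers the element count without depending on how big an int is. A sketch, assuming a hypothetical function name count_a:

```c
int a[20];

/* Sketch: the number of elements in a, worked out by the compiler.
   The name count_a is hypothetical. */
int count_a(void) {
     return sizeof a / sizeof a[0];     /* total bytes / bytes per element */
}
```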

The last group of operators are the increment and decrement operators. The statement

p++;

means increment p. How much it is incremented by depends on its type. Integers and characters are incremented by 1, but pointers are incremented by the size of the object pointed to. Thus if a is an array of structures, and p is a pointer to one of these structures, and we write

p = &a[3];

to make p point to one of the structures in the array, then after we increment p it will point to a[4] no matter how big the structures are. The statement

p--;

is analogous, except that it decrements instead of incrementing.

In the assignment

n = k++;

where both variables are integers, the original value of k is assigned to n and then the increment happens. In the assignment

n = ++k;

first k is incremented, then its new value is stored in n. Thus the ++ (or --) operator can be written either before or after its operand, with different meanings.
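The following sketch collects these cases. The structure pair and the function names pointer_step and increments are assumptions made for illustration.

```c
struct pair { int x, y; };
struct pair a[10];

/* Returns 1: after p++, p points to a[4] whatever the size of struct pair. */
int pointer_step(void) {
     struct pair *p;

     p = &a[3];
     p++;
     return p == &a[4];
}

/* Stores the results of n = k++ and n = ++k, starting from k = 5 each time. */
void increments(int *post, int *pre) {
     int k;

     k = 5;
     *post = k++;        /* *post gets 5, then k becomes 6 */
     k = 5;
     *pre = ++k;         /* k becomes 6, then *pre gets 6 */
}
```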

One last operator is the ? operator, which selects one of two alternatives separated by a colon. For example,

i = (x < y ? 6 : k + 1);

compares x to y. If x is less than y, then i gets the value 6; otherwise, it gets the value k + 1. The parentheses are optional.
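A common use of this operator is to pick the larger of two values. A one-line sketch, assuming a hypothetical function name max:

```c
/* Sketch: return the larger of x and y.  The name max is hypothetical. */
int max(int x, int y) {
     return x > y ? x : y;
}
```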

Program structure

A C program consists of one or more files containing procedures and declarations. These files can be separately compiled, yielding separate object files, which are then linked together (by the linker) to form an executable program. Unlike Pascal, procedure declarations may not be nested, so they all appear at the "top level" in the file.

It is permitted to declare variables outside procedures, for example, at the beginning of a file before the first procedure declaration. These variables are global, and can be used in any procedure in the whole program, unless the keyword static precedes the declaration, in which case it is not permitted to use the variables in another file. The same rules apply to procedures. Variables declared inside a procedure are local to the procedure in which they are declared.

A procedure may access an integer variable, v, declared in a file other than its own (provided that the variable is not static), by saying

extern int v;

The extern declaration merely serves to tell the compiler what type the variable has; no storage is allocated by extern declarations. Each global variable must be declared exactly once without the attribute extern, in order to allocate storage for it.

Variables may be initialized, as in

int size = 100;

Arrays and structures may also be initialized. Global variables that are not explicitly initialized get the default value of zero.
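A sketch of how these rules fit together in a single file. Apart from size, the names used here (count, bump, next_id) are assumptions made for illustration.

```c
int size = 100;          /* global: other files may say "extern int size;" */
static int count;        /* private to this file; starts out as zero */

/* static also hides this procedure from other files */
static int bump(void) {
     count = count + 1;
     return count;
}

/* Visible from other files; each call yields a new number. */
int next_id(void) {
     return size + bump();
}
```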

The C preprocessor

Before a source file is even given to the C compiler, it is automatically run through a program called the preprocessor. The preprocessor output, not the original program, is what is fed into the compiler. The preprocessor carries out three major transformations on the file before giving it to the compiler:

  1. File inclusion.
  2. Macro definition and expansion.
  3. Conditional compilation.

Preprocessor directives all begin with a number sign (#) in column 1.

When a directive of the form

#include "file.h"

is encountered by the preprocessor, it bodily includes the file, line by line, in the program given to the compiler. When the directive is written as

#include <file.h>

the directory /usr/include rather than the working directory is searched for the file. It is common practice in C to group declarations used by several files in a header file (usually with the suffix .h), and include them where they are needed.

The preprocessor also allows macro definitions. For example,

#define BLOCK_SIZE 1024

defines the macro BLOCK_SIZE and gives it the value 1024. From that point on, every occurrence of the 10-character string "BLOCK_SIZE" in the file will be replaced by the 4-character string "1024" before the compiler sees the file. All that is happening here is that one character string is being replaced by another one. By convention, macro names are written in upper case. Macros can have parameters, but in practice few of them do.
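A sketch showing a simple macro in use alongside a macro with parameters. The macro MAX and the function blocks_needed are assumptions made for illustration. The parentheses in the definition of MAX matter: because macro expansion is purely textual, without them an expression such as 2 * MAX(1, 2) would be grouped wrongly.

```c
#define BLOCK_SIZE 1024
#define MAX(a, b) ((a) > (b) ? (a) : (b))    /* hypothetical macro with parameters */

/* Sketch: number of blocks needed to hold nbytes bytes, rounding up.
   The name blocks_needed is hypothetical. */
int blocks_needed(int nbytes) {
     return (nbytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
}
```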

The third preprocessor feature is conditional compilation. There are several places where the code is special for the Cortex-M0, and should not be used when compiling for a different CPU. These sections look something like this:

#ifdef CORTEX_M0
     statements for the Cortex-M0 only
#endif

If the symbol CORTEX_M0 is defined at the time, the statements between the two preprocessor directives are included in the preprocessor output; otherwise they are omitted. By calling the compiler with the command

gcc -c -DCORTEX_M0 prog.c

or by including in the program the statement

#define CORTEX_M0

we force the symbol CORTEX_M0 to be defined, hence all the Cortex-M0 dependent code to be included. As the program evolves, it may acquire special code for the MIPS and other processors, which would also be handled like this. As an example of what the preprocessor does, consider the following program.

#include "prog.h"

void main(void) {
     int a[MAX_ELEMENTS];

     x = 4;
     a[x] = 6;

#ifdef CORTEX_M0
     printf("Cortex-M0, a[x]=%d\n", a[x]);
#endif

#ifdef MIPS
     printf("MIPS, x=%d\n", x);
#endif
}

It includes one file, prog.h, whose contents are as follows:

int x;
#define MAX_ELEMENTS 100

Imagine that the compiler has been called with the command

gcc -c -DCORTEX_M0 file.c

After the file has been run through the preprocessor, the output is as follows.

int x;

void main(void) {
     int a[100];

     x = 4;
     a[x] = 6;

     printf("Cortex-M0, a[x]=%d\n", a[x]);

}

It is this output, not the original file, that is given as input to the C compiler.

Notice that the preprocessor has done its job and removed the lines starting with the # sign. If the compiler had been called with

gcc -c -DMIPS file.c

the other print statement would have been included. If it had been called with

gcc -c file.c

neither print statement would have been included. (We will leave it up to the reader to speculate about what would have happened if the compiler had been called with both -D flags.)

Idioms

In this section we look at a few constructions that are characteristic of C, but are not common in other programming languages. As a starter, consider the loop

while (n--) *p++ = *q++;

The variables p and q are typically character pointers, and n is a counter. What the loop does is copy an n-character string from the place pointed to by q to the place pointed to by p. On each iteration of the loop, the counter is decremented, until it gets to 0, and each of the pointers is incremented, so they successively point to higher numbered memory locations.
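Wrapped in a function, the idiom looks like this. The name copy_chars is an assumption made for illustration.

```c
/* Sketch: copy n characters from the place q points to into the place p
   points to.  The name copy_chars is hypothetical; n must not be negative. */
void copy_chars(char *p, char *q, int n) {
     while (n--) *p++ = *q++;      /* test n, then decrement; copy, then advance */
}
```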

Another common construction is

for (i = 0; i < N; i++) a[i] = 0;

which sets the first N elements of a to 0. An alternative way of writing this loop is

for (p = &a[0]; p < &a[N]; p++) *p = 0;

In this formulation, the integer pointer, p, is initialized to point to the zeroth element of the array. The loop continues as long as p has not reached the address of a[N], which is the first element that is too far. On each iteration, a different element is set to 0. The pointer construction is (with some compilers) much more efficient than the array construction, and is therefore commonly used.
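A complete version of the pointer loop might be sketched as follows. The value chosen for N, the size of the array, and the name clear_first_n are assumptions. Note that &a[N] is only used for comparison; nothing is stored there.

```c
#define N 8              /* hypothetical value */
int a[N + 2];            /* two extra elements that the loop must not touch */

/* Sketch: set the first N elements of a to zero using a pointer.
   The name clear_first_n is hypothetical. */
void clear_first_n(void) {
     int *p;

     for (p = &a[0]; p < &a[N]; p++) *p = 0;
}
```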

Assignments may appear in unexpected places. For example,

if (a = f(x)) statement;

first calls the function f, then assigns the result of the function call to a, and finally tests a to see if it is true (nonzero) or false (zero). If a is nonzero, the statement is executed. The statement

if (a = b) statement;

is similar, in that it assigns b to a and then tests a to see if it is nonzero. It is totally different from

if (a == b) statement;

which compares two variables and executes the statement if they are equal.
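The contrast can be seen in the sketch below. The function names first_nonzero and equal are assumptions. The doubled parentheses around each assignment are not required by the language; many compilers warn about a bare assignment used as a condition, and the extra pair signals that it is intended.

```c
/* Sketch: return the first of x and y that is nonzero, or 0 if both are
   zero.  Each condition assigns to a and then tests the value assigned.
   The name first_nonzero is hypothetical. */
int first_nonzero(int x, int y) {
     int a;

     if ((a = x)) return a;
     if ((a = y)) return a;
     return 0;
}

/* Sketch: return 1 if a and b are equal; nothing is assigned.
   The name equal is hypothetical. */
int equal(int a, int b) {
     if (a == b) return 1;
     return 0;
}
```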


  1. Such casts of parameters are less necessary when we have function prototypes.