Before you begin, it might be worthwhile to take a quick look at the C Header File tutorial to refresh your memory regarding the use of C header files and libraries. This particular tutorial covers the string handling functions declared in string.h, which is part of the standard library supplied with every ANSI-compatible C implementation.
Also of note might be the General Programming series, for those not totally familiar with programming concepts in general. For example, it is in this set that we first described the string as a collection (array) of characters.
As readers familiar with C will know, the language has no built-in string type, unlike numbers and other built-in data types. However, the string library gives the programmer access to most of the functionality that they might need in dealing with strings.
While we cover the most frequently used functions, the reader is encouraged to delve into the string.h file for themselves and experiment. It can usually be found in the include directory of your compiler setup.
A string can be defined as:
Or
The difference between the two is that in the first case, we have specified the length to C beforehand. In the second we have merely defined a pointer to a memory block (that does not yet exist) containing character data.
Populating that data block is covered in a later tutorial (on memory handling), so for now we shall assume that the reader has defined their strings as arrays of characters. For the purposes of this discussion, it makes no difference which representation has been chosen, so long as the last character is the null terminator (the character with value 0, written '\0').
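The two kinds of definition described above might look something like the following. This is only a sketch: the function names array_string_length and is_terminated, and the buffer size of 16, are our own choices for illustration and are not part of string.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical demo: defines a string both ways and returns its length. */
size_t array_string_length(void)
{
    /* First form: an array whose size is fixed up front.
       There is room for 15 characters plus the '\0' terminator. */
    char szOne[16] = "Hello";

    /* Second form: only a pointer. It owns no storage of its own,
       so here it is simply aimed at the array above. */
    char *szTwo = szOne;

    /* Both names now refer to the same characters:
       'H', 'e', 'l', 'l', 'o', '\0'. strlen does not count the '\0'. */
    return strlen(szTwo);
}

/* Hypothetical helper: returns 1 if a '\0' terminator occurs within
   the first size bytes of buf, and 0 otherwise. */
int is_terminated(const char *buf, size_t size)
{
    assert(buf != NULL);
    return memchr(buf, '\0', size) != NULL;
}
```

Every function in string.h that reads a string relies on finding that terminator, which is why a check such as is_terminated can be useful when a buffer has been filled by hand.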
Two strings can be concatenated using the strcat function:
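The result in szOne is szOne + szTwo, provided that there is enough space in szOne to contain the resulting string, including its null terminator. Should there not be, strcat does not clip the string: it writes past the end of szOne, and the behavior is undefined. It is up to the programmer to make sure that the destination is large enough before making the call.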
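Since strcat itself performs no size check, one approach is to check the space first and only then concatenate. The sketch below assumes the caller knows the size of the destination array; the helper name safe_concat and its cbOne parameter are hypothetical and not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends szTwo to szOne only if the result,
   including the '\0' terminator, fits in cbOne bytes.
   Returns 1 on success, or 0 if there was not enough room. */
int safe_concat(char *szOne, size_t cbOne, const char *szTwo)
{
    size_t used;
    size_t extra;

    assert(szOne != NULL && szTwo != NULL);

    used = strlen(szOne);   /* characters already in the destination */
    extra = strlen(szTwo);  /* characters we would like to add */

    /* We need used + extra + 1 <= cbOne; written this way to avoid
       any arithmetic overflow. */
    if (used >= cbOne || extra >= cbOne - used) {
        return 0;           /* would overflow: szOne is left untouched */
    }

    strcat(szOne, szTwo);   /* safe, because the space was checked above */
    return 1;
}
```

Given a destination declared as char szOne[16] = "Hello, ", a call such as safe_concat(szOne, sizeof szOne, "world") should succeed, while appending a further four characters would be refused because only three would fit alongside the terminator.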
Two strings can be compared using the strcmp function:
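The result will be 0 if the two strings are equal, more than 0 if szOne is 'greater' than szTwo and less than 0 if szOne is 'less than' szTwo. The exact definition of greater than and less than in this context is slightly different from traditional alphabetizing techniques: the strings are compared character by character, using the numeric codes of the characters.
In practice, the differences center around the treatment of capital letters, some punctuation and numbers. In ASCII, for example, every capital letter comes before every lower case letter, so "Zebra" is 'less than' "apple". To a certain extent this effect can be mitigated by using a case insensitive comparison, which behaves in the same way as strcmp, except that it treats 'a' and 'A' as identical, for example. Such a function is not part of standard C, however: some compilers provide it as strcmpi, stricmp or _stricmp, and POSIX systems provide strcasecmp in strings.h.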
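To make the ordering concrete, the sketch below wraps strcmp so that it reports only the sign of the result, and builds a case insensitive comparison from tolower in ctype.h. Both compare_sign and compare_nocase are hypothetical names of our own, and the examples in the note that follows assume an ASCII character set.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: reduces the result of strcmp to -1, 0 or 1,
   since the standard only promises the sign of the value. */
int compare_sign(const char *szOne, const char *szTwo)
{
    int r;

    assert(szOne != NULL && szTwo != NULL);
    r = strcmp(szOne, szTwo);
    return (r > 0) - (r < 0);
}

/* Hypothetical helper: a case insensitive comparison written with
   standard C only. Returns less than 0, 0, or more than 0, in the
   same way as strcmp. */
int compare_nocase(const char *szOne, const char *szTwo)
{
    assert(szOne != NULL && szTwo != NULL);

    for (;; szOne++, szTwo++) {
        /* tolower expects an unsigned char value, hence the casts */
        int a = tolower((unsigned char)*szOne);
        int b = tolower((unsigned char)*szTwo);

        /* stop at the first difference, or at the end of both strings */
        if (a != b || a == '\0') {
            return a - b;
        }
    }
}
```

With ASCII, compare_sign("Zebra", "apple") gives -1 because 'Z' has a lower code than 'a', and compare_sign("10", "9") gives -1 because '1' comes before '9'. compare_nocase("Hello", "hELLO") gives 0.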
To find out if one string is a substring of the other, we use the strstr function:
This function returns a pointer (char *) to the first occurrence of needle in the haystack, or NULL if the substring could not be matched. If we perform the same function call using that pointer, incremented by one, as the haystack, we can get the next occurrence (if any) in the string:
The above snippet will find all the needles in the haystack, without destroying it. The following function is not quite so damage-averse.
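The repeated search described above can be sketched as a small counting function. The name count_occurrences is hypothetical, and the decision to report zero matches for an empty needle is our own assumption rather than anything strstr requires.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts how many times needle occurs in
   haystack, overlapping occurrences included. Neither string
   is modified. */
size_t count_occurrences(const char *haystack, const char *needle)
{
    size_t count = 0;
    const char *p;

    assert(haystack != NULL && needle != NULL);

    /* strstr matches an empty needle at every position, so we
       choose to treat that case as "no occurrences". */
    if (*needle == '\0') {
        return 0;
    }

    p = strstr(haystack, needle);     /* first occurrence, or NULL */
    while (p != NULL) {
        count++;
        p = strstr(p + 1, needle);    /* resume one character further on */
    }
    return count;
}
```

Because the search resumes one character after each match rather than after the whole needle, count_occurrences("aaaa", "aa") counts the overlapping matches as well.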
The act of tokenizing takes a source string, and breaks it down into a series of substrings based on searching for a separator. For example, given a comma separated line of data:
one, two, three, four
We can break this down into four values by using the strtok function:
It needs to be called in two different ways - first with the string that we wish to tokenize in the first argument, and the separators in the second. This call records its position in the string internally and returns a pointer to the first item.
Subsequent calls should contain NULL in the first argument, and the separators in the second; each returns the next item, or NULL when none remain. As it goes, strtok overwrites the separator that ends each item with a null terminator, so the original string becomes altered and it is wise to perform the function on a copy of the original.
This is easy to understand with an example:
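A minimal sketch of the two kinds of call follows. The helper split_line, its tokens array and its max parameter are hypothetical names for illustration; the separators are assumed to be a comma and a space, to suit the sample line above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits line on commas and spaces, storing up
   to max item pointers in tokens. Returns the number of items stored.
   The buffer that line points to is modified by strtok. */
size_t split_line(char *line, char *tokens[], size_t max)
{
    size_t n = 0;
    char *tok;

    assert(line != NULL && tokens != NULL);

    tok = strtok(line, ", ");        /* first call: pass the string itself */
    while (tok != NULL && n < max) {
        tokens[n++] = tok;           /* each item points into line */
        tok = strtok(NULL, ", ");    /* later calls: pass NULL instead */
    }
    return n;
}
```

Called on a writable copy such as char szLine[] = "one, two, three, four", this should store pointers to the four items. Afterwards szLine itself reads as just "one", because the first comma has been overwritten with a null terminator, which is the alteration mentioned earlier.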
There are many other variants on these themes. For example, many implementations also supply a case insensitive version of most of these functions, although such versions are not part of standard C. However, there are some miscellaneous functions worth mentioning here:
If there are any specific functions that the reader would like an extended explanation on, then please feel free to start a discussion. After all, these are designed to be interactive tutorials!