Using the C String Library

ANSI compatible implementations for creating, searching, tokenizing and comparing character arrays.

© Guy Lecky-Thompson


In this C string library tutorial, we explore the C functions for string handling, ways in which they can be used, and some caveats linked to their use.

Introduction

Before you begin, it might be worthwhile to take a quick look at the C Header File tutorial to refresh your memory regarding the use of C header files and libraries. This particular tutorial covers the use of the string.h library provided by most ANSI compatible C implementations.

Also of note might be the General Programming series, for those not totally familiar with programming concepts in general. For example, it is in this set that we first described the string as a collection (array) of characters.

As readers familiar with C will know, the language has no built-in string type, unlike numbers and other built-in data types. However, the string library gives the programmer most of the functionality they might need when working with strings.

While we cover the most frequently used functions, the reader is encouraged to delve into the string.h file for themselves and experiment. It can usually be found in the include directory of your compiler installation.

Creating Strings

A string can be defined as:

char szString[100];

Or

char * szString;

The difference between the two is that in the first case, we have asked C to reserve room for 100 characters (99 plus the null terminator). In the second we have merely defined a pointer to a memory block (that does not yet exist) containing character data.

Populating that data block is covered in a later tutorial (on memory handling), so for now we shall assume that the reader has defined their strings as arrays of characters. For this discussion it makes no difference which representation has been chosen, so long as the last character is the null terminator (the character with value 0, written '\0').

Two strings can be concatenated using the strcat function:

char * strcat ( char * szOne, const char * szTwo );

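The result in szOne is szOne + szTwo, provided that there is enough space in szOne to contain the resulting string, including its null terminator. Should there not be, strcat does not clip the string: it writes past the end of szOne, which is undefined behavior and a classic cause of crashes and security holes. It is up to the programmer to check the available space first.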

Comparing Strings

Two strings can be compared using the strcmp function:

int strcmp ( const char * szOne, const char * szTwo );

The result will be 0 if the two strings are equal, more than 0 if szOne is 'greater' than szTwo and less than 0 if szOne is 'less than' szTwo. The comparison works on character codes, one character at a time, so the resulting order differs from traditional alphabetizing.

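In practice, the difference centers on the treatment of capital letters, punctuation and digits. In ASCII, for example, every capital letter sorts before every lower case letter, so "Zebra" is 'less than' "apple". To a certain extent this effect can be mitigated with a case-insensitive comparison, which treats 'a' and 'A' as identical. Note that strcmpi is not part of ANSI C: it is a compiler-specific extension, also found under names such as stricmp or, on POSIX systems, strcasecmp.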

Searching Strings

To find out if one string is a substring of the other, we use the strstr function:

char * strstr ( const char * haystack, const char * needle );

This function returns a pointer (char *) to the first occurrence of needle in the haystack, or NULL if the substring could not be matched. If we perform the same function call using that pointer, incremented by one, as the haystack, we can get the next occurrence (if any) in the string:

char * szFound;
szFound = strstr( szMyString, szMyNeedle );
while (szFound != NULL)
{
// .. do something with szFound ..
// Get the next occurrence
szFound = strstr( szFound + 1, szMyNeedle );
}

The loop above finds every needle in the haystack without modifying it. The strtok function, described next, is not quite so damage-averse.

Tokenizing

The act of tokenizing takes a source string, and breaks it down into a series of substrings based on searching for a separator. For example, given a comma separated line of data:

one, two, three, four

We can break this down into four values by using the strtok function:

char * strtok ( char * szString, const char * szSeparators );

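It needs to be called in two different ways - first with the string that we wish to tokenize in the first argument, and the separators in the second. This call makes strtok remember its position in the string and returns a pointer to the first item.

Subsequent calls should only contain NULL in the first argument, and the separators in the second. Each call overwrites the separator that ends the token with a null terminator, so the original string is altered; it is wise to perform the function on a copy of the original. For the same reason, the string must be writable: passing a string literal is undefined behavior. Because the position is kept in hidden internal state, two strings cannot be tokenized in alternation.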

This is easy to understand with an example:

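// A writable array: strtok must never be given a string literal
char szLine[] = "one, two, three, four";
char * szToken;
// Both comma and space are separators, so tokens carry no leading blanks
szToken = strtok ( szLine, ", " );
while (szToken != NULL)
{
// .. do something with szToken ..
szToken = strtok( NULL, ", " );
}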

Other String Library Functions

There are many other variants on these themes. For example, many compilers supply case-insensitive versions of some of these functions as non-standard extensions. However, there are some miscellaneous functions worth mentioning here:

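size_t strlen ( const char * szString )
returns the length of szString, not counting the null terminator
char * strchr ( const char * szString, int cChar )
returns a pointer to the first cChar in szString, or NULL if there is none
char * strrchr ( const char * szString, int cChar )
returns a pointer to the last cChar in szString, or NULL if there is none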

If there are any specific functions that the reader would like an extended explanation on, then please feel free to start a discussion. After all, these are designed to be interactive tutorials!


The copyright of the article Using the C String Library in Computer Programming is owned by Guy Lecky-Thompson. Permission to republish Using the C String Library must be granted by the author in writing.



