Fundamental C - Simple Strings

Written by Harry Fairhead

Sunday, 08 December 2019

Buffer Overflow

Usually the discussion of avoiding string overflow, or more generally array or buffer overflow, ends at this point, but in the real world things are more complex.

When programming in C you need to be aware of where array data comes from. In general, the arrays and strings that you create and consume yourself are safe enough, because you know their sizes and can ensure that they are null-terminated where necessary. This means that you can use the null-terminated string functions, such as strcpy and strcat, if you want to. The same applies to any functions you write: you don’t have to add further protection to a function that you know only you are going to use, and use responsibly.

Things get dangerous when data is generated externally: user input, network data, file data, or anything else that does not originate in your program. In this case you have to put an upper limit on the amount of data you are prepared to accept. This usually means using the length-limited strn functions, such as strncpy and strncat, rather than the str functions, and more generally specifying array sizes explicitly. It also means checking that every array access is within the array bounds. This is inefficient, but safe.

Of course, in the real world implementing such strategies is always much harder. For example, consider the network data problem. You set up an array to accept data from a device or a service which normally sends you 500 bytes. However, you have no guarantee that network problems or exceptional circumstances won’t force it to send 750 bytes or more. In an ideal world you would simply allocate an array so large that overrunning it was unlikely, although you would still have to check that it wasn’t overrun. In the real world you generally can’t afford that sort of memory allocation, especially on small devices. So what can you do?

In most cases the best solution is to divide the transaction into packets of data. Read in the first 500 bytes, process them and see if there is any more. In this way you can safely reuse the 500-byte array you have allocated and still not miss any data beyond this limit. Of course, any processing you do has to be fast enough that you can carry on reading the next 500 bytes without missing any data, or even having the connection aborted due to a timeout.

The exact details of implementing the repeated use of a small buffer to read in large amounts of data vary according to how the data transfer protocol works and what is to be done with the data, but a for loop and a test for the end of the data are generally what is required. When resources are limited, it is often necessary to trade code for memory.

In the book but not in this extract:

Convert to String - sprintf
Input – Buffer Problems
Low-level I/O
A Safe Way To Do Input – String Conversion

Summary

Strings are null-terminated char arrays.
A char is a single byte, almost always 8 bits, and it is the smallest of the integer types.
The character code used depends on the operating system, but you can generally assume that you are working with UTF-8 restricted to single-byte characters, which is functionally equivalent to ASCII.
C doesn’t currently handle Unicode well.
You can initialize a string using a string literal, but you cannot assign a string literal to a string.
Always make sure that the array has enough elements to hold the string and its null terminator. If the string has n characters the array has to have at least n+1 elements.
A string variable is a char array; its name acts as a pointer to the first element of the string, and it behaves like a standard array.
There are no native string operators or functions in C but the standard library has a comprehensive set of string functions.
All string operations work by scanning the string in a loop and stopping when they reach the null terminator.
If the null terminator is missing then most string operations will overrun the array.
There are alternative safe string functions which allow you to specify the maximum number of characters to be processed. Used correctly these protect you against array overrun.
String overflow is generally easy to control when all of the strings involved are generated by your program. Things are much more difficult when strings are input from external sources.
printf and sprintf can be used to convert integer and floating-point types to human-readable string representations.
Operating system buffers make interactive I/O using scanf difficult.
scanf also has problems in terms of how it applies the format string to the input.
In many cases the only solution is to use lower-level I/O functions to control the way the characters are converted into numeric data types.

Fundamental C: Getting Closer To The Machine

Now available as a paperback and ebook from Amazon.

Also see the companion volume: Applying C

Harry Fairhead is the author of Raspberry Pi IoT in C, Micro:bit IoT in C and Fundamental C: Getting Closer to the Machine. His latest book is Applying C For The IoT With Linux.

Last Updated ( Monday, 09 December 2019 )
