Extending & Embedding Python Using C - Arguments
Written by Mike James   
Tuesday, 03 February 2026

Py_buffer

The format specifiers that return a Py_buffer are:

 

Format   Python Type
s*       str or bytes-like object
z*       str, bytes-like object or None
y*       bytes-like object
w*       read-write bytes-like object

Although the Python type is listed as a string or bytes object, all that really matters is that the object supports the buffer protocol. Python strings are converted to UTF-8 C strings.
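You can check from the Python side whether an object supports the buffer protocol by asking for a memoryview of it. The following is a minimal sketch; supports_buffer is a hypothetical helper and not part of the example module. Notice that a str doesn’t export a buffer itself. The s* and z* formats accept it only because they first convert it to UTF-8:

```python
import array


def supports_buffer(obj):
    """Return True if obj exports a buffer (hypothetical helper)."""
    try:
        # memoryview() raises TypeError for objects without the buffer protocol
        with memoryview(obj):
            return True
    except TypeError:
        return False


print(supports_buffer(b"abc"))                     # bytes
print(supports_buffer(bytearray(b"abc")))          # bytearray
print(supports_buffer(array.array("i", [1, 2])))   # array of ints
print(supports_buffer("abc"))                      # str
```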

Returning a Py_buffer struct is very similar to simply providing a pointer to the internal buffer that the object uses. A Py_buffer struct wraps the internal buffer by providing additional information on how the buffer is organized. This additional information goes far enough to even allow the client program to modify the internal buffer.

That is, the buffer protocol provides enough information to use the internal buffer without invalidating it. The buffer structure can be very complicated with different parts of it stored in different locations, possibly with 2D structures for image storage, but most of the time the only important field is len which tells you how long the buffer is. This allows the buffer to store arbitrary byte values even if it originated as a null-terminated string. The actual data is pointed at by the buf field.
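Python’s own memoryview exposes much of the same information, so it can be used to get a feel for what a Py_buffer holds. This is only an illustrative sketch: nbytes, readonly and ndim are the memoryview attributes that correspond roughly to the len, readonly and ndim fields, and the variable names are made up for the example:

```python
data = bytearray(b"Hello \x00 World")   # includes an embedded zero byte
view = memoryview(data)

length = view.nbytes          # rough counterpart of the len field
writable = not view.readonly  # a bytearray exports a read-write buffer
dims = view.ndim              # 1 for a simple run of bytes

view[0] = ord("J")            # write through the view into the bytearray
first_word = bytes(data[:5])

view.release()                # finished with the buffer
print(length, writable, dims, first_word)
```

The write through the view changes the bytearray itself, which is what is meant by the client being able to modify the internal buffer.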

For example, you can convert the previous function to work with a Py_buffer:

static PyObject *stringPBuffer(PyObject *self, PyObject *args)
{
  Py_buffer myBuf;
  if (!PyArg_ParseTuple(args, "s*", &myBuf))
    return NULL;
  char *myString = (char *)myBuf.buf;
  printf("%s\n", myString);
  for (int i = 0; i < myBuf.len; i++)
  {
    printf("%02X ", (unsigned char)myString[i]);
  }
  PyBuffer_Release(&myBuf);
  Py_RETURN_NONE;
}

Notice that we cast myBuf.buf, which is a pointer to void, to a pointer to char so that we can treat it as a C string. However, the for loop makes use of the buffer’s length rather than relying on null termination, and there is no exception if the string includes an embedded null character. For example:

example.stringPBuffer("Hello \0 World")

displays:

Hello 
48 65 6C 6C 6F 20 00 20 57 6F 72 6C 64 
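You can reproduce this output in pure Python, which is a useful check on what the C function receives. In this sketch hex_dump is a hypothetical helper, and splitting at the first zero byte imitates what printf does with %s:

```python
def hex_dump(data):
    """Format bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02X}" for b in data)


encoded = "Hello \0 World".encode("utf-8")

# What a C string function sees: everything up to the first zero byte
c_string_view = encoded.split(b"\x00", 1)[0]

print(c_string_view.decode("utf-8"))
print(hex_dump(encoded))
```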

The encoding of the internal buffer depends on the object that supports the buffer protocol. In the case of a string the encoding is UTF-8 and for a bytes or bytearray object it is just a series of bytes.

We also have to use PyBuffer_Release to let the Python object know we have finished using the buffer. The client program “owns” the Py_buffer struct, but the Python object continues to own and look after the internal buffer.
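The reason releasing matters can be seen from Python using a memoryview, which holds a buffer in much the same way. The sketch below relies on CPython’s behavior of refusing to resize a bytearray while a buffer on it is still exported; the variable names are only for illustration:

```python
data = bytearray(b"abc")
view = memoryview(data)       # holds an exported buffer on data

try:
    data.append(0x64)         # a resize could move the internal buffer
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

view.release()                # the Python-level analog of PyBuffer_Release
data.append(0x64)             # now the resize is allowed

print(resized_while_exported, data)
```

If a C function forgets to call PyBuffer_Release, the object is left in the first state and can’t be resized.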

Encoders

The encoder format specifiers are: 

 

Format   Python Type               C Type
es       str                       const char *encoding, char **buffer
et       str, bytes or bytearray   const char *encoding, char **buffer
es#      str                       const char *encoding, char **buffer, Py_ssize_t *buffer_length
et#      str, bytes or bytearray   const char *encoding, char **buffer, Py_ssize_t *buffer_length

The # versions allow for embedded null bytes and return the length of the buffer. The et versions encode a str in the same way as es, but they pass bytes and bytearray objects through without recoding. That is, they simply assume that the data is already in the specified encoding and return the raw data.
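The difference between es and et can be modeled in a few lines of Python. This is a rough model of the documented behavior rather than how the API is implemented, and et_model is a made-up name:

```python
def et_model(obj, encoding):
    """Rough Python model of the data an et format hands to C."""
    if isinstance(obj, (bytes, bytearray)):
        return bytes(obj)          # passed through with no recoding
    return obj.encode(encoding)    # a str is encoded, just as with es


# A str is encoded to cp1252
print(et_model("\u2020", "cp1252"))
# bytes are taken as-is, even though these happen to be UTF-8
print(et_model(b"\xe2\x80\xa0", "cp1252"))
```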

The encoding converters are in some senses the most complex of the conversion formats, but they also return the simplest data structure – a raw C buffer. To understand how they work you need to know about encodings in general and this is covered in detail in Chapter 10.

All Python strings can represent any of the huge range of Unicode characters. The problem is that not everything uses pure Unicode and there are a number of different ways of encoding the same information. For example, UTF-8 uses a variable number of bytes to represent the full range of Unicode. Before Unicode there was a system of “code pages” which reused the single byte range of extended ASCII to encode multiple characters. Python supports all of the old Windows code pages and their ANSI standardizations.
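To see how the same character ends up as different bytes, you can encode it in Python using a few of the standard codecs. This is just a small illustration using é, U+00E9:

```python
text = "\u00e9"                       # é

as_utf8 = text.encode("utf-8")        # two bytes
as_cp1252 = text.encode("cp1252")     # one byte, Windows Latin code page
as_cp850 = text.encode("cp850")       # one byte, but a different value

print(as_utf8.hex(), as_cp1252.hex(), as_cp850.hex())
```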

To convert a Unicode sequence into a particular encoding you can use one of the “e” format specifiers. For example, es converts a string to an encoded C string using the encoding specified by the first variable in the parameter list. That is:

"es", "cp1252", &name

encodes the string according to cp1252, which is Code Page 1252, i.e. the Latin code page for Windows, and stores a pointer to the result in name. The result is stored in a char buffer which is allocated by the API. You have to tell the Python API to free the buffer when you are finished using it by calling PyMem_Free().

A function to encode a string into the Latin code page is:

static PyObject *stringEncode(PyObject *self, PyObject *args)
{
  char *name = NULL;
  if (!PyArg_ParseTuple(args, "es", "cp1252", &name))
    return NULL;
  printf("%s \n", name);
  for (int i = 0; name[i]; i++)
  {
    printf("%02X ", (unsigned char)name[i]);
  }
  PyMem_Free(name);
  Py_RETURN_NONE;
}

If you try this out using:

example.stringEncode("Hello World")

then you will see “Hello World” followed by the usual ASCII codes for each character. This isn’t a good test as the encoding doesn’t change any of the ASCII characters into anything else. To see that the encoding is actually changing something, try:

example.stringEncode("\u2020")

Unicode character 0x2020 is †, i.e. a dagger symbol, and this doesn’t occur in the ASCII code, but it is character 0x86 in the Latin code page. If you run the program above you will probably see:

åå

86

The 86 corresponds to the code for the dagger in the Latin code page – the character you see printed depends on what code page the editor’s terminal is using. In the example above it is the Windows terminal which uses code page 850 Latin-1 by default and code 0x86 is “Lower case a with ring above”. You can see that the Unicode dagger has been converted to the correct code page code, but what you actually see depends on what code page or Unicode the terminal is set to.
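You can confirm both halves of this in Python by encoding the dagger to cp1252 and then decoding the resulting byte as if it were in a different code page, which is in effect what the terminal does:

```python
encoded = "\u2020".encode("cp1252")      # the dagger as a cp1252 byte

shown_cp850 = encoded.decode("cp850")    # how a cp850 terminal reads it
shown_cp1252 = encoded.decode("cp1252")  # how a cp1252 terminal reads it

print(encoded.hex(), shown_cp850, shown_cp1252)
```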

This is confusing and it is the reason Unicode and encodings such as UTF-8 have become standard.

If a Unicode symbol doesn’t have a representation within the selected code page then an exception occurs. If you want to substitute a “character not found” symbol you need to handle the exception.
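One option, if the caller is under your control, is to deal with the problem on the Python side before the C function is called. The following is only a sketch of that idea: encode_with_fallback is a hypothetical helper, and it assumes that the bytes it returns would be passed to a function that uses an et format so that they aren’t encoded again:

```python
def encode_with_fallback(text, encoding="cp1252"):
    """Encode text, substituting '?' for characters the code page lacks."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        # errors="replace" swaps each unencodable character for '?'
        return text.encode(encoding, errors="replace")


print(encode_with_fallback("\u2020"))      # in cp1252
print(encode_with_fallback("a\u4e2db"))    # U+4E2D is not in cp1252
```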

To make use of other Python encoders you may have to swap to a format that allows embedded null bytes. For example, to encode a string to UTF-16, a variable-length encoding using 16-bit words, you have to use the function with es#:

static PyObject *stringEncode2(PyObject *self, PyObject *args)
{
  char *name = NULL;
  Py_ssize_t len;
  if (!PyArg_ParseTuple(args, "es#", "utf-16", &name, &len))
    return NULL;
  printf("%s \n", name);
  for (int i = 0; i < len; i++)
  {
    printf("%02X ", (unsigned char)name[i]);
  }
  PyMem_Free(name);
  Py_RETURN_NONE;
}

In this case you also have to supply a variable to record the length of the string.
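The reason the length is needed is easy to see if you encode a short string to UTF-16 in Python. In this sketch the split at the first zero byte stands in for what a null-terminated C string function would see. The byte order, and so the exact bytes, depends on the machine:

```python
encoded = "Hi".encode("utf-16")   # byte order mark plus two 16-bit units

# Every ASCII character carries a zero byte, so a C string view stops early
c_string_view = encoded.split(b"\x00", 1)[0]

print(len(encoded), encoded.hex(" "))
print(len(c_string_view))
```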

When using es# you have the option of preallocating the buffer before calling the PyArg_ParseTuple function. In this case the buffer will be filled with the data and you are responsible for freeing it. For example:

static PyObject *stringEncodeAllocate(PyObject *self,
                                        PyObject *args)
{
  Py_ssize_t len = 25;
  char *name = malloc(sizeof(char) * (len + 1));
  if (name == NULL)
    return PyErr_NoMemory();
  if (!PyArg_ParseTuple(args, "es#", "cp1252", &name, &len))
  {
    /* we own the buffer, so it has to be freed on failure as well */
    free(name);
    return NULL;
  }
  printf("%d\n", (int)len);
  printf("%s \n", name);
  for (int i = 0; i < len; i++)
  {
    printf("%02X ", (unsigned char)name[i]);
  }
  free(name);
  Py_RETURN_NONE;
}

Notice that we now use free rather than PyMem_Free as Python doesn’t own the buffer. On return, the len parameter gives the number of bytes in the encoded data, not counting the terminating null, and the buffer is null-terminated even if it has embedded null bytes. If the buffer isn’t big enough an exception is raised.
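When sizing a preallocated buffer it is the encoded byte count that matters, and this can differ from the number of characters. The sketch below uses two hypothetical helpers, encoded_size and fits. It assumes that the buffer size passed in has to leave room for the terminating null as well, which is my reading of the behavior rather than something the text above states:

```python
def encoded_size(text, encoding):
    """Number of bytes text occupies once encoded (no terminator)."""
    return len(text.encode(encoding))


def fits(text, encoding, buffer_size):
    """Assumption: buffer_size must also hold the terminating null."""
    return encoded_size(text, encoding) + 1 <= buffer_size


print(encoded_size("\u2020", "cp1252"))   # one character, one byte
print(encoded_size("\u2020", "utf-8"))    # one character, three bytes
print(fits("Hello World", "cp1252", 25))
```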

For more details on how to work with Unicode strings see Chapter 10.



Last Updated ( Tuesday, 03 February 2026 )