Programmer's Python Data - Byte Manipulation
Written by Mike James   
Monday, 05 June 2023

Bytes are the most primitive of data types and hence universal, but can you manipulate them? Find out how it all works in this extract from my new book Programmer's Python: Everything is Data.

Programmer's Python
Everything is Data

Is now available as a print book: Amazon

Contents

  1. Python – A Lightning Tour
  2. The Basic Data Type – Numbers
       Extract: Bignum
  3. Truthy & Falsey
  4. Dates & Times
  5. Sequences, Lists & Tuples
       Extract Sequences 
  6. Strings
       Extract Unicode Strings
  7. Regular Expressions
  8. The Dictionary
       Extract The Dictionary 
  9. Iterables, Sets & Generators
       Extract  Iterables 
  10. Comprehensions
       Extract  Comprehensions 
  11. Data Structures & Collections
  12. Bits & Bit Manipulation
         Extract Bits and BigNum
  13. Bytes
        Extract Bytes And Strings
        Extract Byte Manipulation 
  14. Binary Files
  15. Text Files
  16. Creating Custom Data Classes
        Extract A Custom Data Class 
  17. Python and Native Code
        Extract   Native Code
    Appendix I Python in Visual Studio Code
    Appendix II C Programming Using Visual Studio Code

In chapter but not in this extract

  • Bytes
  • Bytes and Bytearray
  • Bytes As Strings
  • Decode Encode

Byte Manipulation

The need to perform bit manipulation on multiple bytes is a common requirement. There are two ways to approach this problem. We could convert the bytes to a single bignum representation, perform the bitwise operation and then convert back. Alternatively we could process the sequence directly, using for loops, to produce a new sequence.
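As a rough sketch of the two approaches, the following XORs two byte sequences first by way of a single integer and then element by element. The function names xor_via_int and xor_via_loop are made up for this illustration, and the integer version relies on the from_bytes and to_bytes methods described next:

```python
def xor_via_int(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte sequences by way of a single big integer."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    x = int.from_bytes(a, byteorder="big")
    y = int.from_bytes(b, byteorder="big")
    # Convert back using the original length so leading zero bytes survive.
    return (x ^ y).to_bytes(len(a), byteorder="big")


def xor_via_loop(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte sequences one element at a time."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return bytes(p ^ q for p, q in zip(a, b))


data = bytes([0xFF, 0xAA, 0x55])
mask = bytes([0x0F, 0x0F, 0x0F])
print(xor_via_int(data, mask).hex())   # f0a55a
print(xor_via_loop(data, mask).hex())  # f0a55a
```

For an operation like XOR, which works on each byte independently, the two give the same result. The integer route tends to be the simpler choice when bits have to move across byte boundaries, as in a shift.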

If you want to convert a byte sequence to a bignum you can use the from_bytes class method:

int.from_bytes(bytes, byteorder='big', signed=False)

where bytes is a bytes or bytearray object and byteorder determines the order in which the bytes are used to create the integer and can be set to 'big' or 'little'. From Python 3.11 byteorder defaults to 'big'; in earlier versions it has to be specified.

This matter of order is something we have been able to ignore up to this point, but no longer. The problem is, where is the most significant byte – at the start of the sequence or at the end? This is the well known “endian” problem and it is a fundamental choice in computer architecture. Bytes, or groupings of bytes, are generally stored in a single memory location, but to make use of them you generally have to assemble them into a single bit pattern and there are two ways of doing this – big first or little first. For example, consider:

myBytes=bytes([0xAA,0x55])

as a possible representation of a two-byte integer. Our two choices are to take the first element as the most significant byte:

(myBytes[0]<<8) | myBytes[1]   # 0xAA55

This is big endian. Alternatively, we could take the last element as the most significant byte:

(myBytes[1]<<8) | myBytes[0]   # 0x55AA

which is little endian. You can see that the selection of big or little endian produces two very different integer values and two very different bit patterns.
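To make the two choices concrete, this small sketch assembles the two-byte value by hand with a shift and an OR and checks the result against int.from_bytes. The variable names are chosen just for the illustration:

```python
my_bytes = bytes([0xAA, 0x55])

# Big endian: the first element is the most significant byte.
big = (my_bytes[0] << 8) | my_bytes[1]

# Little endian: the last element is the most significant byte.
little = (my_bytes[1] << 8) | my_bytes[0]

print(hex(big), hex(little))  # 0xaa55 0x55aa

# int.from_bytes performs the same assembly for any number of bytes.
print(big == int.from_bytes(my_bytes, byteorder="big"))        # True
print(little == int.from_bytes(my_bytes, byteorder="little"))  # True
```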

The endian problem occurs whenever you have to put a sequence of bytes, or other discrete bit patterns, together to form a larger bit pattern. For example:

myBytes = bytes([0xFF,0xAA,0x55])
bits = int.from_bytes(myBytes,byteorder = 'big')
print(hex(bits))

displays:

0xffaa55

and changing to byteorder = 'little' displays:

0x55aaff

If you want to use the byte order that the current machine uses for its memory access then import the sys module and specify byteorder = sys.byteorder.
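A minimal sketch of using the native byte order follows. The output is not shown because it depends on the machine: most desktop processors are little endian, so you would typically see the 'little' result.

```python
import sys

my_bytes = bytes([0xFF, 0xAA, 0x55])

# sys.byteorder is the string 'little' or 'big', depending on the hardware.
native = int.from_bytes(my_bytes, byteorder=sys.byteorder)
print(sys.byteorder, hex(native))
```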

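To convert the bignum back to a bytes object you can use the int method to_bytes:

int.to_bytes(length=1, byteorder='big', signed=False)

Again you specify the byteorder and the number of elements in the resulting bytes object. From Python 3.11 these default to 'big' and 1 respectively; in earlier versions both have to be given. For example: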

myBytes=bits.to_bytes(3,byteorder='big')
print(myBytes)

displays:

b'\xff\xaaU'
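The final U in the output can look like a mistake, but it is just how Python displays a byte with the value 0x55, which is the ASCII code for U. A short sketch showing the same bytes in less ambiguous forms:

```python
bits = 0xFFAA55
my_bytes = bits.to_bytes(3, byteorder="big")

print(my_bytes)        # b'\xff\xaaU'
print(my_bytes.hex())  # ffaa55
print(list(my_bytes))  # [255, 170, 85]
```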

The need to specify the number of elements is irritating because if you get it wrong and the integer cannot be represented in that number of bytes, an OverflowError exception is raised. To generate as many elements as needed you can use the int method bit_length, which returns the number of bits needed to represent the value, excluding the sign and any leading zeros. To convert this into the number of bytes needed to accommodate this number of bits we can use:

(bit_length()+7)//8

Using this we can rewrite the previous example as:

myBytes = bits.to_bytes((bits.bit_length()+7)//8,
                        byteorder = 'big')
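The length calculation can be wrapped up in a small helper. This is only a sketch: the name int_to_min_bytes is invented here, it handles non-negative values only, and forcing at least one byte for zero is a design choice, since bit_length returns 0 for zero and the plain formula would give an empty bytes object.

```python
def int_to_min_bytes(value: int, byteorder: str = "big") -> bytes:
    """Return value as the shortest bytes object that can hold it."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    # bit_length() is 0 for zero, so force at least one byte (a design choice).
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, byteorder=byteorder)


print(int_to_min_bytes(0xFFAA55).hex())  # ffaa55
print(int_to_min_bytes(0x1FF).hex())     # 01ff
print(int_to_min_bytes(0).hex())         # 00

# Asking for too few bytes raises OverflowError rather than truncating.
try:
    (0xFFAA55).to_bytes(2, byteorder="big")
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)  # True
```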

Finally we have to deal with the problem of negative values. In most cases you can ignore this because you are only interested in working with bit patterns, and bit patterns are usually extended using zero bits. The only time this is not the case is if the bit pattern really is an integer value in two’s complement form.

When converting from bytes to bignums, setting the signed parameter to True causes the bytes to be interpreted as a two's complement value. If the most significant bit of the sequence is 1 the result is negative, and any leading ones appear to be removed because they are treated as sign bits. If the most significant bit is 0 the result is the same as for signed = False. For example:

myBytes=bytes([0xFF,0xAA,0x55])
bits=int.from_bytes(myBytes,byteorder='big',signed=True)
print(hex(bits))

displays:

-0x55ab

which, in two's complement, is equivalent to:

FFFF AA55

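with as many leading ones as required by the operation. Notice that, as far as the bitwise operators are concerned, the bit pattern isn’t changed by being stored in the bignum: Python treats a negative integer as a two's complement value with an unlimited number of leading ones.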
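The effect of the signed parameter can be checked with a short sketch. Masking the negative result with AND shows the two's complement bit pattern at whatever width you choose:

```python
my_bytes = bytes([0xFF, 0xAA, 0x55])

unsigned = int.from_bytes(my_bytes, byteorder="big")
signed = int.from_bytes(my_bytes, byteorder="big", signed=True)
print(unsigned, signed)  # 16755285 -21931

# For three bytes the two interpretations differ by exactly 2**24.
print(unsigned - signed == 1 << 24)  # True

# Masking reveals the two's complement pattern at a chosen width.
print(hex(signed & 0xFFFFFF))    # 0xffaa55
print(hex(signed & 0xFFFFFFFF))  # 0xffffaa55

# With the top bit clear, signed=True makes no difference.
positive = bytes([0x7F, 0xAA, 0x55])
same = (int.from_bytes(positive, byteorder="big", signed=True)
        == int.from_bytes(positive, byteorder="big"))
print(same)  # True
```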

Going the other way, from an integer to a bytes object, works in much the same way, but if you try to convert a negative integer without signed = True an OverflowError exception occurs because negative integers have to be converted to two's complement. For example:

bits=-1
myBytes=bits.to_bytes(1,byteorder='big',signed=True)
print(myBytes.hex())

displays ff, as -1 is ff in single-byte two's complement.
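A brief sketch of both cases follows: the failure when signed is left at its default, and a round trip through a wider two's complement representation. The variable names are illustrative only.

```python
bits = -1

# Without signed=True a negative value cannot be converted.
try:
    bits.to_bytes(1, byteorder="big")
    failed = False
except OverflowError:
    failed = True
print(failed)  # True

# With signed=True the value is sign-extended to the requested length.
wide = bits.to_bytes(4, byteorder="big", signed=True)
print(wide.hex())  # ffffffff

# Reading it back as signed recovers the original value.
print(int.from_bytes(wide, byteorder="big", signed=True))  # -1
```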

In most cases when doing byte manipulation you can ignore problems with negative numbers because you can treat everything as positive integers.



Last Updated ( Monday, 05 June 2023 )