Programmer's Python Data - Byte Manipulation
Written by Mike James
Monday, 05 June 2023

Bytes are the most primitive of data types and hence universal, but can you manipulate them? Find out how it all works in this extract from my new book, Programmer's Python: Everything is Data.

## Programmer's Python: Everything is Data is now available as a print book: Amazon

#### Contents

1. Python – A Lightning Tour
2. The Basic Data Type – Numbers
   Extract: Bignum
3. Truthy & Falsey
4. Dates & Times
5. Sequences, Lists & Tuples
   Extract: Sequences
6. Strings
   Extract: Unicode Strings
7. Regular Expressions
8. The Dictionary
   Extract: The Dictionary
9. Iterables, Sets & Generators
   Extract: Iterables
10. Comprehensions
    Extract: Comprehensions
11. Data Structures & Collections
    Extract: Stacks, Queues and Deques
12. Bits & Bit Manipulation
    Extract: Bits and BigNum
13. Bytes
    Extract: Bytes And Strings
    Extract: Byte Manipulation
14. Binary Files
15. Text Files
16. Creating Custom Data Classes
    Extract: A Custom Data Class
17. Python and Native Code
    Extract: Native Code

Appendix I Python in Visual Studio Code
Appendix II C Programming Using Visual Studio Code

#### In chapter but not in this extract

• Bytes
• Bytes and Bytearray
• Bytes As Strings
• Decode Encode

## Byte Manipulation

The need to perform bit manipulation on multiple bytes is a common requirement. There are two ways to approach this problem. We could convert the bytes to a single bignum representation, perform the bitwise operation and then convert back. Alternatively we could process the sequence directly, using for loops, to produce a new sequence.
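As a rough sketch of the two approaches, the following inverts every bit of a three-byte sequence, first by way of a bignum and then with a direct loop. The function names `invert_via_int` and `invert_via_loop` are illustrative only, and the conversion methods used in the first are the ones described in the rest of this section:

```python
def invert_via_int(data: bytes) -> bytes:
    # Convert to one big integer, XOR with an all-ones mask, convert back.
    n = int.from_bytes(data, byteorder='big')
    mask = (1 << (8 * len(data))) - 1
    return (n ^ mask).to_bytes(len(data), byteorder='big')

def invert_via_loop(data: bytes) -> bytes:
    # Process each byte in turn and build a new sequence.
    return bytes(b ^ 0xFF for b in data)

data = bytes([0xFF, 0xAA, 0x55])
print(invert_via_int(data).hex())   # 0055aa
print(invert_via_loop(data).hex())  # 0055aa
```

For a purely bytewise operation like this the loop is simpler. The bignum route tends to be more useful when bits have to move across byte boundaries, as in a shift.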

If you want to convert a byte sequence to a bignum you can use the from_bytes class method:

`int.from_bytes(bytes, byteorder, signed = False)`

where bytes is a bytes or bytearray object and byteorder determines the order in which the bytes are used to create the integer. It can be set to 'big' or 'little'.

This matter of order is something we have been able to ignore up to this point, but no longer. The problem is, where is the most significant byte: at the start of the sequence or at the end? This is the well-known “endian” problem and it is a fundamental choice in computer architecture. Bytes, or groupings of bytes, are generally each stored in a single memory location, but to make use of them you have to assemble them into a single bit pattern and there are two ways of doing this: big first or little first. For example, consider:

`myBytes=bytes([0xAA,0x55])`

as a possible representation of a two-byte integer. Our two choices are to take the first element as the most significant byte:

`(myBytes[0] << 8) | myBytes[1] == 0xAA55`

which is big endian, or to take the last element as the most significant byte:

`(myBytes[1] << 8) | myBytes[0] == 0x55AA`

which is little endian. You can see that the selection of big or little endian produces two very different integer values and two very different bit patterns.
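The two assemblies can be checked directly. This is a minimal sketch that builds each value by shifting the most significant byte up by eight bits and compares it with the built-in conversion:

```python
myBytes = bytes([0xAA, 0x55])

# Big endian: the first element is the most significant byte.
big = (myBytes[0] << 8) | myBytes[1]
# Little endian: the last element is the most significant byte.
little = (myBytes[1] << 8) | myBytes[0]

print(hex(big))     # 0xaa55
print(hex(little))  # 0x55aa

# The same results from the built-in conversion.
assert big == int.from_bytes(myBytes, byteorder='big')
assert little == int.from_bytes(myBytes, byteorder='little')
```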

The endian problem occurs whenever you have to put a sequence of bytes, or other discrete bit patterns, together to form a larger bit pattern. For example:

```python
myBytes = bytes([0xFF,0xAA,0x55])
bits = int.from_bytes(myBytes,byteorder = 'big')
print(hex(bits))
```

displays:

`0xffaa55`

and changing to byteorder = 'little' displays:

`0x55aaff`

If you want to use the byte order that the current machine uses for its memory access then import the sys module and specify byteorder = sys.byteorder.
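A short sketch of using the native byte order follows. The result depends on the machine it runs on, so the comments describe both possibilities rather than a single expected output:

```python
import sys

myBytes = bytes([0xFF, 0xAA, 0x55])
native = int.from_bytes(myBytes, byteorder=sys.byteorder)

print(sys.byteorder)  # little on x86 machines, for example
print(hex(native))    # 0x55aaff if little endian, 0xffaa55 if big endian
```

Native order is mainly of interest when the bytes come from, or are going to, memory on the same machine. Data read from files or network protocols usually has its byte order fixed by the format.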

To convert the bignum back to a bytes object you can use the to_bytes int method:

`int.to_bytes(length, byteorder, signed = False)`

Again you specify the byte order, along with the number of bytes in the resulting bytes object. For example:

```python
myBytes=bits.to_bytes(3,byteorder='big')
print(myBytes)
```

displays:

`b'\xff\xaaU'`

The final byte, 0x55, is shown as U because that is the ASCII character with that code.

The need to specify the number of bytes is irritating because if you get it wrong, and the integer cannot be represented in that many bytes, an OverflowError exception is raised. To generate as many bytes as needed you can use the int method bit_length, which returns the number of bits needed to represent the value. To convert this into the number of bytes needed to accommodate this number of bits we can use:

`(bit_length()+7)//8`

Using this we can rewrite the previous example as:

`myBytes = bits.to_bytes((bits.bit_length()+7)//8, byteorder = 'big')`
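The length calculation can be wrapped in a small helper. The name `to_min_bytes` is hypothetical, the helper assumes a non-negative value, and it adds one detail not covered above: `bit_length()` is 0 for the value zero, so the length is forced to at least one byte.

```python
def to_min_bytes(value: int, byteorder: str = 'big') -> bytes:
    # Hypothetical helper: smallest bytes object that holds a non-negative int.
    # max(1, ...) makes zero come out as b'\x00' rather than an empty bytes object.
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, byteorder=byteorder)

bits = 0xFFAA55
print(to_min_bytes(bits))  # b'\xff\xaaU'
print(to_min_bytes(0))     # b'\x00'

try:
    bits.to_bytes(2, byteorder='big')  # three bytes of data will not fit in two
except OverflowError as e:
    print('OverflowError:', e)
```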

Finally we have to deal with the problem of negative values. In most cases you can ignore this because you are only interested in working with bit patterns, and bit patterns are usually extended using zero bits. The only time this is not the case is if the bit pattern really is an integer value in two’s complement form.

When converting from bytes to bignums, setting the signed parameter to True causes the bytes to be interpreted as a two's complement value. If the most significant bit is 1 the result is a negative integer; if it is 0 the result is the same as with signed = False. As a side effect it will appear to remove any leading ones from the value, as these are treated as sign bits. For example:

```python
myBytes=bytes([0xFF,0xAA,0x55])
bits=int.from_bytes(myBytes,byteorder='big',signed=True)
print(hex(bits))
```

displays:

`-0x55ab`

which, in two's complement, is equivalent to:

`FFFF AA55`

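with as many leading ones as required by the operation. Notice that the original bit pattern isn’t lost. Python's bitwise operators treat a negative bignum as a two's complement value with an unlimited number of leading ones, so its low-order 24 bits are still FFAA55.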
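The effect of the signed parameter can be seen by converting the same three bytes both ways and then masking the negative result to a fixed width. This is a sketch only, and the choice of 24-bit and 32-bit masks is just for illustration:

```python
myBytes = bytes([0xFF, 0xAA, 0x55])

unsigned = int.from_bytes(myBytes, byteorder='big')
signed = int.from_bytes(myBytes, byteorder='big', signed=True)

print(hex(unsigned))  # 0xffaa55
print(hex(signed))    # -0x55ab

# With signed=True a set top bit subtracts 2**24 for a three-byte value.
assert signed == unsigned - (1 << 24)

# Masking shows the two's complement view at a chosen width.
print(hex(signed & 0xFFFFFF))    # 0xffaa55
print(hex(signed & 0xFFFFFFFF))  # 0xffffaa55

# If the top bit is clear, signed=True makes no difference.
assert int.from_bytes(bytes([0x7F, 0xAA, 0x55]), 'big', signed=True) == 0x7FAA55
```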

Going the other way, from an integer to a bytes object, works in much the same way, but if you try to convert a negative integer without signed = True an OverflowError exception occurs because negative integers have to be treated as two's complement. For example:

```python
bits=-1
myBytes=bits.to_bytes(1,byteorder='big',signed=True)
print(myBytes.hex())
```

displays ff, as -1 is ff in a single byte of two's complement.
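The following sketch shows the exception that results when signed is left at its default, the sign extension that occurs when more bytes are requested, and a round trip in which the signed setting matches in both directions:

```python
bits = -1

try:
    bits.to_bytes(1, byteorder='big')  # signed defaults to False
except OverflowError as e:
    print('OverflowError:', e)

# With signed=True the value is written in two's complement,
# sign-extended to whatever length is requested.
print(bits.to_bytes(1, byteorder='big', signed=True).hex())  # ff
print(bits.to_bytes(4, byteorder='big', signed=True).hex())  # ffffffff

# Round trip: the signed flag must match in both directions.
raw = (-0x55AB).to_bytes(3, byteorder='big', signed=True)
print(raw.hex())  # ffaa55
assert int.from_bytes(raw, byteorder='big', signed=True) == -0x55AB
```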

In most cases when doing byte manipulation you can ignore problems with negative numbers because you can treat everything as positive integers.

Last Updated ( Monday, 05 June 2023 )