Programmer's Python Data - Byte Manipulation

Written by Mike James

Monday, 05 June 2023

Multibyte Shifts

When you have byte sequences to work with it is tempting to simply write a for loop that processes each byte in turn. However, notice that there is no easy way to implement a shift operation on a bytes or bytearray object as you need to arrange to move bits from one byte to another. On the other hand, implementing a shift on a bignum is a single operation. For example:

myBytes=bytes([0xFF,0xAA,0x55])
bits=int.from_bytes(myBytes,byteorder="big")
bits=bits>>4
print(bits.to_bytes(3,byteorder="big"))

displays:

b'\x0f\xfa\xa5'

which, as you can see, has shifted the low four bits of each byte into the high four bits of the next byte.
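The same conversion can be wrapped up for reuse. The following is a minimal sketch, assuming big-endian byte order and a result of the same length as the input. The function names shift_right and shift_left are illustrative and not part of any library:

```python
def shift_right(data: bytes, count: int) -> bytes:
    """Shift a big-endian byte sequence right by count bits, keeping its length."""
    value = int.from_bytes(data, byteorder="big")
    return (value >> count).to_bytes(len(data), byteorder="big")


def shift_left(data: bytes, count: int) -> bytes:
    """Shift left by count bits, discarding any bits pushed off the top."""
    value = int.from_bytes(data, byteorder="big")
    mask = (1 << (8 * len(data))) - 1  # keep only the original width
    return ((value << count) & mask).to_bytes(len(data), byteorder="big")


print(shift_right(bytes([0xFF, 0xAA, 0x55]), 4))  # b'\x0f\xfa\xa5'
print(shift_left(bytes([0xFF, 0xAA, 0x55]), 4))
```

The mask in shift_left is needed because a bignum grows when shifted left, and to_bytes raises OverflowError if the value no longer fits in the requested number of bytes.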

Doing this without converting to bignums is a difficult task involving masking out and shifting the low-order bits of the previous byte to become the high-order bits of the current byte. For example, to implement a shift right of four bits:

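myBytes1=bytearray([0xFF,0xAA,0x55])
myBytes2=bytearray(3)
for i in range(len(myBytes1)):
    # the high four bits of this byte become its low four bits
    myBytes2[i]=myBytes1[i]>>4
    if i>0:
        # the low four bits of the previous byte become the high four bits
        myBytes2[i]=myBytes2[i]|((myBytes1[i-1]<<4)&0xF0)
print(myBytes2)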

In most cases it is preferable to convert to bignums.
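To see that the two approaches agree, the byte-by-byte method can be generalized to any shift of less than eight bits and checked against the bignum version. This is only a sketch: the function names are hypothetical and big-endian order is assumed throughout:

```python
def shift_right_bytewise(data, count):
    """Shift right by count bits (0 <= count <= 7), one byte at a time."""
    result = bytearray(len(data))
    carry = 0  # bits handed on from the previous byte
    for i, byte in enumerate(data):
        result[i] = (byte >> count) | carry
        # the low count bits of this byte become the high bits of the next
        carry = (byte << (8 - count)) & 0xFF
    return bytes(result)


def shift_right_bignum(data, count):
    """The same shift performed as a single bignum operation."""
    value = int.from_bytes(data, byteorder="big")
    return (value >> count).to_bytes(len(data), byteorder="big")


data = bytes([0xFF, 0xAA, 0x55])
for count in range(8):
    assert shift_right_bytewise(data, count) == shift_right_bignum(data, count)
print(shift_right_bytewise(data, 4))  # b'\x0f\xfa\xa5'
```

Shifts of eight bits or more would also need whole bytes to be moved, which the bignum version handles with no extra code.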

One-Time Pad

As an example of this approach to byte manipulation, consider the common task of XORing a set of random bits with a bit pattern. The reason you might want to do this is to encrypt the data. The result is a very secure code, usually known as a “one-time pad”, provided that the random bits are truly random, at least as long as the message, kept secret and never reused. You can recover the original data by simply performing the XOR operation a second time, as (x ^ y) ^ y is x. This doesn’t sound very secure, but to decode the message you need the same random bits to perform the second XOR. Without the one-time pad it is impossible to recover the original text.
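The fact that XOR undoes itself is easy to check on a single byte. The values below are arbitrary and chosen only for illustration:

```python
x = 0b10110010  # data bits
y = 0b01101100  # pad bits

encrypted = x ^ y
print(f"{encrypted:08b}")  # 11011110

decrypted = encrypted ^ y
print(decrypted == x)  # True

# the identity holds for every possible pair of byte values
assert all((a ^ b) ^ b == a for a in range(256) for b in range(256))
```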

Start with a suitable message as an ASCII string:

myBytes = b"Hello World Of Secrets"

which could equally have started out as a Unicode string converted to bytes using its encode method. Next we need a one-time pad:

oneTime = int.from_bytes(random.randbytes(len(myBytes)),
                                      byteorder="big")

To understand this you need to know that:

random.randbytes(len(myBytes))

generates the specified number of random bytes as a bytes object (randbytes was added in Python 3.9). We then use the from_bytes method to create the bignum oneTime with the same bit pattern. To XOR the message with the one-time pad we need to convert the ASCII string to a bignum:

msg = int.from_bytes(myBytes,byteorder="big")

Now we have both bit patterns as bignums and so can perform the XOR:

crypt=msg ^ oneTime

To decrypt the message we just need to repeat the XOR:

decrypt=crypt ^ oneTime

and to see it we need to convert it back to an ASCII string:

decrypt=decrypt.to_bytes((decrypt.bit_length()+7)//8,byteorder="big")
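The expression (decrypt.bit_length()+7)//8 rounds the number of significant bits up to a whole number of bytes. It works here because the message starts with a non-zero byte, but a bignum does not record leading zero bytes, so in general it is safer to use the known length of the original data. A small sketch with made-up values shows the difference:

```python
data = b"\x00\x01\x02"  # note the leading zero byte
value = int.from_bytes(data, byteorder="big")

length = (value.bit_length() + 7) // 8
print(value.bit_length(), length)  # 9 2

short = value.to_bytes(length, byteorder="big")
print(short)  # b'\x01\x02' - the leading zero byte has been lost

full = value.to_bytes(len(data), byteorder="big")
print(full)  # b'\x00\x01\x02'
```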

Putting all this together, and adding some print instructions gives:

import random
myBytes=b"Hello World Of Secrets"
oneTime=int.from_bytes(random.randbytes(len(myBytes)),
                                       byteorder="big")
msg=int.from_bytes(myBytes,byteorder="big")
crypt=msg ^ oneTime
print(hex(crypt))
decrypt=crypt ^ oneTime
decrypt=decrypt.to_bytes((decrypt.bit_length()+7)//8,
                                       byteorder="big")
print(decrypt)

Of course, in a real application the one-time pad would have to be available at the other site as well, which means sharing it securely in advance, usually a difficult task. The encoded message itself can then be sent over an open channel. The one-time pad may be uncrackable, but it isn’t convenient.
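The random module is fine for a demonstration, but its documentation warns that it should not be used for security purposes. The following sketch makes two changes that are not in the program above: it takes the pad from the standard secrets module and it converts back using the known message length. The helper name xor_with_pad is hypothetical:

```python
import secrets


def xor_with_pad(data: bytes, pad: bytes) -> bytes:
    """XOR two equal-length byte sequences by way of bignums."""
    if len(data) != len(pad):
        raise ValueError("pad must be the same length as the data")
    value = int.from_bytes(data, byteorder="big") ^ int.from_bytes(pad, byteorder="big")
    # use the known length so that leading zero bytes are preserved
    return value.to_bytes(len(data), byteorder="big")


message = b"Hello World Of Secrets"
pad = secrets.token_bytes(len(message))  # operating system randomness

crypt = xor_with_pad(message, pad)
print(crypt.hex())
plain = xor_with_pad(crypt, pad)
print(plain)
```

Because XOR is its own inverse, the same function serves for both encryption and decryption.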

How would you implement this using direct operations on the byte sequences? In this case msg and oneTime are both bytes objects rather than bignums. The most obvious way to a programmer used to for loops in other languages would be to use a loop index:

crypt=bytearray(len(msg))
for i in range(len(msg)):
    crypt[i]=msg[i]^oneTime[i]
print(crypt)

Notice that you need to use a bytearray and not a bytes object because of the need to modify it in-place.
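The difference between the two types is easy to demonstrate. In this short sketch the variable names are illustrative only:

```python
fixed = bytes(3)  # three zero bytes, immutable
try:
    fixed[0] = 0xFF
except TypeError as err:
    print("bytes:", err)

changeable = bytearray(3)  # three zero bytes, mutable
changeable[0] = 0xFF
print(changeable)  # bytearray(b'\xff\x00\x00')

# a bytearray can be converted to bytes once it is complete
print(bytes(changeable))  # b'\xff\x00\x00'
```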

A more Pythonic approach would be to use a comprehension:

crypt= bytes([a^b for a, b in zip(msg,oneTime)])

This is more compact and arguably easier to understand, but only if you are happy with comprehensions, the zip function, tuples, unpacking and the bytes constructor. In principle it also has the potential to be faster than the index loop approach, but this depends on the Python implementation in use.
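It can help to look at the comprehension one stage at a time. This sketch uses a short message and a fixed, obviously non-random pad purely so that the intermediate values are easy to follow:

```python
msg = b"Hi"
oneTime = bytes([0x01, 0x02])  # fixed values for illustration, not a real pad

# iterating over bytes yields integers, so zip pairs them up as tuples
pairs = list(zip(msg, oneTime))
print(pairs)  # [(72, 1), (105, 2)]

# each tuple is unpacked into a and b, XORed and collected into a list
values = [a ^ b for a, b in pairs]
print(values)  # [73, 107]

# the bytes constructor turns the list of integers into a byte sequence
crypt = bytes(values)
print(crypt)  # b'Ik'

# zip stops at the shorter input, so a short pad silently truncates the result
print(list(zip(b"abc", b"x")))  # [(97, 120)]
```

The last line shows why the pad must be at least as long as the message. From Python 3.10, passing strict=True to zip raises a ValueError when the lengths differ.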

A complete program using comprehensions is:

import random
msg=b"Hello World Of Secrets"
oneTime=random.randbytes(len(msg))
crypt= bytes([a^b for a, b in zip(msg,oneTime)])
print(crypt)
decrypt= bytes([a^b for a, b in zip(crypt,oneTime)])
print(decrypt)

In chapter but not in this extract

The Array
Memoryview

Summary

Working with bit patterns is fundamental, but you generally have to work with bytes or some other larger unit of storage.
Working with a byte sequence is possible using the bytes object which is immutable or a bytearray which is mutable.
Both the bytes and bytearray objects can be thought of as ASCII strings and have many of the same methods as strings.
A bytes literal is distinguished from a string by a leading b and contains ASCII characters and escape codes for values that have no printable ASCII character, including all values above 127.
You can also create bytes objects and bytearrays using an iterable that provides integers in the correct range.
The encode method takes a Unicode string and converts it into a byte sequence using the specified encoding.
The decode method takes a byte sequence and converts it into a Unicode string using the specified encoding.
When trying to manipulate a byte sequence you can opt to convert it to a bignum and then use bitwise operators or you can work byte-by-byte in a for loop.
When working with bytes in groups it matters which order you take them in – big endian takes the most significant byte first and little endian takes the least significant byte first.
Multibyte shifts are difficult to implement byte-by-byte because bits have to be moved from one byte to the next. Converting to a bignum makes the shift a single operation.
Python has a basic array type in the array module. This supports arrays of basic C types.
The memoryview class provides a view into the buffer of any object that supports the buffer protocol.
A memoryview doesn’t make a copy of the original buffer – it simply provides access.
The object that the buffer belongs to can set the type and shape of the buffer in an attempt to make it easier for you to use.
If the object doesn’t set the type and shape of the buffer you can use the cast method to change or set it.
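A few of the points above can be checked directly in a short sketch. The values are arbitrary, and the result of the cast is not printed because it depends on the byte order of the machine:

```python
# encode and decode convert between Unicode strings and byte sequences
text = "café"
data = text.encode("utf-8")
print(data)  # b'caf\xc3\xa9'
assert data.decode("utf-8") == text

# byte order decides which end of the value comes first
value = 0x0102
print(value.to_bytes(2, byteorder="big"))     # b'\x01\x02'
print(value.to_bytes(2, byteorder="little"))  # b'\x02\x01'

# a memoryview gives access to the original buffer without copying it
buffer = bytearray(b"\x01\x00\x02\x00")
view = memoryview(buffer)
view[0] = 0xFF
print(buffer)  # bytearray(b'\xff\x00\x02\x00')

# cast reinterprets the same four bytes as two unsigned 16-bit items
words = view.cast("H")
print(len(words))  # 2
```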

Programmer's Python
Everything is Data

Is now available as a print book: Amazon

Python – A Lightning Tour
The Basic Data Type – Numbers
Extract: Bignum
Truthy & Falsey
Dates & Times
Extract Naive Dates
Sequences, Lists & Tuples
Extract Sequences
Strings
Extract Unicode Strings
Regular Expressions
Extract Simple Regular Expressions
The Dictionary
Extract The Dictionary
Iterables, Sets & Generators
Extract Iterables
Comprehensions
Extract Comprehensions
Data Structures & Collections
Extract Stacks, Queues and Deques
Extract Named Tuples and Counters
Bits & Bit Manipulation
Extract Bits and BigNum
Extract Bit Masks
Bytes
Extract Bytes And Strings
Extract Byte Manipulation
Binary Files
Extract Files and Paths
Text Files
Extract Text Files & CSV
Creating Custom Data Classes
Extract A Custom Data Class
Python and Native Code
Extract Native Code
Appendix I Python in Visual Studio Code
Appendix II C Programming Using Visual Studio Code


Last Updated ( Monday, 05 June 2023 )
