Python code optimization with ctypes

A translation of the article was prepared specifically for students of the Python Developer course .




Note : the code in this article is licensed under GNU AGPLv3 .

I wrote this guide because I could not find one that combines all the useful things about ctypes. I hope this article makes someone's life a lot easier.

Content:

  1. Basic optimizations
  2. styles
  3. Python compilation
  4. Structures in Python
  5. Call your code in C
  6. Pypy

Basic optimizations


Before rewriting the Python source code in C, consider the basic optimization methods in Python.

Built-in Data Structures


Python's built-in data structures, such as set and dict, are written in C. They work much faster than your own data structures written as Python classes. Other data structures besides the standard set, dict, list, and tuple are described in the collections module documentation .

List expressions


Instead of adding items to the list using the standard method, use list expressions.

# Slow
      mapped = []
      for value in originallist:
          mapped.append(myfunc(value))
      
      # Faster
      mapped = [myfunc(value) in originallist]

ctypes


The ctypes module allows you to interact with C code from Python without using a module subprocessor other similar module to start other processes from the CLI.

There are only two parts: compiling C code to load in quality shared objectand setting up data structures in Python code to map them to types C.

In this article I will connect my Python code with lcs.c , which finds the longest subsequence in two lists of strings. I want the following to work in Python:

list1 = ['My', 'name', 'is', 'Sam', 'Stevens', '!']
      list2 = ['My', 'name', 'is', 'Alex', 'Stevens', '.']
      
      common = lcs(list1, list2)
      
      print(common)
      # ['My', 'name', 'is', 'Stevens']

One problem is that this particular C function is the signature of a function that takes lists of strings as argument types and returns a type that does not have a fixed length. I solve this problem with a sequence structure containing pointers and length.

Compiling C code in Python


First, C source code ( lcs.c ) is compiled in lcs.soto load in Python.

gcc -c -Wall -Werror -fpic -O3 lcs.c -o lcs.o
      gcc -shared -o lcs.so lcs.o

  • - Wall will display all warnings;
  • - Werror will wrap all warnings in errors;
  • - fpic will generate position-independent instructions that you will need if you want to use this library in Python;
  • - O3 maximizes optimization;

And now we will begin to write Python code using the resulting shared object file .

Structures in Python


Below are two data structures that are used in my C code.

struct Sequence
      {
          char **items;
          int length;
      };
      
      struct Cell
      {
          int index;
          int length;
          struct Cell *prev;
      };

And here is the translation of these structures into Python.

import ctypes
      class SEQUENCE(ctypes.Structure):
          _fields_ = [('items', ctypes.POINTER(ctypes.c_char_p)),
                      ('length', ctypes.c_int)]
      
      class CELL(ctypes.Structure):
          pass
      
      CELL._fields_ = [('index', ctypes.c_int), ('length', ctypes.c_int),
                       ('prev', ctypes.POINTER(CELL))]

A few notes:

  • All structures are classes that inherit from ctypes.Structure.
  • The only field _fields_is a list of tuples. Each tuple is ( <variable-name>, <ctypes.TYPE>).
  • There ctypesare similar types in c_char (char) and c_char_p (* char) .
  • There is ctypesalso one POINTER()that creates a type pointer from each type passed to it.
  • If you have a recursive definition like in CELL, you must pass the initial declaration, and then add fields _fields_in order to later get a link to yourself.
  • Since I did not use CELLPython in my code, I did not need to write this structure, but it has an interesting property in the recursive field.

Call your code in C


In addition, I needed some code to convert Python types to new structures in C. Now you can use your new C function to speed up Python code.

def list_to_SEQUENCE(strlist: List[str]) -> SEQUENCE:
          bytelist = [bytes(s, 'utf-8') for s in strlist]
          arr = (ctypes.c_char_p * len(bytelist))()
          arr[:] = bytelist
          return SEQUENCE(arr, len(bytelist))
      
      
      def lcs(s1: List[str], s2: List[str]) -> List[str]:
          seq1 = list_to_SEQUENCE(s1)
          seq2 = list_to_SEQUENCE(s2)
      
          # struct Sequence *lcs(struct Sequence *s1, struct Sequence *s2)
          common = lcsmodule.lcs(ctypes.byref(seq1), ctypes.byref(seq2))[0]
      
          ret = []
      
          for i in range(common.length):
              ret.append(common.items[i].decode('utf-8'))
          lcsmodule.freeSequence(common)
      
          return ret
      
      lcsmodule = ctypes.cdll.LoadLibrary('lcsmodule/lcs.so')
      lcsmodule.lcs.restype = ctypes.POINTER(SEQUENCE)
      
      list1 = ['My', 'name', 'is', 'Sam', 'Stevens', '!']
      list2 = ['My', 'name', 'is', 'Alex', 'Stevens', '.']
      
      common = lcs(list1, list2)
      
      print(common)
      # ['My', 'name', 'is', 'Stevens']

A few notes:

  • **char (list of strings) matches directly to a list of bytes in Python.
  • There lcs.cis a function lcs()with the signature struct Sequence * lcs (struct Sequence * s1, struct Sequence * s2) . To set up the return type, I use lcsmodule.lcs.restype = ctypes.POINTER(SEQUENCE).
  • To make a call with reference to the Sequence structure , I use ctypes.byref()one that returns a “light pointer” to your object (faster than ctypes.POINTER()).
  • common.items- this is a list of bytes, they can be decoded to get retin the form of a list str.
  • lcsmodule.freeSequence (common) just frees the memory associated with common. This is important because the garbage collector (AFAIK) will not automatically collect it.

Optimized Python code is code that you wrote in C and wrapped in Python.

Something More: PyPy


Attention: I myself have never used PyPy.
One of the simplest optimizations is to run your programs in the PyPy runtime , which contains a JIT compiler (just-in-time) that speeds up the work of loops, compiling them into machine code for repeated execution.

If you have comments or want to discuss something, write to me (samuel.robert.stevens@gmail.com).

That's all. See you on the course !

All Articles