🥤 👍🏼 🆒 Exploring the code quality of the Zephyr operating system 🚕 🌂 👩🏾‍🎓

We recently said that the PVS-Studio code analyzer began to integrate with PlatformIO. Naturally, the PVS-Studio development team communicated with the PlatformIO team and they suggested, for the sake of interest, checking the code of the Zephyr real-time operating system. Why not, we thought, and here is an article about such a study.

PlatformIO

Before starting the main part of the article, I would like to recommend the PlatformIO project to developers of embedded systems , which can make their life easier. It is a cross-platform microcontroller programming tool. The core of PlatformIO is a command-line tool, but it is recommended to use it as a plug-in for Visual Studio Code. A large number of modern chips and motherboards based on them are supported. It is able to automatically download suitable assembly systems, and the site has a large collection of libraries for managing plug-in electronic components.

PVS-Studio

PVS-Studio is still little known in the world of embedded systems, so just in case, I will make an introduction for new readers who are not yet familiar with this tool. Our regular readers can skip right to the next section.

PVS-Studio is a static code analyzer that allows detecting errors and potential vulnerabilities in the code of programs written in C, C ++, C # and Java. If we talk only about C and C ++, then the following compilers are supported:

Windows Visual Studio 2010-2019 C, C ++, C ++ / CLI, C ++ / CX (WinRT)
Windows IAR Embedded Workbench, C / C ++ Compiler for ARM C, C ++
Windows QNX Momentics, QCC C, C ++
Windows / Linux Keil µVision, DS-MDK, ARM Compiler 5/6 C, C ++
Windows / Linux Texas Instruments Code Composer Studio, ARM Code Generation Tools C, C ++
Windows/Linux/macOS. GNU Arm Embedded Toolchain, Arm Embedded GCC compiler, C, C++
Windows/Linux/macOS. Clang C, C++
Linux/macOS. GCC C, C++
Windows. MinGW C, C++

The analyzer has its own warning classification system , but if necessary, you can enable the display of alerts according to the coding standards CWE , SEI CERT , MISRA .

You can quickly start regularly using PVS-Studio even in a large legacy project. A special mechanism for mass suppression of warnings is provided for this . All current warnings are considered technical debt and are hidden, which allows you to focus on warnings that apply only to new or modified code. This allows the team to immediately start using the analyzer daily in their work, and you can come back to the technical duty from time to time and improve the code.

There are many other scenarios for using PVS-Studio. For example, you can use it as a plugin to SonarQube. Integration with systems such as Travis CI, CircleCI, GitLab CI / CD, etc. is possible. A more detailed description of PVS-Studio is beyond the scope of this article. Therefore, I propose to familiarize yourself with the article, which has many useful links, and in which answers to many questions are given: " Reasons to introduce the PVS-Studio static code analyzer into the development process ."

Zephyr

Working on the integration of PVS-Studio into PlatformIO , our teams talked and were asked to check out a project from the embedded world, namely Zephyr. We liked the idea, which was the reason for writing this article.

Zephyr is a lightweight real-time operating system designed to work on devices with limited resources of various architectures. The code is distributed under the open source Apache 2.0 license. Works on the following platforms: ARM (Cortex-M0, Cortex-M3, Cortex-M4, Cortex-M23, Cortex-M33, Cortex-R4, Cortex-R5, Cortex-A53), x86, x86-64, ARC, RISC- V, Nios II, Xtensa.

Some features:

Unified address space. The specific application code in combination with the custom kernel creates a monolithic image that is executed on the device.
. , .
. .
. . .
: , , , , .

Of the interesting moments for us, Synopsys is involved in the development of the operating system . In 2014, Synopsys acquired Coverity, which produced the static code analyzer of the same name.

It is only natural that from the very beginning, the Zephyr developer uses the Coverity analyzer . The analyzer is a market leader and this could not but affect the quality of the operating system code for the better.

Zephyr Code Quality

In my opinion, the Zephyr operating system code is quality. Here is what gives me reason to think so:

PVS-Studio 122 High 367 Medium. , , 560 C/C++ . . 7810 C/C++ 10075 . , . , , .
. «» , .
The SourceMonitor utility , having analyzed the source code, gave statistics that 48% of the code is comments. This is a lot and in my experience indicates a high concern for the quality of the code, its comprehensibility for other developers.
When developing a project, a Coverity static code analyzer is used. Most likely, due to this fact, the PVS-Studio analyzer, although it found errors in the project, could not show itself vividly, as it sometimes happens when analyzing other projects.

Based on this, I believe that the authors of the project care about the quality and reliability of the code. Let's now look at some warnings issued by the PVS-Studio analyzer (version 7.06).

Semi-false warnings

The project code, due to its low level, is written quite specifically and with a lot of conditional compilation (#ifdef). This generates a large number of warnings that do not indicate a real mistake, but they can not be called simply false. It will be easiest to clarify this with a few examples.

An example of a “semi-false” actuation N1

static struct char_framebuffer char_fb;

int cfb_framebuffer_invert(struct device *dev)
{
  struct char_framebuffer *fb = &char_fb;

  if (!fb || !fb->buf) {
    return -1;
  }

  fb->inverted = !fb->inverted;

  return 0;
}

PVS-Studio Warning: V560 A part of conditional expression is always false :! Fb . cfb.c 188

When taking the address of a static variable, a non-zero pointer is always obtained. Therefore, the pointer fb is always non-zero and its verification does not make sense.

However, it is clear that this is not an error at all, but simply an excessive check, which does no harm. Moreover, when building the Release version, the compiler will throw it away, so this will not even cause a slowdown.

A similar case, in my understanding, falls under the concept of “semi-false” analyzer operation. Formally, the analyzer is absolutely right. And it's better to remove the extra pointless check from the code. However, all this is petty and such warnings are not even interesting to consider in the framework of the article.

An example of a “semi-cut” actuation of N2

int hex2char(u8_t x, char *c)
{
  if (x <= 9) {
    *c = x + '0';
  } else if (x >= 10 && x <= 15) {
    *c = x - 10 + 'a';
  } else {
    return -EINVAL;
  }
  return 0;
}

Warning PVS-Studio: V560 A part of conditional expression is always true: x> = 10. hex.c 31

The analyzer is again formally right in asserting that part of the condition is always true. If the variable x is not less than / equal to 9, then it turns out that it is always greater than / equal to 10. And the code can be simplified:

} else if (x <= 15) {

Once again, it’s clear that there is no real harmful error here, and the extra comparison is written just for the beauty of the code.

Now let's look at a more complex example of N3.

First, let's see how the CHECKIF macro can be implemented .

#if defined(CONFIG_ASSERT_ON_ERRORS)
#define CHECKIF(expr) \
  __ASSERT_NO_MSG(!(expr));   \
  if (0)
#elif defined(CONFIG_NO_RUNTIME_CHECKS)
#define CHECKIF(...) \
  if (0)
#else
#define CHECKIF(expr) \
  if (expr)
#endif

Depending on the compilation mode of the project, the check may be either performed or skipped. In our case, when checking the code using PVS-Studio, this macro implementation was selected:

#define CHECKIF(expr) \
  if (expr)

Now let's see what this leads to.

int k_queue_append_list(struct k_queue *queue, void *head, void *tail)
{
  CHECKIF(head == NULL || tail == NULL) {
    return -EINVAL;
  }

  k_spinlock_key_t key = k_spin_lock(&queue->lock);
  struct k_thread *thread = NULL;
  if (head != NULL) {
    thread = z_unpend_first_thread(&queue->wait_q);
  }
  ....
}

PVS-Studio warning : V547 [CWE-571] Expression 'head! = NULL' is always true. queue.c 244

The analyzer considers that checking (head! = NULL) always gives true. And indeed it is. If the head pointer was NULL, then the function would cease to work due to a check at the beginning of the function:

CHECKIF(head == NULL || tail == NULL) {
  return -EINVAL;
}

Recall that here the macro expands as follows:

if (head == NULL || tail == NULL) {
  return -EINVAL;
}

So, the PVS-Studio analyzer is right from its point of view and issues a correct warning. However, this check cannot be deleted. She is needed. In another scenario, the macro will open like this:

if (0) {
  return -EINVAL;
}

And then re-checking the pointer is needed. Of course, the analyzer will not generate a warning in this version of code compilation. However, it gives a warning for the debug version of the compilation.

I hope that now it’s clear to readers where the “semi-false” warnings come from. However, there is nothing wrong with them. PVS-Studio analyzer provides various mechanisms for suppressing false alerts, which can be found in the documentation.

Case Warnings

But did you find something interesting after all? It was a success, and now we will look at various errors. In this case, I want to immediately note two points:

. : , , , Coverity. , PVS-Studio - , .
. , «» . , . GitHub, PVS-Studio.

N1,

static void gen_prov_ack(struct prov_rx *rx, struct net_buf_simple *buf)
{
  ....
  if (link.tx.cb && link.tx.cb) {
    link.tx.cb(0, link.tx.cb_data);
  }
  ....
}

PVS-Studio Warning: V501 [CWE-571] There are identical sub-expressions to the left and to the right of the '&&' operator: link.tx.cb && link.tx.cb pb_adv.c 377

One is double checked the same variable link.tx.cb . Apparently, this is a typo, and the second variable to be checked should be link.tx.cb_data .

Fragment N2, overflowing the buffer

Consider the function net_hostname_get , which will be used later.

#if defined(CONFIG_NET_HOSTNAME_ENABLE)
const char *net_hostname_get(void);
#else
static inline const char *net_hostname_get(void)
{
  return "zephyr";
}
#endif

In my case, when preprocessing, the option related to the #else branch was selected . That is, in the preprocessed file, the function is implemented like this:

static inline const char *net_hostname_get(void)
{
  return "zephyr";
}

The function returns a pointer to an array of 7 bytes (we take into account the terminal zero at the end of the line).

Now consider the code that leads to the exit of the array boundary.

static int do_net_init(void)
{
  ....
  (void)memcpy(hostname, net_hostname_get(), MAX_HOSTNAME_LEN);
  ....
}

PVS-Studio Warning: V512 [CWE-119] A call of the 'memcpy' function will lead to the 'net_hostname_get ()' buffer becoming out of range. log_backend_net.c 114

After preprocessing, MAX_HOSTNAME_LEN is expanded as follows:

(void)memcpy(hostname, net_hostname_get(),
    sizeof("xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx"));

Accordingly, when copying data, an out-of-bounds string literal occurs. It is difficult to predict how this will affect the execution of the program, as this leads to undefined behavior.

Fragment N3, potential buffer overrun

int do_write_op_json(struct lwm2m_message *msg)
{
  u8_t value[TOKEN_BUF_LEN];
  u8_t base_name[MAX_RESOURCE_LEN];
  u8_t full_name[MAX_RESOURCE_LEN];
  ....
  /* combine base_name + name */
  snprintf(full_name, TOKEN_BUF_LEN, "%s%s", base_name, value);
  ....
}

PVS-Studio Warning: V512 [CWE-119] A call of the 'snprintf' function will lead to overflow of the buffer 'full_name'. lwm2m_rw_json.c 826

If we substitute macro values, the picture of what is happening looks like this:

u8_t value[64];
u8_t base_name[20];
u8_t full_name[20];
....
snprintf(full_name, 64, "%s%s", base_name, value);

Only 20 bytes are allocated under the full_name buffer in which the string is formed. In this case, the parts from which the string is formed are stored in buffers of 20 and 64 bytes in size. Moreover, the constant 64, passed to the snprintf function and designed to prevent the array from going abroad, is obviously too big!

This code does not necessarily lead to a buffer overflow. Perhaps always lucky, and substrings are always very small. However, in general, this code is not protected from overflow in any way and contains the classic security flaw CWE-119 .

Fragment N4, expression is always true

static int keys_set(const char *name, size_t len_rd, settings_read_cb read_cb,
                    void *cb_arg)
{
  ....
  size_t len;
  ....
  len = read_cb(cb_arg, val, sizeof(val));
  if (len < 0) {
    BT_ERR("Failed to read value (err %zu)", len);
    return -EINVAL;
  }
  ....
}

PVS-Studio warning : V547 [CWE-570] Expression 'len <0' is always false. Unsigned type value is never <0. keys.c 312 The

variable len has an unsigned type and, therefore, cannot be less than 0. Accordingly, the error status is not processed in any way. In other places , the type int or ssize_t is used to store the result of the read_cb function . Example:

static inline int mesh_x_set(....)
{
 ssize_t len;
 len = read_cb(cb_arg, out, read_len);
 if (len < 0) {
 ....
}

Note. Everything seems to be bad with the read_cb function . The fact is that it is declared like this:

static u8_t read_cb(const struct bt_gatt_attr *attr, void *user_data)

Type u8_t is an unsigned char.

The function always returns only positive numbers of type unsigned char . If you put this value in a signed variable of type int or ssize_t , all the same, the value will always be positive. Therefore, in other places, checking for error status does not work either. But I did not delve into the study of this issue.

Fragment N5, something very strange

static char *mntpt_prepare(char *mntpt)
{
  char *cpy_mntpt;

  cpy_mntpt = k_malloc(strlen(mntpt) + 1);
  if (cpy_mntpt) {
    ((u8_t *)mntpt)[strlen(mntpt)] = '\0';
    memcpy(cpy_mntpt, mntpt, strlen(mntpt));
  }
  return cpy_mntpt;
}

PVS-Studio Warning: V575 [CWE-628] The 'memcpy' function doesn't copy the whole string. Use 'strcpy / strcpy_s' function to preserve terminal null. shell.c 427

Someone tried to make an analog of the strdup function , but he did not succeed.

Let's start with the analyzer warning. It reports that the memcpy function copies the line, but does not copy the terminal zero, and this is very suspicious.

It seems that this terminal 0 is copied here:

((u8_t *)mntpt)[strlen(mntpt)] = '\0';

But no! Here's a typo, because of which the terminal zero is copied to itself! Note that writing to the mntpt array , not cpy_mntpt . As a result, the mntpt_prepare function returns a string that is incomplete with a terminal zero.

In fact, the programmer wanted to write like this:

((u8_t *)cpy_mntpt)[strlen(mntpt)] = '\0';

However, it is still not clear why it was so complicated! This code can be simplified to the following option:

static char *mntpt_prepare(char *mntpt)
{
  char *cpy_mntpt;

  cpy_mntpt = k_malloc(strlen(mntpt) + 1);
  if (cpy_mntpt) {
    strcpy(cpy_mntpt, mntpt);
  }
  return cpy_mntpt;
}

Fragment N6, dereferencing a pointer before validation

int bt_mesh_model_publish(struct bt_mesh_model *model)
{
  ....
  struct bt_mesh_model_pub *pub = model->pub;
  ....
  struct bt_mesh_msg_ctx ctx = {
    .send_rel = pub->send_rel,
  };
  ....
  if (!pub) {
    return -ENOTSUP;
  }
  ....
}

PVS-Studio Warning: V595 [CWE-476] The 'pub' pointer was utilized before it was verified against nullptr. Check lines: 708, 719. access.c 708

A very common error pattern. First, the pointer is dereferenced to initialize a member of the structure:

.send_rel = pub->send_rel,

And only then follows a check that this pointer can be null.

Fragment N7-N9, pointer dereferencing before validation

int net_tcp_accept(struct net_context *context, net_tcp_accept_cb_t cb,
                   void *user_data)
{
  ....
  struct tcp *conn = context->tcp;
  ....
  conn->accept_cb = cb;

  if (!conn || conn->state != TCP_LISTEN) {
    return -EINVAL;
  }
  ....
}

PVS-Studio Warning: V595 [CWE-476] The 'conn' pointer was utilized before it was verified against nullptr. Check lines: 1071, 1073. tcp2.c 1071

The same as in the previous case. Explanation is not required here.

Two more such errors can be seen here:

V595 [CWE-476] The 'context-> tcp' pointer was utilized before it was verified against nullptr. Check lines: 1512, 1518. tcp.c 1512
V595 [CWE-476] The 'fsm' pointer was utilized before it was verified against nullptr. Check lines: 365, 382. fsm.c 365

Fragment N10, erroneous check

static int x509_get_subject_alt_name( unsigned char **p,
                                      const unsigned char *end,
                                      mbedtls_x509_sequence *subject_alt_name)
{
  ....
    while( *p < end )
    {
        if( ( end - *p ) < 1 )
            return( MBEDTLS_ERR_X509_INVALID_EXTENSIONS +
                    MBEDTLS_ERR_ASN1_OUT_OF_DATA );
    ....
  }
  ....
}

PVS-Studio warning : V547 [CWE-570] Expression '(end - * p) <1' is always false. x509_crt.c 635

Take a close look at the conditions:

* p <end
(end - * p) <1

They contradict each other.

If (* p <end), then (end - * p) will always give a value of 1 or more. In general, there is something wrong here, but I do not know how to spell it correctly.

Fragment N11, unreachable code

uint32_t lv_disp_get_inactive_time(const lv_disp_t * disp)
{
    if(!disp) disp = lv_disp_get_default();
    if(!disp) {
        LV_LOG_WARN("lv_disp_get_inactive_time: no display registered");
        return 0;
    }

    if(disp) return lv_tick_elaps(disp->last_activity_time);

    lv_disp_t * d;
    uint32_t t = UINT32_MAX;
    d          = lv_disp_get_next(NULL);
    while(d) {
        t = LV_MATH_MIN(t, lv_tick_elaps(d->last_activity_time));
        d = lv_disp_get_next(d);
    }

    return t;
}

PVS-Studio Warning: V547 [CWE-571] Expression 'disp' is always true. lv_disp.c 148

The function terminates if disp is a null pointer. Then, on the contrary, it is checked that the disp pointer is not null (and this is always the case), and the function again finishes its work.

As a result, part of the code in a function will never get control at all.

Fragment N12, strange return value

static size_t put_end_tlv(struct lwm2m_output_context *out, u16_t mark_pos,
        u8_t *writer_flags, u8_t writer_flag,
        int tlv_type, int tlv_id)
{
  struct tlv_out_formatter_data *fd;
  struct oma_tlv tlv;
  u32_t len = 0U;

  fd = engine_get_out_user_data(out);
  if (!fd) {
    return 0;
  }

  *writer_flags &= ~writer_flag;

  len = out->out_cpkt->offset - mark_pos;

  /* use stored location */
  fd->mark_pos = mark_pos;

  /* set instance length */
  tlv_setup(&tlv, tlv_type, tlv_id, len);
  len = oma_tlv_put(&tlv, out, NULL, true) - tlv.length;
  return 0;
}

PVS-Studio Warning: V1001 The 'len' variable is assigned but is not used by the end of the function. lwm2m_rw_oma_tlv.c 338

The function contains two return statements that both return 0. It is strange that the function always returns 0. It is also strange that the len variable is no longer used after assignment. I have a big suspicion that it should actually be written like this:

  len = oma_tlv_put(&tlv, out, NULL, true) - tlv.length;
  return len;
}

Fragment N13-N16, synchronization error

static int nvs_startup(struct nvs_fs *fs)
{
  ....
  k_mutex_lock(&fs->nvs_lock, K_FOREVER);
  ....
  if (fs->ate_wra == fs->data_wra && last_ate.len) {
    return -ESPIPE;
  }
  ....
end:
  k_mutex_unlock(&fs->nvs_lock);
  return rc;
}

PVS-Studio Warning: V1020 The function exited without calling the 'k_mutex_unlock' function. Check lines: 620, 549. nvs.c 620

There is a situation when a function finishes its work without unlocking the mutex. As I understand it, it would be correct to write like this:

static int nvs_startup(struct nvs_fs *fs)
{
  ....
  k_mutex_lock(&fs->nvs_lock, K_FOREVER);
  ....
  if (fs->ate_wra == fs->data_wra && last_ate.len) {
    rc = -ESPIPE;
    goto end;
  }
  ....
end:
  k_mutex_unlock(&fs->nvs_lock);
  return rc;
}

Three more such errors:

V1020 The function exited without calling the 'k_mutex_unlock' function. Check lines: 574, 549. nvs.c 574
V1020 The function exited without calling the 'k_mutex_unlock' function. Check lines: 908, 890. net_context.c 908
V1020 The function exited without calling the 'k_mutex_unlock' function. Check lines: 1194, 1189. shell.c 1194

Conclusion

I hope you enjoyed it. Visit our blog to read about checks on other projects and other interesting publications.

Use static analyzers in your work to reduce the number of errors and potential vulnerabilities even at the stage of writing code. Especially early error detection is relevant for embedded systems, updating programs in which is often a time-consuming and expensive process.

I also suggest not postponing and trying to check your projects using the PVS-Studio analyzer. See the article: How to quickly see interesting warnings generated by the PVS-Studio analyzer for C and C ++ code?

If you want to share this article with an English-speaking audience, then please use the link to the translation: Andrey Karpov. Checking the Code of Zephyr Operating System .

Exploring the code quality of the Zephyr operating system