Start collecting errors in copy functions

memcpy

I have noticed several times that programmers make mistakes in simple data copy functions. This topic will require much more time in the future to study and select material in order to write a thorough article. But I wanted to share a couple of examples that I recently noticed.

The Baadera-Meinhof phenomenon? No I do not think so


As a member of the PVS-Studio team, I encounter a large number of errors that we discover in various projects. Like DevRel - I like to talk about it :). Today I decided to talk about incorrectly implemented data copy functions.

I have come across such unsuccessful functions more than once. But I did not write them out, because I did not attach any importance to this. However, since I noticed this trend, it's time to start collecting them. To begin, I will share the last two cases noticed.

Someone may argue that two cases - this is not a regularity. And that, perhaps, I drew attention to them solely because they met me after a short amount of time and triggered the Baader-Meinhof phenomenon .

The Baader-Meinhof phenomenon, also the illusion of frequency, is a cognitive distortion in which recently discovered information that appears again after a short period of time is perceived as unusually often repeated.

I think that this is not so. I already had the experience of such an observation about the comparison functions, which was then confirmed by the collected material: " Evil lives in the comparison functions ."

Okay, let's get to the point. The introduction in order to give just two examples so far has been too long :).

Example N1


In the article about Zephyr RTOS verification, I described just such an unsuccessful attempt to implement the strdup function analogue :

static char *mntpt_prepare(char *mntpt)
{
  char *cpy_mntpt;

  cpy_mntpt = k_malloc(strlen(mntpt) + 1);
  if (cpy_mntpt) {
    ((u8_t *)mntpt)[strlen(mntpt)] = '\0';
    memcpy(cpy_mntpt, mntpt, strlen(mntpt));
  }
  return cpy_mntpt;
}

PVS-Studio Warning: V575 [CWE-628] The 'memcpy' function doesn't copy the whole string. Use 'strcpy / strcpy_s' function to preserve terminal null. shell.c 427

The analyzer reports that the memcpy function copies the line, but does not copy the terminal zero, and this is very suspicious. It seems that this terminal 0 is copied here:

((u8_t *)mntpt)[strlen(mntpt)] = '\0';

No, there is a typo here, because of which the terminal zero is copied to itself. Note that writing to the mntpt array , not cpy_mntpt . As a result, the mntpt_prepare function returns a string that is incomplete with a terminal zero.

In fact, the programmer wanted to write like this:

((u8_t *)cpy_mntpt)[strlen(mntpt)] = '\0';

It is not clear why the code is written so confusing and non-standard. As a result, a serious mistake was made in a small and uncomplicated function. This code can be simplified to the following option:

static char *mntpt_prepare(char *mntpt)
{
  char *cpy_mntpt;

  cpy_mntpt = k_malloc(strlen(mntpt) + 1);
  if (cpy_mntpt) {
    strcpy(cpy_mntpt, mntpt);
  }
  return cpy_mntpt;
}

Example N2


void myMemCpy(void *dest, void *src, size_t n) 
{ 
   char *csrc = (char *)src; 
   char *cdest = (char *)dest; 
   for (int i=0; i<n; i++) 
     cdest[i] = csrc[i]; 
}

We ourselves did not identify this code using PVS-Studio, but I accidentally met it on the StackOverflow website: C and static Code analysis: Is this safer than memcpy?

However, if you check this function using the PVS-Studio analyzer, he will rightly notice:

  • V104 Implicit conversion of 'i' to memsize type in an arithmetic expression: i <n test.cpp 26
  • V108 Incorrect index type: cdest [not a memsize-type]. Use memsize type instead. test.cpp 27
  • V108 Incorrect index type: csrc [not a memsize-type]. Use memsize type instead. test.cpp 27

Indeed, this code contains a flaw, as indicated in the answers to StackOverflow. You cannot use an int variable as an index . In a 64-bit program, almost certainly (we don’t consider exotic architectures), the int variable will be 32-bit and the function will be able to copy no more than INT_MAX bytes. Those. no more than 2 gigabytes.

With a larger buffer to be copied, an overflow of the sign variable will occur, which from the point of view of C and C ++ is an undefined behavior. And by the way, do not try to guess exactly how the error will manifest itself. This is actually a difficult topic, which you can read about in the article " Undefined behavior is closer than you think ."

It is especially funny that this code appeared as an attempt to remove some kind of warning from the Checkmarx analyzer that occurred when the memcpy function was called . The programmer did not come up with anything better than making his own bike. And despite the simplicity of the copy function, it still turned out to be wrong. That is, in fact, the person most likely did even worse than it was. Instead of understanding the reason for the warning, he masked the problem by writing his own function (confusing the analyzer). Plus added an error using int for the counter . Oh yes, such code can still hinder optimization. It is inefficient to use your own code instead of the efficient optimized memcpy function . Do not do this :)

Conclusion


Well, I’m only at the beginning of the journey and, probably, it will take more than one year before I accumulate materials for a thorough publication on this topic. Actually, only now I will begin to write out such cases. Thank you for your attention and look what interesting the PVS-Studio analyzer will find in your C / C ++ / C # / Java code.



If you want to share this article with an English-speaking audience, then please use the link to the translation: Andrey Karpov. Starting My Collection of Bugs Found in Copy Functions .

All Articles