A study of one case of undefined behavior

This article explores how undefined behavior can manifest in C++ when a non-void function finishes without executing a return statement with a suitable value. It is meant more as a scientific curiosity than as practical guidance.

If you do not enjoy stepping on rakes for fun, move along; there is nothing for you here.

Introduction


Everyone knows that undefined behavior must not be allowed when writing C++ code.
However:

  • undefined behavior may not seem dangerous enough, because its possible consequences are so abstract;
  • it is not always clear where the line is.

Let's pin down how undefined behavior can actually manifest in one rather simple case: a non-void function that lacks a return.

To do this, we will examine the code generated by the most popular compilers in different optimization modes.

The Linux experiments are run in Compiler Explorer. The Windows and macOS experiments are run on hardware directly available to me.

All builds will be done for x86-x64.

No measures will be taken to enhance or suppress compiler warnings / errors.

There will be a lot of disassembled code. Its formatting is unfortunately inconsistent, because I had to use several different tools (at least I managed to get Intel syntax everywhere). I will comment on the disassembled code in moderate detail, but that does not remove the need to know the processor registers and how the stack works.

Reading the Standard


C++11 final draft N3797, C++14 final draft N3936:
6.6.3 The return statement
...
Flowing off the end of a function is equivalent to a return with no value; this results in undefined
behavior in a value-returning function.
...

That is, reaching the end of a function is equivalent to a return without a value, and for a function that is declared to return a value this results in undefined behavior.

C++17 draft N4713:
9.6.3 The return statement
...
Flowing off the end of a constructor, a destructor, or a function with a cv void return type is equivalent to a return with no operand. Otherwise, flowing off the end of a function other than main (6.8.3.1) results in undefined behavior.
...

That is, reaching the end of a constructor, a destructor, or a function returning void (possibly cv-qualified) is equivalent to a return without a value. For any other function except main, it results in undefined behavior.

What does this mean in practice?

If the function signature declares a return value:

  • its execution must end with a return statement that supplies a value of the appropriate type;
  • otherwise, the behavior is undefined;
  • the undefined behavior begins not when the function is called, and not when the returned value is used, but at the moment the function fails to complete properly;
  • if the function contains both correct and incorrect execution paths, undefined behavior occurs only on the incorrect paths;
  • the undefined behavior in question does not affect the execution of the instructions in the function body.
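
For contrast with the broken examples that follow, here is a minimal sketch of a function in which every execution path ends in a return statement. The name describe and the fallback value -1 are assumptions made for illustration only; they do not come from the examples in this article.

```cpp
#include <cassert>

// Hypothetical function: every execution path ends in a return statement,
// so control can never flow off the end of the function.
int describe(int v)
{
    if (v == 0) {
        return 42;
    }
    if (v == 1) {
        return 142;
    }
    return -1;  // explicit fallback for all other inputs (an assumption of this sketch)
}

// Usage: every input produces a well-defined result.
void describeUsage()
{
    assert(describe(0) == 42);
    assert(describe(1) == 142);
    assert(describe(7) == -1);
}
```

Removing the last return would reintroduce exactly the situation the Standard describes: the paths for 0 and 1 would stay correct, and only the remaining inputs would hit undefined behavior.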

The exception for the main function is not new in C++17: earlier versions of the Standard described a similar exception in section 3.6.1 Main function.

Example 1 - bool


C++ has no type with a simpler state than bool. Let's start with it.

#include <iostream>

bool bad() {};

int main()
{
    std::cout << bad();

    return 0;
}

MSVC reports compilation error C4716 for this example, so for MSVC the code has to be made slightly more complicated by providing at least one correct execution path:

#include <iostream>
#include <stdlib.h>

bool bad()
{
    if (rand() == 0) {
        return true;
    }
}

int main()
{
    std::cout << bad();

    return 0;
}

Compilation:

| Platform | Compiler | Compilation result |
|---|---|---|
| Linux | x86-x64 Clang 10.0.0 | warning: non-void function does not return a value [-Wreturn-type] |
| Linux | x86-x64 gcc 9.3 | warning: no return statement in function returning non-void [-Wreturn-type] |
| macOS | Apple clang version 11.0.0 | warning: control reaches end of non-void function [-Wreturn-type] |
| Windows | MSVC 2019 16.5.4 | original example: error C4716; complicated example: warning C4715: not all control paths return a value |

Execution results:

| Compiler | Optimization | Program return | Console output |
|---|---|---|---|
| Linux x86-x64 Clang 10.0.0 | -O0 | 255 | No output |
| Linux x86-x64 Clang 10.0.0 | -O1, -O2 | 0 | No output |
| Linux x86-x64 gcc 9.3 | -O0 | 0 | 89 |
| Linux x86-x64 gcc 9.3 | -O1, -O2, -O3 | 0 | No output |
| macOS Apple clang version 11.0.0 | -O0, -O1, -O2 | 0 | 0 |
| Windows MSVC 2019 16.5.4, original example | /Od, /O1, /O2 | No build | No build |
| Windows MSVC 2019 16.5.4, complicated example | /Od | 0 | 41 |
| Windows MSVC 2019 16.5.4, complicated example | /O1, /O2 | 0 | 1 |

Even in this simplest example, four compilers demonstrated at least three different manifestations of undefined behavior.

Let's see what these compilers actually generated.

Linux x86-x64 Clang 10.0.0, -O0


image

The last instruction in the bad() function is ud2.

Description of the instruction from the Intel 64 and IA-32 Architectures Software Developer's Manual:
UD2—Undefined Instruction
Generates an invalid opcode exception. This instruction is provided for software testing to explicitly generate an invalid opcode exception. The opcode for this instruction is reserved for this purpose.
Other than raising the invalid opcode exception, this instruction has no effect on processor state or memory.

Even though it is the execution of the UD2 instruction that causes the invalid opcode exception, the instruction pointer saved by delivery of the exception references the UD2 instruction (and not the following instruction).

This instruction’s operation is the same in non-64-bit modes and 64-bit mode.

In short, this is a special instruction for raising an exception.

So we just need to wrap the bad() call in a try ... catch block!

Not so fast. This is not a C++ exception.

Is it possible to catch ud2 at run time?
On Windows, __try should be used for this; on Linux and macOS, a SIGILL signal handler.
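
As a rough sketch of the signal-handler route, the code below installs a SIGILL handler with std::signal. It is an illustration under assumptions, not the code used in the experiments: the names onSigill, g_sigillSeen and catchSimulatedSigill are hypothetical, and the signal is simulated with std::raise instead of executing a real ud2.

```cpp
#include <cassert>
#include <csignal>

// Flag written by the handler. volatile std::sig_atomic_t is the type
// a signal handler is allowed to write to.
volatile std::sig_atomic_t g_sigillSeen = 0;

// Hypothetical handler: only records that SIGILL arrived.
extern "C" void onSigill(int)
{
    g_sigillSeen = 1;
}

// Installs the handler and simulates the signal.
// Assumption: std::raise(SIGILL) stands in for a real ud2. After a real ud2
// the handler must not simply return, because the saved instruction pointer
// still refers to ud2 and the instruction would be executed again.
bool catchSimulatedSigill()
{
    g_sigillSeen = 0;
    std::signal(SIGILL, onSigill);
    std::raise(SIGILL);
    std::signal(SIGILL, SIG_DFL);  // restore the default handling
    return g_sigillSeen == 1;
}

// Usage: the handler observes the simulated signal.
void sigillUsage()
{
    assert(catchSimulatedSigill());
}
```

With a real ud2, a handler of this kind would typically log the event and terminate the process rather than return.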

Linux x86-x64 Clang 10.0.0, -O1, -O2


image

During optimization, the compiler simply threw away both the body of the bad() function and the call to it.

Linux x86-x64 gcc 9.3, -O0


image

Explanation (in reverse order, since here the chain is easier to follow from the end):

5. The stream output operator for bool is called (line 14);

4. The address of std::cout is placed in the edi register; this is the first argument of the stream output operator (line 13);

3. The contents of the eax register are placed in the esi register; this is the second argument of the stream output operator (line 12);

2. The three high bytes of eax are zeroed, while the value of al is unchanged (line 11);

1. The bad() function is called (line 10);

0. The bad() function is supposed to put the return value in the al register.

Instead, line 4 contains a nop (No Operation, a do-nothing instruction).

One byte of garbage from the al register is printed to the console. The program terminates normally.
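
Step 2 of the chain above can be modelled in plain C++. This is only a sketch of the arithmetic; the function name zeroExtendLowByte and the upper bytes used in the usage example are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Models step 2: the low byte (al) is kept, the three high bytes of eax are cleared.
std::uint32_t zeroExtendLowByte(std::uint32_t eax)
{
    return eax & 0xFFu;
}

// Usage: whatever the upper bytes were, only the low byte reaches the stream
// operator. 0x59 is 89, the value printed in the test run.
void zeroExtendUsage()
{
    assert(zeroExtendLowByte(0xABCDEF59u) == 89u);
    assert(zeroExtendLowByte(0x00000059u) == 89u);
}
```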

Linux x86-x64 gcc 9.3, -O1, -O2, -O3


image

The optimizer threw everything away.

macOS Apple clang version 11.0.0, -O0


The main() function:

image

The path of the Boolean argument to the stream output operator (this time in forward order):

1. The contents of the al register are placed in the edx register (line 8);

2. All bits of the edx register except the lowest are zeroed (line 9);

3. A pointer to std::cout is placed in the rdi register; this is the first argument of the stream output operator (line 10);

4. The contents of the edx register are placed in the esi register; this is the second argument of the stream output operator (line 11);

5. The stream output operator for bool is called (line 13).

The main function expects to find the result of bad() in the al register.

The bad() function:

image

1. The value of the next, not yet allocated, byte of the stack is placed in the al register (line 4);

2. All bits of the al register except the least significant are cleared (line 5).

One bit of garbage from the unallocated stack is printed to the console. In the test run it happened to be zero.

The program terminates normally.

macOS Apple clang version 11.0.0, -O1, -O2


image

The Boolean argument of the stream output operator is set to zero (line 5).

The call to bad() was removed during optimization.

The program always prints zero to the console and terminates normally.

Windows MSVC 2019 16.5.4, complicated example, /Od


image

The listing shows that the bad() function is expected to provide its return value in the al register.

image

The value returned by bad() is first stored on the stack and then loaded into the edx register for output to the stream.

One byte of garbage from the al register is printed to the console (more precisely, the low byte of the result of rand()). The program terminates normally.

Windows MSVC 2019 16.5.4, complicated example, /O1, /O2


image

The compiler inlined the bad() call. The main() function:

  • copies one byte into ebx from memory at [rsp + 30h];
  • if rand() returned zero, copies 1 from ecx into ebx (line 11);
  • copies the same value (more precisely, its least significant byte) into dl (line 13);
  • calls the stream output function, which prints the value of dl (line 14).

One byte of garbage from RAM (from address rsp + 30h) is printed to the stream.

Conclusions from example 1


The findings from the disassembler listings are summarized in the table:

| Compiler | Optimization | Program return | Console output | Cause |
|---|---|---|---|---|
| Linux x86-x64 Clang 10.0.0 | -O0 | 255 | No output | ud2 |
| Linux x86-x64 Clang 10.0.0 | -O1, -O2 | 0 | No output | The console output and the call to bad() were removed during optimization |
| Linux x86-x64 gcc 9.3 | -O0 | 0 | 89 | One byte of garbage from the al register |
| Linux x86-x64 gcc 9.3 | -O1, -O2, -O3 | 0 | No output | The console output and the call to bad() were removed during optimization |
| macOS Apple clang version 11.0.0 | -O0 | 0 | 0 | One bit of garbage from RAM |
| macOS Apple clang version 11.0.0 | -O1, -O2 | 0 | 0 | The call to bad() was replaced by zero |
| Windows MSVC 2019 16.5.4, original example | /Od, /O1, /O2 | No build | No build | No build |
| Windows MSVC 2019 16.5.4, complicated example | /Od | 0 | 41 | One byte of garbage from the al register |
| Windows MSVC 2019 16.5.4, complicated example | /O1, /O2 | 0 | 1 | One byte of garbage from RAM |

As it turned out, the compilers demonstrated not 3 but as many as 6 variants of undefined behavior; before examining the disassembler listings we simply could not tell some of them apart.

Example 1a - Managing Undefined Behavior


Let's try to steer the undefined behavior a little and influence the value returned by the bad() function.

This is possible only with compilers that output garbage.
To do so, we plant the desired values in the places from which the compilers will read them.

Linux x86-x64 gcc 9.3, -O0


The empty bad() function does not modify the al register, although the calling code expects it to. So if we place some value in al before calling bad(), we expect to see that value as the result of bad().

Obviously, this can be done by calling any other function that returns bool. But it can also be done with a function that returns, for example, unsigned char.

Full example code
#include <iostream>
#include <stdlib.h>

bool bad() {}

bool goodTrue()
{
    return rand();
}

bool goodFalse()
{
    return !goodTrue();
}

unsigned char goodChar(unsigned char ch)
{
    return ch;
}

int main()
{
    goodTrue();
    std::cout << bad() << std::endl;

    goodChar(85);
    std::cout << bad() << std::endl;

    goodFalse();
    std::cout << bad() << std::endl;

    goodChar(240);
    std::cout << bad() << std::endl;

    return 0;
}


Output to the console:
1
85
0
240

Windows MSVC 2019 16.5.4, /Od


In the example for MSVC, the bad () function returns the low byte of the result of rand ().

Without modifying the bad () function, external code can affect its return value by modifying the result of rand ().

Full example code
#include <iostream>
#include <stdint.h>
#include <stdlib.h>

void control(unsigned char value)
{
    uint32_t count = 0;
    srand(0);
    while ((rand() & 0xff) != value) {
        ++count;
    }

    srand(0);
    for (uint32_t i = 0; i < count; ++i) {
        rand();
    }
}

bool bad()
{
    if (rand() == 0) {
        return true;
    }
}

int main()
{
    control(1);
    std::cout << bad() << std::endl;

    control(85);
    std::cout << bad() << std::endl;

    control(0);
    std::cout << bad() << std::endl;

    control(240);
    std::cout << bad() << std::endl;

    return 0;
}


Output to the console:
1
85
0
240
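
The replay trick used by control() can be checked in isolation. The sketch below is an assumption-laden variant, not the code from the experiment: it replaces rand() and srand() with a std::minstd_rand object so that the result does not depend on a particular C library, and the seed value 1 is arbitrary.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical stand-in for rand()/srand(): a generator that can be reseeded.
using Generator = std::minstd_rand;

// Leaves gen positioned so that the next draw has the requested low byte.
void control(Generator& gen, std::uint8_t value)
{
    // Pass 1: count the draws that precede the first one with the right low byte.
    gen.seed(1);
    std::uint32_t count = 0;
    while ((gen() & 0xFFu) != value) {
        ++count;
    }

    // Pass 2: reseed and skip exactly that many draws.
    gen.seed(1);
    gen.discard(count);
}

// Usage: after control(), the very next draw carries the requested low byte.
void controlUsage()
{
    Generator gen;
    control(gen, 85);
    assert((gen() & 0xFFu) == 85u);
    control(gen, 240);
    assert((gen() & 0xFFu) == 240u);
}
```

The same two-pass idea is what lets the MSVC example choose the byte that bad() appears to return.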


Windows MSVC 2019 16.5.4, /O1, /O2


To influence the value “returned” by the bad() function, it is enough to create a single variable on the stack. To keep the write to it from being optimized away, mark it volatile.
Full example code
#include <iostream>
#include <stdlib.h>

bool bad()
{
  if (rand() == 0) {
    return true;
  }
}

int main()
{
  volatile unsigned char ch = 1;
  std::cout << bad() << std::endl;

  ch = 85;
  std::cout << bad() << std::endl;

  ch = 0;
  std::cout << bad() << std::endl;

  ch = 240;
  std::cout << bad() << std::endl;

  return 0;
}


Output to the console:
1
85
0
240


macOS Apple clang version 11.0.0, -O0


Before calling bad(), the desired value must be written to the memory cell that will lie one byte below the top of the stack at the moment bad() is called.

Full example code
#include <iostream>
#include <stdint.h>

bool bad() {}

void putToStack(uint8_t value)
{
    uint8_t memory[1]{value};
}

int main()
{
    putToStack(20);
    std::cout << bad() << std::endl;

    putToStack(55);
    std::cout << bad() << std::endl;

    putToStack(0xfe);
    std::cout << bad() << std::endl;

    putToStack(11);
    std::cout << bad() << std::endl;

    return 0;
}

[Text garbled in the source. The surviving fragments mention building with -O0, the memory array, and putToStack.]

Output to the console:
0
1
0
1

It seems to have worked: the output of the bad() function can be changed, and only the low-order bit is taken into account.
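
The four printed values are consistent with keeping only the lowest bit of each byte passed to putToStack. A minimal sketch of that masking step follows; the helper name lowBit is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Models the masking step seen in the listing: only the least significant
// bit of the garbage byte survives.
bool lowBit(std::uint8_t garbage)
{
    return (garbage & 1u) != 0;
}

// Usage: the bytes written by putToStack in the example above.
void lowBitUsage()
{
    assert(lowBit(20) == false);    // printed as 0
    assert(lowBit(55) == true);     // printed as 1
    assert(lowBit(0xfe) == false);  // printed as 0
    assert(lowBit(11) == true);     // printed as 1
}
```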

Conclusions from example 1a


The example confirmed that the disassembler listings were interpreted correctly.

Example 1b - broken bool


So what if the console shows "41" instead of "1"... Is that dangerous?

We will check with the two compilers that produce a whole byte of garbage.

Windows MSVC 2019 16.5.4, /Od


Full example code
#include <iostream>
#include <stdlib.h>
#include <set>
#include <unordered_set>

bool bad()
{
    if (rand() == 0) {
        return true;
    }
}

int main()
{
    bool badBool1 = bad();
    bool badBool2 = bad();

    std::cout << "badBool1: " << badBool1 << std::endl;
    std::cout << "badBool2: " << badBool2 << std::endl;

    if (badBool1) {
      std::cout << "if (badBool1): true" << std::endl;
    } else {
      std::cout << "if (badBool1): false" << std::endl;
    }
    if (!badBool1) {
      std::cout << "if (!badBool1): true" << std::endl;
    } else {
      std::cout << "if (!badBool1): false" << std::endl;
    }

    std::cout << "(badBool1 == true || badBool1 == false || badBool1 == badBool2): "
              << std::boolalpha << (badBool1 == true || badBool1 == false || badBool1 == badBool2)
              << std::endl;
    std::cout << "std::set<bool>{badBool1, badBool2, true, false}.size(): "
              << std::set<bool>{badBool1, badBool2, true, false}.size()
              << std::endl;
    std::cout << "std::unordered_set<bool>{badBool1, badBool2, true, false}.size(): "
              << std::unordered_set<bool>{badBool1, badBool2, true, false}.size()
              << std::endl;

    return 0;
}


Output to the console:
badBool1: 41
badBool2: 35
if (badBool1): true
if (!badBool1): false
(badBool1 == true || badBool1 == false || badBool1 == badBool2): false
std::set<bool>{badBool1, badBool2, true, false}.size(): 4
std::unordered_set<bool>{badBool1, badBool2, true, false}.size(): 4

Undefined behavior produced a Boolean variable that breaks at least:
  • the comparison operators for Boolean values;
  • the hash function for Boolean values.


Windows MSVC 2019 16.5.4, /O1, /O2


Full example code
#include <iostream>
#include <stdlib.h>
#include <set>
#include <unordered_set>

bool bad()
{
  if (rand() == 0) {
    return true;
  }
}

int main()
{
  volatile unsigned char ch = 213;
  bool badBool1 = bad();
  ch = 137;
  bool badBool2 = bad();

  std::cout << "badBool1: " << badBool1 << std::endl;
  std::cout << "badBool2: " << badBool2 << std::endl;

  if (badBool1) {
    std::cout << "if (badBool1): true" << std::endl;
  }
  else {
    std::cout << "if (badBool1): false" << std::endl;
  }
  if (!badBool1) {
    std::cout << "if (!badBool1): true" << std::endl;
  }
  else {
    std::cout << "if (!badBool1): false" << std::endl;
  }

  std::cout << "(badBool1 == true || badBool1 == false || badBool1 == badBool2): "
    << std::boolalpha << (badBool1 == true || badBool1 == false || badBool1 == badBool2)
    << std::endl;
  std::cout << "std::set<bool>{badBool1, badBool2, true, false}.size(): "
    << std::set<bool>{badBool1, badBool2, true, false}.size()
    << std::endl;
  std::cout << "std::unordered_set<bool>{badBool1, badBool2, true, false}.size(): "
    << std::unordered_set<bool>{badBool1, badBool2, true, false}.size()
    << std::endl;

  return 0;
}


Output to the console:
badBool1: 213
badBool2: 137
if (badBool1): true
if (!badBool1): false
(badBool1 == true || badBool1 == false || badBool1 == badBool2): false
std::set<bool>{badBool1, badBool2, true, false}.size(): 4
std::unordered_set<bool>{badBool1, badBool2, true, false}.size(): 4

Enabling optimization did not change how the corrupted Boolean variable behaves.

Linux x86-x64 gcc 9.3, -O0


Full example code
#include <iostream>
#include <stdlib.h>
#include <set>
#include <unordered_set>

bool bad()
{
}

unsigned char goodChar(unsigned char ch)
{
  return ch;
}

int main()
{
  goodChar(213);
  bool badBool1 = bad();

  goodChar(137);
  bool badBool2 = bad();

  std::cout << "badBool1: " << badBool1 << std::endl;
  std::cout << "badBool2: " << badBool2 << std::endl;

  if (badBool1) {
    std::cout << "if (badBool1): true" << std::endl;
  }
  else {
    std::cout << "if (badBool1): false" << std::endl;
  }
  if (!badBool1) {
    std::cout << "if (!badBool1): true" << std::endl;
  }
  else {
    std::cout << "if (!badBool1): false" << std::endl;
  }

  std::cout << "(badBool1 == true || badBool1 == false || badBool1 == badBool2): "
    << std::boolalpha << (badBool1 == true || badBool1 == false || badBool1 == badBool2)
    << std::endl;
  std::cout << "std::set<bool>{badBool1, badBool2, true, false}.size(): "
    << std::set<bool>{badBool1, badBool2, true, false}.size()
    << std::endl;
  std::cout << "std::unordered_set<bool>{badBool1, badBool2, true, false}.size(): "
    << std::unordered_set<bool>{badBool1, badBool2, true, false}.size()
    << std::endl;

  return 0;
}


Output to the console:
badBool1: 213
badBool2: 137
if (badBool1): true
if (!badBool1): true
(badBool1 == true || badBool1 == false || badBool1 == badBool2): false
std::set<bool>{badBool1, badBool2, true, false}.size(): 4
std::unordered_set<bool>{badBool1, badBool2, true, false}.size(): 4


Compared to MSVC, gcc additionally breaks the not operator.

Conclusions from example 1b


Breaking the basic operations on Boolean values can have serious consequences for high-level logic.

Why did this happen?

Because some operations on Boolean variables are implemented on the assumption that true is exactly 1.

We will not examine this in the disassembler; the article is long enough as it is.
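
Without going into the disassembler, the effect can be modelled on raw bytes. The sketch below is an assumption about how such operations may be implemented, not something taken from the listings: RawBool, notByXor and equalByByte are hypothetical names, and the model works on plain unsigned bytes rather than on real bool objects.

```cpp
#include <cassert>
#include <cstdint>

// A "bool" modelled as the raw byte that actually sits in memory.
using RawBool = std::uint8_t;

// Logical not implemented as "flip the lowest bit", which is correct
// only when the byte is 0 or 1 (assumed implementation strategy).
RawBool notByXor(RawBool b)
{
    return static_cast<RawBool>(b ^ 1u);
}

// Equality implemented as a plain byte comparison.
bool equalByByte(RawBool a, RawBool b)
{
    return a == b;
}

// Usage: valid bytes behave as expected, the garbage byte 213 does not.
void rawBoolUsage()
{
    assert(notByXor(1) == 0);
    assert(notByXor(0) == 1);
    assert(notByXor(213) == 212);       // still non-zero, so "not" of it also reads as true
    assert(!equalByByte(213, 1));       // not equal to true
    assert(!equalByByte(213, 0));       // not equal to false
    assert(!equalByByte(213, 137));     // not equal to the other garbage value
}
```

Under this model a byte of 213 is neither true nor false when compared, and both it and its negation pass an if check, which matches the gcc output above.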

Once again, let's refine the table of compiler behaviors:

| Compiler | Optimization | Program return | Console output | Cause | Consequences of using the result of bad() |
|---|---|---|---|---|---|
| Linux x86-x64 Clang 10.0.0 | -O0 | 255 | No output | ud2 | |
| Linux x86-x64 Clang 10.0.0 | -O1, -O2 | 0 | No output | The console output and the call to bad() were removed during optimization | |
| Linux x86-x64 gcc 9.3 | -O0 | 0 | 89 | One byte of garbage from the al register | Broken: not; ==; !=; <; >; <=; >=; std::hash |
| Linux x86-x64 gcc 9.3 | -O1, -O2, -O3 | 0 | No output | The console output and the call to bad() were removed during optimization | |
| macOS Apple clang version 11.0.0 | -O0 | 0 | 0 | One bit of garbage from RAM | |
| macOS Apple clang version 11.0.0 | -O1, -O2 | 0 | 0 | The call to bad() was replaced by zero | |
| Windows MSVC 2019 16.5.4, original example | /Od, /O1, /O2 | No build | No build | No build | |
| Windows MSVC 2019 16.5.4, complicated example | /Od | 0 | 41 | One byte of garbage from the al register | Broken: ==; !=; <; >; <=; >=; std::hash |
| Windows MSVC 2019 16.5.4, complicated example | /O1, /O2 | 0 | 1 | One byte of garbage from RAM | Broken: ==; !=; <; >; <=; >=; std::hash |

Four compilers produced 7 different manifestations of undefined behavior.

Example 2 - struct


Let's take a slightly more complicated example:

#include <iostream>
#include <stdint.h>
#include <stdlib.h>

struct Test
{
    Test(uint64_t v)
        : value(v)
    {
        std::cout << "Test::Test(" << v << ")" << std::endl;
    }
    ~Test()
    {
        std::cout << "Test::~Test()" << std::endl;
    }

    uint64_t value;
};

Test bad(int v)
{
    if (v == 0) {
        return {42};
    } else if (v == 1) {
        return {142};
    }
}

int main()
{
    const auto rnd = rand();
    std::cout << "rnd: " << rnd << std::endl;

    std::cout << bad(rnd).value << std::endl;

    return 0;
}

The Test structure takes a single parameter of type uint64_t in its constructor. Its constructor and destructor print diagnostic messages. The bad(int) function has two valid execution paths, neither of which is taken in our single call.

This time the table comes first, followed by a disassembler analysis of the unclear points.
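| Compiler | Optimization | Program return | Console output (lines separated by ;) | Cause |
|---|---|---|---|---|
| Linux x86-x64 Clang 10.0.0 | -O0 | 255 | rnd: 1804289383 | ud2 |
| Linux x86-x64 Clang 10.0.0 | -O1, -O2 | 0 | rnd: 1804289383; Test::Test(142); 142; Test::~Test() | The if (v == 1) check was dropped; the else if branch is treated as else |
| Linux x86-x64 gcc 9.3 | -O0 | 0 | rnd: 1804289383; 4198608; Test::~Test() | nop in place of the return; value contains garbage |
| Linux x86-x64 gcc 9.3 | -O1, -O2, -O3 | 0 | rnd: 1804289383; Test::Test(142); 142; Test::~Test() | The if (v == 1) check was dropped; the else if branch is treated as else |
| macOS Apple clang version 11.0.0 | -O0 | The program has unexpectedly finished. | rnd: 16807 | ud2 |
| macOS Apple clang version 11.0.0 | -O1, -O2 | 0 | rnd: 16807; Test::Test(142); 142; Test::~Test() | The if (v == 1) check was dropped; the else if branch is treated as else |
| Windows MSVC 2019 16.5.4 | /Od /RTCs | Access violation reading location 0x00000000CCCCCCCC | rnd: 41 | MSVC stack frame run-time error checking |
| Windows MSVC 2019 16.5.4 | /Od, /O1, /O2 | 0 | rnd: 41; 8791061810776; Test::~Test() | Garbage from a memory location whose address is in rax |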

Again we see a variety of outcomes: besides the already familiar ud2, there are at least 4 different behaviors.

The way the compilers treat the constructor is particularly interesting:

  • in some cases execution continued without a constructor call, leaving the object in some random state;
  • in other cases a constructor was called that the execution path did not provide for, which is rather strange.

Linux x86-x64 Clang 10.0.0, -O1, -O2


image

The code performs only one comparison (line 14) and contains only one conditional jump (line 15). The compiler ignored the second comparison and the second conditional jump.
This raises the suspicion that the undefined behavior began earlier than the Standard prescribes.

However, evaluating the condition of the second if has no side effects, so the compiler reasoned as follows:

  • if the second condition is true, the Test constructor must be called with argument 142;
  • if the second condition is false, the function exits without returning a value, which is undefined behavior, so the compiler may do anything, including calling the same constructor with the same argument;
  • the check is therefore redundant: the Test constructor can be called with argument 142 without checking the condition.
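
The outcome of that reasoning can be written down as ordinary, well-defined C++. This is a simplified sketch, not the compiler's actual output: the hypothetical badAsOptimized returns the constructor argument instead of a Test object.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the optimized bad(int): returns the value that
// would be passed to the Test constructor.
std::uint64_t badAsOptimized(int v)
{
    if (v == 0) {
        return 42;
    }
    // The v == 1 check is gone: every remaining input takes this path.
    return 142;
}

// Usage: the value from the test run ends up on the 142 path,
// just as Test::Test(142) appears in the console output.
void badAsOptimizedUsage()
{
    assert(badAsOptimized(0) == 42);
    assert(badAsOptimized(1) == 142);
    assert(badAsOptimized(1804289383) == 142);
}
```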

Let's see what happens if the second check contains a condition with side effects:

Test bad(int v)
{
    if (v == 0) {
        return {42};
    } else if (v == rand()) {
        return {142};
    }
}

Full code
#include <iostream>
#include <stdint.h>
#include <stdlib.h>

struct Test
{
    Test(uint64_t v)
        : value(v)
    {
        std::cout << "Test::Test(" << v << ")" << std::endl;
    }
    ~Test()
    {
        std::cout << "Test::~Test()" << std::endl;
    }

    uint64_t value;
};

Test bad(int v)
{
    if (v == 0) {
        return {42};
    } else if (v == rand()) {
        return {142};
    }
}

int main()
{
    const auto rnd = rand();
    std::cout << "rnd: " << rnd << std::endl;

    std::cout << bad(rnd).value << std::endl;

    return 0;
}


image

The compiler faithfully reproduced all the intended side effects by calling rand() (line 16), which dispels the doubts about undefined behavior starting too early.
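
One transformation that the Standard would permit here can be sketched as follows. It is not necessarily the exact code Clang emitted: countedRand is a hypothetical stand-in for rand() that counts its calls, and the function again returns the constructor argument instead of a Test object.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for rand(): counts how many times it was called.
int g_calls = 0;

int countedRand()
{
    ++g_calls;
    return 4;  // arbitrary fixed value for this sketch
}

// The side effect of the second condition is preserved, while the result
// of the comparison no longer influences which value is produced.
std::uint64_t badKeepingSideEffect(int v)
{
    if (v == 0) {
        return 42;
    }
    (void)countedRand();  // the call stays, only its result is ignored
    return 142;
}

// Usage: the stand-in is called exactly when the first check fails.
void sideEffectUsage()
{
    g_calls = 0;
    assert(badKeepingSideEffect(0) == 42);
    assert(g_calls == 0);
    assert(badKeepingSideEffect(5) == 142);
    assert(g_calls == 1);
}
```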

Windows MSVC 2019 16.5.4, /Od /RTCs


The /RTCs option enables stack frame run-time error checking. It is available only in debug builds. Consider the disassembled code of a fragment of main():

image

Before bad(int) is called (line 4), the arguments are prepared: the value of the rnd variable is copied to the edx register (line 2), and the effective address of a local variable located at rsp + 28h is loaded into the rcx register (line 3).

Presumably, rsp + 28h is the address of the temporary variable that holds the result of calling bad(int).

This assumption is confirmed by lines 19 and 20: the effective address of the same variable is loaded into rcx, after which the destructor is called.

However, between lines 4 and 18 this variable is never accessed, even though the value of its data field is written to the stream.

As we saw in the previous MSVC listings, the argument of the stream output operator is expected in the rdx register. The rdx register receives the result of dereferencing the address held in rax (line 9).

Thus, the calling code expects bad(int) to:

  • fill in the variable whose address is passed in the rcx register (here we see RVO in action);
  • return the address of that variable in the rax register.
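
The contract between main() and bad(int) described above can be sketched in source form. This is a hypothetical lowering written for illustration: badLowered and callerReadsValue are invented names, the Test struct is reduced to its data field, and, to keep the sketch well defined, every path constructs an object.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Reduced version of the Test struct: only the data field is kept.
struct Test
{
    explicit Test(std::uint64_t v) : value(v) {}
    std::uint64_t value;
};

// Hypothetical lowered form of "Test bad(int)": the caller passes the address
// of uninitialized storage (rcx in the listing) and receives the same address
// back (rax in the listing).
Test* badLowered(void* resultStorage, int v)
{
    if (v == 0) {
        return new (resultStorage) Test(42);
    }
    return new (resultStorage) Test(142);
}

// The caller reads the field through the returned pointer,
// not through its own copy of the storage address.
std::uint64_t callerReadsValue(int v)
{
    alignas(Test) unsigned char storage[sizeof(Test)];
    Test* result = badLowered(storage, v);
    assert(static_cast<void*>(result) == static_cast<void*>(storage));
    std::uint64_t value = result->value;
    result->~Test();
    return value;
}
```

In the broken original, the path without a return leaves both the storage and the returned pointer unset, which is why the caller ends up dereferencing whatever happens to be in rax.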

Let's move on to the listing of bad(int):

image

  • the value 0xCCCCCCCC, which we saw in the Access violation message, is loaded into eax (line 9) (note that it is only 4 bytes, whereas the address in the Access violation message is 8 bytes long);
  • the rep stos instruction is executed, performing 0xC iterations of writing the contents of eax to memory starting at the address in rdi (line 10). That is 48 bytes, exactly as much as is allocated on the stack in line 6;
  • on the correct execution paths, the value from rsp + 40h is loaded into rax (lines 23, 36);
  • the value of the rcx register (through which main() passed the destination address) is stored on the stack at rsp + 8 (line 4);
  • rdi is pushed onto the stack, which decreases rsp by 8 (line 5);
  • 30h bytes are allocated on the stack by decreasing rsp (line 6).

So rsp + 8 in line 4 and rsp + 40h in the rest of the code refer to the same location.
The code is rather hard to follow because it does not use rbp.
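
The arithmetic behind these observations can be checked at compile time. The constant names below are hypothetical; the numbers are the ones quoted from the listing. The last function models a general x86-64 rule that is not stated in the listing itself: a write to a 32-bit register clears the upper 32 bits of the corresponding 64-bit register.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kStosIterations = 0xC;   // repeat count of rep stos
constexpr std::uint64_t kBytesPerStore  = 4;     // each iteration stores eax
constexpr std::uint64_t kFrameBytes     = 0x30;  // stack space allocated in line 6
constexpr std::uint64_t kPushBytes      = 8;     // push rdi decreases rsp by 8
constexpr std::uint64_t kSavedRcxOffset = 0x8;   // rcx is stored at rsp + 8 in line 4

// rep stos fills exactly the allocated frame: 0xC * 4 = 48 = 0x30 bytes.
static_assert(kStosIterations * kBytesPerStore == 48, "fill size in bytes");
static_assert(kStosIterations * kBytesPerStore == kFrameBytes, "fill covers the frame");

// After the push and the allocation, the saved rcx is found at rsp + 40h.
static_assert(kSavedRcxOffset + kPushBytes + kFrameBytes == 0x40, "offset of saved rcx");

// Models writing a 32-bit value to eax: the result seen in rax is zero-extended.
std::uint64_t raxAfterWritingEax(std::uint32_t eax)
{
    return eax;
}

// Usage: the 4-byte fill pattern becomes the 8-byte address from the error message.
void frameArithmeticUsage()
{
    assert(raxAfterWritingEax(0xCCCCCCCCu) == 0x00000000CCCCCCCCull);
}
```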

Two things about the Access violation message are worth noting:

  • the zeros in the upper half of the address are not accidental: on x86-64, writing to a 32-bit register such as eax clears the upper 32 bits of rax;
  • the address merely happened to be invalid.

Apparently, the /RTCs option enables filling the stack frame with a fixed non-zero value, and the Access violation was just an incidental side effect.

Let's see how the code built with /RTCs differs from the code built without it.

image

The main() fragments differ only in the addresses of local variables on the stack.

image

(for clarity, I placed the two versions of the bad(int) function side by side: with /RTCs and without)
Without /RTCs, the rep stos instruction and the preparation of its arguments at the beginning of the function are gone.

Example 2a


Once again, let's try to control the undefined behavior. This time for just one compiler.

Windows MSVC 2019 16.5.4, /Od /RTCs


With the /RTCs option, the compiler inserts code at the beginning of the bad(int) function that loads a fixed value into eax, and therefore into rax, which can lead to an Access violation.

To change this behavior, it is enough to put some valid address into rax.
This can be achieved with a very simple modification: make the body of bad(int) write something to std::cout.

Full example code
#include <iostream>
#include <stdint.h>
#include <stdlib.h>

struct Test
{
    Test(uint64_t v)
        : value(v)
    {
        std::cout << "Test::Test(" << v << ")" << std::endl;
    }
    ~Test()
    {
        std::cout << "Test::~Test()" << std::endl;
    }

    uint64_t value;
};

Test bad(int v)
{
  std::cout << "rnd: " << v << std::endl;
  
  if (v == 0) {
        return {42};
    } else if (v == 1) {
        return {142};
    }
}

int main()
{
    const auto rnd = rand();

    std::cout << bad(rnd).value << std::endl;

    return 0;
}


Output to the console:
rnd: 41
8791039331928
Test::~Test()

operator<< returns a reference to the stream, which is implemented by placing the address of std::cout in rax. The address is valid and can be dereferenced, so the Access violation is avoided.
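
The language-level half of this explanation is easy to verify: the built-in stream insertion operators return the stream they were applied to. The sketch below checks this with a hypothetical helper, insertionReturnsSameStream; which register carries that reference is a property of the generated code and is not something this snippet can show.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>

// Returns true when operator<< hands back the very stream it was applied to.
bool insertionReturnsSameStream(std::ostream& os)
{
    std::ostream& returned = (os << "rnd: " << 41);
    return &returned == &os;
}

// Usage: checked on a string stream to avoid console output.
void insertionUsage()
{
    std::ostringstream buffer;
    assert(insertionReturnsSameStream(buffer));
    assert(buffer.str() == "rnd: 41");
}
```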

Conclusion


Using the simplest examples, we were able to:

  • collect about 10 different manifestations of undefined behavior;
  • learn in detail exactly how each of them plays out.

All compilers adhered strictly to the Standard: in no example did the undefined behavior begin early. Still, compiler developers cannot be accused of lacking imagination.

Often the manifestation depends on subtle nuances: adding or removing one seemingly irrelevant line of code can change the behavior of the program significantly.

Clearly, it is easier not to write such code than to solve puzzles afterwards.
