🙂 ⬅️ 👨🏼‍⚖️ Testing the GCC 10 Compiler with PVS-Studio 🖇️ 🍃 👩‍🚒

The GCC compiler is written with copious use of macros. Another check of the GCC code using PVS-Studio once again confirms the opinion of our team that macros are bad. Such code is difficult to understand not only for a static analyzer, but also for a programmer. Of course, GCC developers are already used to the project and are well versed in it. But from the side it is very difficult to understand something. Actually, because of the macros, it was not possible to fully perform code verification. Nevertheless, the PVS-Studio analyzer, as always, showed that it can find errors even in compilers.

Time to double-check the GCC compiler code

The last time I checked the GCC compiler four years ago. Time flies quickly and imperceptibly, and somehow I all forgot to return to this project and recheck it. The publication " Static analysis in GCC 10 " pushed back to this idea .

Actually, it is no secret that compilers have their own built-in static code analyzers and they are also developing. Therefore, from time to time we write articles that the PVS-Studio static analyzer can find errors even inside compilers and that we are not in vain eating bread :).

In fact, you cannot compare classical static analyzers with compilers. Static analyzers are not only a search for errors in the code, but also a developed infrastructure. For example, this is integration with systems such as SonarQube, PlatformIO, Azure DevOps, Travis CI, CircleCI, GitLab CI / CD, Jenkins, Visual Studio. These are developed mechanisms for mass warning suppression, which allows you to quickly start using PVS-Studio even in a large old project. This is a notification mailing list. And so on and so forth. However, anyway, the first question asked is: “Can PVS-Studio find something that compilers cannot find?”. So, we will write articles over and over again about checking these compilers themselves.

Let's get back to the subject of the GCC project verification. There is no need to dwell on this project and tell what it is. Let's better talk what's inside this project.

And inside there are a huge number of macros that interfere with the verification. Firstly, the PVS-Studio analyzer generates a large number of false positives. There is nothing wrong with that, but it’s not so easy to take and start studying the report issued by him. In a good way, you need to do work to suppress false warnings in macros. Otherwise, useful warnings just drown in a stream of noise. This setup is beyond the scope of this article writing task. Actually, I will be completely honest - I was just too lazy to do this, although there is nothing complicated about it. Due to the noise, viewing the report was quite superficial.

Secondly, it is very difficult for me, as a person not familiar with the project, to understand the code. Macros, macros ... You have to look at what they are deployed to understand why the analyzer generates warnings. Very hard. I do not like macros . Someone may say that without macros in C can not do. But GCC has not been written in C at all for a long time. Yes, files for historical reasons have the extension .c, but you look there, and there:

//  alias.c
....
struct alias_set_hash : int_hash <int, INT_MIN, INT_MIN + 1> {};
struct GTY(()) alias_set_entry {
  alias_set_type alias_set;
  bool has_zero_child;
  bool is_pointer;
  bool has_pointer;
  hash_map<alias_set_hash, int> *children;
};
....

This is clearly not C, but C ++.

In general, macros and coding style make it very difficult to study the analyzer report. So this time I will not please a long list of various errors. With difficulty and using a few cups of coffee, I wrote out 10 interesting fragments, and this left me alone :).

10 suspicious code snippets

Fragment N1, Seems unsuccessful Copy-Paste

static bool
try_crossjump_to_edge (int mode, edge e1, edge e2,
                       enum replace_direction dir)
{
  ....
  if (FORWARDER_BLOCK_P (s->dest))
    s->dest->count += s->count ();

  if (FORWARDER_BLOCK_P (s2->dest))
    s2->dest->count -= s->count ();
  ....
}

PVS-Studio Warning: V778 Two similar code fragments were found. Perhaps, this is a typo and 's2' variable should be used instead of 's'. cfgcleanup.c 2126

In fact, I'm not sure if this is a mistake. However, I have a strong suspicion that this code was written using Copy-Paste, and in the second block in one place they forgot to replace s with s2 . That is, it seems to me the second block of code should be like this:

if (FORWARDER_BLOCK_P (s2->dest))
  s2->dest->count -= s2->count ();

Fragment N2, Typo

tree
vn_reference_lookup_pieces (....)
{
  struct vn_reference_s vr1;
  ....
  vr1.set = set;
  vr1.set = base_set;
  ....
}

PVS-Studio Warning: V519 The 'vr1.set' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 3448, 3449. tree-ssa-sccvn.c 3449

It is very strange that different values are written to the same variable twice in a row. This is an obvious typo. Next to this file is this code:

vr1.set = set;
vr1.base_set = base_set;

Most likely, in the suspicious code it should have been written exactly the same.

Fragment N3, Assigning a variable to itself

static omp_context *
new_omp_context (gimple *stmt, omp_context *outer_ctx)
{
  omp_context *ctx = XCNEW (omp_context);

  splay_tree_insert (all_contexts, (splay_tree_key) stmt,
         (splay_tree_value) ctx);
  ctx->stmt = stmt;

  if (outer_ctx)
    {
      ctx->outer = outer_ctx;
      ctx->cb = outer_ctx->cb;
      ctx->cb.block = NULL;
      ctx->local_reduction_clauses = NULL;
      ctx->outer_reduction_clauses = ctx->outer_reduction_clauses;  // <=
      ctx->depth = outer_ctx->depth + 1;
    }
  ....
}

PVS-Studio warning: V570 The 'ctx-> outer_reduction_clauses' variable is assigned to itself. omp-low.c 935

It is very strange to assign a variable to itself.

Fragment N4. 0,1,2, Freddy will pick you up.

I recently published an article, " Zero, One, Two, Freddy Will Take You ." It seems to me that the following code snippet continues the collection of errors discussed in this article.

#define GET_MODE(RTX)    ((machine_mode) (RTX)->mode)
....
static int
add_equal_note (rtx_insn *insns, rtx target, enum rtx_code code, rtx op0,
                rtx op1, machine_mode op0_mode)
{
  ....
  if (commutative_p
      && GET_MODE (xop0) != xmode0 && GET_MODE (xop1) != xmode1
      && GET_MODE (xop0) == xmode1 && GET_MODE (xop1) == xmode1)
    std::swap (xop0, xop1);
  ....
}

PVS-Studio warning: V560 A part of conditional expression is always false:
((machine_mode) (xop1) -> mode) == xmode1. optabs.c 1053

Note these two subexpressions:

GET_MODE (xop1)! = Xmode1
GET_MODE (xop1) == xmode1

The AND operation is performed on the results of these subexpressions, which obviously has no practical meaning. Actually, if the second subexpression started to be executed, then it is known in advance that it will result in false .

Most likely, there is a typo here in zeros and ones, and in fact the condition should have been like this:

&& GET_MODE (xop0) != xmode0 && GET_MODE (xop1) != xmode1
&& GET_MODE (xop0) == xmode1 && GET_MODE (xop1) == xmode0

Of course, I'm not sure if I changed the code correctly, since I am not guided in the project.

Fragment N5. Suspicious change in argument value

bool
ipa_polymorphic_call_context::set_by_invariant (tree cst,
                                                tree otr_type,
                                                HOST_WIDE_INT off)
{
  poly_int64 offset2, size, max_size;
  bool reverse;
  tree base;

  invalid = false;
  off = 0;                // <=
  ....
  if (otr_type && !contains_type_p (TREE_TYPE (base), off, otr_type))
    return false;

  set_by_decl (base, off);
  return true;
}

PVS-Studio Warning: V763 Parameter 'off' is always rewritten in function body before being used. ipa-polymorphic-call.c 766

The value of the off argument is immediately replaced by 0. Moreover, there is no explanatory comment. All this is very suspicious. Sometimes this code appears during debugging. The programmer needed to see how the function behaves in a certain mode, so he temporarily changed the value of the argument, and then they forgot to delete this line. As a result, an error appears in the code. Of course, everything may be right here, but this code clearly needs to be checked and clarified to ensure that similar questions do not arise in the future.

Fragment N6. Little thing

cgraph_node *
cgraph_node::create_clone (....)
{
  ....
  new_node->icf_merged = icf_merged;
  new_node->merged_comdat = merged_comdat;   // <=
  new_node->thunk = thunk;
  new_node->unit_id = unit_id;
  new_node->merged_comdat = merged_comdat;   // <=
  new_node->merged_extern_inline = merged_extern_inline;
  ....
}

PVS-Studio warning: V519 The 'new_node-> merged_comdat' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 406, 409. cgraphclones.c 409 Assignment is

randomly duplicated. Most likely nothing wrong. However, there is always a risk that in reality they forgot to perform some other assignment.

Fragment N7. Code that looks dangerous

static void
complete_mode (struct mode_data *m)
{
  ....
  if (m->cl == MODE_COMPLEX_INT || m->cl == MODE_COMPLEX_FLOAT)
    alignment = m->component->bytesize;
  else
    alignment = m->bytesize;

  m->alignment = alignment & (~alignment + 1);

  if (m->component)
  ....
}

PVS-Studio Warning: V595 The 'm-> component' pointer was utilized before it was verified against nullptr. Check lines: 407, 415. genmodes.c 407

At the beginning, the m-> component pointer is dereferenced in one of the branches of the if statement . I mean this expression: m-> component-> bytesize .

It further turns out that this pointer may be null. This follows from the check: if (m-> component) .

This code is not necessarily wrong. It is entirely possible that a dereferenced branch is only executed if the pointer is not null. That is, there is an indirect relationship between the value of the variable m-> cl and m-> component. But this code looks very dangerous in any case. And there are no explanatory comments.

Fragment N8. Double check

void
pointer_and_operator::wi_fold (value_range &r, tree type,
                               const wide_int &lh_lb,
                               const wide_int &lh_ub,
                               const wide_int &rh_lb ATTRIBUTE_UNUSED,
                               const wide_int &rh_ub ATTRIBUTE_UNUSED) const
{
  // For pointer types, we are really only interested in asserting
  // whether the expression evaluates to non-NULL.
  if (wi_zero_p (type, lh_lb, lh_ub) || wi_zero_p (type, lh_lb, lh_ub))
    r = range_zero (type);
  else 
    r = value_range (type);
}

PVS-Studio Warning: V501 There are identical sub-expressions 'wi_zero_p (type, lh_lb, lh_ub)' to the left and to the right of the '||' operator. range-op.cc 2657

Some kind of weird check. The wi_zero_p function is called twice with the same set of actual arguments. One may suspect that in fact, the second call should use the arguments accepted from the outside: rh_lb , rh_ub. But no, these arguments are marked as unused ( ATTRIBUTE_UNUSED ).

Therefore, it is not clear to me why not writing a check is simpler:

if (wi_zero_p (type, lh_lb, lh_ub))
  r = range_zero (type);
else 
  r = value_range (type);

Or is there some typo here? Or a logical mistake? I don’t know, but the code is very strange.

Fragment N9. Dangerous array access

struct algorithm
{
  struct mult_cost cost;
  short ops;
  enum alg_code op[MAX_BITS_PER_WORD];
  char log[MAX_BITS_PER_WORD];
};

static void
synth_mult (struct algorithm *alg_out, unsigned HOST_WIDE_INT t,
            const struct mult_cost *cost_limit, machine_mode mode)
{
  int m;
  struct algorithm *alg_in, *best_alg;
  ....
  /* Cache the result.  */
  if (!cache_hit)
  {
    entry_ptr->t = t;
    entry_ptr->mode = mode;
    entry_ptr->speed = speed;
    entry_ptr->alg = best_alg->op[best_alg->ops];
    entry_ptr->cost.cost = best_cost.cost;
    entry_ptr->cost.latency = best_cost.latency;
  }

  /* If we are getting a too long sequence for `struct algorithm'
     to record, make this search fail.  */
  if (best_alg->ops == MAX_BITS_PER_WORD)
    return;
  ....
}

PVS-Studio Warning: V781 The value of the 'best_alg-> ops' variable is checked after it was used. Perhaps there is a mistake in program logic. Check lines: 3157, 3164. expmed.c 3157

Let's shorten the code to make it clear what the analyzer does not like:

if (!cache_hit)
{
  entry_ptr->alg = best_alg->op[best_alg->ops];
}
if (best_alg->ops == MAX_BITS_PER_WORD)

At the beginning, the variable best_alg-> ops is used to index the array. And only then does this variable check for a boundary value. Theoretically, an overflow of an array can occur (a classic kind of error CWE-193: Off-by-one Error ).

Is this a real mistake? And as this is constantly happening in this article, I'm not sure :). Perhaps there is a relationship between the value of this index and the cache_hit variable . Perhaps nothing is cached if the index is at the maximum ( MAX_BITS_PER_WORD ). The function code is large, and I did not figure it out.

In any case, this code is best checked. And even if it turns out to be correct, I would recommend accompanying the considered section of the program with a comment. It can confuse not only me or PVS-Studio, but also someone else.

Fragment N10. Code that has not been fixed

in 4 years. In the last article, I drew attention to this code:

static bool
dw_val_equal_p (dw_val_node *a, dw_val_node *b)
{
  ....
  case dw_val_class_vms_delta:
    return (!strcmp (a->v.val_vms_delta.lbl1, b->v.val_vms_delta.lbl1)
            && !strcmp (a->v.val_vms_delta.lbl1, b->v.val_vms_delta.lbl1));
  ....
}

PVS-Studio warning: V501 There are identical sub-expressions '! Strcmp (a-> v.val_vms_delta.lbl1, b-> v.val_vms_delta.lbl1)' to the left and to the right of the '&&' operator. dwarf2out.c 1481

Two strcmp functions compare the same pointers. That is, a clearly redundant check is performed. In a previous article, I suggested that it was a typo, and should actually be written:

return (   !strcmp (a->v.val_vms_delta.lbl1, b->v.val_vms_delta.lbl1)
        && !strcmp (a->v.val_vms_delta.lbl2, b->v.val_vms_delta.lbl2));

However, for 4 years this code has not been fixed. At the same time, we informed the authors about suspicious sections of code that we described in the article. Now I'm not so sure that this is a mistake. Perhaps this is just redundant code. In this case, the expression can be simplified:

return (!strcmp (a->v.val_vms_delta.lbl1, b->v.val_vms_delta.lbl1));

Let's see if the GCC developers will change this piece of code after a new article.

Conclusion

I remind you that you can use this free license option to check open projects . By the way, there are other options for free licensing of PVS-Studio, including even for closed projects. They are listed here: " PVS-Studio Free Licensing Options ".

Thank you for the attention. And come read our blog . There are many interesting things.

Our other articles about checking compilers

LLVM check (Clang) (August 2011), second check (August 2012), third check (October 2016), fourth check (April 2019)
GCC Review (August 2016)
Check Huawei Ark Compiler (December 2019)
Verification of the .NET Compiler Platform (“Roslyn”) (December 2015), second inspection (April 2019)
Roslyn Analyzers Review (August 2019)
Checking PascalABC.NET (March 2017)

If you want to share this article with an English-speaking audience, then please use the link to the translation: Andrey Karpov. Checking the GCC 10 Compiler with PVS-Studio .

Testing the GCC 10 Compiler with PVS-Studio

Time to double-check the GCC compiler code

10 suspicious code snippets

Conclusion

Our other articles about checking compilers

More articles: