How Does Strange Code Hide Errors? TensorFlow.NET Analysis

TensorFlow.NET and PVS-Studio

Static analysis is an extremely useful tool for any developer, as it helps to find in time not only errors but also suspicious and strange pieces of code that will puzzle the programmers who have to work with it later. This idea is demonstrated below by analyzing TensorFlow.NET, an open-source C# project developed for working with the popular TensorFlow machine learning library.

My name is Nikita Lipilin. Some time ago I joined the C# team at PVS-Studio. Traditionally, all newcomers to the team write an article about the results of checking various open-source projects with the PVS-Studio static analyzer. Such articles help new employees get to know the product better and, at the same time, promote the static analysis methodology. I invite you to read my first article on the analysis of open-source projects.

Introduction


The variety of possible errors in program code is astonishing. Some reveal themselves immediately, on a cursory look at the resulting application; others are hard to notice even during a code review by a team of experienced developers. It also happens that, out of carelessness or for some other reason, a programmer writes plainly strange and illogical code that nevertheless (seemingly) does its job. Only later, when returning to that code or when others study it, do questions arise to which no one has an answer.

Refactoring old code can cause problems, especially when other parts of the program depend on it, so even when frankly clumsy constructs are found, the "It works? Don't touch it!" principle kicks in. Eventually this makes the source code harder to study and, therefore, harder to extend. The code base gets cluttered, and the likelihood grows that some small, inconspicuous but potentially very unpleasant internal problem will not be fixed in time.

At some point the consequences of such an error will make themselves felt, but finding it will take a lot of time, because the developer's suspicions will fall on the huge number of strange code fragments that were never refactored. It follows that problems and oddities in a given fragment should be fixed right after it is written. If there are reasonable grounds for leaving everything as is (for example, if the code is a placeholder written "for later"), such a fragment should be accompanied by an explanatory comment.

It is also worth noting that, regardless of a developer's qualification, some problematic or simply unfortunate spots can escape their attention. Sometimes a "temporary solution" gets applied somewhere, is soon forgotten, and becomes permanent. Analyzing such code later (most likely, by another developer) will take an unacceptably large amount of effort.

Code review can help in such cases. However, if the task is serious enough, this option requires a lot of time. Besides, when there are many small errors or defects, the reviewer may well overlook the high-level errors behind them. Checking the code turns into a tedious routine, and the effectiveness of the review gradually decreases.

Obviously, routine tasks are better delegated from a person to a computer. This approach is used in many areas today: automation of various processes is the key to prosperity. What does automation mean in the context of this topic?

A reliable assistant in writing sensible, stable code is static analysis. Before sending their work for review, a programmer can run an automated check and avoid burdening other developers (and themselves) with unnecessary work. The code is sent for review only after all analyzer warnings have been dealt with: errors are fixed, and strange spots are rewritten or at least explained with a comment.

Of course, the need for code review does not go away, but static analysis complements it and greatly simplifies it. A fairly large portion of errors get fixed thanks to the analyzer, and strange spots will definitely not be forgotten, since they will be marked accordingly. As a result, the code review can focus on checking the correctness of complex logical interactions and on finding deep problems that, unfortunately, the analyzer cannot detect (yet).

TensorFlow.NET




This article was inspired by the TensorFlow.NET project. It makes the functionality of the popular TensorFlow machine learning library available from C# code (by the way, we have also checked TensorFlow itself). The idea seemed quite interesting, because at the time of writing, working with the library is available only in Python, Java and Go.

The source code available on GitHub is constantly being updated; at the moment it contains a little over one hundred thousand lines. After a quick look, I wanted to check it with static analysis. PVS-Studio was used as the specific tool; it has proven its effectiveness on a fairly large number of different projects.


For TensorFlow.NET, the analyzer issued a number of warnings: 39 High-level, 227 Medium-level and 154 Low-level (you can read about warning levels here, in the section "Warning levels and sets of diagnostic rules"). A detailed review of each of them would make this article gigantic, so below I describe only those that seemed most interesting to me. It is also worth noting that some of the detected problems occur several times in the project, and examining every such fragment is beyond the scope of this article.

The project sets itself a rather serious task and, unfortunately, the appearance of strange code fragments of various kinds is inevitable. In this article I will try to show that static analysis can greatly simplify the work of programmers by pointing out places that may raise questions. A warning does not always indicate an error, but it quite often indicates code that would puzzle a human reader. Accordingly, the likelihood that strange code will be reworked, or at least commented appropriately, increases significantly.

Fragments that attracted attention when studying the analyzer report


In fact, a fairly large number of analyzer warnings for this project point not so much to errors as to strange code. Looking at the lines the warnings are issued for causes bewilderment, to say the least. Some of the examples below are probably "temporary solutions", but none of them are accompanied by comments saying so, which means that whoever works with this code in the future will have questions, and finding the answers will take extra time.


At the same time, some warnings point to code that is not just strange, but simply wrong. This is the main danger of "strange code": it becomes extremely difficult to notice a real mistake when you see incomprehensible solutions here and there and gradually get used to the code looking incorrect.

A sophisticated collection traversal


private static void _RemoveDefaultAttrs(....)
{
  var producer_op_dict = new Dictionary<string, OpDef>();
  producer_op_list.Op.Select(op =>
  {
    producer_op_dict[op.Name] = op;
    return op;
  }).ToArray();           
  ....
}

Analyzer warning: V3010 The return value of function 'ToArray' is required to be utilized. importer.cs 218

The analyzer considers the ToArray call at this point suspicious, since the value returned by this function is not used. However, this code is not an error. The construct is used to fill the producer_op_dict dictionary with values corresponding to the elements of the producer_op_list.Op list. The ToArray call is needed so that the function passed to the Select method gets invoked for every element of the collection.

In my opinion, the code does not look its best. Filling the dictionary this way is not obvious, and some developers may be tempted to remove the "unnecessary" ToArray call. It would be much simpler and clearer to use a foreach loop here:

var producer_op_dict = new Dictionary<string, OpDef>();

foreach (var op in producer_op_list.Op)
{
  producer_op_dict[op.Name] = op;
}

In this case, the code looks as simple as possible.
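
As an alternative, since the only purpose of the construct is to map operation names to operations, LINQ's ToDictionary expresses the same intent in a single expression. This is just a sketch of an option, assuming op names in producer_op_list.Op are unique (unlike the indexer assignment, ToDictionary throws on duplicate keys):

// Hypothetical alternative: build the dictionary in one expression.
// Note: ToDictionary throws ArgumentException on duplicate keys,
// whereas the indexer assignment above silently overwrites them.
var producer_op_dict = producer_op_list.Op.ToDictionary(op => op.Name);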

Another similar moment looks like this:

public GraphDef convert_variables_to_constants(....)
{
  ....
  inference_graph.Node.Select(x => map_name_to_node[x.Name] = x).ToArray();
  ....
}

Analyzer warning: V3010 The return value of function 'ToArray' is required to be utilized. graph_util_impl.cs 48

The only difference is that this version looks more concise; yet the temptation to remove the ToArray call is still there, and the code is still unobvious.

Temporary solution


public GraphDef convert_variables_to_constants(....)
{
  ....
  var source_op_name = get_input_name(node);
  while(map_name_to_node[source_op_name].Op == "Identity")
  {
    throw new NotImplementedException();
    ....
  }
  ....
}

Analyzer warning: V3020 An unconditional 'throw' within a loop. graph_util_impl.cs 73

The project under consideration often uses the following approach: if some behavior needs to be implemented later, a NotImplementedException is thrown at the appropriate place. It is clear why the analyzer suspects an error in this fragment: using while instead of if does not look particularly reasonable.
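
If the loop body really is just a placeholder, a minimal sketch of what the stub presumably was meant to look like (the exception message here is made up) is:

// Hypothetical stub: guard with 'if' instead of an unconditional throw in a loop.
if (map_name_to_node[source_op_name].Op == "Identity")
  throw new NotImplementedException("Identity chains are not supported yet");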

This is not the only warning that appears due to the use of temporary solutions. For example, there is a method:

public static Tensor[] _SoftmaxCrossEntropyWithLogitsGrad(
  Operation op, Tensor[] grads
)
{
  var grad_loss = grads[0];
  var grad_grad = grads[1];
  var softmax_grad = op.outputs[1];
  var grad = _BroadcastMul(grad_loss, softmax_grad);

  var logits = op.inputs[0];
  if(grad_grad != null && !IsZero(grad_grad)) // <=
  {
    throw new NotImplementedException("_SoftmaxCrossEntropyWithLogitsGrad");
  }

  return new Tensor[] 
  {
    grad,
    _BroadcastMul(grad_loss, -nn_ops.log_softmax(logits))
  };
}

Analyzer warning: V3022 Expression 'grad_grad != null && !IsZero(grad_grad)' is always false. nn_grad.cs 93

In fact, NotImplementedException("_SoftmaxCrossEntropyWithLogitsGrad") will never be thrown, as this code is simply unreachable. To unravel the reason, we have to look at the code of the IsZero function:

private static bool IsZero(Tensor g)
{
  if (new string[] { "ZerosLike", "Zeros" }.Contains(g.op.type))
    return true;

  throw new NotImplementedException("IsZero");
}

The method either returns true or throws an exception. This code is not an error; obviously, the implementation is left for later. The important thing is that "later" actually arrives. Well, it worked out nicely that PVS-Studio will not let anyone forget about this imperfection :)

Is Tensor a Tensor?


private static Tensor[] _ExtractInputShapes(Tensor[] inputs)
{
  var sizes = new Tensor[inputs.Length];
  bool fully_known = true;
  for(int i = 0; i < inputs.Length; i++)
  {
    var x = inputs[i];

    var input_shape = array_ops.shape(x);
    if (!(input_shape is Tensor) || input_shape.op.type != "Const")
    {
      fully_known = false;
      break;
    }

    sizes[i] = input_shape;
  }
  ....
}

Analyzer warning: V3051 An excessive type check. The object is already of the 'Tensor' type. array_grad.cs 154

The return type of the shape method is Tensor. So the input_shape is Tensor check looks strange, to say the least. Perhaps the method once returned a value of a different type and the check made sense; it is also possible that the condition was supposed to name some class derived from Tensor. One way or another, the developer should take a look at this fragment.
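
If the type check really is redundant, the condition could be reduced to the part that still carries information. A minimal sketch, assuming no derived type was actually intended:

// Hypothetical simplification: array_ops.shape already returns a Tensor,
// so only the operation type needs checking.
var input_shape = array_ops.shape(x);
if (input_shape.op.type != "Const")
{
  fully_known = false;
  break;
}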

Thorough Check


public static Tensor[] _BaseFusedBatchNormGrad(....)
{
  ....
  if (data_format == "NCHW") // <=
    throw new NotImplementedException("");

  var results = grad_fun(new FusedBatchNormParams
  {
    YBackprop = grad_y,
    X = x,
    Scale = scale,
    ReserveSpace1 = pop_mean,
    ReserveSpace2 = pop_var,
    ReserveSpace3 = version == 2 ? op.outputs[5] : null,
    Epsilon = epsilon,
    DataFormat = data_format,
    IsTraining = is_training
  });

  var (dx, dscale, doffset) = (results[0], results[1], results[2]);
  if (data_format == "NCHW") // <=
    throw new NotImplementedException("");

  ....
}

Analyzer warnings:

  • V3021 There are two 'if' statements with identical conditional expressions. The first 'if' statement contains method return. This means that the second 'if' statement is senseless. nn_grad.cs 230
  • V3022 Expression 'data_format == "NCHW"' is always false. nn_grad.cs 247

Unlike some of the previous examples, there is clearly something wrong with this code. The second check makes no sense at all: if the condition is true, execution never reaches it. Perhaps there is a typo here, or one of the checks is simply redundant.
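
If both checks were meant to be the same guard, a single early check would express the intent more clearly. A sketch, assuming NCHW support simply is not implemented yet:

// Hypothetical consolidation: fail fast once, before any work is done.
if (data_format == "NCHW")
  throw new NotImplementedException("_BaseFusedBatchNormGrad for NCHW");
// ... the rest of the method then needs no second check ...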

The illusion of choice


public Tensor Activate(Tensor x, string name = null)
{
  ....
  Tensor negative_part;
  if (Math.Abs(_threshold) > 0.000001f)
  {
    negative_part = gen_ops.relu(-x + _threshold);
  } else
  {
    negative_part = gen_ops.relu(-x + _threshold);
  }
  ....
}

Analyzer warning: V3004 The 'then' statement is equivalent to the 'else' statement. gen_nn_ops.activations.cs 156

A rather amusing demonstration of how effective static analysis is during development. It is hard to come up with a reasonable explanation for why the developer wrote this particular code; most likely, it is a typical copy-paste error (although, of course, it might be another "for later").
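
For comparison, the Keras backend function that this activation seems to mirror computes relu(-x) without the threshold term in the else branch. So the intended code might have looked like the sketch below; this is only an assumption drawn from that analogy, not something the project itself confirms:

// Hypothetical fix, by analogy with the Keras backend implementation:
Tensor negative_part;
if (Math.Abs(_threshold) > 0.000001f)
{
  negative_part = gen_ops.relu(-x + _threshold);
}
else
{
  negative_part = gen_ops.relu(-x); // assumption: no threshold term here
}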

There are other fragments of a similar kind, for example:

private static Operation _GroupControlDeps(
  string dev, Operation[] deps, string name = null
)
{
  return tf_with(ops.control_dependencies(deps), ctl =>
  {
    if (dev == null)
    {
      return gen_control_flow_ops.no_op(name);
    }
    else
    {
      return gen_control_flow_ops.no_op(name);
    }
  });
}

Analyzer warning: V3004 The 'then' statement is equivalent to the 'else' statement. control_flow_ops.cs 135

Maybe the check made sense at some point and lost it over time, or perhaps further changes are planned. However, neither option seems like a sufficient reason to leave something like this in the code without explaining the oddity in any way. Most likely, this is a copy-paste error of exactly the same kind.
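
For what it's worth, in the Python TensorFlow sources the analogous _GroupControlDeps wraps the second no_op in a device scope, which hints at what the else branch may have been meant to do. A hedged sketch of that guess (the device-scope helper called ops.device here is an assumption; the actual TensorFlow.NET API may differ):

// Hypothetical intent, mirroring the Python implementation:
// pin the no_op to the given device when one is specified.
return tf_with(ops.control_dependencies(deps), ctl =>
{
  if (dev == null)
    return gen_control_flow_ops.no_op(name);

  // 'ops.device' is assumed by analogy with Python.
  return tf_with(ops.device(dev), _ => gen_control_flow_ops.no_op(name));
});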

Belated Check


public static Tensor[] Input(int[] batch_shape = null,
  TF_DataType dtype = TF_DataType.DtInvalid,
  string name = null,
  bool sparse = false,
  Tensor tensor = null)
{
  var batch_size = batch_shape[0];
  var shape = batch_shape.Skip(1).ToArray(); // <=

  InputLayer input_layer = null;
  if (batch_shape != null)                   // <=
    ....
  else
    ....

  ....
}

Analyzer warning: V3095 The 'batch_shape' object was used before it was verified against null. Check lines: 39, 42. keras.layers.cs 39

A classic and rather dangerous error: the potential use of a variable that is a null reference. The code clearly allows for batch_shape to be null; this follows both from the parameter list and from the subsequent null check of that very variable. So here the analyzer points to an outright error.
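
A minimal sketch of a safer ordering, assuming null really is a legal value for batch_shape: the parameter is dereferenced only inside the branch that has already checked it.

// Hypothetical reordering: touch batch_shape only after the null check.
InputLayer input_layer = null;
if (batch_shape != null)
{
  var batch_size = batch_shape[0];
  var shape = batch_shape.Skip(1).ToArray();
  ....
}
else
{
  ....
}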

Another "for later"?


public MnistDataSet(
  NDArray images, NDArray labels, Type dataType, bool reshape // <=
) 
{
  EpochsCompleted = 0;
  IndexInEpoch = 0;

  NumOfExamples = images.shape[0];

  images = images.reshape(
    images.shape[0], images.shape[1] * images.shape[2]
  );
  images = images.astype(dataType);
  // for debug np.multiply performance
  var sw = new Stopwatch();
  sw.Start();
  images = np.multiply(images, 1.0f / 255.0f);
  sw.Stop();
  Console.WriteLine($"{sw.ElapsedMilliseconds}ms");
  Data = images;

  labels = labels.astype(dataType);
  Labels = labels;
}

Analyzer warning: V3117 Constructor parameter 'reshape' is not used. MnistDataSet.cs 15

Like some of the other oddities, this is most likely due to the functionality being far from fully implemented; it is quite possible that the reshape parameter will be used in this constructor at some point. For now, though, it looks as if it is simply passed in for no reason. If it really is here "for later", it would be wise to mark that with a comment. If not, then the code constructing the object has to pass an extra argument to the constructor, and it may well be more convenient not to.

Elusive possible null dereference


public static Tensor[] _GatherV2Grad(Operation op, Tensor[] grads)
{
  ....
  if((int)axis_static == 0)
  {
    var params_tail_shape = params_shape.slice(new NumSharp.Slice(start:1));
    var values_shape = array_ops.concat(
      new[] { indices_size, params_tail_shape }, 0
    );
    var values = array_ops.reshape(grad, values_shape);
    indices = array_ops.reshape(indices, indices_size);
    return new Tensor[]
    {
      new IndexedSlices(values, indices, params_shape), // <=
      null,
      null
    };
  }
  ....
}

Analyzer warning: V3146 Possible null dereference of the 1st argument 'values' inside method. The '_outputs.FirstOrDefault()' can return default null value. array_grad.cs 199

To understand what the problem is, we should first look at the IndexedSlices constructor code:

public IndexedSlices(
  Tensor values, Tensor indices, Tensor dense_shape = null
)
{
  _values = values;
  _indices = indices;
  _dense_shape = dense_shape;

  _values.Tag = this; // <=
}

Obviously, passing null into this constructor leads to an exception. But why does the analyzer think that the values variable may contain null?

PVS-Studio uses the data-flow analysis technique, which lets it determine the sets of possible values of variables at different points in the code. The warning says that null can get into the specified variable from the line _outputs.FirstOrDefault(). However, looking at the code above, you can see that the values variable gets its value from the call array_ops.reshape(grad, values_shape). So where does _outputs.FirstOrDefault() come in?

The thing is that data-flow analysis considers not only the current function but also all the functions it calls; PVS-Studio gathers information about the set of possible values of any variable anywhere. So the warning means that the implementation of array_ops.reshape(grad, values_shape) contains a call to _outputs.FirstOrDefault(), whose result is ultimately returned.

To verify this, go to the reshape implementation:

public static Tensor reshape<T1, T2>(T1 tensor, T2 shape, string name = null)
            => gen_array_ops.reshape(tensor, shape, null);

Then go to the reshape method called inside it:

public static Tensor reshape<T1, T2>(T1 tensor, T2 shape, string name = null)
{
  var _op = _op_def_lib._apply_op_helper(
    "Reshape", name, new { tensor, shape }
  );
  return _op.output;
}

The _apply_op_helper function returns an Operation object, which has an output property. It is when this property's value is retrieved that the code mentioned in the warning is called:

public Tensor output => _outputs.FirstOrDefault();

Tensor is, of course, a reference type, so its default value is null. All this shows that PVS-Studio meticulously analyzes the logical structure of the code, diving deep into the chain of calls.

The analyzer has done its job by pointing to a potentially problematic place. It is up to the programmer to check whether a situation can actually arise in which _outputs contains no elements.
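
If it turns out that _outputs can in fact be empty, one option is to guard the call site explicitly rather than let the IndexedSlices constructor fail later on a null dereference. A sketch, assuming such a state is a programming error rather than a valid outcome:

// Hypothetical guard: fail with a clear message instead of a
// NullReferenceException inside the IndexedSlices constructor.
var values = array_ops.reshape(grad, values_shape);
if (values == null)
  throw new InvalidOperationException("reshape produced an operation with no outputs");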

Thus, static analysis will at least make the developer pay attention to the suspicious fragment and assess whether an error can actually occur there. With this approach, the number of errors that slip by unnoticed decreases significantly.

Unreliable waiting?


private (LoopVar<TItem>, Tensor[]) _BuildLoop<TItem>(
  ....
) where ....
{
  ....
  // Finds the closest enclosing non-None control pivot.
  var outer_context = _outer_context;
  object control_pivot = null;
  while (outer_context != null && control_pivot == null) // <=
  {

  }

  if (control_pivot != null)
  {

  }
  ....
}

Analyzer warning: V3032 Waiting on this expression is unreliable, as compiler may optimize some of the variables. Use volatile variable(s) or synchronization primitives to avoid this. WhileContext.cs 212

The analyzer points out that such a busy-wait implementation can be optimized away by the compiler. But I doubt that anyone was really trying to implement waiting here; most likely, the code is simply unfinished and will be completed later. It might be worth throwing a NotImplementedException here, given that this practice is used elsewhere in the project. Either way, an explanatory comment would clearly not hurt.
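
Following the convention already used elsewhere in the project, a stub like this would at least make the unfinished state explicit (the message text is invented):

// Hypothetical placeholder instead of the empty loop; control_pivot is
// always null at this point, so only outer_context needs checking.
var outer_context = _outer_context;
if (outer_context != null)
  throw new NotImplementedException("_BuildLoop: control pivot lookup");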

Going out of bounds


public TensorShape(int[][] dims)
{
  if(dims.Length == 1)
  {
    switch (dims[0].Length)
    {
      case 0: shape = new Shape(new int[0]); break;
      case 1: shape = Shape.Vector((int)dims[0][0]); break;
      case 2: shape = Shape.Matrix(dims[0][0], dims[1][2]); break; // <=
      default: shape = new Shape(dims[0]); break;
    }
  }
  else
  {
    throw new NotImplementedException("TensorShape int[][] dims");
  }
}

Analyzer warning: V3106 Possibly index is out of bound. The '1' index is pointing beyond 'dims' bound. TensorShape.cs 107

Among the strange code fragments, I also found a real error that is very hard to notice. The erroneous fragment is dims[1][2]. Getting the element at index 1 from an array of one element is obviously a mistake. At the same time, if you change the fragment to dims[0][2], another error appears: getting the element at index 2 from the dims[0] array, whose length in this branch is 2. So the problem turns out to have a double bottom, so to speak.

In any case, this fragment should be examined and fixed by the developer. In my opinion, this example is an excellent illustration of how data-flow analysis works in PVS-Studio.
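
Judging by the surrounding branches, the matrix case most likely should take both dimensions from dims[0]; a sketch of that assumption:

// Hypothetical fix: in this branch dims has exactly one element,
// and dims[0] has length 2, so both indices refer to dims[0].
case 2: shape = Shape.Matrix(dims[0][0], dims[0][1]); break;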

A typo?


private void _init_from_args(object initial_value = null, ....) // <=
{
  var init_from_fn = initial_value.GetType().Name == "Func`1"; // <=
  ....
  tf_with(...., scope =>
  {
    ....
    tf_with(...., delegate
    {
      initial_value = ops.convert_to_tensor(  // <=
        init_from_fn ? (initial_value as Func<Tensor>)():initial_value,
        name: "initial_value",
        dtype: dtype
      );
    });
    _shape = shape ?? (initial_value as Tensor).TensorShape;
    _initial_value = initial_value as Tensor; // <=
    ....
    _dtype = _initial_value.dtype.as_base_dtype(); // <=

    if (_in_graph_mode)
    {
      ....

      if (initial_value != null) // <=
      {
        ....
      }

      ....
    }

    ....
  });
}

To understand the code above, it is also worth looking at the implementation of the tf_with function:

[DebuggerStepThrough] // with "Just My Code" enabled this lets the 
[DebuggerNonUserCode()]  //debugger break at the origin of the exception
public static void tf_with<T>(
  T py, Action<T> action
) where T : ITensorFlowObject
{
  try
  {
    py.__enter__();
    action(py);
  }
  finally
  {
    py.__exit__();
    py.Dispose();
  }
}

Analyzer warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'initial_value', '_initial_value'. ResourceVariable.cs 137

_init_from_args is a fairly long function, so many fragments are omitted; you can see it in full by following the link. At first the warning did not seem particularly serious to me, but after some study I realized that something was definitely wrong with this code.

First, note that the method can be called without arguments, in which case initial_value defaults to null and an exception is thrown on the very first line. Second, the check of initial_value against null looks strange: if initial_value had really become null after the call to ops.convert_to_tensor, then _initial_value would be null as well, and the call _initial_value.dtype.as_base_dtype() would also throw an exception.

The analyzer hints that it may be _initial_value that needs the null check; but, as noted above, this variable is dereferenced before that check, so this option would be incorrect too.

Would this tiny mistake be noticed in such a giant function without PVS-Studio? I doubt it very much.
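
If null really is not a usable value for initial_value, the simplest option is a guard clause at the top of the method, which would also make the later null check redundant. A minimal sketch of that assumption:

// Hypothetical guard: make the contract explicit instead of failing
// later on initial_value.GetType().
private void _init_from_args(object initial_value = null, ....)
{
  if (initial_value == null)
    throw new ArgumentNullException(nameof(initial_value));

  var init_from_fn = initial_value.GetType().Name == "Func`1";
  ....
}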

Conclusion


A project containing many examples of strange code can hide a lot of problems. A programmer who gets used to seeing incomprehensible things also stops noticing errors, and the consequences can be very sad. Analyzer warnings do include false positives, but in most cases they at least point to code fragments that raise questions when a person looks at them. When strange code is written intentionally, it is worth leaving explanations so that the fragment is clear to whoever works with the code later (even if that means leaving comments for yourself).

At the same time, static analysis tools such as PVS-Studio can be of great help in finding potential errors and oddities, so that they stay visible and are not forgotten, and all temporary solutions are eventually refined and turned into clean, well-structured and stable code.


