Go: JSON deserialization with incorrect typing, or how to work around API developer errors


Recently I was writing an HTTP client in Go for a service that exposes a REST API with JSON as the encoding format. It is a routine task, but along the way I ran into a non-standard problem. Here is what happened.

As you know, JSON has data types: four primitives (string, number, boolean, null) and two structural types (object and array). Here we are interested in the primitive types. Below is an example of JSON with four fields of different types:

{
	"name":"qwerty",
	"price":258.25,
	"active":true,
	"description":null
}

As the example shows, a string value is enclosed in double quotes, while a number is written without them. A boolean can take only one of two values, true or false (without quotes), and the null type has the single value null (also without quotes).
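To make the mapping concrete, here is a minimal sketch of how encoding/json sees the four primitive types when it decodes into an empty interface. The helper name goTypeOf is hypothetical and exists only for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// goTypeOf (hypothetical helper) decodes one JSON value into an empty
// interface and returns the name of the Go type encoding/json picked for it.
func goTypeOf(raw string) string {
	var v interface{}
	if err := json.Unmarshal([]byte(raw), &v); err != nil {
		return "error: " + err.Error()
	}
	return fmt.Sprintf("%T", v)
}

func main() {
	// One sample per JSON primitive type.
	for _, raw := range []string{`"qwerty"`, `258.25`, `true`, `null`} {
		fmt.Println(raw, "->", goTypeOf(raw))
	}
}
```

A JSON string becomes a Go string, a number becomes float64, a boolean becomes bool, and null becomes a nil interface value.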

Now for the problem itself. While examining the JSON returned by a third-party service, I noticed that one of the fields (let's call it price) sometimes holds a string (the number in quotation marks) rather than a number. That is, the same query with different parameters may return the value as a number or as a string containing that number. I can only guess how the code on the other end is organized, but the likely reason is that the service is an aggregator that pulls data from different sources, and its developers did not normalize the JSON in the server response to a single format. Either way, we have to work with what we are given.

Then I was even more surprised. The boolean field (let's call it active) returned, in addition to true and false, the string values "true" and "false", and even the numbers 1 and 0 (meaning true and false, respectively).

This confusion about data types would not be critical if I were processing the JSON in, say, weakly typed PHP, but Go is strongly typed and requires the type of every deserialized field to be stated explicitly. So I needed a mechanism that, during deserialization, converts any value of the active field to a boolean and any value of the price field to a number.

Let's start with the price field.

Suppose we have json code like this:

[
	{"id":1,"price":2.58},
	{"id":2,"price":7.15}
]

That is, the JSON contains an array of objects, each with two numeric fields. The standard deserialization code for this JSON in Go looks like this:

package main

import (
	"encoding/json"
	"fmt"
)

// Target mirrors one object of the JSON array.
type Target struct {
	Id    int     `json:"id"`
	Price float64 `json:"price"`
}

func main() {
	jsonString := `[{"id":1,"price":2.58},
					{"id":2,"price":7.15}]`

	targets := []Target{}

	// Decode the whole array into the slice of Target values.
	err := json.Unmarshal([]byte(jsonString), &targets)
	if err != nil {
		fmt.Println(err)
		return
	}

	for _, t := range targets {
		fmt.Println(t.Id, "-", t.Price)
	}
}

In this code, we will deserialize the id field to int and the price field to float64. Now suppose our json code looks like this:

[
	{"id":1,"price":2.58},
	{"id":2,"price":"2.58"},
	{"id":3,"price":7.15},
	{"id":4,"price":"7.15"}
]

That is, the price field contains both numeric and string values. In this case only the numeric values of price can be decoded into float64, while the string values cause a type mismatch error. This means that neither float64 nor any other primitive type is suitable for this field, and we need a custom type with its own deserialization logic.
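The failure is easy to reproduce. The sketch below decodes mixed input into a struct with a plain float64 price and checks whether the returned error is a *json.UnmarshalTypeError. The names PlainTarget, decodePlain and isTypeMismatch are hypothetical and used only here; the exact error text depends on the Go version, so it is printed rather than quoted.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// PlainTarget (hypothetical name) keeps price as a plain float64.
type PlainTarget struct {
	Id    int     `json:"id"`
	Price float64 `json:"price"`
}

// decodePlain decodes a JSON array into []PlainTarget and returns the error, if any.
func decodePlain(raw string) error {
	var targets []PlainTarget
	return json.Unmarshal([]byte(raw), &targets)
}

// isTypeMismatch reports whether err is a JSON type mismatch error.
func isTypeMismatch(err error) bool {
	var typeErr *json.UnmarshalTypeError
	return errors.As(err, &typeErr)
}

func main() {
	err := decodePlain(`[{"id":1,"price":2.58},{"id":2,"price":"2.58"}]`)
	fmt.Println("type mismatch:", isTypeMismatch(err))
	fmt.Println(err)
}
```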

As such a type, declare a CustomFloat64 structure with a single Float64 field of type float64.

type CustomFloat64 struct{
	Float64 float64
}

And immediately indicate this type for the Price field in the Target structure:

type Target struct {
	Id    int           `json:"id"`
	Price CustomFloat64 `json:"price"`
}

Now you need to describe your own logic for decoding a field of type CustomFloat64.

The encoding/json package defines two interfaces, json.Marshaler and json.Unmarshaler, whose methods MarshalJSON and UnmarshalJSON let a user-defined type customize how it is encoded and decoded. It is enough to implement these methods on your type.

We implement the UnmarshalJSON method for our CustomFloat64 type. The signature of the method must be followed exactly; otherwise the type does not satisfy the interface, the method is simply never called, and, most importantly, the compiler will not complain.

func (cf *CustomFloat64) UnmarshalJSON(data []byte) error {

As input, this method takes a byte slice (data) containing the value of one particular field of the JSON being decoded. If we convert these bytes to a string, we see the value exactly as it is written in the JSON. That is, for a string value we see the text with its double quotes ("258"), and for a numeric value we see the text without quotes (258).

To distinguish a numeric value from a string value, we check whether the first character is a double quote. In Unicode the double quote has code point 34 (0x22), and in UTF-8 it is encoded as a single byte with that same value, so it is enough to compare the first byte of the data slice with 34. Note that in general a character is not equivalent to a byte, since it may occupy more than one byte; in Go a character corresponds to a rune. In our case this condition is sufficient:
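To see the raw bytes for yourself, here is a small sketch with a throwaway type that only records what it receives. RawEcho and rawOf are hypothetical names introduced for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RawEcho (hypothetical helper type) stores the bytes passed to UnmarshalJSON.
type RawEcho struct {
	Raw string
}

// UnmarshalJSON does no parsing; it just keeps a copy of the raw field value.
func (r *RawEcho) UnmarshalJSON(data []byte) error {
	r.Raw = string(data)
	return nil
}

// rawOf decodes the "price" field of doc and returns the bytes
// that UnmarshalJSON received for it.
func rawOf(doc string) string {
	var t struct {
		Price RawEcho `json:"price"`
	}
	if err := json.Unmarshal([]byte(doc), &t); err != nil {
		return "error: " + err.Error()
	}
	return t.Price.Raw
}

func main() {
	fmt.Println(rawOf(`{"price":258}`))   // the number arrives without quotes
	fmt.Println(rawOf(`{"price":"258"}`)) // the string arrives with its quotes
}
```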

if data[0] == 34 {

If the condition holds, the value is a string, and we need the text between the quotes, i.e. the byte slice between the first and last bytes. This slice contains a numeric value that can be decoded into the primitive type float64, so we can apply json.Unmarshal to it and store the result in the Float64 field of the CustomFloat64 structure.

err := json.Unmarshal(data[1:len(data)-1], &cf.Float64)

If the data slice does not start with a quotation mark, then it already contains a numerical data type, and we can apply the json.Unmarshal method directly to the entire data slice.

err := json.Unmarshal(data, &cf.Float64)

Here is the complete code for the UnmarshalJSON method:

func (cf *CustomFloat64) UnmarshalJSON(data []byte) error {
	if data[0] == 34 {
		err := json.Unmarshal(data[1:len(data)-1], &cf.Float64)
		if err != nil {
			return errors.New("CustomFloat64: UnmarshalJSON: " + err.Error())
		}
	} else {
		err := json.Unmarshal(data, &cf.Float64)
		if err != nil {
			return errors.New("CustomFloat64: UnmarshalJSON: " + err.Error())
		}
	}
	return nil
}

As a result, when json.Unmarshal is applied to our JSON, all values of the price field are transparently converted to float64, and the result is written to the Float64 field of the CustomFloat64 structure.

We may also need to convert the Target structure back to JSON. However, if json.Marshal is applied to the CustomFloat64 type as is, the structure is serialized as an object, whereas we need the price field to be encoded as a number. To customize the encoding of CustomFloat64, we implement the MarshalJSON method for it, again strictly observing the signature:
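Because a method with a slightly wrong signature is silently ignored, it can help to let the compiler verify that the type really satisfies json.Unmarshaler. The sketch below is one possible way to do that, not part of the original code: it adds a compile-time assertion, uses a condensed variant of the method with an extra length check, and introduces a hypothetical helper decodePrice.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type CustomFloat64 struct {
	Float64 float64
}

// UnmarshalJSON is a condensed variant of the method described above.
func (cf *CustomFloat64) UnmarshalJSON(data []byte) error {
	// Strip the surrounding quotes when the value arrived as a JSON string.
	if len(data) >= 2 && data[0] == '"' {
		data = data[1 : len(data)-1]
	}
	return json.Unmarshal(data, &cf.Float64)
}

// Compile-time check: the build fails here if *CustomFloat64 stops
// satisfying json.Unmarshaler, for example after a typo in the signature.
var _ json.Unmarshaler = (*CustomFloat64)(nil)

// decodePrice (hypothetical helper) decodes a single JSON value.
func decodePrice(raw string) (float64, error) {
	var cf CustomFloat64
	err := json.Unmarshal([]byte(raw), &cf)
	return cf.Float64, err
}

func main() {
	for _, raw := range []string{`2.58`, `"2.58"`} {
		price, err := decodePrice(raw)
		fmt.Println(price, err)
	}
}
```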

func (cf CustomFloat64) MarshalJSON() ([]byte, error) {
	return json.Marshal(cf.Float64)
}

All this method has to do is call json.Marshal again, this time not on the CustomFloat64 structure but on its Float64 field. The method returns the resulting byte slice and the error.

Here is the complete code that prints the results of deserialization and serialization (error checking in main is omitted for brevity, and the byte value of the double quote is moved into a constant):
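The difference is easy to see side by side. In this sketch, PlainFloat and NumericFloat are hypothetical stand-ins for the wrapper type without and with a MarshalJSON method, and encode is a small helper written for the illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PlainFloat (hypothetical) is a wrapper without a MarshalJSON method.
type PlainFloat struct {
	Float64 float64
}

// NumericFloat (hypothetical) has the same layout but implements json.Marshaler.
type NumericFloat struct {
	Float64 float64
}

// MarshalJSON encodes only the inner float64, so the wrapper disappears from the output.
func (nf NumericFloat) MarshalJSON() ([]byte, error) {
	return json.Marshal(nf.Float64)
}

// encode marshals any value and returns the JSON text, or the error text on failure.
func encode(v interface{}) string {
	out, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(out)
}

func main() {
	fmt.Println(encode(PlainFloat{Float64: 2.58}))   // {"Float64":2.58}
	fmt.Println(encode(NumericFloat{Float64: 2.58})) // 2.58
}
```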

package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

type CustomFloat64 struct {
	Float64 float64
}

const QUOTES_BYTE = 34

func (cf *CustomFloat64) UnmarshalJSON(data []byte) error {
	if data[0] == QUOTES_BYTE {
		err := json.Unmarshal(data[1:len(data)-1], &cf.Float64)
		if err != nil {
			return errors.New("CustomFloat64: UnmarshalJSON: " + err.Error())
		}
	} else {
		err := json.Unmarshal(data, &cf.Float64)
		if err != nil {
			return errors.New("CustomFloat64: UnmarshalJSON: " + err.Error())
		}
	}
	return nil
}

func (cf CustomFloat64) MarshalJSON() ([]byte, error) {
	return json.Marshal(cf.Float64)
}

type Target struct {
	Id    int           `json:"id"`
	Price CustomFloat64 `json:"price"`
}

func main() {
	jsonString := `[{"id":1,"price":2.58},
					{"id":2,"price":"2.58"},
					{"id":3,"price":7.15},
					{"id":4,"price":"7.15"}]`

	targets := []Target{}

	_ = json.Unmarshal([]byte(jsonString), &targets)

	for _, t := range targets {
		fmt.Println(t.Id, "-", t.Price.Float64)
	}

	jsonStringNew, _ := json.Marshal(targets)
	fmt.Println(string(jsonStringNew))
}

Code Execution Result:

1 - 2.58
2 - 2.58
3 - 7.15
4 - 7.15
[{"id":1,"price":2.58},{"id":2,"price":2.58},{"id":3,"price":7.15},{"id":4,"price":7.15}]

Let's move on to the second part and implement the same approach for deserializing JSON with inconsistent values of a boolean field.

Suppose we have json code like this:

[
	{"id":1,"active":true},
	{"id":2,"active":"true"},
	{"id":3,"active":"1"},
	{"id":4,"active":1},
	{"id":5,"active":false},
	{"id":6,"active":"false"},
	{"id":7,"active":"0"},
	{"id":8,"active":0},
	{"id":9,"active":""}
]

Here the active field is meant to be boolean, with only two possible values: true and false. Non-boolean values must be converted to boolean during deserialization.

In this example we accept the following correspondences. The true value corresponds to: true (boolean), "true" (string), "1" (string), 1 (number). The false value corresponds to: false (boolean), "false" (string), "0" (string), 0 (number), "" (empty string).

First, we’ll declare the target structure for deserialization. As the type of the Active field, we immediately specify the custom type CustomBool:

type Target struct {
	Id     int        `json:"id"`
	Active CustomBool `json:"active"`
}

CustomBool is a structure with a single Bool field of type bool:

type CustomBool struct {
	Bool bool
}

We implement the UnmarshalJSON method for this structure. I’ll give you the code right away:

func (cb *CustomBool) UnmarshalJSON(data []byte) error {
	switch string(data) {
	case `"true"`, `true`, `"1"`, `1`:
		cb.Bool = true
		return nil
	case `"false"`, `false`, `"0"`, `0`, `""`:
		cb.Bool = false
		return nil
	default:
		return errors.New("CustomBool: parsing \"" + string(data) + "\": unknown value")
	}
}

Since the active field has a limited set of possible values, a switch-case construct is enough to decide what the Bool field of the CustomBool structure should be. Only two case blocks are needed: the first lists the values that mean true, the second the values that mean false.

When writing out the possible values, note the role of the backtick (the grave accent, found on the key to the left of 1 on a US keyboard). In Go, backticks delimit a raw string literal, inside which double quotes do not need to be escaped. For clarity, I wrapped both the quoted and the unquoted values in backticks. Thus `false` is the text false (without quotes, a boolean in JSON), and `"false"` is the text "false" (with quotes, a string in JSON). The same goes for `1` and `"1"`: the first is the number 1 (written in JSON without quotes), the second is the string "1" (written in JSON with quotes). The literal `""` is an empty string, i.e. in JSON it looks like this: "".
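As a quick check of how these literals behave, the following sketch compares raw string literals with their escaped equivalents. The constant names are hypothetical and exist only for this example.

```go
package main

import "fmt"

const (
	rawQuoted     = `"false"`   // raw literal: the quotes are part of the text
	escapedQuoted = "\"false\"" // interpreted literal: the quotes must be escaped
	rawBare       = `false`     // no quotes inside: the JSON boolean spelling
	rawEmpty      = `""`        // two quote characters: the JSON empty string
)

func main() {
	fmt.Println(rawQuoted == escapedQuoted) // true: both spell the same 7 bytes
	fmt.Println(rawBare == "false")         // true
	fmt.Println(len(rawQuoted), len(rawBare), len(rawEmpty)) // 7 5 2
}
```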

The corresponding value (true or false) is written directly to the Bool field of the CustomBool structure:

cb.Bool = true

In the default block, we return an error stating that the field has an unknown value:

return errors.New("CustomBool: parsing \"" + string(data) + "\": unknown value")

Now we can apply json.Unmarshal to our JSON, and the values of the active field will be converted to the primitive type bool.
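The switch lists every accepted spelling explicitly. As an alternative, and purely as a sketch rather than the approach used in this article, the quotes can be stripped and the rest handed to strconv.ParseBool. Be aware that this widens the accepted set: ParseBool also accepts spellings such as "t", "T", "TRUE" and "False". The names LooseBool and parseLoose are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// LooseBool (hypothetical) accepts everything strconv.ParseBool accepts,
// with or without surrounding quotes, and treats the empty string as false.
type LooseBool struct {
	Bool bool
}

func (lb *LooseBool) UnmarshalJSON(data []byte) error {
	// Remove surrounding double quotes, if any.
	s := strings.Trim(string(data), `"`)
	if s == "" {
		lb.Bool = false
		return nil
	}
	b, err := strconv.ParseBool(s)
	if err != nil {
		return fmt.Errorf("LooseBool: parsing %s: unknown value", data)
	}
	lb.Bool = b
	return nil
}

// parseLoose (hypothetical helper) decodes a single JSON value.
func parseLoose(raw string) (bool, error) {
	var lb LooseBool
	err := json.Unmarshal([]byte(raw), &lb)
	return lb.Bool, err
}

func main() {
	for _, raw := range []string{`true`, `"1"`, `0`, `""`, `"yes"`} {
		b, err := parseLoose(raw)
		fmt.Println(raw, "->", b, err)
	}
}
```

Like the switch version, this sketch rejects null and any other unexpected value with an error.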

We implement the MarshalJSON method for the CustomBool structure:

func (cb CustomBool) MarshalJSON() ([]byte, error) {
	return json.Marshal(cb.Bool)
}

Nothing new here. The method serializes the Bool field of the CustomBool structure.

Here is the complete code that displays the results of serialization and deserialization (error checking omitted for brevity):

package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

type CustomBool struct {
	Bool bool
}

func (cb *CustomBool) UnmarshalJSON(data []byte) error {
	switch string(data) {
	case `"true"`, `true`, `"1"`, `1`:
		cb.Bool = true
		return nil
	case `"false"`, `false`, `"0"`, `0`, `""`:
		cb.Bool = false
		return nil
	default:
		return errors.New("CustomBool: parsing \"" + string(data) + "\": unknown value")
	}
}

func (cb CustomBool) MarshalJSON() ([]byte, error) {
	return json.Marshal(cb.Bool)
}

type Target struct {
	Id     int        `json:"id"`
	Active CustomBool `json:"active"`
}

func main() {
	jsonString := `[{"id":1,"active":true},
					{"id":2,"active":"true"},
					{"id":3,"active":"1"},
					{"id":4,"active":1},
					{"id":5,"active":false},
					{"id":6,"active":"false"},
					{"id":7,"active":"0"},
					{"id":8,"active":0},
					{"id":9,"active":""}]`

	targets := []Target{}

	_ = json.Unmarshal([]byte(jsonString), &targets)

	for _, t := range targets {
		fmt.Println(t.Id, "-", t.Active.Bool)
	}

	jsonStringNew, _ := json.Marshal(targets)
	fmt.Println(string(jsonStringNew))
}

Code Execution Result:

1 - true
2 - true
3 - true
4 - true
5 - false
6 - false
7 - false
8 - false
9 - false
[{"id":1,"active":true},{"id":2,"active":true},{"id":3,"active":true},{"id":4,"active":true},{"id":5,"active":false},{"id":6,"active":false},{"id":7,"active":false},{"id":8,"active":false},{"id":9,"active":false}]

Conclusions

First, implementing the MarshalJSON and UnmarshalJSON methods on your own data types lets you customize the serialization and deserialization of a specific JSON field. Besides the use cases shown here, these methods are also used to handle nullable fields.

Second, JSON is a widely used text format for exchanging information, and one of its advantages over other formats is that it has data types. Adherence to these types must be strictly enforced.
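The nullable-field case mentioned in the first conclusion can be handled with the same pair of methods. The sketch below is only an illustration under my own assumptions: the names NullableFloat64, Valid, Item and roundTrip are hypothetical, and the Valid flag is one possible way to remember whether the value was null.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NullableFloat64 (hypothetical) distinguishes a JSON null from a real number.
type NullableFloat64 struct {
	Float64 float64
	Valid   bool // false when the JSON value was null
}

func (nf *NullableFloat64) UnmarshalJSON(data []byte) error {
	// encoding/json calls UnmarshalJSON for null values as well.
	if string(data) == "null" {
		nf.Float64, nf.Valid = 0, false
		return nil
	}
	if err := json.Unmarshal(data, &nf.Float64); err != nil {
		return fmt.Errorf("NullableFloat64: UnmarshalJSON: %w", err)
	}
	nf.Valid = true
	return nil
}

func (nf NullableFloat64) MarshalJSON() ([]byte, error) {
	if !nf.Valid {
		return []byte("null"), nil
	}
	return json.Marshal(nf.Float64)
}

// Item (hypothetical) is a container with one nullable field.
type Item struct {
	Price NullableFloat64 `json:"price"`
}

// roundTrip decodes raw into an Item and encodes it back to JSON text.
func roundTrip(raw string) string {
	var it Item
	if err := json.Unmarshal([]byte(raw), &it); err != nil {
		return "error: " + err.Error()
	}
	out, err := json.Marshal(it)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(out)
}

func main() {
	fmt.Println(roundTrip(`{"price":null}`)) // {"price":null}
	fmt.Println(roundTrip(`{"price":2.58}`)) // {"price":2.58}
}
```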
