Maurice Kalinowski

Serialization in and with Qt

Published Thursday May 31st, 2018
Posted in Automation, Dev Loop, Embedded, Internet of Things

In our first part of this series, we looked at how to set up messages, combine them, and reduce their overhead in the context of telemetry sensors.

This part focuses on the payload of messages and how to optimize them.

There are multiple ways to serialize an object with Qt. In part one, we used JSON. For this, all sensor information is stored in a QJsonObject, and a QJsonDocument takes care of streaming the values into a QByteArray.

QJsonObject jobject;
jobject["SensorID"] = m_id;
jobject["AmbientTemperature"] = m_ambientTemperature;
jobject["ObjectTemperature"] = m_objectTemperature;
jobject["AccelerometerX"] = m_accelerometerX;
jobject["AccelerometerY"] = m_accelerometerY;
jobject["AccelerometerZ"] = m_accelerometerZ;
jobject["Altitude"] = m_altitude;
jobject["Light"] = m_light;
jobject["Humidity"] = m_humidity;

QJsonDocument doc( jobject );

return doc.toJson();

JSON has several advantages:

  • Textual JSON is declarative, which makes it readable to humans
  • The information is structured
  • Exchanging generic information is easy
  • JSON allows extending messages with additional values
  • Many options exist for receiving and parsing JSON in cloud-based solutions

However, there are some limitations to this approach. First, creating a JSON message can be a heavy operation that takes many cycles. The benchmark in part 2 of our examples repository shows that serializing and de-serializing 10,000 messages takes around 263 ms. That might not sound like much per message, but in this context time equals energy, and energy consumption can significantly impact a sensor that is designed to run for years without being charged.
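
For reference, the de-serialization direction is simply the reverse of the code above. A minimal sketch, assuming payload is the received QByteArray and omitting error handling:

QJsonDocument doc = QJsonDocument::fromJson(payload);
QJsonObject jobject = doc.object();
// Read the values back using the same keys as in the serialization code.
m_id = jobject["SensorID"].toString();
m_ambientTemperature = jobject["AmbientTemperature"].toDouble();
// ... remaining values follow the same pattern ...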

Another aspect is that the payload of an MQTT message for a single sensor update is 346 bytes. Given that the sensor sends just eight doubles and one capped string, that is a considerable overhead.

In the comments of my previous post, using QJsonDocument::Compact was recommended, which reduces the payload size to 290 bytes on average.
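
In code, this is a one-argument change to the last line of the snippet above:

return doc.toJson(QJsonDocument::Compact);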

So, how can we improve on this?

Remember that I was referring to textual JSON before? As most of you know, there is also binary JSON, which reduces readability, but all the other aspects still apply. Most importantly, our benchmarks show that simply switching from doc.toJson() to doc.toBinaryData() doubles the speed of the test, reducing the benchmark run to 125 ms.
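
Again, only the last line of the serialization code changes:

return doc.toBinaryData();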

Checking on the payload, the message size is now 338 bytes; the difference is almost negligible. However, this might change in other scenarios, for instance if a message contains more strings.

Depending on the requirements and whether third-party solutions can be added to the project, other options are available.

If the project resides “within the Qt world” and the whole flow of data is fixed and not about to change, QDataStream is a viable option.

Adding support for this to the SensorInformation class requires two additional operators:

QDataStream &operator<<(QDataStream &, const SensorInformation &);
QDataStream &operator>>(QDataStream &, SensorInformation &);

The implementation is straightforward as well. Below it is shown for the serialization:

QDataStream &operator<<(QDataStream &out, const SensorInformation &item)
{
    QDataStream::FloatingPointPrecision prev = out.floatingPointPrecision();
    out.setFloatingPointPrecision(QDataStream::DoublePrecision);
    out << item.m_id
        << item.m_ambientTemperature
        << item.m_objectTemperature
        << item.m_accelerometerX
        << item.m_accelerometerY
        << item.m_accelerometerZ
        << item.m_altitude
        << item.m_light
        << item.m_humidity;
    out.setFloatingPointPrecision(prev);
    return out;
}
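
The matching operator>> reads the values back in the same order. A minimal sketch mirroring the fields and the floating-point precision handling above:

QDataStream &operator>>(QDataStream &in, SensorInformation &item)
{
    // Mirror the writer: the values were written with double precision.
    QDataStream::FloatingPointPrecision prev = in.floatingPointPrecision();
    in.setFloatingPointPrecision(QDataStream::DoublePrecision);
    in >> item.m_id
       >> item.m_ambientTemperature
       >> item.m_objectTemperature
       >> item.m_accelerometerX
       >> item.m_accelerometerY
       >> item.m_accelerometerZ
       >> item.m_altitude
       >> item.m_light
       >> item.m_humidity;
    in.setFloatingPointPrecision(prev);
    return in;
}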

 

Consulting the benchmarks, using QDataStream results in only 26 ms for this test case, which is close to 10 times faster than textual JSON. Furthermore, the average message size is only 84 bytes, compared to 290 for compact JSON. Hence, if the above limitations are acceptable, QDataStream is certainly a viable option.

If the project lets you add in further third-party components, one of the most prominent serialization solutions is Google’s Protocol Buffers (protobuf).

To add protobuf to our solution, a couple of changes need to be made. First, protobuf uses an IDL to describe the structures of data or messages. The SensorInformation design looks like this:

syntax = "proto2";

package serialtest;

message Sensor {
    required string id = 1;
    required double ambientTemperature = 2;
    required double objectTemperature = 3;
    required double accelerometerX = 4;
    required double accelerometerY = 5;
    required double accelerometerZ = 6;
    required double altitude = 7;
    required double light = 8;
    required double humidity = 9;
}

To add protobuf’s code generator (protoc) to a qmake project, you must add an extra compiler step similar to this:

PROTO_FILE = sensor.proto
protoc.output = $${OUT_PWD}/${QMAKE_FILE_IN_BASE}.pb.cc
protoc.commands = $${PROTO_PATH}/bin/protoc -I=$$relative_path($${PWD}, $${OUT_PWD}) --cpp_out=. ${QMAKE_FILE_NAME}
protoc.variable_out = GENERATED_SOURCES
protoc.input = PROTO_FILE
QMAKE_EXTRA_COMPILERS += protoc
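
Depending on how protobuf is installed, the project also needs to find the protobuf headers and link against the library. A minimal sketch, assuming the same PROTO_PATH variable used above points at the protobuf installation prefix:

# Headers and library of the protobuf installation (paths are an assumption).
INCLUDEPATH += $${PROTO_PATH}/include
LIBS += -L$${PROTO_PATH}/lib -lprotobuf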

Next, to have a comparable benchmark in terms of object size, the generated struct is used as a member of a SensorInformationProto class, which inherits QObject, just like in the QDataStream and JSON examples.

class SensorInformationProto : public QObject
{
    Q_OBJECT
    Q_PROPERTY(double ambientTemperature READ ambientTemperature WRITE setAmbientTemperature NOTIFY ambientTemperatureChanged)
[...]

public:
    SensorInformationProto(const std::string &pr);
[...]

     std::string serialize() const;
 [...]

private:
    serialtest::Sensor m_protoInfo;
};
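
The property getters and setters simply forward to the generated struct. A sketch for one pair, assuming protoc's lower-cased accessor names and a NOTIFY signal that carries the new value:

double SensorInformationProto::ambientTemperature() const
{
    // protoc generates lower-cased accessors for each message field.
    return m_protoInfo.ambienttemperature();
}

void SensorInformationProto::setAmbientTemperature(double ambientTemperature)
{
    m_protoInfo.set_ambienttemperature(ambientTemperature);
    emit ambientTemperatureChanged(ambientTemperature);
}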

The serialization function of m_protoInfo is generated by protoc, so creating the payload to be transmitted looks like this:

std::string SensorInformationProto::serialize() const
{
    std::string res;
    m_protoInfo.SerializeToString(&res);
    return res;
}
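
The receiving side goes through the generated ParseFromString(). A minimal sketch, using a hypothetical deserialize() member as the counterpart:

bool SensorInformationProto::deserialize(const std::string &data)
{
    // ParseFromString() is generated by protoc and returns false on malformed input.
    return m_protoInfo.ParseFromString(data);
}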

 

Note that, compared to the previous solutions, protobuf uses std::string. This means you lose the capabilities of QString, unless the string is stored as a byte array (manual conversion is required). Then again, that conversion will slow down the whole process due to parsing.

From a performance perspective, the benchmark results look promising. The 10,000-item benchmark takes only 5 ms, with an average message size of 82 bytes.

As a summary, the following table compares the various approaches:

Approach        Payload size (bytes)    Time (ms)
JSON (text)     346                     263
JSON (binary)   338                     125
QDataStream     84                      26
Protobuf        82                      5

 

One promising alternative is CBOR, which is currently being implemented by Thiago Macieira for Qt 5.12. However, as that development is still in progress, it was too early to include it in this post. From discussions on our mailing list, the results look promising though, with a significant performance advantage over JSON while keeping all of its benefits.
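
Just to illustrate the direction, a rough sketch of the same message using the QCborMap API that is being worked on (the API was still in development when this was written, so names may change):

QCborMap map;
map["SensorID"] = m_id;
map["AmbientTemperature"] = m_ambientTemperature;
// ... remaining values, as in the JSON example above ...
return map.toCborValue().toCbor();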

We have seen various approaches to serialize data into the payload of an MQTT message. This can be done purely within Qt or with external solutions such as protobuf, and integrating external solutions with Qt is easy.

As a final disclaimer, I would like to highlight that these benchmarks are all based on the scenario of the sensor demo. The number of data values per message is fairly small. If the structures are bigger, the results might differ and a different approach might give better results.

In our next installment, we will be looking at message integration with DDS. For an overview of all the articles in our automation mini-series, please check out Lars’ post.



17 comments

Carl says:

Traveled the same path about 2-3 years ago, with the same results. The only issue is that going from Protocol Buffers 2 -> 3 brought some pretty big philosophical changes. To be compatible with more languages, PBs now define and set all values, as opposed to leaving them undefined and therefore not stored (larger message size).

The other nice thing about PBs, though, that is not discussed is the language support. So from one IDL I get readers/writers for C++/Java/Python … there are other languages supported, but those are the ones I use.

Nice article.

Giancarlo says:

Nice post!
Just one question: does the toBinaryJson() method really exist? In which version? Or did you mean toBinaryData()?
The docs say that “The binary representation is also the native format used internally in Qt” but: is it exactly BSON?

Thanks,
keep up the good work

Thiago Macieira says:

No, it is not. It’s an internal format that you should consider deprecated right now. It is limited to 128 MB size, so when we raise or lift the limit, it’s going to change.

Instead, I recommend saving as CBOR. See http://doc-snapshots.qt.io/qt5-dev/qcborvalue.html (link doesn’t work *yet*, but will soon).

frank says:

Your code snippets contain some markup ()

Maurice Kalinowski says:

Thanks for the pointer, updated.

jason says:

YAS! When this was asked on the mailing list, I was the only one that suggested Protobuf, and I’m glad someone listened. Thank you for validating my suggestion. 🙂

I recently implemented a solution to take all the QSensorReading classes and serialize them. I have no idea why this is so hard. (It should be baked in from the start) I am not suggesting this is the way it should be done, but 40kb/1300 lines of explicit coding later it works: https://gitlab.com/snippets/1719202

One problem is I either had to override all the existing reading classes, or make one reading class that worked for every sensor. The problem is, I don’t know what sensor to read next from the file, so I had to write a type out, at which time I chose to have one class to write the various reading types. I’d rather have called a function to dynamically register a new type with the datastream, (I won’t know my id) then have the stream request serialization, or I provide a list of Q_PROPERTY names at registration time. I then provide a constructor of Class::Class(const QVariantMap &properties) (because not all properties may need to be serialized). This has several advantages, not the least of which is I don’t need to mess around with type-id consistency. qDataStreamRegisterType(“MyType”, QStringList {“prop1”, “prop2”, “versionId”} );

In other related aspects:
Recently, I wanted a dead-simple application database. 3 Requirements:
1. Be able to write the database, containing anything within QVariant capability. ( dataStream <> database )
3. Be able to modify the database in memory. ( database[key] = value )

A QVariantMap is a good choice. I can trivially serialize this, but there is an issue with updating values because there is no T& QMap::operator[] or value() which requires back-patching the modified container into and up the hierarchy because what you get is a copy that you modify. (Maps within maps). So close yet again.

Finally, with the announcement of Optane DDR4 memory, we have a paradigm shift (I don’t use that term lightly) with how data is stored. Relational databases are based around minimizing writes, and to a lesser extent minimizing data duplication. If we can get a persistent composite object memory going, there are huge advances possible in computing and Qt. Applications no longer need to “load” a database. You get a memory offset to the root of the database and go from there. Loads are QVariantMap *database = optane_object_offset(“my db name”); Saves happen when you change the value, and are therefore a no-op. (But we’re missing the function to return a non-const reference!) The only difference with non-optane save will have to read and save a file.

Flavio says:

I guess toBinaryJson() should be toBinaryData()

Maurice Kalinowski says:

You are perfectly right, I will update the post. Also, check the source at
https://github.com/mauricek/qt_iot_blog_samples/blob/master/part2/serialization/sensorinformation.cpp#L188

Miroslav Krajicek says:

I would like to see a protobuf vs. MessagePack comparison. Recently, we have used MQTT with MessagePack for autonomous car simulation. JSON/BSON was not acceptable because base64 is used for binary data – it was just adding latency for sensors which send large amounts of binary data per message.

Thiago Macieira says:

Hi Maurice

Nice blog post. I can confirm the trend of your results, even if the numbers are actually off. Your benchmark code was benchmarking the QVector used to store data, which is a bit unfair.

The biggest problem is that you’re not comparing apples to apples. Your QDataStream example is not extensible, like all the other protocols are. That is, if one of the two endpoints is running a different version of the software and has, for example, an extra sensor information, the sender and receiver would be out of sync (I don’t know anything about Protobuf, so I can’t say whether it has the same issue). After I “fix” it by wrapping the data in a QMap, the average message size goes up to 169 bytes.

Here are my numbers:

* QDataStream: 118.9 ms @ 169 bytes
* JSON (text): 218.8 ms @ 290.12 bytes
* JSON (binary): 82.62 ms @ 298.832 bytes
* CBOR (no transformation): 62.4 ms @ 91 bytes
* CBOR (using qfloat16): 63.06 ms @ 89.65 bytes
* CBOR (using integers and qfloat16): 65.87 ms @ 89.28 bytes
* Protobuf: 8.81 ms @ 82 bytes

Another aspect of apples-to-apples is that QJsonDocument and QCborValue are DOM objects, but you are using them only as serialisation. By analysing the benchmark, we see that there’s a lot of time spent in creating the QJsonObject as well as QCborMap. So I wrote an extra benchmark that uses only the low-level stream reader/writer. The results were:

* CBOR streaming: 41.93 ms @ 92 bytes

Maurice Kalinowski says:

Hey Thiago,

You are perfectly right with everything you wrote. I also hope that I made it clear, that the QDataStream usage in this post is only valid if:
– All endpoints are using Qt (and also the very same version as you highlighted)
– Only basic types are used
The item on extensibility is yet another criterion which can become very important if you consider long lifetime of a system including upgrades.

There are so many items one needs to consider during the design phase of a product, and with this blog series we aim to highlight a couple of discussions you need to have to get a reliable and performant product. Qt can help in many situations, and sometimes it’s beneficial to incorporate other technologies with Qt. The usage of protobuf is also just one possibility and does not come out as the holy grail of serialization solutions. However, each product is different and each new project might have other outcomes due to its requirements. That might concern protocols, serialization efforts, but also transport layers etc.

Jean-Michaël Celerier says:

By looking at your benchmarks… why should I use CBOR and not protobuf directly? It sounds like it’s almost ten times faster. And it has a lot more industry backing than CBOR, as well as community backing:

https://github.com/search?utf8=%E2%9C%93&q=cbor

https://github.com/search?utf8=%E2%9C%93&q=protobuf

Thiago Macieira says:

I think you’re still not comparing apples to apples. Even the faster QCborStreamWriter and Reader are Qt classes, whereas PB isn’t. So there’s a chance that the performance loss is caused by the Qt wrappers themselves.

There are good reasons to believe this. First, we know QIODevice is slow. Using it to write to a QBuffer means it will malloc() at least once, possibly more than once.

Second, I have anecdotal evidence of *Google* choosing TinyCBOR over Protobuf for a W3C protocol and citing performance as one of the reasons. See
https://www.w3.org/2018/05/17-webscreens-minutes.html
https://www.w3.org/2018/05/18-webscreens-minutes.html

And aside from the performance choice, you may want to use CBOR especially because it is an IETF-backed protocol with an RFC, whereas PB isn’t. CBOR has multiple implementations, from tiny constrained devices to bigger ones. This is important in the case Maurice is talking about: IoT. Some of the devices creating the data or consuming small payloads could be constrained ones, running an RTOS. I have found one constrained implementation for PB (https://github.com/nanopb/nanopb) but investigating whether it suffices for the uses in question is left as an exercise to the reader.

Andrew says:

This might be a silly question:

If protobuf is faster than serializing into JSON/BSON and even the upcoming CBOR features, why not just add native Qt support for protobuf?

For example, classes that can turn a QObject or QGadget into protobuf-compatible objects.

If protobuf is the best solution to this problem, why not fully embrace it instead of adding support for a lesser alternative?

Andrew says:

Actually, thinking about this even more….

Qt already has a mechanism for serializing objects and sending them across the wire:
https://doc.qt.io/qt-5.11/qtremoteobjects-gettingstarted.html

Last time I played with this, I remember thinking how similar the concepts were to protobuf.

You define a model or object, write a definition file, tell the compiler about it so it can auto-generate boilerplate serialization code, beep boop Qt magic, everything just works.

There was even a way to have the definition file automatically generated based on the object’s Q_PROPERTYs.

Again, wouldn’t it just make more sense to fully adopt and support protobuf natively?

Maurice Kalinowski says:

Hi,

you bring up some nice ideas. Unfortunately, they will not work for every scenario. Like a QDataStream approach, Qt RemoteObjects only works if your solution is done with Qt on all ends and no interoperability with other technologies or languages is required.
I cannot state that protobuf is “the best solution”, as there are more use-cases than the one described in this post. Thiago pointed out some other scenarios as well.
Furthermore, protobuf is not based on an open standard. That is a crucial argument for many industrial companies, to not rely on one single provider.
And lastly, Thiago also mentioned the point about extensibility. With protobuf you define your datatype and get a very effective and performant serialization. But when you need to update your data design, you need to deploy this updated design to all devices/endpoints. Not so with, for instance, CBOR, which has a built-in mechanism for extensibility. You would not receive any updated values, but neither would you need to do case comparisons on receipt of each data set.

So, in this specific case I would vote for using protobuf (or potentially other solutions, which I haven’t checked yet). But you have to be very careful to create a general statement out of one example.

