String concatenation with QStringBuilder

QString and QByteArray come with a very handy operator+ that lets you write code like this:


QString directory = /*...*/, name = /*...*/;
QString dataFile = directory + QLatin1Char('/') + name + QLatin1String(".dat");

Very convenient.
QLatin1Char and QLatin1String are only there for correctness; you could omit them while writing your application.

This is very convenient, but what about the performance of that kind of expression?
Each operator+ creates a temporary string that is then discarded, which means many allocations and copies.
It would be much faster to write something like this:


QString dataFile = directory;
dataFile.reserve(directory.size() + 1 + name.size() + 4);
dataFile += QLatin1Char('/');
dataFile += name;
dataFile += QLatin1String(".dat");

Only one allocation and one copy: this is the optimum. Unfortunately, it does not look as good.
What if the first expression could be as fast as the above? The good news is that it is possible.

In Qt 4.6 we introduced a kind of hidden class in Qt: QStringBuilder.
It was improved in Qt 4.8 by adding support for QByteArray as well.

Because it is source incompatible (see below), you need to enable it explicitly.
The way to enable it with Qt 4.7 is described in the 4.7 QString documentation.
That method is now deprecated: in Qt 4.8, the old macro has been replaced by the new QT_USE_QSTRINGBUILDER macro. You need to use the new macro to benefit from the QByteArray changes.

To make this work, we use a technique called expression templates.
We changed some of the operator+ overloads that take strings so that they return a special template class that computes the result lazily.

For instance, with QT_USE_QSTRINGBUILDER defined,
string1 + string2 is of type QStringBuilder<QString, QString>, which is implicitly convertible to QString.
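
To make the idea concrete without depending on Qt, here is a minimal sketch of an expression template. It is not Qt's code: Text and LazyConcat are hypothetical stand-ins for QString and QStringBuilder<QString, QString>, built on std::string, and the sketch only handles two operands.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for QString, so that the sketch does not need Qt.
struct Text {
    std::string value;
};

// Hypothetical stand-in for QStringBuilder<QString, QString>:
// it only remembers its operands and does no work yet.
struct LazyConcat {
    const Text &a;
    const Text &b;

    // The concatenation happens here, when a Text is actually requested.
    operator Text() const { return Text{a.value + b.value}; }
};

// operator+ returns the lazy object instead of a finished Text.
inline LazyConcat operator+(const Text &a, const Text &b)
{
    return LazyConcat{a, b};
}

static_assert(std::is_same<decltype(std::declval<Text>() + std::declval<Text>()),
                           LazyConcat>::value,
              "a + b is a LazyConcat, not a Text");

// The implicit conversion keeps the lazy type invisible in ordinary code.
inline Text join(const Text &a, const Text &b)
{
    Text result = a + b;   // the LazyConcat -> Text conversion runs here
    assert(result.value.size() == a.value.size() + b.value.size());
    return result;
}
```

Code such as join() never names the intermediate type, which is why most existing code keeps compiling once the operator changes its return type.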

It is not source compatible because you might have code that assumes that the result of operator+ is of type QString:


QVariant v = someString + someOtherString;
QString s = (someString + someOtherString).toUpper();

The solution is to explicitly cast to QString:


QVariant v = QString(someString + someOtherString);
QString s = QString(someString + someOtherString).toUpper();
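
The QVariant case can be reproduced in plain C++: an implicit conversion may use at most one user-defined conversion, so a builder that converts to a string cannot be converted once more to a third type in the same step. The sketch below assumes hypothetical stand-ins (Text for QString, Variant for QVariant, Builder for QStringBuilder<QString, QString>); it illustrates the language rule and is not Qt's code.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

struct Text { std::string value; };               // hypothetical stand-in for QString

struct Variant {                                  // hypothetical stand-in for QVariant
    Variant(const Text &t) : stored(t.value) {}   // implicit converting constructor
    std::string stored;
};

struct Builder {                                  // hypothetical stand-in for the builder type
    const Text &a;
    const Text &b;
    operator Text() const { return Text{a.value + b.value}; }
};

inline Builder operator+(const Text &a, const Text &b) { return Builder{a, b}; }

// Builder -> Text is a single user-defined conversion: allowed.
static_assert(std::is_convertible<Builder, Text>::value,
              "Builder converts implicitly to Text");
// Builder -> Text -> Variant would chain two of them: rejected.
static_assert(!std::is_convertible<Builder, Variant>::value,
              "Builder does not convert implicitly to Variant");

inline Variant toVariant(const Text &a, const Text &b)
{
    // Variant v = a + b;      // would not compile: needs two user-defined conversions
    Variant v = Text(a + b);   // the explicit first step leaves only one implicit conversion
    assert(v.stored == a.value + b.value);
    return v;
}
```

The toUpper() line fails for a simpler reason: the member function belongs to the string class, not to the builder, so the conversion has to happen before the call.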

QT_USE_QSTRINGBUILDER is already enabled when compiling Qt itself and Qt Creator.
Some of the commits that fix the source compatibility problems are 5d3eb7a1, for the previous version that did not yet support QByteArray, and 7101a3fa, which was required to add QByteArray support in Qt 4.8.

Technical details

I think the implementation shows many nice template features, so I thought it would be fun to explain some of the implementation details of this class in this article. This part is highly technical, and you absolutely do not need to understand it to use the class.

Everything is in qstringbuilder.h, but the snippets pasted in this article may be slightly simplified to make them easier to understand.

Let's start by looking at the implementation of the operator+:


template <class A, class B>
QStringBuilder<typename QConcatenable<A>::type, typename QConcatenable<B>::type>
operator+(const A &a, const B &b)
{
   return QStringBuilder<typename QConcatenable<A>::type,
                         typename QConcatenable<B>::type>(a, b);
}

This operator uses SFINAE to enable itself only for types that support concatenation to strings. Indeed, QConcatenable is an internal template class that is specialized only for: QString, QLatin1String, QChar, QStringRef, QCharRef, and also QByteArray and char*.
QConcatenable<T>::type is a typedef to the type T only for the specialized types.
Since, for example, QConcatenable<QVariant>::type does not exist, that operator+ is not enabled if used with QVariant.
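
The SFINAE mechanism can be tried in isolation. In the sketch below, Text, Other, Concatenable, Builder and CanAdd are hypothetical names (Text plays the role of a supported type such as QString, Other the role of an unsupported one such as QVariant); CanAdd is only a helper to observe whether a + b compiles and has no counterpart in the article.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

struct Text { std::string value; };   // hypothetical supported type
struct Other {};                      // hypothetical unsupported type

// Primary template: deliberately has no nested `type`.
template <typename T> struct Concatenable {};
// Only supported types get the `type` typedef.
template <> struct Concatenable<Text> { typedef Text type; };

template <typename A, typename B>
struct Builder { const A &a; const B &b; };

// Naming Concatenable<A>::type in the return type silently removes this
// overload (instead of producing an error) when A or B is not specialized.
template <typename A, typename B>
Builder<typename Concatenable<A>::type, typename Concatenable<B>::type>
operator+(const A &a, const B &b)
{
    return Builder<typename Concatenable<A>::type,
                   typename Concatenable<B>::type>{a, b};
}

// Detection helper: true when the expression `A + B` compiles.
template <typename A, typename B, typename = void>
struct CanAdd : std::false_type {};
template <typename A, typename B>
struct CanAdd<A, B, decltype(void(std::declval<const A &>() + std::declval<const B &>()))>
    : std::true_type {};

static_assert(CanAdd<Text, Text>::value, "supported types can be added");
static_assert(!CanAdd<Text, Other>::value, "the operator is disabled for Other");
static_assert(CanAdd<int, int>::value, "the built-in + is unaffected");

inline void example()
{
    Text a{"x"}, b{"y"};
    Builder<Text, Text> lazy = a + b;   // OK: both operands are supported
    assert(&lazy.a == &a && &lazy.b == &b);
    // Text{} + Other{};                // would not compile: no matching operator+
}
```

Because the overload simply disappears for unsupported types, a very generic operator+(const A &, const B &) can coexist with every other operator+ in a program.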

The operator+(a,b) simply returns QStringBuilder<A, B>(a, b);.
The result of something like string1 + string2 + string3 is of type QStringBuilder< QStringBuilder<QString, QString>, QString>.

Now we can have a look at the QStringBuilder class:


template <typename A, typename B>
class QStringBuilder
{
public:
    const A &a;
    const B &b;

    QStringBuilder(const A &a_, const B &b_) : a(a_), b(b_) {}

    template <typename T> T convertTo() const;

    typedef typename QConcatenable<QStringBuilder<A, B> >::ConvertTo ConvertTo;
    operator ConvertTo() const { return convertTo<ConvertTo>(); }
};

The ConvertTo typedef resolves to QByteArray or QString depending on the types A and B; we will see later how this is done. So the QStringBuilder class just keeps references to its operands.

When QStringBuilder is implicitly converted to QString or QByteArray, the convertTo() function is called:


template <typename A, typename B> template<typename T>
inline T QStringBuilder<A, B>::convertTo() const
{
    const uint len = QConcatenable< QStringBuilder<A, B> >::size(*this);
    T s(len, Qt::Uninitialized);
    typename T::iterator d = s.data();
    QConcatenable< QStringBuilder<A, B> >::appendTo(*this, d);
    return s;
}

That function creates an uninitialized QString or QByteArray of the appropriate size, and the individual characters are then copied into it.
The actual copying is delegated to QConcatenable< QStringBuilder<A, B> >::appendTo.
The partial template specialization of QConcatenable for QStringBuilder<A, B> is the one that combines the results of the individual pieces. If there are several operator+ in the same expression, A is itself another QStringBuilder type.


template <class A, class B>
struct QConcatenable< QStringBuilder<A, B> >
{
    typedef QStringBuilder<A, B> type;
    typedef typename QtStringBuilder::ConvertToTypeHelper<
        typename QConcatenable<A>::ConvertTo,
        typename QConcatenable<B>::ConvertTo>::ConvertTo ConvertTo;
    static int size(const type &p)
    {
        return QConcatenable<A>::size(p.a)
            + QConcatenable<B>::size(p.b);
    }
    template<typename T> static inline void appendTo(
        const type &p, T *&out)
    {
        QConcatenable<A>::appendTo(p.a, out);
        QConcatenable<B>::appendTo(p.b, out);
    }
};

The QConcatenable::appendTo function is responsible for copying the string to the final buffer.

For example, here is what QConcatenable looks like for QString:


template <> struct QConcatenable<QString>
{
    typedef QString type;
    typedef QString ConvertTo;
    static int size(const QString &a) { return a.size(); }
    static inline void appendTo(const QString &a, QChar *&out)
    {
        const int n = a.size();
        memcpy(out, reinterpret_cast<const char*>(a.constData()),
            sizeof(QChar) * n);
        out += n;
    }
};
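
Putting the pieces together, the two passes (sum the sizes, then let each piece append itself at a moving cursor) can be sketched end to end without Qt. Text, Concatenable and Builder are hypothetical stand-ins built on std::string; unlike Qt::Uninitialized, std::string::resize() zero-fills the buffer, so this only illustrates the structure, not the exact cost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>
#include <utility>

struct Text { std::string value; };            // hypothetical stand-in for QString

template <typename T> struct Concatenable {};  // primary: unsupported types have no `type`

template <typename A, typename B>
struct Builder {                               // hypothetical stand-in for QStringBuilder<A, B>
    const A &a;
    const B &b;
    operator Text() const;                     // defined below
};

// Leaf: knows its size and copies itself at the cursor, then advances it.
template <> struct Concatenable<Text> {
    typedef Text type;
    static std::size_t size(const Text &t) { return t.value.size(); }
    static void appendTo(const Text &t, char *&out)
    {
        std::memcpy(out, t.value.data(), t.value.size());
        out += t.value.size();
    }
};

// Composite: sums the sizes and appends left operand, then right operand.
template <typename A, typename B> struct Concatenable<Builder<A, B> > {
    typedef Builder<A, B> type;
    static std::size_t size(const type &p)
    { return Concatenable<A>::size(p.a) + Concatenable<B>::size(p.b); }
    static void appendTo(const type &p, char *&out)
    {
        Concatenable<A>::appendTo(p.a, out);
        Concatenable<B>::appendTo(p.b, out);
    }
};

template <typename A, typename B>
Builder<A, B>::operator Text() const
{
    typedef Concatenable<Builder<A, B> > Self;
    Text result;
    result.value.resize(Self::size(*this));    // pass 1: size the buffer once
    char *out = &result.value[0];
    Self::appendTo(*this, out);                // pass 2: every piece copies itself
    assert(out == &result.value[0] + result.value.size());
    return result;
}

template <typename A, typename B>
Builder<typename Concatenable<A>::type, typename Concatenable<B>::type>
operator+(const A &a, const B &b)
{
    return Builder<typename Concatenable<A>::type,
                   typename Concatenable<B>::type>{a, b};
}

// Chained operators nest on the left, as described earlier in the article.
static_assert(std::is_same<
                  decltype(std::declval<Text>() + std::declval<Text>() + std::declval<Text>()),
                  Builder<Builder<Text, Text>, Text> >::value,
              "a + b + c nests builders on the left");

inline Text joinThree(const Text &a, const Text &b, const Text &c)
{
    return a + b + c;   // converted within the same full expression
}
```

The conversion happens inside the same expression that created the nested builders, so the references they hold are still valid when the characters are copied.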

How do we know if we need to convert to QString or to QByteArray? Let us try to understand how the ConvertTo type is determined:


namespace QtStringBuilder {
    template <typename C, typename D> struct ConvertToTypeHelper
    { typedef C ConvertTo; };
    template <typename T> struct ConvertToTypeHelper<T, QString>
    { typedef QString ConvertTo; };
}

ConvertToTypeHelper is used to compute QConcatenable< QStringBuilder<A, B> >::ConvertTo. It is a template computation: it can be seen as a function that takes two type arguments (C and D) and returns another type in its typedef ConvertToTypeHelper::ConvertTo.
By default, ConvertTo is the first type. But if the second type is QString, the partial template specialization is used, and QString is "returned".
In practice, that means that if either of the types is QString, QString is returned.
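
The four possible combinations can be checked at compile time. In this sketch, Wide and Narrow are hypothetical empty types standing in for QString and QByteArray, and the helper is redeclared at global scope so that the snippet compiles without Qt.

```cpp
#include <type_traits>

struct Wide {};     // hypothetical stand-in for QString
struct Narrow {};   // hypothetical stand-in for QByteArray

// Default: "return" the first type.
template <typename C, typename D> struct ConvertToTypeHelper
{ typedef C ConvertTo; };
// If the second type is Wide, "return" Wide whatever the first type is.
template <typename T> struct ConvertToTypeHelper<T, Wide>
{ typedef Wide ConvertTo; };

static_assert(std::is_same<ConvertToTypeHelper<Narrow, Narrow>::ConvertTo, Narrow>::value,
              "two 8-bit operands stay 8-bit");
static_assert(std::is_same<ConvertToTypeHelper<Wide, Narrow>::ConvertTo, Wide>::value,
              "Wide first: the default picks it");
static_assert(std::is_same<ConvertToTypeHelper<Narrow, Wide>::ConvertTo, Wide>::value,
              "Wide second: the specialization picks it");
static_assert(std::is_same<ConvertToTypeHelper<Wide, Wide>::ConvertTo, Wide>::value,
              "two Wide operands give Wide");
```

Only the Narrow/Narrow combination yields the 8-bit type, which matches the rule that a single QString operand is enough to make the whole expression a QString.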

The specializations of QConcatenable for the Unicode-aware types (QString, QLatin1String, QChar, ...) have QString as ConvertTo, while the 8-bit character based types have QByteArray as their ConvertTo typedef.

Now let us see the specialization for QByteArray:


template <> struct QConcatenable<QByteArray> : private QAbstractConcatenable
{
    typedef QByteArray type;
    typedef QByteArray ConvertTo;
    static int size(const QByteArray &ba) { return ba.size(); }
#ifndef QT_NO_CAST_FROM_ASCII
    static inline void appendTo(const QByteArray &ba, QChar *&out)
    {
        QAbstractConcatenable::convertFromAscii(ba.constData(),
                                                ba.size(), out);
    }
#endif
    static inline void appendTo(const QByteArray &ba, char *&out)
    {
        const char *a = ba.constData();
        const char * const end = ba.end();
        while (a != end)
            *out++ = *a++;
    }
};

This is the same as for QString, but Qt lets you implicitly convert a QByteArray to a QString, which is why there is an overload that converts from ASCII to Unicode. It can be disabled by defining QT_NO_CAST_FROM_ASCII. It is good practice in library code to use only explicit conversions (via QLatin1String), as you do not know which codec the application developer is going to use for their code.
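
The overload selection on the type of the output cursor can be sketched outside Qt as well. NarrowPiece, widen and SKETCH_NO_CAST_FROM_ASCII are hypothetical names for this illustration; the 16-bit overload simply widens each byte, which is only right for ASCII input and is an assumption of the sketch, not a description of what QAbstractConcatenable::convertFromAscii does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for QConcatenable<QByteArray>: one source type,
// two appendTo overloads chosen by the type of the output cursor.
struct NarrowPiece {
    static std::size_t size(const std::string &s) { return s.size(); }

#ifndef SKETCH_NO_CAST_FROM_ASCII   // hypothetical macro mirroring QT_NO_CAST_FROM_ASCII
    // 16-bit destination: widen each byte (assumes ASCII input).
    static void appendTo(const std::string &s, char16_t *&out)
    {
        for (std::size_t i = 0; i < s.size(); ++i)
            *out++ = static_cast<char16_t>(static_cast<unsigned char>(s[i]));
    }
#endif

    // 8-bit destination: plain byte copy.
    static void appendTo(const std::string &s, char *&out)
    {
        for (std::size_t i = 0; i < s.size(); ++i)
            *out++ = s[i];
    }
};

// The destination buffer decides which overload is used.
// With SKETCH_NO_CAST_FROM_ASCII defined, this function would no longer compile.
inline std::u16string widen(const std::string &s)
{
    std::u16string result(NarrowPiece::size(s), u'\0');
    char16_t *out = &result[0];
    NarrowPiece::appendTo(s, out);   // picks the char16_t overload
    assert(out == &result[0] + result.size());
    return result;
}
```

Removing the 16-bit overload with the macro turns an accidental 8-bit to Unicode conversion into a compile error, which is the effect described above for QT_NO_CAST_FROM_ASCII.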

Conclusion

I skipped some of the details, such as the support for codecs like UTF-8, where the converted string might have a different size (look for ExactSize in the code).

I hope you liked this description.
Let us know in the comments if there are other parts of Qt you would like to see explained.

(By the way, if you have heard of QLatin1Literal, don't bother using it. Compilers have a built-in strlen that is evaluated at compile time for string literals.)

