QStringIterator Class
The QStringIterator class provides a Unicode-aware iterator over QString.
Header: #include <QStringIterator>
Since: Qt 5.3
Module: QtCore (group: tools)
Detailed Description
Note: This class is internal.
Note: All functions in this class are reentrant.
QStringIterator is a Java-like, bidirectional, const iterator over the contents of a QString. Unlike QString's own iterators, which operate on individual UTF-16 code units, QStringIterator is Unicode-aware: it transparently handles any surrogate pairs present in the string and returns individual Unicode code points.
You can create a QStringIterator that iterates over a given QStringView by passing the string to the QStringIterator's constructor:
QString string(QStringLiteral("a string"));
QStringIterator i(string); // implicitly converted to QStringView
A newly created QStringIterator points before the first code point in the string. You can check whether the iterator can be advanced by calling hasNext(), and advance it (obtaining the next code point) by calling next():
while (i.hasNext()) {
    char32_t c = i.next(); // one full code point per iteration
    // ... use c ...
}
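QStringIterator itself requires Qt, so what follows is only a standalone sketch of the same hasNext()/next() loop shape in standard C++17. `CodePointIterator` and `countCodePoints` are hypothetical names, and the decoding uses the standard UTF-16 algorithm rather than Qt's source. It illustrates that such a loop visits code points, not code units.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical, simplified stand-in for QStringIterator's forward API,
// written against std::u16string_view so it compiles without Qt.
class CodePointIterator {
public:
    explicit CodePointIterator(std::u16string_view s) : str_(s), pos_(0) {}

    // True while at least one UTF-16 code unit remains after the cursor.
    bool hasNext() const { return pos_ < str_.size(); }

    // Decodes the code point after the cursor and moves past it.
    // An unpaired surrogate yields U+FFFD (REPLACEMENT CHARACTER).
    char32_t next() {
        assert(hasNext());
        const char16_t uc = str_[pos_++];
        if (uc >= 0xD800 && uc <= 0xDBFF) {              // high surrogate
            if (pos_ < str_.size() && str_[pos_] >= 0xDC00 && str_[pos_] <= 0xDFFF) {
                const char16_t low = str_[pos_++];       // consume the low half too
                return 0x10000 + ((char32_t(uc) - 0xD800) << 10) + (char32_t(low) - 0xDC00);
            }
            return 0xFFFD;                               // high surrogate without a low half
        }
        if (uc >= 0xDC00 && uc <= 0xDFFF)                // low surrogate without a high half
            return 0xFFFD;
        return uc;                                       // BMP code point: one code unit
    }

private:
    std::u16string_view str_;
    std::size_t pos_;
};

// Counts code points using the same while (hasNext()) { next(); } loop.
std::size_t countCodePoints(std::u16string_view s) {
    CodePointIterator i(s);
    std::size_t n = 0;
    while (i.hasNext()) {
        i.next();
        ++n;
    }
    return n;
}
```

For example, `countCodePoints(u"\U0001D11E is the G clef")` returns 15, while the same string holds 16 UTF-16 code units, because U+1D11E is stored as a surrogate pair.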
Similarly, the hasPrevious() and previous() functions can be used to iterate backwards.
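Backward iteration has to recognize a surrogate pair from its second half. The standalone sketch below (hypothetical free functions, not Qt's implementation) treats `pos` as a code-unit index with the cursor sitting just before `str[pos]`; when it reads a low surrogate it looks one unit further back for the matching high surrogate.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical sketch of hasPrevious()/previous() semantics over UTF-16.
bool hasPrevious(std::size_t pos) { return pos > 0; }

// Decodes the code point before the cursor and moves the cursor back past it.
char32_t previous(std::u16string_view str, std::size_t &pos) {
    assert(hasPrevious(pos));
    const char16_t uc = str[--pos];
    if (uc >= 0xDC00 && uc <= 0xDFFF) {                  // low surrogate: look back for its high half
        if (pos > 0 && str[pos - 1] >= 0xD800 && str[pos - 1] <= 0xDBFF) {
            const char16_t high = str[--pos];
            return 0x10000 + ((char32_t(high) - 0xD800) << 10) + (char32_t(uc) - 0xDC00);
        }
        return 0xFFFD;                                   // low surrogate without a high half
    }
    if (uc >= 0xD800 && uc <= 0xDBFF)                    // high surrogate reached directly: unpaired
        return 0xFFFD;
    return uc;
}

// Collects code points from the end of the string back to the start.
std::vector<char32_t> codePointsReversed(std::u16string_view str) {
    std::vector<char32_t> out;
    std::size_t pos = str.size();                        // start after the last code unit
    while (hasPrevious(pos))
        out.push_back(previous(str, pos));
    return out;
}
```

Iterating `u"a\U0001D11Eb"` backwards this way yields `b`, U+1D11E, `a`: three code points from four code units.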
The peekNext() and peekPrevious() functions return the code point after and before the iterator's current position, respectively, but unlike next() and previous() they do not move the iterator. Similarly, the advance() and recede() functions move the iterator forward and backward past one code point, respectively, but do not return the code point they moved past.
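To make the difference between peeking, stepping, and advancing concrete, here is a standalone sketch. `Cursor` and its `index()` helper are hypothetical; the member names only mirror QStringIterator's forward functions, and the backward trio (peekPrevious(), previous(), recede()) would be the mirror image.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical cursor illustrating peekNext() vs. next() vs. advance();
// a standalone sketch, not Qt code.
class Cursor {
public:
    explicit Cursor(std::u16string_view s) : str_(s) {}

    bool hasNext() const { return pos_ < str_.size(); }
    std::size_t index() const { return pos_; }           // current code-unit offset

    // Returns the next code point without moving the cursor (decodes a copy of pos_).
    char32_t peekNext() const {
        std::size_t p = pos_;
        return decodeAt(p);
    }
    // Returns the next code point and moves past it.
    char32_t next() { return decodeAt(pos_); }
    // Moves past the next code point, discarding its value.
    void advance() { decodeAt(pos_); }

private:
    // Decodes the code point starting at p and advances p by 1 or 2 code units.
    char32_t decodeAt(std::size_t &p) const {
        assert(p < str_.size());
        const char16_t uc = str_[p++];
        if (uc >= 0xD800 && uc <= 0xDBFF && p < str_.size()
            && str_[p] >= 0xDC00 && str_[p] <= 0xDFFF) {
            const char16_t low = str_[p++];
            return 0x10000 + ((char32_t(uc) - 0xD800) << 10) + (char32_t(low) - 0xDC00);
        }
        if (uc >= 0xD800 && uc <= 0xDFFF)
            return 0xFFFD;                               // unpaired surrogate
        return uc;
    }

    std::u16string_view str_;
    std::size_t pos_ = 0;
};
```

With `Cursor c(u"\U0001D11E!")`, `c.peekNext()` returns U+1D11E and leaves `c.index()` at 0; a following `c.advance()` moves the offset to 2 without returning anything.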
Unicode Handling
QString and all of its functions work in terms of UTF-16 code units. Unicode code points that fall outside the Basic Multilingual Plane (U+10000 to U+10FFFF) will therefore be represented by surrogate pairs in a QString, that is, a sequence of two UTF-16 code units that encode a single code point.
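The surrogate-pair mapping itself is fixed by the Unicode standard's UTF-16 encoding form (it is not specific to Qt). For a code point $c$ with $\mathtt{0x10000} \le c \le \mathtt{0x10FFFF}$:

$$
\begin{aligned}
c' &= c - \mathtt{0x10000}, \qquad 0 \le c' \le \mathtt{0xFFFFF} \\
\text{high} &= \mathtt{0xD800} + \lfloor c' / \mathtt{0x400} \rfloor \\
\text{low} &= \mathtt{0xDC00} + (c' \bmod \mathtt{0x400}) \\
c &= \mathtt{0x10000} + (\text{high} - \mathtt{0xD800}) \cdot \mathtt{0x400} + (\text{low} - \mathtt{0xDC00})
\end{aligned}
$$

For U+1D11E (MUSICAL SYMBOL G CLEF), $c' = \mathtt{0xD11E} = \mathtt{0x34} \cdot \mathtt{0x400} + \mathtt{0x11E}$, so the pair is $\text{high} = \mathtt{0xD834}$, $\text{low} = \mathtt{0xDD1E}$.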
QStringIterator automatically handles surrogate pairs inside a QString: it returns the correctly decoded code point and moves the iterator by as many code units as that code point occupies (two for a surrogate pair, one otherwise).
For instance:
QStringIterator i(u"𝄞 is the G clef");
qDebug() << Qt::hex << i.next(); // will print '𝄞' (U+1D11E, MUSICAL SYMBOL G CLEF)
qDebug() << Qt::hex << i.next(); // will print ' ' (U+0020, SPACE)
qDebug() << Qt::hex << i.next(); // will print 'i' (U+0069, LATIN SMALL LETTER I)
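The same walk can be reproduced without Qt to see how many code units each step consumes. In this sketch `decodeNext`, `decodeWithWidths`, and `hexCodePoints` are hypothetical helpers; `hexCodePoints` formats values with `std::hex` and makes no claim about qDebug()'s exact output. Note that `char32_t` is cast to an integer before streaming, since C++20 deletes `operator<<` for `char32_t` on narrow streams.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper: decodes the code point at pos and advances pos past
// the 1 or 2 UTF-16 code units it occupies (unpaired surrogates -> U+FFFD).
char32_t decodeNext(std::u16string_view s, std::size_t &pos) {
    assert(pos < s.size());
    const char16_t uc = s[pos++];
    if (uc >= 0xD800 && uc <= 0xDBFF && pos < s.size()
        && s[pos] >= 0xDC00 && s[pos] <= 0xDFFF) {
        const char16_t low = s[pos++];
        return 0x10000 + ((char32_t(uc) - 0xD800) << 10) + (char32_t(low) - 0xDC00);
    }
    return (uc >= 0xD800 && uc <= 0xDFFF) ? char32_t(0xFFFD) : char32_t(uc);
}

// Pairs each decoded code point with the number of code units it consumed.
std::vector<std::pair<char32_t, std::size_t>> decodeWithWidths(std::u16string_view s) {
    std::vector<std::pair<char32_t, std::size_t>> out;
    std::size_t pos = 0;
    while (pos < s.size()) {
        const std::size_t start = pos;
        const char32_t cp = decodeNext(s, pos);
        out.emplace_back(cp, pos - start);
    }
    return out;
}

// Formats the decoded code points as space-separated lowercase hex.
std::string hexCodePoints(std::u16string_view s) {
    std::ostringstream out;
    out << std::hex;
    bool first = true;
    for (const auto &entry : decodeWithWidths(s)) {
        if (!first)
            out << ' ';
        out << static_cast<std::uint32_t>(entry.first);  // never stream char32_t directly
        first = false;
    }
    return out.str();
}
```

For `u"\U0001D11E is"`, the widths are 2, 1, 1, 1 and `hexCodePoints` returns `"1d11e 20 69 73"`.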
If the iterator cannot decode the next code point (or the previous one, when iterating backwards), it returns 0xFFFD, Unicode's replacement character (see QChar::ReplacementCharacter). You can make QStringIterator return a different value when it encounters a decoding problem; refer to each function's documentation for details.
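One way to model "return another value on a decoding problem" is a defaulted parameter. The sketch below is standalone and hypothetical: `nextOr` and its `invalidAs` parameter are illustrative names, so check each QStringIterator function's documentation for the real signature. On malformed input it still advances by one code unit, so a loop cannot get stuck.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

constexpr char32_t kReplacementCharacter = 0xFFFD;   // U+FFFD REPLACEMENT CHARACTER

// Hypothetical checked decoder: returns invalidAs for an unpaired high or low
// surrogate, and always advances pos by at least one code unit.
char32_t nextOr(std::u16string_view s, std::size_t &pos,
                char32_t invalidAs = kReplacementCharacter) {
    assert(pos < s.size());
    const char16_t uc = s[pos++];
    const bool isHigh = uc >= 0xD800 && uc <= 0xDBFF;
    const bool isLow = uc >= 0xDC00 && uc <= 0xDFFF;
    if (isHigh && pos < s.size() && s[pos] >= 0xDC00 && s[pos] <= 0xDFFF) {
        const char16_t low = s[pos++];
        return 0x10000 + ((char32_t(uc) - 0xD800) << 10) + (char32_t(low) - 0xDC00);
    }
    if (isHigh || isLow)
        return invalidAs;                             // malformed: caller-chosen value
    return uc;
}
```

On the deliberately broken sequence `{0xD834, u'x', 0xDD1E}` (a G-clef pair with `x` wedged between its halves), successive calls return U+FFFD, `x`, and then whatever value the caller passed as `invalidAs`.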
Unchecked Iteration
Iteration over a QString's contents can be optimized by skipping some checks. This is not safe in general, because a QString is allowed to contain malformed UTF-16 data; however, if a given QString is known to contain well-formed UTF-16, the optimized unchecked functions can be used.
QStringIterator provides unchecked counterparts for next(), peekNext(), advance(), previous(), peekPrevious(), and recede(): nextUnchecked(), peekNextUnchecked(), advanceUnchecked(), previousUnchecked(), peekPreviousUnchecked(), and recedeUnchecked(), respectively. They work exactly like the originals, but can be faster because they assume the string contains well-formed UTF-16 and skip the corresponding validation.
Note: please be extremely careful when using QStringIterator's unchecked functions, as using them on a string containing malformed data leads to undefined behavior.
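The trade-off can be sketched in standalone C++. Both functions are hypothetical (`nextChecked`, `nextUnchecked` here are not Qt's source): the checked version validates that a high surrogate is followed by a low one, while the unchecked version assumes any surrogate it meets is the first half of a valid pair. On well-formed input they agree; on malformed input the unchecked one can read past the end of the string.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Checked: validates the pair and substitutes U+FFFD for malformed data.
char32_t nextChecked(std::u16string_view s, std::size_t &pos) {
    assert(pos < s.size());
    const char16_t uc = s[pos++];
    if (uc >= 0xD800 && uc <= 0xDFFF) {                  // any surrogate
        if (uc <= 0xDBFF && pos < s.size() && s[pos] >= 0xDC00 && s[pos] <= 0xDFFF) {
            const char16_t low = s[pos++];
            return 0x10000 + ((char32_t(uc) - 0xD800) << 10) + (char32_t(low) - 0xDC00);
        }
        return 0xFFFD;                                   // malformed: replacement character
    }
    return uc;
}

// Unchecked: trusts the input. Any surrogate is taken to be a high surrogate
// followed by its low half, so the second unit is read with no bounds or range
// check. A trailing lone high surrogate makes s[pos++] read past the end,
// which is undefined behavior (the assert is compiled out under NDEBUG).
char32_t nextUnchecked(std::u16string_view s, std::size_t &pos) {
    assert(pos < s.size());
    const char16_t uc = s[pos++];
    if (uc >= 0xD800 && uc <= 0xDFFF) {
        const char16_t low = s[pos++];
        return 0x10000 + ((char32_t(uc) - 0xD800) << 10) + (char32_t(low) - 0xDC00);
    }
    return uc;
}
```

The unchecked path saves one range test and one bounds test per surrogate, which only pays off in hot loops over text already known to be valid.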