![]() |
In a nutshell, it’s because vector<bool>‘s iterators claim to be random-access, but they’re not. Random-access iterators, when you dereference them, must return a real reference. They can only do that if the thing they point to really exists somewhere. But the bool that a vector<bool>::iterator points to does not exist anywhere. It’s actually a bit in a packed integer, and dereferencing a vector<bool>‘s iterator returns an object of some type that merely acts like a bool& without actually being a bool&.
Herb goes so far as to say this:
[…] although a proxied collection can be an important and useful tool, by definition it must violate the standard’s container requirements and therefore can never be a conforming container.At the end of his article, Herb suggests that people stop using vector<bool> and use std::bitset if they want bit-packing. But that just pushes the problem around. Why shouldn’t std::bitset be a conforming container with random-access iterators? If proxied collections are so useful, why should we content ourselves with a standard library that treats them like second-class citizens?
A Brief History of Proxy Iterators
Herb wrote his article in 1999, so we’ve been living with this problem for a long time. Many have tried to fix it and ultimately failed for one reason or another. Mostly it’s because all the solutions have tried to be backwards compatible, shoehorning a richer iterator hierarchy into a standard that doesn’t easily allow it, or else breaking iterators themselves apart into separate objects that control traversal and element access. Each time the committee has balked, preferring instead the devil it knew.
An interesting historical note: the original STL design didn’t have the “true reference” requirement that causes the problem. Take a look at the SGI docs for the Forward Iterator concept. Nowhere does it say that *it should be a real reference. The docs for Trivial Iterators specifically mention proxy references and say they’re legit.
Recently, a who’s who of C++ luminaries put their names on N3351, the so-called Palo Alto TR, which proposes a concept-based redesign of the STL, using the syntax of Concepts Lite. Interestingly, the Palo Alto TR is a throw-back to the original SGI design: there is no “true-reference” requirement on the return type of *it; it merely must be convertible to const ValueType<I> &:
- // This must work, according to the Palo Alto TR
- const ValueType<I> & val = *it;
- std::vector<bool> vb{true, false, true, false};
- auto it = vb.begin();
- const bool & val = *it;
A Panoply of Proxy Problems
To better see the problems with proxy iterators, let’s look at a more interesting example: a zip view. When you zip two sequences together, you get a single sequence where each element is a std::pair of elements from the two source sequences. This can be done lazily, creating pairs on demand as the zip view is iterated:
- std::vector<int> v1 { 1,2,3 };
- std::vector<int> v2 { 9,8,7 };
- auto z = view::zip( v1, v2 );
- auto it = z.begin();
- assert( *it == std::make_pair(1,9) );
- assert( *++it == std::make_pair(2,8) );
- assert( *++it == std::make_pair(3,7) );
- std::pair<int&,int&> p = *z.begin();
- assert( &p.first == &v1[0] );
- assert( &p.second == &v2[0] );
1. Values and References
Recall that the Palo Alto TR requires *it to be convertible to const ValueType<I>&. So we should be able to do this:
- auto z = view::zip( v1, v2 );
- const pair<int,int>& val = *z.begin();
Requiring that an iterator’s reference type be convertible to const ValueType<I>& is over-constraining. But then what useful thing can we say about the relationship between these two types?
2. Algorithm Constraints
All the algorithm signatures in the Palo Alto TR use ValueType in the concept checks in order to constrain the templates. For example, here’s the constrained signature of for_each:
- template<InputIterator I, Semiregular F>
- requires Function<F, ValueType<I>>
- F for_each(I first, I last, F f);
Now imagine code like this:
- // As before, v1 and v2 are vectors of ints:
- auto z = view::zip( v1, v2 );
- // Let Ref be the zip iterator's reference type:
- using Ref = decltype(*z.begin());
- // Use for_each to increment all the ints:
- for_each( z.begin(), z.end(), [](Ref r) {
- ++r.first;
- ++r.second;
- });
Remember the concept check: Function<F, ValueType<I>>. The function we pass to for_each must be callable with an object of the iterator’s value type. In this case, the value type is pair<int,int>. There is no conversion from that to the type the function expects, which is pair<int&,int&>. Bummer.
If we change the lambda to take a pair<int,int>&, then the concept check passes, but the template will fail to instantiate correctly. It’s easy to see why when you look at a typical for_each implementation:
- template<InputIterator I, Semiregular F>
- requires Function<F, ValueType<I>>
- F for_each(I first, I last, F f) {
- for(; first != last; ++first)
- f(*first);
- return f;
- }
The most galling bit is that the code we wrote above — the code with the lambda that takes the reference type — works just fine if we simply delete the requires Function<F, ValueType<I>> constraint. Clearly something is wrong with the constraints, the concepts, or our expectations.
I should add that the problem is not specific to the zip view. Any sequence with a proxy reference type has this problem, vector<bool> included. If we just slap these constraints on the existing algorithms, some code that works today will break, and the only “fix” would be to stop using the standard algorithms. 🙁
3. Permutability of Move-Only Types
Unfortunately, the problems don’t end there. The sort algorithm requires a sequence to be permutable; that is, you should be able to shuffle its elements around. And since it should support move-only types, that means that the sequence’s iterators should be indirectly-movable. The Palo Alto TR has this to say about it:
The IndirectlyMovable and IndirectlyCopyable concepts describe copy and move relationships between the values of an input iterator, I, and an output iterator Out. For an output iterator out and an input iterator in, their syntactic requirements expand to:
- IndirectlyMovable requires *out = move(*in)
Summary, For Now…
Even though the Palo Alto TR lifts the over-constraining requirement that ForwardIterators return real references, the proxy iterator problem remains. On the one hand, it says that proxy iterators are OK. On the other hand, some interesting proxy iterators fail to model the Iterator concept or satisfy the algorithm constraints, and those that do don’t have the right semantics or performance characteristics. What are our options?
- The zip view, vector<bool>, and its ilk are useful, but are not legitimate containers and ranges, and the STL can’t support them, full stop; or
- The iterator concepts (and probably the algorithm constraints) as specified in the Palo Alto TR need to be tweaked somehow to support proxy iterators, and some algorithm implementations probably need to change, too; or
- The language needs to change to better support proxy references (an idea from Sean Parent); or
- Something else.
Written by Eric Niebler
If you found this post interesting, follow and support us.
Suggest for you:
Introduction to Algorithms and Data structures in C++
C++ Programming from Zero to Hero: The Fundamentals
Advanced C++ Programming Training Course
C++: From Beginner to Expert
Learn Advanced C++ Programming

No comments:
Post a Comment