zip()
Should Be Ill-formed| Document #: | DXXXXR0 |
| Date: | 2025-05-31 |
| Project: | Programming Language C++ |
| Audience: |
SG9, LEWG |
| Reply-to: |
Hui Xie <hui.xie1990@gmail.com> Tristan Melen <<>> |
This paper proposes a fix that makes the views::zip()
ill-formed instead of the current views::empty<tuple<>>.
The authors believe that the only reasonable result of
zip() is an
infinite range views::repeat(tuple<>()).
However, due to infinite ranges sometimes cause confusions to some
users, we believe it should be ill-formed. The current views::empty<tuple<>>
is mathematically incorrect.
zip()
DesignAccording to [P2321R2], the reason for the current
behaviour of
zip()
was:
As in range-v3, zipping nothing produces an
empty_viewof the appropriate type.
Searching through the historical commits in range-v3, the behaviour
of zip() was
added in this commit [range-v3].
In this commit, both nullary cartesian_product()
and zip()
were added, both yield an empty range, and both are incorrect. And the
rationale for this commit was
This allows us to remove the nasty MSVC special case for empty
cartesian_products.
In conclusion, this behaviour did not seem to be a result of careful design, but just chosen to work around a “nasty MSVC special case”.
The discussion on the [reflector] indicated that cartesian_product()
should be single(tuple())
and zip()
should be repeat(tuple()).
Luckily, [P2540R1] fixed
cartesian_product. So it no longer
follows the “range-v3” behaviour but the correct one views::single(tuple()).
However, the paper intentionally left
zip’s behaviour unchanged, after
.
The main reason behind this decision was
In particular,
ziphas the property that it is the inner join of the indexed sets, and is the main diagonal of the Cartesian product. However, the identity element forzipisrepeat(tuple<>), the infinite range of repeated empty tuples. If we allowed zip of an empty range of ranges to be its identity element, we would be introducing an inconsistency into the system, where two different formulations of notionally the same thing produces different answers.
However, the authors believe that the above conclusion is flawed.
TODO: explain the main diagonal and identity yield same results
Julia’s
zip()
produces an infinite range of empty tuples, which is in line with the
authors’ view in mathematically sense.
Haskell’s ZipList’s
pure implementation is [haskell_source]
pure x = ZipList (repeat x)This is also what authors expected.
Haskell documentation [haskell_doc] clearly stated that
The only way to ensure
zipWith ($) fsnever removes elements is makingfsinfinite.
Rust’s compiler actively rejects
multizip or
izip! calls
with no argument. This is much better than giving the wrong results.
These languages either only supports a zip with two ranges, or provide zip3, zip4, … functions for more ranges, possibly due to lack of support for variadic arguments. However, they don’t provide a zip0, for a good reason.
zip() is
an empty iterable.
Before Python 2.4,
zip() raises
an TypeError. And Python 2.4 changed
its behaviour to return an empty list. The rational of the change can be
found here [pep0201]. The motivating example is
date, rain, high, low = zip(*csv.reader(file("weather.csv")))
print "Total rainfall", sum(rain)When the csv file has no data, instead of raising an exception, it is useful to have the above code just work.
The only problem of this motivating example is that, it does not work.
Traceback (most recent call last):
File "<python-input-17>", line 2, in <module>
date, rain, high, low = zip(*csv.reader(file))
^^^^^^^^^^^^^^^^^^^^^
ValueError: not enough values to unpack (expected 4, got 0)The
csv.reader
reads the file into list of rows, where each row is a list of elements:
one for each column.
*rows unpack
the rows to the arguments to zip,
then date, rain, high, low = zip(*rows)
unpacks the zipped iterable’s elements, so effectively
date,
rain,
high, and
low all represents a column. The
user confidently wrote this code because they know that the csv file has
4 columns.
Now the motivation of changing
zip() from
TypeError to
[] is that
“it would be nice it still works if the csv file has no records”. The
problem is that the above code assumes that the len(zip(*rows))
is 4 so it unpacks to 4 variables. But
zip(*[])
yields zero element thus the program raises a
ValueError.
This example uses zip(*s)
to transpose a matrix. When the input, say, is [[1,2], [3,4], [5,6]],
it is clear that it is a 3 x 2
matrix and the transpose result is a 2 x 3
matrix. And when the input is
*[], there
is no way to know if this is a 0 x 1,
or 0 x 2,
or 0 x 3
… matrix. zip makes an assumption
that this is a 0 x 0
matrix, but unfortunately in this case, the user wants a 0 x 4
matrix.
This motivating example is not only “not working”, but also a
justification that the cardinality of
zip() cannot
be known.
The author has contacted the creator of Python, Guido van Rossum,
about the design decision of
zip() and
here is the reply from him about the design decision:
To be honest, I don’t recall how the decision around this detail went. I assume it was literally that people showed me the specific example given in the PEP and argued that it should work that way, and I found I didn’t feel like arguing against the treatment of such a minor edge case.
The edge case actually reminds me of the debate about the meaning of 0**0 (see Wikipedia). I think the same reasoning may apply here? Python chose the more useful outcome, for some applications.
In any case, if you want to change C++’s zip to make this an error, you have my blessing.
The author has also contacted Barry Warsaw, the author of [pep0201], who introduced
zip to Python 2.0. And here is his
reply:
I don’t remember the discussion leading to the change either, but the Zen of Python does favor practicality over purity. Guido’s recollection is likely right; folks had some real-world experience where it was more practical to return an empty sequence rather than have to write code to handle an exception.
For example:
>>> def ab(*seq): ... for a, b in zip(*seq): ... print(a, b) ... >>> ab([1, 2, 3], [4, 5, 6]) 1 4 2 5 3 6 >>> ab() >>>Totally contrived of course, but it is nice at least here not to have to handle an exception. Defining that as repeat(()) would also be much less useful as it would infinite loop, so you’d probably have to add some value checking before the zip() call to avoid that.
The authors of this C++ paper agree with the “Zen of Python”,
however, this example seems a bit contrived. The function
ab accepts either exactly two lists,
or exactly zero list. It still needs to handle the
ValueError if the user passes any
other number of lists, for example one list. See this truncated
error:
>>> ab([1,2])
ValueError: not enough values to unpack (expected 2, got 1)And this very example would not work in C++’s
zip, because C++ is a statically
typed language, the line
for (auto [a, b] : zip(rng...))will fail to compile as we cannot bind [a, b]
to a 0-tuple, even we define zip as
an empty range.
Majority of the languages do not support null-ary
zip. Julia makes it an infinite
range, which is more mathematically correct. Python makes it an empty
range.
There were discussions on the reflector and some of them are
defending the current design views::empty<std::tuple<>>.
And I will go through all of them I have seen on the reflector
- zip(empty_range, empty_range, empty_range) is an empty range
- zip(empty_range, empty_range) is an empty range
- zip(empty_range) is an empty range
- zip() is an empty range
Consistency.
This example only makes sense if the input is “empty_range”. Any other ranges will break this “consistency”. For example
Inconsistency.
This is actually a great example to prove that there is no way to have consistency in terms of the size of the result range.
The compelling use case for zip() empty range is a fold over it:
template <typename... Rng> constexpr int inner_product(Rng&&... rng) { int result = 0; for (auto tpl : std::views::zip(rng...)) std::apply([&](auto&&... x) { result += (x * ... * 1); }, tpl); return result; }
This example function implementation is actually incorrect mathematically. First of all, “inner product” space is \(\Re^n \times \Re^n \rightarrow \Re\). The function seems to generalise it into \(\Re^n \times ... \times \Re^n \rightarrow \Re\). There is an important aspect: all the inputs are in the same space \(R^n\).
There are two possible mathematical definition of this function
Generalised inner product, with a precondition that all input
ranges must be of the same length
n.
With this definition, let’s assume we are in \(R^2\) space, and each vector has exactly
two elements (x, y).
The result of inner_product(r0, r1, r2, ...)
is
(x0 * x1 * x2 * ... * 1) + (y1 * y1 * y2 * ... * 1)
and when the input is null-ary, the result is 1 + 1 = 2.
We can see that the function should return
2 instead of
0 when the
problem space is \(R^2\). Similarly,
the result of this function in \(R^n\)
space should be n. The result of
this function depends on the space we are in, i.e, the
n in the precondition. We cannot
infer n with null-ary inputs.
Therefore, there isn’t any meaningful definition for this function with
null-ary inputs.
Generalised inner product of the first
n elements in a vector, where
n is the minimum length of all of
the vectors.
With this definition, the generalised inner product is \(\sum_{i=0}^{minlength(rng...)} (x_{i,1} * x_{i,2}
* ... * 1)\). When the input is null-ary, one way to define minlength(rng...)
is to use its identity, and since the identity of
min is infinity, the result should
be 1 + 1 + 1 + ...,
which is \(\infty\) instead of
0. Or you
can say the min does not make sense
with no inputs and there is no definition for this function when there
is no inputs. In fact, std::min(initializer_list<T>)
follows this pattern. If the initializer_list.size()
is 0, the
behaviour is undefined.
This “compelling use case of
zip()”
simply does not return a mathematically correct answer, no matter how we
interpret its intent.
An infinite range is clearly mathematically the correct identity element. But that’s entirely separate from the question of whether it’s the correct programming solution. I have yet to see a compelling argument for why looping over
zip(rs...)should degenerate into an infinite loop when an empty range is a perfectly reasonable, useful, and practical solution. Have people run into issues with either Python or C++ not providing an infinite range here? I would like to see concrete evidence thereof.
I think to answer “Have people run into issues with either Python or
C++ not providing an infinite range here?”, we should also
answer “Have people used zip with no
arguments” first. The author of this paper is no longer pursuing the
“infinite range” option.
The authors propose to change
zip() to be
ill-formed.
Modify 25.7.25.1 [range.zip.overview] section 2 as
The name views::zip denotes a customization point object ([customization.point.object]). Given a pack of subexpressions Es…, the expression views::zip(Es…) is expression-equivalent to
auto(views::empty<tuple<>>)
if Es is an empty
pack,zip_view<views::all_t<decltype((Es))>...>(Es...).Bump the FTM
__cpp_lib_ranges_zip
pure
in ZipList.