zip() Should Be Ill-formed

Document #: DXXXXR0
Date: 2025-05-31
Project: Programming Language C++
Audience: SG9, LEWG
Reply-to: Hui Xie
<>
Tristan Melen
<<>>

1 Revision History

1.1 R0

2 Abstract

This paper proposes a fix that makes the views::zip() ill-formed instead of the current views::empty<tuple<>>.

3 Design Considerations

The authors believe that the only reasonable result of zip() is an infinite range views::repeat(tuple<>()). However, due to infinite ranges sometimes cause confusions to some users, we believe it should be ill-formed. The current views::empty<tuple<>> is mathematically incorrect.

3.1 Rationale for the Current zip() Design

3.1.1 P2321R2

According to [P2321R2], the reason for the current behaviour of zip() was:

As in range-v3, zipping nothing produces an empty_view of the appropriate type.

Searching through the historical commits in range-v3, the behaviour of zip() was added in this commit [range-v3].

In this commit, both nullary cartesian_product() and zip() were added, both yield an empty range, and both are incorrect. And the rationale for this commit was

This allows us to remove the nasty MSVC special case for empty cartesian_products.

In conclusion, this behaviour did not seem to be a result of careful design, but just chosen to work around a “nasty MSVC special case”.

3.1.2 P2540R1

The discussion on the [reflector] indicated that cartesian_product() should be single(tuple()) and zip() should be repeat(tuple()).

Luckily, [P2540R1] fixed cartesian_product. So it no longer follows the “range-v3” behaviour but the correct one views::single(tuple()).

However, the paper intentionally left zip’s behaviour unchanged, after .

The main reason behind this decision was

In particular, zip has the property that it is the inner join of the indexed sets, and is the main diagonal of the Cartesian product. However, the identity element for zip is repeat(tuple<>), the infinite range of repeated empty tuples. If we allowed zip of an empty range of ranges to be its identity element, we would be introducing an inconsistency into the system, where two different formulations of notionally the same thing produces different answers.

However, the authors believe that the above conclusion is flawed.

TODO: explain the main diagonal and identity yield same results

3.2 TODO

3.3 Other Languages

3.3.1 Julia

Julia’s zip() produces an infinite range of empty tuples, which is in line with the authors’ view in mathematically sense.

3.3.2 Haskell

Haskell’s ZipList’s pure implementation is [haskell_source]

pure x = ZipList (repeat x)

This is also what authors expected.

Haskell documentation [haskell_doc] clearly stated that

The only way to ensure zipWith ($) fs never removes elements is making fs infinite.

3.3.3 Rust

Rust’s compiler actively rejects multizip or izip! calls with no argument. This is much better than giving the wrong results.

3.3.4 Java, C#

These languages either only supports a zip with two ranges, or provide zip3, zip4, … functions for more ranges, possibly due to lack of support for variadic arguments. However, they don’t provide a zip0, for a good reason.

3.3.5 Python

zip() is an empty iterable.

Before Python 2.4, zip() raises an TypeError. And Python 2.4 changed its behaviour to return an empty list. The rational of the change can be found here [pep0201]. The motivating example is

date, rain, high, low = zip(*csv.reader(file("weather.csv")))
print "Total rainfall", sum(rain)

When the csv file has no data, instead of raising an exception, it is useful to have the above code just work.

The only problem of this motivating example is that, it does not work.

Traceback (most recent call last):
  File "<python-input-17>", line 2, in <module>
    date, rain, high, low = zip(*csv.reader(file))
    ^^^^^^^^^^^^^^^^^^^^^
ValueError: not enough values to unpack (expected 4, got 0)

The csv.reader reads the file into list of rows, where each row is a list of elements: one for each column. *rows unpack the rows to the arguments to zip, then date, rain, high, low = zip(*rows) unpacks the zipped iterable’s elements, so effectively date, rain, high, and low all represents a column. The user confidently wrote this code because they know that the csv file has 4 columns.

Now the motivation of changing zip() from TypeError to [] is that “it would be nice it still works if the csv file has no records”. The problem is that the above code assumes that the len(zip(*rows)) is 4 so it unpacks to 4 variables. But zip(*[]) yields zero element thus the program raises a ValueError.

This example uses zip(*s) to transpose a matrix. When the input, say, is [[1,2], [3,4], [5,6]], it is clear that it is a 3 x 2 matrix and the transpose result is a 2 x 3 matrix. And when the input is *[], there is no way to know if this is a 0 x 1, or 0 x 2, or 0 x 3 … matrix. zip makes an assumption that this is a 0 x 0 matrix, but unfortunately in this case, the user wants a 0 x 4 matrix.

This motivating example is not only “not working”, but also a justification that the cardinality of zip() cannot be known.

3.3.5.1 Comments from Guido van Rossum (BDFL)

The author has contacted the creator of Python, Guido van Rossum, about the design decision of zip() and here is the reply from him about the design decision:

To be honest, I don’t recall how the decision around this detail went. I assume it was literally that people showed me the specific example given in the PEP and argued that it should work that way, and I found I didn’t feel like arguing against the treatment of such a minor edge case.

The edge case actually reminds me of the debate about the meaning of 0**0 (see Wikipedia). I think the same reasoning may apply here? Python chose the more useful outcome, for some applications.

In any case, if you want to change C++’s zip to make this an error, you have my blessing.

3.3.5.2 Comments from Barry Warsaw

The author has also contacted Barry Warsaw, the author of [pep0201], who introduced zip to Python 2.0. And here is his reply:

I don’t remember the discussion leading to the change either, but the Zen of Python does favor practicality over purity. Guido’s recollection is likely right; folks had some real-world experience where it was more practical to return an empty sequence rather than have to write code to handle an exception.

For example:

>>> def ab(*seq):
...     for a, b in zip(*seq):
...         print(a, b)
...
>>> ab([1, 2, 3], [4, 5, 6])
1 4
2 5
3 6
>>> ab()
>>>

Totally contrived of course, but it is nice at least here not to have to handle an exception. Defining that as repeat(()) would also be much less useful as it would infinite loop, so you’d probably have to add some value checking before the zip() call to avoid that.

The authors of this C++ paper agree with the “Zen of Python”, however, this example seems a bit contrived. The function ab accepts either exactly two lists, or exactly zero list. It still needs to handle the ValueError if the user passes any other number of lists, for example one list. See this truncated error:

>>> ab([1,2])
ValueError: not enough values to unpack (expected 2, got 1)

And this very example would not work in C++’s zip, because C++ is a statically typed language, the line

for (auto [a, b] : zip(rng...))

will fail to compile as we cannot bind [a, b] to a 0-tuple, even we define zip as an empty range.

3.3.6 Conclusion of other languages

Majority of the languages do not support null-ary zip. Julia makes it an infinite range, which is more mathematically correct. Python makes it an empty range.

3.4 Counter Arguments

There were discussions on the reflector and some of them are defending the current design views::empty<std::tuple<>>. And I will go through all of them I have seen on the reflector

3.4.1 Consistency

  • zip(empty_range, empty_range, empty_range) is an empty range
  • zip(empty_range, empty_range) is an empty range
  • zip(empty_range) is an empty range
  • zip() is an empty range

Consistency.

This example only makes sense if the input is “empty_range”. Any other ranges will break this “consistency”. For example

Inconsistency.

This is actually a great example to prove that there is no way to have consistency in terms of the size of the result range.

3.4.2 Inner product

The compelling use case for zip() empty range is a fold over it:

template <typename... Rng>
constexpr int inner_product(Rng&&... rng)
{
    int result = 0;
    for (auto tpl : std::views::zip(rng...))
        std::apply([&](auto&&... x) {
            result += (x * ... * 1);
        }, tpl);
    return result;
}

This example function implementation is actually incorrect mathematically. First of all, “inner product” space is \(\Re^n \times \Re^n \rightarrow \Re\). The function seems to generalise it into \(\Re^n \times ... \times \Re^n \rightarrow \Re\). There is an important aspect: all the inputs are in the same space \(R^n\).

There are two possible mathematical definition of this function

  1. Generalised inner product, with a precondition that all input ranges must be of the same length n.

    With this definition, let’s assume we are in \(R^2\) space, and each vector has exactly two elements (x, y). The result of inner_product(r0, r1, r2, ...) is

    (x0 * x1 * x2 * ... * 1) + (y1 * y1 * y2 * ... * 1)

    and when the input is null-ary, the result is 1 + 1 = 2. We can see that the function should return 2 instead of 0 when the problem space is \(R^2\). Similarly, the result of this function in \(R^n\) space should be n. The result of this function depends on the space we are in, i.e, the n in the precondition. We cannot infer n with null-ary inputs. Therefore, there isn’t any meaningful definition for this function with null-ary inputs.

  2. Generalised inner product of the first n elements in a vector, where n is the minimum length of all of the vectors.

    With this definition, the generalised inner product is \(\sum_{i=0}^{minlength(rng...)} (x_{i,1} * x_{i,2} * ... * 1)\). When the input is null-ary, one way to define minlength(rng...) is to use its identity, and since the identity of min is infinity, the result should be 1 + 1 + 1 + ..., which is \(\infty\) instead of 0. Or you can say the min does not make sense with no inputs and there is no definition for this function when there is no inputs. In fact, std::min(initializer_list<T>) follows this pattern. If the initializer_list.size() is 0, the behaviour is undefined.

This “compelling use case of zip()” simply does not return a mathematically correct answer, no matter how we interpret its intent.

3.4.3 There are no complaints of being empty range

An infinite range is clearly mathematically the correct identity element. But that’s entirely separate from the question of whether it’s the correct programming solution. I have yet to see a compelling argument for why looping over zip(rs...) should degenerate into an infinite loop when an empty range is a perfectly reasonable, useful, and practical solution. Have people run into issues with either Python or C++ not providing an infinite range here? I would like to see concrete evidence thereof.

I think to answer “Have people run into issues with either Python or C++ not providing an infinite range here?”, we should also answer “Have people used zip with no arguments” first. The author of this paper is no longer pursuing the “infinite range” option.

3.5 Conclusion

The authors propose to change zip() to be ill-formed.

4 Implementation Experience

5 Wording

Modify 25.7.25.1 [range.zip.overview] section 2 as

The name views::zip denotes a customization point object ([customization.point.object]). Given a pack of subexpressions Es…, the expression views::zip(Es…) is expression-equivalent to

5.1 Feature Test Macro

Bump the FTM __cpp_lib_ranges_zip

6 References

[haskell_doc] Haskell ZipList Documentation.
https://en.wikibooks.org/wiki/Haskell/Applicative_functors#ZipList
[haskell_source] implementing pure in ZipList.
https://hackage.haskell.org/package/ghc-internal-9.1201.0/docs/src/GHC.Internal.Functor.ZipList.html
[P2321R2] Tim Song. 2021-06-11. zip.
https://wg21.link/p2321r2
[P2540R1] Steve Downey. 2022-03-14. Empty Product for certain Views.
https://wg21.link/p2540r1
[pep0201] PEP 201 – Lockstep Iteration.
https://peps.python.org/pep-0201/#subsequent-change-to-zip
[range-v3] [zip] Correctly zip no ranges.
https://github.com/ericniebler/range-v3/commit/ef9a650d4da87f02c5c079055e09017825c92fb3
[reflector] [isocpp-lib-ext] zip and cartesian_product base case.
https://lists.isocpp.org/lib-ext/2022/01/22023.php