The ZK Chronicles: Multilinear Extensions

Zero Knowledge Proofs · Polynomials · Multilinear Extensions · Schwartz-Zippel Lemma
Frank Mangone
December 26, 2025 · 14 min read · Medium

In the previous installment, we saw our first instance of a protocol that makes use of a general model for computation: circuits.

Although interesting in concept, we noted that ideally we’d like to prove that a computation, represented as a circuit, was correctly executed — not that we know how many inputs satisfy the circuit (recall we studied #SAT).

The question, though, is whether we can do that with our current knowledge. And well... We can’t!

It feels we’re really close though, and to be fair, we are. However, getting there will require using a little mathematical gadget we haven’t talked about so far.

Even though things might be a little heavier on the math side today, don’t lose sight of our endgame: to prove we have a satisfying input for some circuit!

And so, it’s time for us to make a new acquaintance: a little tool that will be very helpful in our upcoming efforts.

Into the action we go!

Building Intuition

During the sum-check article, I mentioned how 0 and 1 were two special inputs because of how easy it was to evaluate polynomials at those points.

But let’s be honest: in most real-world scenarios, we don’t actually care about evaluations at 0 and 1 specifically. They’re just numbers like any other — they don’t have any inherent semantic meaning.

The same does not really apply to boolean circuit satisfiability, since the inputs are truly boolean variables to begin with!

So when sum-check asks us to verify sums over a boolean hypercube, an obvious question arises: who even cares?

Kermit with quote 'that's a very good point'

When would we ever need to sum a polynomial over these specific points?

The answer might surprise you: all the freakin’ time. But to make sense out of all this, we need to look at those boolean hypercubes in a different light.

Encoding Information

The key to this matter is to view them as a way to encode information.

Let’s look at a concrete example: a simple list of numbers. Suppose we have a small list of four values, say [2, 3, 5, 8]. To access a specific value of that list, you’d normally provide an index. So for example, to retrieve the value 5, you’d request the value in the list at index 2.

Assuming we’re using zero-indexing, of course!

Now, here’s an observation: the indexes are just integers, and these all have binary representations. What’s more, this little list has 4 possible indexes, which would look like this in binary:

  • Index 0 → 00
  • Index 1 → 01
  • Index 2 → 10
  • Index 3 → 11

You see where I’m going with this, right? Each index is just a point in the boolean hypercube \{0,1\}^2!

Same information, different representation!

With this, we can imagine the action of reading the information in our list at a given index as a function, where we can already assume the indexes are values in the hypercube (which in this case, is just a boolean square):

f: \{0,1\}^2 \rightarrow \mathbb{F}

Which is just math notation for saying “a function f that takes two boolean variables, and returns some value in a field”.

So, following the example, we have:

  • f(0,0) = 2
  • f(0,1) = 3
  • f(1,0) = 5
  • f(1,1) = 8
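
In Python terms, this is really all we’re saying (a throwaway sketch; the names data and f are just for illustration):

```python
# Viewing list indexing as a function over the boolean square {0,1}^2:
# the bit pair (b1, b2) is just the binary representation of the index.
data = [2, 3, 5, 8]

def f(b1: int, b2: int) -> int:
    # (b1, b2) are the binary digits of the index, most significant bit first
    return data[2 * b1 + b2]

assert f(0, 0) == 2
assert f(0, 1) == 3
assert f(1, 0) == 5
assert f(1, 1) == 8
```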

Okay, okay. But isn’t this just more pointless relabeling?

Actually, no! You see, the real kicker here lies in that little function f we’ve so inconspicuously slid into view. Implicitly, we’re saying that this function is able to encode data.

In fact, we’ve seen a method to obtain said function: interpolation. Obviously, what we obtain from that process is a polynomial. And it is through this type of encoding that the sum-check protocol will become relevant.

Cool! Slowly but surely, it seems we’re getting somewhere.

But I want you to notice something: back in our pass through Lagrange interpolation, we actually focused on the univariate case. Interpolation was presented as a way to obtain a univariate polynomial from evaluations, which we treated as (x, y) pairs.

In the sum-check protocol though, we make use of some polynomial g with v variables, rather than a single variable. So here, interpolation as we described it will not get us to the type of polynomial we need.

Thus, we’re gonna need something different.

Multilinear Extensions

What we need is a way to go from a multivariate function (like our f above) defined over a boolean hypercube to a multivariate polynomial. We’ll call said polynomial g for convenience, given our parallel with sum-check.

And we’ll do this in a very special way. Because there are two important conditions to consider here:

  • Agreement on the hypercube: we want to encode some data, meaning that g must match a set of values at the boolean hypercube.
  • Extension beyond the hypercube: in the sum-check protocol, we are required to evaluate g at points other than boolean strings.

Combined, these two conditions precisely define the behavior of g:

It needs to be a multivariate polynomial where each input can be a finite field element, and whose outputs match the encoded values at the boolean hypercube

Yeah... That’s a mouthful. Let’s unpack.

First, we’re saying that we need some function defined over these sets:

g: \mathbb{F}^v \rightarrow \mathbb{F}

That is: it takes v inputs in the field \mathbb{F}, and will allow us to encode a total of 2^v values. And for those values, we have a series of restrictions on the function:

  • g(0, 0, ..., 0, 0) encodes f(0, 0, ..., 0, 0) = k_0, so g(0, 0, ..., 0, 0) = k_0, where k_0 is some element in \mathbb{F}.
  • Then, g(0, 0, ..., 0, 1) encodes f(0, 0, ..., 0, 1) = k_1, so g(0, 0, ..., 0, 1) = k_1.
  • Likewise, g(0, 0, ..., 1, 0) = k_2.
  • g(0, 0, ..., 1, 1) = k_3.

And so on. Thus, we require that:

f(w) = g(w) \ \forall w \in \{0,1\}^v

Which is just math notation for: g agrees with f on every point of the hypercube.

This is the construction we’re after. It’s as if we’re extending our original function f so that the inputs can take values other than 0s and 1s, allowing values on the larger finite field. Hence the name: multilinear extensions.

But wait, why multilinear? What does that even mean?

Multilinearity

Before any confusion happens: multivariate and multilinear are not the same thing.

You got me there

Polynomials can have, as we already know, different degrees, as the different terms in the expression itself can be raised to different integer powers. We could even stretch language a bit here and talk about the degree of each variable, as the maximum power a given variable is raised to throughout the whole thing.

Saying that a polynomial is multilinear actually poses a strong restriction on how the expression can look: what we’re saying is that the degree of each variable is at most 1. To be perfectly clear, this is a multilinear polynomial:

g(X_1, X_2, X_3, X_4) = X_1X_4 + X_2X_4 + X_3X_4

And this is not:

g'(X_1, X_2, X_3, X_4) = X_1^2X_4 + X_2X_4 + X_3X_4

That single term with X_1^2 means the polynomial is no longer multilinear!
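
If it helps, here’s one mechanical way to check this, representing a polynomial as a map from exponent tuples to coefficients (a toy encoding assumed just for this sketch):

```python
# Multilinear means: every variable appears with exponent 0 or 1 in every term.
def is_multilinear(poly: dict[tuple[int, ...], int]) -> bool:
    return all(e <= 1 for exponents in poly for e in exponents)

# g  = X1*X4 + X2*X4 + X3*X4   -> multilinear
g = {(1, 0, 0, 1): 1, (0, 1, 0, 1): 1, (0, 0, 1, 1): 1}
# g' = X1^2*X4 + X2*X4 + X3*X4 -> not multilinear (X1 has exponent 2)
g_prime = {(2, 0, 0, 1): 1, (0, 1, 0, 1): 1, (0, 0, 1, 1): 1}

assert is_multilinear(g) and not is_multilinear(g_prime)
```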

Alright... But, again, why does this matter?

Turns out that imposing this condition has a couple nice consequences. You see, in principle, we could extend our original function ff using polynomials of any degree. In fact, there are infinitely many ways to do this!

Take for example our 4-element list [2, 3, 5, 8]. We could build polynomials such as:

  • g_1(X_1, X_2) = 2 + 3X_1 + X_2 + 2X_1X_2
  • g_2(X_1, X_2) = 2 + 3X_1^3 + X_2 + 2X_1X_2^5
  • g_3(X_1, X_2) = 2 + 100X_1^7 + X_2 - 97X_1^7 + 2X_1X_2
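
A quick sanity check in Python (a throwaway sketch, writing the three candidates as plain functions over the integers):

```python
from itertools import product

g1 = lambda x1, x2: 2 + 3*x1 + x2 + 2*x1*x2
g2 = lambda x1, x2: 2 + 3*x1**3 + x2 + 2*x1*x2**5
g3 = lambda x1, x2: 2 + 100*x1**7 + x2 - 97*x1**7 + 2*x1*x2

# All three agree with the list [2, 3, 5, 8] on the hypercube...
values = {(0, 0): 2, (0, 1): 3, (1, 0): 5, (1, 1): 8}
for point in product([0, 1], repeat=2):
    assert g1(*point) == g2(*point) == g3(*point) == values[point]

# ...but they are different polynomials away from it:
assert g1(2, 3) != g2(2, 3) != g3(2, 3)
```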

All three correctly encode the list on \{0,1\}^2, and of course diverge everywhere else. So... What makes multilinearity special, specifically?

First of all, there’s the fact that a multilinear expression is simple and efficient. These polynomials will require evaluation at some point, and in that regard, degree matters. Multilinear polynomials tend to be less expensive to evaluate than higher-degree expressions, plus they generally have fewer coefficients, which also impacts communication costs.

Plus, some soundness arguments rely on some polynomials having low degree!

But this isn’t all. When we restrict ourselves to multilinear polynomials, it just so happens there is only one way to define g so that it matches all our requirements. In other words, a multilinear extension (or MLE for short) of some function f defined over the boolean hypercube is unique.

Heck, let me put it in even stronger terms:

There’s a natural, unique way to extend a function f defined on a boolean hypercube to a multilinear polynomial

This is powerful. So much so, that I think we should prove this statement.

Feel free to skip this next section. I just think it’s worthwhile, since it’s such a useful property of multilinear extensions!

Uniqueness

To show this, we’ll first need a strategy to build such a function g.

Here’s what we’ll do: for each point w in \{0,1\}^v, we’ll build a polynomial that evaluates to 1 on w, and to 0 everywhere else on the hypercube.

That must sound familiar. Can you remember where we saw something similar?

An image of Shrek and Donkey, confused, and with swapped faces

Yes, that’s exactly how we defined Lagrange bases! That’s pretty much what we’re doing here.

Alright then, let’s define that:

\chi_w(X_1, ..., X_v) = \prod_{i=1}^{v} [w_iX_i + (1 - w_i)(1 - X_i)]

That might look a little odd, but it works just fine:

  • When the X values are the coordinates of w, each term evaluates to w_i^2 + (1 - w_i)^2. Note that this equals 1 when w_i is either 0 or 1. Thus, the entire product results in 1.
  • When X takes some other value on the hypercube, different from w, then at least one X_i differs from w_i, so either X_i = 0 and w_i = 1, or vice versa. In both cases, w_iX_i + (1 - w_i)(1 - X_i) equals 0, thus collapsing the entire product to 0.

To get our extension, we just need to add all these terms together, multiplied by the encoded value at a given index, f(w):

g(X_1, ..., X_v) = \sum_{w \in \{0,1\}^v} f(w) \chi_w(X_1, ..., X_v)

You can check for yourself that this expression is multilinear. And as such, we have constructed a valid MLE of ff!
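
Here’s the construction above, translated naively into Python over a toy prime field (the field size and the example values are assumptions for illustration):

```python
from itertools import product

P = 97  # a small prime, standing in for the field F

def chi(w, x):
    # Lagrange basis over the hypercube: 1 at w, 0 at every other boolean point
    result = 1
    for wi, xi in zip(w, x):
        result = result * (wi * xi + (1 - wi) * (1 - xi)) % P
    return result

def mle_eval(f_values, x):
    # g(x) = sum over w in {0,1}^v of f(w) * chi_w(x)
    v = len(x)
    return sum(f_values[w] * chi(w, x) for w in product([0, 1], repeat=v)) % P

# Encode the list [2, 3, 5, 8] and check agreement on the hypercube...
f_values = {(0, 0): 2, (0, 1): 3, (1, 0): 5, (1, 1): 8}
for w in product([0, 1], repeat=2):
    assert mle_eval(f_values, w) == f_values[w]

# ...and evaluate anywhere else in the field:
print(mle_eval(f_values, (10, 42)))
```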

What remains is proving that this expression is the only possible MLE for f.

I’ll start abusing notation a bit now, to keep things a bit more condensed. When I say p(X), just assume X is a vector. This makes sense in the context we’re working with, after all!

To do this, we’ll do something quite clever. Suppose p(X) and q(X) are two valid MLEs for f. From them, let’s build this polynomial h, which is also multilinear:

h(X) = p(X) - q(X)

Because p(X) and q(X) are valid MLEs, they take the same values on the boolean hypercube, meaning that every point of the hypercube is a root of h(X).

Most commonly, this is stated as: h(X) vanishes on \{0,1\}^v.

To prove p(X) and q(X) are the same polynomial though, we need to show that h(X) is identically 0 — that is, its expression is h(X) = 0.

Now, since h(X) is multilinear, it has degree at most 1 in each variable. Let’s exploit that, one variable at a time, by splitting off X_1:

h(X_1, ..., X_v) = X_1 \cdot h_1(X_2, ..., X_v) + h_0(X_2, ..., X_v)

Setting X_1 = 0 shows that h_0 vanishes on all of \{0,1\}^{v-1}, and setting X_1 = 1 then forces h_1 to vanish there as well. But h_0 and h_1 are themselves multilinear, with one variable less, so we can repeat the argument, peeling off variables until we’re left with univariate polynomials of degree at most 1 that vanish at both 0 and 1.

But... As we already know, a non-zero polynomial of degree 1 can have at most 1 root. And here we have 2 distinct roots!

That’s impossible. Unless, of course, we allow those polynomials to be the zero polynomial!

Working our way back up, the only possibility here is that h(X) = 0, which in turn means that p(X) = q(X). Thus, the MLE is unique, and is exactly the expression we built.

Neat, eh?

A scared cat under a table
Please, just make it stop...

Uniqueness is important because it prevents forgery. By this, I mean that if multiple possible extensions existed, a cheating prover could try to build one to suit their devious needs.

But with MLEs, there’s only one correct extension. The extension process gets protection for free: uniqueness means no wiggle room for cheaters!

Computing MLEs

Alright! We’ve shown that MLEs are unique and well-defined. But there’s something else we need to address: practicality.

Let’s take another quick look at our formula:

g(X_1, ..., X_v) = \sum_{w \in \{0,1\}^v} f(w) \chi_w(X_1, ..., X_v)

Note that we’re summing over 2^v elements, and each of the \chi terms involves v multiplications. Naively, evaluating g at a single point would take a total of O(v \cdot 2^v) operations. If we’re not careful about how we use these extensions, computational costs could explode, rendering MLEs impractical.

Fortunately, there’s a clever optimization to compute MLEs that brings down evaluation times dramatically, and this is what ultimately makes these little guys very useful.

The trick, once again, has to do with a recursive structure we can exploit. An example here will help conceptualize the general idea.

Suppose we want to compute the extension g(X_1, X_2, X_3) of some 3-variable function f. The fact that g has degree at most 1 in each variable allows us to write it in this way:

g(X_1, X_2, X_3) = (1 - X_1)g_0(X_2, X_3) + X_1g_1(X_2, X_3)

Where g_0 and g_1 are restrictions of the original g at X_1 = 0 and X_1 = 1 respectively.

By using these restrictions, we’ve split the evaluation of our original MLE into the evaluation of two smaller MLEs! Of course, we don’t need to stop there: we can now take g_0 and g_1, and do the exact same thing, reducing each of their evaluations to two new MLEs with one variable less.

g_0(X_2, X_3) = (1 - X_2)g_{00}(X_3) + X_2g_{01}(X_3)

Once we get to univariate polynomials, their evaluation becomes super simple:

g_{01}(X_3) = (1 - X_3)f(0,1,0) + X_3f(0,1,1)

Remember: X_3, as well as the other variables, will take values from the finite field when we’re computing the multilinear extension — not only 0 and 1!

So, in order to evaluate gg, we have to go in the opposite direction. We start with the evaluations of ff, and we take combinations of them. Following through our example would result in this little graph here:

The evaluation graph of a multilinear extension
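In code, this bottom-up pass is just a repeated folding of a table of values. Here’s a minimal sketch, which takes O(2^v) field operations overall instead of O(v \cdot 2^v) (the toy field and values are assumptions for illustration):

```python
P = 97  # toy prime field

def mle_eval_fast(f_table, point):
    # f_table holds the 2^v values of f in lexicographic order of the
    # boolean inputs (first variable = most significant bit);
    # point is (r_1, ..., r_v).
    table = list(f_table)
    for r in point:
        half = len(table) // 2
        # One folding step: g(r, rest) = (1 - r) * g_0(rest) + r * g_1(rest)
        table = [((1 - r) * table[j] + r * table[half + j]) % P
                 for j in range(half)]
    return table[0]

# The list [2, 3, 5, 8] once more: f(0,0)=2, f(0,1)=3, f(1,0)=5, f(1,1)=8
assert mle_eval_fast([2, 3, 5, 8], (0, 1)) == 3
assert mle_eval_fast([2, 3, 5, 8], (1, 1)) == 8
print(mle_eval_fast([2, 3, 5, 8], (10, 42)))  # any point off the hypercube
```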

The magic happens when we consider the context in which these MLEs are used. You see, in our good ol’ sum-check protocol, we need to calculate g at points like these in subsequent rounds:

  • g(0, 0, 0), g(0, 1, 0), g(0, 0, 1), g(0, 1, 1), g(1, 0, 0), ...
  • g(r_1, 0, 0), g(r_1, 1, 0), g(r_1, 0, 1), g(r_1, 1, 1)
  • g(r_1, r_2, 0), g(r_1, r_2, 1)

If you need to, go check the protocol again!

On each round, one variable is fixed to a challenge value r, but the remaining variables still take boolean combinations that are repeated, and can be reused.

We can see this in action in our little example. Suppose we need to find g(0,0,0), and then g(r_1,0,0). The first evaluation prompts us to calculate g_0(0,0) and g_1(0,0):

An evaluation that depends on precalculated values

If we store those values for later use, we can simply reuse them when we’re required to calculate g(r_1,0,0):

g(r_1, 0, 0) = (1 - r_1)g_0(0,0) + r_1g_1(0,0)

What we’ve done is use one of the oldest tricks in the book of algorithm design: memoization!

All we do is store intermediate values when we execute the full evaluations at the beginning of the sum-check, and then we simply reuse them later, significantly cutting down evaluation costs!
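
To make the reuse concrete, here’s a hedged sketch of how a prover might keep those intermediate tables around across sum-check rounds (the field, data, and challenges are all made up for illustration):

```python
P = 97
table = [2, 3, 5, 8, 1, 4, 9, 7]  # f over {0,1}^3, hypothetical values
challenges = [12, 34, 56]         # r_1, r_2, r_3, hypothetical

for r in challenges:
    half = len(table) // 2
    # After folding in r_i, table[w] = g(r_1, ..., r_i, w) for every
    # boolean suffix w: exactly the evaluations the next round needs.
    table = [((1 - r) * table[j] + r * table[half + j]) % P
             for j in range(half)]

print("g(r_1, r_2, r_3) =", table[0])
```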

This gained efficiency mainly helps keep the prover’s runtime manageable during the sum-check protocol, or really, in any other context where we encode data in this binary fashion.

Cool! We’ve now covered most of the tricks behind MLEs. But before we go, we’ll need to talk about just one more little thing, which happens to be a fundamental tool we’ll use a lot moving forward.

The Schwartz-Zippel Lemma

The insight we used moments ago to prove uniqueness — that a non-zero polynomial of degree d can have at most d roots — is actually a special case of a more general and powerful result we’ll use many times throughout the series, called the Schwartz-Zippel lemma. And now’s the perfect time to present it!

Here’s the idea: if p(X_1, ..., X_v) is a non-zero polynomial of total degree d over a field \mathbb{F}, and we pick random values r_1, r_2, ..., r_v from \mathbb{F}, what’s the probability that we happen to land on a root?

This probability turns out to be at most:

\textrm{Pr}[p(r_1, ..., r_v) = 0] \leq d / |\mathbb{F}|

As long as d is much smaller than the size of the finite field, then this probability is negligible! Thus, if we choose a random set of inputs, and we actually get p(r_1, ..., r_v) = 0, then there’s a really high chance that the original polynomial was the constant polynomial p(X_1, ..., X_v) = 0!

This seems simple, but it’s actually a super powerful tool. One very simple application is showing that two polynomials are identical: given p(X) and q(X), we can define h(X) as:

h(X) = p(X) - q(X)

When we select a random point r^*, and check that p(r^*) = q(r^*), we’re in fact checking that h(r^*) = 0, which means, with high probability, that p(X) equals q(X).
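
A minimal sketch of this probabilistic identity test, over a toy field (the prime and the two polynomials are assumptions for illustration):

```python
import random

P = 2**31 - 1  # a Mersenne prime standing in for a large field

def p(x1, x2):
    return (x1 * x2 + 3 * x1 + 7) % P

def q(x1, x2):
    # The same polynomial, just written differently: x1*(x2 + 3) + 7
    return (x1 * (x2 + 3) + 7) % P

r = (random.randrange(P), random.randrange(P))
if p(*r) == q(*r):
    # By Schwartz-Zippel, this conclusion is wrong with probability
    # at most d / |F| = 2 / (2^31 - 1)
    print("p and q are (almost certainly) the same polynomial")
```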

This should be very reminiscent of the reasoning we used during the soundness analysis of the sum-check protocol. Although we didn’t say it explicitly, we used the Schwartz-Zippel lemma!

We’ll use this a lot, so definitely keep this one in mind!

Summary

Alright, that was probably a lot! Let’s take stock of what we’ve learned today.

First, the boolean hypercube works well as an indexing strategy. We could leverage that to encode information, since the index could serve as an input of some function, and the data we wanted to encode could be its output.

Then, to use this function in sum-checking, we identified it needed to be extended. This is how we arrived at multilinear extensions: unique extensions of these functions no longer bounded to a boolean hypercube.

Though I didn’t mention it explicitly, it’s very easy to construct these MLEs. The Lagrange bases we used are super simple. For this reason, MLEs are the superior choice when extending functions!

With MLEs, we can take just about any dataset — a database, a vector, a matrix — and encode it as a polynomial that can somehow work well with sum-checks.

Obviously, there’s something we haven’t addressed yet: what role do sum checks play in all this?

I mean, encoding data is fine and everything, but where’s the computation?

That’s where circuits come in. And that will be our next stop, where things will finally start clicking into place!

See you on the next one!
