This is my attempt to answer a question I have asked myself many times over the last few months: What is a thunk in an Excel lambda function?
Background
Many of the new dynamic array functions that create arrays, such as MAKEARRAY, SCAN, REDUCE and so on, will not allow an element of the array created to contain an array.In short, an array of arrays is not currently supported.
As an example, consider the SCAN function. The description on the support site says
Scans an array by applying a LAMBDA to each value and returns an array that has each intermediate value
To show you what this means, consider the array of 10 integers created by SEQUENCE(10):
Scan takes this form:
=SCAN ([initial_value], array, lambda(accumulator, value))
A very simple SCAN function can iterate through each item in that array of 10 integers and apply some function to it.
The function that’s used as the third parameter is commonly seen like this:
LAMBDA(a,b,(some calculation involving a, b or both))
Where a is the “accumulator”, which is another way of saying it’s the result from this function during the previous iteration (the previous row of the array), and b is the value in the current row of the array passed in to scan.
The initial_value is there so that the accumulator can be given a value during the first iteration.
To see how this works, let’s look at a simple example:
=SCAN(0,A1#,LAMBDA(a,b,a+b))
The initial_value for the accumulator a is zero. The array is the dynamic array in cell A1, which as we’ve seen is SEQUENCE(10), and the function is:
LAMBDA(a,b,a+b)
SCAN starts on on the first row of array. It sets a to be equal to the initial_value, which is zero. b is the value from row 1 of array, which in this example is 1. So, the function returns a+b=0+1=1 and the first output row is 1.
SCAN then moves to the next row. On row 2, a=(the result of the function from the prior row)=1, b=2, so a+b=1+2=3.
Similarly, on row 3, a=3, b=3, and a+b=6.
SCAN continues in this way until row 10, where a=45, b=10 and a+b=55, which of course is just the sum of the integers from 1 to 10.
So what does all this have to do with thunks? Well so far not much. Because we’ve only been using simple addition.
Things get complicated when the value we want to put in the output array is an array itself.
Enter arrays
What if we wanted to use SCAN to create an array of arrays of letters for which each row has an array of 1 row and a number of columns determined by the value on the current row of SEQUENCE(10)?
The first row would have an array with 1 row and 1 column: {“A”}
The second row would have an array with 1 row and 2 columns: {“A”,”B”}
And so on.
We can create such a function on the first row and drag it down to the 10th row:You might think that we can just use SCAN to create the array on each row and output it in a single dynamic array.=MAKEARRAY(1,$B26,LAMBDA(r,c,CHAR(64+c)))
The problem here is that SCAN does not allow for the result of an iteration to be an array. The result must be a value.
As you can see below, if we try to use this MAKEARRAY function inside the SCAN’s lambda function, it doesn’t work:
In the formula:
=SCAN(0,A1#,LAMBDA(a,b,MAKEARRAY(1,b,LAMBDA(r,c,CHAR(64+c)))))
The calculation in the lambda function within SCAN is the MAKEARRAY function. Unsurprisingly, it makes an array. The result of this lambda is, at each iteration, suppose to be assigned to the accumulator a. But since the result of this lambda is an array, it cannot be assigned to the accumulator, and so we get a #CALC! error.
It turns out that this problem of not being able to assign an array to an output of certain functions is quite common.Thunk to the rescue!
This is a thunk:
LAMBDA(x,LAMBDA(x))
It’s a lambda function with one parameter containing a lambda function with no parameters.
The parameter of the outer lambda – x – can be anything we want it to be. A text string, an integer, a decimal, a date, an array, another lambda function, anything.
This parameter is passed into the calculation section of the outer lambda. The calculation is a lambda, which I’m going to refer to as the inner lambda. This inner lambda has no parameters. Just a calculation.
The way we can think about this thunk is we pass a parameter into this outer lambda and it stores the parameter inside the inner lambda. It doesn’t do anything to it. Just puts it there and leaves it there for us to use later. This is particularly useful if that parameter happens to be a function itself, but we’ll get to that in another post.
What we need to remember right now is that it puts that parameter inside that inner lambda, and the inner lambda holds on to it.
Let’s take a look at this thunk thing.
If we just use a plain thunk in a cell, it gives us a #CALC! error.
This is perhaps not surprising, as we know that when we use a lambda in a cell, we need to provide the parameters to that lambda in parentheses at the end of the formula. So let’s try that:
So we’ve provided a value for x – the parameter of the outer lambda. But it’s still returning a #CALC! error.
Well, yes and no. The truth is it’s showing us a #CALC! error, but hiding behind that error is the value LAMBDA(“hello”) – i.e. a parameter-less lambda function with a calculation equal to the value of the outer lambda!
Well, that’s great and all. But perhaps not immediately obvious why it’s of any use.
The thing about calling a lambda function is that you must complete the formality of providing the parentheses for the parameters – even if there are no parameters.
Look what happens when we add an open parenthesis and close parenthesis:
So we’ve got the parenthetical “hello” as the parameter to the outer lambda and for retrieving the value from the inner lambda, we have an empty parenthetical.
The effect of adding this empty parenthetical to the end of the formula is to evaluate the inner lambda and retrieve the value being stored in it. In this case, it’s just the word “hello”.
Let’s try it with an array.
We pass a 5-row array for the parameter x of the outer lambda: It returns a #CALC! error as before, but remember that hiding behind that error is the array itself.When we add the empty parenthetical, we get the array:
So, we can store this array in the inner lambda and retrieve it with this empty parenthetical.
This is where things start to get interesting.
Array of thunks
Let’s jump back to the 10-integer array.
What we’re going to do here is use our new found information about thunks to use SCAN to create an array of thunks.
=LET(
_thunk,LAMBDA(x,LAMBDA(x)),
_thunks,SCAN(0,$A$1#,LAMBDA(a,b,_thunk(MAKEARRAY(1,b,LAMBDA(r,c,CHAR(64+c)))))),
_thunks)
First, we’re using LET to define a single thunk. It’s just the same formula as described above. An outer lambda with a single parameter and inner lambda with no parameters.
Next, we’re using SCAN. We’re going to scan through the array in cell A1 again. Similarly to before, we’ll have zero as the initial_value and we’ll define that familiar lambda with parameters a and b. This time, however, we are going to take that MAKEARRAY function and use it as the parameter x of the thunk.
As we saw above, the thunk will take that MAKEARRAY function and put it inside the inner lambda, where it will be treated as a value.
Because it’s treated as a value, it can be used in the SCAN lambda. That “value” will of course return #CALC! for each row until we provide the empty parenthetical, so the result of this SCAN looks a lot like an array of #CALC! errors:
But remember, each one of those #CALC! errors is actually a single thunk. And each one of those thunks contains that MAKEARRAY function. And we can evaluate, or activate, that MAKEARRAY, by adding an empty parenthetical to the end of the formula.
Take a look at this:
=TRANSPOSE(LET(
_thunk,LAMBDA(x,LAMBDA(x)),
_thunks,SCAN(0,$A$1#,LAMBDA(a,b,_thunk(MAKEARRAY(1,b,LAMBDA(r,c,CHAR(64+c)))))),
INDEX(_thunks,10,1))())
In this function, we’re using INDEX to get the 10th row from the array of thunks, then using the empty parenthetical to retrieve the array from the thunk, and finally wrapping the whole thing in TRANSPOSE. The result is a vertical array of the first 10 letters in the alphabet.
Again, not super useful yet. But now that we know how to get the array of letters from the 10th thunk, it’s just a few steps further to get ALL of the arrays from ALL of the thunks.
=LET(
_thunk,LAMBDA(x,LAMBDA(x)),
_thunks,SCAN(0,$B$26#,LAMBDA(a,b,_thunk(MAKEARRAY(1,b,LAMBDA(r,c,CHAR(64+c)))))),
_cols,MAX(SCAN(0,_thunks,LAMBDA(a,b,COLUMNS(b())))),
_out,MAKEARRAY(ROWS(_thunks),_cols,LAMBDA(r,c,INDEX(INDEX(_thunks,r,1)(),1,c))),
IFERROR(_out,""))
All that’s been done here is to build a rectangular array that is as wide as the widest array from the array of thunks. The array is then populated by each of the arrays in thunks in the array of thunks.
See where the empty parenthetical is? It’s attached to that inner call to INDEX. It’s there because that call to INDEX is grabbing a single element from _thunks, which is an array of thunks, which means that each element is a thunk and… you guessed it, we have to activate that thunk with the empty parenthetical.
The outer call to INDEX is then retrieving individual elements from each row’s array and placing them in the proper column in the output array.
In summary
So that’s it for this introduction to thunks and I hope it’s answered the question posed at the beginning of this post: “What is a thunk in an Excel lambda function?”
If you’d like to go away with a short answer to the question, try this:
A thunk is a parameter-less lambda where we can store complex values until we need them
Mohamed Helmy says:
Thank you very much for this simple and beautiful explanation
The first time I understand this topic in a better way than before
OP says:
Thank you for the encouraging words, Mohamed. I appreciate it!
Rathinagiri Subbiah says:
Great explanation!
OP says:
Thanks!
Chris Graham says:
This is very cool! I’m trying to implement the lower triangular Cholesky decomposition of an array without much success as the Cholesky–Banachiewicz algorithm (https://en.wikipedia.org/wiki/Cholesky_decomposition#The_Cholesky%E2%80%93Banachiewicz_and_Cholesky%E2%80%93Crout_algorithms) depends on prior values of the triangle it previously calculated. I’ve tried MAP and MAKEARRAY and neither seem to be able to refer back on prior values they calculated.
Do you think a thunk would work here to get the sum variable from the k loop?
OP says:
Hi Chris
Sorry for the *very* late reply here. Silly me didn’t realize comments needed to be approved.
Did you come up with a resolution in the end? Would be interesting to see your LAMBDA implementation of that algorithm. I bet it’s a beast!
Cheers
Owen
Sam says:
Hi Owen – How do you extend this to solve the below problem
A1 : = “Bruce,Wayne”
A2 : = “Clark,Kent”
In B1 =TEXTSPLIT(A1:A2,”,”) produces an array of arrays, how do we produce a 2 x 2 array of FirstName Last Name
In C1 = BYROW(A1:A2,LAMBDA(r,TEXTJOIN(“,”,,r))) also produces a array of array
How can we handle this via a generic ThunkLamda = LAMDBA(x,LAMDBA(X))
I know we can solve the above problem using TEXTJOIN + TEXTSPLIT combo or via a REDUCE + VSTACK pattern however – I am keen to learn if there is a generic ThunkLamda function that can be used.
Cheers
Sam
OP says:
Great question, Sam.
Sorry for the late reply here. Not sure whether a thunked expression is necessary here. This seems to work:
=LET(
_arrayText, ARRAYTOTEXT(A1:A2, 0),
_splitText, TRIM(TEXTSPLIT(_arrayText, “,”)),
WRAPROWS(_splitText, 2)
)
Nguyen Duc Thanh says:
I need to slightly modify your non-thunk formula because of my Excel Settings for it to work:
=LET(
_arrayText; SUBSTITUTE(ARRAYTOTEXT(A1:A2; 0); ";"; ",");
_splitText; TRIM(TEXTSPLIT(_arrayText; ","));
WRAPROWS(_splitText; 2)
)
OP says:
Great, thank you! That will be helpful for other users, though I’m surprised that ARRAYTOTEXT doesn’t automatically read the locale settings. Cheers
Sam says:
I got it to work !
=LET(
vThunk, LAMBDA(x,LAMBDA(x)),
vThunks,BYROW(A1:A2,LAMBDA(r,vThunk(TEXTSPLIT(r,”,”)))),
vRows,ROWS(vThunks),
vCols,REDUCE(0,vThunks,LAMBDA(acc,curr,COLUMNS(curr()))),
vResult,MAKEARRAY(vRows,vCols,LAMBDA(r,c,INDEX(INDEX(vThunks,r,1)(),1,c))),
vResult)
Cheers
Sam
OP says:
Works like a charm! I replied to your other message with a non-thunk solution if it’s of interest. Cheers 👍
Peter Bartholomew says:
“A thunk is a parameter-less lambda where we can store complex values until we need them”
A nice definition! The catch is that it is a function (lazy evaluation) and every time you require an element of an array, the entire array is re-evaluated and the element extracted from the result. This can be extremely expensive in terms of compute time. Useful nevertheless.
OP says:
Very good point, Peter. I haven’t done extensive testing on the laziness of LAMBDA, but I seem to recall that in my initial investigation (many months ago), LET isn’t as lazy as I expected and identifiers are evaluated whether they’re used or not. On the flip side of course, it’s better to evaluate once and not need it then evaluate many times for each element. A tricky subject unfortunately and the documentation from Microsoft is not forthcoming on these things.
Prateek says:
How can I use thunk with RANK.AVG function. In column A, I’m having positions (suppose E1,E2,G1,G6,E1,G6,G6,E2) and column B, I’m having scores (3,10,10,7,6,1,9,8). I want to calculate the rank of each row based out of their positions.
OP says:
Hi Prateek
I’m afraid I don’t fully understand your question. It doesn’t sound like thunks are necessary here, but can you post some image of the problem? I will try to help.
Owen
Keith says:
Interesting technique. I’ve been trying to adapt it to a challenging scenario where I need to have access to a running list of values during a SCAN function (looking up a calculated value from a previous row). I tried using MAKEARRAY in the LET outside of the SCAN to prepare an array of thunks the size of my working set, then within the SCAN I attempt to execute a thunk from a previous index to retrieve the prior value, and before the scan finishes I attempt to save the new value to the thunk at the current index in the thunks array. It’s not working, but maybe I’m missing something.
Here is a simplified formula trying to store and retrieve the values:
=LET(
nums,SEQUENCE(10),
values,MAKEARRAY(ROWS(nums),1,LAMBDA(r,c,LAMBDA(x))),
data,SCAN(0,nums,LAMBDA(a,v,INDEX(values,v)(a))),
values)
This version gives me an array of #CALC, but when I try values() as last argument I get a single #REF error. If I try wrapping a MAP around it to add the empty parenthesis after each item, Excel won’t even accept the formula.
Thanks for taking a look.
Owen says:
Hi Keith
Your MAKEARRAY call refers to x – but x isn’t defined anywhere else.
If you’re looking to have reference to previous (or next) rows in a SCAN operation, you can access them with something similar to this:
https://gist.github.com/ncalm/e9c1b7f6ffa5f5ffbf347e956bac4b2e
Mokhtar Moussa says:
Hello,
Great read, thank you for the article Owen.
I’ve been searching for a way to use scan in a way where it accesses the previous iteration input as in the lambda function.
What I’m trying to do (if it makes sense at all) is something like
scan(0, array, Lambda(accumulator a, currentValue b, previousValue c, If(b=c, a+b, a)))
Thank you for your thoughts in advanced.
OP says:
Hi Mokhtar, generally you can do this by scanning over a sequence of the same length as the array. Have a read of this post.