Here’s a data challenge I saw on a LinkedIn post:
My goal was to produce the matrix using Python in Excel.
The first thing to note is that the matrix is entirely composed of the integers 0 through 9. So:
n = list(range(10))
You can see in the image above that the larger matrix is composed of smaller chunks of 5 integers. All of the grey boxes are the integers 5 through 9 either in order or in reverse order. The yellow boxes are the integers 0 through 4, again in order or in reverse order. So we can define two building blocks x and y. In the range A2:E2 are the integers 9, 8, 7, 6, 5. This can be x:
n = list(range(10))
# the integers 5 - 9
x = n[-5:]
And in F2:I2 are the integers 0,1,2,3,4. This can be y:
n = list(range(10))
# the integers 5 - 9
x = n[-5:]
# the integers 0 - 4
y = n[5:]
Let’s look at the first row in the matrix:
In terms of x and y, this row is composed of:
- reverse of x
- y
- x
- reverse of y
Remembering that to reverse a list we can slice it with [::-1], we can define the first row:
n = list(range(10))
x = n[-5:]
y = n[:5]
row1 = x[::-1] + y + x + y[::-1]
row1 is repeated 5 times, then we have a different pattern:
That is:
- reverse of y
- x
- y
- reverse of x
So row 2 can be defined as:
n = list(range(10))
x = n[-5:]
y = n[:5]
row1 = x[::-1] + y + x + y[::-1]
row2 = y[::-1] + x + y + x[::-1]
row2 is repeated 5 times, then there’s another pattern:
We’ll call this row3 and it’s comprised of:
- x
- reverse of y
- reverse of x
- y
Expressed as:
n = list(range(10))
x = n[-5:]
y = n[:5]
row1 = x[::-1] + y + x + y[::-1]
row2 = y[::-1] + x + y + x[::-1]
row3 = x + y[::-1] + x[::-1] + y
Finally, row4:
- y
- reverse of x
- reverse of y
- x
n = list(range(10))
x = n[-5:]
y = n[:5]
row1 = x[::-1] + y + x + y[::-1]
row2 = y[::-1] + x + y + x[::-1]
row3 = x + y[::-1] + x[::-1] + y
row4 = y + x[::-1] + y[::-1] + x
These rows are lists. We need 5 copies of each row and then we want the whole thing as an array:
n = list(range(10))
x = n[-5:]
y = n[:5]
row1 = x[::-1] + y + x + y[::-1]
row2 = y[::-1] + x + y + x[::-1]
row3 = x + y[::-1] + x[::-1] + y
row4 = y + x[::-1] + y[::-1] + x
np.array([row1] * 5 + [row2] * 5 + [row3] * 5 + [row4] * 5)
This code gives us the correct output:
This is all well and good, but perhaps there’s a way to simplify the code.
There are a few things we can do. First, those reversal slices are somewhat repetitive – [::-1]. Python has a “slice” object which allows us to define a slice, save it as a variable, then use that variable wherever we want. Let’s do that now.
n = list(range(10))
x = n[-5:]
y = n[:5]
s = slice(None,None,-1)
row1 = x[s] + y + x + y[s]
row2 = y[s] + x + y + x[s]
row3 = x + y[s] + x[s] + y
row4 = y + x[s] + y[s] + x
np.array([row1] * 5 + [row2] * 5 + [row3] * 5 + [row4] * 5)
Next, note that row4 is the reverse of row1. Remember that those plus operators are list concatenations, not additions.
s = slice(None, None, -1)
row1 = x[s] + y + x + y[s]
reverse_the_components = x[s][s] + y[s] + x[s] + y[s][s]
which_is_the_same_as = x + y[s] + x[s] + y
which_reversed_is = y + x[s] + y[s] + x
row4 = y + x[s] + y[s] + x
Similarly, row3 is the reverse of row2. This means we can change the solution to this:
n = list(range(10))
x = n[-5:]
y = n[:5]
s = slice(None,None,-1)
row1 = x[s] + y + x + y[s]
row2 = y[s] + x + y + x[s]
row3 = row2[s]
row4 = row1[s]
np.array([row1] * 5 + [row2] * 5 + [row3] * 5 + [row4] * 5)
And further simplify to this:
n = list(range(10))
x = n[-5:]
y = n[:5]
s = slice(None,None,-1)
row1 = x[s] + y + x + y[s]
row2 = y[s] + x + y + x[s]
np.array([row1] * 5 + [row2] * 5 + [row2[s]] * 5 + [row1[s]] * 5)
But there’s more! If we think of the 10 rows represented by the duplicates of row1 and row2 as block1 (in green), and the remaining 10 rows as block2 (in red), we can see that block2 is just block1 rotated by 180 degrees. Spreadsheet row 2 is the reverse of spreadsheet row 21. Spreadsheet row 7 is the reverse of spreadsheet row 16, and so on.
Taking this into account, we can make further changes:
n = list(range(10))
x = n[-5:]
y = n[:5]
s = slice(None,None,-1)
row1 = x[s] + y + x + y[s]
row2 = y[s] + x + y + x[s]
block1 = np.array([row1] * 5 + [row2] * 5)
block2 = np.rot90(block1, k=2)
np.vstack([block1,block2])
In the code above, block1 is creating the rows encased in the green box in the image above. block2 is using the np.rot90 function to take block1 and rotate it by 90 degrees twice (thus rotating it 180 degrees). np.vstack is then vertically stacking block1 and block2. This is slightly more concise:
n = list(range(10))
x = n[-5:]
y = n[:5]
s = slice(None,None,-1)
row1 = x[s] + y + x + y[s]
row2 = y[s] + x + y + x[s]
block1 = np.array([row1] * 5 + [row2] * 5)
np.vstack([block1,np.rot90(block1, k=2)])
Things are looking more interesting, but there’s another change we can make to make the code marginally faster. The line creating block1 is using * 5 to repeat each of row1 and row2. But we don’t need to do that, because NumPy has a function called np.repeat which can achieve the same thing.
This:
n = list(range(10))
x = n[-5:]
y = n[:5]
s = slice(None,None,-1)
row1 = x[s] + y + x + y[s]
row2 = y[s] + x + y + x[s]
print(np.repeat([[row1] + [row2]], 5, axis=1))
Prints this:
Note there are several sets of square brackets around the output. This is because there is a redundant length 1 dimension.
n = list(range(10))
x = n[-5:]
y = n[:5]
s = slice(None,None,-1)
row1 = x[s] + y + x + y[s]
row2 = y[s] + x + y + x[s]
block1 = np.repeat([[row1] + [row2]], 5, axis=1)
print(f"Shape of block1={block1.shape}")
# Shape of block1=(1, 10, 20)
That first dimension can be removed with np.squeeze:
block1 = np.squeeze(np.repeat([[row1] + [row2]], 5, axis=1))
print(f"Shape of block1={block1.shape}")
# Shape of block1=(10, 20)
As a final modification, let’s note that we can think of row1 as two segments:
- x[s] + y
- x + y[s]
And that row2 is the reverse of those segments, while keeping the segments in the same order. So as a final version of this code:
n = list(range(10))
x = n[-5:]
y = n[:5]
s = slice(None,None,-1)
segment1 = x[s] + y
segment2 = x + y[s]
row1 = segment1 + segment2
row2 = segment1[s] + segment2[s]
block1 = np.squeeze(np.repeat([[row1] + [row2]], 5, axis=1))
np.vstack([block1,np.rot90(block1, k=2)])
Summary
All of the different iterations of solutions to this challenge in this post run fast enough, but I have tested them and found that this last version is the fastest of the bunch, despite being the longest.
Thanks for sticking with me through this exploration of NumPy functions and remember: there’s no prize for shortest code!