Blog
2026-04-20
This is an article that I wrote for
Chalkdust issue 23, and the
new puzzle it introduces appears on the cover of issue 23.
In January, I paid a visit to MathsWorld, the recently opened maths discovery centre
in London, alongside some other members of the Chalkdust team.
One of the highlights of the trip was playing the two-player game Genius Square.
In Genius Square, you start with a six-by-six board and roll seven dice. These dice tell you where
to place seven cylindrical blocks, for example:
The two players then race to fit the pieces shown above into the remaining space on the
board. The pieces that the players have are
the five tetrominoes,
the two triominoes,
a domino,
and a single square (or monomino); these are all the shapes you can
make by gluing together up to four squares (if rotations and reflections are considered the same
shape).
If you want to ruin/improve your copy of Chalkdust, you could cut out the pieces shown above and
try to fit them in the board to the left.
There's some clever design in this game:
if, instead of rolling the dice, you were to randomly pick any set of seven spaces to place the cylinders,
the puzzle is not guaranteed to have a solution.
The locations printed on the dice have been carefully chosen so that any combination that you can roll
leads to a solvable puzzle.
A puzzle-a-day
The Genius Square puzzle is similar to another rearrangement puzzle: the puzzle-a-day calendar,
created by the Norwegian puzzle makers DragonFjord.
In this puzzle, you are given the pieces below
and asked to place them on the board to cover everything except
today's date. For example, on 22 July, you could place the pieces like this:
DragonFjord make and sell wooden and plastic versions of puzzle-a-day, which
you can buy from Maths Gear—who
also provide the top prize for the crossnumber—to
avoid the cost of shipping directly from Norway.
In puzzle-a-day, it's possible to arrange the pieces to make every single combination
of a number and a month, including days that don't exist like 31 September and 30 February.
While we were considering options for the cover of this issue, we discussed putting something
like Genius Square on the cover, and I began to wonder
if it would be possible to make
a puzzle like puzzle-a-day but where it was only possible to make days that actually appear on the
calendar.
A new puzzle
After spending a while scribbling on squared paper and getting nowhere,
I had an idea: I could put the months in regions that were disconnected from the day numbers. Then,
by carefully choosing the shape of the month regions and the arrangement of the dates, I could force
the solver to use different combinations of pieces on the day numbers for different months.
Once I'd had this idea, I threw together some Python code that could see which day numbers
you could and couldn't leave uncovered with a set of pieces, and waited for it to find a good
set of pieces. It found this board and these pieces:
As in Tetris, I've named the pieces after letters that they vaguely resemble.
In January, March, May, July, August, October and December, you have to use
a P, an O and the A in the month regions.
The remaining pieces can make any day from 1 to 31.
In April, June, September and November, you need to use the C, an O and the A in the month regions.
This leaves pieces that can make any day from 1 to 30, but importantly can't make 31.
In February, you need to use both Os and the C in the month regions.
This leaves pieces that can make any day from 1 to 29, but not 30 or 31.
Now all we need to do is find another new arrangement that somehow works differently in leap and non-leap years...
(Click on one of these icons to react to this blog post)
You might also enjoy...
Comments
Comments in green were written by me. Comments in blue were not written by me.
Add a Comment
2025-09-06
Recently, Matt Parker released a video about a puzzle related to the year 2025:
due to 2025 being the square of a triangle number, the following fact is true:
$$
1^3+2^3+3^3+4^3+5^3+6^3+7^3+8^3+9^3
=
(1+2+3+4+5+6+7+8+9)^2
= 2025.
$$
This fact can be rexpressed as: the total area of
one 1×1 square, two 2×2 squares, three 3×3 squares and so on up to nine 9×9 squares
is the same as the area of a 45 by 45 square. This leads to a question: is it possible to arrange this large collection of squares to make the larger square?
The general form of this puzzle (where we sum to \(n\) rather than to 9) is called the partridge puzzle. It was named this by Robert Wainwright
as the version with \(n=12\) reminded him of the total number of gifts in the 12 days of Christmas (although this link isn't exact as the total number of gifts is
12×1 + 11×2 + 10×3 + ... + 1×12 rather than the sum of the cubes).
For Matt's video, I made an interactive tool that lets you arrange the pieces and attempt to solve the problem: you can play with it at mscroggs.co.uk/squares.
How many solutions?
When \(n=1\), the question becomes the very boring "can you arrange a 1×1 square to make a 1×1 square?". The answer is clearly "yes".
For \(n=2\) and \(n=3\), you should be able to convince yourself that it's impossible. It's harder to convince youself what's going on for larger value of \(n\), but I can tell you that
for \(n=4\) there are no solutions. Similarly for \(n=5\), \(n=6\) and \(n=7\) there are no solutions.
You may be starting to think that for any \(n\) except 1 there won't be solutions, but surprisingly there are 18656 solutions for \(n=8\) (or 2332 solutions if you count rotations and
reflections as the same solution). For \(n=9\) (the 2025 version of the puzzle), there are also a lot of solutions. I wrote some code for Matt to find them all: there are 1730280 of them (or 216285
if you count rotations and reflections as the same solution). You can download a zip file containing all the solutions from Zenodo.
Let me know if you do anything interesting with these solutions.
None of the solutions for \(n=8\) or \(n=9\) has rotational or reflectional symmetry. I conjecture that there are no symmetric solutions for any \(n\) greater than this: it's reasonably easy to explain why there can never be a solution with rotational symmetry (unless \(n=1\)), but I haven't yet found a good justification for why there
aren't reflectionally symmetric solutions.
For \(n=10\), it is currently unknown how many solutions there are and so the OEIS sequence
(that gives the counts if rotations and reflections count as the same solution) stops at \(n=9\). My code that generated all the solution for \(n=9\) took around a week to find
all the solutions, so very much isn't capable of working out the number of solutions for \(n=10\).
Heat maps
Once I had the list of all solutions, I decided to make some heat maps to show where each piece
was most commonly placed. Here's the heat map for the 1×1 square:
Heat map of the location of the 1×1 square in the puzzle for \(n=9\): white squares will never contain the 1×1 square; the darker the red, the more likely the position is to contain the 1×1 square.
The amount of white (or near-white) in the plot surprised me: there's some positions that the 1×1 squares is placed in a lot and it nearly never ends up in many places. Here's the heat maps for the 2×2 to 9×9 squares:
Heat maps of the locations of the 2×2 to 9×9 squares for \(n=9\)
(Click on one of these icons to react to this blog post)
You might also enjoy...
Comments
Comments in green were written by me. Comments in blue were not written by me.
Hi,
I am the author of the OEIS sequence. It's a pity that the sequence was not mentioned in Matt Parker's video.
Earlier this year I've made some analysis of the solutions: https://habr.com/ru/articles/889958/
In particular, there are solutions where all squares from 1 to 9 stack in one row or column (1+2+...+9 = 45).
As for the symmetry, the proof is the following: The symmetry could be horizontal (which is nearly the same as vertical) or diagonal.
In case of horizontal, the square of size 1 must be located on the center line. It will be either near the wall, or between 2 larger squares, that are centered on the center line. In both cases a lane of width 1 arises, that cannot be filled with any other square.
In case of diagonal, the square of size 1 must be on the diagonal and at first sight there is no lane of width 1. But, as long as you put all diagonal squares and then any square adjacent to the square of size 1, such a lane arises.
Your heatmap for size 1 is great!
I am the author of the OEIS sequence. It's a pity that the sequence was not mentioned in Matt Parker's video.
Earlier this year I've made some analysis of the solutions: https://habr.com/ru/articles/889958/
In particular, there are solutions where all squares from 1 to 9 stack in one row or column (1+2+...+9 = 45).
As for the symmetry, the proof is the following: The symmetry could be horizontal (which is nearly the same as vertical) or diagonal.
In case of horizontal, the square of size 1 must be located on the center line. It will be either near the wall, or between 2 larger squares, that are centered on the center line. In both cases a lane of width 1 arises, that cannot be filled with any other square.
In case of diagonal, the square of size 1 must be on the diagonal and at first sight there is no lane of width 1. But, as long as you put all diagonal squares and then any square adjacent to the square of size 1, such a lane arises.
Your heatmap for size 1 is great!
Danila P.
@Oleg:
The includes got filtered:
Util.h
//#include [bits/stdc++.h] // not including all
#include [filesystem] // just include what's needed
#include [array] // just include what's needed
#include [mutex] // just include what's needed
:)
The includes got filtered:
Util.h
//#include [bits/stdc++.h] // not including all
#include [filesystem] // just include what's needed
#include [array] // just include what's needed
#include [mutex] // just include what's needed
:)
Lord Sméagol
@Oleg:
I removed my macros:
#define __tzcnt_u32(v) ((v) ? (_tzcnt_u32(v)) : (32))
#define __lzcnt32(v) ((v) ? (_lzcnt_u32(v)) : (32))
replacing them with simple inline code
Util.h
//#include // not including all
#include // just include what's needed
#include // just include what's needed
#include // just include what's needed
#if 1 // use safe localtime
struct tm buf; // use safe localtime
auto err = localtime_s(&buf, &cur_time); // use safe localtime
return std::put_time(&buf, "%F %T"); // use safe localtime
#else // use safe localtime
return std::put_time(std::localtime(&cur_time), "%F %T");
#endif // use safe localtime
State.h
changed _mm_set_epi8(0x80 to -0x80 to stop warnings
inline replacement:
//int i = __tzcnt_u32(mask); // for no BMI; without zero test, as not needed here
int i = _tzcnt_u32(mask); // for no BMI; without zero test, as not needed here
inline replacement:
//int last_idx_before_mid = 31 - __lzcnt32(off_mask); // for no BMI; without zero test, as not needed here
int last_idx_before_mid = _lzcnt_u32(off_mask); // for no BMI; without zero test, as not needed here
Solver.h
inline replacement:
//return ini.size(); // to stop warning
return (int)ini.size(); // to stop warning
inline replacement:
//const int dim = __tzcnt_u32(mask); // for no BMI; without zero test, as not needed here
const int dim = _tzcnt_u32(mask); // for no BMI; without zero test, as not needed here
I tried '9' runs: with asserts: 10:31, without: 10:18 (saved 2%)
A minute slower than the faulty version, but still not too bad for a 2013 (Q3) CPU :)
I removed my macros:
#define __tzcnt_u32(v) ((v) ? (_tzcnt_u32(v)) : (32))
#define __lzcnt32(v) ((v) ? (_lzcnt_u32(v)) : (32))
replacing them with simple inline code
Util.h
//#include // not including all
#include // just include what's needed
#include // just include what's needed
#include // just include what's needed
#if 1 // use safe localtime
struct tm buf; // use safe localtime
auto err = localtime_s(&buf, &cur_time); // use safe localtime
return std::put_time(&buf, "%F %T"); // use safe localtime
#else // use safe localtime
return std::put_time(std::localtime(&cur_time), "%F %T");
#endif // use safe localtime
State.h
changed _mm_set_epi8(0x80 to -0x80 to stop warnings
inline replacement:
//int i = __tzcnt_u32(mask); // for no BMI; without zero test, as not needed here
int i = _tzcnt_u32(mask); // for no BMI; without zero test, as not needed here
inline replacement:
//int last_idx_before_mid = 31 - __lzcnt32(off_mask); // for no BMI; without zero test, as not needed here
int last_idx_before_mid = _lzcnt_u32(off_mask); // for no BMI; without zero test, as not needed here
Solver.h
inline replacement:
//return ini.size(); // to stop warning
return (int)ini.size(); // to stop warning
inline replacement:
//const int dim = __tzcnt_u32(mask); // for no BMI; without zero test, as not needed here
const int dim = _tzcnt_u32(mask); // for no BMI; without zero test, as not needed here
I tried '9' runs: with asserts: 10:31, without: 10:18 (saved 2%)
A minute slower than the faulty version, but still not too bad for a 2013 (Q3) CPU :)
Lord Sméagol
@Oleg: Happy new year!
I just added this:
#if 0
int last_idx_before_mid = 31 - __lzcnt32(off_mask); // 31 - LZCNT ==> index of MSb
#else
// if off_mask can never be zero, no need for check to override BSR result
assert(off_mask);
// a '9' run didn't reveal any 0 [you would know for sure for other sizes]
// need unsigned long result
unsigned long last_idx_before_mid;
// get index of MSb [no need for adjustment if off_mask can never be zero]
_BitScanReverse(&last_idx_before_mid, off_mask);
#endif
a run of '9' now produces the correct result: 1,730,280 :)
I just added this:
#if 0
int last_idx_before_mid = 31 - __lzcnt32(off_mask); // 31 - LZCNT ==> index of MSb
#else
// if off_mask can never be zero, no need for check to override BSR result
assert(off_mask);
// a '9' run didn't reveal any 0 [you would know for sure for other sizes]
// need unsigned long result
unsigned long last_idx_before_mid;
// get index of MSb [no need for adjustment if off_mask can never be zero]
_BitScanReverse(&last_idx_before_mid, off_mask);
#endif
a run of '9' now produces the correct result: 1,730,280 :)
Lord Sméagol
@Lord Sméagol: Hello and happy New Year!
9 minutes is cool!
The answer is wrong because of _lzcnt instruction, as you suspected, as turns out it works differently on different cpus: https://nextmovesoftware.com/blog/2017...
With this error, solutions having 1x1 square directly in the center are not counted.
I guess, gcc/clang do it correctly because I specify -march=native (so it checks cpu and generates correct instruction), and run where I compile. But it's a potential problem I probably need to add some assertions to the code.
Maybe on your hardware you can either use WSL and clang compiler, or set constexpr bool USE_SSE_QUADRANT_FILL=false, to fall back to slower.
You could also try to use BitScanReverse instead of __lzcnt, but it has different input/output so I'm not sure how hard would that be to fix it.
9 minutes is cool!
The answer is wrong because of _lzcnt instruction, as you suspected, as turns out it works differently on different cpus: https://nextmovesoftware.com/blog/2017...
With this error, solutions having 1x1 square directly in the center are not counted.
I guess, gcc/clang do it correctly because I specify -march=native (so it checks cpu and generates correct instruction), and run where I compile. But it's a potential problem I probably need to add some assertions to the code.
Maybe on your hardware you can either use WSL and clang compiler, or set constexpr bool USE_SSE_QUADRANT_FILL=false, to fall back to slower.
You could also try to use BitScanReverse instead of __lzcnt, but it has different input/output so I'm not sure how hard would that be to fix it.
Oleg
@Oleg: I pulled your code into Visual Studio 2026 and dealt with some warnings:
The COLLAPSES initializer was easy: Use (char) for 0x80
I changed unsafe localtime to:
struct tm buf;
auto err = localtime_s(&buf, &cur_time);
return std::put_time(&buf, "%F %T");
I did a quick change of __tzcnt_u32, __lzcnt32 to use _tzcnt_u32, _lzcnt_u32
Setting the compiler to use AVX and optimize for speed.
An '8' run worked, so I tried '9'
And it found only 1,729,930 solutions!
I suspected _tzcnt_u32, _lzcnt_u32 might be causing it, so I covered that:
#define __tzcnt_u32(v) ((v) ? (_tzcnt_u32(v)) : (32)) // Match BMI : should return 32 for value 0
#define __lzcnt32(v) ((v) ? (_lzcnt_u32(v)) : (32)) // Match BMI : should return 32 for value 0
but it didn't fix it.
Anyway, I decided to test multi-threading.
First I hunted for the initial depth sweet spot:
>Puzzle_Oleg.exe run 9 8 24
>Puzzle_Oleg.exe run 9 9 24
...
>Puzzle_Oleg.exe run 9 18 24
>Puzzle_Oleg.exe run 9 19 24
And found that 15 was fastest. [It didn't like 19]
'9' run [with the 1,729,930 problem] on Xeon E5-2697-v2
12t: 11m 52s
24t: 9m 1s
so HyperThreading is helping by about 48%
The COLLAPSES initializer was easy: Use (char) for 0x80
I changed unsafe localtime to:
struct tm buf;
auto err = localtime_s(&buf, &cur_time);
return std::put_time(&buf, "%F %T");
I did a quick change of __tzcnt_u32, __lzcnt32 to use _tzcnt_u32, _lzcnt_u32
Setting the compiler to use AVX and optimize for speed.
An '8' run worked, so I tried '9'
And it found only 1,729,930 solutions!
I suspected _tzcnt_u32, _lzcnt_u32 might be causing it, so I covered that:
#define __tzcnt_u32(v) ((v) ? (_tzcnt_u32(v)) : (32)) // Match BMI : should return 32 for value 0
#define __lzcnt32(v) ((v) ? (_lzcnt_u32(v)) : (32)) // Match BMI : should return 32 for value 0
but it didn't fix it.
Anyway, I decided to test multi-threading.
First I hunted for the initial depth sweet spot:
>Puzzle_Oleg.exe run 9 8 24
>Puzzle_Oleg.exe run 9 9 24
...
>Puzzle_Oleg.exe run 9 18 24
>Puzzle_Oleg.exe run 9 19 24
And found that 15 was fastest. [It didn't like 19]
'9' run [with the 1,729,930 problem] on Xeon E5-2697-v2
12t: 11m 52s
24t: 9m 1s
so HyperThreading is helping by about 48%
Lord Sméagol
Add a Comment















