mscroggs.co.uk

Comments

Comments in green were written by me. Comments in blue were not written by me.
@Lord Sméagol: Hello and happy New Year!
9 minutes is cool!
The answer is wrong because of the _lzcnt instruction, as you suspected. It turns out that it behaves differently on different CPUs: https://nextmovesoftware.com/blog/2017...
With this error, solutions that have the 1x1 square directly in the center are not counted.

I guess gcc/clang get it right because I specify -march=native (so the compiler checks the CPU and generates the correct instruction) and I run on the machine where I compile. But it's a potential problem, so I probably need to add some assertions to the code.

Maybe on your hardware you can either use WSL with the clang compiler, or set constexpr bool USE_SSE_QUADRANT_FILL=false to fall back to the slower code path.
You could also try using BitScanReverse instead of __lzcnt, but its inputs and outputs are different, so I'm not sure how hard that would be to fix.
Oleg
on /blog/119
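
The difference discussed in this comment can be sketched with a small helper. This is a minimal illustration, not code from the solver: `msb_index` is a hypothetical name and 32-bit masks are assumed. On CPUs without the LZCNT extension, the `lzcnt` encoding is executed as `bsr`, which yields the index of the highest set bit rather than the number of leading zeros. For a non-zero input the two are related by index = 31 - (leading zero count), which is also why BitScanReverse has a different output.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Hypothetical helper (not part of the solver): index, 0..31, of the most
// significant set bit of a non-zero 32-bit mask.
inline int msb_index(std::uint32_t mask) {
    assert(mask != 0);  // BSR and clz both give undefined results for zero
#if defined(_MSC_VER)
    unsigned long idx = 0;
    _BitScanReverse(&idx, mask);      // BSR writes the bit index directly
    return static_cast<int>(idx);
#elif defined(__GNUC__)
    return 31 - __builtin_clz(mask);  // clz counts leading zeros
#else
    int idx = 0;
    while (mask >>= 1) ++idx;         // portable fallback
    return idx;
#endif
}
```

Because the helper never relies on how the hardware decodes `lzcnt`, it should give the same index regardless of the CPU it runs on.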
               
@Oleg:

The includes got filtered:

Util.h

//#include [bits/stdc++.h] // not including all
#include [filesystem] // just include what's needed
#include [array] // just include what's needed
#include [mutex] // just include what's needed

:)
Lord Sméagol
on /blog/119
               
@Oleg:

I removed my macros:
#define __tzcnt_u32(v) ((v) ? (_tzcnt_u32(v)) : (32))
#define __lzcnt32(v) ((v) ? (_lzcnt_u32(v)) : (32))
replacing them with simple inline code
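
For reference, the zero guard in the two macros above can also be written as inline functions, which evaluate their argument only once. This is a sketch under assumptions: the names `tzcnt_guarded` and `lzcnt_guarded` are hypothetical, and the bit-scan builtins stand in for the intrinsics used in the macros.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Hypothetical stand-in for the __tzcnt_u32 macro: trailing zero count,
// returning 32 for zero input.
inline int tzcnt_guarded(std::uint32_t v) {
    if (v == 0) return 32;            // bit scans are undefined for zero
#if defined(_MSC_VER)
    unsigned long idx = 0;
    _BitScanForward(&idx, v);         // index of the lowest set bit
    const int n = static_cast<int>(idx);
#elif defined(__GNUC__)
    const int n = __builtin_ctz(v);
#else
    int n = 0;
    while ((v & 1u) == 0) { v >>= 1; ++n; }
#endif
    assert(n >= 0 && n < 32);         // a non-zero input always has a set bit
    return n;
}

// Hypothetical stand-in for the __lzcnt32 macro: leading zero count,
// returning 32 for zero input.
inline int lzcnt_guarded(std::uint32_t v) {
    if (v == 0) return 32;
#if defined(_MSC_VER)
    unsigned long idx = 0;
    _BitScanReverse(&idx, v);         // index of the highest set bit
    return 31 - static_cast<int>(idx);
#elif defined(__GNUC__)
    return __builtin_clz(v);
#else
    int n = 0;
    while ((v & 0x80000000u) == 0) { v <<= 1; ++n; }
    return n;
#endif
}
```

Where the mask is known to be non-zero, as in the inline replacements listed below, the guard is redundant and can be dropped.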


Util.h

//#include // not including all
#include // just include what's needed
#include // just include what's needed
#include // just include what's needed

#if 1 // use safe localtime
struct tm buf; // use safe localtime
auto err = localtime_s(&buf, &cur_time); // use safe localtime
return std::put_time(&buf, "%F %T"); // use safe localtime
#else // use safe localtime
return std::put_time(std::localtime(&cur_time), "%F %T");
#endif // use safe localtime
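
A self-contained version of the same idea might look like the sketch below. It is an assumption-laden illustration rather than the actual Util.h code: `format_local_time` is a hypothetical name, the Windows branch uses the Microsoft CRT form of `localtime_s`, and other platforms use POSIX `localtime_r`. It formats into a `std::string` before returning, so nothing keeps a pointer to the local `buf` after the function exits (whether that matters in Util.h depends on the surrounding function, which is not shown here).

```cpp
#include <cassert>
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format a time_t in local time without using the
// shared static buffer behind std::localtime.
inline std::string format_local_time(std::time_t t, const char* fmt = "%F %T") {
    assert(fmt != nullptr);
    std::tm buf{};
#if defined(_WIN32)
    if (localtime_s(&buf, &t) != 0) return {};        // Microsoft CRT: returns an error code
#else
    if (localtime_r(&t, &buf) == nullptr) return {};  // POSIX: returns nullptr on failure
#endif
    std::ostringstream out;
    out << std::put_time(&buf, fmt);  // buf is still alive while formatting
    return out.str();
}
```

With the default format this gives a 19-character string of the form YYYY-MM-DD HH:MM:SS in the local time zone.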


State.h

Changed 0x80 to -0x80 in the _mm_set_epi8( call to stop warnings

inline replacement:
//int i = __tzcnt_u32(mask); // for no BMI; without zero test, as not needed here
int i = _tzcnt_u32(mask); // for no BMI; without zero test, as not needed here

inline replacement:
//int last_idx_before_mid = 31 - __lzcnt32(off_mask); // for no BMI; without zero test, as not needed here
int last_idx_before_mid = _lzcnt_u32(off_mask); // for no BMI; without zero test, as not needed here


Solver.h

inline replacement:
//return ini.size(); // to stop warning
return (int)ini.size(); // to stop warning

inline replacement:
//const int dim = __tzcnt_u32(mask); // for no BMI; without zero test, as not needed here
const int dim = _tzcnt_u32(mask); // for no BMI; without zero test, as not needed here


I tried '9' runs: with asserts: 10:31, without: 10:18 (saved 2%)
A minute slower than the faulty version, but still not too bad for a 2013 (Q3) CPU :)
Lord Sméagol
on /blog/119
               
@Oleg: Happy new year!

I just added this:

#if 0
int last_idx_before_mid = 31 - __lzcnt32(off_mask); // 31 - LZCNT ==> index of MSb
#else

// if off_mask can never be zero, no need for check to override BSR result
assert(off_mask);
// a '9' run didn't reveal any 0 [you would know for sure for other sizes]

// need unsigned long result
unsigned long last_idx_before_mid;

// get index of MSb [no need for adjustment if off_mask can never be zero]
_BitScanReverse(&last_idx_before_mid, off_mask);
#endif

A run of '9' now produces the correct result: 1,730,280 :)
Lord Sméagol
on /blog/119
               

© Matthew Scroggs 2012–2026