mscroggs.co.uk

Comments

Comments in green were written by me. Comments in blue were not written by me.
@Oleg: Great!
Going by your time differences between single and multi-thread, I assume you are not using SMT.

My performance is limited by my old tech (Ivy Bridge E5-2697-v2)!
I have got multi-threading working in C and it looks like there are no bugs :)
But I am still not getting any benefit from HyperThreading; maybe 16 general-purpose registers aren't enough for the compiler!

I have just started converting my inner search function to asm, which will give me full control of ALL registers and let me use some coding tricks that are not available in C.

Once I get it going, I will try a '10' run, which should verify your results.
Lord Sméagol
on /blog/119
               
@Lord Sméagol: Right, I wasn't using SMT (I meant it when I said without multithreading, sorry for the bad wording). I tried an Intel-based VM before with 4 CPUs/8 threads, but the speedup was only about 5.5 times, and the price was only 20% less (I used Azure spot instances, which are not so expensive, but some automation is needed to restart them every time Azure stops them).
To give you all the details, my program spent 21 days on an AMD EPYC 9004 (8 cores without SMT, Azure spot instance Standard F8als v6) using 8 threads (that is, about 160 CPU-days!).
I've published the source code and am still planning to write about the optimizations: https://github.com/lightln2/partridge-...
(anonymous)
on /blog/119
               
@(anonymous): Thanks for the clarification.
I'm still (slowly) building my asm function. I think I have settled on register allocation, leaving only rcx as a 'scratch' register because cl will be needed for some variable shifts.
I also use the xmm registers (14 so far) to minimize memory operations to hopefully let HT/SMT get some decent gains.

How long would Matt Parker's 'terrible Python code' take to solve this problem? :)
OK, his maths knowledge might produce some decent algorithms, but it would help him a lot to use something that compiles to native code.
Lord Sméagol
on /blog/119
               

Archive

 2026 

May 2026

World Cup stickers 2026

Apr 2026

A new puzzle every day
Mixing Wordle with other games

Feb 2026

Christmas (2025) is over
 2025 

Dec 2025

Christmas card 2025

Nov 2025

Christmas (2025) is coming!

Sep 2025

The partridge puzzle

Aug 2025

TMiP 2025 puzzle hunt

Jun 2025

A nonogram alphabet

Mar 2025

How to write a crossnumber

Jan 2025

Christmas (2024) is over
Friendly squares
 2024 

Dec 2024

A regular expression Christmas puzzle
Christmas card 2024

Nov 2024

Christmas (2024) is coming!

Feb 2024

Zines, pt. 2

Jan 2024

Christmas (2023) is over


© Matthew Scroggs 2012–2026