mscroggs.co.uk

Comments

@Oleg: Great!
Going by the difference between your single-threaded and multi-threaded times, I assume you are not using SMT.

My performance is limited by my old tech (Ivy Bridge E5-2697-v2)!
I have got multi-threading working in C and it looks like there are no bugs :)
But I am still not getting any benefit from Hyper-Threading; maybe 16 general-purpose registers aren't enough for the compiler!

I have just started converting my inner search function to asm, which will give me full control of ALL registers and let me use some coding tricks that are not available in C.

Once I get it going, I will try a '10' run, which should verify your results.
Lord Sméagol
on /blog/119
               
@Lord Sméagol: Right, I wasn't using SMT (I meant it when I said without multithreading, sorry for the bad wording). I previously tried an Intel-based VM with 4 CPUs/8 threads, but the speedup was only about 5.5 times, and the price was only 20% lower (I used Azure spot instances, which are not so expensive, but some automation is needed to restart them every time they are stopped by Azure).
To give you all the details, my program spent 21 days on an AMD EPYC 9004 (8 cores without SMT, Azure spot instance Standard F8als v6) using 8 threads (that is, about 160 CPU-days!)
I've published the source code and am still planning to write about the optimizations: https://github.com/lightln2/partridge-...
(anonymous)
on /blog/119
               
@(anonymous): Thanks for the clarification.
I'm still (slowly) building my asm function. I think I have settled on register allocation, leaving only rcx as a 'scratch' register because cl will be needed for some variable shifts.
I also use the xmm registers (14 so far) to minimize memory operations, which will hopefully let HT/SMT deliver some decent gains.

How long would Matt Parker's 'terrible Python code' take to solve this problem? :)
OK, his maths knowledge might produce some decent algorithms, but it would help him a lot to use something that compiles to native code.
Lord Sméagol
on /blog/119
               

© Matthew Scroggs 2012–2026