mscroggs.co.uk

Comments

Comments in green were written by me. Comments in blue were not written by me.
@Oleg: Great!
Going by your time differences between single and multi-thread, I assume you are not using SMT.

My performance is limited by my old tech (Ivy Bridge E5-2697 v2)!
I have got multi-threading working in C and it looks like there are no bugs :)
But I am still not getting any benefit from Hyper-Threading; maybe 16 general-purpose registers aren't enough for the compiler!

I have just started converting my inner search function to asm, which will give me full control of ALL registers and let me use some coding tricks that are not available in C.

Once I get it going, I will try a '10' run, which should verify your results.
Lord Sméagol
on /blog/119
               
@Lord Sméagol: Right, I wasn't using SMT (I meant it when I said without multithreading, sorry for the bad wording). I previously tried an Intel-based VM with 4 CPUs/8 threads, but the speedup was only about 5.5 times, and the price was only 20% less (I used Azure spot instances, which are not so expensive, but some automation is needed to restart them every time they are stopped by Azure).
To give you all the details, my program spent 21 days on an AMD EPYC 9004 (8 cores without SMT, Azure spot instance Standard F8als v6) using 8 threads (that is, about 160 CPU-days!)
I've published the source code and am still planning to write about the optimizations: https://github.com/lightln2/partridge-...
(anonymous)
on /blog/119
               
@(anonymous): Thanks for the clarification.
I'm still (slowly) building my asm function. I think I have settled on register allocation, leaving only rcx as a 'scratch' register because cl will be needed for some variable shifts.
I also use the xmm registers (14 so far) to minimize memory operations, which will hopefully let HT/SMT get some decent gains.

How long would Matt Parker's 'terrible Python code' take to solve this problem? :)
OK, his maths knowledge might produce some decent algorithms, but it would help him a lot to use something that compiles to native code.
Lord Sméagol
on /blog/119
               

© Matthew Scroggs 2012–2026