mscroggs.co.uk

Comments

Comments in green were written by me. Comments in blue were not written by me.
@Oleg: Great!
Going by the time difference between your single-threaded and multithreaded runs, I assume you are not using SMT.

My performance is limited by my old tech (Ivy Bridge E5-2697-v2)!
I have got multithreading working in C and it looks like there are no bugs :)
But I am still not getting any benefit from Hyper-Threading; maybe 16 general-purpose registers aren't enough for the compiler!

I have just started converting my inner search function to asm, which will give me full control of ALL registers and let me use some coding tricks that are not available in C.

Once I get it going, I will try a '10' run, which should verify your results.
Lord Sméagol
on /blog/119
               
@Lord Sméagol: Right, I wasn't using SMT (that is what I meant when I said "without multithreading"; sorry for the bad wording). I tried an Intel-based VM before, with 4 CPUs/8 threads, but the speedup was only about 5.5 times and the price was only 20% lower. (I used Azure spot instances, which are not very expensive, but some automation is needed to restart them every time Azure stops them.)
To give you all the details, my program spent 21 days on an AMD EPYC 9004 (8 cores without SMT, Azure spot instance Standard F8als v6) using 8 threads (that is, about 160 CPU-days!).
I've published the source code and am still planning to write about the optimizations: https://github.com/lightln2/partridge-...
(anonymous)
on /blog/119
               
@(anonymous): Thanks for the clarification.
I'm still (slowly) building my asm function. I think I have settled on the register allocation, leaving only rcx as a 'scratch' register because cl will be needed for some variable shifts.
I also use the XMM registers (14 so far) to minimize memory operations, which will hopefully let HT/SMT give some decent gains.

How long would Matt Parker's 'terrible Python code' take to solve this problem? :)
OK, his maths knowledge might produce some decent algorithms, but using something that compiles to native code would help him a lot.
Lord Sméagol
on /blog/119
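The remark about reserving rcx because cl is needed for variable shifts reflects an x86-64 constraint: the classic SHL/SHR/SAR instructions accept a non-constant shift count only in CL. The BMI2 forms (SHLX/SHRX/SARX), which take the count in any register, arrived with Haswell, one generation after Ivy Bridge. As a hedged illustration only, here is a hypothetical C helper (`extract_bits` is an invented name, not taken from either commenter's code) whose shift counts are known only at run time. Compiled for a CPU without BMI2, shifts like these typically go through CL, although the exact instructions depend on the compiler and flags.

```c
#include <stdint.h>

/* Hypothetical helper: return `width` bits of `row`, starting at bit `col`.
   Both shift counts are run-time values, so on x86-64 CPUs without BMI2
   (Ivy Bridge included) compilers typically emit `shr reg, cl` and
   `shl reg, cl`, which means the count has to be moved into rcx first. */
static inline uint64_t extract_bits(uint64_t row, unsigned col, unsigned width)
{
    /* Keep the count in 0..63: shifting a 64-bit value by 64 or more
       is undefined behaviour in C. */
    uint64_t field = row >> (col & 63u);

    /* Build a mask of `width` one-bits; width >= 64 is handled separately
       to avoid an out-of-range shift. */
    uint64_t mask = (width >= 64u) ? ~UINT64_C(0)
                                   : ((UINT64_C(1) << width) - 1u);
    return field & mask;
}
```

For example, `extract_bits(0xABCD, 8, 4)` shifts right by 8 to get `0xAB`, then keeps the low 4 bits, giving `0xB`.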
               

Archive

 2026 

May 2026

World Cup stickers 2026

Apr 2026

A new puzzle every day
Mixing Wordle with other games

Feb 2026

Christmas (2025) is over
 2025 

Dec 2025

Christmas card 2025

Nov 2025

Christmas (2025) is coming!

Sep 2025

The partridge puzzle

Aug 2025

TMiP 2025 puzzle hunt

Jun 2025

A nonogram alphabet

Mar 2025

How to write a crossnumber

Jan 2025

Christmas (2024) is over
Friendly squares
 2024 

Dec 2024

A regular expression Christmas puzzle
Christmas card 2024

Nov 2024

Christmas (2024) is coming!

Feb 2024

Zines, pt. 2

Jan 2024

Christmas (2023) is over
 2023 
 2022 
 2021 
 2020 
 2019 
 2018 
 2017 
 2016 
 2015 
 2014 
 2013 
 2012 

Tags

menace wave scattering curvature raspberry pi map projections national lottery a gamut of games ternary bubble bobble partridge puzzle phd geometry boundary element methods data visualisation recursion football kings gather town datasaurus dozen manchester science festival javascript matrix of cofactors graphs quadrilaterals golden spiral trigonometry turtles dates oeis machine learning logo plastic ratio people maths cambridge friendly squares guest posts kenilworth news palindromes draughts the aperiodical exponential growth fonts wordle geogebra bempp european cup accuracy captain scarlet warwick pascal's triangle christmas logs propositional calculus stirling numbers dinosaurs folding paper statistics finite element method bodmas ucl harriss spiral fractals crossnumbers dragon curves gaussian elimination numerical analysis regular expressions databet braiding pokémon wordle tetris speed platonic solids gerry anderson squares big internet math-off manchester triangles zines crosswords alphabets rust countdown pythagoras reddit golden ratio tmip standard deviation mathsjam computational complexity wool preconditioning graph theory games cross stitch data probability sound reuleaux polygons logic london arrangement puzzles runge's phenomenon game show probability crochet nine men's morris hannah fry asteroids books flexagons estimation php mean pac-man matrix multiplication electromagnetic field christmas card finite group rugby bluesky matt parker youtube inverse matrices hats weather station pokémon coins final fantasy 24 hour maths talking maths in public arithmetic nonograms radio 4 go simultaneous equations correlation edinburgh numbers realhats royal baby chalkdust magazine fence posts binary live stream newcastle matrices bots python rhombicuboctahedron sorting sobolev spaces chebyshev determinants programming crossnumber sport light london underground craft misleading statistics hyperbolic surfaces advent calendar martin gardner tennis thirteen hexapawn 
pizza cutting video games error bars inline code stickers errors matrix of minors signorini conditions interpolation puzzles royal institution anscombe's quartet polynomials convergence world cup folding tube maps weak imposition pi approximation day mathsteroids approximation game of life mathslogicbot frobel noughts and crosses dataset latex pi coventry chess

© Matthew Scroggs 2012–2026