@(anonymous): Thanks for the clarification.
I'm still (slowly) building my asm funcion. I think I have settled on register allocation, leaving only rcx as a 'scratch' register because cl will be needed for some variable shifts.
I also use the xmm registers (14 so far) to minimize memory operations to hopefully let HT/SMT get some decent gains.
How long would Matt Parker's 'terrible Python code' take to solve this problem ? :)
Ok, his maths knowledge might produce some decent algorithms, but it would help him a lot to use something that compiles to native code.
I'm still (slowly) building my asm funcion. I think I have settled on register allocation, leaving only rcx as a 'scratch' register because cl will be needed for some variable shifts.
I also use the xmm registers (14 so far) to minimize memory operations to hopefully let HT/SMT get some decent gains.
How long would Matt Parker's 'terrible Python code' take to solve this problem ? :)
Ok, his maths knowledge might produce some decent algorithms, but it would help him a lot to use something that compiles to native code.
on /blog/119