Get the core hrend and vrend assembly routines to compile properly in GCC #19
Comments
I'm pretending I didn't see this one. |
:) It actually shouldn't be too bad. Just a lot of tedious stuff. I am going to do it without dummy constraints first and just use hard-coded clobbers |
Just a quick note: if you want the C version of these functions to run faster, use this (inverse sqrt): inline float f_rsqrt( const float number )
} It makes the C version run at about 50% of the speed of the asm version, which is an adequate improvement. I've nearly finished converting the renderer to intrinsics, so this issue is almost complete. |
You don't need the second iteration of the Newton-Raphson approximation. It's adequate with one iteration in the renderer as the inputs are quantized from an original full precision sqrt, and stored in a lookup table in Ken's original code. |
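For reference, a minimal sketch of the single-iteration variant, assuming the snippet above follows the well-known fast inverse square root pattern. The function name `rsqrt_approx` is hypothetical and this is not the code from the repository; it only illustrates what dropping the second Newton-Raphson step looks like.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: approximate 1/sqrt(x) for positive, finite x.
   Uses the magic-constant initial guess followed by ONE Newton-Raphson step. */
static inline float rsqrt_approx(float x)
{
    const float half_x = 0.5f * x;
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);       /* reinterpret the float's bits safely */
    bits = 0x5f3759dfu - (bits >> 1);     /* initial guess from the bit pattern */
    memcpy(&y, &bits, sizeof y);

    y = y * (1.5f - half_x * y * y);      /* the single refinement step */
    return y;
}
```

With one step the relative error stays below roughly 0.2%, which is consistent with the point above that quantized inputs do not benefit from a second iteration.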
Ah, this is http://en.wikipedia.org/wiki/Fast_inverse_square_root I assume? There must also be an intrinsic which uses the single instruction to do this I'd hope. |
That's right, it's the infamous code from Quake that has had whole articles written about it. The intrinsic is the reciprocal sqrt, which you will find in the v/hrend(z)sse part of the renderer, where it operates on 4 values at a time. |
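A rough sketch of the intrinsic form, assuming an x86 target with SSE enabled. `_mm_rsqrt_ps` is the standard SSE intrinsic for the packed reciprocal square root instruction; the wrapper name `rsqrt4` is made up for illustration and is not taken from the renderer.

```c
#include <xmmintrin.h>  /* SSE: _mm_loadu_ps, _mm_rsqrt_ps, _mm_storeu_ps */

/* Hypothetical wrapper: write approximate 1/sqrt(in[i]) to out[i], i = 0..3.
   The result is a hardware approximation (about 12 bits), not an exact value. */
static inline void rsqrt4(const float in[4], float out[4])
{
    __m128 v = _mm_loadu_ps(in);          /* unaligned load of 4 floats */
    _mm_storeu_ps(out, _mm_rsqrt_ps(v));  /* 4 reciprocal square roots at once */
}
```

If more precision is needed, a Newton-Raphson step can be applied to the result in the same way as in the scalar version.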
Wait, so is the instruction itself or intrinsic bad with the memory-access bound? Also could you push your work? |
I'll tidy up a bit, and push so you can have a look. |
As far as intrinsics go, any intrinsics that use the __m64 datatype are not supported on x86_64; that's not to say that you can't use MMX registers in assembly. It can all be mitigated with ifdefs, so there can still be a non-fatbin version of the executable which is coalesced at compile time. It's just something to be aware of. Take a look at the Intel optimization manual for gotchas. It's related to emms as well, because MMX registers are shared with the FPU's 80-bit registers. All of x86/x87 is a kludge, because of backwards compatibility. Just like Windows, the price you pay for a general solution is complexity. Compared to most chipsets x86 is a frankenmonster ;) |
Ha, I thought you were going to say "Just like Windows, the price you pay for backwards compatibility is kludge". OK, yeah, I didn't make the connection between no __m64 on x86-64 and intrinsics. Yeah, fatbin stuff just makes it harder to think; I wouldn't mind having dedicated binaries. Ideally our builds will mostly be MinGW anyways, where intrinsics and x86-64 work together fine. |
Certainly use "p" (link-time constant) constraints for pointers to global variables. Ideally use "dummy variables" to avoid hard coding any intermediate constants either.
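To make the clobber versus dummy-variable distinction concrete, here is a small sketch in GCC extended asm. Both function names are hypothetical and the asm body is a toy (it doubles an integer), not anything from hrend or vrend. The first version hard-codes `eax` and lists it as a clobber; the second lets the compiler pick the scratch register through a dummy output operand. A plain C fallback is included for non-x86 targets.

```c
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))

/* Hard-coded scratch register: eax is named in the asm and declared clobbered. */
static inline int double_clobber(int in)
{
    int out;
    __asm__ ("movl %1, %%eax\n\t"
             "addl %%eax, %%eax\n\t"
             "movl %%eax, %0"
             : "=r"(out)
             : "r"(in)
             : "eax", "cc");
    return out;
}

/* Dummy variable: tmp is an early-clobber output, so the compiler chooses
   the scratch register and no register name is hard-coded. */
static inline int double_dummy(int in)
{
    int out, tmp;
    __asm__ ("movl %2, %1\n\t"
             "addl %1, %1\n\t"
             "movl %1, %0"
             : "=r"(out), "=&r"(tmp)
             : "r"(in)
             : "cc");
    return out;
}

#else  /* fallback so the sketch still builds elsewhere */

static inline int double_clobber(int in) { return in + in; }
static inline int double_dummy(int in)   { return in + in; }

#endif
```

The dummy-variable form gives the register allocator more freedom, which tends to matter once the routines are inlined into larger functions.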