I realize there is a tradeoff between accuracy and speed when computing square roots, and that you're never going to get a super-accurate result with a float type anyway. But, if I were a user of the library, I would expect the results to be as close to correct as what a float can store (or just about). Here are some sample outputs I feel are unacceptably inaccurrate (I selected perfect squares to make it more obvious what the error is):
1 -> 0.9999957
4 -> 1.999991
16 -> 3.999983
400 -> 19.99993
676 -> 25.99988
2500 -> 49.9998
10000 -> 99.9996
410881 -> 640.9977
1000000 -> 999.9957
Note that the output for 410881 corresponds to what would be a reasonable output for a number 3 smaller, and that the output for a million corresponds to what it should be for a number 8.6 smaller.
It looks like the algorithm is based on Newton's method, but using only two iterations. That's not quite as bad as it might sound, since it uses the kludgey Quake III hack (including use of a magic number and manipulating the bits of a float as if it were an integer) to generate the initial value, which actually works surprisingly well. Still, I think there should be at least one more iteration in order to get reasonably acceptable output values.
Also, I think it might be nice to provide an inverse square root function, which would require almost no additional code, and would avoid the need for an unnecessary multiplication and division when computing this value.
Best Regards