Thank you both for your replies.
Yes, I know that according to
http://www.opengl.org/wiki/GLSL_Uniform some ATI drivers report the maximum number of vec4s instead of the number of floats (which is what NVIDIA reports).
This would mean: ATI 1024 = NVIDIA 4096.
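Just to spell out that conversion, here is a rough sketch in C. It assumes the reported limit was already queried (for example via glGetIntegerv with GL_MAX_VERTEX_UNIFORM_COMPONENTS) and that we somehow know whether the driver counts vec4s. The helper name and the reports_vec4 flag are made up for illustration and are not part of any API.

```c
#include <assert.h>

/* Hypothetical helper: normalize a reported uniform limit to float components.
 * reports_vec4 is an assumption the caller must supply (nonzero if the driver
 * counts vec4s, as some ATI drivers reportedly do; zero if it counts floats). */
static int uniform_limit_in_floats(int reported, int reports_vec4)
{
    assert(reported >= 0);
    /* One vec4 holds four float components. */
    return reports_vec4 ? reported * 4 : reported;
}
```

With this, uniform_limit_in_floats(1024, 1) and uniform_limit_in_floats(4096, 0) both give 4096, so the two limits would be equivalent.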
But unfortunately the shader does not work (no link error and no validation error either), so it could really be a driver bug or something else. I'll keep on searching and testing, maybe with a different driver version.
As this is currently a really special case, we may simply switch to software skinning; performance-wise that seems OK (although uniform buffer objects seem very tempting, I am lazy and fear code merge issues with upcoming Horde versions).
Thanks marciano for the interesting link.