by cmcconnell » Thu Oct 16, 2014 9:03 am
When I asked whether you had copied the ldf file, I was assuming that dotproduct created a special ldf (as matmul does), but that's not the case; it just uses internal.ldf.
Having had a quick look, I see it creates four pointers - a,b,c,d - associated with specific addresses. Presumably that works because the addresses were chosen to be well away from the stack and other areas where the compiler/linker will be placing code and/or variables. (And it's such a simple program that there are no other variables, and hardly any code.)
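To illustrate why the fixed-address approach only works while nothing else needs that memory, here is a rough host-side sketch. The array local_mem, the offsets OFF_A to OFF_D and both helper functions are made up for illustration; they are not the values or code used in dotproduct, and the array merely stands in for a core's local memory so the sketch can run on a PC. It also assumes a 4-byte float.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one core's local memory (host-side model only). */
#define LOCAL_MEM_BYTES 0x8000u
static float local_mem[LOCAL_MEM_BYTES / sizeof(float)];

/* Hypothetical byte offsets for the four buffers; NOT the dotproduct values. */
enum { OFF_A = 0x2000, OFF_B = 0x4000, OFF_C = 0x6000, OFF_D = 0x7000 };

/* Turn a byte offset into a pointer, much as a hard-coded address is cast
 * to a pointer on the device. Nothing here reserves the memory: the linker
 * has no idea these ranges are in use. */
float *at_offset(size_t byte_offset)
{
    assert(byte_offset % sizeof(float) == 0);  /* must be float-aligned */
    assert(byte_offset < LOCAL_MEM_BYTES);     /* must lie inside the model */
    return &local_mem[byte_offset / sizeof(float)];
}

/* How many floats fit between two fixed offsets before the buffers collide. */
size_t floats_between(size_t lo, size_t hi)
{
    assert(hi >= lo);
    return (hi - lo) / sizeof(float);
}
```

With these made-up offsets, floats_between(OFF_A, OFF_B) gives the largest array that a can hold before it runs into b. That limit is enforced only by the programmer; once the code, stack or other variables grow into the same range, nothing will warn you.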
But that seems like a very bad idea as a template for larger-scale development. There are bound to be problems as the program gets bigger, because nothing stops the linker from placing code or variables at those same addresses. If you do need to share or split data between cores (rather than just with the ARM), there has to be a better way. The matmul example has a 'shared_core' section, defined in the ldf, but there may be an issue with how that is being done, as described in the thread I referenced earlier.
If your code is as stripped down as dotproduct and yet failing, then I guess it can't be the stack that is causing the problem. The IDLE instruction at the end might be important, preventing the program from exiting, so you should make sure you keep that in place.
Your best approach would probably be to take the working dotproduct code and modify it step by step, until you find the change which breaks it.
Also, have you actually determined that the values of d being read by the host are all zero (e.g., by using printf()), or are you just assuming that? If random garbage is being returned, then the test (all_done==16) will never be true. You could make all_done unsigned, and change the test to (all_done >= 16). Then add assert(all_done==16) after the loop to catch nonsense results.
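A minimal sketch of that check follows. It is not the actual dotproduct host code: the function name poll_finished and its parameters are invented for illustration, and it assumes the host has already read one flag per core into an array (however your program does that).

```c
#include <assert.h>
#include <stdio.h>

#define NUM_CORES 16u

/* Sum the per-core flags that the host has already read back.
 * Prints every flag so you can see whether it is 0, 1 or garbage.
 * Returns 1 once the sum reaches n (so the polling loop can stop),
 * and stores the sum in *sum_out for a later sanity check. */
int poll_finished(const unsigned *flags, unsigned n, unsigned *sum_out)
{
    unsigned all_done = 0;  /* unsigned: garbage values wrap instead of going negative */
    for (unsigned i = 0; i < n; i++) {
        printf("core %2u: flag = 0x%08x\n", i, flags[i]);
        all_done += flags[i];
    }
    *sum_out = all_done;
    return all_done >= n;   /* >= rather than ==, so garbage cannot hang the loop */
}
```

In the host program you would call this inside the polling loop, break out when it returns 1, and then put assert(sum == NUM_CORES) after the loop. If the assert fires, the cores are returning something other than a clean 0 or 1, and the printed values should show which core is responsible.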
Colin.