Parallella Community

by **djm** » Mon Sep 15, 2014 5:33 am

Does OpenCL __local memory (declared as __local in the kernel function signature, and set with a NULL arg_value and the appropriate size in clSetKernelArg()) map to the 32K local memory of each Epiphany RISC core? I ask because when I launched a kernel on each core and had each one write the address of its __local pointer into a global memory buffer, the addresses were all the same, and all had the high-order 12 bits set to 0x808.

I'm confused. In one sense, as per OpenCL, local memory should be shared by all work items in a work group. For OpenCL on Epiphany via COPRTHR there is only 1 work group with a maximum of 16 work items, each work item corresponding to a RISC core. So __local buffers should indeed be shared by all Epiphany cores as per OpenCL. The result I got is 'correct', except that each core will pay a different access cost: the __local memory sits in the local storage of 1 core, so it is not local for any other core, and those cores reach it through the inter-core mesh.
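For reference, the 0x808 prefix can be unpacked by hand. The sketch below assumes the Epiphany global address layout of a 12-bit core ID (6-bit row, 6-bit column) in the upper bits with a 20-bit offset below it. The struct and function names are made up for illustration, and the sample address is hypothetical apart from its 0x808 prefix.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of a 32-bit Epiphany global address:
 *   bits 31..26  core row
 *   bits 25..20  core column
 *   bits 19..0   offset inside that core's address space
 * All names here are hypothetical helpers, not part of COPRTHR or OpenCL. */
typedef struct {
    unsigned row;     /* mesh row of the owning core */
    unsigned col;     /* mesh column of the owning core */
    uint32_t offset;  /* byte offset within the core */
} core_addr_t;

static core_addr_t decode_global_addr(uint32_t addr)
{
    core_addr_t r;
    uint32_t coreid = addr >> 20;   /* upper 12 bits */
    r.row = (coreid >> 6) & 0x3Fu;  /* top 6 bits of the core ID */
    r.col = coreid & 0x3Fu;         /* low 6 bits of the core ID */
    r.offset = addr & 0xFFFFFu;     /* lower 20 bits */
    return r;
}

/* Self-check using a made-up address whose upper 12 bits are 0x808. */
static void decode_self_check(void)
{
    core_addr_t a = decode_global_addr(0x80804000u);
    assert(a.row == 32u);
    assert(a.col == 8u);
    assert(a.offset == 0x4000u);
}
```

Under that layout, every address beginning with 0x808 resolves to the core at row 32, column 8. That fits the observation that all work items reported the same __local pointer: the buffer lives in one core's memory and the others reach it remotely.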

How do I make use of the 32K RISC core local memory in an OpenCL kernel to speed up execution by avoiding repeated access to global memory?

Thanks for the help.

by **dar** » Mon Sep 15, 2014 2:02 pm

by **djm** » Mon Sep 15, 2014 3:30 pm

Thanks for this excellent explanation! I was suspecting that this was the case but couldn't find it spelled out so nicely anywhere.

by **djm** » Mon Sep 15, 2014 3:33 pm

A related question … if I don't know the size of the per-core buffer at compile time, can malloc be called from the kernel code? Thanks.

by **dar** » Mon Sep 15, 2014 4:03 pm

by **djm** » Mon Sep 15, 2014 7:49 pm

Thanks again for your clear and detailed response.

by **cmcconnell** » Mon Sep 15, 2014 9:18 pm

by **dar** » Mon Sep 15, 2014 10:13 pm

All of that is very much correct I would think, but just a word of caution. Epiphany is operated as a co-processor, has memory allocated from the host using dmalloc(), and has a very specific layout in per-core local memory, where memory is at a premium. It is not clear how this is reconciled with the LDF. The conventional malloc() call uses pages for accounting and a complex, highly efficient allocation algorithm that may not operate correctly if the heap is set so as to define an effective total memory of only a few pages' worth of storage. It's possible magic just happens (that's always nice), but I would test before assuming how it will work. These are interesting questions, and for some of them I just do not know the answer. In the end, the best and most efficient solution to dynamic per-core memory allocation will be a specialized alloca() implementation designed to operate in a constrained memory environment that is more like a cache than global memory, allocating memory by raising the free memory boundary controlled within the coprthr implementation. You can see a reference to this boundary in the kernel launch code, where it must be set just past the program and the special data used by the implementation.
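The boundary-raising idea can be illustrated with a minimal bump allocator in plain C. This is only a sketch under stated assumptions: `bump_t`, `bump_init`, `bump_alloc`, `bump_mark` and `bump_release` are hypothetical names, not COPRTHR functions, and the caller is assumed to pass an 8-byte-aligned region that stands in for the free part of a core's local memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bump allocator: memory is handed out by raising a single
 * "free boundary" offset, with no per-block headers or page accounting. */
typedef struct {
    unsigned char *base;  /* start of the free region (assumed 8-byte aligned) */
    size_t size;          /* bytes available in the region */
    size_t top;           /* current free boundary, as an offset from base */
} bump_t;

static void bump_init(bump_t *b, void *mem, size_t size)
{
    b->base = (unsigned char *)mem;
    b->size = size;
    b->top = 0;
}

/* Returns NULL when the request does not fit; never touches the host heap. */
static void *bump_alloc(bump_t *b, size_t nbytes)
{
    size_t aligned = (nbytes + 7u) & ~(size_t)7u;  /* round up to 8 bytes */
    if (aligned < nbytes || aligned > b->size - b->top)
        return NULL;                               /* overflow or out of space */
    void *p = b->base + b->top;
    b->top += aligned;                             /* raise the boundary */
    return p;
}

/* Remember the boundary so a whole batch can be released at once. */
static size_t bump_mark(const bump_t *b)
{
    return b->top;
}

/* Lower the boundary back to an earlier mark, alloca-style. */
static void bump_release(bump_t *b, size_t mark)
{
    assert(mark <= b->top);
    b->top = mark;
}
```

A kernel-side variant would point `base` just past the program and implementation data rather than at an ordinary array. Where exactly that boundary sits is implementation-specific and is not shown here.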

If anyone experiments with malloc() I would find the results interesting. Just expect that the result may be worse than a seg fault.

-DAR

by **dobkeratops** » Wed Dec 02, 2015 11:38 pm

by **jar** » Fri Dec 04, 2015 3:57 pm

The problem with the OpenCL standard as it relates to the Epiphany architecture is that the OpenCL C language does not define locality and accessibility separately: the locality implicitly defines the accessibility. The __private memory is accessible by threads within the processing element, and __local memory is accessible by all threads within an OpenCL workgroup. Epiphany doesn't really have a hardware workgroup, and any core can access memory at any location. There's no programming mechanism within OpenCL to allow thread 0 in workgroup 0 to access __private or __local memory from thread 0 in workgroup 1, although Epiphany could do it.
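To make the "Epiphany could do it" point concrete, here is a small sketch of how a globally visible alias for a location in another core's local memory might be formed. It assumes the core ID (6-bit row, 6-bit column) occupies the upper 12 bits of a 32-bit address; `global_alias` and `LOCAL_MEM_BYTES` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define LOCAL_MEM_BYTES 0x8000u  /* 32K of local memory per core */

/* Hypothetical helper: build the mesh-wide address of `local_offset`
 * inside the local memory of the core at (row, col). */
static uint32_t global_alias(unsigned row, unsigned col, uint32_t local_offset)
{
    assert(row < 64u && col < 64u);          /* 6 bits each */
    assert(local_offset < LOCAL_MEM_BYTES);  /* stay inside the 32K */
    uint32_t coreid = ((uint32_t)row << 6) | (uint32_t)col;
    return (coreid << 20) | local_offset;
}
```

OpenCL C has no address space qualifier that expresses such a pointer, which is the gap described above.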

In my opinion, it would be good to just let OpenCL be and not try to force Epiphany to conform to it. The Epiphany architecture is much more capable than the virtual OpenCL device model.


OpenCL __local memory
