I don't have any src code yet, but I just thought I'd put the idea out there.

I think parallel processing could handle this (finding the prime factors of a composite number by trial division) pretty well, and it would be interesting to see how one could make it scale when additional cores are added:

Given some composite number C:

HOST

do

set x = sqrt(C); floor x (or ceiling maybe??) //every factor above sqrt(C) pairs with a cofactor below it, so only 2->x needs checking

send x and C to all e-cores.

wait for prime factors to come back in memory
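A minimal Python sketch of the HOST's first step (host_bound is a hypothetical name, not from any library). It uses math.isqrt, which returns floor(sqrt(C)) exactly; a floating-point sqrt can round the wrong way once C gets large, which is exactly where the floor-vs-ceiling question bites.

```python
import math


def host_bound(c: int) -> int:
    """Largest candidate divisor the e-cores need to test for composite c.

    If c = a * b with a <= b, then a * a <= c, so a <= floor(sqrt(c)):
    rounding down never drops the smaller factor.
    """
    if c < 4:
        raise ValueError("c must be a composite number >= 4")
    return math.isqrt(c)  # exact integer floor of sqrt(c), no float rounding


# The HOST would then hand (x, c) to the e-cores and wait for hits.
print(host_bound(2323))  # 48
```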

ECORES

get x and C

(I'm not sure yet how this part would happen)

Split up the number line from 2->x (integers only) among n cores

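each core checks C for divisibility by every 6k+/-1 number in its range //every prime above 3 has the form 6k+/-1 (not every 6k+/-1 number is prime, e.g. 25, but testing a few extra composites is harmless); 2 and 3 have to be checked separately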

if a hit is found, write the value to memory for HOST to pick up.
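A minimal Python sketch of the whole HOST/ECORES scheme, assuming threads stand in for the e-cores (CPython threads won't give a real speedup; on the actual board each e-core would run its own program). split_range, check_range and find_factors are hypothetical names chosen for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def split_range(lo, hi, n):
    """Split the integers lo..hi into at most n contiguous, near-equal ranges."""
    total = hi - lo + 1
    n = min(n, total)
    size, extra = divmod(total, n)
    ranges, start = [], lo
    for i in range(n):
        end = start + size - 1 + (1 if i < extra else 0)  # first `extra` ranges get one more
        ranges.append((start, end))
        start = end + 1
    return ranges


def check_range(c, lo, hi):
    """One simulated e-core: test 2, 3 and every 6k+/-1 number in lo..hi."""
    return [d for d in range(lo, hi + 1)
            if (d in (2, 3) or (d >= 5 and d % 6 in (1, 5))) and c % d == 0]


def find_factors(c, n_cores=16):
    """HOST: compute x = floor(sqrt(c)), farm out 2..x, collect the hits."""
    x = math.isqrt(c)
    if x < 2:
        return []  # c < 4: nothing in 2..x to look at
    ranges = split_range(2, x, n_cores)
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        results = pool.map(lambda r: check_range(c, r[0], r[1]), ranges)
    # Every 6k+/-1 divisor is reported, composite ones included (e.g. 25);
    # for composite c the smallest hit is always the smallest prime factor.
    return sorted(d for hits in results for d in hits)


print(find_factors(2323))  # [23]
```

Only the divisor below sqrt(C) is reported; the matching factor above it is C divided by the hit.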

For example, C = 101*23 = 2323 (a small composite number)

sqrt(C) = 48.197...

x = 48 (floor result)
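Checking the floor step for this example by hand (plain arithmetic):

```latex
48^2 = 2304 \le 2323 < 2401 = 49^2
\;\Longrightarrow\;
x = \lfloor \sqrt{2323} \rfloor = 48,
\qquad 23 \le 48 < 101 = 2323 / 23
```

So the smaller factor 23 lands inside 2->48, and 101 never has to be searched for.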

HOST: wait for ECORE results (interrupt, or loop polling memory).

ECORES

core0 get x, C

core0 tell core1-coreM value of C

core0 tell core1-coreM here is your range to look at: //(assume M=15 for a 16-core board)

core0 range 3->5

core1 range 6->9

core2 range 10->12

...

core15 range 45->48
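The ranges above split the integers. Since only 2, 3 and the 6k+/-1 numbers are ever tested, another option (an assumption, not part of the original plan) is to split the candidate list itself so every core does the same number of mod operations. candidates_up_to and assign are hypothetical names; start=3 mirrors core0's range beginning at 3, which is fine for odd C (use start=2 otherwise).

```python
def candidates_up_to(x, start=3):
    """2 and 3 (if >= start), then every 6k-1 / 6k+1 number up to x."""
    return [d for d in range(start, x + 1)
            if d in (2, 3) or (d >= 5 and d % 6 in (1, 5))]


def assign(cands, n_cores):
    """Deal the candidate list into n_cores contiguous, near-equal slices."""
    size, extra = divmod(len(cands), n_cores)
    slices, i = [], 0
    for core in range(n_cores):
        take = size + (1 if core < extra else 0)  # first `extra` cores get one more
        slices.append(cands[i:i + take])
        i += take
    return slices


cands = candidates_up_to(48)
for core, work in enumerate(assign(cands, 16)):
    print(f"core{core}: {work}")
```

For C = 2323 there are exactly 16 candidates between 3 and 48, so each of the 16 cores gets a single trial division, and core7 is the one holding 23.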

Each ECORE (1->M, plus core0 on its own range):

accept range from core0

create an array of the numbers in the range that have the form 6*k+/-1

compute C mod array(i) // pointer i iterates through the array

if C mod array(i) == 0, you have a hit: pass array(i) to HOST.
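A minimal sketch of one ECORE's loop in Python (make_array and ecore are hypothetical names). It builds the 6k+/-1 array directly from k rather than filtering every integer; 2 and 3 are not of that form, so the sketch assumes core0 tests those itself.

```python
def make_array(lo, hi):
    """Numbers of the form 6k-1 or 6k+1 (k >= 1) that fall inside lo..hi."""
    arr = []
    k = max(1, (lo + 1) // 6)  # never later than the first k with a value >= lo
    while 6 * k - 1 <= hi:
        for d in (6 * k - 1, 6 * k + 1):
            if lo <= d <= hi:
                arr.append(d)
        k += 1
    return arr


def ecore(c, lo, hi):
    """Return every array(i) with C mod array(i) == 0 (the hits for HOST)."""
    return [d for d in make_array(lo, hi) if c % d == 0]
```

For example, ecore(2323, 21, 24) builds the array [23] and reports 23, while the core given 45->48 only tests 47 and reports nothing.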

Could do some other stuff too, like: if you know C is a semiprime, once you get a hit just kill all the other processes (you've found the ONLY prime factor at or below sqrt(C), and the other factor is simply C divided by it)
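A minimal Python sketch of the semiprime shortcut. Python threads can't be killed from outside, so this swaps in cooperative cancellation with threading.Event: each worker checks the flag between trial divisions and quits once anyone has a hit. factor_semiprime is a hypothetical name, and the result is only guaranteed correct when C really is a semiprime.

```python
import math
import threading


def factor_semiprime(c, n_workers=16):
    """Find (p, q) with p <= q and c == p * q, stopping all workers after a hit.

    Assumes c is a semiprime: then 2..floor(sqrt(c)) holds exactly one
    divisor of c, so the first hit is the answer.
    """
    x = math.isqrt(c)
    found = threading.Event()  # stands in for "kill all other processes"
    hits = []
    lock = threading.Lock()

    def worker(lo, hi):
        for d in range(lo, hi + 1):
            if found.is_set():
                return  # another worker already has the factor
            is_candidate = d in (2, 3) or (d >= 5 and d % 6 in (1, 5))
            if is_candidate and c % d == 0:
                with lock:
                    hits.append(d)
                found.set()
                return

    step = max(1, -(-(x - 1) // n_workers))  # ceil((x - 1) / n_workers)
    threads = [threading.Thread(target=worker, args=(lo, min(lo + step - 1, x)))
               for lo in range(2, x + 1, step)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    if not hits:
        return None  # no divisor found: c was not composite
    p = min(hits)
    return p, c // p  # the larger factor comes from one division
```

For C = 2323 this splits 2->48 into 16 ranges of three numbers, and the worker holding 23 returns (23, 101).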

I dunno, poke fun if you wish

edit1 6/11/2014: it's def floor not ceiling