strategies for increasing sales of Epiphany III and IV

Postby piotr5 » Fri Feb 13, 2015 11:47 am

discussion continued from the thread "Parallella Epiphany IV"
sebraa wrote:
piotr5 wrote:
sebraa wrote:I would have liked to have more space for the data. The algorithm I used becomes more efficient with larger block sizes, plus a double-buffering approach would have allowed for a streaming implementation.
this I fail to understand. maybe I'm too inexperienced to say this, but wouldn't, as you say, "shifting complexity over to communication" do just what you want and eliminate the memory requirements?
Are you talking about "it is possible" or about "it is good"? Because a lot is possible, and you can always chop an algorithm into small pieces to avoid memory usage. But most of the time, the algorithm is useless afterwards. If your video player cannot do realtime video decoding, it does not work for you. If your weather forecast calculations take one week to compute a single day's forecast, they do not work for you.

Compressing data only works if you know it to be compressible. If it is not, you may actually increase memory requirements, defeating the purpose. Oh, and if you rely on stuff being compressible and sometimes it isn't, you have lost.
this is true: if the communication latency eats up the speed increase from larger blocks, then my suggestion won't work. however, concepts like "compressible" and "parallelizable" do not exist on their own; they are always relative to some circumstances. for example, "incompressible" means the data either has already been compressed or is the result of unknown random events (which happen to map one-to-one onto the used data size) -- incompressible basically means lack of knowledge. and when we speak of "parallelizable" we actually mean either parallelization by splitting up loops or applying some known parallel algorithm -- invent more algorithms and maybe it becomes parallelizable again. in that respect, for example, integer division is not parallelizable, while calculating the partial sums/products of a sequence is parallelizable down to logarithmic runtime with a linear/quadratic core count (linear because you can use multiple cores to perform a single plus or times operation, depending on number size). therefore, quite naturally, when given multiple cores almost all your procedural algorithms are useless and you must come up with new ones. however, you can build on the existing algorithms, the few that are available in parallel form.
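
to make the partial-sums claim concrete, a minimal sketch of the logarithmic-depth scan (Hillis-Steele): in each of the log(n) passes every update is independent, which is where the cores would come in. plain C++, sizes illustrative:

    #include <cstdio>
    #include <vector>

    // Hillis-Steele inclusive prefix sum: O(log n) passes.
    // Within each pass, all updates are independent, so on a parallel
    // machine each element could be handled by its own core.
    std::vector<int> prefix_sum(std::vector<int> a) {
        const std::size_t n = a.size();
        for (std::size_t stride = 1; stride < n; stride *= 2) {
            std::vector<int> next = a;            // double-buffer one pass
            for (std::size_t i = stride; i < n; ++i) // parallel on real hardware
                next[i] = a[i] + a[i - stride];
            a.swap(next);
        }
        return a;
    }

    int main() {
        std::vector<int> v = {3, 1, 4, 1, 5, 9, 2, 6};
        for (int x : prefix_sum(v)) std::printf("%d ", x); // 3 4 8 9 14 23 25 31
        std::printf("\n");
    }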

however, we are talking about the article I mentioned, the one about chopping a program and its data down to a size which fits an epiphany core. basically you load a function and its dependencies into memory with a dma transfer, long before they are needed. a lot of hand-optimizing is required here, though. if some dependency won't fit, it needs to be loaded just in time, as soon as it is known which data is really needed. what I am suggesting is that instead of executing some instruction after a long wait for its code and data to arrive, you load code and data as soon as you know they will be needed. this isn't just-in-time compilation, it's just-in-time linking!
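
a rough sketch of that just-in-time linking idea. dma_start/dma_wait are hypothetical stand-ins for the e-lib DMA calls (e_dma_copy and friends), and the overlay table is made up for illustration:

    #include <cstring>
    #include <cstddef>

    // Stand-ins for the e-lib DMA calls; on the host we just memcpy
    // synchronously, on Epiphany these would start a real background DMA
    // and wait for its completion.
    static void dma_start(void *dst, const void *src, std::size_t n) { std::memcpy(dst, src, n); }
    static void dma_wait() {}

    struct Overlay {               // one relocatable "function + data" unit
        const void *ext_addr;      // where it lives in shared DRAM
        std::size_t size;
        void       *local_addr;    // where it will run in core-local SRAM
        void      (*entry)();      // entry point after loading
    };

    // "Just-in-time linking": while overlay i runs, overlay i+1 is already
    // streaming into local memory, so the core never stalls on a fetch.
    void run_pipeline(Overlay *ov, int count) {
        if (count == 0) return;
        dma_start(ov[0].local_addr, ov[0].ext_addr, ov[0].size);
        for (int i = 0; i < count; ++i) {
            dma_wait();                              // overlay i is resident
            if (i + 1 < count)                       // prefetch the next one
                dma_start(ov[i + 1].local_addr, ov[i + 1].ext_addr, ov[i + 1].size);
            ov[i].entry();                           // jump into the overlay
        }
    }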
sebraa wrote:
piotr5 wrote:however, it seems you missed my point: the reason games sold back then was that they contained new ideas nobody had ever thought of before.
Yes, because any idea was new - it had never been done before. This is not true anymore, so the comparison is invalid. Ever heard of the video game crash in the 80s, killing e.g. Atari? Those games had lost their novelty.
yes, I remember there was a time when games suddenly became huge in terms of memory and video data, i.e. since simple games had lost their novelty, more complex games were introduced. however, parallella is a million times faster than the computers back then; even a single epiphany core is. what I am suggesting here is to connect several parallellas so that the memory size is adequate, and sell that as an educational gaming platform! let epiphany handle the video data...
sebraa wrote:
piotr5 wrote:so to attract people to parallella we first need to attract people who have unique ideas for games.
Stupid idea, because that requires both game programmer AND game players to have access to Parallella systems, which by the way are completely useless as gaming systems. If it doesn't even boot correctly all the time, it is useless as any non-research system. Gamers have GPUs (which are parallel), game developers have GPUs and the knowledge to use them, so use GPUs for games, because - surprise! - they are designed for games. Use the right tools for the problem.
not so sure the gpu is really designed for games. once upon a time people thought that a processor offering 2d acceleration -- moving fixed objects on screen and detecting collisions between them -- was a processor designed for games; and now the project "enlightenment" is using 3d acceleration, the GL shader language, for drawing a desktop environment. you know, the requirements game programmers demand change all the time; it is hard to say what they will need next. still, I dream that parallella can give them whatever they really need.
sebraa wrote:
piotr5 wrote:you said it, better hardware is cheaper than programmers, so why not make better software cheaper than programmers? we'd need to lower the complexity of programming under the memory constraints of epiphany, so let's use the large amount of memory and cpu speed available to the compilers and make the programming languages and libs more complex!
Awesome idea! And hey, let's sell the updates for reasonable prices as well! An infinite amount of income to cover up mistakes, too, great!

Really: Are YOU able to program a complex library or application without errors? No? See, nobody else is either. Look up CVEs, most are just small mistakes with really big consequences. Or design mistakes, because the designer did not understand his own idea - it was too complex, then. Oh, and nobody has the time to understand a complex system, either. Guess why? Because programmers are paid by the time they spend coding, and they are very expensive.

Ever tried to understand XML? It looks easy, right? Look up the "billion laughs" attack on Wikipedia: a small XML document (just a few bytes) which takes a few GB of memory just to parse. That is the result of a complex design.

Make it complex AND simple to use, so that inexperienced programmers can use it? Yeah, just take a look at the clusterfuck of PHP and (on a lesser scale) Arduino. There have never been problems with those, right? Thanks a lot for the idea, but no. Just no. Complexity creates an unmaintainable mess which nobody understands and which provides infinite amounts of problems, but no solutions.
that's why I came up with the topic of games: to create a novelty experience for the gamer, games need to become more complex than those which lost their novelty -- for lack of new ideas at that level of complexity. similarly, in order to make programming easier/faster, the development tools need to become more complex. now the two together: to get more complex games, libraries must become more complex, and for programming those libraries the programming language needs to increase its complexity too.

all that increase in complexity has grown over the years, with lots of people working on it. the final goal is to have the high complexity shifted into the libs while the programs using them look simple. now we have this new platform, the parallella, and it offers a variety of new challenges. so again a certain complexity must grow over time; in particular, languages would probably need to be altered to allow for libraries which take care of such things as parallelization.
sebraa wrote:
piotr5 wrote:what I have in mind is c++, allowing for program-execution during program-compilation. why are there no attempts to handle memory-restrictions from there?
What you have in mind is called just-in-time-compilation. And nobody does it on memory-starved systems like Epiphany, because there it is an inherently stupid idea. A compiler is large. A compiler uses large memory structures. A compiler needs a program to compile. That program needs data to run.

My program barely fit into 8K of code, so that I could at least squeeze in 24K of data. Adding a compiler is impossible.

Oh, and a good compiler is slow. If you do JIT, you will not have a good compiler. It will be complex and hard to understand, but not good. Because you want your program to start now, and not in two minutes. And you want your program to react now, not in five minutes for a complicated function.
IMHO JIT is dead. it is useful for games which transform game data into something of much higher complexity and faster runtime, but that usefulness is at an end now that those games have lost their novelty and hardware is capable of quickly interpreting the scripts at runtime.

no, I didn't mean just-in-time compilation. I meant the other way around: you have a fast processor with many gigabytes of ram, and you wish to transform your sources into code which runs on epiphany. inside your source code and its libs there are instructions for the compiler on how to fulfill your wish, actual programs the compiler needs to execute in order to give you the program you desire. in c++ this was called template programming. c++ now also has a special qualifier (constexpr) which allows functions marked with it to manipulate data during compilation, so that the finished program won't need to execute that function anymore, as the result is already known.
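
for illustration, a minimal constexpr sketch (C++11 and later); the compiler evaluates the call at build time, so the binary only carries the result:

    #include <cstdio>

    // Evaluated entirely by the compiler when the argument is known at
    // compile time; no code for it needs to fit on an Epiphany core.
    constexpr unsigned long fib(unsigned n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    int main() {
        constexpr unsigned long f20 = fib(20); // forced compile-time evaluation
        static_assert(f20 == 6765, "computed during compilation");
        std::printf("%lu\n", f20);             // the binary just prints 6765
    }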
sebraa wrote:
piotr5 wrote:what I have in mind in particular is a wrapper around the notion of "pointers". basically on epiphany you need objects to have dynamic addresses, changing physical location whenever they are needed. you make use of this pointer, and somewhere else in the code a fitting dma-copy instruction is inserted.
Yes, that does work. But where does the data come from? And, what happens if I request an object of size, say, 4 MB? Would you do a DMA-Copy from shared memory every time I access some part of that data? One for each byte? It might be better to actually leave the data in shared memory, then.

Executing some math functions (but not even my main code!!) from shared memory reduced my cycle time from "less than one second per iteration" down to "more than 30 minutes per iteration".

Do you really understand what you are suggesting?
yes, I understand what I am suggesting. what I suggest is that a hash function using a hash table of size 4 MB would automatically get transformed (by compiler and libs) into a hash function which uses a tree of smaller hash tables, i.e. the hash lookup would be split into multiple lookups during the execution of the hash function. and the hash function would be transformed into one delivering a partial result of the first few bits, so that the lookup can start early. (since the table size is >16k to avoid collisions, I expect the hash function and the data loading to be complicated and time-consuming.) depending on the hash function this might require some sort of AI transforming mathematical formulas so that those first few bits can be computed first. and this AI would be provided by the libs and executed during compilation, so the finished hash function already uses the correct formula. I'm not saying this is always possible, but maybe programmers will learn to use hash functions where it is...
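
a hand-drawn sketch of that tree-of-tables idea (all sizes illustrative, fetch_subtable is a hypothetical DMA loader): the top bits of the hash pick a 16k sub-table, so its transfer can start while the remaining bits are still being computed:

    #include <cstdint>

    constexpr unsigned SUB_BITS  = 12;              // 4096 entries per sub-table
    constexpr unsigned SUB_COUNT = 256;             // 256 * 16 KB ~ one 4 MB table
    using Entry = std::uint32_t;

    static Entry subtables[SUB_COUNT][1u << SUB_BITS]; // stands in for shared DRAM

    // Hypothetical loader: on Epiphany this would DMA one 16 KB sub-table
    // into local SRAM; here it just returns a pointer into the big array.
    static Entry *fetch_subtable(unsigned idx) { return subtables[idx]; }

    // Split one lookup into "which sub-table" (cheap, computed first, so
    // the DMA can start early) and "which slot" (computed while it runs).
    Entry lookup(std::uint64_t key) {
        std::uint64_t h    = key * 0x9E3779B97F4A7C15ull;     // cheap mixing hash
        unsigned      top  = (unsigned)(h >> 56) % SUB_COUNT; // first few bits
        Entry        *sub  = fetch_subtable(top);             // overlaps the rest
        unsigned      slot = (unsigned)(h & ((1u << SUB_BITS) - 1));
        return sub[slot];
    }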
sebraa wrote:
piotr5 wrote:this really is some advanced kind of artificial intelligence I'm talking about here.
Artificial intelligence is based on (implicit and explicit) knowledge. You do know that this knowledge needs to be stored somewhere, right? Where do you want to store the knowledge for an artificial intelligence, running on the Epiphany, which predicts and understands(!) complicated source code, which it can compile down to binary and run most efficiently on the Epiphany? I do not even talk about the mundane tasks, like parsing the input or outputting assembly tokens. Just the knowledge about "how to do it".

An AI is only CPU-bound if it has enough memory to keep all of its required knowledge in memory at all times. Otherwise, it is simply I/O bound, because the data needs to come from somewhere. That is, by the way, true for any algorithm (especially for database systems).
that's why I suggest that the AI run only during compilation. during runtime only linking is needed, and even that can be prepared during compilation so that it doesn't use much time.
sebraa wrote:
piotr5 wrote:btw, since we're talking of complexity and epiphanyIV, creating programs for epiphanyIII is much easier than for eIV.
Why? They are essentially the same chip, just more powerful. For my approach, it would increase the calculation throughput by four (4x as many cores), and decrease the communication throughput by four (4x as much data, but not more communication throughput). By the way, my code spends more than 90% of its time waiting to get its results out of the system.
that's exactly what I am pointing at: waiting for results, at the current level of complexity of the development tools, can only be prevented manually. you need to find out which parts of your program can be executed during that wait, how parts of your results can be obtained sooner so you can continue with that partial data, or maybe spend the wait more productively by adding the waiting core as an additional core for the calculations! this means higher complexity in communication, and therefore more and more of your wait time will be spent on data transfer. once you've reached that level you will need to debug the data flow.

what I ask is: do you know of a tool that helps with debugging the data flow? the epiphany simulator looks quite promising in that respect, but it needs some ideas for how to display where the data is moving, so you can decide which core should be located where. data collision is not really an issue with 16 cores; with 64 cores it might be interesting to try that kind of optimization. every time you access shared memory your data flows to the FPGA and back. cores doing that will delay other data transfers on that path! there are up to 7 cores which could potentially decide to talk with shared memory!
sebraa wrote:
piotr5 wrote:something which analyzed epiphany assembler code and figures out which data is where and when? does such a tool already exist for other hardware?
I do not know. I, as a programmer, decide which data I need at a specific point in time, and then I make sure that the data is where I need it. In the best case, I bring in the data before I need it, to avoid waiting on it. Because then I know that I have the important data available.


Just-in-time is, from my point of view, a bad choice in most cases. Not because it inherently is a bad idea, but because it is extremely reliant on perfect knowledge (which you cannot guarantee for any non-trivial system) and tends to break down badly on unforeseen or uncontrollable events.

Just like just-in-time logistics in the real world. Do not store anything, build it when it is ordered. And then a bit of water hits a small area in Japan, a power plant explodes, some machinery is destroyed and half the world sits there, waiting for already-ordered gadgets to be delivered. And they cannot be delivered, because they have never been built in the first place. And building them is impossible, because the machines to build them are destroyed.

Yes, JIT is an awesome idea.

luckily computers are not as unforeseeable and unpredictable as our real world. of course JIT-whatever is a bit too advanced for now. in order to program whatever artificial intelligence for speeding up the JIT stuff, its programmers must first acquire some experience. to summarize:

one needs experience with parallelizing functions and with optimizing away the wait time during program execution. next, those optimizations need to be automated in the compiler and libs. only after that will the question of data flow arise. hand-optimized code is rarely complex enough to require optimization of the data flow.

so, once you have experimented with code placement and the reloc directory of https://bitbucket.org/__mib/epiphany-msc.git you can take a look at gcc and how it makes use of graphite and lto. then try to implement your code-placement and wait-time-reduction ideas there. the point is that the programs you send to the epiphany cores all work together, so they need some sort of linking step for applying various lto strategies -- even though you won't link them into one compact program just yet. gcc developers have already broken a taboo by implementing automatic loop parallelization. let's go further in that direction! there is only a limited amount of things a compiler can do, or should do. I suspect we'd need some new tool which takes source code (maybe also source code for glsl) and transforms it into source code abiding by epiphany's memory restrictions, and this tool would need to share its code base with gcc/llvm, staying in sync...
piotr5
 
Posts: 230
Joined: Sun Dec 23, 2012 2:48 pm

Re: strategies for increasing sales of Epiphany III and IV

Postby xilman » Fri Feb 13, 2015 9:35 pm

piotr5 wrote:in that respect for example an integer division is not parallelizable
Not true, in general.

Integer division has the same asymptotic complexity as multiplication. Integer multiplication is essentially a convolution of the digits of the two operands. FFT convolutions are parallelizable and are used for integer multiplication once the book-keeping overhead is outweighed by the N·log(N)·log(log(N)) asymptotic complexity.

As floating-point numbers are but elaborately encoded integers, the same comments apply to FP arithmetic.

Reading up on Schönhage–Strassen might prove illuminating.
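
To make the convolution remark concrete, here is a toy digit-vector multiplication. The double loop below is exactly the convolution; Schönhage–Strassen computes the same convolution with an FFT, whose independent frequency bins are what make the method parallelizable. (Illustrative code, not from any particular library.)

    #include <cstdio>
    #include <vector>

    // Multiply two numbers given as base-10 digit vectors (least
    // significant digit first). The double loop is a convolution; an FFT
    // computes the same convolution in O(N log N), with each frequency
    // bin independent of the others.
    std::vector<int> multiply(const std::vector<int> &a, const std::vector<int> &b) {
        std::vector<int> c(a.size() + b.size(), 0);
        for (std::size_t i = 0; i < a.size(); ++i)
            for (std::size_t j = 0; j < b.size(); ++j)
                c[i + j] += a[i] * b[j];
        for (std::size_t k = 0; k + 1 < c.size(); ++k) { // carry propagation
            c[k + 1] += c[k] / 10;
            c[k] %= 10;
        }
        return c;
    }

    int main() {
        std::vector<int> a = {4, 2, 1};  // 124
        std::vector<int> b = {8, 9};     // 98
        auto c = multiply(a, b);
        for (auto it = c.rbegin(); it != c.rend(); ++it) std::printf("%d", *it);
        std::printf("\n");               // prints 12152
    }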
xilman
 
Posts: 80
Joined: Sat May 10, 2014 8:10 pm
Location: UK

Re: strategies for increasing sales of Epiphany III and IV

Postby sebraa » Sat Feb 14, 2015 2:55 am

Hi,

piotr5 wrote:however, concepts like "compressible" and "parallelizable" do not exist on their own; they are always relative to some circumstances. for example, "incompressible" means the data either has already been compressed or is the result of unknown random events
In theory you are right; in practice your reasoning is useless. As soon as you accept external input data and rely on compression to reduce your memory footprint, you will fail badly when someone is able to provide "random" input.

piotr5 wrote:therefore, quite naturally, when given multiple cores almost all your procedural algorithms are useless and you must come up with new ones.
Wrong. I use a perfectly fine procedural, single-core algorithm. Instead of parallelizing the algorithm, I slice the input data and run multiple non-parallel algorithm instances simultaneously, and I get an almost perfect calculation speedup from that.
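
As an illustration of that pattern (std::thread standing in for Epiphany cores, process() being a placeholder for the per-element work):

    #include <algorithm>
    #include <cstddef>
    #include <thread>
    #include <vector>

    static double process(double x) { return x * x; } // placeholder work

    // Run the same single-core algorithm on disjoint slices of the input;
    // nothing in the algorithm itself is parallel, only the data is split.
    void process_sliced(std::vector<double> &data, unsigned cores) {
        std::vector<std::thread> pool;
        std::size_t chunk = (data.size() + cores - 1) / cores;
        for (unsigned c = 0; c < cores; ++c) {
            std::size_t lo = c * chunk;
            std::size_t hi = std::min(data.size(), lo + chunk);
            pool.emplace_back([&data, lo, hi] {
                for (std::size_t i = lo; i < hi; ++i)  // plain sequential loop
                    data[i] = process(data[i]);
            });
        }
        for (auto &t : pool) t.join();
    }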

piotr5 wrote:however, parallella is a million times faster than the computers back then; even a single epiphany core is. what I am suggesting here is to connect several parallellas so that the memory size is adequate, and sell that as an educational gaming platform! let epiphany handle the video data...
I guess I repeat myself here... use the right tools for the right job. Handling video data (for gaming) is best done by systems designed to handle video data for gaming, also known as GPUs. The Epiphany has its uses, but it is neither the professional gaming industry (they don't need Epiphany) nor the indie studios (because neither they nor their customers have Epiphany chips).


Real-time video decoding, and possibly encoding as well, in a low-power envelope is a much different story. But then, mobile phones nowadays have decent GPUs too... Epiphany's advantage lies in "additional codec support via software update".

piotr5 wrote:not so sure the gpu is really designed for games. [...] you know, the requirements game programmers demand change all the time; it is hard to say what they will need next. still, I dream that parallella can give them whatever they really need.
What would that be, if it is changing all the time?

piotr5 wrote:all that increase in complexity has grown over the years, with lots of people working on it. the final goal is to have the high complexity shifted into the libs while the programs using them look simple.
And now the programmer thinks he knows what happens, even though something else happens in reality. And you get awarded another CVE. In your program, because you made the mistake of using a really simple-looking, but very complex library which had a bug.

piotr5 wrote:IMHO JIT is dead. it is useful for games which transform game data into something of much higher complexity and faster runtime, but that usefulness is at an end now that those games have lost their novelty and hardware is capable of quickly interpreting the scripts at runtime.
Could you please open your mind a bit and think of something else than "games"? Because really, that is not where the Parallella shines... oh, and JIT is all but dead. I don't really want to bore you, so I just mention the word "bytecode".

piotr5 wrote:[discussion about hash tables and AI] I'm not saying this is always possible, but maybe programmers will learn to use hash-functions where it is...
You lost me there, sorry. But it doesn't matter anyway.

piotr5 wrote:
sebraa wrote:By the way, my code spends more than 90% of its time waiting to get its results out of the system.
you need to find out which parts of your program can be executed during that wait, how parts of your results can be obtained sooner so you can continue with that partial data, or maybe spend the wait more productively by adding the waiting core as an additional core for the calculations!
I think you misunderstood me. The cores wait for the results to get out of the system. I cannot get more input data to process, if all memory is filled with last iteration's output data. Even though I do some serious floating-point math, my implementation is not CPU-bound.

Anyway, this whole discussion is still off-topic from my perspective, even with the new title. And I had a reason for taking the other conversation away from the public forums and responding privately. I really did not like that you just posted my private message into a public forum. So I will not respond to or discuss this with you any further. Congratulations.

(Clarification to the moderators: There is no need to remove or change that post.)
sebraa
 
Posts: 495
Joined: Mon Jul 21, 2014 7:54 pm

Re: strategies for increasing sales of Epiphany III and IV

Postby piotr5 » Sat Feb 14, 2015 3:29 pm

xilman wrote:[division not parallelizable]Not true, in general.
Integer division has the same asymptotic complexity as multiplication.
it's true that multiplying by a fractional number is parallelizable. however, obtaining the multiplicative inverse of a natural number is not. fft might work for that in theory, but no such algorithm is known. the reason is that fft is for multiplication modulo some number, so the multiplicative inverse won't be what you're looking for unless the result of the original division is actually supposed to be an integer. that's just my own opinion of course. according to my research on the net, chip designers simply claim that division isn't parallelizable, and therefore either avoid adding that instruction to their assembler language or accept that it takes a lot of time. the only known possibility to speed up division is the one intel patented for their pentium processor (the one with the bug in the division instruction). it is not useful for epiphany developers, even if it weren't patented, since it takes up a lot of space on the chip.

sebraa wrote:Hi,[...]
I guess I repeat myself here... use the right tools for the right job. Handling video data (for gaming) is best done by systems designed to handle video data for gaming, also known as GPUs. The Epiphany has its uses, but it is neither the professional gaming industry (they don't need Epiphany) nor the indie studios (because neither they nor their customers have Epiphany chips).
here I must clarify: I never thought of parallella as a gaming console, nor as a mixture of gaming and office. however, parallella has 4 slots where you can plug in extensions (as opposed to soldering or tucking in individual cables). this allows for developing and selling new kinds of input devices. and you guess right, those should be input devices for games! aren't gamers already bored of joypads and mice? of course they are, hence the success of cameras as input for games. IMHO here really lies the strength of parallella: gaming with really exotic input devices! video encoding and decoding is definitely a good application too, but why would anyone wish to encode into some exotic or new format? and for decoding there are cheaper solutions, other SBCs like the raspberry pi...
sebraa wrote:What would that be, if it is changing all the time?
when the requirements for the gpu change all the time, a general-purpose programmable coprocessor might suit game developers better than any gpu! you need new functionality? code it yourself! once game developers hit the wall and require a speed upgrade, just require gamers to use more parallellas connected by eLink...
sebraa wrote:because you made the mistake of using a really simple-looking, but very complex library which had a bug.
[...]
so I just mention the word "bytecode".
tried java, and I'm not happy with it. its libs are full of bugs, even though they looked so simple. the same happened when I tried python...

you know, when I said JIT is dead I meant that as soon as complicated programs need to be executed during compilation, such a program isn't fit for JIT compilation; you'd need to compile it twice then. why not just compile it as soon as you're finished with it? I know why: to allow for automatic code injection! that's the big hope of AI developers: use JIT as self-altering code. so we have one trend towards complex programs being executed at compilation, and another trend of AI developers moving away from code injection in the usual sense, since you just cannot debug a program which has grown by an evolutionary/genetic algorithm. when you create programs for sale, you need to react quickly to user complaints. it's much quicker to debug a program that was executed during compilation than to debug code-injection problems; the former simply has better documentation (i.e. the code is more readable to humans). it's just like you said: the use of JIT introduces so much complexity that bugs are guaranteed, since you cannot test all possible applications!

sebraa wrote:The cores wait for the results to get out of the system. I cannot get more input data to process, if all memory is filled with last iteration's output data.

same thing as the main processor waiting for data to get in: just start the transfer earlier! (maybe risk some late changes after the data has been sent.) once half your data is out, as you said, that part of the double buffer can be filled again. I am not aware of what you are doing, but what I have in mind is to arrange your data in such a way that it's less likely the first half is needed later on, and if you happen to need it nonetheless, read it back from the shared memory you sent it to.

such a thing as "compression" doesn't really exist on its own: you always transform (hopefully bijectively) one number into another (hopefully smaller) number, and conversely, for each successful compression there is one instance where the size only gets bigger. for this reason compression cannot be done if you know nothing of the input; on average you always get a 100% ratio, no compression at all. the whole sense of compression algorithms is to take a weighted average, with the most common files weighted more heavily than rare ones. for example, if you took the current date as a seed for an RNG to produce those files, you could create an algorithm delivering good compression during the next few years.

maybe your actual project requires random access to your output data, but the randomness of that access could be predicted somehow. so create a bitmap of data blocks which aren't needed anymore and purge them. even though more dma transfers are needed, they happen in parallel with your further processing, so you're still faster than if you wait for the whole algorithm to finish before transferring. of course this would make your algorithm more complicated, prone to errors and maybe slower. and that's what I meant when I said you cannot just use the plain procedural algorithm: you need to consider the actual data transfer too, and this will change your code! again: once you have experience with such changes from procedural to communication-intensive code, please contribute your insights to gcc or whatever development tools!
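
the "start the transfer earlier" idea as a double-buffer sketch; dma_start/dma_wait are the same hypothetical stand-ins for the e-lib DMA calls as in my earlier sketch:

    #include <cstddef>
    #include <cstring>

    // Host-side stand-ins; on Epiphany these would start a background DMA
    // and wait for it to finish.
    static void dma_start(void *dst, const void *src, std::size_t n) { std::memcpy(dst, src, n); }
    static void dma_wait() {}

    constexpr std::size_t HALF = 4096;   // one half of local output memory
    static float buf[2][HALF];

    static void compute_block(float *out, int iter) {   // placeholder work
        for (std::size_t j = 0; j < HALF; ++j) out[j] = (float)iter + (float)j;
    }

    // While half k drains to shared memory, the core already computes into
    // the other half, so the output transfer is hidden behind the math.
    void stream_results(float *shared_out, int iterations) {
        for (int i = 0; i < iterations; ++i) {
            int k = i & 1;
            compute_block(buf[k], i);                   // fill one half
            if (i > 0) dma_wait();                      // previous half done?
            dma_start(shared_out + (std::size_t)i * HALF, buf[k],
                      HALF * sizeof(float));            // drain while computing
        }
        dma_wait();                                     // flush the last one
    }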

it is true, it was quite self-important of me to just quote your whole private letter in public. sorry for that. I really didn't see any reason why you would want to keep it private, and you didn't mention it in your message. if I did something against your will, I apologize. parallella is an open platform, so I assumed your projects would be equally open and public. I apologize again if that wasn't the case; that was a third mistake of mine in a single posting. self-importance, short-sightedness and ignorance of your project's planning -- those I'm guilty of.
piotr5
 
Posts: 230
Joined: Sun Dec 23, 2012 2:48 pm

Re: strategies for increasing sales of Epiphany III and IV

Postby xilman » Sat Feb 14, 2015 8:55 pm

piotr5 wrote:
xilman wrote:[division not parallelizable]Not true, in general.
Integer division has the same asymptotic complexity as multiplication.
it's true that multiplying by a fractional number is parallelizable. however, obtaining the multiplicative inverse of a natural number is not. fft might work for that in theory, but no such algorithm is known. the reason is that fft is for multiplication modulo some number, so the multiplicative inverse won't be what you're looking for unless the result of the original division is actually supposed to be an integer. that's just my own opinion of course.

Sorry, but you really must change your opinion. Perhaps I didn't phrase it well enough before, but division really is equivalent to multiplication to within a constant factor.

Here is a slightly more explicit hint. To evaluate p/q, first set up a Newton-Raphson iteration which converges to 2^N/q for an appropriate value of N, doubling the precision at each step. Then compute p·(2^N/q), shift right by N, and apply any necessary (small and fast) corrections.

This link is somewhat more informative than my previous hint: https://en.wikipedia.org/wiki/Division_ ... er_methods
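
For the curious, a minimal fixed-point sketch of the scheme with N = 32 (illustrative values; the initial estimate and iteration count are not tuned):

    #include <cstdint>
    #include <cstdio>

    // Newton-Raphson reciprocal in Q32 fixed point: x converges to 2^32/d,
    // doubling its correct bits each step (assumes 2 <= d). Each step is
    // two multiplications, which is what FFT-based methods parallelize
    // for big operands.
    static std::uint64_t reciprocal_q32(std::uint32_t d) {
        // crude first estimate: 2^32 / 2^ceil(log2(d)), good to ~1 bit
        std::uint64_t x = (1ull << 32) >> (32 - __builtin_clz(d));
        for (int i = 0; i < 6; ++i) {                // 1 bit -> 64 bits
            std::uint64_t dx = (std::uint64_t)d * x; // D*X, close to 2^32
            x = (x * ((2ull << 32) - dx)) >> 32;     // X <- X*(2 - D*X)
        }
        return x;                                    // ~ 2^32 / d
    }

    int main() {
        std::uint64_t p = 1000000, q = 37;
        std::uint64_t r = (p * reciprocal_q32((std::uint32_t)q)) >> 32;
        while (r * q > p) --r;                       // small, fast
        while ((r + 1) * q <= p) ++r;                // corrections
        std::printf("%llu\n", (unsigned long long)r); // prints 27027
    }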
xilman
 
Posts: 80
Joined: Sat May 10, 2014 8:10 pm
Location: UK

Re: strategies for increasing sales of Epiphany III and IV

Postby piotr5 » Sun Feb 15, 2015 7:31 am

and what makes you think any newton method would be parallelizable? it is true that multiplication is equivalent in speed to division for procedural algorithms, but parallel algorithms are faster than procedural ones! in parallel you only need 3 steps for multiplication (fft, pointwise multiply, inverse fft), independent of the length -- as long as you match that length with cores, that's constant runtime. can you achieve constant runtime (given an infinite number of cores) for division too? in theory it seems you could (proof idea: calculate p/q ... (p-q+1)/q modulo 2^N in parallel and find the correct number for output), but since no good algorithm (with logarithmic core count) is known yet, division is as of yet not considered parallelizable -- maybe in the future it will be...
piotr5
 
Posts: 230
Joined: Sun Dec 23, 2012 2:48 pm

Re: strategies for increasing sales of Epiphany III and IV

Postby xilman » Sun Feb 15, 2015 10:41 am

piotr5 wrote:and what makes you think any newton method would be parallelizable? it is true that multiplication is equivalent in speed to division for procedural algorithms, but parallel algorithms are faster than procedural ones! in parallel you only need 3 steps for multiplication (fft, pointwise multiply, inverse fft), independent of the length -- as long as you match that length with cores, that's constant runtime. can you achieve constant runtime (given an infinite number of cores) for division too? in theory it seems you could (proof idea: calculate p/q ... (p-q+1)/q modulo 2^N in parallel and find the correct number for output), but since no good algorithm (with logarithmic core count) is known yet, division is as of yet not considered parallelizable -- maybe in the future it will be...
Ok, let's get ever more explicit.

You agree that multiplication is parallelizable. Good.

I could re-type the contents of https://en.wikipedia.org/wiki/Division_ ... n_division but that seems like needless labour. The summary is: given an estimate X_i of 1/D, the next estimate is X_i(2 - D·X_i). The two multiplications can be parallelized, as you already agree.

The final multiplication by the reciprocal can also be parallelized.

Each step of the algorithm can be parallelized. Ergo the entire algorithm can be parallelized.
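
The reason each step doubles the precision, in one line: scale so that $X_i \approx 1/D$ and write the error as $\epsilon_i = 1 - D X_i$. Then

$$1 - D X_{i+1} = 1 - D X_i (2 - D X_i) = (1 - D X_i)^2 = \epsilon_i^2,$$

so an error of $10^{-1}$ becomes $10^{-2}$, then $10^{-4}$, and so on: logarithmically many rounds of parallel multiplications give full precision.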
xilman
 
Posts: 80
Joined: Sat May 10, 2014 8:10 pm
Location: UK

Re: strategies for increasing sales of Epiphany III and IV

Postby piotr5 » Sun Feb 15, 2015 8:44 pm

I understand now! guess I was too fixated on constant runtime. suppose logarithmic runtime is ok too. also, when I said newton is not parallelizable I meant the recursive plugging-in of the old values; I didn't think of parallelizing the other operations. you are right, the algorithm is parallelizable in theory. suppose those chip developers meant practical obstacles then...
piotr5
 
Posts: 230
Joined: Sun Dec 23, 2012 2:48 pm


