Don't get me wrong, I too am in favour of neural networks; I've read about the image recognition Google has implemented in TensorFlow. But IMHO it's much less wasteful to do neural networking in FPGAs or dedicated chips (which, after so many years, still hasn't happened). That is, it should be a side effect of connecting lots of hardware into one big supercomputer, leveraging the money people spend on entertainment to build something with more neurons than a human has, and letting it evolve in a technological sense.

As for creating a sentient being, something that uses big data for fortune-telling, or something capable of using limbs: I believe you are wrong in assuming neural networks are a better fit for that than the traditionally used methods. Of course pure hand-crafting is nonsense. But genetic algorithms or compressed sensing are the building blocks for automatically crafting what would otherwise have been hand-crafted, and if something goes wrong, their output is much easier to debug than data stored in a neural network. You speak of functions being too complex for us to understand, but isn't that complexity the raison d'être of computer algebra?

Fact is, computer programs don't write themselves; one way or another you are doing the programming externally, and it's merely a choice of programming language. The programming language for neural networks is "sensory" input plus a per-neuron trigger telling it whether the signal response is good, bad, or unknown. Why don't we program in the language of Turing machines instead? Even Isaac Asimov didn't dare to paint a world where AI is trained by means we fail to understand scientifically. The science and mathematics of neural networks isn't well understood yet, so if Google's network calls my toaster a chair, it isn't clear which input caused that behaviour, let alone how to get rid of it. What we can do is recognize certain mechanisms in neural networks and model them in some field we understand better, but that's unnecessary additional work. The most important feature of any programming language is debugging, and you don't have that to a full extent for neural networks yet!
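To make the genetic-algorithm point concrete, here is a minimal toy sketch I put together (everything in it is made up for illustration, not anyone's production code): it evolves a formula that fits sample data, and the survivor is an expression tree you can simply print and inspect, unlike a blob of trained weights:

# Minimal sketch: a genetic algorithm that evolves a human-readable
# formula to fit sample data. The survivor can be printed and debugged
# as an ordinary expression, which is the whole point of the argument.
import random

OPS = {'+': lambda a, b: a + b,
       '-': lambda a, b: a - b,
       '*': lambda a, b: a * b}

def random_tree(depth=3):
    """Random expression tree over x and small integer constants."""
    if depth == 0 or random.random() < 0.3:
        return 'x' if random.random() < 0.5 else random.randint(-3, 3)
    op = random.choice(list(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == 'x':
        return x
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def show(tree):
    if isinstance(tree, tuple):
        return '(%s %s %s)' % (show(tree[1]), tree[0], show(tree[2]))
    return str(tree)

def fitness(tree, samples):
    """Sum of squared errors against the target samples (lower is better)."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in samples)

def mutate(tree):
    """Descend into a random branch, or replace a subtree with a fresh one."""
    if isinstance(tree, tuple) and random.random() < 0.7:
        op, left, right = tree
        if random.random() < 0.5:
            return (op, mutate(left), right)
        return (op, left, mutate(right))
    return random_tree(2)

# Target behaviour to recover: y = x*x + 1
samples = [(x, x * x + 1) for x in range(-5, 6)]
population = [random_tree() for _ in range(200)]
for generation in range(60):
    population.sort(key=lambda t: fitness(t, samples))
    survivors = population[:50]                      # selection
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(150)]   # mutation
best = min(population, key=lambda t: fitness(t, samples))
print(show(best), '-> error', fitness(best, samples))

If this misbehaves, you debug a one-line printed formula, not a million opaque weights.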
In a course on AI I learned how mathematical problems get solved by computers: you draw a dependency graph for the formula you want to simplify, and then analyze the features of each node to find the ones you can quickly eliminate. Simplifying the graph means you have simplified the formula. Of course, internally the computer doesn't draw a graph on screen; it uses matrix manipulations instead. But if you wish to debug the process, you'd better transform the matrices back into graphs. Our science is focused on formulas, so it would be quite natural to use the little we know about matrices to understand how an autonomously learning machine works, rather than the even scarcer knowledge we have of neural networks.

Also, sometimes machine learning isn't really necessary for artificial intelligence. OCR doesn't require a learning phase at all if you have the font that was used. That is, the only learning happening is filtering the various fonts out of all the big data, and afterwards you can use Grassmann manifolds to take care of non-linear deformations. In 2D this has been shown to be a good method; allegedly it could also be possible to do 3D processing like face recognition that way...
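Coming back to the graph-simplification idea, here is a toy sketch of my own (not the course material): a formula is a tree of nodes, and eliminating nodes whose features make them trivially reducible simplifies the formula itself:

# Toy illustration: a formula as a dependency graph (here a tree of
# tuples). Eliminating trivially reducible nodes simplifies the graph,
# and thereby the formula. Every step is inspectable.
def simplify(node):
    """Recursively eliminate nodes whose features make them trivial."""
    if not isinstance(node, tuple):
        return node                       # variable or constant: leaf node
    op, left, right = node
    left, right = simplify(left), simplify(right)
    if op == '*':
        if left == 0 or right == 0:       # x*0 -> 0
            return 0
        if left == 1:                     # 1*x -> x
            return right
        if right == 1:                    # x*1 -> x
            return left
    if op == '+':
        if left == 0:                     # 0+x -> x
            return right
        if right == 0:                    # x+0 -> x
            return left
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        if op == '+':                     # both children known: fold constants
            return left + right
        if op == '*':
            return left * right
    return (op, left, right)

# (x*1 + 0) + (2*3)  ->  x + 6; each elimination is one graph reduction
formula = ('+', ('+', ('*', 'x', 1), 0), ('*', 2, 3))
print(simplify(formula))   # ('+', 'x', 6)

The matrix view and this graph view describe the same process; the graph is just the form a human can step through in a debugger.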
I forgot to go into what I meant by compressed weight functions. Google is already using TensorFlow to train a neural network in huge server farms and then execute the learned model on tiny smartphones. So my idea is that after learning, the weights should be rearranged in a way that lets you represent them by formulas, so to speak interpolating formulas through the weights and finding an approximation that yields the same weights. Later, on the cell phone, the input would be compressed in the same way, producing a formula which is then folded with the weights to produce the output. Sounds like a lot of work. But do that for many neurons per core simultaneously, and you'll be able to simplify the task by creating a procedure that passes input through many layers of neurons at once. As you said, the idea of using hand-crafted formulas failed because the formulas are too complicated to be hand-crafted. So why not translate the resulting neural network into a network of complicated formulas? That is, the input transforms into a formula, then some complicated calculations are done, and the output is a single value representing the output of a whole cluster of neurons. After the learning process, this approach should be analyzed to figure out whether the FLOP count could be reduced that way...
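A rough numpy sketch of what I have in mind (the sort stands in for the "rearranging" step, and all sizes and names are my own assumptions): interpolate a formula through the trained weights, ship only its coefficients, and reconstruct approximate weights on the phone:

# Rough sketch of the weight-compression idea: interpolate a formula
# through a trained weight vector and ship only the formula's
# coefficients instead of the raw weights.
import numpy as np

rng = np.random.default_rng(0)
weights = np.sort(rng.normal(size=256))    # pretend these are learned weights,
                                           # rearranged so a smooth fit exists
positions = np.linspace(-1.0, 1.0, weights.size)

degree = 8                                 # 9 coefficients instead of 256 weights
coeffs = np.polynomial.polynomial.polyfit(positions, weights, degree)

def approx_weights(pos):
    """Reconstruct approximate weights from the formula on the device."""
    return np.polynomial.polynomial.polyval(pos, coeffs)

x = rng.normal(size=weights.size)          # some input vector
exact = weights @ x                        # dot product with raw weights
compressed = approx_weights(positions) @ x # same product via the formula
print('storage: %d -> %d values' % (weights.size, coeffs.size))
print('exact %.3f vs compressed %.3f' % (exact, compressed))

Whether the FLOP count actually drops depends on how cheap evaluating the formula is compared with just reading the raw weights; that trade-off is exactly what would need to be analyzed after training.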