Don't get me wrong, I too am in favour of neural networks; I've read about the image recognition Google has implemented in TensorFlow. But IMHO it's much less wasteful to do neural networking in FPGAs or dedicated chips (which, after so many years, still hasn't happened). That is, it should be a side effect of connecting lots of hardware into one big supercomputer, leveraging the money people spend on entertainment to build something with more neurons than a human has, and letting it evolve in a technological sense.

As for creating a sentient being, something that uses big data for fortune-telling, or something capable of using limbs: I believe you are wrong in assuming neural networks are a better fit for that than the traditionally used methods. Of course pure hand-crafting is nonsense. But genetic algorithms or compressed sensing are the building blocks for automatically crafting what would otherwise have been hand-crafted, and if something goes wrong, their output is much easier to debug than data stored in a neural network. You speak of functions being too complex for us to understand, but isn't that complexity the raison d'être of computer algebra?

Fact is, computer programs don't write themselves; one way or another you are doing the programming externally, and it's merely a choice of programming language. The programming language for neural networks is "sensory" input plus a per-neuron trigger telling it whether the signal response is good, bad, or unknown. Why don't we program in the language of Turing machines instead? Even Isaac Asimov didn't dare to paint a world where AI is trained by means we fail to understand scientifically. The science and mathematics of neural networks isn't well understood yet, so if Google's network calls my toaster a chair, it isn't clear which input caused that behaviour, let alone how to get rid of it. What we can do is recognize certain mechanisms in neural networks and model them in some field we understand better, but that's unnecessary additional work. The most important feature of any programming language is debugging, and you don't have that to a full extent for neural networks yet!
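To make the genetic-algorithm point concrete, here is a minimal toy sketch I put together (everything in it is made up for illustration, not anyone's production code): it evolves a formula that fits sample data, and the survivor is an expression tree you can simply print and inspect, unlike a blob of trained weights:

# Minimal sketch: a genetic algorithm that evolves a human-readable
# formula to fit sample data. The survivor can be printed and debugged
# as an ordinary expression, which is the whole point of the argument.
import random

OPS = {'+': lambda a, b: a + b,
       '-': lambda a, b: a - b,
       '*': lambda a, b: a * b}

def random_tree(depth=3):
    """Random expression tree over x and small integer constants."""
    if depth == 0 or random.random() < 0.3:
        return 'x' if random.random() < 0.5 else random.randint(-3, 3)
    op = random.choice(list(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == 'x':
        return x
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def show(tree):
    if isinstance(tree, tuple):
        return '(%s %s %s)' % (show(tree[1]), tree[0], show(tree[2]))
    return str(tree)

def fitness(tree, samples):
    """Sum of squared errors against the target samples (lower is better)."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in samples)

def mutate(tree):
    """Descend into a random branch, or replace a subtree with a fresh one."""
    if isinstance(tree, tuple) and random.random() < 0.7:
        op, left, right = tree
        if random.random() < 0.5:
            return (op, mutate(left), right)
        return (op, left, mutate(right))
    return random_tree(2)

# Target behaviour to recover: y = x*x + 1
samples = [(x, x * x + 1) for x in range(-5, 6)]
population = [random_tree() for _ in range(200)]
for generation in range(60):
    population.sort(key=lambda t: fitness(t, samples))
    survivors = population[:50]                      # selection
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(150)]   # mutation
best = min(population, key=lambda t: fitness(t, samples))
print(show(best), '-> error', fitness(best, samples))

If this misbehaves, you debug a one-line printed formula, not a million opaque weights.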
In a course on AI I learned how mathematical problems get solved by computers: you draw a dependency graph for the formula you want to simplify, and then analyze the features of each node to find the ones you can quickly eliminate. Simplifying the graph means you have simplified the formula. Of course, internally the computer doesn't draw a graph on screen; it uses matrix manipulations instead. But if you wish to debug the process, you'd better transform the matrices back into graphs. Our science is focused on formulas, so it would be quite natural to use the little we know about matrices to understand how an autonomously learning machine works, rather than the even scarcer knowledge we have of neural networks.

Also, sometimes machine learning isn't really necessary for artificial intelligence. OCR doesn't require a learning phase at all if you have the font that was used. That is, the only learning happening is filtering the various fonts out of all the big data, and afterwards you can use Grassmann manifolds to take care of non-linear deformations. In 2D this has been shown to be a good method; allegedly it could also be possible to do 3D processing like face recognition that way...
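Coming back to the graph-simplification idea, here is a toy sketch of my own (not the course material): a formula is a tree of nodes, and eliminating nodes whose features make them trivially reducible simplifies the formula itself:

# Toy illustration: a formula as a dependency graph (here a tree of
# tuples). Eliminating trivially reducible nodes simplifies the graph,
# and thereby the formula. Every step is inspectable.
def simplify(node):
    """Recursively eliminate nodes whose features make them trivial."""
    if not isinstance(node, tuple):
        return node                       # variable or constant: leaf node
    op, left, right = node
    left, right = simplify(left), simplify(right)
    if op == '*':
        if left == 0 or right == 0:       # x*0 -> 0
            return 0
        if left == 1:                     # 1*x -> x
            return right
        if right == 1:                    # x*1 -> x
            return left
    if op == '+':
        if left == 0:                     # 0+x -> x
            return right
        if right == 0:                    # x+0 -> x
            return left
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        if op == '+':                     # both children known: fold constants
            return left + right
        if op == '*':
            return left * right
    return (op, left, right)

# (x*1 + 0) + (2*3)  ->  x + 6; each elimination is one graph reduction
formula = ('+', ('+', ('*', 'x', 1), 0), ('*', 2, 3))
print(simplify(formula))   # ('+', 'x', 6)

The matrix view and this graph view describe the same process; the graph is just the form a human can step through in a debugger.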
I forgot to go into what I meant by compressed weight functions. Google is already using TensorFlow to train a neural network in huge server farms and then execute the learned model on tiny smartphones. So my idea is that after learning, the weights should be rearranged in a way that lets you represent them by formulas, so to speak interpolating formulas through the weights and finding an approximation that yields the same weights. Later, on the cell phone, the input would be compressed in the same way, producing a formula which is then folded with the weights to produce the output. Sounds like a lot of work. But do that for many neurons per core simultaneously, and you'll be able to simplify the task by creating a procedure that passes input through many layers of neurons at once. As you said, the idea of using hand-crafted formulas failed because the formulas are too complicated to be hand-crafted. So why not translate the resulting neural network into a network of complicated formulas? That is, the input transforms into a formula, then some complicated calculations are done, and the output is a single value representing the output of a whole cluster of neurons. After the learning process, this approach should be analyzed to figure out whether the FLOP count could be reduced that way...
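A rough numpy sketch of what I have in mind (the sort stands in for the "rearranging" step, and all sizes and names are my own assumptions): interpolate a formula through the trained weights, ship only its coefficients, and reconstruct approximate weights on the phone:

# Rough sketch of the weight-compression idea: interpolate a formula
# through a trained weight vector and ship only the formula's
# coefficients instead of the raw weights.
import numpy as np

rng = np.random.default_rng(0)
weights = np.sort(rng.normal(size=256))    # pretend these are learned weights,
                                           # rearranged so a smooth fit exists
positions = np.linspace(-1.0, 1.0, weights.size)

degree = 8                                 # 9 coefficients instead of 256 weights
coeffs = np.polynomial.polynomial.polyfit(positions, weights, degree)

def approx_weights(pos):
    """Reconstruct approximate weights from the formula on the device."""
    return np.polynomial.polynomial.polyval(pos, coeffs)

x = rng.normal(size=weights.size)          # some input vector
exact = weights @ x                        # dot product with raw weights
compressed = approx_weights(positions) @ x # same product via the formula
print('storage: %d -> %d values' % (weights.size, coeffs.size))
print('exact %.3f vs compressed %.3f' % (exact, compressed))

Whether the FLOP count actually drops depends on how cheap evaluating the formula is compared with just reading the raw weights; that trade-off is exactly what would need to be analyzed after training.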