#BigData, Parallella, Clustering

Postby ParallelANN » Tue Mar 31, 2015 8:57 am

Dear all,

I am new here. I am an electrical engineer based in Germany, working on a topic in the wide field of "BigData", "Industry 4.0" and "Artificial Neural Networks". I am afraid I can only give you those catchwords and nothing more :)

We have software that runs on Windows. My plan is to speed it up using parallel computing. The Parallella seems perfect for this, because it offers a lot of computational power at a low price.

Now I have several questions and I hope to find answers here.
1) Is it possible to run a Windows-based cluster with Parallella? There is a blog entry where a Beowulf cluster was built, but that runs on Linux. We don't want to offer a Linux system because so far everything runs on Windows, and because we're not working in a scientific field, most of our users cannot handle Linux.

2) What do I have to keep in mind when building parallel applications with Parallella? My idea is as follows: the software runs on a main node. It has an interface to some sort of cluster driver and tells it to perform the current computation on another node. The cluster driver manages the nodes and the requests from the software. That way, it would be relatively easy to run an algorithm that is designed for parallel computing on several Parallella boards. Is this idea too simple?

3) Am I right that if I build a cluster with 15 Parallella boards, I end up with 240 Epiphany cores (plus 15 ARM cores) for $1500? The 240 Epiphany cores would be responsible for the computation. If so, is it better to use an Nvidia graphics card with lots and lots of cores, or a Parallella cluster?


I thank you in advance for your support.
Regards
ParallelANN
 
Posts: 3
Joined: Tue Mar 31, 2015 8:40 am

Re: #BigData, Parallella, Clustering

Postby 9600 » Tue Mar 31, 2015 11:00 am

ParallelANN wrote: 1) Is it possible to run a Windows-based cluster with Parallella? There is a blog entry where a Beowulf cluster was built, but that runs on Linux. We don't want to offer a Linux system because so far everything runs on Windows, and because we're not working in a scientific field, most of our users cannot handle Linux.


DAB-Embedded did a port of Windows Embedded Compact 7 (CE) to Parallella.

What do I have to keep in mind when building parallel applications with Parallella? My idea is as follows: the software runs on a main node. It has an interface to some sort of cluster driver and tells it to perform the current computation on another node. The cluster driver manages the nodes and the requests from the software. That way, it would be relatively easy to run an algorithm that is designed for parallel computing on several Parallella boards. Is this idea too simple?


That sounds reasonable, and there are various tools and frameworks available for distributing workloads across multiple nodes, e.g. MPI. Bear in mind that by default applications will only use the ARM processor. To use the Epiphany you need the Epiphany SDK, with which you develop programs in two halves: one runs on the ARM, the other on the Epiphany, and the two communicate with each other. The same would be the case if you developed applications for an x86 machine equipped with a GPU.
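The main-node-plus-workers idea described above can be sketched in a few lines. This is only an illustration, not MPI and not the Epiphany SDK: Python threads stand in for the cluster nodes, and `chunked`, `node_compute` and `cluster_driver` are hypothetical names made up for this example. On a real cluster the scatter and gather steps would go through something like MPI, and the per-node kernel would be the part offloaded to the Epiphany.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data, n_chunks):
    """Split data into at most n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional element.
        end = start + size + (1 if i < extra else 0)
        if end > start:
            chunks.append(data[start:end])
        start = end
    return chunks


def node_compute(chunk):
    """Hypothetical per-node kernel: sum of squares of one chunk."""
    return sum(x * x for x in chunk)


def cluster_driver(data, n_nodes):
    """Hypothetical 'cluster driver': scatter chunks, gather partial results.

    Threads are a local stand-in for remote nodes (assumption for this sketch).
    """
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(node_compute, chunked(data, n_nodes)))
    return sum(partials)


# The "main node" only talks to the driver, never to individual workers.
data = list(range(1000))
total = cluster_driver(data, n_nodes=15)
```

The point of the design is that the application only sees `cluster_driver`; how the work reaches the nodes can be swapped out later without touching the calling code.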

Am I right that if I build a cluster with 15 Parallella boards, I end up with 240 Epiphany cores (plus 15 ARM cores) for $1500? The 240 Epiphany cores would be responsible for the computation. If so, is it better to use an Nvidia graphics card with lots and lots of cores, or a Parallella cluster?


It depends. It's impossible to generalise, and simple core count comparisons are meaningless. Epiphany is MIMD, whereas GPUs are SIMD (and thus far less flexible). An Epiphany chip consumes ~2W, whereas GPUs typically draw much more. So it comes down to your particular application, whether it suits a given architecture, and what your power constraints are. There are other considerations too, such as whether you're happy to use a proprietary API such as CUDA, or require an open API and a fully open source toolchain.

In short, I'm afraid that you're going to have to read the documentation and make your own mind up :)

Regards,

Andrew
Andrew Back (a.k.a. 9600 / carrierdetect)
9600
 
Posts: 997
Joined: Mon Dec 17, 2012 3:25 am

Re: #BigData, Parallella, Clustering

Postby ParallelANN » Tue Mar 31, 2015 2:00 pm

Hello Andrew and thank you very much for your quick answer,

1) I will check the YouTube video at home.

2) Good to read that my idea is not too far from reality. However, it sounds like porting existing software onto a Parallella cluster is much more complicated than I imagined. To me it sounds like it would be better to start from scratch.

3) And here's the second problem. I am an electrical engineer who has done a lot with embedded systems and automotive, though not in great depth, because I am not a computer scientist. So I have good knowledge of computers and processors, but no experience with multicore CPU architectures. However, I can imagine that the Parallella chip is more flexible than a GPU, because a GPU is designed for a specific problem whereas the Parallella is not.

I think for us it would be easier to make the software CUDA compatible instead of porting it onto Parallella. Otherwise, the R&D costs would explode :ugeek:
ParallelANN
 
Posts: 3
Joined: Tue Mar 31, 2015 8:40 am

Re: #BigData, Parallella, Clustering

Postby 9600 » Tue Mar 31, 2015 2:38 pm

ParallelANN wrote: Good to read that my idea is not too far from reality. However, it sounds like porting existing software onto a Parallella cluster is much more complicated than I imagined. To me it sounds like it would be better to start from scratch.


It may or may not be difficult to port your application; it's hard to say. If you are 1) looking to build a cluster of systems anyway and 2) have some floating-point-intensive and highly parallelisable part of the application which is to be offloaded to said cluster, then it may not be "that hard", at least when compared with doing the same thing using some other platform for the cluster nodes.
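How much the "highly parallelisable part" matters can be estimated with Amdahl's law, which is not mentioned above but is the usual back-of-envelope check. The sketch below assumes an ideal offload with no communication overhead, and the 90% parallel fraction is purely an illustrative assumption, not a figure for your application. The 240 is the Epiphany core count from the original question.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Upper bound on speedup when only part of the runtime is parallelised.

    parallel_fraction: share of the original runtime that can be spread
                       across workers (0.0 to 1.0).
    n_workers:         number of workers sharing that part (>= 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


# Illustrative assumption: 90% of the runtime is parallelisable.
estimate = amdahl_speedup(0.9, 240)
print(f"Speedup bound with 240 workers: {estimate:.1f}x")
```

With a 10% serial remainder the bound stays below 10x no matter how many cores are added, so measuring the parallel fraction of the existing software first is probably worth more than comparing core counts.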

I think for us it would be easier to make the software CUDA compatible instead of porting it onto Parallella. Otherwise, the R&D costs would explode :ugeek:


Well, I don't know your particular application and even if I did, I'm not the best person to make a judgment. However, if it's not using CUDA now and work would be required anyway, it's perhaps worth looking at the pros and cons of each approach.

Regards,

Andrew
Andrew Back (a.k.a. 9600 / carrierdetect)
9600
 
Posts: 997
Joined: Mon Dec 17, 2012 3:25 am

Re: #BigData, Parallella, Clustering

Postby ParallelANN » Tue Mar 31, 2015 3:10 pm

To me the implementation sounds more complicated than using CUDA. If I use CUDA, I have a defined interface that can be integrated into the application "relatively easily". If I use a Parallella cluster, I need to re-program the existing software so that it communicates as the main node (or with the main node), and I also have to program the Parallella boards so that the ARM and the Epiphany cores work together (that's what it sounds like). It's hard for me to weigh the pros and cons because I lack experience with both systems.

We aren't planning to build a cluster anyway. We want to speed up our algorithm because we have to process large amounts of data. Right now we are trying to parallelize the algorithm. I think it would be best to use CUDA.
ParallelANN
 
Posts: 3
Joined: Tue Mar 31, 2015 8:40 am

Re: #BigData, Parallella, Clustering

Postby piotr5 » Tue Mar 31, 2015 11:42 pm

I really know nothing about these things, but I do know this: before making a decision, first lay out a plan of what actually needs to be done, and start with the things that have the lowest cost or working time, thereby postponing your expensive decisions. Notions like "relatively easy" and "more complicated" are pure speculation until you have actually started doing some real work and tried out the options you have...
piotr5
 
Posts: 230
Joined: Sun Dec 23, 2012 2:48 pm

