Clustering and temperature variation


Postby optimaler » Tue Apr 29, 2014 5:36 am

I promised I would get some data later tonight regarding heat distribution on my cluster setup. Here's a pretty graph.

[Attachment: clustertemps-01.jpg]


The graph shows the temperature of each board in the stack, with the position in the legend corresponding to the vertical position in the stack. The variations in temperature generally coincide with the different positions that I tried, with each position getting about 30 seconds to settle. I'll have a more thorough setup later this week.

I'm not an engineer, but it looks like for stacked boards the temperature variation depends pretty heavily on board position in the stack and relative position of the fans. This should be pretty obvious (like, really obvious), but the graph somewhat quantifies this.

I will post my scripts and additional experimental data later when I can approach it more seriously, but I need to go to bed right now. :P
optimaler
 
Posts: 24
Joined: Mon Dec 17, 2012 3:29 am

Re: Clustering and temperature variation

Postby 9600 » Tue Apr 29, 2014 8:40 am

It would be great if you could share your scripts. I'm just wondering how best to collect together, and perhaps even package, utilities like this. Rather than adding them to the parallella-examples repository, maybe we should have another called parallella-utils, which contains system admin stuff instead of parallel programming examples. These could then be packaged and made available in the default install and updated via a Parallella APT repo.

Then there are also existing system admin and monitoring tools etc. that we could choose to integrate with.

Thoughts?

Cheers,

Andrew
Andrew Back (a.k.a. 9600 / carrierdetect)
9600
 
Posts: 997
Joined: Mon Dec 17, 2012 3:25 am

Re: Clustering and temperature variation

Postby greytery » Tue Apr 29, 2014 10:20 am

@optimaler: I just love charts! And it looks like you have all 8 boards linked up now.

The stability of a Parallella seems to depend, in part, on keeping temperatures stable and low. The differences shown here for each board are significant - and a tad high - and it shows that this is an important project. Go for it!!!

You say you tried a few things, such as different standoffs, positions of the fans, etc. Hopefully - after you've had some sleep - you'll post a few more pictures with the results.

Your picture at http://forums.parallella.org/viewtopic.php?f=45&t=838&start=30#p7074 shows two fans stacked vertically. Lots of air pushed - but inefficiently.
Comparing with the embecosm stand/approach {as shown by @9600} at http://www.thingiverse.com/thing:283432, have you tried a push/pull arrangement of the two fans?

Looks like some DIY ducting/casing would help too. Nothing too ugly, of course :D

@9600 : separate parallella-utils - gets my vote.

tery
tery
greytery
 
Posts: 205
Joined: Sat Dec 07, 2013 12:19 pm
Location: ^Wycombe, UK

Re: Clustering and temperature variation

Postby shodruk » Tue Apr 29, 2014 10:25 am

Interesting! :o
Shodruky
shodruk
 
Posts: 464
Joined: Mon Apr 08, 2013 7:03 pm

Re: Clustering and temperature variation

Postby theover » Tue Apr 29, 2014 12:49 pm

Some people can predict these things (thermal simulation is a normal part of PCB design...).
theover
 
Posts: 174
Joined: Mon Dec 17, 2012 4:50 pm

Re: Clustering and temperature variation

Postby optimaler » Tue Apr 29, 2014 6:33 pm

@Adapteva and theover: Did anyone on the Adapteva crew do thermal simulations on stacked boards? I know you did measurements on single boards with IR cameras.

@greytery I'll rerun some of the experiments later today hopefully, but I will comment on the Embecosm approach (and maybe they can comment on the issue as well). I did try push/pull a little bit after I posted last night, and it didn't seem to have much of an impact on bringing temperatures down. My current thinking is that the fan setup needs to be push/pull with an enclosure, which is pretty obvious from a standard practices viewpoint. Like you say, the stacked fans are pushing air with terrible efficiency, especially when you consider all the junk in the way of the airflow. Tonight I am planning on building a cardboard enclosure to test the theory out.

@9600 A utils directory would be pretty good, although I think we should be cautious about dependencies and duplicating functionality with other existing utilities.

In any case, since we don't have that repo going right now, here are the scripts for the curious and impatient. Please comment on problems (and credit to ubii for writing the original ztemp script). Run watchtemp.sh for a little while, then process the output (`hostname`-temps) with the python script aggregate.py.


Code:
#!/bin/bash
# watchtemp.sh
# adapted from the ztemp script written by ubii
# (see: http://forums.parallella.org/viewtopic.php?f=23&t=930&p=6242#p6242 )
# written by optimaler on 4/29/2014
# beware! each run replaces the previous <hostname>-temps.bak backup;
# make sure to rename files you want to keep before re-running

dev=/sys/bus/iio/devices/iio:device0
fname="$(hostname)-temps"

# keep one backup of the previous run, if there is one
[ -f "$fname" ] && mv "$fname" "$fname.bak"

init=$(date +%s%N)
while :
do
   raw=$(cat "$dev/in_temp0_raw")
   offset=$(cat "$dev/in_temp0_offset")
   scale=$(cat "$dev/in_temp0_scale")
   t=$(date +%s%N)

   # (raw + offset) * scale is in millidegrees C, hence the / 1000
   c_temp=$(echo "scale=1;(($raw + $offset) * $scale) / 1000" | bc)
   f_temp=$(echo "scale=1;(($c_temp * 9) / 5) + 32" | bc)
   # elapsed time in seconds since the script started
   d_t=$(echo "scale=2;($t-$init) /1000000000" | bc)

   echo "$d_t : $c_temp C / $f_temp F" >> "$fname"

   sleep 0.2
done
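
For anyone who wants to sanity-check the arithmetic in the loop above, here is a minimal Python sketch of the same conversion. The formula (raw + offset) * scale / 1000 is taken from the script; the function names and the sample readings are made up for illustration and are not values from a real board.

```python
def iio_to_celsius(raw, offset, scale):
    """Convert IIO temperature readings to degrees Celsius.

    Same formula as watchtemp.sh: (raw + offset) * scale gives
    millidegrees C, and dividing by 1000 gives degrees C.
    """
    return (raw + offset) * scale / 1000.0


def celsius_to_fahrenheit(c_temp):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c_temp * 9.0 / 5.0 + 32.0


if __name__ == "__main__":
    # Hypothetical readings, chosen only to make the arithmetic easy to follow.
    raw, offset, scale = 2600, -2200, 125.0
    c = iio_to_celsius(raw, offset, scale)
    print(f"{c:.1f} C / {celsius_to_fahrenheit(c):.1f} F")
```

Unlike the bc version, this keeps full floating-point precision until the final formatting step, so the last digit may differ slightly from what the shell script logs.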


Code:
#
# aggregate.py, written by optimaler 4/29/2014
#
# Note: this requires NumPy and matplotlib (part of the SciPy stack);
# you can get them on Parallella using
# sudo apt-get install python3-numpy python3-matplotlib
#
# This script takes multiple output files from watchtemp.sh
# as arguments and plots them to sampling.png:
#
# python3 aggregate.py <name1> <name2> <name3> ...
#
# You may need to adjust the plot height manually
# to make it look nice. See below for details
#

import sys

import matplotlib
matplotlib.use("Agg")  # write straight to a file; no display needed
import matplotlib.pyplot as plt
import numpy as np

names = sys.argv[1:]
# one color per input file, taken from the "jet" color map
my_colors = plt.get_cmap("jet")(np.linspace(0.1, 1, len(names)))

maxX = 0
for name, color in zip(names, my_colors):
    x = []
    y = []
    with open(name) as infile:
        for line in infile:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or truncated lines
            x.append(float(fields[0]))  # time, s
            y.append(float(fields[2]))  # degrees C
    maxX = max([maxX] + x)
    plt.plot(x, y, 'o-', markersize=1, label=name, color=color)

plt.xlabel('Time, s')
plt.ylabel('Temp, C')
plt.legend(prop={'size': 6})
F = plt.gcf()
# Adjust the height of the plot here in the second argument;
# the width grows with run length but is kept to at least 5 inches
F.set_size_inches(max(maxX / 15, 5), 5)
F.savefig("sampling.png", dpi=250)

optimaler
 
Posts: 24
Joined: Mon Dec 17, 2012 3:29 am

Re: Clustering and temperature variation

Postby FHuettig » Wed Apr 30, 2014 4:04 am

optimaler wrote: @Adapteva and theover: Did anyone on the Adapteva crew do thermal simulations on stacked boards? I know you did measurements on single boards with IR cameras.

I'm pretty sure no analysis was done for stacked boards, but I also don't think it would be very useful. The boards within the stack, given any forced airflow, are unlikely to influence each other (radiation won't be a big contributor IMHO). I suspect the issue will simply be how much airflow there is for each board, particularly near the Zynq, with minimal change when any other board is powered on or not. There can be some heat conducted through the mounting holes if metal posts/standoffs are used, but again I think it will be small compared to what gets pulled by the airflow for any reasonable fan arrangement.

I may be disagreeing with @greytery a bit by saying that the temperatures shown in your plot are quite reasonable. These are temperatures measured in the actual Zynq silicon, essentially junction temps, so 50-60C should be just fine I think. The deltas between the boards (~10C highest-to-lowest) may be significant and I'm sure could be reduced by adjusting the airflow patterns, but it would not worry me.
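
To put a number on the highest-to-lowest delta between boards, one option is to average the tail of each log and compare the results. This is only a sketch: it assumes the "time : temp C / temp F" line format written by watchtemp.sh earlier in the thread, and the function names and the 30-second tail window are illustrative choices, not part of the posted scripts.

```python
import sys


def parse_line(line):
    """Return (seconds, deg_c) from one watchtemp.sh line, or None."""
    fields = line.split()
    if len(fields) < 3:
        return None
    return float(fields[0]), float(fields[2])


def read_log(path):
    """Load all valid samples from a watchtemp.sh output file."""
    with open(path) as f:
        return [p for p in (parse_line(line) for line in f) if p is not None]


def tail_mean(samples, window_s=30.0):
    """Mean temperature over the last window_s seconds of a log."""
    if not samples:
        raise ValueError("no samples")
    t_end = samples[-1][0]
    tail = [c for t, c in samples if t >= t_end - window_s]
    return sum(tail) / len(tail)


def spread(means):
    """Highest-to-lowest difference across boards."""
    return max(means.values()) - min(means.values())


if __name__ == "__main__":
    means = {p: tail_mean(read_log(p)) for p in sys.argv[1:]}
    for path, m in sorted(means.items(), key=lambda kv: kv[1]):
        print(f"{path}: {m:.1f} C")
    if means:
        print(f"spread: {spread(means):.1f} C")
```

Run it with the same file arguments as aggregate.py. Averaging only the tail avoids mixing warm-up samples into the comparison, but the window length should be adjusted to however long the boards were left to settle.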

@9600, I support a separate -utils repo. If there had been one, I would have submitted xtemp there instead of -examples, which I think should be for Epiphany-based examples, but I also don't think it's a huge deal.

Cheers,
Fred
-- Fred -- Hardware Guy --
FHuettig
 
Posts: 142
Joined: Wed Jan 29, 2014 8:30 pm
Location: Lexington, MA, USA

Re: Clustering and temperature variation

Postby optimaler » Wed Apr 30, 2014 4:25 am

I have more data. I apologize for the long post. TL;DR: Use an enclosure with push/pull fans. The silver stuff on the stock heatsinks is not thermal compound; using real thermal compound decreases temperatures by up to 4 deg C.

I built a cardstock case to direct the flow of air more cleanly, shown below. The setup is push/pull; I tried to block as many openings as possible. The end with the pulling fan is detachable so I can access the reset buttons (I am still having grief with eth0 not coming up properly).

[Attachment: enclosurecollage-small.jpg]


The graph below shows the temperatures with and without the pull fan attached to the enclosure. The transition occurs around 60 s, leveling off around 120 s (as best I can tell). Like in my original post, the vertical board position corresponds to vertical position in the legend.

[Attachment: enclosedairflow.jpg]


My observations:
1) The enclosure makes the cooling more consistent, and for the most part drops the temps about 2 deg C.
2) Using push/pull instead of just push with the enclosure gives approximately 2 deg C additional drop.
3) I wondered why board 5 was running hot compared to the other boards. The best hypothesis I can come up with is that the chip runs slightly hot compared to the others; I examined the placement of the heatsinks to see if there was any observable difference.
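
The drops quoted in these observations can be estimated from a log by comparing the mean temperature before a change with the mean after things have settled. A minimal sketch, assuming the samples are already loaded as (seconds, deg C) pairs; the 60 s and 120 s boundaries come from the description of the graph above, while the function names and the sample data are hypothetical.

```python
def mean_in_window(samples, t_start, t_stop):
    """Mean temperature of the samples with t_start <= t < t_stop."""
    temps = [c for t, c in samples if t_start <= t < t_stop]
    if not temps:
        raise ValueError("no samples in window")
    return sum(temps) / len(temps)


def settled_drop(samples, t_change=60.0, t_settled=120.0):
    """Temperature drop across a change made at t_change.

    Samples between t_change and t_settled are ignored because the
    board is still settling during that interval.
    """
    before = mean_in_window(samples, 0.0, t_change)
    after = mean_in_window(samples, t_settled, float("inf"))
    return before - after


if __name__ == "__main__":
    # Made-up samples for illustration only, not measured data.
    samples = [(0, 52.0), (30, 52.0), (60, 51.0), (90, 50.5),
               (120, 50.0), (150, 50.0)]
    print(f"drop: {settled_drop(samples):.1f} C")
```

A positive result means the board ran cooler after the change. The same comparison would work for a with/without test on a single board, as long as both windows cover a settled period.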

I randomly (and accidentally) pulled one of the heatsinks off my boards. This seemed weird to me, because I thought the silver stuff was a thermal compound. Ha, no, it's some kind of tin foil.

I conveniently have some Arctic Silver 5 compound sitting around, so I decided to see what the temperature difference between the two might be. I scraped off the tin foil, cleaned off the heatsinks with 70% isopropyl, and applied Arctic Silver. I put a single board in my enclosure, with and without Arctic Silver 5 applied. Here are the results:

[Attachment: thermalcompound.jpg]


As you can see, you can decrease the temperature by about 4 C by using proper thermal compound, versus weird silver foil stuff. I don't necessarily endorse this modification of the setup, but honestly it makes a non-trivial difference. Maybe someone in the Adapteva crew can endorse the use of thermal compounds so everyone can enjoy lower (and safer) temps?

At this point, the only experiment left that I can think of to try is longer standoffs, but I don't have any at the moment. I'm not sure how much of an impact that would make, either. Short of liquid cooling, I think this is probably the best anyone can do (although, someone please prove me wrong).
optimaler
 
Posts: 24
Joined: Mon Dec 17, 2012 3:29 am

Re: Clustering and temperature variation

Postby greytery » Wed Apr 30, 2014 11:30 am

@FHuettig - thanks, we have no real disagreement at all.
The temps ARE well within the 85 degree spec for the Zynq 7000 series, even with the 'stock' 'tin foil' attached heatsinks.
I suppose it's because I have a legacy mentality from being a computer hobbyist for decades. Heatsinks the size of grapefruits were once (are still?) the norm for an overclocked PC. :geek:
Still, something tells me that keeping a stable, low/lowish, temperature for any electronics system increases its life and uptime. Every little helps.
And it's a bit ironic that we're talking about a heat issue on an ARM-based chip, when the Parallella is a platform to demonstrate the low power/low heat Epiphany.

@optimaler - good stuff.
Card Stock sure beats 3D Printing when it comes to meeting deadlines!! Hope those are anti-static socks.

Larger standoffs will increase the overall height and you'll probably have to go back to stacked fans - or say, clusters of four per 'box' - or, say, different ducting between the boards. {....which can easily get out of hand.}

... water cooling? Mmmmmm! :ugeek:

No surprise that properly applied Arctic silver is more efficient than the adhesive thermal tape, but thanks for the example and data. I'd planned on doing much the same.
The issue is how to safely and securely fix the heatsink when there are no holes/clips on the board. Card Stock won't help here, I'm afraid. ;)
From the above, the thermal tape approach is deemed 'good enough' to stay within the spec for the Zynq - so I guess we're in the post sales, modding, hobbyist area as far as Adapteva is concerned.

I also wonder why there is variation between the boards which doesn't seem to tally with board position.
Is it because of differences in the 'tin foil' attachment (my guess), or because the Zynq's production output varies this much (I doubt/hope not)?

Note: My boards will be based on the smaller 7010 array, and I suspect they would have a marginally lower operating temp. Also, like here, I won't be using the HDMI circuitry which would have pushed the temps up higher still.

tery
tery
greytery
 
Posts: 205
Joined: Sat Dec 07, 2013 12:19 pm
Location: ^Wycombe, UK

Re: Clustering and temperature variation

Postby aolofsson » Wed Apr 30, 2014 12:26 pm

@greytery, @optimaler: Thanks for the great feedback. The Kickstarter boards are being shipped with the Zynq 7020 engineering silicon (a long story...) which has an errata item that makes the Zynq standby (leakage) current higher than it should be (and somewhat variable between chips). We only have partial tracking of this data, but the estimate is a delta of 0.1A for the Zynq. Of course, leakage current is highly dependent on temperature, so the hotter it gets, the more it leaks.

Andreas
aolofsson
 
Posts: 1005
Joined: Tue Dec 11, 2012 6:59 pm
Location: Lexington, Massachusetts,USA
