Guide for developers
. Sun Studio 11
> Free!
> Download: http://developers.sun.com/prodtech/cc/downloads/index.jsp
> High performance C/C++/Fortran compiler
> More optimization options than gcc
> Full IDE environment for debugging and performance
analysis
> Available on Solaris sparc, Solaris x86 and Linux
> Installed by default in /opt/SUNWspro/bin
> cc (C compiler), CC (C++ compiler)
> dmake (distributed make tool)
> dbx (debugger with good support for multithreaded programs)
> cc -flags (print a summary of all compiler flags)
> Compiler options for different targets
– -xarch=v9 (64-bit application solaris sparc)
– -xarch=v8 (32-bit application solaris sparc)
– -xarch=amd64 (64-bit application solaris x64 on AMD Opteron)
> cc -xdryrun (show the compiler commands and options that would be
used, without compiling)
> Performance analyzer
> Profiling tool that lets you drill down into source code and
assembly code to quickly find application bottlenecks.
> Run the user process under "collect" to gather the profiling
data
– % /opt/SUNWspro/bin/collect -L unlimited -A copy -F on -d /export/home/analyze <cmd>
> Then run analyzer GUI tool
– % cd /export/home/analyze
– % /opt/SUNWspro/bin/analyzer test.3.er
. Sun Studio 11
> Some C/C++ compiler optimizations
> -xarch=xxx (specify the cpu architecture)
> -xO4 (optimization level 4)
> -fast (a combination of multiple optimizations; be careful with
floating-point code)
> -lsunperf (link with the Sun Performance Library)
> -xipo (cross-file optimization)
> -xlinkopt (link-time optimization)
> -xautopar -xloopinfo (find and parallelize loops)
> -xprefetch_level=3
> Profile-feedback-based optimization
> Compile with
– -xprofile=collect -xO4
> Run the test to generate profiling data in ./mycode.xxxx
> Recompile with profiling feedback
– -xprofile=use:./mycode -xO4
> Re-run the test and see the performance improvement!
. Sun Gcc
> Feature compatible with gcc 4.0.2
> Debuggable with gdb and dbx
> Adds advanced optimizations tuned to Sun systems
> Extra optimizations such as -xipo, -xprefetch, -xprofile
> Free!
> Download: http://www.sun.com/download/products.xml?id=43fb4c75
> By default, installed in /opt/sfw/bin
. ATS (Automatic Tuning)
> Suppose you don't have the source code for a 3rd-party binary: how do
you optimize it?
> ATS can recompile and optimize the binaries directly.
> Can also be used for
> Which porting is easier?
> 32-bit -> 32-bit or 64-bit -> 64-bit (Easy)
> 32-bit <-> 64-bit (Difficult)
> Little Endian -> Little Endian (e.g. DEC Alpha <-> x86) or
Big Endian -> Big Endian (Easy)
> Little Endian <-> Big Endian (Difficult)
> 32-bit <-> 64-bit
> Be careful: the pointer and long types have different sizes
in the ILP32 and LP64 data models.
> LP64
– pointer: 8 bytes, long: 8 bytes
> ILP32
– pointer: 4 bytes, long: 4 bytes
> x86<->sparc
> Be careful with byte-order differences, little-endian (x86) versus
big-endian (sparc), in source code that operates on individual bytes.
> Be careful of word-alignment issues (see the malloc example that follows)
> x86<->sparc
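> Be careful of word-alignment issues:
1 #include <stdio.h>
2 #include <stdlib.h>
3 int main(int argc, char *argv[])
4 {
5 char *p = malloc(10); /* malloc returns suitably aligned memory */
6 p++; /* p is now misaligned for int */
7 int *i;
8 i = (int *)p;
9 *i = 3; /* misaligned store */
10 return 0;
11 }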
> x86<->sparc
> Compile this code with gcc and run it on both x86 and sparc
> On Solaris sparc, the gcc-compiled binary dumps core
– Why? Because of line 8, which casts p to int *: the address stored in "i" is
not aligned (it is not a multiple of sizeof(int)), so the store through it faults
> Now compile the code with the Sun compiler and run it on sparc
– No core dump, because -xmemalign=8i is the default for 32-bit
applications (cc -xdryrun shows this), so the compiler arranges for
misaligned accesses to be interpreted and execution continues.
– If you compile with -xmemalign=8s (the default for 64-bit applications), the
code dumps core
> Misalignment carries a performance penalty because trap
handling is slow.
> C++ templates
> Compilers differ in their support for C++ templates
> If you use advanced template features, you may run into porting problems.
> Platform-dependent libraries/system calls/signals
> You have to read the documentation for both the source platform and
the target platform to understand the differences and workarounds.