TFM:Compiling UNIX Software
From ProgSoc Wiki
Wowsers!! You just found a beaut new program on the Net... but what's this? it's not compiled!!! ARRGGHHH!!!
So what's wrong with these UNIX people?? Is distributing software that's already compiled against their religion or something? Do they think we're all C gurus, or even have the inclination to become one???
Well there are a few reasons why freeware/shareware UNIX software is distributed in source code (uncompiled) form:
- A pre-compiled program will only work for the platform it was compiled for. Since UNIX is available in dozens of different platforms and flavours, they would need to provide a number of different distributions for the same program. However with source code, the one distribution can work for all supported platforms, so it saves a lot of space on FTP sites.
- With a source distribution, you can decide what features you want compiled into the program, and which ones to leave out. You can also make other decisions that can only be made at compile time, such as where to look for special data files.
- With the source code available, any programmer can examine the code and fix bugs or make extensions to the original program, and in doing so help save the world!
Note that nowadays you can sometimes find pre-compiled distributions of programs, and while this makes installation simpler, they may be unusable if you do not have permission to install them in the required directories (this usually requires root access).
But you don't need to be a C wizard to compile UNIX software!! All it takes is some basic knowledge of UNIX --- if you know your way around a shell (such as the C shell) and a text editor (such as vi), you know enough to compile source-code distributions. Most of the difficult work is usually done for you by utilities such as `configure' and `make'. Distributions also come with instructions which explain what you need to do step-by-step.
Note that this chapter assumes you have already downloaded the distribution and unpacked it. If you don't know how to find and download a source-code distribution, read the chapter on FTP. If you don't know how to unpack a distribution, read the File Compression chapter.
Rules of Thumb
There are a couple of rules of thumb I follow when compiling a UNIX distribution: you would be wise to follow them as well...
Rule 1: READ THE INSTRUCTIONS!!!
There is NO substitute for reading the documentation that comes with the distribution ... not even this tutorial! Even once you've become accustomed to the procedure, some distributions can differ from it in subtle ways that can only be discovered by reading the README or INSTALL files.
So what's the point of this tutorial then? Well much of it is really to help you understand what is going on when the instructions ask you to do certain things, such as running make install. The rest is to help you make informed decisions when the instructions offer you choices, as well as to let you in on some secrets (ooh-aah!) on things such as fine-tuning your installation.
Rule 2: Be careful
It is better to comment something out rather than to change or delete it.
Much of the compile-time configuration of programs is done by editing certain files. These files will be filled with default settings, some of which you may have to change.
Rather than changing or deleting the default value for a setting, it is better to comment it out, and add a replacement setting next to it. That way if you find later on that the choice you made with that setting was wrong, it is much easier to revert it back to its original value.
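One way to make Rule 2 even safer is to keep an untouched copy of each file before you edit it. The sketch below is only an illustration in Bourne shell syntax: it works in a throwaway directory, and the file name config.h and the .orig suffix are my own choices, not something a distribution requires.

```shell
# Work in a scratch directory so nothing real is touched.
work=$(mktemp -d)
cd "$work" || exit 1

# Stand-in for a file shipped with the distribution.
printf '#define LOGFILE "logfile"\n' > config.h

# Keep a pristine copy before editing.
cp config.h config.h.orig

# Simulate an edit (normally you would do this in your text editor).
printf '#define LOGFILE "blah"\n' > config.h

# Review what changed; diff exits non-zero when the files differ.
diff config.h.orig config.h || true

# Undo the edit by restoring the pristine copy.
cp config.h.orig config.h
```

With the copy in place, reverting is a single cp even if you have forgotten exactly what you changed.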
Before you can start compiling, you need to know some basics about the system you are compiling the program for. You also need to make some decisions, such as which C compiler to use, and in which directories to install the program and its files.
Know your UNIX
Many people speak of UNIX as if it were a single clearly-defined operating system. In fact it isn't. The term UNIX these days collectively refers to a number of different operating systems which share some common features and interoperability.
Briefly, the story goes that once there was Only One UNIX. Then there were two. These two were called System V (from AT&T) and BSD (from the University of California, Berkeley). These then spawned other versions, and so on... and now we have a whole lot of UN*Xes which vary in different ways. Most however still show their origins as either from System V or from BSD.
Now, for a program to compile correctly, it usually has to know which of these variants you are using. It is also important to know if the one you are using is based on System V or BSD, as sometimes that is all that the program needs to know.
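If you are not sure which variant you are sitting in front of, the uname command is usually the quickest way to find out. A small sketch follows; the output differs from system to system, so none is shown here.

```shell
# Ask the system for its name and release number.
os_name=$(uname -s)
os_release=$(uname -r)
echo "This system reports itself as: $os_name $os_release"

# uname -a prints everything uname knows on one line.
uname -a
```

Look at the release number as well as the name: BSD-based SunOS 4.x and System V-based Solaris (SunOS 5.x) both report the name SunOS.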
Choosing a C compiler
Just as not all UN*Xs are the same, neither are all C compilers. And just as the variants of UNIX come in 2 main flavours, so too does the C language come in 2 main flavours. "Traditional" or "K&R" C was the original form of C that was developed along with the original UNIX. Thus, most of the older UNIX variants still use it as their "standard" C. There is a newer, improved and standardised version of C called ANSI C, which is the standard C for newer UN*Xs such as Solaris.
You will need to select one of the following 3 C compilers to use to create one of your programs (you probably won't have all 3 available on your system to choose from, however).
- cc This is the generic name for the K&R C compiler that comes with UN*Xes that standardise on K&R C. It's sufficient for most programs, but the quality of the code it produces is usually not as good as that of gcc (especially when using optimisation, see later).
- gcc The GNU C compiler, from the Free Software Foundation. gcc makes a good choice of compiler because it has saner behaviour than most C compilers due to the many programmers in the Internet community that continually work on improving it. It normally compiles its own version of C (which is a superset of ANSI C), but can compile K&R C if it is given the `-traditional' option.
- acc This is the generic name for the ANSI C compiler that comes with some operating systems that have switched to ANSI C.
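To see which of these three compilers your system actually has, you can ask the shell to look each one up in your $PATH. This is a minimal sketch in Bourne shell syntax; it only reports what it finds and changes nothing.

```shell
# Report which of the three compiler names can be found on the $PATH.
found=""
for compiler in cc gcc acc; do
    if location=$(command -v "$compiler"); then
        echo "$compiler found at $location"
        found="$found $compiler"
    else
        echo "$compiler not found"
    fi
done
echo "Available compilers:${found:- none}"
```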
Choosing a home for your program
A program will usually need to put files in a number of different directories for different purposes. Usually the directories used by any one program are all subdirectories of the one root directory. The typical root directory for add-on programs is /usr/local, so such a program may have files in the following directories:
/usr/local/bin
/usr/local/lib
/usr/local/man
... etc.
If you don't have root access, you probably won't be able to install files under /usr/local. In this case, a good second choice is your own HOME directory.
Here are the directories just about every program uses to install files in:
- bin Directory for executable files, both binary programs as well as scripts. Remember to include this directory in your $PATH to be able to use the programs in it (see the chapter on shells for details)
- lib Libraries and other (mostly static) data files used by programs
- man On-line manual pages, readable with the man command. You need to include the directory in your $MANPATH to make use of it.
Here are some other directories which you may also have under your root installation directory, and may prove to be useful:
- etc Files and utilities for administering other programs.
- info Compiled texinfo files, readable with the info command. This is a form of hypertext on-line help which offers an alternative to man pages (you may still want to install both). Include the directory in $INFOPATH to use it.
- scripts If the number of programs you install becomes large, you may wish to separate the scripts from the binary executable programs. This also helps with porting and updating, as only the binary executables need to be changed, but not the scripts.
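If you settle on your HOME directory as the installation root, the directories and the $PATH setting can be prepared along these lines. This is a sketch under assumptions: it uses Bourne shell syntax, the variable name prefix is my own choice, and the man1 subdirectory is simply where section 1 man pages conventionally live.

```shell
# Assumption: install under $HOME; set prefix first if you chose another root.
prefix="${prefix:-$HOME}"

# Create the three directories just about every program uses.
mkdir -p "$prefix/bin" "$prefix/lib" "$prefix/man/man1"

# Put the new bin directory at the front of the search path.
PATH="$prefix/bin:$PATH"
export PATH

# Only extend MANPATH if it is already set; otherwise leave man's
# default search path alone.
if [ -n "${MANPATH:-}" ]; then
    MANPATH="$prefix/man:$MANPATH"
    export MANPATH
fi
```

C shell users would write set path = ($HOME/bin $path) for the $PATH part instead. To make the setting permanent, put it in your shell's startup file.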
Some programs may need detailed information about your system in order to compile correctly. If you had to supply all that information yourself you could be in a LOT of trouble!!! Fortunately there are ways for distributions to retrieve much of this information for themselves --- if you're lucky, your distribution is one of them! Don't expect it to do all the work for you... you will still need to go over the header files and Makefiles it generates.
Configure scripts are shell scripts included in some source distributions. Their job is to run tests on your system to find which UNIX features it supports, and which it does not. They then use this information to generate a Makefile (and sometimes a header file as well).
Running the configure script is easy. Simply go to the top directory of the distribution and type:
% ./configure
Note the `./' in front of the configure command. This is to ensure that you run the script in the current directory, and not some other configure command that may be in some other directory in your $PATH.
Some configure scripts accept certain options: the README files will tell you if this is the case, and which options are accepted.
Configure scripts tell you what tests they are performing as they run. Some scripts ask you to confirm the facts they discover or the choices they make. If you find that you don't understand what they're asking, just accept the default answer and usually it will work.
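Many configure scripts, in particular those generated by GNU Autoconf, accept a --prefix option for choosing the installation root directory. Whether yours does is an assumption you should check against the README. The sketch below is guarded so that it does nothing harmful when run outside a distribution directory; the variable names prefix and status are mine.

```shell
# Assumption: this is run from the top directory of the distribution.
prefix="${prefix:-$HOME}"

if [ -x ./configure ]; then
    # --prefix is understood by Autoconf-generated scripts; other
    # configure scripts may use a different option, or none at all.
    ./configure --prefix="$prefix"
    status="configure was run"
else
    echo "No configure script here; check the README for the next step."
    status="no configure script"
fi
```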
Programs which use the X Windowing System, in addition to needing to know which UNIX features are available, also need to know certain facts about the X installation. These include such things as the locations of X header and library files, programs, fonts, and the font file format used.
Rather than having each X program needing its own configure script to search for this information, the X Windowing System takes a different approach. X installations come with a program called xmkmf, which stands for `X-MaKe-MakeFile'. This program knows the answers to all the typical questions asked by X program distributions. It reads a Makefile template which contains slots where these answers need to be filled in, and from it generates a Makefile. Neat, eh?
Running xmkmf is even easier than running configure (remember to be in the top distribution directory when running this):
% xmkmf
Unlike configure, xmkmf needs no options and asks no questions... what could be easier?
Configuration Header Files
C source files vary the code that gets compiled (sometimes called `conditional compilation') with things called #defines. These #defines can be passed to the compiler as options on the command line, but sometimes there are just too many of them!!! In these cases, they are placed in a C header file, typically under a name such as config.h.
To continue configuring the program, you may need to edit such a header file and modify the #defines.
Hold on!!! Didn't I say you didn't have to know C to do any of this?? OK, so I lied :-). But before you start having a heart attack, take a look at a couple of examples.
Here are a couple of #defines from a typical config.h:
#define X11_GRAPHICS /* X11 interface */
#define LOGFILE "logfile" /* for debugging purposes */
The first line tells the compiler to compile the code for X11 support, and the second line tells the compiler that the name of the file for logging is "logfile".
Now let's say that we want to disable X11 support, and change the name of the log file.
To disable the first line, we comment it out with a C comment. C comments begin with the characters /* and end with the characters */. Since there is already a */ at the end of the line, we just put a /* at the front.
For the second line, we simply change the word in quotes following the define name (LOGFILE). Our 2 lines now look like this:
/* #define X11_GRAPHICS /* X11 interface */
#define LOGFILE "blah" /* for debugging purposes */
If you want to comment out a number of consecutive #defines, you need to comment out each line individually. Putting a /* before the first line and a */ after the last won't work.
Also, sometimes a #define won't have a comment after it, in which case you'll have to add the */ at the end yourself. For example, to disable the following line:
#define MSDOS
change it to this:
/* #define MSDOS */
There! That wasn't so hard now, was it?
One final point: you may cause problems if you make changes to a C header file after you have compiled some or all of the source code. To avoid this, delete all the object files created from the previous compilation, and restart the compilation from scratch.
make and Makefiles
Here we come to the "big magic" of compiling and installing source distributions: the Makefile.
So what "is" make?
One of UNIX's basic philosophies is of having lots of small programs, each of which does a simple task very well, and providing ways of using them together to achieve the desired result.
You will already have seen this in the shell, when using pipes to send the output of one program to another, as in this example:
% expand file.txt | fold -78 | pr -l64 | lpr
A similar situation arises when compiling programs. There are a number of programs in UNIX to assist in the compilation process: the compiler, the linker, and the assembler are some commonly-used tools.
To "glue" these programs together in the right way so that they can build a program, another program is needed which can determine what files need to be compiled or generated, how to generate them, and in which order.
This is where make steps in. Make is a program which builds targets (usually a file such as a program) by following rules from a Makefile. A Makefile defines which files are dependent on which other files, and what commands to run to generate them. It uses modification dates to determine which files need to be re-generated so it can avoid repeating operations unnecessarily.
Here is a sample Makefile rule, for the curious amongst you:
widget: tic.o tac.o toe.o
	cc -o widget tic.o tac.o toe.o
This rule states that the target `widget' is dependent on the files tic.o, tac.o, and toe.o, and to build the `widget' from these files, the command cc -o widget tic.o tac.o toe.o must be executed.
Make rules can get much more complex than this example, with things such as special macros and implicit rules. But this needn't worry you, as you don't need to write or modify any rules to configure a Makefile!
To build a target, all you have to do is run make with the target name:
% make widget
Make tells you what it is doing by echoing each command it runs to the screen. It also tells you if it has nothing to do ("target is up to date"), or if a command it ran failed (which normally halts the make process).
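If you would like to watch make's use of modification dates in action without risking a real distribution, here is a sketch you can run in a scratch directory. It is an illustration only: the three .o files are plain text stand-ins, and the rule uses cat instead of cc so that no compiler is needed.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

# Three stand-in "object files"; plain text, so no compiler is needed.
echo tic > tic.o
echo tac > tac.o
echo toe > toe.o

# Write the rule. The command line must start with a tab, hence the \t.
printf 'widget: tic.o tac.o toe.o\n\tcat tic.o tac.o toe.o > widget\n' > Makefile

if command -v make >/dev/null 2>&1; then
    make widget      # first run: echoes and runs the cat command
    make widget      # second run: reports that widget is up to date
    sleep 1          # make sure the next timestamp is really newer
    touch tic.o      # pretend one of the files widget depends on changed
    make widget      # rebuilds, because tic.o is now newer than widget
else
    echo "make is not installed on this system"
fi
```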
Like shell scripts, Makefiles use variables, and it is these variables which allow them to be customised.
In fact, variable assignments in Makefiles look a lot like those in Bourne shell scripts:
VARIABLE = value
Sometimes assignments can get very long, so to split them across multiple lines, backslashes (\) can be used as line continuation characters:
FIRSTTWENTY = one two three four five six seven eight \
              nine ten eleven twelve thirteen fourteen fifteen \
              sixteen seventeen eighteen nineteen twenty
They also share the same method for including comments: everything past the first # on a line is considered a comment, so to comment out a variable assignment, simply put a # in front of it:
# VARIABLE = value
If the assignment takes up multiple lines, remember to comment out the other lines too.
Common make variables
These are some configurable make variables which commonly appear in Makefiles:
- prefix Remember when I told you earlier to choose a root directory for the files for your program? Well this is where you specify it. The exact name of the variable can vary depending on where the Makefile came from (the person who wrote it, or the configure script which generated it), but you should recognise it once you see it.
- BINDIR, MANDIR, etc. These variables specify the individual directories for installing files. Note that these should default to the relevant subdirectory under the install root directory, for example:
BINDIR = $(prefix)/bin
MANDIR = $(prefix)/man
Note how the prefix make variable is evaluated by writing it as $(prefix).
- CC The name of the C compiler you wish to use: cc, gcc or acc.
- CFLAGS The options to be passed to the C compiler for compiling. These can include debugging or optimisation options: see the next section for more details.
- CPPFLAGS, DEFINES CPPFLAGS defines the options to be passed to the C compiler for preprocessing. These include any #defines that are to be passed via the command line. Sometimes the #defines are separated into a DEFINES variable as in the following:
DEFINES = -DX11_GRAPHICS -DLOGFILE=\"blah\"
CPPFLAGS = -I./include $(DEFINES)
(Note the backslashes in front of each of the double-quotes. This is to stop the shell which runs the commands on make's behalf from interpreting the double-quotes. Read the shell chapter for more details).
- LDFLAGS, LIBS LDFLAGS defines the options to be passed to the C compiler for linking. These include any special libraries which must be linked to the program. Sometimes these libraries are separated into a LIBS variable as in the following:
LIBS = -lX11 -lXaw -lm
LDFLAGS = -L/usr/local/X11/lib $(LIBS)
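Besides editing the Makefile, make also lets you override a variable for a single run by giving NAME=value on the command line, which leaves the Makefile itself untouched. The sketch below demonstrates this with a throwaway Makefile; the show target is hypothetical and only prints the settings, and /opt/widget is just an example path.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

# A tiny Makefile with typical defaults. The hypothetical "show"
# target only prints the values; it builds nothing.
printf 'prefix = /usr/local\nBINDIR = $(prefix)/bin\nCC = cc\n\nshow:\n\t@echo "CC=$(CC) BINDIR=$(BINDIR)"\n' > Makefile

if command -v make >/dev/null 2>&1; then
    # With no overrides the defaults from the Makefile apply.
    defaults=$(make show)
    echo "$defaults"      # prints: CC=cc BINDIR=/usr/local/bin

    # Command-line assignments win over those in the Makefile, and
    # BINDIR follows along because it is defined in terms of prefix.
    overridden=$(make show CC=gcc prefix=/opt/widget)
    echo "$overridden"    # prints: CC=gcc BINDIR=/opt/widget/bin
fi
```

Remember that an override only lasts for that one run, so you need to repeat it for every make command, including make install.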
Compiling the program
If you have a Makefile, the program is usually the default target in it. This means that to compile it, all you need do is type "make".
This should compile the program but not install it anywhere.
Your distribution may come with some support programs and utilities along with the main program: if you have a Makefile, you can usually compile all of them by running:
% make all
With some very simple programs, there may be only one C file to compile and no Makefile. In this case, you'll need to run the compiler manually. Assuming the program name is `widget' and the C file is widget.c, you should be able to compile it by running:
% cc -o widget widget.c
You may however wish to include extra options for optimisation and/or debugging.
You can tell the compiler to produce optimised code by passing it the -O option. If you are using make, add it to the CFLAGS variable:
CFLAGS = -O
If you are running the compiler manually, add it to the command line:
% cc -O -o widget widget.c
Optimised code executes faster than unoptimised code, and usually is noticeably smaller. Unfortunately, optimisation algorithms in C compilers involve heavy wizardry and are thus the most likely place for bugs to arise. Some programs are more vulnerable to buggy optimisation than others: the README files will tell you if your program is sensitive to the -O option.
Note that the quality of the C compiler also affects how safe it is to use the -O option: gcc is generally more trustworthy than cc (or even acc), but this depends on what version of gcc you are using.
If you are feeling really confident, you can try using the -O2 option which does even MORE optimisations to the generated code.
Another option that can be employed in the same way as -O is the -g option. This tells the compiler to include information in the compiled code which can be used by debuggers for tracing through the execution of the program. Unless you are a C guru, you are unlikely to make use of this yourself. On the other hand, if the Makefile includes the -g option by default, there is little harm in leaving it in unless you are short of disk space (it doesn't slow the program's execution down any).
As an aside, most compilers cannot use the -g and -O options together. Gcc is an exception to this rule: however strange things have been known to happen with optimised programs in debuggers - caveat programmer.
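To try the manual compile with optimisation on something harmless first, you could use a sketch like the one below. It writes a tiny stand-in widget.c into a scratch directory and compiles it with -O. It assumes your compiler is called cc unless the CC environment variable names another one, and it skips the compile if no such compiler is found.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

# A one-file program standing in for widget.c.
cat > widget.c <<'EOF'
#include <stdio.h>

int main()
{
    printf("widget works\n");
    return 0;
}
EOF

# Use whatever compiler $CC names, falling back to cc.
compiler="${CC:-cc}"

if command -v "$compiler" >/dev/null 2>&1; then
    # Compile with optimisation, then run the result from this directory.
    "$compiler" -O -o widget widget.c && ./widget
else
    echo "No compiler called $compiler found"
fi
```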
Sometimes in the middle of a compilation you may get warning lines which look something like the following (the actual warning may vary):
widget.c: line 56: warning: pointer assigned to integer without cast
If you get this, don't panic!! This is usually a sign of sloppiness on the programmer's part, rather than something going wrong in the compilation. Let it keep going, and if the compilation finishes then the warning was a false alarm... hopefully ;)
C errors are a different story: they're a bit hard to ignore even if you wanted to, because they stop the compilation from continuing!
You are venturing into treacherous territory here: it can get difficult to proceed without some good knowledge of C (to decode the sometimes very cryptic error messages the compiler gives you).
- Try re-reading the instructions. You may have skipped or misread a step, or simply have made a mistake. Otherwise look for a FAQ or troubleshooting guide in the documentation, and see if you can find a mention of your specific problem.
- Try to find somebody who knows more C than you do: maybe they can help decipher the error messages and fix the problem for you.
- If you don't know any C gurus personally, there's a whole dump-truck full of them right under your very nose: the ProgSoc mailing list!! Remember that the harder the problem, the better us ProgSoc code-hackers like it. On the other hand, if it becomes obvious that you simply haven't read the instructions, expect some taunts and shouts of "RTFM!" at the very least.
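Whoever you ask for help will want to see the exact messages, so it pays to save everything the build prints. One common approach is to pipe the output through tee. In the sketch below, a hypothetical fake_build function stands in for make so that the example can run anywhere; in real use you would pipe make itself.

```shell
log=$(mktemp)

# Stand-in for "make": prints one command and one warning, roughly
# like a real build would.
fake_build() {
    echo "cc -c widget.c"
    echo "widget.c: line 56: warning: pointer assigned to integer without cast" >&2
}

# 2>&1 merges error output into normal output so tee records both,
# while still showing everything on the screen.
fake_build 2>&1 | tee "$log"

# Count the warnings afterwards.
warnings=$(grep -c 'warning:' "$log")
echo "$warnings warning(s) recorded in $log"
```

With a real distribution the equivalent would be make 2>&1 | tee make.log in a Bourne-style shell, or make |& tee make.log in the C shell.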
Testing the program
Now, assuming that make successfully compiled your program(s), you will want to test it to make sure it works before continuing with the installation.
Some Makefiles contain a test suite to verify that the program works correctly (the README docs will tell you if this is so). Run these tests with:
% make test
This will run tests on the program similar to the way configure runs tests on your system... except that failure of a test means there's something wrong with the program. You will be told at the very end if all the tests passed or not (in case you missed seeing the rest of the output).
If it DOESN'T have a test suite, you'll have to run the program yourself. Make sure you know how to operate the program (read the documentation otherwise), and run:
% ./program [arguments..]
In other words, run it as you would normally, except put ./ in front of the program name as you did with configure, so you ensure you are running your newly-compiled version of the program and not one that may already be installed and in your $PATH.
Unfortunately some programs may not work unless installed: they will complain "cannot find <file>", where <file> is in one of the installation directories you specified earlier, and is copied as part of the installation process. In this case, you'll have to install the program before you can test it.
If you have a Makefile, installing the program and its related files is as easy as running:
% make install
This will copy your compiled program(s) and files to the right directories and with the right permissions.
If you are installing the program in your own system (i.e. you have root access and the freedom to use it as you wish), this step is the only one you need to perform as root. All other steps can be done in your ordinary user account.
Now is the time to try out your newly-installed program under normal operating conditions. HOWEVER, some shells such as csh or tcsh build a hash table to remember where commands are in their $PATH without having to search all the directories. If you add a new program to one of these directories in the middle of your shell session, the shell won't find it. To tell csh or tcsh to rebuild this hash table so it can find the new program, run the rehash built-in command.
You should also test to see if man can find and display your program's manpage.
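In Bourne-style shells the counterpart of rehash is hash -r, and command -v tells you which file the shell would run for a given name. The sketch below shows both using a pretend installation: the widget program here is just a hypothetical one-line script in a scratch directory.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/bin"

# A hypothetical freshly "installed" program: a one-line script.
printf '#!/bin/sh\necho "hello from widget"\n' > "$demo/bin/widget"
chmod 755 "$demo/bin/widget"

# Put its directory at the front of the search path.
PATH="$demo/bin:$PATH"
export PATH

# Make the shell forget any command locations it has remembered.
hash -r

# Ask the shell which file it would run for "widget".
found=$(command -v widget)
echo "widget resolves to $found"
```

If the path printed is not the one you just installed into, an older copy earlier in your $PATH is getting in the way.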
Cleaning up after yourself
Now that you've finished installing, you may want to do something about the source files and associated junk you created whilst compiling your program.
If you have a Makefile, you can delete the larger, more annoying files such as object (.o) files and core dumps but leave the source files intact with:
% make clean
You may need to use make clean if you want to recompile after editing config.h, or parts of Makefile which affect the way the program is compiled. Otherwise you may have part of your program compiled for one configuration, and part compiled for another.
make clean might not delete some of the other files created, such as files produced by yacc or configure. There is often another make target for a "cleaner clean", under a name such as distclean or spotless, which will delete all these files as well, and (hopefully!) leave you with only the files you started with when you unpacked the distribution.
It is more likely however that you won't want anything to do with the source code after installing the program --- in this case, you may as well delete the distribution directory and everything in it.
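If you are nervous about what a cleaning target will delete, make's -n option prints the commands a target would run without running them. Here is a sketch using a throwaway Makefile; the clean rule in it is a made-up example, so the one in your distribution may well delete more.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

# A stand-in object file and a made-up clean rule.
echo "object code" > widget.o
printf 'clean:\n\trm -f *.o core\n' > Makefile

if command -v make >/dev/null 2>&1; then
    # Preview only: the command is printed but not executed.
    preview=$(make -n clean)
    echo "$preview"
    [ -f widget.o ] && echo "widget.o is still there"

    # Now do it for real.
    make clean
fi
```

The same trick works for other targets, so make -n install is a handy way to check where files are going to be copied before you commit to it.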
Fine-tuning your installation
There are a couple of things you may want to do some time after installing. They don't significantly affect your installations, but can prove quite useful.
Reducing your program's size with strip
If you're getting very short of disk space, you can reduce the size of your programs by stripping out their symbol tables. This is done with the `strip' command. strip usually reduces your program's size by 20-50%. Here's an example of its use:
niflheim% ls -sl expect
 640 -rwxr-xr-x 1 dbugger 647168 Feb 17 18:39 expect
niflheim% strip expect
niflheim% ls -sl expect
 360 -rwxr-xr-x 1 dbugger 360448 Feb 17 18:40 expect
Note that some Makefiles strip your program for you when installing---in these cases, strip will obviously have no effect. Also, strip does its job by removing unnecessary stuff from files, and debugging information is the first to go. Debugging a stripped program is not pretty.
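For the curious, the saving in the expect example can be worked out with shell arithmetic from the two byte counts that ls reported. The variable names are mine.

```shell
# Sizes in bytes, taken from the two ls listings of expect.
before=647168
after=360448

saved=$(( before - after ))
percent=$(( saved * 100 / before ))   # integer division rounds down

echo "strip saved $saved bytes, about $percent% of the original size"
# prints: strip saved 286720 bytes, about 44% of the original size
```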
Man page searching and catman
Ever needed a program in UNIX to do a specific task, but didn't know which program could do it? Or you knew of a program which could do it, but had forgotten the program's name? With a plethora of cryptically-named programs, this problem happens often in UNIX.
One way to find such programs is by searching the man whatis database. This database lists the title lines of man pages, where a short description of the program is given.
For example, say we want a program which will split long lines in a file into separate lines. To do this, we could search for "lines" in the whatis database like this:
niflheim% man -k lines
comm (1) - display lines in common, and lines not in common, between two sorted lists
error (1) - categorize compiler error messages, insert at responsible source file lines
fold (1) - fold long lines for display on an output device of a given width
head (1) - display first few lines of specified files
look (1) - find words in the system dictionary or lines in a sorted list
paste (1V) - join corresponding lines of several files, or subsequent lines of one file
random (6) - select lines randomly from a file
sort (1V) - sort and collate lines
textedit_filters, align_equals, capitalize, insert_brackets, remove_brackets, shift_lines (1) - filters provided with textedit(1)
unifdef (1) - resolve and remove ifdef'ed lines from cpp input
uniq (1) - remove or report adjacent duplicate lines
wc (1) - display a count of lines, words and characters
/usr/local/lib/perl5/man/whatis: No such file or directory
/home/dbugger/man/whatis: No such file or directory
/usr/local/man/whatis: No such file or directory
niflheim%
Looking at the results, it looks like fold does what we want. However notice the "No such file or directory" messages at the end. This means that the whatis database has not been built for these man directories. You can build or rebuild the whatis database for your man directories with the `catman' command, like this:
% /usr/etc/catman -w -M /home/dbugger/man
man -k will now be able to search man pages in the /home/dbugger/man directory in subsequent searches.
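As the error messages above suggest, on systems like this one the whatis database is simply a file called whatis in each man directory, with one line per man page. The sketch below mimics a keyword search on a miniature whatis file in a scratch directory. It is an illustration of the idea rather than how man is actually implemented, and the widget entry is made up.

```shell
demo=$(mktemp -d)

# A miniature whatis file in the "name (section) - description" layout.
# The first two entries are copied from the listing; widget is made up.
cat > "$demo/whatis" <<'EOF'
fold (1) - fold long lines for display on an output device of a given width
head (1) - display first few lines of specified files
widget (1) - a hypothetical example program
EOF

# Roughly what a keyword search does: pick out every entry that
# mentions the keyword, ignoring case.
matches=$(grep -i 'lines' "$demo/whatis")
echo "$matches"
```

Here only the fold and head entries are printed, since the widget entry never mentions "lines".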
Help Save The World!
Well now that you've gone to all that trouble of installing that beaut new program, you're not going to keep it all to yourself now surely? Tell your friends! Tell the world!!! Let everyone make use of it! Otherwise we end up with half a dozen people with the same program installed in their accounts --- quite a waste really.
- ↑ Just to be unclear right from the start, the word UNIX was originally a trademark of AT&T. Then it became a trademark of Novell. The last I heard, Novell sold the rights to the name to a mob called X/Open. By the time you read this, it could well belong to someone else instead. So to avoid being wrong (and get my backside sued in the process), I'll say UNIX is a trademark of its respective owner(s). That should keep the right people happy.
- ↑ This is the catch-cry of Larry Wall (author of Perl), and the basic philosophy of the Free Software Foundation (better known as the GNU folks): that people should share the programs they develop with the rest of the Internet community, so we can all benefit and learn from each other, and make the world a better place to compute in...
- ↑ Although it helps when the compilation fails... hopefully this will never happen to you.
- ↑ I'll add more rules of thumb to the list once I think them up.
- ↑ Yeah yeah, I've been caught out a few times myself for skipping the documentation. I guess I'm human too.
- ↑ You're not alone: Iain Sinclair wonders the exact same thing. "Hello there Ax waves, hope you like the tutorial so far!"
- ↑ Including myself, as you may have noticed.
- ↑ Careful speakers of Operating Systems avoid this use of the term and instead use the word "UN*X" for the same meaning.
- ↑ Interoperability : a big word which means that computers running under one of the operating systems can talk to computers running under another of the operating systems without being too fussed about the fact that they're running different operating systems.
- ↑ There's sure to be a good book that explains in detail what happened... except I don't know any. If you do, let us know at email@example.com.
- ↑ Linux, of course, has heritage from both camps.
- ↑ Solaris is an example of a UNIX which standardises on ANSI C, but doesn't come with an ANSI C compiler... or any C compiler at all for that matter. Sun's reasoning behind the decision is that seeing as Solaris doesn't need a C compiler to operate (SunOS needed cc to recompile kernel configuration changes), they don't need to include one... and if a user wants a compiler (acc), they should pay for it. Luckily the FSF provide pre-compiled versions of gcc for Solaris.
- ↑ See Cleaning Up After Yourself below
- ↑ though nowadays the assembler, or code generator, is part of the same executable as the compiler.
- ↑ Well, sort of. See section 7 for more detail.
- ↑ Wizardry : Knowledge of the inner workings of complex programs or systems, known only by the most skilled of programmers (invariably known as "wizards").
- ↑ Unless, of course, this is your code that you're trying to compile, in which case you can expect a crash.
- ↑ Some people who run their own UNIX machines do EVERYTHING as root: since UNIX can be rather unforgiving, this habit is only for either the daring or the foolhardy.
- ↑ bash has the same feature/problem. It has no rehash command, but the hash -r built-in does the same job: it makes bash forget the command locations it has remembered.
- ↑ Or whatever else the man page pertains to, which could be a system call, C function, device driver, file format, etc.
- ↑ The directory where catman resides varies from UNIX to UNIX. In SunOS it lives in /usr/etc, while in Linux it lives in /usr/bin. Read the manpage for catman if you can't find it.