Rdist is a UNIX tool which allows two directory structures to be compared and one updated from the other, even across the Wide Area Network. We use it to maintain the NERC file structure at all sites, and also the Sysadmin scripts (see below) which run everywhere daily. In principle the NERC structure, or at least the centrally-maintained part of it, could be updated everywhere overnight. In practice we only update the 'scripts' part on a regular basis: at the request of local staff, or when we ourselves are about to carry out updates or installs on that site.
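The Python sketch below shows only the decision step of compare-then-update. It assumes each tree has already been summarised as a mapping from relative path to (size, modification time), and that a file is stale when either value differs. The function names and data layout are hypothetical. Rdist's own comparison options and transfer mechanism are not modelled.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot format: relative path -> (size in bytes, mtime in seconds).
Snapshot = Dict[str, Tuple[int, int]]


def files_to_update(master: Snapshot, replica: Snapshot) -> List[str]:
    """Paths to copy from the master tree to the replica.

    Simplified rule, assumed for illustration: a file needs copying if the
    replica lacks it or its (size, mtime) pair differs from the master's.
    """
    return sorted(path for path, stat in master.items() if replica.get(path) != stat)


def files_not_in_master(master: Snapshot, replica: Snapshot) -> List[str]:
    """Replica paths with no master counterpart (candidates for removal)."""
    return sorted(set(replica) - set(master))


master = {"scripts/daily.csh": (1200, 800000000), "scripts/ntp.conf": (300, 800000100)}
replica = {"scripts/daily.csh": (1200, 800000000), "scripts/old.csh": (50, 700000000)}
print(files_to_update(master, replica))      # ['scripts/ntp.conf']
print(files_not_in_master(master, replica))  # ['scripts/old.csh']
```

Keeping the "what to copy" and "what to remove" questions separate matters for the locally-configured files described below. A central update should refresh shared files without wiping out local ones.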
Updates to the rest of the structure are normally made only when there is a major reshuffle at the site, for example when new architecture types are being introduced. The startup scripts are updated when new versions of packages are installed. Any local versions are brought back to the central system for redistribution to other sites, so that nobody has to reinvent the wheel if the same package is bought at another site.
The nercman userids are configured (~/.rhosts) to allow access from the Wallingford nercman userid, for both the NERC file structure and the Sysadmin software. Note that we do not permit remote root access. Some of the files in /nerc/scripts are also configured locally to have the correct local names and values (such as the /etc/sendmail.cf file in the various incp directories). These are owned by root to prevent rdist from deleting them.
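The sketch below shows how a trust entry of this kind is read. It uses a deliberately simplified model of the ~/.rhosts format: one "hostname [username]" entry per line. A line with no username trusts only the same user name as the local account. Wildcards, netgroups and exclusions are not handled. The root refusal mirrors the site policy stated above rather than anything in the file format. The host name wlfd-admin.example and the function name are hypothetical.

```python
from typing import Iterable


def rhosts_trusts(lines: Iterable[str], local_user: str,
                  remote_host: str, remote_user: str) -> bool:
    """Decide whether a ~/.rhosts-style file lets remote_user@remote_host in.

    Simplified model (an assumption, not a full ruserok() implementation):
    each line is "hostname [username]"; a line with no username trusts only
    the same user name as the local account. "+" wildcards, netgroups and
    "-" exclusions are deliberately not handled.
    """
    if local_user == "root":
        return False  # site policy described in the text: no remote root access
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        host = fields[0].lower()  # host names compared case-insensitively
        user = fields[1] if len(fields) > 1 else local_user
        if host == remote_host.lower() and user == remote_user:
            return True
    return False


# Hypothetical host name for the Wallingford administration machine.
NERCMAN_RHOSTS = ["wlfd-admin.example nercman"]
print(rhosts_trusts(NERCMAN_RHOSTS, "nercman", "wlfd-admin.example", "nercman"))  # True
print(rhosts_trusts(NERCMAN_RHOSTS, "root", "wlfd-admin.example", "nercman"))     # False
```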
System Admin software
This includes a selection of security and management software, most of which lives under /nerc/packages/sysadmin. The most widely-used are described below.
This is part of a general awareness of security issues.
This is a security-checking package which runs every night on almost all systems. It was developed by D Ritchie and derives mostly from the well-known COPS package. Concise summaries are produced and mailed to secman. Each site should define secman as a mail alias for whoever (or wherever) it wishes the reports to go to. Clearly, such reports should not be circulated too widely. The reports are also archived under /data/operator/zenith.
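The sketch below shows the reporting step. It assumes the checks have already been run and that only the problems found are passed in. It uses Python's standard email.message.EmailMessage to build, but not send, a message addressed to the secman alias. The function name, subject wording and findings format are hypothetical, not taken from the package itself.

```python
from email.message import EmailMessage
from typing import Dict


def build_summary(host: str, findings: Dict[str, str]) -> EmailMessage:
    """Build, but do not send, a concise nightly report for the secman alias.

    `findings` maps a (hypothetical) check name to a one-line description of
    the problem; checks that passed are left out, which keeps the report short.
    """
    msg = EmailMessage()
    msg["To"] = "secman"  # a local mail alias; each site decides where it points
    msg["Subject"] = f"Security summary for {host}: {len(findings)} item(s)"
    lines = [f"{check}: {detail}" for check, detail in sorted(findings.items())]
    msg.set_content("\n".join(lines or ["No problems found."]) + "\n")
    return msg


print(build_summary("host1", {})["Subject"])  # Security summary for host1: 0 item(s)
```

Handing the message to the local mailer is left out. Because the alias is expanded by the mail system, each site can redirect its reports without touching the script.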
Matters checked include -
These are the scripts which run every night on most workstations. They record the memory size, partition details, model, hostid, licence managers, disks, and as much other information as can be gained from the various logs and from any available built-in query utilities. They rely on task-manager (see below) running on every workstation. The results are stored in /data/operator/config.mgt and are rdisted back to Wallingford every morning. There they are collated, and a subset is loaded into ORACLE at Swindon. One host on each site has a crontab entry for nercman to do this rdisting; this is one to watch if that host is upgraded! The local data is overwritten each day of the month, so the run on the last day is the one that remains. This leaves one entry per month, which is in turn overwritten a year later. Messages about disk errors etc. are mailed to the confman id, which is usually an alias for operator or for the system manager's personal id.
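The sketch below models the monthly overwrite scheme. It assumes, hypothetically, that each month's data lives in a slot named after the month, such as config.Mar. The slot names and functions are illustrative, not the actual file layout under /data/operator/config.mgt. Month names come from a fixed tuple rather than strftime, so the result does not depend on the locale.

```python
import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def config_slot(day: datetime.date) -> str:
    """Slot written by the nightly run on `day` (hypothetical naming).

    Every run in a given month writes the same slot, and the same slot is
    reused in that month of the following year.
    """
    return "config." + MONTHS[day.month - 1]


def is_final_entry(day: datetime.date) -> bool:
    """True if `day` is the last day of its month, so its data is what survives."""
    return (day + datetime.timedelta(days=1)).month != day.month


print(config_slot(datetime.date(1996, 3, 1)), config_slot(datetime.date(1997, 3, 15)))
# config.Mar config.Mar
print(is_final_entry(datetime.date(1996, 2, 29)))  # True (1996 was a leap year)
```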
Network Time Protocol (NTP)
This is the preferred time synchronisation software for the Internet. There is a hierarchy of NTP servers throughout the world, which take their time from a variety of sources such as radio clocks. The JIPS people provide NTP servers for our community on the network, and we provide configuration files (/etc/ntp.conf) to use these. The reference time from these servers derives ultimately from the GPS satellite system. The software goes to great lengths to estimate network delays. After it has been up for a few days, time synchronisation between systems should be accurate to better than 1 ms over a LAN and 5 ms over the WAN. (I have no idea how to test this without buying special equipment, so it is an easy claim to make!) The configuration expects one or more local time servers on each site (called ntp0, ntp1 etc. in the hosts file). These act as NTP servers for all other systems while themselves querying the reference servers via JIPS. NTP software (server and client) is available for most UNIX systems, and client software is available for Novell servers, MS Windows and DOS.
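Delay estimation rests on four timestamps per request/response exchange. The sketch below applies the standard NTP on-wire formulas, as given in the NTP specification, to a single exchange. The real daemon goes further, filtering and combining many such samples from several servers; this sketch does not attempt that. The function name is hypothetical.

```python
from typing import Tuple


def ntp_offset_delay(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """Clock offset and round-trip delay from one NTP request/response.

    t1: client transmit time (client clock)
    t2: server receive time (server clock)
    t3: server transmit time (server clock)
    t4: client receive time (client clock)
    A positive offset means the server's clock is ahead of the client's.
    The offset assumes the outbound and return paths take equal time.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)  # round trip minus the server's own processing time
    return offset, delay


print(ntp_offset_delay(100000, 100510, 100511, 100021))  # (500.0, 20)
```

The example uses times in milliseconds. The client is 500 ms behind the server, each direction of the path takes 10 ms, and the server spends 1 ms on the reply. Because the formula assumes a symmetric path, any asymmetry shows up as offset error of up to half the round-trip delay.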
A daemon (/etc/xntpd) should be running on all systems and there is a tool (/nerc/packages/sysadmin/ntp/ntpq) which gives lots of information on the current state of the NTP system. NTP uses broadcasts on port 123/UDP.
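The sketch below parses one line of the ntpq peer listing (ntpq -p). It also decodes the reach column, an 8-bit register of the last eight polls printed in octal, so 377 means all eight were answered. The first character of each line is a tally code, and '*' marks the peer currently selected for synchronisation. Column headings vary between versions: the last column is headed disp in some and jitter in others. The column order used here is therefore an assumption to check against the local output. The function name and sample lines are hypothetical.

```python
from typing import Dict, Union


def parse_peer_line(line: str) -> Dict[str, Union[str, int, float]]:
    """Parse one data row of `ntpq -p` output.

    Assumed column order: remote, refid, st, t, when, poll, reach, delay,
    offset, then a dispersion/jitter column (ignored here). Delay and offset
    are reported in milliseconds.
    """
    tally, fields = line[0], line[1:].split()
    reach = int(fields[6], 8)  # 8-bit shift register, printed in octal
    return {
        "tally": tally,  # '*' = current system peer
        "remote": fields[0],
        "stratum": int(fields[2]),
        "poll": int(fields[5]),  # polling interval in seconds
        "reach_ok": bin(reach).count("1"),  # answered polls out of the last 8
        "delay_ms": float(fields[7]),
        "offset_ms": float(fields[8]),
    }


peer = parse_peer_line("*ntp0  192.0.2.1  2 u  35  64  377  1.234  0.056  0.012")
print(peer["tally"], peer["remote"], peer["reach_ok"])  # * ntp0 8
```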
For Alpha systems we keep the package authorisation keys here.
TASK for Local Area Maintenance
TASK_MANAGER is a facility to execute a script on a selection of UNIX systems at a site. The mechanism makes it simple to request execution on "all participating systems", "just a few systems" or "all but a few" systems. In addition, it prevents the task from being executed more than once on a host. Hosts that were down when the task was originally submitted run it shortly after they return to service.
We have found it to be so useful that it is now installed as a matter of course on all new systems, and has been added to very nearly all existing ones.
It is used, for example, to apply operating system patches on all workstations, or to copy application packages to the mirror servers after an update. It is in essence an intelligent list of 'cron' jobs, which can vary from time to time, needing only a single entry in root's crontab on each host.
The mechanism operates in the following manner :-
The script /nerc/etc/task_manager is executed by root every hour at the same time on all participating hosts by the following crontab entry :-
0 * * * * /nerc/etc/task_manager >> /dev/null
This script ensures that it has been invoked by root and then pauses for 10*(last component of the executing host's internet address) seconds in order to avoid a sudden overload on the server.
The file ~operator/task/hosts.include is searched for the executing host or a wildcard "*" indicating all hosts. If neither is found the script terminates.
The file ~operator/task/hosts.exclude is searched for the executing host and if found the script terminates.
The script then ensures that the task has not already been performed on this host by checking for the host's log file, ~operator/task/log/task_log.<hostname>. If this file exists, the script terminates.
The script performs the desired task by executing ~operator/task/task.csh and writes output to the log file ~operator/task/log/task_log.<hostname>.
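The steps above can be paraphrased in Python. This is a sketch, not the real task_manager script. It assumes the hosts files hold one host name per line, and the function names are hypothetical. The root check and the actual running of task.csh are left out.

```python
import os
from typing import Iterable


def stagger_seconds(ip_address: str) -> int:
    """Pause before starting: 10 x the last component of the host's IPv4 address."""
    return 10 * int(ip_address.rsplit(".", 1)[1])


def should_run(host: str, include: Iterable[str], exclude: Iterable[str],
               log_dir: str) -> bool:
    """Apply the include, exclude and already-done checks in that order."""
    included = {entry.strip() for entry in include}
    excluded = {entry.strip() for entry in exclude}
    if host not in included and "*" not in included:
        return False  # host not selected (and no "*" wildcard)
    if host in excluded:
        return False  # host explicitly left out
    # The per-host log file doubles as a "done" marker, so the task runs only once.
    return not os.path.exists(os.path.join(log_dir, "task_log." + host))


print(stagger_seconds("192.0.2.45"))  # 450
```

The last address component is at most 255, so the longest pause is 10 × 255 = 2550 seconds (42.5 minutes). Every host therefore starts its checks before the next hourly invocation.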
In order to set up this facility the following steps need to be performed :-