Index:

This is a guide to the configuration and management of the Legato NetWorker backup system. It covers releases 5.2.1, 5.5.1 and 5.5.2, which are the ones we are proficient in, but it is probably relevant to other releases too. It is a brief overview of how to decide which hosts to back up, what policies to use for backing them up, and how to make sure that they are properly backed up.
  1. Which hosts to back up
  2. Backup frequencies, browse and retention policies.
  3. Which disks to save
  4. How many groups
  5. How many pools
  6. Checking the 'Savegroup Completion' messages
  7. Changing the cleaning tape in a jukebox
  8. For more information...

1: Which hosts to back up:

One machine must be designated the Legato server, and have the alias nsrhost. We recommend that this be a Sun Microsystems UNIX host if possible, as that is where our experience lies.

The next thing to decide is which other hosts will be backed up. Do not assume that you should automatically back up every host. Many sites use UNIX machines as data-less clients, and in these cases it is as easy to do a system installation from CD as it is from your backup media, with the obvious reduction in tape (or other media) costs. The drawback is that things like system and user crontabs will be lost. However, this can be circumvented with a cron job that backs up the relevant partitions/files to a host that is backed up.
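
As a minimal sketch, a root crontab entry like the following would preserve the crontabs of a data-less client. The host name, automounter path and schedule are all illustrative; adapt them to your site.

  # Root crontab entry on a data-less client (paths are illustrative):
  # every night at 02:00, tar the crontabs into a directory that lives
  # on a host which IS backed up, reached here via the automounter.
  0 2 * * * tar cf /net/fileserver/export/saved/client1-crontabs.tar /var/spool/cron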

A good rule of thumb is 'if it's in the automounter, back it up'. Also, you may have machines which are not in the automounter, but nonetheless are used for data storage. Make sure that these are also backed up.
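
If you are unsure what the automounter covers, listing its maps is a quick check. This sketch assumes the maps live either in NIS or in local files; the map names vary from site to site.

  # What does the automounter know about?
  ypcat -k auto.home      # if the maps are served by NIS
  cat /etc/auto_home      # if they are local files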

Back to the index
 

2: Backup frequencies, browse and retention policies.

 i: Backup frequencies:

You should aim for a full backup once per week, if possible. However, some sites may not have the hardware or media budget either to physically back up the site once a week or to pay for the tapes used.

In this case, a full backup once a fortnight would suffice. We do NOT recommend a full backup only once a month, as this leaves you too vulnerable in the event of media problems.

In general, small sites can do a full backup on Friday for all hosts, with incrementals on all other days. Large sites may need to scatter the full backups throughout the week.
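
As a sketch of how a "full on Friday" schedule might be set up from the command line rather than the GUI, the following creates a weekly schedule with nsradmin. The schedule name is made up, the action list runs Sunday through Saturday, and you should check the nsradmin man page for the exact attribute names on your release.

  nsradmin> create type: NSR schedule; name: WeeklyFull; period: Week; action: "incr incr incr incr incr full incr"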
 
Back to the index

 ii: Browse policies:

Browse policies are set by the amount of disk space you have for the indexes, and, if you have release 4.2.5 or earlier, by the number of files backed up. (There is no limit on the number of files you can back up, but there is a limit on the size of the index file.)

Allowing 2% of the total file size backed up as an index size seems a reasonable starting point. We suggest keeping a month of on-line index, though your own requirements or storage limitations may lengthen or shorten this.
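
As a worked example of that 2% rule of thumb: a site backing up 150 GB of files should allow roughly 3 GB of index space. You can compare this with what the indexes actually occupy; on a standard installation they live under /nsr/index.

  # 150 GB backed up x 2% = about 3 GB of index space.
  # See what the indexes currently use, per client:
  du -sk /nsr/index/*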

Please note, if you have release 4.2.5 or earlier and your index file exceeds 2 gigabytes in size, you will, in increasing order of importance:
  i: Lose all the old index entries
 ii: Not be able to back up the new index entries for clients
iii: Not be able to back up anything for the nsrhost

As some sites use their main file server as the nsrhost, this is obviously a critical problem.

Back to the index

 iii: Retention policies:

The retention policy sets how long you want to keep the tape so you can recover from it. We suggest a minimum time of three months, but a year is a more common storage time.

This is to avoid the situation where someone deletes an important file, and does not realise until you have over-written the tape it was on.

There are two important points here:
  i: The owner of the data must understand and agree to the length of time that the backup tapes will be kept for.
 ii: Take account of the working practices of the people at your site. For example, if your people take extended trips away from their main base, you need to be able to recover from mistakes they made before leaving, perhaps three months earlier. A sketch of how to check what is still recoverable follows this list.
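
To see which volumes NetWorker still knows about, and therefore what is still recoverable before a tape is recycled, you can query the media database with mminfo. This sketch uses one common set of options; see the mminfo man page for the full range of reports.

  # List all volumes, verbosely, ordered by time:
  mminfo -avot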

Back to the index

3: Which disks to save:

In general it is safest to use "All" as the Save Set directive in the client setup window. This means that new disk additions or partition re-mapping do not affect the integrity of the backups. The down-side is that there may either be more data on the machine than you can back up in a single night, or more operating system than data. In these cases, specify the mount points of the disks you want to back up in the Save Set directive for the client. You can have multiple entries for the same client in your client setup windows, but only use one client license.
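
For those who prefer the command line to the setup window, a client's save sets can also be inspected and changed with nsradmin. The client name and mount points below are illustrative; treat this as a sketch rather than a recipe.

  nsradmin> . type: NSR client; name: client1
  nsradmin> print
  nsradmin> update save set: "/", "/usr", "/export/data"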

Back to the index

4: How many groups?

Don't make life too complicated. Try to keep to servers and workstations, or similar. Don't set up Silicon_Graphics, Sun, DEC and HP as four separate groups just to keep things looking neat. The only reason to split them up like this is to allow them to write to different pools, and even then you should seriously consider why you want to do this.

Back to the index
 

5: How many pools?

Again, try to keep things simple. In most cases three pools is more than adequate: one for full backups, one for incremental backups (splitting them like this improves storage redundancy, and is of great benefit when you cannot read a tape) and one for clones for off-site storage.

You should have two tape devices for cloning, though it is possible to clone to a file-system based device. However, cloning a DLT this way is not usually possible, as it is rare to have spare disk space of the required quantity, and it is also slower, as a two-pass transfer is required.

We highly recommend that you clone for off-site storage on a regular basis, perhaps once a quarter. If you cannot, at least place your older full backup tapes off-site. This makes data recovery much slower, as the tape isn't available on site, but in the event of disaster you can at least get a system back once you replace the servers.
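
Cloning itself is done with nsrclone. The pool and volume names in this sketch are illustrative; -b names the destination pool (if you omit it, clones go to the Default Clone pool).

  # Clone a full-backup volume into the off-site pool:
  nsrclone -b "Offsite Clone" full.047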

Back to the index

6: Checking savegroup completion messages.

These must be checked every day, and any errors must be acted upon.

The 'person' receiving the savegroup completion notices, and doing the checking, should be a role that is present every day, such as a Help Desk ID or a computer room operator if you have one. The Legato administrator should receive the messages as well.

The idea here is to stop failures going un-noticed because someone is on holiday, or in the Antarctic.

The person reading the messages need not necessarily be able to fix the problems reported, just to identify that something is amiss and get the appropriate person to deal with it.

Successful backups get a savegroup completion message like this:

NetWorker Savegroup: (notice) Main completed, 23 clients (All Succeeded)
Start time:   Mon Oct 13 19:30:02 1997
End time:     Tue Oct 14 02:41:34 1997

--- Successful Save Sets ---
[ *** Snip client details *** ]
 

Unsuccessful backups get a savegroup completion message like this:

NetWorker Savegroup: (notice) Main completed, 23 clients (client1 Failed)
Start time:   Thu Oct 23 19:30:01 1997
End time:     Thu Oct 23 22:00:56 1997

--- Unsuccessful Save Sets ---

* client1:All 2 retries attempted
* client1:All Connection timed out

--- Successful Save Sets ---
[ *** Snip client details *** ]

In this case, client1 was not on the site, so the error message is to be expected. If the system will be missing for more than a few days, it is worth removing it from the group, as this makes it easier for the person checking the Savegroup Completion notices to spot errors. You can always make a new group and put it in that, if you are worried that you might forget to re-enable it as a backup client. Now compare that message with the next one, which looks identical but has a different cause:
 

NetWorker Savegroup: (notice) Main completed, 23 clients (client2 Failed)
Start time:   Thu Sep 25 01:30:01 1997
End time:     Thu Sep 25 03:27:12 1997

--- Unsuccessful Save Sets ---

* client2:All 2 retries attempted
* client2:All Connection timed out

--- Successful Save Sets ---
[ *** Snip client details *** ]

In this case, the nsrexecd client daemon was not running. Restart it with /nerc/etc/rc.d/S06networker_client to re-enable backups. As you can see, the above two messages are identical, so you need local knowledge of what's happening to spot what's causing the problem.
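
A quick way to tell the two cases apart is to check whether the client daemon is running at all. This is a sketch for the second case; the startup script path is our local one, quoted above.

  # Is the client daemon running on client2?
  ps -ef | grep nsrexecd
  # If not, restart it:
  /nerc/etc/rc.d/S06networker_client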
 

This last one is a sign that something is seriously amiss:

Networker Savegroup: (notice) Main completed, 23 clients (client1, client2, client3, client4, client5, client6, client7, client8, client9, client10, client11, client12, client13, client14, client15, client16, client17, client18, client19, client20, client21, client22, server Failed)
Start time:   Mon Jul 28 19:00:02 1997
End time:     Mon Jul 28 20:26:56 1997

--- Unsuccessful Save Sets ---

* client1:index 1 retry attempted
* client1:index save: SYSTEM error, Arg list too long
* client1:index save: Cannot open save session with server

[ repeated for all other clients ]

* server:/ 1 retry attempted
* server:/ save: SYSTEM error, Arg list too long
* server:/ save: Cannot open save session with server
* server:/var 1 retry attempted
* server:/var save: SYSTEM error, Arg list too long
* server:/var save: Cannot open save session with server
* server:/local 1 retry attempted
* server:/local save: SYSTEM error, Arg list too long
* server:/local save: Cannot open save session with server
* server:/local1 1 retry attempted
* server:/local1 save: SYSTEM error, Arg list too long
* server:/local1 save: Cannot open save session with server
* server:/local10 1 retry attempted
* server:/local10 save: SYSTEM error, Arg list too long
* server:/local10 save: Cannot open save session with server
* server:/local11 1 retry attempted
* server:/local11 save: SYSTEM error, Arg list too long
* server:/local11 save: Cannot open save session with server
 

--- Successful Save Sets ---

  client1: /          level=full,    40 MB   00:01:34     2505 files
  client1: /usr       level=full,   293 MB   00:13:45    27170 files
  client1: /local     level=full,   507 MB   00:16:19    15427 files
  client1: /local1    level=full,   384 MB   00:12:38     5253 files
  client1: /local8    level=full,   982 MB   00:30:14     4771 files
  client1: /local9    level=full,  1253 MB   00:40:38     2741 files
  client1: /local10   level=full,  2694 MB   02:29:35     6462 files
  client1: /local12   level=full,  1125 MB   01:40:30   124273 files
  client1: /local13   level=full,  1852 MB   01:02:17    13725 files

So everything except the index was backed up for the clients, but for the client that is the Legato server itself, nothing was backed up at all.
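
When every client fails in the same way, suspect the server rather than the clients. As a first, sketched check, make sure the NetWorker daemons on the server are all running before digging further:

  # On the Legato server:
  ps -ef | egrep 'nsrd|nsrmmd|nsrindexd|nsrexecd'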
 
Back to the index

7: Changing the cleaning tape in a jukebox

Magnetic tape devices get dirty in use, from the tape shedding particles as it runs. Read the instruction manual for your tape device to determine an appropriate cleaning interval, bearing in mind the number of hours of use the devices get.

You can assign a slot for a cleaning cartridge in a jukebox. If you have done this, NetWorker can be told the life of the cleaning cartridge, keep track of how often it is used, and update the "uses left" figure. For example, on DAT devices the HP cleaning tapes can be used 50 times before being used up, so to tell NetWorker you have installed a new tape, run  nsrjb -U 50 -j jukeboxname  to allow 50 uses of the cleaning tape. If you have a DLT drive, the recommended number of uses is 20, so you would want  nsrjb -U 20 -j jukeboxname .
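
Collected as a block, the commands quoted above look like this; substitute your own jukebox name.

  # After loading a fresh cleaning cartridge:
  nsrjb -U 50 -j jukeboxname    # DAT: HP cleaning tapes are good for 50 uses
  nsrjb -U 20 -j jukeboxname    # DLT: recommended limit is 20 uses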
 
Back to the index

8: For more information:

If you still have problems, please contact iTSS Systems Group
  • By electronic mail to syshelp@ua.nwl.ac.uk and itsshelp@itss.nerc.ac.uk

  • or by phoning us

    This page last updated on 30 October 2000

    Back to the Legato main page. || Back to the Legato Procedures index
    Dominic Feeley,
    iTSS Systems Group, Wallingford,
    UNIX and Legato Support.