Legato Networker - What if it doesn't work?

Some errors described, and their resolutions
Error message in recover :  unknown version for UNIX file `/'
Tapes marked full far too early - SCSI errors and transport failures
NetWare backups not working - after password, server name or IP change

Some errors described, and their resolutions

On rare occasions, the Legato System will manage to tie itself into a knot from which it cannot recover. It is normally robust, and will survive power cuts whilst in the middle of updating indexes, for example.

However, there are times when the system will refuse to work. First of all, check that Legato Networker is up and running. You should have the following processes ( or daemons ) running on the server ( release 4.2 or later ):

/usr/sbin/nsrexecd -s nsrhost
/usr/sbin/nsrmmd -n 1  ( for the first tape device or jukebox )
/usr/sbin/nsrmmd -n 2  ( optional - for the second tape device or jukebox )
/usr/sbin/nsrmmd -n 3  ( optional - for the third tape device or jukebox )

Release 4.1.3 and earlier did not have a separate nsrexecd for the server; it was part of nsrd.
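The process check above can be scripted. The following is a minimal sketch, not an official Legato tool: it takes the text output of a process listing ( for example from ps -ef ) and reports which of the daemon names listed above do not appear in it. The function name missing_daemons and the constant EXPECTED_DAEMONS are hypothetical, and the matching is deliberately crude, so a line such as "grep nsrmmd" would also count as a match.

```python
# Daemon names taken from the process list above; extend for your own release.
EXPECTED_DAEMONS = ("nsrexecd", "nsrmmd")


def missing_daemons(ps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemon names that do not appear in the ps output."""
    seen = set()
    for line in ps_output.splitlines():
        for field in line.split():
            # Compare on the base name, e.g. /usr/sbin/nsrmmd -> nsrmmd.
            seen.add(field.rsplit("/", 1)[-1])
    return [name for name in expected if name not in seen]


# Example with a hand-written listing (the column layout is only illustrative).
sample = (
    "root 101 1 0 10:00 ? 0:00 /usr/sbin/nsrexecd -s nsrhost\n"
    "root 102 1 0 10:00 ? 0:01 /usr/sbin/nsrmmd -n 1\n"
)
print(missing_daemons(sample) or "all expected daemons found")
```

An empty list means every expected name was seen at least once. It does not confirm that there is one nsrmmd per tape device, which still needs checking by eye.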

If Networker is up and running and the system still refuses to work, try the following steps in order:

The first step here is to make sure the jukeboxes are in a state that Networker can cope with. The jukeboxes MUST be "loaded", i.e. in a state where they have scanned the tape drawer and are ready for SCSI commands. If in doubt, unload the stacker, then when it has finished, reload it. The means for doing this varies from system to system, but is normally obvious from reading the button labels on the jukebox.

If you still have problems with jukebox control, then the command nsrjb -H is the next step. This prompts for the jukebox to work on, then reinitializes the software control ( and hardware systems ) of that jukebox. This normally sorts out irregularities with jukeboxes. You are always warned to re-inventory the jukebox afterwards, but this is normally not necessary. However, an inventory is a good check that the jukebox is working: use nsrjb -I and provide the appropriate jukebox name when prompted.

If you still cannot get the system to work, you have more serious problems with the stackers. The next step is a power-on reset of the stackers. Please note that resetting external devices can bring down the system that the SCSI controller is on. It is very rare, but it does happen.

Manually unload the tape if you cannot unload it via the nwadmin GUI or with nsrjb -u ( this is very system-dependent! ). Power off the jukebox, wait ten seconds, then power it back on. Reload the stacker, and run nsrjb -H for the appropriate stacker.

If you are still having problems, change the cleaning cartridge, even if it isn't past its "sell-by" date, and reset the 'number of uses' counter with nsrjb -U 50. Force a clean, and retry.

The last thing to try is a shutdown and restart of Legato. Run nsr_shutdown as root, then when this completes, run /etc/rc2.d/S95networker start. This will take some considerable time, perhaps a whole day if the client databases are badly mangled. Although operations are safe during this period, they are dreadfully slow. You are advised to wait until nsrmm and nsrck have finished before using Legato.

If you get this far without fixing the problem, and nothing has been done to alter the configuration of Legato, then it's likely that either the tape devices are broken, or the system that Networker runs on has been reconfigured. To check the latter case, reboot the Legato server so it reconfigures its device drivers, for example with boot -r from the kernel prompt for a Solaris machine. Any change to hardware can require a boot -r, even if the devices look identical. They might both be Sony SDT-5000 DAT devices, but with different PROM revisions, and this can stop the OS successfully using the devices.

Sometimes, you get error messages like:

NetWorker index: (notice) Check failed for client clientname (bad database header)

First of all, force a check with nsrck -F clientname. You may need to run this several times, as it sometimes cannot fix all the index problems in one go. If nsrck -F continues to complain, then things are looking serious: you will need to remove and re-create the client index. To do this, change directory into the client index and remove all the files. This is one of the times when you need to ignore the README file. Once the files are deleted, run nsrck -c clientname, which will create an empty database. You can then use Save Set Recover to bring back the old indexes if required.
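Because nsrck -F may need several passes, the repeated check can be wrapped in a small retry loop. This is only a sketch under a stated assumption: it treats an exit status of 0 from nsrck as "nothing left to fix", which is not confirmed by this document, so check the behaviour on your release first. The function name force_index_check and its run parameter are hypothetical; run exists so that the loop can be tried out without touching a real index.

```python
import subprocess


def force_index_check(client, run=subprocess.call, max_attempts=3):
    """Run `nsrck -F <client>` until it exits 0 or the attempts run out.

    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        # ASSUMPTION: exit status 0 means the index check completed cleanly.
        if run(["nsrck", "-F", client]) == 0:
            return attempt
    return None
```

A return value of None is the point at which the text above says things are looking serious, and removing and re-creating the client index becomes the next step. That step is left as a manual operation here on purpose.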


Error message in recover :  unknown version for UNIX file `/'

nsrclient# recover
recover: Using nsrhost as server for nsrclient
recover: unknown version for UNIX file `/' (possible newer version)
/ not in index
<return> will exit.
Enter directory to browse:

Legato state that the client-server interface is cast in stone, and that any client release can use any server release. We have found exceptions to this: for example, the Solaris Legato 4.1.3 client cannot understand the file system when talking to a Legato 5.0 server. The tape format is fine, so scanner and uasm still work happily, but this is a difficult route for the inexperienced, so we suggest upgrading to 4.2.3 client versions.

This sort of error can be caused by having locally installed versions of Legato. We do not recommend that pkgadd or inst be used for adding client versions of Legato. You should use the /nerc/packages/legato version for clients instead. There are certain circumstances where the pkgadd or inst approach is the preferred method for clients, but these are rare.

Please note that some operating system changes will break Legato, and in some circumstances this simply cannot be avoided. A classic example was the Silicon Graphics change from EFS to XFS filesystems, which broke everything that needed filesystem access. Again, a new release of the Legato client code fixed this.


Tapes marked full far too early - SCSI errors and transport failures

On some Sun machines, such as SPARC/10s, there can be problems with the SCSI implementation that cause Legato tapes to appear full far too early. Legato marks a tape as full if a certain number of errors are detected whilst writing the tape.

Lines like the following will appear in /var/adm/messages. Please note that nothing can be inferred from the order of the devices reported in the messages file.

May 11 15:15:03 fzua unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000 (esp0):
May 11 15:15:03 fzua unix:      failed reselection (no valid cmd)
May 11 15:15:03 fzua unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@0,0 (sd0):
May 11 15:15:03 fzua unix:      SCSI transport failed: reason 'reset': retrying command

Unfortunately, there is more than one reason this can happen. A hardware fault may be causing the problem, but on older kit, especially when used with modern external devices, the SCSI implementation on the controller may be at fault.
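To see how often these messages occur, the messages file can be scanned for transport failures. The sketch below is an illustration only: it assumes the two-line layout shown in the sample above ( a WARNING line naming the device, followed immediately by the detail line ), and the function name count_transport_failures is hypothetical.

```python
import re
from collections import Counter

# A WARNING line ends with the device instance in parentheses, e.g. "(sd0):".
WARNING_RE = re.compile(r"WARNING: \S+ \((?P<dev>\w+)\):\s*$")
FAILURE_RE = re.compile(r"SCSI transport failed: reason '(?P<reason>[^']+)'")


def count_transport_failures(lines):
    """Count 'SCSI transport failed' messages per (device, reason) pair."""
    counts = Counter()
    current_dev = None
    for line in lines:
        warning = WARNING_RE.search(line)
        if warning:
            # Remember the device; the detail is expected on the next line.
            current_dev = warning.group("dev")
            continue
        failure = FAILURE_RE.search(line)
        if failure and current_dev is not None:
            counts[(current_dev, failure.group("reason"))] += 1
        # A detail line only belongs to the WARNING directly before it.
        current_dev = None
    return counts


# Typical use, assuming the Solaris default log location named above:
# with open("/var/adm/messages") as log:
#     print(count_transport_failures(log))
```

A steadily growing count against the tape device is a hint that the early "tape full" marks are related to bus errors rather than to the tapes themselves.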

It is possible to "slow down" the SCSI bus, which may fix these problems.

To do this, add one of the following settings to /etc/system. Both lines set the same variable, so use only one of them.

set scsi_options=0x58
* This disables fast ( 10MHz ) SCSI, and also disables command queueing ( tagged command support ) and synchronous operation.

set scsi_options=0x178
* This enables fast ( 10MHz ) SCSI and synchronous operation, but still disables command queueing ( tagged command support ).

A reboot is required to effect these changes.

In practical use, you may notice very little difference in actual performance as a result of these changes.
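The two scsi_options values are bit masks, so they can be decoded to confirm what each one switches on. The sketch below uses the bit assignments as commonly documented for the Solaris scsi_options variable; treat the table as an assumption and check it against the documentation for your release. The names SCSI_OPTION_BITS and decode_scsi_options are hypothetical.

```python
# ASSUMPTION: bit meanings as commonly documented for Solaris scsi_options.
SCSI_OPTION_BITS = {
    0x008: "disconnect/reconnect",
    0x010: "linked commands",
    0x020: "synchronous transfer",
    0x040: "parity",
    0x080: "tagged command queueing",
    0x100: "fast SCSI (10MHz)",
}


def decode_scsi_options(value):
    """Return the names of the features enabled by a scsi_options value."""
    return [name for bit, name in sorted(SCSI_OPTION_BITS.items()) if value & bit]


for setting in (0x58, 0x178):
    print(hex(setting), decode_scsi_options(setting))
```

Under that table, neither value sets the tagged command queueing bit, and only 0x178 sets the synchronous transfer and fast SCSI bits, which agrees with the descriptions given with the two settings.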

We have more information on the scsi_options settings.


NetWare backups not working - after password, server name or IP change

One common stumbling point is changing the administrator password without also changing the Legato server client setup. This is listed as a possible cause of the failure in the savegroup completion message. Also, simply retrying with the wrong password will eventually activate the intruder detection system, so you cannot back up the system ( or even log into it ) even if you then supply the correct password.

NetWare and Legato NetWorker can also get confused by a change in either the name or the IP address of the Legato server. You may find that the Legato server-controlled savegrp commands fail to run, yet manual saves from the NetWare end run fine.

Thanks to Allan Nelson of Merlewood for the following:

You don't need to reboot the clients to get them to pick up the server correctly.

All you need to do (on the client) is go to the Networker screen, press F2 (File) and select the 'Change Server' option.  In there, tell it to search for Networker servers (don't type the name in).  It will then listen for Networker servers broadcasting and give you a list of them - there'll be 1 - yours! Select that, and that's it.

It's always best if you have problems with connections to do this from the client.  It at least tells you that it's 'aware' of a Networker server out there and can communicate with it.


If you still have problems, please contact iTSS Systems Group

  • By electronic mail to syshelp@mail.nwl.ac.uk and itsshelp@itss.nerc.ac.uk

  • or by phoning us on