What is the Backup Validator and do I need it? Probably, yes

Dave Kinchlea

There may well be people who don't know what this product is supposed to do (never mind how it works). The question often comes up as to whether it is needed and, if so, by whom and when. Before I try to answer those questions, however, I'll try to explain its reason for being.

The problem is that when you use an External File Store within your Livelink system you have created a disconnect between where the files are stored and where the metadata for the files are stored. The "files" are stored within a file system supplied by some external service (a file server of some description) and the metadata are stored within a database. These two services are both external to Livelink and distinct from each other ... they are not aware of each other and do not communicate together in any way. It is Livelink that is responsible for use of the two services and any logical connection between them.

For ordinary transactions Livelink handles the connection well: if one service or the other fails during creation (for instance), the transaction fails and Livelink attempts to restore both services to their state before the transaction (only serious external flaws will prevent that from happening). The problem occurs when Livelink is not in control of the transactions, that is, when a third party external to Livelink is accessing the services. This is exactly the case when a backup/restore service needs to create a backup or snapshot of a running system.

The first thing to note: if you have a backup window that allows for service disruption (i.e., if you can turn the service off for a period of time because nobody is using it), then you do not need the Backup Validator, because Livelink will not interfere. But anybody trying to provide a 24x7 service may still have a need.

The Problem in Detail

The disconnect between the two external services means that there is a synchronization problem between any snapshot (backup) of a database and of an EFS ... without stopping services first, there is no possibility of guaranteeing the two snapshots are taken at exactly the same time. But even if that problem can be solved, the real problem is any ongoing transactions while "snapshots" are being created.

The creation of Backup Validator goes back to a time when the cost of "snapshot" technology with file systems was prohibitive for most organizations. Backups were often done directly to tape and could take many hours to complete. The problem Livelink administrators faced is that the time it takes to back up a database is typically much less than the time it takes to back up a file system, and thus the database backup was "out of sync" with the file system backup. New content could be added and existing content deleted during the time the two snapshots were being written.

New content isn't an issue for most organizations; it just means your backups include extra data that is largely inaccessible to any but system administrators... a potential privacy threat for some, but for most organizations this would cause no concern. Deletions are another matter, because files could be deleted from the EFS before the backup has had a chance to copy them for the desired snapshot. This becomes a serious issue at restore time because the database will point to content that does not exist ... a bad thing indeed for an ECM system!
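
To make the timing problem concrete, here is a toy model of a live backup. It is only a sketch: the "database" is a dictionary, the "EFS" is a set of file names, and every name in it (node IDs, file names, the function itself) is made up for illustration and has nothing to do with how Livelink actually stores things.

```python
# Toy model: the "database" maps node IDs to EFS file names,
# and the "EFS" is a set of file names. All names are hypothetical.

def simulate_live_backup():
    db = {1: "a.dat", 2: "b.dat", 3: "c.dat"}
    efs = {"a.dat", "b.dat", "c.dat"}

    # Step 1: the database backup finishes quickly.
    db_backup = dict(db)

    # Step 2: the slow file system backup has copied only a.dat so far.
    efs_backup = {"a.dat"}

    # Step 3: a user deletes node 3 while the EFS backup is still running.
    efs.discard(db.pop(3))

    # Step 4: a user adds node 4 while the EFS backup is still running.
    db[4] = "d.dat"
    efs.add("d.dat")

    # Step 5: the EFS backup copies whatever files still exist.
    efs_backup |= efs

    # After a restore, compare what the DB expects with what the EFS holds.
    expected = set(db_backup.values())
    missing = sorted(expected - efs_backup)  # DB points at nothing
    extra = sorted(efs_backup - expected)    # harmless leftovers
    return missing, extra


missing, extra = simulate_live_backup()
print("missing:", missing)  # missing: ['c.dat']
print("extra:", extra)      # extra: ['d.dat']
```

The extra file is only clutter, but the missing one is a document the restored database still believes it has.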

Note that the real problem here is that Livelink is not in charge and, in fact, is unaware of the backup requests. It is an unprivileged consumer of both external services (DB and EFS) and has no administrative connection to either. What Livelink always needed but still doesn't have is a backup or read-only mode that allows for a consistent snapshot to be taken.

It would take not only a serious coding effort but also a big change in design philosophy to see Livelink / Content Server actually control things such that it could guarantee that a backup will be synchronized between DB and EFS. The Backup Validator doesn't control; it tracks the activities of Livelink while it is active, preventing deletions from occurring on the EFS (in much the same way that Recycle Bin / undelete does), thus ensuring that files will not get physically deleted and will be available at restore time.
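
The idea of holding back deletions can be sketched in a few lines. This is not the Backup Validator's actual implementation, which I am not describing here; the class and its `begin_backup` / `end_backup` methods are hypothetical names, and the sketch simply assumes something tells the store when a backup starts and ends.

```python
import time


class DeferredDeleteStore:
    """Toy EFS wrapper that defers physical deletes while a backup runs.

    Hypothetical illustration only, not the product's real design.
    """

    def __init__(self, files):
        self.files = set(files)    # files physically present
        self.pending = []          # (timestamp, name) deletes held back
        self.backup_active = False

    def begin_backup(self):
        self.backup_active = True

    def delete(self, name):
        if name not in self.files:
            raise KeyError(name)
        if self.backup_active:
            # Record the request but leave the file where it is.
            self.pending.append((time.time(), name))
        else:
            self.files.discard(name)

    def end_backup(self):
        """Apply the held-back deletes and return the transaction record."""
        self.backup_active = False
        log = list(self.pending)
        for _, name in log:
            self.files.discard(name)
        self.pending.clear()
        return log


store = DeferredDeleteStore(["a.dat", "b.dat"])
store.begin_backup()
store.delete("a.dat")            # logged, not physically removed
print("a.dat" in store.files)    # True
log = store.end_backup()
print([name for _, name in log]) # ['a.dat']
```

The returned log plays the role of the transactional record: it is what would be kept alongside the two backup images for use at restore time.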

Upon restore the administrator must reconcile the two backup images and the transactional record created by the Backup Validator by undoing or redoing transactions appropriately to ensure the DB and EFS are synchronized.
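
As a rough sketch of what that reconciliation involves, the function below sorts restored content into three buckets. It assumes the administrator can list the file names the restored database references, the file names in the restored EFS, and the file names whose deletion was tracked during the backup; the function and its parameter names are hypothetical, not part of any product.

```python
def plan_reconciliation(db_refs, efs_files, logged_deletes):
    """Classify restored content; every name here is hypothetical.

    db_refs        -- file names the restored database points to
    efs_files      -- file names present in the restored EFS
    logged_deletes -- file names whose deletion was tracked during the backup
    """
    db_refs, efs_files = set(db_refs), set(efs_files)
    logged_deletes = set(logged_deletes)
    return {
        # Referenced by the DB but absent from the EFS: broken content.
        "missing_from_efs": sorted(db_refs - efs_files),
        # Present in the EFS but unknown to the DB: typically content
        # added after the DB snapshot was taken.
        "unreferenced_in_efs": sorted(efs_files - db_refs),
        # Intact on both sides but deleted by a user during the backup:
        # the administrator decides whether to redo the delete.
        "deletes_to_redo": sorted(logged_deletes & db_refs & efs_files),
    }


plan = plan_reconciliation(
    db_refs=["a.dat", "b.dat", "c.dat"],
    efs_files=["a.dat", "b.dat", "c.dat", "d.dat"],
    logged_deletes=["c.dat"],
)
print(plan["missing_from_efs"])     # []
print(plan["unreferenced_in_efs"])  # ['d.dat']
print(plan["deletes_to_redo"])      # ['c.dat']
```

An empty first bucket is the whole point: because deletions were held back, nothing the database references has gone missing.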

This is always a problem with live backups

Technology is constantly improving, and the ability to do "instantaneous snapshots" of file systems is quite common and relatively inexpensive today, but there are still demons to be wary of. In all cases, a file system/storage snapshot only works with closed files; any file with an open file descriptor (a process wanting to write) will either delay the snapshot or be excluded from it. The action is usually called "quiescing the file system" and it can take an indeterminate amount of time. So even if the snapshot is claimed to be done in seconds, it might take many minutes before it is able to occur, or it might not include some critical files.

Because Livelink doesn't have a backup mode, it is not possible to guarantee that a file system can be quiesced and so, in theory at least, a snapshot might never happen (though that would be a busy Livelink implementation indeed). The point is that the technology only makes it less likely there will be an inconsistency between DB and EFS backups, but it doesn't guarantee it.

The only time consistency isn't an issue is when the database is the exclusive storage container for Livelink content (i.e., so-called "BLOB data"). This is a viable alternative for some small sites but isn't really feasible for the typical size of ECM solutions today. Use of the Archive Server doesn't change the equation either: from a consistent backup point of view, the Archive Server is exactly equivalent to an EFS, as there is no controlling connection between Livelink and the Archive Server.

What about file "modifications"?

One of the best design choices Livelink made was to ensure there was no such thing as modifications to documents/files; versioning was enforced from the start. Thus, though the User Interface allows for the concept of modifying content, at the transactional level as applied to the EFS, all content is new. This means that any file that is backed up from the EFS is the correct file to be restored, whether it was backed up today or one year ago. And thus, in a very real sense, there are no file modifications for Livelink transactions to the EFS.

So who needs the Backup Validator?

Well, the truth is that nobody "needs" the Backup Validator, as there are other options available. Even those wanting 24x7 service and using tape devices to back up an EFS do not require the Backup Validator; they just need to adjust the way they address business continuity.

For instance, redo logs on an Oracle DB record the changes made by transactions so that they can be reapplied (or "rolled forward") during recovery, thus allowing a database to be brought to a consistent state after the fact. If, after the database backup snapshot is struck, the redo logs are kept until the file system backup is complete and then closed off (checkpoint), with the resulting files stored with the original backups, it is possible to build your way through to a consistent restore.

The Backup Validator only simplifies things in that it does much of this work for you. Nobody needs it, but many can benefit from it.