The synchronization subnet is a connected network of primary
and secondary time servers, clients and interconnecting transmission paths. A primary time server is directly synchronized to a
primary reference source, usually a radio clock. A secondary time server derives synchronization, possibly via other secondary
servers, from a primary server over network paths possibly shared with other services. Under normal circumstances it is intended that
the synchronization subnet of primary and secondary servers assumes a hierarchical-master-slave configuration with the primary servers
at the root and secondary servers of decreasing accuracy at successive levels toward the leaves.
Following conventions established by the telephone industry [BEL86], the accuracy of each server is defined by a number called the
stratum, with the topmost level (primary servers) assigned as one and each level downwards (secondary servers) in the hierarchy
assigned as one greater than the preceding level. With current technology and available radio clocks, single-sample accuracies in the
order of a millisecond can be achieved at the network interface of a primary server. Accuracies of this order require special care in
the design and implementation of the operating system and the local-clock mechanism, such as described in Section 5.
As the stratum increases from one, the single-sample accuracies achievable will degrade depending on the network paths and local-clock
stabilities. In order to avoid the tedious calculations [BRA80] necessary to estimate errors in each specific configuration, it is
useful to assume the mean measurement errors accumulate approximately in proportion to the measured delay and dispersion relative to
the root of the synchronization subnet. Appendix H contains an analysis of errors, including a derivation of maximum error as a
function of delay and dispersion, where the latter quantity depends on the precision of the timekeeping system, frequency tolerance of
the local clock and various residuals. Assuming the primary servers are synchronized to standard time within known accuracies, this
provides a reliable, deterministic specification on timekeeping accuracies throughout the synchronization subnet.
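As a rough illustration of the accumulation assumption above, the following sketch walks a synchronization path from a primary server down to a client and totals delay and dispersion relative to the root. Simple per-hop summation is an assumption made here for illustration (the rigorous treatment is the analysis in Appendix H), and the function name and the (delay, dispersion) tuple layout are hypothetical.

```python
def accumulate(hops):
    """Total delay and dispersion along a path from the root.

    hops: list of (delay, dispersion) pairs in seconds, one per network
    path, ordered from the primary server downward. Hypothetical layout.
    Returns (stratum, root_delay, root_dispersion) at the last server.
    """
    stratum = 1            # primary servers are stratum one
    root_delay = 0.0
    root_dispersion = 0.0
    for delay, dispersion in hops:
        stratum += 1                   # one greater than the preceding level
        root_delay += delay            # assumed: simple summation per hop
        root_dispersion += dispersion  # assumed: simple summation per hop
    return stratum, root_delay, root_dispersion
```

For example, a server two hops below a primary would be reported at stratum three, with delay and dispersion equal to the sums over both hops.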
Again drawing from the experience of the telephone industry, which learned such lessons at considerable cost [ABA89], the
synchronization subnet topology should be organized to produce the highest accuracy, but must never be allowed to form a loop. An
additional factor is that each increment in stratum involves a potentially unreliable time server which introduces additional
measurement errors. The selection algorithm in NTP uses a variant of the Bellman-Ford distributed routing algorithm [37] to
compute the minimum-weight spanning trees rooted on the primary servers. The distance metric used by the algorithm consists of the
(scaled) stratum plus the synchronization distance, which itself consists of the dispersion plus one-half the absolute delay. Thus,
the synchronization path will always take the minimum number of servers to the root, with ties resolved on the basis of maximum
error.
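The distance metric described above can be sketched as follows. This is a minimal illustration, not the protocol's actual selection procedure: the Peer record, the function names and the scale factor STRATUM_SCALE are hypothetical, and the scale is simply assumed to exceed any plausible synchronization distance so that stratum dominates and distance only breaks ties.

```python
from dataclasses import dataclass

# Hypothetical scale factor, assumed larger than any synchronization
# distance (in seconds) so that a lower stratum always wins.
STRATUM_SCALE = 16.0

@dataclass
class Peer:
    name: str
    stratum: int       # 1 = primary server
    delay: float       # delay relative to the root, seconds (may be negative)
    dispersion: float  # dispersion relative to the root, seconds

def sync_distance(peer):
    """Dispersion plus one-half the absolute delay."""
    return peer.dispersion + abs(peer.delay) / 2.0

def metric(peer):
    """(Scaled) stratum plus synchronization distance."""
    return peer.stratum * STRATUM_SCALE + sync_distance(peer)

def select(peers):
    """Pick the peer with the minimum metric."""
    return min(peers, key=metric)
```

Under these assumptions, a stratum-2 peer is always preferred over a stratum-3 peer regardless of distance, and among peers of equal stratum the one with the smaller synchronization distance is chosen.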
As a result of this design, the subnet reconfigures automatically in a hierarchical-master-slave configuration to produce the most
accurate and reliable time, even when one or more primary or secondary servers or the network paths between them fail. This includes
the case where all normal primary servers (e.g., a highly accurate WWVB radio clock operating at the lowest synchronization distances)
on a possibly partitioned subnet fail, but one or more backup primary servers (e.g., a less accurate WWV radio clock operating at higher
synchronization distances) continue operation. However, should all primary servers throughout the subnet fail, the remaining secondary
servers will synchronize among themselves while distances ratchet upwards to a preselected maximum ("infinity") due to the well-known
properties of the Bellman-Ford algorithm. Upon reaching the maximum on all paths, a server will drop off the subnet and free-run using
its last determined time and frequency. Since these computations are expected to be very precise, especially in frequency, even
extended outage periods can result in timekeeping errors not greater than a few milliseconds per day with appropriately stabilized
oscillators (see Section 5).
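The ratcheting behavior can be illustrated with a small simulation. This is a simplified sketch under stated assumptions, not the NTP procedure itself: distances are plain integers, every link adds a fixed cost, updates occur in synchronous rounds, every server has at least one neighbor, and MAX_DISTANCE is a hypothetical stand-in for the preselected maximum.

```python
MAX_DISTANCE = 16  # hypothetical stand-in for the preselected maximum

def ratchet(distances, neighbors, link_cost=1, max_rounds=100):
    """Run synchronous Bellman-Ford rounds with no primary reachable.

    distances: dict mapping server name to its last known distance.
    neighbors: dict mapping server name to a non-empty list of peers.
    Returns (final distances, number of rounds performed).
    """
    dist = dict(distances)
    rounds = 0
    while any(d < MAX_DISTANCE for d in dist.values()) and rounds < max_rounds:
        new = {}
        for node, nbrs in neighbors.items():
            # Each server adopts the best distance advertised by a peer,
            # plus the link cost, capped at the maximum.
            best = min(dist[n] for n in nbrs) + link_cost
            new[node] = min(best, MAX_DISTANCE)
        dist = new
        rounds += 1
    return dist, rounds
```

With two secondary servers that can see only each other, each round raises the advertised distances until both reach the maximum, at which point a real server would drop off the subnet and free-run.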
In the case of multiple primary servers, the spanning-tree computation will usually select the server at minimum synchronization
distance. However, when these servers are at approximately the same distance, the computation may result in random selections among
them as the result of normal dispersive delays. Ordinarily, this does not degrade accuracy as long as any discrepancy between the
primary servers is small compared to the synchronization distance. If not, the filter and selection algorithms will select the best of
the available servers and cast out outliers as intended.