By: Michael Cole
In the past, maintenance programs for distribution systems have tended to be prescriptive in nature, focused on component reliability and lacking a formal process to strategically manage circuit performance and customer impacts. Nevertheless, local Line Operations are to be given full credit for managing maintenance as best they could, based on available maintenance information, common sense and intuition. Aspects of maintenance management that require improvement include:
Reliability performance criteria and targets are essential to strategic planning and these are currently being developed at B.C. Hydro, firstly at the area/district level and then for individual circuits. This is being done by synthesizing a variety of factors including:
The end result will be unique but equitably managed reliability targets for each area, district and circuit. It is important to recognize that maintenance, while very important, is but one activity affecting reliability in a larger scheme that also involves circuit design, protection schemes, restoration logistics, component purchase specifications, quality control, work practices, and component re-use practices.
At B.C. Hydro, various useful distribution information databases have matured over the years and these include the Distribution Trouble Reporting System (DTRS), the Pole Management System (PMS), the Wood Pole Inspection Maintenance System (WPIMS), the Distribution Maintenance Management System (DMMS), the Vegetation Management System (VMS), the Work Management System (WMS), the Material Management System (MMS) and the Geographic Facilities Information System (GFIS). The information in these databases, in combination with RCM concepts, can be used to develop methodologies for the more effective allocation of maintenance and other expenditures.
Reliability Indices
Distribution systems consist mainly of large populations of relatively simple components that are located in the public domain and subject to a high level and variety of events that have the potential to cause system failure. It is, first of all, useful to examine distribution system trouble data. For B.C. Hydro, the system wide 1997/98 fiscal year results are presented in Table 1.
Of significance are the service interruption causes and their effects on SAIFI, SAIDI and CAIDI, and whether or not the cause is "condition independent" or "condition dependent".
"Condition independent" means that from a maintenance perspective the interruption is largely independent of the condition of the circuit components and, hence, largely independent of the maintenance of those components. For example, motor vehicles, trees, birds and lightning generally do not discriminate among new, old and deteriorated facilities when causing interruptions and these causes are classified as independent of the condition of circuit components. Tree trimming and vegetation management programs can, therefore, be administered without factoring in the condition of circuit components.
"Condition dependent" means that the interruption is largely dependent on the condition of the component and, therefore, largely influenced by the maintenance of those components. In this sense, "maintenance" includes component replacement prior to "wearing out". Condition dependent interruptions are caused by the deterioration of mechanical and electrical properties of components over time due to exposure to mechanical and electrical stresses and various environmental conditions.
It is interesting to note that the condition dependent causes of service interruptions are currently only minor contributors to system wide SAIFI and SAIDI values. This suggests that, unless local service interruptions caused by component deterioration have gotten out of hand, significant improvement to service reliability lies elsewhere (in circuit design or distribution automation, for example). On the surface, therefore, it appears that a short term objective for B.C. Hydro should be to simply "keep the lid on" condition dependent component failures at the least possible cost. This can also be viewed as an exercise in "containment".
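The split between condition independent and condition dependent causes can be tallied directly from trouble-report records. The following is a minimal sketch only: the cause names are taken from the examples in the text, while the record layout, the `CONDITION_INDEPENDENT` set and the sample numbers are assumptions made for illustration, not B.C. Hydro data or the DTRS schema.

```python
from collections import defaultdict

# Causes the article cites as examples of condition independent interruptions.
CONDITION_INDEPENDENT = {"motor vehicle", "tree", "bird", "lightning"}

def classify(cause):
    """Label a cause; anything not listed above is treated as condition
    dependent, which is a simplification made for this sketch."""
    return "independent" if cause in CONDITION_INDEPENDENT else "dependent"

def customer_interruptions_by_class(records):
    """Sum customers interrupted per condition class.

    Each record is a (cause, customers_interrupted) pair (hypothetical layout).
    """
    totals = defaultdict(int)
    for cause, customers in records:
        totals[classify(cause)] += customers
    return dict(totals)

# Made-up sample records, for illustration only.
sample = [
    ("tree", 400),
    ("motor vehicle", 250),
    ("lightning", 150),
    ("pole deterioration", 120),
    ("insulator failure", 80),
]
totals = customer_interruptions_by_class(sample)
share_dependent = totals["dependent"] / sum(totals.values())
```

Dividing each class total by the number of customers served would give that class's contribution to SAIFI, which is the comparison the paragraph above relies on.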
From the viewpoint of outages caused by equipment failures, the system wide rate at B.C. Hydro, measured against customer growth, has risen slowly over the last ten years from 0.0010 to 0.0014 annual outages per customer. The trend is depicted in Graph 1.
Administrators of maintenance programs need to carefully consider the local probability and consequence of a component failure to help decide if component replacement, after it fails (as opposed to before it fails), is a viable option. The probability, duration and extent of an outage caused by a component failure should be part of the decision making process. These decisions will be aided by data on the age, type and condition of components, circuit redundancy, protection and switching schemes, and the local ability to repair circuits and restore service. Primary indicators of success in component maintenance programs that address condition dependent interruptions are, listed in order of importance, SAIFI, CAIDI and SAIDI. These three indices can be applied by geographic area or on a circuit by circuit basis and should be included in the RCM feedback loop.
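For readers who want to track these indices by area or circuit, the sketch below computes them from a list of interruptions using the usual definitions: SAIFI as customer interruptions per customer served, SAIDI as customer interruption duration per customer served, and CAIDI as SAIDI divided by SAIFI. The function name, the record layout, the use of hours as the duration unit and the sample events are all assumptions for illustration.

```python
def reliability_indices(interruptions, customers_served):
    """Return (SAIFI, SAIDI, CAIDI) for one area or circuit.

    interruptions: list of (customers_interrupted, duration_hours) pairs.
    customers_served: total customers served in the area or circuit.
    """
    customer_interruptions = sum(n for n, _ in interruptions)
    customer_hours = sum(n * hours for n, hours in interruptions)
    saifi = customer_interruptions / customers_served
    saidi = customer_hours / customers_served
    # CAIDI is undefined when nobody was interrupted; report 0.0 in that case.
    caidi = saidi / saifi if saifi else 0.0
    return saifi, saidi, caidi

# Hypothetical year of interruptions on a 1000-customer circuit.
events = [(1000, 2.0), (300, 4.0), (50, 1.0)]
saifi, saidi, caidi = reliability_indices(events, 1000)
```

Running the same function on only the condition dependent events gives the portion of each index that a component maintenance program can reasonably be expected to influence.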
For some distribution system components such as poles, kiosks and padmount transformers, B.C. Hydro keeps detailed historical data, but for the majority of the components this is not done due to the practicality and cost of tracking unit performance histories for large populations of relatively simple but critical components. Additionally, the data available at B.C. Hydro is relatively recent in comparison to the life span of components and many component makes and models cannot be easily identified by geographical location. Consequently, insufficient data exists to feed into the traditional RCM model. However, in a generic sense, large component populations lend themselves to statistical performance analyses which can be combined with RCM concepts to provide a useful result. In order to do so, a good trouble reporting system is required to complement inspection and condition monitoring programs.
Defining Circuit Components
One of the first tasks in an RCM program is to define system boundaries which are manageable. This means segmenting circuits into manageable "components". A "component" can be a simple part such as a pin insulator or an equipment site such as a switched capacitor bank with multiple parts. There is no right or wrong way to do this and it may require an iterative process in order to get to the final result. The final selection will depend on a number of factors including the value of the asset, the component population, the function of the component, the dominant failure modes, the available component data, and the consequences of component failure. The best approach is to start with a breakdown that looks about right, work it through to the end, and then make adjustments as required. Figure 1 gives an example of how an overhead distribution circuit may be broken down into components.
Determining Maintenance Drivers and Priorities
Safety
The primary consideration in any inspection and maintenance program is the safety of the utility worker and the utility's obligation to public safety. The safety driver is the most difficult to rationalize and quantify. The two questions that need to be asked are: What is the acceptable risk of a harm-causing component failure and how does one know when the risk has been reduced to acceptable levels? Utility worker safety translates into maintaining circuits and components so that the utility worker can operate them safely in both emergency and non-emergency situations in accordance with the practices and procedures in which the worker has been trained.
Public safety translates into maintaining circuits and components so that the risk of harm stays within the level that society accepts. A benchmark for societal acceptance is how a court of law would view a utility's practice in the event of litigation. Generally, this means that the bare-bones maintenance practice to address safety must be as good as or better than the utility industry standard. Therefore, the first order of business is to fully understand the industry norms in this regard and to know how your utility practice compares with other utilities. Some minimum safety expectations will be enshrined in various National and Provincial standards, codes and regulations.
An important "criticality" measure is the degree to which a distribution system component fails in a fail-safe mode. In this sense, a "fail-safe failure mode" is one that is unlikely to cause a significant safety hazard. As a starting point, it is useful to simply categorize a dominant failure mode as being either "fail-safe" or "non fail-safe" to avoid being impeded by detailed criticality assessments. Overhead circuits in general have more components that have the potential to fail in a non fail-safe manner than underground systems. For example, wood pole structures that support bare primary wires do not generally fail in a fail-safe mode and therefore require a considerable level of attention just to meet public safety requirements. Underground cables, on the other hand, usually fail in a fail-safe manner and require considerably less attention to safety.
Asset Utilization
Asset utilization comes into play where the investment in a particular circuit component is significant, replacement costs are high and the cost to extend the service life is economically viable. The wood pole examples in Figures 1 and 2 illustrate this point where a typical pole replacement done live-line, including equipment transfers, might cost $2650 and a replacement deferral has a present value of about $1220. Notably, the total annual maintenance cost of the test and treat program which includes inspections, repairs and replacements amounts to only $7 per pole annually when averaged over the entire pole population. In this case, asset utilization, which is driven strictly by repair versus replacement economics, has a high priority in maintenance decisions.
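The economics of deferring a replacement can be sketched with a simple present value calculation. The article does not state the discount rate or the deferral period behind its $1220 figure, so the 8 per cent rate and 8 year deferral below are assumed values chosen purely for illustration; only the $2650 replacement cost comes from the text.

```python
def deferral_present_value(replacement_cost, discount_rate, years_deferred):
    """Present value saved by postponing a replacement by years_deferred."""
    # Cost of the postponed replacement, discounted back to today.
    deferred_cost_today = replacement_cost / (1.0 + discount_rate) ** years_deferred
    return replacement_cost - deferred_cost_today

# $2650 is the article's replacement cost; 8% and 8 years are assumptions.
saving = deferral_present_value(2650.0, 0.08, 8)
print(round(saving))
```

With these assumed inputs the result lands near the value quoted above, but other combinations of rate and deferral period would do so as well, so the inputs should not be read as the author's.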
Service Reliability
Service reliability is reflected in the frequency and duration of interruptions experienced by customers. The contribution that maintenance makes to interruptions must be carefully analyzed in tune with reliability targets and in concert with other activities that impact service reliability in order to develop lowest cost solutions. In practice, it is useful to first determine the level of maintenance required to deal only with the safety and asset utilization drivers. The spin-off from meeting these two requirements alone may be adequate to address customer service requirements for a particular area or circuit, thereby simplifying the analysis. Figure 3 exemplifies a component where service reliability has a higher priority than asset utilization.
Actuarial Concepts
A conceptual understanding of the life characteristics of large populations of components is helpful in making maintenance decisions. Mortality (or survivor) curves known as "Iowa Type" survivor curves and their derivatives were developed in the 1930's and revalidated in the 1980's as being representative of physical plant mortality.
Figure 4 illustrates such a curve along with the derived probable life curve and the derived replacement frequency curve. The curve shown is a "right mode" #3 curve which is believed to approximate the survival of B.C. Hydro wood poles. Given the long average service life of distribution components, accurately matching a survivor curve to a particular component generally requires at least 40 years of good records dating back to the time of component installation.
At B.C. Hydro such data does not exist. In the case of wood poles, about 27 years' worth of reasonably good records have been kept. This information, along with pole demographics and historical pole population growth estimates, was used to conclude that wood pole survival is probably characterized by an R3 survivor curve and a 35 year average life span.
This survivor characteristic includes poles that are replaced before they are worn out, that is, prior to a condition dependent failure occurring. At the current system wide pole replacement rate, approximately 30 per cent of poles are replaced prior to wearing out. Reasons for replacing poles with remaining service life capacity include equipment upgrades, additional road clearance requirements, motor vehicle accidents, falling trees, and pole relocations. Survivor curves are not particularly useful for accurately determining the right time to replace a component before it wears out. However, components will definitely be replaced shortly after they fail or as required for design changes, and survivor curves are useful in anticipating the approximate ongoing funding required for component replacements.
The probable life curve shows the probable average life of survivors from an original group. Referring to the example in Figure 4, at age zero, the probable life is 35 years. For those components that have survived to age 35, the probable life is 42 years. Table 2 further illustrates this by listing the life expectancy or probable remaining service life of the average wood pole survivor by the years of service that the pole has already seen.
The replacement frequency histogram shows the rate at which units from an original group are replaced over time. The example in Figure 4 is actually a retirement frequency histogram, but, for the purpose of discussion, is called a replacement frequency histogram on the assumption that all retired units will be replaced. Most of the units in a given age interval will be replaced because they are worn out, but a portion of the units will be replaced due to design changes before they are worn out. Of the units that are worn out and left to fail, the histogram gives a useful insight on how failures and replacements from an original population of like components will progress over time. Looking at this from a local service area or circuit perspective, maintenance decisions pertaining to service reliability should take into consideration whether or not components can simply be left to fail, particularly during the age interval when the highest failure rate is expected.
Assuming that all the retirements are in fact due to component failure (which is in reality not the case), the maximum annual failure rate in the example will be 4.4 per cent of the original population during the 3.5 year age interval from 35 to 38.5 years. On a per circuit basis, such a failure rate would have to be assessed based on the population of affected components in the circuit and the impact on reliability targets for that circuit. For most components it will, in fact, be very difficult to actually do such an actuarial analysis with a meaningful degree of accuracy due to insufficient historical data. However, an understanding of the concept is important when monitoring failure trends in order to predict component failure frequencies. To complement the monitoring of failure trends, good condition monitoring tools, which are calibrated to the remaining service life of components, are extremely desirable.
Research to develop such tools should be supported. Ideally, one would like to have the option of replacing components just prior to the time when they would otherwise fail with some confidence that the time of replacement was not overly premature. Of particular interest in this regard is the development of better non-destructive condition monitoring tools for underground cable and improved methods of detecting and measuring partial discharge in distribution components.
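The probable life and replacement frequency curves described in this section can both be derived numerically from a survivor curve. The sketch below shows the mechanics only. It uses a Weibull-shaped curve as a stand-in because the tabulated Iowa R3 values are not reproduced in this article; the `scale` and `shape` parameters are assumptions picked to give an average life in the mid-thirties, not fitted B.C. Hydro values.

```python
import math

def survivor_curve(ages, scale, shape):
    """Weibull-shaped survivor fraction by age (a stand-in, not an Iowa R3 table)."""
    return [math.exp(-((a / scale) ** shape)) for a in ages]

def probable_life(ages, surviving, index):
    """Age plus expected remaining life for units surviving to ages[index].

    Remaining life is the area under the survivor curve to the right of that
    age (trapezoid rule) divided by the fraction surviving at that age.
    """
    area = 0.0
    for i in range(index, len(ages) - 1):
        area += 0.5 * (surviving[i] + surviving[i + 1]) * (ages[i + 1] - ages[i])
    return ages[index] + area / surviving[index]

def retirement_frequency(surviving):
    """Fraction of the original group retired in each age interval."""
    return [surviving[i] - surviving[i + 1] for i in range(len(surviving) - 1)]

ages = list(range(0, 121))                               # one-year steps
surviving = survivor_curve(ages, scale=39.0, shape=4.0)  # assumed parameters
life_new = probable_life(ages, surviving, 0)
life_at_35 = probable_life(ages, surviving, 35)
retired = retirement_frequency(surviving)
peak_age = ages[retired.index(max(retired))]             # start of peak interval
```

As in the text, the probable life of units that have already survived to a given age comes out higher than the probable life at age zero, and the retirement frequencies sum to the whole original group once the curve has run out.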
Circuit Assessment
The effective allocation of maintenance resources will in part be driven by the circuit configuration and the probability of consumer interruptions caused by component outages. To illustrate the methodology for assessing a circuit, refer to Figure 5. This 12 kV circuit serves 900 customers and has a mix of commercial, industrial and residential loads. For the sake of argument, the circuit is separated into three sections whose boundaries are defined by the source point, node-1, node-2 and node-3. Node-1 is a junction point where the circuit branches out into three legs. At node-2 there is a commercial customer with a demand load of about 1.5 MVA and at node-3 there is a manufacturing customer with a demand load of about 0.6 MVA.
This circuit has three year average SAIFI and CAIDI values of 0.9 and 3.6 respectively. The recorded outage statistics for component damage over a ten year period are given in the trouble reporting system extracts (Figure 6). Of the ten component damage interruptions, 6 were condition independent and 4 were condition dependent.
Detailed analysis of the circuit between node points can be done by compiling in-line component data along with probable condition dependent interruption rates as listed in Table 3.
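A rough sketch of this kind of compilation follows. Table 3 is not reproduced here, so the component counts and per-unit interruption rates below are made-up placeholders, and the function names are hypothetical. The calculation assumes that component failures are rare and independent, so that the expected annual interruption rate for a section is simply the sum of count times per-unit rate.

```python
def section_interruption_rate(components):
    """Expected condition dependent interruptions per year for one section.

    components: list of (count, annual_rate_per_unit) pairs for the in-line
    components of that section.
    """
    return sum(count * rate for count, rate in components)

def node_interruption_rate(sections):
    """Sum the rates of every section whose failure interrupts the node."""
    return sum(section_interruption_rate(s) for s in sections)

# Hypothetical inventories: (count, interruptions per unit per year).
source_to_node1 = [(40, 0.0004), (120, 0.0001)]   # e.g. poles, insulators
node1_to_node3 = [(15, 0.0004), (45, 0.0001)]

rate_node3 = node_interruption_rate([source_to_node1, node1_to_node3])
```

Which sections belong in the list for a given node depends on the protection scheme: a section counts toward a node's rate whenever a failure there would interrupt that node, whether or not it lies on the direct path from the source.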
Commentary
The calculated interruption rates due to condition dependent component failures seen by the customers at nodes 2 and 3 are 0.03 and 0.06 respectively. The non-zero interruption rates used in the example are system wide values which have not been component age adjusted for the specific circuit. A methodology to factor in the component age and condition monitoring histories has yet to be developed. The circuit protection is such that a component failure interruption in the node-1 to node-2 section will take out the entire circuit.
However, the probability of condition dependent interruptions in this section will be very low because of the newness of the components. A component failure in the node-1 to node-3 section will likely be limited to that section due to fusing at node-1. In this section, the probability of condition dependent interruptions is also low because of low component populations, although it is noted that the overhead components are aging but not likely at a critical stage yet. Outside of "regular" inspection, repair and replacement programs, it appears that in the short term, no additional component maintenance is required with respect to condition dependent interruptions seen by the two customers covered in the analysis.
However, some extra condition monitoring should be considered for the very old plant close to the source where components are more likely to be in poor condition and where component failures will have the largest impact on the 900 customers served by the circuit.
Recommended Future Work
This article just scratches the surface and much remains to be done. Recommended future work includes the following:
Fred L. Kaempffer is with B.C. Hydro.