By Daniel Carreau, Serge Ménard, Michel Landry and Kamil Murat Eksioglu
As a real-time operating-data acquisition and processing strategy, condition monitoring has amply demonstrated its worth, and condition-monitoring equipment has proven its reliability. To make efficient use of the data obtained, however, unique expertise and significant manpower have been required, and neither of these resources is always available in a timely manner.
This article traces the evolution of a collaborative project aimed at implementing generic and breaker-specific rules for a software-based diagnostic expert system. Working backward from a user-needs assessment, the development guidelines chosen ensure a usable and cost-effective end result. The article also addresses the design provisions made to allow integration of foreign monitoring system outputs, as well as inter-operability (information exchange capability) at the substation and network operating and planning levels.
Data acquisition and condition monitoring
A condition-monitoring system uses appropriate and calibrated sensors that translate critical physical characteristics (gas, pressure, driving-rod travel, etc.) into electrical signals. These are routed to an acquisition sub-system where they are converted to digital data and temporarily stored in memory buffers of appropriate capacity.
The data is then processed, archived and trended to create time-specific, event-specific and historical profiles of the condition of the apparatus. These fingerprints are then compared to reference or benchmark information. Insufficient correlation will generate alerts or alarms, depending on the nature of the discrepancy. In extreme cases, the condition-monitoring system can trigger automatic and immediate actions, under preset user control, in order to enhance power system performance (transformer loading or transit routing) or to prevent catastrophic failures. These monitoring systems continuously track the operational parameters of interest, flagging both static and evolving abnormal conditions and predicting component deterioration and possible malfunctions.
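The comparison of a condition profile against its benchmark, graded into alerts or alarms, can be sketched as follows. This is a minimal illustration under stated assumptions: a per-variable relative deviation stands in for the correlation measure, and the variable names, benchmark values and limits (`Limits`, `classify`, `assess_profile`) are hypothetical, not taken from any product.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    alert: float  # relative deviation that raises an alert (hypothetical setting)
    alarm: float  # relative deviation that raises an alarm (hypothetical setting)


def classify(measured: float, reference: float, limits: Limits) -> str:
    """Grade the discrepancy between one fingerprint value and its benchmark."""
    deviation = abs(measured - reference) / abs(reference)
    if deviation >= limits.alarm:
        return "alarm"
    if deviation >= limits.alert:
        return "alert"
    return "normal"


def assess_profile(profile: dict, benchmark: dict, limits: dict) -> dict:
    """Compare every variable in a condition profile with its benchmark value."""
    return {name: classify(value, benchmark[name], limits[name])
            for name, value in profile.items()}


# Hypothetical circuit-breaker fingerprint versus its benchmark
profile = {"opening_time_ms": 31.0, "gas_pressure_kpa": 590.0}
benchmark = {"opening_time_ms": 30.0, "gas_pressure_kpa": 650.0}
limits = {name: Limits(alert=0.05, alarm=0.10) for name in profile}
print(assess_profile(profile, benchmark, limits))
# {'opening_time_ms': 'normal', 'gas_pressure_kpa': 'alert'}
```

Grading by two thresholds is what lets the same comparison produce either an alert or an alarm "depending on the nature of the discrepancy"; a real system would also weigh trends over time rather than single values.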
A cost-justified condition monitoring and assessment system is characterized by two fundamental missions:
- To alert concerned staff to mission-critical information and recommendations on an 'as needed' and timely basis
- To provide detailed off-line information that will enhance user knowledge and optimize future operations and planning decisions.
The accomplishment of these objectives results directly from the successful execution of the following six tasks:
Task 1 - Selective acquisition of analog and binary data
It must be emphasized that generic analysis is extremely beneficial for determining the minimal and optimal selection of sensors, their characteristics, and their installation locations and procedures for a given application. Configurations must be developed that provide both measured and calculated variables in both event-driven and operating-environment classes. Such an analysis helps maximize the ratio of the 'value of the information' to the 'cost of the measurement'. Last, but not least, is the benefit of integrating informational inputs (processed or not) from other devices or systems within the substation.
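The idea of maximizing the value of the information relative to the cost of the measurement can be made concrete with a small selection routine. This is only a sketch: the value scores, costs and budget are invented, and the greedy value/cost ordering is a swapped-in heuristic rather than the method used in the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    value: float  # hypothetical score for the 'value of the information'
    cost: float   # hypothetical installed cost of the measurement


def select_sensors(candidates: list, budget: float) -> list:
    """Greedy heuristic: take sensors in order of value/cost while the budget allows."""
    chosen = []
    spent = 0.0
    for c in sorted(candidates, key=lambda c: c.value / c.cost, reverse=True):
        if spent + c.cost <= budget:
            chosen.append(c)
            spent += c.cost
    return chosen


candidates = [
    Candidate("travel", value=9.0, cost=3.0),
    Candidate("gas_density", value=8.0, cost=2.0),
    Candidate("coil_current", value=6.0, cost=1.0),
    Candidate("acoustic", value=4.0, cost=4.0),
]
print([c.name for c in select_sensors(candidates, budget=6.0)])
# ['coil_current', 'gas_density', 'travel']
```

A greedy ratio ordering is easy to explain to stakeholders but is not guaranteed optimal; an exact knapsack solver could be substituted when the candidate list is small.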
Task 2 - Efficient and secure archiving of acquired data
In order to be transformed into enhanced knowledge, the information derived from acquired data needs to be integrated. The implementation of a knowledge- (rule-) based expert system in order to assess accurately the condition of critical apparatus would suffer if new knowledge were not constantly integrated into the expert rules set.
Task 3 - Condition assessment of the apparatus
At an elementary level, this task consists of comparing the acquired and computed raw data with expected values that mimic the 'normal' condition of the apparatus. At a higher, more reliable level of performance, knowledge- (rule-) based expert-system software programs provide timely, event-triggered and detailed assessments of the apparatus condition. Details include identification of the affected component(s), the nature of the anomaly, its gravity (qualitative or quantitative), probable cause(s), consequence(s) if not addressed, a timeline of future degradation and recommended remedial action(s). The final decision to act remains with the human experts, while the time-consuming and expensive task of sifting through unmanageable amounts of raw data is eliminated.
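The details listed for an expert-system assessment map naturally onto a single record type. The sketch below is illustrative only: the `Assessment` class, its field names and the example values are invented for this article and do not describe an actual product schema.

```python
from dataclasses import dataclass, field


@dataclass
class Assessment:
    """One diagnostic finding; field names are illustrative, not a product schema."""
    component: str
    anomaly: str
    gravity: str  # qualitative grade such as "minor" or "major"
    probable_causes: list = field(default_factory=list)
    consequences: list = field(default_factory=list)
    degradation_timeline: str = ""
    remedial_actions: list = field(default_factory=list)

    def summary(self) -> str:
        """One-line text for the human expert who makes the final decision."""
        actions = "; ".join(self.remedial_actions) or "no action recommended"
        return f"{self.component}: {self.anomaly} [{self.gravity}] -> {actions}"


# Invented example values
finding = Assessment(
    component="operating mechanism",
    anomaly="slow opening",
    gravity="major",
    probable_causes=["lubricant degradation"],
    remedial_actions=["inspect and lubricate mechanism"],
)
print(finding.summary())
# operating mechanism: slow opening [major] -> inspect and lubricate mechanism
```

Keeping the finding as structured data, rather than free text, is what allows the same assessment to be presented to field staff, archived, and passed to higher-level systems.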
Task 4 - Present the raw data (if required), information and expert system knowledge (if installed)
Human interaction with the condition monitoring and assessment systems will be necessary over the course of the working day. The system must present only what is required and desired by the stakeholders, whatever their level of interaction or location, and must present it efficiently, securely and at the lowest possible cost. The ability to port outputs to external higher level management and user interface systems is growing in importance within the power industry. Standards are being completed that will set the ground rules for efficient and reliable communication between interacting systems.
Task 5 - Configuring the condition monitoring and assessment system
Physical installations will continue to require human involvement. Training programs and reference information sources must be optimized to minimize costs and vendor dependence and to ensure correct and safe installations. Generic software suites should be loaded at the most reliable and cost effective location; application specific software programs or configurations should be commissioned on site, using real field inputs. Configuration should be performed by a workforce that provides the best combination of reliability and cost effectiveness. Self-learning expert system software programs should be encouraged because of their cost benefits, as long as their adaptation provides continued reliability over the life of the installation.
Task 6 - Self-checking
Self-checking is a means of ensuring the reliability level of condition monitoring and assessment systems installed in substations. It is essential that failures within these systems be reliably and immediately detected and that all pertinent details be conveyed to the appropriate staff. A lack of confidence in the monitoring systems will undermine all the benefits expected from this significant investment.
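One simple form of self-checking is to verify that every acquisition channel is still delivering data and that its values are physically plausible. The sketch below assumes a per-channel timestamp and a plausible range; `Channel`, `self_check` and all values are hypothetical, and a real system would also check its own processors, storage and communication links.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Channel:
    name: str
    last_update: float  # time of the most recent sample, in seconds
    last_value: float
    valid_min: float    # plausible range of the sensor output (assumed)
    valid_max: float


def self_check(channels: List[Channel], max_age_s: float,
               now: Optional[float] = None) -> List[str]:
    """Report channels that have gone silent or whose value is implausible."""
    now = time.time() if now is None else now
    faults = []
    for ch in channels:
        age = now - ch.last_update
        if age > max_age_s:
            faults.append(f"{ch.name}: no data for {age:.0f} s")
        elif not ch.valid_min <= ch.last_value <= ch.valid_max:
            faults.append(f"{ch.name}: value {ch.last_value} outside "
                          f"{ch.valid_min}..{ch.valid_max}")
    return faults


channels = [
    Channel("gas_density", last_update=100.0, last_value=0.55, valid_min=0.0, valid_max=1.0),
    Channel("travel", last_update=80.0, last_value=5.0, valid_min=0.0, valid_max=200.0),
    Channel("coil_current", last_update=104.0, last_value=-3.0, valid_min=0.0, valid_max=20.0),
]
for fault in self_check(channels, max_age_s=10.0, now=105.0):
    print(fault)
# travel: no data for 25 s
# coil_current: value -3.0 outside 0.0..20.0
```

The returned fault descriptions are the "pertinent details" that would be conveyed to staff; separating a silent channel from an implausible one helps distinguish a communication fault from a sensor fault.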
Many systems available from various vendors during the last decade, including Snemo's Moniteq product, have met most of the above requirements. The most significant improvement needed in meeting these requirements was in the area of the knowledge- (rule-) based expert system capability.
Condition assessment resource requirements
Most condition assessment systems installed during the past decade without expert systems require users to spend considerable time at the computer screen, interfacing with the system data. This results from a lack of knowledge-based processing of 'raw' data into pertinent and easy-to-use information for field and engineering personnel.
Depending on the strictness of correlation settings, a significant volume of unselective alerts and/or alarms can add further to the human-resource load. The number and frequency of these alarms may induce users to disable or ignore them, thus negating one of the main benefits of a condition assessment system. It is not surprising that many users have been reluctant to switch from their historical 'preventive (or scheduled) maintenance' programs to a 'predictive maintenance' policy, which will be less costly only if the requirement for human expert interpretation is kept to a minimum.
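The effect of correlation strictness on alarm volume can be shown with a toy calculation. In this sketch the readings are invented, a relative-deviation test stands in for the correlation setting, and `count_alarms` is a hypothetical helper; the point is only that the same normal scatter produces many more alarms under a strict setting.

```python
def count_alarms(samples, reference, tolerance):
    """Count samples whose relative deviation from the reference exceeds the tolerance."""
    return sum(1 for s in samples if abs(s - reference) / abs(reference) > tolerance)


# Invented readings scattered around a reference of 100 (normal measurement noise)
samples = [100.0, 101.5, 98.0, 102.5, 99.5, 103.0, 97.5]
print(count_alarms(samples, 100.0, tolerance=0.01))   # 5: strict setting, mostly noise
print(count_alarms(samples, 100.0, tolerance=0.028))  # 1: looser setting
```

Choosing a tolerance is therefore a trade-off between missed anomalies and nuisance alarms, which is one reason rule-based interpretation is preferred to raw threshold comparison.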
Every installation has its own unique characteristics. No condition-monitoring and assessment system can accommodate any and all applications using only a generic configuration. Customization, including the adjustment of configuration as experience accumulates and as business objectives change, is a resource-intensive activity. Some closed systems cannot be adapted or customized without 'starting from scratch'. The adaptation of so-called open-architecture systems can be cost effective if the right design choices are made during product development.
Much has been said among industry players regarding the reliability of the monitoring system versus that of the apparatus being monitored. Some monitoring installations have in fact been decommissioned because of unreliable operation; others are still in service only because heavy investments have been made in system installation and commissioning, troubleshooting, corrections and add-ons, user staff training and monitoring-data interpretation. When the costs of these investments are included in the economic justification, the result no longer supports the original decision.
It is clear that power system operators needed condition monitoring and assessment systems to move successfully from the R&D pilot project arena into that of technically fit, robust and commercially viable solutions to their business needs. That is where knowledge-based expert-system software comes in.
Diagnostic and predictive assessment system development project
The initial concept was to superimpose an expert system onto an existing condition-monitoring system. However, early investigations showed that it would be necessary to deconstruct the existing design into its functional components and to reassemble these according to the new requirements arising from the incorporation of the expert system.
As an ISO 9001-accredited company, Snemo recognized this need and set out to elaborate working methods and procedures that would ensure this objective was met. The development project followed a rigid step-by-step approach, with each step based on the output of the preceding one and thoroughly documented. The result was a 'Requirements' document, the main points of which are summarized below. The project was broken down in pyramid fashion, with each separate function identified at a primary level. Each function was individually coded and fully tested, with records kept of the tests performed and their results. Once functions were 'approved', an integration process was initiated from the bottom up. This process consisted of testing the interfaces between functions (or groups of functions) two by two. After the lowest level of functions had been integrated, the process continued until all the functional blocks were integrated. Tests of the integration of functions by layer were conducted and fully documented.
System tests were performed by a separate group. The system was tested for compliance with the requirements and specifications using a variety of methods, including automated injection of simulated (including randomly generated) data with analysis of the output, and tests of the system's response to abnormal conditions (data overload, forced component failures, etc.).
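Automated injection of random data is most useful when the output is checked against a property that must always hold. The sketch below is an assumption-laden illustration, not the project's test suite: `assess_gas_pressure` is a hypothetical rule with invented thresholds, and the property checked is that severity never decreases as pressure falls.

```python
import random

SEVERITY = {"normal": 0, "alert": 1, "alarm": 2}


def assess_gas_pressure(kpa: float) -> str:
    """Hypothetical rule under test; the thresholds are illustrative only."""
    if kpa < 550.0:
        return "alarm"
    if kpa < 600.0:
        return "alert"
    return "normal"


def random_injection_test(rule, trials: int, seed: int = 1) -> int:
    """Inject random input pairs and check that severity never drops as pressure falls."""
    rng = random.Random(seed)  # seeded so that any failure can be reproduced
    for _ in range(trials):
        low, high = sorted(rng.uniform(300.0, 900.0) for _ in range(2))
        if SEVERITY[rule(low)] < SEVERITY[rule(high)]:
            raise AssertionError(f"severity decreased between {low:.1f} and {high:.1f} kPa")
    return trials


print(random_injection_test(assess_gas_pressure, trials=10_000))  # 10000
```

Seeding the generator keeps randomized tests reproducible, which matters when test records must be kept and a failure has to be demonstrated again after a correction.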
Quality assurance validations were performed throughout the design and test processes, making sure that the project respected approved requirements, schedules and budgets.
Requirements
In the initial phase, the project team established and agreed on the requirements of the software project; this agreement served as the basis for estimating, planning, performing and tracking activities throughout the project and, subsequently, the product development life cycle.
The objective was to replace a time-based scheduled maintenance program with a predictive maintenance policy based on continuous assessment of the apparatus condition, using a high-voltage apparatus-monitoring system that operates online in real time. This shift in policy would be approved and implemented only if sufficient evidence were provided that the expert system software was capable of valid interpretation of the primary acquired data.
A rule-based expert system shell is required, with the ability to acquire situation input and to evaluate the data, identify a problem, diagnose the possible cause(s), and suggest recommended action(s) for solving the problem based on event data, or on data acquired over time and previous conditions detected by the system (experience). The apparatus-specific expert system software program must be self-learning (adaptive), continuously correcting its own rules on the basis of a comparison between actual measured and/or calculated data and previously predicted values.
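The self-learning requirement, correcting rules by comparing actual values with previously predicted ones, can be illustrated with an adaptive reference value. This is a minimal sketch under assumptions: `AdaptiveBaseline`, the learning rate and the learning limit are hypothetical, and exponential smoothing is a swapped-in technique, not the project's adaptation method.

```python
class AdaptiveBaseline:
    """Hypothetical self-correcting reference value for one monitored variable."""

    def __init__(self, initial: float, alpha: float = 0.1, learn_limit: float = 0.05):
        self.predicted = initial
        self.alpha = alpha              # learning rate (assumed)
        self.learn_limit = learn_limit  # largest relative error treated as normal drift (assumed)

    def update(self, measured: float) -> float:
        """Return the relative prediction error; adapt only on small errors."""
        error = (measured - self.predicted) / self.predicted
        if abs(error) <= self.learn_limit:
            # Nudge the prediction toward the measurement (exponential smoothing)
            self.predicted += self.alpha * (measured - self.predicted)
        return error


b = AdaptiveBaseline(30.0)
b.update(31.0)               # small error: learned as normal drift
b.update(40.0)               # large error: reported, but not learned
print(round(b.predicted, 3))  # 30.1
```

Refusing to learn from large errors is the key design choice: without it, a slowly developing fault could be absorbed into the 'normal' reference, undermining the reliability the adaptation is meant to preserve.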
The expert system software must be portable to mobile controller units for off-line and modest-requirement applications. A plug-and-play, peripheral recognition capability will automate and facilitate the system configuration process.
Guidance for user-driven actions (e.g. report generation, configuration modifications, access to lower level data structures) must be in the form of 'wizard' type sub-programs, which eliminate the need for system-specific knowledge.
A set of application-specific, coded, validated and fully documented rules is integrated into the expert system shell. High-level language programs with add-on extensions are required to accomplish specific monitoring tasks. A kernel containing a factory-installed rules set can be customized at installation time and modified independently by the user at a later date, with the help of a 'tool box'. The sum of these rules represents the system's expertise; the sum of this expertise and the historical experience of events and data represents the knowledge base.
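The layering of a factory kernel, installation-time customization and later user modifications can be sketched with Python's `collections.ChainMap`. The rule names and values below are hypothetical, and representing rules as parameter dictionaries is a simplification of a real rules set.

```python
from collections import ChainMap

# Hypothetical factory-installed kernel: rule parameters keyed by name
FACTORY_RULES = {
    "opening_time.alert": 0.05,
    "opening_time.alarm": 0.10,
    "gas_density.alert": 0.03,
}


def build_rule_set(factory: dict, site: dict, user: dict) -> ChainMap:
    """Look up user edits first, then commissioning settings, then the factory kernel."""
    return ChainMap(user, site, factory)


site = {"gas_density.alert": 0.04}  # adjusted at installation time
user = {}                           # later 'tool box' edits land here
rules = build_rule_set(FACTORY_RULES, site, user)

rules["opening_time.alert"] = 0.06  # ChainMap writes go to the first (user) layer
print(rules["opening_time.alert"], rules["gas_density.alert"], rules["opening_time.alarm"])
# 0.06 0.04 0.1
print(FACTORY_RULES["opening_time.alert"])  # factory kernel untouched: 0.05
```

Because edits never overwrite the factory layer, a user change can be reverted by deleting its key from the user layer, and the validated kernel stays intact for audit and support.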
A graphical user interface provides the window-based control and data display through which users access the system, locally or remotely. The interface can be installed on multiple personal computers (PCs) connected to the resident expert-system host controller, allowing users to monitor site activities from wherever those computers are located. The project stakeholders agreed that the expert system software product must be suitable for implementation on existing legacy installations at a minimum cost to the owners.
Development
The scope of the project was defined in terms of the system functionality, engineering, performance, reliability, availability, serviceability, documentation and training requirements.
Functional specifications, defining how users will interact with the system, were developed for each deliverable software component, for key algorithms and for gateways to other sub-systems.
The key steps in the process consisted of:
- Reviewing legacy installation characteristics and owner expectations
- Assessing the feasibility of transferring an existing, proven and hardened, expert system shell to this application
- Assessing typical site infrastructure characteristics
- Defining the System Management (platform, remote access, backup, etc.) architecture
- Establishing a site Configuration Management methodology
- Benchmarking against and ensuring compatibility with existing competitive offerings
- Reviewing and evaluating pertinent Industry Standards (UCA, MMS, CASM, etc.)
- Determining critical design orientations (open system, IED links, inter-operability, etc.)
- Determining application-specific and condition-assessment system requirements
- Producing a 'Requirements' documentation set
- Producing a detailed 'Functional Specification' documentation set
- Determining software platform characteristics (operating system, communication interfaces and protocols, database product)
- Elaborating expert system architecture and technical specifications: Models and concepts (MMS, CASM, UCA, GOMSFE); Database definition and modeling; Overall system architecture; Modules definition and mapping
- Producing a detailed 'Technical Specification' documentation set
- Validating the Performance, Reliability, Availability and Serviceability (PRAS) characteristics
- Determining the Development Environment specifications (language, toolkit, version control, etc.)
- Producing the 'Software Development Plan' documentation set.
While the above process is taking place, and at other strategic points, peer and management reviews are conducted, with appropriate sign-off, to confirm collective stakeholder acceptance of the work-in-progress.
Implementation
Implementation of the requirements and specifications was performed in two distinct steps.
The first step consisted of the development of an overall hardware and software infrastructure that would host the apparatus-specific expert rules. In the nomenclature of Snemo's Condition Monitoring Diagnostic Assessment System (COMDAS), also known as Moniteq® III, this infrastructure is referred to as the Substation Controller (SC). The Substation Controller software structure is composed of 4 layers as depicted in Figure 1.
The second step consists of the continuous development of expert rules sets which are specific to each type of apparatus being monitored and/or to each specific application. Each rules set development effort is treated as a sub-project within the overall expert system software development project. As such, it can be assigned to outside development groups, while remaining under the governance of the detailed specifications set. This approach favours short lead times and creates a healthy cost-competitive environment. Each rules set is developed in compliance with the framework shown in Figure 2.
The Functional Requirements task consists of:
- Defining the measured physical variables (sensor/converter outputs)
- Defining the calculated physical variables (from sensor/converter output combinations)
- Defining the calculated physical variable transfer functions
- Defining the combinative relations between measured and/or calculated variables
- Defining the diagnostic assessment rules
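The chain from measured variables to a calculated variable, and then to a diagnostic rule combining two variables, can be sketched for a circuit breaker. Everything here is hypothetical: the travel record, the reference values, the 10 % tolerance and the diagnostic wording are invented for illustration and are not rules from the project.

```python
def average_velocity(times_ms, travel_mm, start_mm, end_mm):
    """Calculated variable: mean contact velocity (m/s) between two travel points,
    using the first sample at or beyond each point (no interpolation)."""
    samples = list(zip(times_ms, travel_mm))
    t0, x0 = next((t, x) for t, x in samples if x >= start_mm)
    t1, x1 = next((t, x) for t, x in samples if x >= end_mm)
    return (x1 - x0) / (t1 - t0)  # mm/ms is numerically equal to m/s


def diagnose(velocity, velocity_ref, coil_peak_a, coil_ref_a, tol=0.10):
    """Illustrative combinative rule relating two variables; not a validated rule."""
    slow = velocity < (1 - tol) * velocity_ref
    weak_coil = coil_peak_a < (1 - tol) * coil_ref_a
    if slow and weak_coil:
        return "slow operation with low coil current: check trip circuit supply"
    if slow:
        return "slow operation with normal coil current: check mechanism"
    return "normal"


times = [0, 1, 2, 3, 4, 5, 6]      # ms (invented record)
travel = [0, 2, 5, 9, 14, 20, 26]  # mm
v = average_velocity(times, travel, start_mm=5, end_mm=20)
print(v)  # 5.0
print(diagnose(v, velocity_ref=6.0, coil_peak_a=9.5, coil_ref_a=10.0))
# slow operation with normal coil current: check mechanism
```

Combining two variables lets the rule point to a different probable cause than either variable would alone, which is the kind of reasoning the combinative relations are meant to capture.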
The Technical Specifications task consists of:
- Identifying and defining common characteristics across multiple apparatus types
- Developing the suite architecture
- Designing the software common modules
- Designing the software apparatus-specific modules
The Implementation task consists of:
- Coding the common modules
- Coding the apparatus-specific modules
- Testing all unit modules
- Testing the integrated system (generic, non-apparatus-specific)
The Validation task consists of performing the following (at the factory):
- Simulation tests (apparatus specific)
- Failure scenario definition (apparatus specific)
- Failure scenario testing (apparatus specific)
- Acceptance (factory and site) tests definition (customer and order specific)
- Factory acceptance tests (customer and order specific)
The Documentation task consists of producing the following documents:
- A narrative of the Rules
- A mathematical demonstration of the Rules
- A logic-tree diagram
- A condition-assessment diagnostics table
- The software technical specifications
- The compiled source code and code comments
- All test reports (plans and results)
Delivery, installation, site acceptance tests, and warranty and service agreements are all handled on a customer- and order-specific basis.
Foreign system integration and inter-operability
As the relatively new concepts of Reliability Centered Maintenance (RCM) and Station Predictive Maintenance (SPM) take hold within the deregulated electrical power industry, host monitoring installations are technically and financially hampered by the difficulty of integrating foreign systems and achieving true inter-operability.
Within the scope of the expert system software development project, critical attention was paid to ensuring that foreign monitoring system data outputs (electrical analog or optical digital signals) could be processed successfully within the Moniteq® III overall system architecture. The conversion of foreign data into a digital format that can be processed by the acquisition sub-system and the station controller software has been achieved.
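Converting a foreign electrical analog output into the host's own data format can be sketched as scaling plus validation. This sketch assumes the foreign system uses a 4-20 mA current loop, which the article does not state; the validity band, the `Reading` record and `from_current_loop` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """Common record so foreign and native data look the same to the user."""
    source: str
    variable: str
    value: float
    unit: str
    valid: bool


def from_current_loop(source, variable, unit, ma, lo, hi):
    """Scale an assumed 4-20 mA foreign output linearly onto [lo, hi] engineering units."""
    valid = 3.8 <= ma <= 20.5  # band assumed here; outside it, treat as a loop fault
    value = lo + (ma - 4.0) * (hi - lo) / 16.0
    return Reading(source, variable, value, unit, valid)


r = from_current_loop("foreign monitor", "H2", "ppm", ma=12.0, lo=0.0, hi=2000.0)
print(r.value, r.valid)  # 1000.0 True
```

Wrapping every converted value in one record type is what lets the user see "a single, fully transparent and cohesive information format", whatever system produced the data.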
The user sees a single, fully transparent and cohesive information format. The expert system software will incorporate a specific rules set for each apparatus monitored by foreign systems. It will also provide an application-specific rules set to accommodate possible combinative relations between different apparatus, an analysis traditionally performed by a human expert.
The ability to exchange data and information on a peer-to-peer basis with equivalent- or higher-level systems within a client's operations and maintenance sectors was seen as a key requirement.
Conclusion
Through the Expert System Software Development project, a condition-monitoring system graduated to the rank of a high-level, self-learning, automated Diagnostic Condition Assessment system. This achievement was a critical step in providing users with a complete array of benefits at a cost that has become much easier to justify. Power companies can now confidently migrate to a Station Predictive Maintenance (SPM) program within a Reliability Centered Maintenance (RCM) policy and expect to benefit from the resulting cost savings.
The successful outcome of the project is decidedly the result of a collaborative effort: the stakeholders included 3 utilities, 1 research institute, 1 manufacturer and countless independent expert consultants.
Spanning more than two years and amounting to approximately US$1.5 million, this investment was clearly necessary and worthwhile. Ongoing costs will include the cost of developing additional apparatus-specific rules sets.
One of the major challenges remains that of capturing the expertise of the remaining human experts in reliable and secure software programs. This is especially true for apparatus that, for economic reasons, seem destined to outlive their associated experts.
Daniel Carreau and Serge Ménard are with SNEMO Ltd. Michel Landry and Kamil Murat Eksioglu are with Hydro-Québec/IREQ.