Method, device, equipment and medium for server RAS function test

Document No.: 7396; Publication date: 2021-09-17

1. A method for server RAS function testing, characterized by comprising the following steps:

integrating error injection programs corresponding to the error types of the devices into a management page of a baseboard management controller in advance;

receiving an error injection instruction;

and calling a corresponding target error injection program from the management page according to the error injection equipment and the test mode indicated by the error injection instruction so as to execute error injection operation.

2. The method for testing the RAS function of the server of claim 1, wherein the invoking the corresponding target error injection program from the management page according to the error injection device and the test mode indicated by the error injection instruction to perform the error injection operation comprises:

when the error injection device indicated by the error injection instruction is a CPU and the test mode comprises a sequential test of multiple error types, sequentially calling, from the management page, the target error injection programs corresponding to the multiple error types of the CPU in the order of the sequence, so as to execute the error injection operations.

3. The method for testing the RAS function of the server of claim 1, wherein the invoking the corresponding target error injection program from the management page according to the error injection device and the test mode indicated by the error injection instruction to perform the error injection operation comprises:

when the error injection device indicated by the error injection instruction is a CPU and the test mode comprises N tests of a target error type, calling the target error injection program corresponding to the target error type of the CPU from the management page and executing the error injection operation N times in a loop.

4. The method for testing the RAS function of the server of claim 1, wherein the invoking the corresponding target error injection program from the management page according to the error injection device and the test mode indicated by the error injection instruction to perform the error injection operation comprises:

calling a target error injection program corresponding to the target error type of the memory from the management page under the condition that error injection equipment indicated by the error injection instruction is the memory and the test mode comprises an error injection mode and a target error type;

clearing the information in the register corresponding to the memory;

and executing the target error injection program according to the error injection mode, and storing the generated error information in the register.

5. The method for testing the RAS function of the server of claim 1, wherein the invoking the corresponding target error injection program from the management page according to the error injection device and the test mode indicated by the error injection instruction to perform the error injection operation comprises:

when the error injection equipment indicated by the error injection instruction is a target port of the PCIE, and the test mode comprises a target error type, retry times and retry time intervals, calling a target error injection program corresponding to the target error type of the PCIE from the management page;

starting the target error injection program to execute the error injection operation on the target port of the PCIE at the retry time interval; and ending the error injection operation when the number of times the error injection operation has been executed on the target port of the PCIE reaches the retry number.

6. The method for testing the functions of the server RAS according to any one of claims 1 to 5, wherein after the error injection device and the test mode indicated by the error injection instruction are used to call the corresponding target error injection program from the management page, the method further comprises:

displaying error information generated by executing error injection operation on a management page of the baseboard management controller; wherein the error information includes whether error injection was successful, an error type, and an error address.

7. A device for testing RAS function of a server is characterized by comprising an integration unit, a receiving unit and an error injection unit;

the integration unit is used for integrating error injection programs corresponding to the error types of the devices into a management page of the baseboard management controller in advance;

the receiving unit is used for receiving an error injection instruction;

and the error injection unit is used for calling a corresponding target error injection program from the management page according to the error injection equipment and the test mode indicated by the error injection instruction so as to execute error injection operation.

8. The apparatus for server RAS function test according to claim 7, further comprising a display unit;

the display unit is used for displaying error information generated by executing error injection operation on a management page of the baseboard management controller; wherein the error information includes whether error injection was successful, an error type, and an error address.

9. An apparatus for server RAS function testing, comprising:

a memory for storing a computer program;

a processor for executing the computer program to implement the steps of the method for server RAS function testing according to any one of claims 1 to 6.

10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the method for server RAS function testing as claimed in any one of claims 1 to 6.

Background

Servers are the heart of network systems and computing platforms; they store large amounts of important data and run network services. As technology develops, the requirements on server security, stability and related properties have become very high. These requirements are mainly reflected in RAS performance, which refers to the Reliability, Availability and Serviceability of a machine.

A CPU (Central Processing Unit) is the final execution unit for information processing and program execution, and serves as the computation and control core of the server. Memory, also called internal memory or main memory, temporarily stores the CPU's operation data and the data exchanged with external storage such as hard disks, and is the bridge between external storage and the CPU. As the key data processing and computation centers of the server, the CPU and the memory directly affect the server's RAS performance through their stability and performance. For the server to work more stably, errors that occur in the CPU or the memory need to be diagnosed and corrected quickly, which requires testers to carry out error injection testing at an early stage.

For domestic Hygon platforms, the commonly adopted error injection method is to execute the various error injection commands manually and repeatedly; this manual approach makes error injection testing inefficient and error-prone.

Therefore, how to improve the testing efficiency of the error injection testing work is a problem to be solved by the technical personnel in the field.

Disclosure of Invention

An object of the embodiments of the present application is to provide a method, an apparatus, a device, and a computer-readable storage medium for server RAS function testing, which can improve the testing efficiency of error injection testing work.

To solve the above technical problem, an embodiment of the present application provides a method for testing a RAS function of a server, including:

integrating error injection programs corresponding to the error types of the devices into a management page of a baseboard management controller in advance;

receiving an error injection instruction;

and calling a corresponding target error injection program from the management page according to the error injection equipment and the test mode indicated by the error injection instruction so as to execute error injection operation.

Optionally, the invoking a corresponding target error injection program from the management page according to the error injection device and the test mode indicated by the error injection instruction to perform an error injection operation includes:

when the error injection device indicated by the error injection instruction is a CPU and the test mode comprises a sequential test of multiple error types, sequentially calling, from the management page, the target error injection programs corresponding to the multiple error types of the CPU in the order of the sequence, so as to execute the error injection operations.

Optionally, the invoking a corresponding target error injection program from the management page according to the error injection device and the test mode indicated by the error injection instruction to perform an error injection operation includes:

when the error injection device indicated by the error injection instruction is a CPU and the test mode comprises N tests of a target error type, calling the target error injection program corresponding to the target error type of the CPU from the management page and executing the error injection operation N times in a loop.

Optionally, the invoking a corresponding target error injection program from the management page according to the error injection device and the test mode indicated by the error injection instruction to perform an error injection operation includes:

calling a target error injection program corresponding to the target error type of the memory from the management page under the condition that error injection equipment indicated by the error injection instruction is the memory and the test mode comprises an error injection mode and a target error type;

clearing the information in the register corresponding to the memory;

and executing the target error injection program according to the error injection mode, and storing the generated error information in the register.

Optionally, the invoking a corresponding target error injection program from the management page according to the error injection device and the test mode indicated by the error injection instruction to perform an error injection operation includes:

when the error injection equipment indicated by the error injection instruction is a target port of the PCIE, and the test mode comprises a target error type, retry times and retry time intervals, calling a target error injection program corresponding to the target error type of the PCIE from the management page;

starting the target error injection program to execute the error injection operation on the target port of the PCIE at the retry time interval; and ending the error injection operation when the number of times the error injection operation has been executed on the target port of the PCIE reaches the retry number.

Optionally, after the corresponding target error injection program is called from the management page according to the error injection device and the test mode indicated by the error injection instruction so as to execute the error injection operation, the method further includes:

displaying error information generated by executing error injection operation on a management page of the baseboard management controller; wherein the error information includes whether error injection was successful, an error type, and an error address.

The embodiment of the application also provides a device for testing the RAS function of the server, which comprises an integration unit, a receiving unit and an error injection unit;

the integration unit is used for integrating error injection programs corresponding to the error types of the devices into a management page of the baseboard management controller in advance;

the receiving unit is used for receiving an error injection instruction;

and the error injection unit is used for calling a corresponding target error injection program from the management page according to the error injection equipment and the test mode indicated by the error injection instruction so as to execute error injection operation.

Optionally, the error injection unit is configured to, when the error injection device indicated by the error injection instruction is a CPU, and the test mode includes multiple error type sequence tests, sequentially call, from the management page, target error injection programs corresponding to multiple error types of the CPU according to multiple error type sequences, so as to perform error injection operation.

Optionally, the error injection unit is configured to, when the error injection device indicated by the error injection instruction is a CPU and the test mode includes N times of tests of a target error type, call a target error injection program corresponding to the target error type of the CPU from the management page to cyclically execute N times of error injection operations.

Optionally, the error injection unit includes a call subunit, a clear subunit, an execution subunit, and a storage subunit;

the calling subunit is configured to, when the error injection device indicated by the error injection instruction is a memory and the test mode includes an error injection mode and a target error type, call the target error injection program corresponding to the target error type of the memory from the management page;

the clearing subunit is configured to clear information in the register corresponding to the memory;

the execution subunit is configured to execute the target error injection program according to the error injection mode;

and the storage subunit is used for storing the generated error information to the register.

Optionally, the error injection unit is configured to, when the error injection device indicated by the error injection instruction is a target port of the PCIE, and the test mode includes a target error type, a retry number, and a retry time interval, call the target error injection program corresponding to the target error type of the PCIE from the management page; start the target error injection program to execute the error injection operation on the target port of the PCIE at the retry time interval; and end the error injection operation when the number of times the error injection operation has been executed on the target port of the PCIE reaches the retry number.

Optionally, the device further comprises a display unit;

the display unit is used for displaying error information generated by executing error injection operation on a management page of the baseboard management controller; wherein the error information includes whether error injection was successful, an error type, and an error address.

The embodiment of the present application further provides a device for testing RAS functions of a server, including:

a memory for storing a computer program;

a processor for executing the computer program to implement the steps of the method for server RAS functional testing as described in any of the above.

An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method for server RAS function testing as described in any one of the above.

According to the above technical solution, error injection programs corresponding to the error types of each device are integrated into the management page of the baseboard management controller in advance; an error injection instruction is received; and the corresponding target error injection program is called from the management page according to the error injection device and the test mode indicated by the error injection instruction, so as to execute the error injection operation. Because the error injection programs are integrated into the management page of the baseboard management controller, when a device needs an error injection test the corresponding target error injection program is called directly according to the test mode, and the various error injection commands no longer have to be executed manually and repeatedly, which improves the efficiency of error injection testing. Moreover, calling the target error injection program only requires the tester to input an error injection instruction; the tester does not need to be deeply familiar with error injection operations and error types, which lowers the expertise required of the tester and further reduces the difficulty of carrying out the test.

Drawings

In order to more clearly illustrate the embodiments of the present application, the drawings needed for the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.

Fig. 1 is a flowchart of a method for testing RAS functions of a server according to an embodiment of the present application;

Fig. 2 is a schematic structural diagram of an apparatus for testing RAS functions according to an embodiment of the present application;

Fig. 3 is a schematic structural diagram of a device for testing a RAS function of a server according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without any creative effort belong to the protection scope of the present application.

In order that those skilled in the art will better understand the disclosure, the following detailed description will be given with reference to the accompanying drawings.

Next, a method for testing RAS functions of a server provided in an embodiment of the present application is described in detail. Fig. 1 is a flowchart of a method for testing a RAS function of a server according to an embodiment of the present application, where the method includes:

S101: Integrate error injection programs corresponding to the error types of the devices into a management page of the baseboard management controller in advance.

The device may be a device that affects the performance of the server, such as a CPU, a memory, a PCIE (Peripheral Component Interconnect Express) device, and the like.

The error types differ from device to device. For example, the error types corresponding to the CPU may include processor correctable (Processor Correctable), processor uncorrectable non-fatal (Processor Uncorrectable non-fatal), and the like. The error types corresponding to the memory may include single-bit correctable (Single-bit correctable error), multi-bit correctable (Multi-bit correctable error), multi-bit uncorrectable (Multi-bit uncorrectable error), and the like. The error types corresponding to PCIE may include lcrc_tx, lcrc_rx, ecrc_tx, ecrc_rx, acs_fatal, acs_nonfatal, etc.

The error injection program may be used to implement an error injection operation that performs a certain type of error on the device.

In the embodiment of the present application, the error injection program may be integrated into a Management page, i.e., a web page, of a Baseboard Management Controller (BMC), so that a tester can call the error injection program conveniently to perform error injection testing work.
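By way of illustration only, the advance integration of error injection programs can be pictured as a lookup table keyed by device and error type. The sketch below is a minimal Python model under stated assumptions: the names `REGISTRY`, `register` and `lookup`, the device and error-type labels, and the placeholder program bodies are all hypothetical and are not part of any BMC firmware interface.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: (device, error type) -> error injection program.
InjectionProgram = Callable[[], str]
REGISTRY: Dict[Tuple[str, str], InjectionProgram] = {}


def register(device: str, error_type: str):
    """Decorator that adds a program to the registry ahead of time (step S101)."""
    def wrap(func: InjectionProgram) -> InjectionProgram:
        REGISTRY[(device, error_type)] = func
        return func
    return wrap


@register("cpu", "correctable")
def inject_cpu_correctable() -> str:
    # Placeholder body: a real program would drive the platform's
    # injection mechanism instead of returning a string.
    return "cpu:correctable injected"


@register("pcie", "lcrc_tx")
def inject_pcie_lcrc_tx() -> str:
    # Placeholder body, as above.
    return "pcie:lcrc_tx injected"


def lookup(device: str, error_type: str) -> InjectionProgram:
    """Return the target error injection program, or raise if none is registered."""
    try:
        return REGISTRY[(device, error_type)]
    except KeyError:
        raise ValueError(f"no injection program for {device}/{error_type}") from None
```

Keying the table on both device and error type mirrors the statement that each device has its own set of error types; a management page would then only need to pass the tester's selection to `lookup`.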

S102: Receive an error injection instruction.

In practical application, a tester can directly select a fault needing to be injected on a BMC web page, which is equivalent to inputting an error injection instruction. The system can realize fault injection operation by executing a corresponding error injection program.

S103: Call the corresponding target error injection program from the management page according to the error injection device and the test mode indicated by the error injection instruction, so as to execute the error injection operation.

The error injection device can be a CPU, a memory or a PCIE and the like. The test mode may include the type of error, and the mode of performing error injection, etc.
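As a rough sketch of how an instruction carrying a device and a test mode might be mapped to one of the cases described in this application, the following Python fragment may help. The `InjectionInstruction` class, the dictionary keys (`sequence`, `repeat`, `injection_mode`, `error_type`, `retries`, `interval_s`) and the returned strategy labels are assumptions made for illustration; the application does not define a concrete instruction format.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class InjectionInstruction:
    """Hypothetical shape of an error injection instruction."""
    device: str                                   # "cpu", "memory" or "pcie"
    test_mode: Dict[str, Any] = field(default_factory=dict)


def select_strategy(instr: InjectionInstruction) -> str:
    """Pick the handling case from the device and the test-mode fields present."""
    mode = instr.test_mode
    if instr.device == "cpu" and "sequence" in mode:
        return "cpu_sequence"          # sequential test of multiple error types
    if instr.device == "cpu" and "repeat" in mode:
        return "cpu_repeat"            # N tests of one target error type
    if instr.device == "memory" and {"injection_mode", "error_type"} <= mode.keys():
        return "memory"                # injection mode plus target error type
    if instr.device == "pcie" and {"error_type", "retries", "interval_s"} <= mode.keys():
        return "pcie_retry"            # target type, retry number, retry interval
    raise ValueError(f"unsupported instruction: {instr}")
```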

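Several common error injection modes of the devices are described below by way of example. Taking the case where the error injection device indicated by the error injection instruction is a CPU and the test mode includes a sequential test of multiple error types, in a specific implementation the target error injection programs corresponding to the multiple error types of the CPU may be called from the management page one after another, in the order of the sequence, to execute the error injection operations.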

For example, assuming that error injection operations of three error types, namely Processor Correctable, Processor Uncorrectable non-fatal and Processor Uncorrectable fatal, need to be executed on the CPU in sequence, the system may first call the error injection program corresponding to Processor Correctable; after that injection is finished, call the error injection program corresponding to Processor Uncorrectable non-fatal; and finally, after that injection is completed, call the error injection program corresponding to Processor Uncorrectable fatal.
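The sequential case can be sketched in a few lines of Python. This is only an illustrative model: `run_sequence`, the dictionary of programs and the error-type labels used with it are hypothetical, and each program is assumed to return `True` when its injection succeeds.

```python
from typing import Callable, Dict, List, Tuple


def run_sequence(
    programs: Dict[str, Callable[[], bool]],
    order: List[str],
) -> List[Tuple[str, bool]]:
    """Call the program for each error type in the given order.

    Each call returns before the next one starts, so one injection
    is finished before the following error type is injected.
    """
    results = []
    for error_type in order:
        program = programs[error_type]   # KeyError if the type was never integrated
        results.append((error_type, program()))
    return results
```

The function records one result per error type rather than stopping at the first failure; whether a failed injection should abort the remaining sequence is left open by the description and would be a design decision for the implementer.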

Taking the error injection device indicated by the error injection instruction as the CPU, and taking the test mode including N times of tests of the target error type as an example, in a specific implementation, a target error injection program corresponding to the target error type of the CPU may be called from the management page to cyclically execute N times of error injection operations.

Wherein N is a positive integer, and the specific value thereof may be set according to actual requirements, which is not limited herein.

The target error type refers to the error type for which the error injection operation currently needs to be performed. The target error injection program may be the error injection program corresponding to a single error type, or all of the error injection programs corresponding to multiple error types. In the latter case, the error injection programs corresponding to the respective error types can be executed in sequence.

By setting the number of tests and calling the target error injection program from the management page, the same device can be tested many times under the target error type without the tester having to input the error injection instruction repeatedly, which simplifies the tester's work.
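A minimal sketch of the N-times case follows, assuming the target error injection program is exposed as a Python callable. The name `run_repeated` and the choice to return the per-run results are illustrative assumptions.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_repeated(program: Callable[[], T], n: int) -> List[T]:
    """Execute the target error injection program N times in a loop."""
    # N must be a positive integer; bool is rejected explicitly because
    # it is a subclass of int in Python.
    if isinstance(n, bool) or not isinstance(n, int) or n < 1:
        raise ValueError("N must be a positive integer")
    return [program() for _ in range(n)]
```

A single instruction with `n` set therefore replaces `n` manual invocations, which is the simplification the preceding paragraph describes.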

Taking the case where the error injection device indicated by the error injection instruction is a memory and the test mode includes an error injection mode and a target error type, in a specific implementation the target error injection program corresponding to the target error type of the memory can be called from the management page; the information in the register corresponding to the memory is cleared; and the target error injection program is executed according to the error injection mode, with the generated error information stored in the register.

The error injection mode corresponding to the memory may include a persistent injection mode (persistent), a single-shot injection mode (one-shot), an address-based injection mode (address-base), and the like.

The register corresponding to the memory is typically a UMC (Unified Memory Controller) or MCA (Machine Check Architecture) status register.

In the embodiment of the present application, in order to record the error information generated by the target error injection program reliably, the information in the register corresponding to the memory is cleared first, and the target error injection program is then executed according to the error injection mode, so that the newly generated error information can be stored in the register.
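The clear-then-inject ordering can be modelled as below. This sketch does not touch hardware: `StatusRegister` is an in-memory stand-in for the status register, and `inject_memory_error`, the mode labels in `INJECTION_MODES` and the program signature are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Illustrative labels for the persistent, one-shot and address-based modes.
INJECTION_MODES = {"persistent", "one-shot", "address-based"}


@dataclass
class StatusRegister:
    """In-memory stand-in for a memory status register (no hardware access)."""
    entries: List[str] = field(default_factory=list)

    def clear(self) -> None:
        self.entries.clear()

    def store(self, info: str) -> None:
        self.entries.append(info)


def inject_memory_error(
    register: StatusRegister,
    program: Callable[[str], str],
    mode: str,
) -> str:
    """Clear the register, run the program in the given mode, store the result."""
    if mode not in INJECTION_MODES:
        raise ValueError(f"unknown injection mode: {mode}")
    register.clear()          # remove stale records before injecting
    info = program(mode)      # execute the target error injection program
    register.store(info)      # keep only the newly generated error information
    return info
```

Clearing first means that whatever the register holds afterwards can be attributed to this injection alone.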

Taking the case where the error injection device indicated by the error injection instruction is a target port of the PCIE and the test mode includes a target error type, a retry number, and a retry time interval, in a specific implementation the target error injection program corresponding to the target error type of the PCIE may be called from the management page; the target error injection program is started to execute the error injection operation on the target port of the PCIE at the retry time interval; and the error injection operation ends when the number of times it has been executed on the target port of the PCIE reaches the retry number.

The PCIE has a plurality of ports, and the target port refers to a port that needs to perform error injection operation currently.

The retry time interval may be a time interval between each time the error injection operation is performed on the target port of the PCIE and the next time the error injection operation is performed. The number of retries may be the number of times the error injection operation is performed.

The retry time interval and the retry number may be set according to actual requirements, and are not limited herein.
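The retry behaviour amounts to a bounded loop with a pause between attempts. In the sketch below, `inject_pcie`, its parameter names and the injectable `sleep` argument are hypothetical; the program is assumed to take a port number and return `True` on success.

```python
import time
from typing import Callable, List


def inject_pcie(
    program: Callable[[int], bool],
    port: int,
    retries: int,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bool]:
    """Run the program on the target port `retries` times, `interval_s` apart."""
    if retries < 1:
        raise ValueError("retry number must be at least 1")
    if interval_s < 0:
        raise ValueError("retry time interval must not be negative")
    results = []
    for attempt in range(retries):
        results.append(program(port))
        if attempt < retries - 1:
            sleep(interval_s)     # wait before the next injection
    return results
```

Passing `sleep` as a parameter is a testing convenience only: a test can substitute a recorder so that the loop can be checked without actually waiting.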

In the embodiment of the present application, so that the tester can follow the current test situation, after the corresponding target error injection program has been called from the management page according to the error injection device and the test mode indicated by the error injection instruction and the error injection operation has been executed, the error information generated by the error injection operation may be displayed on the management page of the baseboard management controller. The error information may include whether the error injection was successful, the error type, and the error address.

The error type refers to the type of error injected into the error injection device. The error address identifies the device on which the fault occurred.
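The three pieces of error information can be carried in a small record and rendered as one line for display. The `ErrorInfo` class, its field names and the output format of `render` are illustrative assumptions; the application does not prescribe how the management page presents the information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorInfo:
    """Hypothetical record of one error injection result."""
    success: bool          # whether the error injection succeeded
    error_type: str        # type of error that was injected
    error_address: str     # identifies the device on which the fault occurred


def render(info: ErrorInfo) -> str:
    """Format the record as a single line suitable for a results table."""
    status = "success" if info.success else "failure"
    return f"injection={status} type={info.error_type} address={info.error_address}"
```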

According to the above technical solution, error injection programs corresponding to the error types of each device are integrated into the management page of the baseboard management controller in advance; an error injection instruction is received; and the corresponding target error injection program is called from the management page according to the error injection device and the test mode indicated by the error injection instruction, so as to execute the error injection operation. Because the error injection programs are integrated into the management page of the baseboard management controller, when a device needs an error injection test the corresponding target error injection program is called directly according to the test mode, and the various error injection commands no longer have to be executed manually and repeatedly, which improves the efficiency of error injection testing. Moreover, calling the target error injection program only requires the tester to input an error injection instruction; the tester does not need to be deeply familiar with error injection operations and error types, which lowers the expertise required of the tester and further reduces the difficulty of carrying out the test.

Fig. 2 is a schematic structural diagram of an apparatus for testing a server RAS function provided in an embodiment of the present application, including an integrating unit 21, a receiving unit 22, and an error injecting unit 23;

an integration unit 21, configured to integrate an error injection program corresponding to an error type of each device into a management page of the baseboard management controller in advance;

a receiving unit 22, configured to receive an error injection instruction;

and the error injection unit 23 is configured to invoke a corresponding target error injection program from the management page according to the error injection device and the test mode indicated by the error injection instruction, so as to perform an error injection operation.

Optionally, the error injection unit is configured to, when the error injection device indicated by the error injection instruction is a CPU and the test mode includes multiple error type sequence tests, sequentially invoke respective target error injection programs of multiple error types of the CPU from the management page according to the multiple error type sequences to perform the error injection operation.

Optionally, the error injection unit is configured to, when the error injection device indicated by the error injection instruction is a CPU and the test mode includes N times of tests of the target error type, call a target error injection program corresponding to the target error type of the CPU from the management page to cyclically execute N times of error injection operations.

Optionally, the error injection unit includes a call subunit, a clear subunit, an execution subunit, and a storage subunit;

the calling subunit is used for calling a target error injection program corresponding to the target error type of the memory from the management page under the condition that the error injection equipment indicated by the error injection instruction is the memory and the test mode comprises an error injection mode and a target error type;

the clearing subunit is used for clearing the information in the register corresponding to the memory;

the execution subunit is used for executing the target error injection program according to the error injection mode;

and the storage subunit is used for storing the generated error information into the register.

Optionally, the error injection unit is configured to, when the error injection device indicated by the error injection instruction is a target port of the PCIE and the test mode includes a target error type, a retry number, and a retry time interval, invoke the target error injection program corresponding to the target error type of the PCIE from the management page; start the target error injection program to execute the error injection operation on the target port of the PCIE at the retry time interval; and end the error injection operation when the number of times the error injection operation has been executed on the target port of the PCIE reaches the retry number.

Optionally, the device further comprises a display unit;

the display unit is used for displaying error information generated by executing error injection operation on a management page of the baseboard management controller; wherein the error information includes whether the error injection was successful, an error type, and an error address.

The description of the features in the embodiment corresponding to fig. 2 may refer to the related description of the embodiment corresponding to fig. 1, and is not repeated here.

According to the above technical solution, error injection programs corresponding to the error types of each device are integrated into the management page of the baseboard management controller in advance; an error injection instruction is received; and the corresponding target error injection program is called from the management page according to the error injection device and the test mode indicated by the error injection instruction, so as to execute the error injection operation. Because the error injection programs are integrated into the management page of the baseboard management controller, when a device needs an error injection test the corresponding target error injection program is called directly according to the test mode, and the various error injection commands no longer have to be executed manually and repeatedly, which improves the efficiency of error injection testing. Moreover, calling the target error injection program only requires the tester to input an error injection instruction; the tester does not need to be deeply familiar with error injection operations and error types, which lowers the expertise required of the tester and further reduces the difficulty of carrying out the test.

Fig. 3 is a schematic structural diagram of a device 30 for server RAS function test according to an embodiment of the present application, including:

a memory 31 for storing a computer program;

a processor 32 for executing the computer program to implement the steps of the method for server RAS function test according to any one of the embodiments above.

An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the method for server RAS function test as described above.

The method, apparatus, device and computer-readable storage medium for server RAS function test provided by the embodiments of the present application have been described in detail above. The embodiments in the specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is brief; for the relevant details, refer to the description of the method. It should be noted that those skilled in the art may make several improvements and modifications to the present application without departing from its principle, and such improvements and modifications also fall within the scope of the claims of the present application.

Those skilled in the art will further appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the implementation. Those skilled in the art may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present application.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
