Wafer test machine and method for training artificial intelligence model to test wafer

Document No.: 6466  Publication date: 2021-09-17

1. A wafer tester for testing a wafer including a plurality of dies, the wafer tester comprising:

a measuring device for measuring the dies to obtain a measurement value of each die;

a database for storing the measurement values;

a storage circuit for storing a plurality of program instructions or program codes and an artificial intelligence model for testing the wafer; and

a computing circuit, coupled to the storage circuit and the database, for executing the program instructions or program codes to perform the following steps to train the artificial intelligence model:

determining a target die from the dies;

selecting a plurality of reference dies adjacent to the target die according to the target die and a preset range;

generating primary training data, wherein the primary training data comprises the measurement value of the target die and the measurement values of the reference dies;

generating auxiliary training data, wherein the auxiliary training data indicates whether the reference dies are qualified dies or unqualified dies; and

training the artificial intelligence model by using the primary training data and the auxiliary training data.

2. The wafer tester of claim 1, wherein the artificial intelligence model comprises a feature extraction algorithm and a machine learning algorithm model.

3. The wafer tester of claim 2, wherein the machine learning algorithm model is selected from the group consisting of a Bayesian ridge regression algorithm, a Gaussian process regression algorithm, and a scalable variational Gaussian process algorithm.

4. The wafer tester of claim 1, wherein the artificial intelligence model is a deep learning algorithm model, and the deep learning algorithm model comprises a convolutional neural network (CNN) algorithm model and a mixture density neural network (MDNN) algorithm model.

5. The wafer tester of claim 1, wherein the auxiliary training data is first auxiliary training data, and the computing circuit further performs the following steps:

generating second auxiliary training data, wherein the second auxiliary training data indicates whether at least one of the reference dies and the target die exists; and

training the artificial intelligence model with the second auxiliary training data together with the primary training data and the first auxiliary training data.

6. The wafer tester of claim 1, wherein the auxiliary training data further indicates whether the reference dies exist.

7. The wafer tester of claim 1, wherein the primary training data and the auxiliary training data correspond to one combination of temperature and voltage.

8. The wafer tester of claim 1, wherein the primary training data and the auxiliary training data correspond to a plurality of combinations of a plurality of temperatures and a plurality of voltages.

9. The wafer tester of claim 1, wherein the primary training data and the auxiliary training data are each a matrix or an array, and the relative positions of a plurality of elements of the matrix or the array correspond to the relative positions of the target die and the reference dies on the wafer.

10. A method for training an artificial intelligence model to test a wafer, the wafer comprising a plurality of dies, the method comprising:

determining a target die from the dies;

selecting a plurality of reference dies adjacent to the target die according to the target die and a preset range;

generating primary training data, wherein the primary training data comprises a measurement value of the target die and the measurement values of the reference dies;

generating auxiliary training data, wherein the auxiliary training data indicates whether the reference dies are qualified dies or unqualified dies; and

training the artificial intelligence model by using the primary training data and the auxiliary training data.

Background

The steady-state (quiescent) supply current (IDDQ) is a common feature used in complementary metal-oxide-semiconductor (CMOS) circuit testing to detect whether a die is faulty. For a die that functions correctly, the current varies very little between test patterns; that is, the IDDQ measurement values for different test patterns should be close to the average IDDQ value of that die's circuit. Conventionally, a single IDDQ threshold is used to determine whether a die is faulty.

However, transistor leakage current makes up most of the IDDQ in a CMOS circuit, and process variation changes the leakage current, so the IDDQ value differs from die to die on the same wafer. In other words, the single IDDQ threshold used in conventional IDDQ testing does not meet practical requirements.

Disclosure of Invention

In view of the deficiencies of the prior art, an object of the present invention is to provide a wafer tester and a method for training an artificial intelligence model to test a wafer, so as to solve the problems encountered in the prior art.

A wafer tester for testing a wafer including a plurality of dies is disclosed. The wafer tester comprises a measuring device, a database, a storage circuit, and a computing circuit. The measuring device measures the dies to obtain a measurement value of each die. The database stores the measurement values. The storage circuit stores a plurality of program instructions or program codes and an artificial intelligence model for testing the wafer. The computing circuit is coupled to the storage circuit and the database and executes the program instructions or program codes to perform the following steps to train the artificial intelligence model: determining a target die from the dies; selecting a plurality of reference dies adjacent to the target die according to the target die and a preset range; generating primary training data, wherein the primary training data comprises the measurement value of the target die and the measurement values of the reference dies; generating auxiliary training data, wherein the auxiliary training data indicates whether the reference dies are qualified dies or unqualified dies; and training the artificial intelligence model by using the primary training data and the auxiliary training data.

A method for training an artificial intelligence model to test a wafer is also disclosed. The wafer includes a plurality of dies. The method comprises the following steps: determining a target die from the dies; selecting a plurality of reference dies adjacent to the target die according to the target die and a preset range; generating primary training data, wherein the primary training data comprises a measurement value of the target die and the measurement values of the reference dies; generating auxiliary training data, wherein the auxiliary training data indicates whether the reference dies are qualified dies or unqualified dies; and training the artificial intelligence model by using the primary training data and the auxiliary training data.

The wafer tester and the method for training the artificial intelligence model to test the wafer take the dies surrounding the target die into account and use the artificial intelligence model to help determine whether the target die is faulty, so faulty dies can be found more accurately and more quickly than with conventional techniques.

The features, implementations and functions of the present disclosure will be described in detail with reference to the drawings.

Drawings

FIG. 1 is a functional block diagram of an embodiment of a wafer tester;

FIG. 2 is a functional block diagram of an embodiment of an artificial intelligence model and training data according to the present disclosure;

FIG. 3 is a flowchart of an embodiment of a method for training an artificial intelligence model to test a wafer;

FIG. 4 shows a wafer including a plurality of dies;

FIG. 5 is a schematic diagram of the internal architecture of the artificial intelligence model of FIG. 2;

FIG. 6 is a flowchart of wafer testing based on an artificial intelligence model according to the present disclosure; and

FIG. 7 is a flowchart illustrating another embodiment of a method for training an artificial intelligence model to test a wafer.

Detailed Description

Technical terms in the following description carry their conventional meaning in the art. Where a term is explained or defined in this specification, that explanation or definition governs.

The disclosure includes a wafer tester and a method for training an artificial intelligence model to test a wafer. Since some of the components of the wafer tester may individually be known components, the following description omits details of known components where doing so does not affect the completeness of the disclosure or the feasibility of the apparatus embodiments. In addition, some or all of the processes of the method for training an artificial intelligence model to test a wafer may take the form of software and/or firmware and may be performed by the wafer tester or an equivalent thereof.

FIG. 1 is a functional block diagram of an embodiment of a wafer tester. The wafer tester 100 includes measurement equipment 110, a database 120, a computing circuit 130, and a storage circuit 140. A wafer includes a plurality of dies. Before being tested by the wafer tester 100, each die on the wafer is tested by other testers and is determined to be a qualified die or an unqualified die. A qualified die is a die that functions properly, while an unqualified die is a die that does not. The measurement equipment 110 measures a target characteristic of the qualified dies to obtain a measurement value for each qualified die. In some embodiments, the target characteristic may be the steady-state supply current, and the measurement value may be a current value of the steady-state supply current. In other embodiments, the target characteristic may be a ring oscillator frequency, a thermal meter value, or a voltage sensor value, and the corresponding measurement values are a frequency, a temperature, and a voltage, respectively. Like the steady-state supply current, the ring oscillator frequency, the thermal meter value, or the voltage sensor value can also be used as a feature to determine whether a die is faulty. A person skilled in the art knows how to measure the steady-state supply current, the ring oscillator frequency, the thermal meter value, and the voltage sensor value of a die, so details of the structure and operation of the measurement equipment 110 are not described herein. The following description takes the steady-state supply current as an example, but the present invention is not limited to the steady-state supply current.

The database 120 stores the measurement values measured or output by the measurement equipment 110, together with data indicating whether each die is qualified or unqualified. The storage circuit 140 may be implemented by a volatile memory and/or a non-volatile memory, and it stores a plurality of program instructions or program codes and an artificial intelligence model (AI model) for testing a wafer. The computing circuit 130 may be a circuit or an electronic device with program execution capability, such as a central processing unit (CPU), a microprocessor, a micro-processing unit (MPU), or a graphics processing unit (GPU), which trains the artificial intelligence model by executing the program instructions or program codes. Once training of the artificial intelligence model is completed, the wafer tester 100 can use the artificial intelligence model to determine whether a qualified die is faulty.

FIG. 2 is a functional block diagram of an embodiment of an artificial intelligence model and training data according to the present disclosure. FIG. 3 is a flowchart illustrating an embodiment of a method for training an artificial intelligence model to test a wafer. The following description refers to fig. 1 to 3.

First, the computing circuit 130 determines a target die from the plurality of dies of a wafer and selects a plurality of reference dies adjacent to the target die according to the target die and a preset range (step S310). FIG. 4 shows a wafer 400 including a plurality of dies. Die 410, die 420, and die 430 may each be the target die described above, and region 415, region 425, and region 435 may each be the corresponding preset range. In the example of FIG. 4, the preset range is a 7 × 7 rectangle (containing at most 49 dies) with the target die at its center; however, the preset range is not limited to a 7 × 7 rectangle and may have other sizes and shapes, such as a 5 × 5 rectangle or a 3 × 10 rectangle. Furthermore, the target die need not be located at the center of the preset range.

In FIG. 4, each gray-scale (non-white) square represents a die, and each blank area (including but not limited to the white squares) represents a position where there is no die or where the die is unqualified. For example, region 415 includes 4 unqualified dies and 45 qualified dies, region 425 at the edge of wafer 400 includes 3 unqualified dies and 33 qualified dies, and region 435 includes 5 unqualified dies and 44 qualified dies. The gray-scale value may represent the magnitude of the measurement value of the target characteristic of the die; for example, the gray-scale value may be proportional to the measurement value.
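The selection in step S310 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `select_reference_dies`, the `(row, col)` coordinate convention, and the representation of the wafer as a set of occupied positions are all assumptions made for the example. For an odd-sized window the target sits at the center; for an even dimension (such as the 10 in a 3 × 10 window) the target ends up off-center, which the text allows.

```python
def select_reference_dies(target, die_map, height=7, width=7):
    """Return the positions of dies inside the preset range around `target`.

    target  : (row, col) of the target die (hypothetical coordinate convention)
    die_map : set of (row, col) positions where a die exists on the wafer
    The target die itself is excluded from the returned list.
    """
    row, col = target
    top = row - height // 2    # first row covered by the window
    left = col - width // 2    # first column covered by the window
    references = []
    for r in range(top, top + height):
        for c in range(left, left + width):
            # Keep only positions that hold a die and are not the target.
            if (r, c) != (row, col) and (r, c) in die_map:
                references.append((r, c))
    return references
```

Near the wafer edge the window extends past the wafer, so fewer reference dies are returned, which matches the smaller die count reported for region 425.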

After step S310 is completed (that is, after the target die and the plurality of reference dies are determined), the computing circuit 130 generates the primary training data 202 according to the measurement values of the target die and the reference dies (step S320); that is, the primary training data 202 includes the measurement value of the target die and the measurement values of the reference dies. For example, the primary training data 202 corresponding to region 415 may be represented as follows, where $I_{(x,y)}$ is the measurement value of the target die and x and y are integers:
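One layout consistent with this description is sketched below, assuming the 7 × 7 preset range of FIG. 4 with the target die at the center. The axis orientation (x increasing to the right, y increasing downward) is an assumption; the text does not specify it.

```latex
\[
\begin{bmatrix}
I_{(x-3,\,y-3)} & \cdots & I_{(x,\,y-3)} & \cdots & I_{(x+3,\,y-3)} \\
\vdots          &        & \vdots        &        & \vdots          \\
I_{(x-3,\,y)}   & \cdots & I_{(x,\,y)}   & \cdots & I_{(x+3,\,y)}   \\
\vdots          &        & \vdots        &        & \vdots          \\
I_{(x-3,\,y+3)} & \cdots & I_{(x,\,y+3)} & \cdots & I_{(x+3,\,y+3)}
\end{bmatrix}
\]
```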

Since an unqualified die has no measurement value, step S320 further includes the following sub-step: the average of the measurement values of the neighboring qualified dies is used as the missing measurement value (step S325). In some embodiments, the computing circuit 130 calculates the average of the eight measurement values surrounding the missing measurement value and takes that average as the missing measurement value. For example, $I_{(p,q)} = \frac{1}{8}\sum_{(i,j)\neq(0,0)} I_{(p+i,\,q+j)}$ with $i, j \in \{-1, 0, 1\}$, where $I_{(p,q)}$ is the missing measurement value (p and q are integers representing the coordinates of the unqualified die). When the unqualified die has fewer than eight neighboring qualified dies, only the measurement values of those neighboring qualified dies are averaged. Note that, since the target die is the prediction target, the computing circuit 130 treats the measurement value of the target die as a missing measurement value and uses the average of the measurement values of the reference dies around the target die in its place.
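Sub-step S325 can be sketched as a neighbor-average fill. This is an illustration under stated assumptions: the function name `fill_missing`, the list-of-lists layout, and the boolean `missing` mask are hypothetical, and averages are computed from the original values only, so a filled value is never reused to fill another position.

```python
def fill_missing(values, missing):
    """Fill each missing entry with the mean of its valid 8-neighbors.

    values  : 2-D list of measurement values (entries at missing
              positions are ignored)
    missing : 2-D list of booleans, True where no measurement exists
              (unqualified die, or the target die being predicted)
    """
    rows, cols = len(values), len(values[0])
    filled = [list(row) for row in values]
    for r in range(rows):
        for c in range(cols):
            if not missing[r][c]:
                continue
            # Collect the valid neighbors in the surrounding 3 x 3 block.
            neighbors = [
                values[i][j]
                for i in range(max(r - 1, 0), min(r + 2, rows))
                for j in range(max(c - 1, 0), min(c + 2, cols))
                if (i, j) != (r, c) and not missing[i][j]
            ]
            if neighbors:  # leave the entry unchanged if nothing is valid
                filled[r][c] = sum(neighbors) / len(neighbors)
    return filled
```

Marking the target die's position as missing in the mask reproduces the treatment of the target die described above.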

Next, the computing circuit 130 generates the auxiliary training data 204 according to whether the target die and the reference dies are qualified dies (step S330). The auxiliary training data 204 indicates whether the reference dies are qualified dies or unqualified dies. For example, the auxiliary training data 204 corresponding to region 415, region 425, and region 435 are as follows ("1" represents an unqualified die):
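A sketch of how such an auxiliary matrix might be built for step S330 is given below. The function name `auxiliary_matrix`, the `(row, col)` coordinates, and the set of unqualified positions are assumptions for illustration. Positions that hold no die are left at 0 here, which is also an assumption; the text only states that "1" marks an unqualified die.

```python
def auxiliary_matrix(target, unqualified, height=7, width=7):
    """Build a height x width matrix marking unqualified dies with 1.

    target      : (row, col) of the target die (hypothetical convention)
    unqualified : set of (row, col) positions of unqualified dies
    Element [i][j] corresponds to wafer position (top + i, left + j), so the
    layout of the matrix mirrors the layout of the dies on the wafer.
    """
    row, col = target
    top = row - height // 2
    left = col - width // 2
    return [
        [1 if (top + i, left + j) in unqualified else 0 for j in range(width)]
        for i in range(height)
    ]
```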

After the primary training data 202 and the auxiliary training data 204 are generated, the computing circuit 130 trains the artificial intelligence model 210 with the primary training data 202 and the auxiliary training data 204 (step S340); that is, it inputs the primary training data 202 and the auxiliary training data 204 into the artificial intelligence model 210. The artificial intelligence model 210 includes a feature extraction algorithm 212 and a machine learning algorithm model 214.

The feature extraction algorithm 212 is used to select a representative set of feature values (a feature set) from the primary training data 202 and the auxiliary training data 204. In addition to reducing over-fitting, the feature extraction algorithm 212 can also reduce the complexity of the mathematical model. The document L. C. Molina, L. Belanche, A. Nebot (2002), "Feature selection algorithms: a survey and experimental evaluation", 2002 IEEE International Conference on Data Mining, 2002, Proceedings, discusses examples of several feature extraction algorithms. One of ordinary skill in the art can refer to that document to complete the feature extraction algorithm 212, so the details are not described here.

The machine learning algorithm model 214 is used to process the set of feature values generated by the feature extraction algorithm 212. The machine learning algorithm used in the present disclosure may include a Bayesian ridge regression algorithm, a Gaussian process regression algorithm, a scalable variational Gaussian process (SVGP) algorithm, or a convolutional neural network (CNN) algorithm. Because the convolutional neural network algorithm includes a feature extraction function, the feature extraction algorithm 212 may be omitted when the algorithm used by the machine learning algorithm model 214 is a convolutional neural network algorithm (that is, the feature extraction algorithm 212 is integrated into the convolutional neural network algorithm).

FIG. 5 is a schematic diagram of one internal architecture of the artificial intelligence model 210 (e.g., a deep learning algorithm model) of FIG. 2. In the embodiment of FIG. 5, the artificial intelligence model 210 is implemented as a deep learning algorithm model, which includes a convolutional neural network algorithm model 216 and a mixture density neural network algorithm model 218. The number of filters in each convolutional layer of the convolutional neural network algorithm model 216 can be set arbitrarily. Compared to FIG. 2, the embodiment of FIG. 5 may omit the feature extraction algorithm 212 because the deep learning algorithm model of FIG. 5 employs the convolutional neural network algorithm model 216.

The mixture density neural network (MDNN) algorithm model 218 is used to predict a complete probability distribution. The overall architecture of the mixture density neural network algorithm is the same as that of an ordinary multilayer perceptron, except that its fully connected layer is followed by three independent layers, namely Alpha (α), Mu (μ), and Sigma (σ). In the present disclosure, Alpha (α) can be ignored. One of ordinary skill in the art can complete the mixture density neural network algorithm model 218 by referring to Bishop, Christopher M. (1994), "Mixture density networks". The loss function used in the mixture density neural network algorithm of the present disclosure is shown in the following equation.
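As a hedged illustration only: if Alpha (α) is ignored, the mixture reduces to a single Gaussian component, and a common choice of loss is the negative log-likelihood of the measured value under that Gaussian. The symbol y (the measurement value of the target die) and the single-component form are assumptions; the equation of the disclosure itself may differ in constants or scaling.

```latex
\[
\mathcal{L}(\mu, \sigma;\, y)
  = -\ln \mathcal{N}\!\left(y \mid \mu, \sigma^{2}\right)
  = \ln \sigma + \frac{(y - \mu)^{2}}{2\sigma^{2}} + \frac{1}{2}\ln(2\pi)
\]
```

Minimizing this quantity pulls μ toward the measured value while letting σ grow where the neighborhood is less predictable, which is what allows the two outputs to define a per-die threshold range.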

The primary training data 202 and the auxiliary training data 204 are fed into the convolutional layer 510 of the convolutional neural network algorithm model 216 of the artificial intelligence model 210. The output of the convolutional layer 510 is flattened into a one-dimensional tensor, which is input to the fully connected layer 530 of the mixture density neural network algorithm model 218 and then split into two independent fully connected layers: a fully connected layer (μ) 540 and a fully connected layer (σ) 550. In some embodiments, if the primary training data 202 and the auxiliary training data 204 are each an N × N matrix (N is a positive integer), the convolutional layer 510 contains 12 convolution kernels, and its output feature map is a 12N' × N' matrix (N' ≤ N); thus, the dimension of the one-dimensional tensor is 12 × N', the dimension of the fully connected layer 530 is (12 × N') × 512, and the dimensions of the fully connected layer (μ) 540 and the fully connected layer (σ) 550 are both 512 × 256. One of ordinary skill in the art can implement the artificial intelligence model 210 according to the above-described embodiments.

Refer again to FIG. 3. After step S340 ends, the computing circuit 130 selects the next target die on the current wafer and performs steps S310 to S340 again, until every die on the current wafer has served as the target die. The computing circuit 130 may then select the measurement values of the next wafer from the database 120 and continue to perform steps S310 to S340.

During training, the artificial intelligence model 210 continuously adjusts its parameters, using the measurement value of the target die as the target mean. After training, the artificial intelligence model 210 can predict the threshold range of the measurement value of the target die, namely mean μ ± (set coefficient × standard deviation σ), where the set coefficient is a parameter that adjusts the threshold range. When the set coefficient is 1, μ − σ is the lower threshold and μ + σ is the upper threshold; in that case, if the measurement value of the target die is greater than or equal to μ − σ and less than or equal to μ + σ, the target die is determined not to be faulty.
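The threshold check described above amounts to a simple interval test. A minimal sketch follows; the function name `is_within_threshold` and the parameter name `coefficient` are chosen for illustration and do not come from the disclosure.

```python
def is_within_threshold(measured, mu, sigma, coefficient=1.0):
    """Return True if `measured` lies in [mu - k*sigma, mu + k*sigma].

    mu, sigma   : mean and standard deviation predicted by the model
    coefficient : the set coefficient k that widens or narrows the range
    """
    lower = mu - coefficient * sigma  # lower threshold
    upper = mu + coefficient * sigma  # upper threshold
    return lower <= measured <= upper
```

A die whose measurement falls outside the interval would be flagged as faulty. Raising the coefficient widens the interval, so fewer dies are flagged.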

FIG. 6 is a flowchart of wafer testing based on an artificial intelligence model. First, the measurement equipment 110 measures the target characteristic of a plurality of dies of the wafer to obtain a measurement value of each qualified die (step S610). Next, the computing circuit 130 determines a target die and selects a plurality of reference dies adjacent to the target die according to the target die and a preset range (step S620). Step S620 is similar to step S310 and is not described again. Next, the computing circuit 130 generates primary test data (step S630), and step S630 includes a sub-step S635. The format of the primary test data is the same as that of the primary training data 202. Since steps S630 and S635 are similar to steps S320 and S325, respectively, they are not described again. Next, the computing circuit 130 generates auxiliary test data (step S640), whose format is the same as that of the auxiliary training data 204. Since step S640 is similar to step S330, it is not described again. Then, the computing circuit 130 inputs the primary test data and the auxiliary test data into the trained artificial intelligence model 210 to determine whether the target die is faulty (step S650). The artificial intelligence model 210 predicts the threshold range of the measurement value of the target die from the measurement values of the reference dies and then determines whether the measurement value of the target die falls within that threshold range. If so, the artificial intelligence model 210 (or the computing circuit 130) determines that the target die is not faulty; if not, the artificial intelligence model 210 (or the computing circuit 130) determines that the target die is faulty.

FIG. 7 is a flowchart illustrating another embodiment of a method for training an artificial intelligence model to test a wafer. Step S710, step S720, step S725, and step S730 are similar to step S310, step S320, step S325, and step S330 of FIG. 3, respectively, and are not described again. The first auxiliary training data of step S730 is the auxiliary training data of step S330. In the embodiment of FIG. 7, the computing circuit 130 further generates second auxiliary training data (step S740), which indicates whether the target die and/or the reference dies are located at the edge of the wafer, or whether the reference dies exist. For example, referring to FIG. 4, since region 415 and region 435 each include N × N dies (qualified or unqualified), the second auxiliary training data corresponding to region 415 and region 435 can be expressed as follows ("0" represents that there is a die at the position):

As another example, since region 425 covers both the inside and the outside of the wafer 400, the second auxiliary training data corresponding to region 425 can be expressed as follows ("0" represents that there is a die at the position, and "1" represents that there is no die at the position):

As shown in the above examples, when the target die and/or the reference dies are located at the edge of the wafer (as in region 425), the second auxiliary training data contains two values ("0" and "1"); when the target die and/or the reference dies are not located at the edge of the wafer (as in regions 415 and 435), the second auxiliary training data contains only one value ("0").

After the primary training data, the first auxiliary training data, and the second auxiliary training data are generated, the computing circuit 130 trains the artificial intelligence model 210 with the primary training data, the first auxiliary training data, and the second auxiliary training data (step S750).
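The second auxiliary training data can be sketched as an existence mask over the preset range. The names `existence_matrix` and `touches_wafer_edge`, the `(row, col)` coordinates, and the set-based wafer representation are assumptions made for this example.

```python
def existence_matrix(target, die_map, height=7, width=7):
    """Build a matrix with 0 where a die exists and 1 where there is none.

    target  : (row, col) of the target die (hypothetical convention)
    die_map : set of (row, col) positions that hold a die, whether
              qualified or unqualified
    """
    row, col = target
    top = row - height // 2
    left = col - width // 2
    return [
        [0 if (top + i, left + j) in die_map else 1 for j in range(width)]
        for i in range(height)
    ]


def touches_wafer_edge(matrix):
    """A window reaches past the wafer edge if any position has no die."""
    return any(1 in row for row in matrix)
```

For a window fully inside the wafer the matrix contains only zeros, and for a window at the edge it contains both values, matching the two cases described above.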

In another embodiment, the auxiliary training data of FIG. 3 may indicate whether the reference dies are qualified dies or unqualified dies, and/or indicate whether the target die and/or the reference dies are located at the edge of the wafer. For example, referring to FIG. 4, the auxiliary training data corresponding to region 415 and region 435 can be expressed as follows ("0" represents a qualified die at the position, and "1" represents an unqualified die or no die at the position):

As another example, the auxiliary training data corresponding to region 425 may be represented as:

As shown in the above examples, a position without a die is treated as an unqualified die in this embodiment. In other words, the auxiliary training data of this embodiment is equivalent to the union of the first auxiliary training data and the second auxiliary training data of the embodiment of FIG. 7.
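With 0/1 matrices, the union described here is an element-wise OR. A minimal sketch, where the function name `union_masks` and the list-of-lists layout are assumptions:

```python
def union_masks(first_aux, second_aux):
    """Element-wise union of two equally sized 0/1 matrices.

    first_aux  : 1 marks an unqualified die
    second_aux : 1 marks a position with no die
    The result has 1 wherever either input has 1.
    """
    return [
        [a | b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(first_aux, second_aux)
    ]
```

For instance, `union_masks([[0, 1], [0, 0]], [[0, 0], [1, 0]])` yields `[[0, 1], [1, 0]]`.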

Because adjacent dies on a wafer experience similar process conditions, judging a die against the dies in its local range, rather than against all the dies on the whole wafer, yields a more accurate threshold for the measurement value of the target characteristic and reduces the probability of misjudgment. For example, the IDDQ of die 410 of FIG. 4 may not exceed the conventional IDDQ threshold, yet its IDDQ value may still fall outside the threshold range (i.e., mean μ ± set coefficient × standard deviation σ) derived from the surrounding dies (i.e., the reference dies in region 415). Experiments have found that such a die 410 is very likely to be a faulty die, whereas conventional testing methods fail to identify it as one.

As shown in the foregoing examples, the primary training data and the auxiliary training data are presented in the form of a matrix or an array, and the relative positions of the elements of the matrix or array reflect the relative positions of the target die and the reference dies on the wafer; that is, the elements are arranged according to the positions of the target die and the reference dies on the wafer. Thus, the wafer can be viewed as an image in which each die is a pixel, and the elements of the primary and auxiliary training data can be treated as the pixel values of that image.

In some embodiments, the primary training data and the auxiliary training data correspond to a single voltage and temperature combination; that is, they are measured at a single voltage and temperature combination. However, because whether a die's measurement value is acceptable depends on the voltage and the temperature, in other embodiments the primary training data and the auxiliary training data may correspond to a plurality of voltage and temperature combinations. For example, if there are four voltage-temperature combinations (e.g., two temperatures with two voltages), then in the embodiments of FIG. 3 and FIG. 7 the training data actually includes four sets of primary and auxiliary training data, each set corresponding to one voltage and temperature combination.
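One way to organize the per-condition sets is a mapping keyed by the (voltage, temperature) pair. This is a sketch only: the function `build_condition_sets`, the callback `make_set`, and the example voltages and temperatures are hypothetical and do not come from the disclosure.

```python
from itertools import product


def build_condition_sets(voltages, temperatures, make_set):
    """Create one training set per voltage-temperature combination.

    make_set : callable (voltage, temperature) -> training set, for example
               a dict holding the primary and auxiliary matrices measured
               under that condition (hypothetical interface)
    """
    return {
        (v, t): make_set(v, t)
        for v, t in product(voltages, temperatures)
    }


# Example: two voltages and two temperatures give four sets.
sets = build_condition_sets(
    [0.9, 1.1],   # hypothetical supply voltages in volts
    [25, 125],    # hypothetical temperatures in degrees Celsius
    lambda v, t: {"primary": None, "auxiliary": None},
)
```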

In summary, the present disclosure takes the dies around the target die into consideration and uses an artificial intelligence model to help determine whether the target die is faulty, so that faulty dies can be found more accurately and more quickly. Furthermore, experiments show that training the artificial intelligence model with both the primary training data and the auxiliary training data gives more accurate results than training it with the primary training data alone.

Because those skilled in the art can understand the implementation details and variations of the method embodiments from the disclosure of the apparatus embodiments, repeated descriptions are omitted here for brevity, without affecting the disclosure requirements or the implementability of the method embodiments. Note that the shapes, sizes, and proportions of the elements and the order of the steps shown in the drawings are illustrative only and are not intended to be limiting.

Although the embodiments of the present invention have been described above, these embodiments are not intended to limit the present invention, and those skilled in the art can apply variations to the technical features of the present invention according to the contents of the present invention, which may be included in the scope of the patent protection sought by the present invention.

Description of reference numerals

100: wafer tester

110: measurement equipment

120: database

130: computing circuit

140: storage circuit

400: wafer

410, 420, 430: die

415, 425, 435: region

202: primary training data

204: auxiliary training data

210: artificial intelligence model

212: feature extraction algorithm

214: machine learning algorithm model

216: convolutional neural network algorithm model

218: mixture density neural network algorithm model

510: convolutional layer

530: fully connected layer

540: fully connected layer (μ)

550: fully connected layer (σ)

S310 to S340, S610 to S650, S710 to S750: steps
