Method for evaluating the influence of fine particulate matter on statutory infectious diseases in different reporting orders
1. A method for evaluating the influence of fine particulate matter on statutory infectious diseases in different reporting orders, characterized by comprising the following steps:
S1, extracting statutory infectious disease data published for a target city during the study period, wherein the statutory infectious disease data comprise the number of statutory infectious disease types, the number of infected persons and the number of deaths;
S2, acquiring average PM2.5 concentration data of the target city for the periods corresponding to the statutory infectious disease data, defining the PM2.5 concentration on the day of the disease report as the immediate concentration PM-0day, the daily average PM2.5 concentrations over the 7 days and 15 days before the report as the short-term average concentrations PM-7day and PM-15day, the daily average PM2.5 concentration over the 30 days before the report as the medium-term average concentration PM-30day, and the daily average PM2.5 concentration over the 60 days before the report as the long-term average concentration PM-60day;
S3, associating the statutory infectious disease data with the average PM2.5 concentration data to form a target data set;
S4, performing correlation analysis on the target data set to determine the correlation between PM2.5 concentration and statutory infectious diseases, and screening out the significantly correlated variables;
S5, constructing, from the significantly correlated variables, a Bayesian regression model of the statutory infectious diseases in different reporting orders, wherein the response variable of the Bayesian regression model is the number of infections, and the predictors are the PM2.5 concentration, the names of the infectious diseases in the different reporting orders, and the interaction between them;
S6, extracting, based on the Bayesian regression model, marginal effect curves between PM2.5 concentration and the number of infected persons for each infectious disease name, and evaluating the quantitative relationship between PM2.5 concentration and the number of infected persons under different reporting orders according to the marginal effect curves.
2. The method of claim 1, wherein in S4 the correlation analysis of the target data set is performed using the Pearson correlation coefficient;
the correlation coefficient r is:
$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^{2} - \left(\sum X\right)^{2}\right]\left[N\sum Y^{2} - \left(\sum Y\right)^{2}\right]}} \tag{1}$$
in formula (1), r is the correlation coefficient between X and Y; X is PM-0day, PM-7day, PM-15day, PM-30day or PM-60day; Y is the number of statutory infectious disease types, the number of infected persons or the number of deaths; and N is the total number of samples in the target data set; a significantly correlated variable is a variable with r > 0.05.
3. The method of claim 1, wherein the data distribution of the Bayesian regression model is a Poisson distribution, the number of iterations is 2000, the divergent transition parameter is 0.99, and the maximum iteration depth is 15.
4. The method of claim 1, wherein the marginal effect curve of PM2.5 is calculated by using the Bayesian regression model to predict the number of cases at the mean PM2.5 values of different concentration levels, and the confidence level of the marginal effect between the number of cases and the PM2.5 concentration is 95%.
Background
With economic development and rising living standards, emissions from industry and traffic, together with natural phenomena, have raised air pollution levels, and air pollution, particularly ambient fine particulate matter (PM2.5), has become a public health problem of wide concern. Jin et al. studied the relationship between the incidence of statutory infectious diseases in Korea and climate, air pollution and hospital level, and found that air pollution is strongly correlated with the incidence of infectious diseases (bacillary dysentery, syphilis, tuberculosis, etc.). You et al. used Poisson linear regression to study the seasonal relationship between outdoor PM2.5 concentration and tuberculosis (TB) in the Hong Kong Special Administrative Region and Beijing, and found that for every 10 μg/m³ increase in PM2.5 concentration in winter, the number of TB cases increases by 3% in the following spring and summer, and that the seasonal fluctuation of TB is more pronounced among the elderly and children. Mahara et al. used spatial regression to analyze the relationship between air pollutants and scarlet fever incidence in the Beijing area in 2013-2014, and found that air pollution factors increase scarlet fever incidence. Existing studies also indicate that the differences in health effects caused by different PM2.5 concentrations may involve complex interactions with PM composition and content, routes of action, and environmental factors such as temperature. In recent years, scholars at home and abroad have studied air pollution and various infectious diseases, but few have studied the association between PM2.5 and the morbidity and mortality of statutory infectious diseases in different reporting orders.
The reporting order of statutory infectious diseases is the ranking of infectious diseases by the number of reported cases in a given period. Diseases ranked near the top generally receive more attention. Studying the health effects of diseases across different reporting orders can raise attention to the diseases ranked lower, which is of great significance for controlling infectious diseases.
Based on the above, in order to study the effect of PM2.5 on the statutory infectious diseases in different reporting orders, it is necessary to provide a method for evaluating the effect of fine particulate matter on the statutory infectious diseases in different reporting orders.
Disclosure of Invention
The invention aims to provide a method for evaluating the influence of fine particles on legal infectious diseases in different reporting orders.
In order to solve the technical problems, the invention adopts the technical scheme that:
a method for assessing the effect of fine particulate matter on statutory infectious disease in different reporting orders, comprising the steps of:
S1, extracting statutory infectious disease data published for a target city during the study period, wherein the statutory infectious disease data comprise the number of statutory infectious disease types, the number of infected persons and the number of deaths;
S2, acquiring average PM2.5 concentration data of the target city for the periods corresponding to the statutory infectious disease data, defining the PM2.5 concentration on the day of the disease report as the immediate concentration PM-0day, the daily average PM2.5 concentrations over the 7 days and 15 days before the report as the short-term average concentrations PM-7day and PM-15day, the daily average PM2.5 concentration over the 30 days before the report as the medium-term average concentration PM-30day, and the daily average PM2.5 concentration over the 60 days before the report as the long-term average concentration PM-60day;
S3, associating the statutory infectious disease data with the average PM2.5 concentration data to form a target data set;
S4, performing correlation analysis on the target data set to determine the correlation between PM2.5 concentration and statutory infectious diseases, and screening out the significantly correlated variables;
S5, constructing, from the significantly correlated variables, a Bayesian regression model of the statutory infectious diseases in different reporting orders, wherein the response variable of the Bayesian regression model is the number of infections, and the predictors are the PM2.5 concentration, the names of the infectious diseases in the different reporting orders, and the interaction between them;
S6, extracting, based on the Bayesian regression model, marginal effect curves between PM2.5 concentration and the number of infected persons for each infectious disease name, and evaluating the quantitative relationship between PM2.5 concentration and the number of infected persons under different reporting orders according to the marginal effect curves.
Optionally, in S4 the correlation analysis of the target data set is performed using the Pearson correlation coefficient;
the correlation coefficient r is:
$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^{2} - \left(\sum X\right)^{2}\right]\left[N\sum Y^{2} - \left(\sum Y\right)^{2}\right]}} \tag{1}$$
in formula (1), r is the correlation coefficient between X and Y; X is PM-0day, PM-7day, PM-15day, PM-30day or PM-60day; Y is the number of statutory infectious disease types, the number of infected persons or the number of deaths; and N is the total number of samples in the target data set; a significantly correlated variable is a variable with r > 0.05.
Optionally, the data distribution of the Bayesian regression model is a Poisson distribution, the number of iterations is 2000, the divergent transition parameter is 0.99, and the maximum iteration depth is 15.
Optionally, the marginal effect curve of PM2.5 is calculated by using the Bayesian regression model to predict the number of cases at the mean PM2.5 values of different concentration levels, and the confidence level of the marginal effect between the number of cases and the PM2.5 concentration is 95%.
The invention has the beneficial effects that:
By extracting the statutory infectious disease data of a target city and the corresponding immediate, short-term, medium-term and long-term average PM2.5 concentration data, and by using a Bayesian regression model to study the interaction effect and response relationship between the lag effect of PM2.5 and statutory infectious diseases in different reporting orders, the invention provides a method for evaluating the influence of PM2.5 on statutory infectious diseases in different reporting orders.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a graphical representation of the results of Pearson correlation analysis between different time scales of PM2.5 concentration and the parameters of the legally prescribed infectious disease.
FIG. 3 is a graph showing the results of analysis of the disease infection number and death number variation of the first five and the last five reported numbers.
FIG. 4 is a graphical representation of the interaction between the number of statutory infectious disease infections and PM2.5 concentration in different reporting orders; panels A-E of FIG. 4 show the results for the first to fifth reporting orders, respectively.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
As shown in fig. 1, the method for evaluating the effect of fine particles on the statutory infectious diseases in different reporting orders in the present embodiment comprises the following steps:
S1, extracting statutory infectious disease data published for the target city during the study period, wherein the statutory infectious disease data comprise the number of statutory infectious disease types, the number of infected persons and the number of deaths.
For example, when the target city is Beijing, statutory infectious disease data can be extracted from the weekly statutory infectious disease epidemic reports of the Beijing disease control center (http://www.bjcdc.org/). The research period can be selected as needed; for example, the 583 weekly statutory infectious disease reports from 2008 to 2019 can be selected as the statutory infectious disease data. Taking weekly incidence as the statistical standard, the collated data show that the statutory infectious diseases with high incidence over roughly the past ten years are: bacillary dysentery (Diarrhea), influenza (Flu), gonorrhea (Gonorhea), influenza A H1N1 (H1N1_Flu), viral hepatitis (HBV), hepatitis C (HCV), hepatitis B (Hepatitis B), hand-foot-and-mouth disease (HFMD), scarlet fever (Scarlet_fever), tuberculosis (TB), syphilis (Syphilis) and other infectious diarrheal diseases (other). For convenience of research, the diseases in the first five reporting orders can be selected as research objects.
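The reporting order used in this embodiment can be illustrated with a minimal sketch. It assumes each weekly report is available as a mapping from disease name to case count; the function name `reporting_order`, the alphabetical tie-break and the sample counts are hypothetical and are not taken from the Beijing reports.

```python
def reporting_order(weekly_cases, top_n=5):
    """Rank diseases in one weekly report by reported cases (1 = most cases).

    weekly_cases: dict mapping disease name -> number of reported cases.
    Returns a dict mapping disease name -> reporting order for the top_n diseases.
    Ties are broken alphabetically so the result is deterministic (an assumption).
    """
    ranked = sorted(weekly_cases.items(), key=lambda kv: (-kv[1], kv[0]))
    return {name: rank for rank, (name, _) in enumerate(ranked[:top_n], start=1)}


# Hypothetical counts for a single weekly report (illustrative only).
week = {"TB": 120, "Diarrhea": 310, "HFMD": 95, "Flu": 40,
        "Scarlet_fever": 60, "Syphilis": 25}
order = reporting_order(week)
print(order)
```

Applying the function to every weekly report would yield, for each disease and week, the reporting order that later enters the regression as a grouping factor.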
S2, acquiring average PM2.5 concentration data of the target city for the periods corresponding to the statutory infectious disease data, defining the PM2.5 concentration on the day of the disease report as the immediate concentration PM-0day, the daily average PM2.5 concentrations over the 7 days and 15 days before the report as the short-term average concentrations PM-7day and PM-15day, the daily average PM2.5 concentration over the 30 days before the report as the medium-term average concentration PM-30day, and the daily average PM2.5 concentration over the 60 days before the report as the long-term average concentration PM-60day.
Still taking Beijing as the target city, the average PM2.5 concentration data for the corresponding statutory infectious disease reporting periods can be obtained from the Beijing environmental monitoring center (http://www.bjshjjc.com/).
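The concentration windows defined in S2 can be sketched as follows. This is only an illustration under stated assumptions: daily PM2.5 values are held in a dictionary keyed by date, each window covers the k days strictly before the report day (the text does not say whether the report day itself is included), and the function name `lagged_pm25_features` and the synthetic series are hypothetical.

```python
from datetime import date, timedelta


def lagged_pm25_features(daily_pm25, report_day, windows=(7, 15, 30, 60)):
    """Return PM-0day and the mean PM2.5 over the k days before report_day.

    daily_pm25: dict mapping datetime.date -> daily mean PM2.5 (ug/m3).
    Assumption: each window covers the k days strictly before the report day.
    Days missing from daily_pm25 are skipped; a window with no data gives None.
    """
    features = {"PM-0day": daily_pm25.get(report_day)}
    for k in windows:
        days = (report_day - timedelta(days=i) for i in range(1, k + 1))
        values = [daily_pm25[d] for d in days if d in daily_pm25]
        features[f"PM-{k}day"] = sum(values) / len(values) if values else None
    return features


# Synthetic series: concentration rises by 1 ug/m3 per day for 61 days.
start = date(2019, 1, 1)
series = {start + timedelta(days=i): float(i) for i in range(61)}
feats = lagged_pm25_features(series, report_day=start + timedelta(days=60))
print(feats)
```

With the synthetic ramp, the longer windows reach further back into lower values, so the averages fall as the window lengthens, which makes the windowing easy to check by hand.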
S3, associating the statutory infectious disease data with the average PM2.5 concentration data to form a target data set.
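One way to picture the data association in S3 is a join on the report date. The sketch below assumes disease records and PM2.5 features share a `report_date` key and that records without matching PM2.5 data are dropped; the function name `build_target_dataset`, the field names and the sample values are hypothetical.

```python
def build_target_dataset(disease_rows, pm_features_by_date):
    """Join disease records to PM2.5 features on the report date (inner join).

    disease_rows: list of dicts, each with at least a "report_date" key.
    pm_features_by_date: dict mapping report_date -> dict of PM2.5 features.
    Rows without matching PM2.5 data are dropped (an assumption of this sketch).
    """
    dataset = []
    for row in disease_rows:
        pm = pm_features_by_date.get(row["report_date"])
        if pm is not None:
            dataset.append({**row, **pm})
    return dataset


# Hypothetical records (illustrative values only).
disease_rows = [
    {"report_date": "2019-01-07", "disease": "TB", "order": 1, "cases": 13},
    {"report_date": "2019-01-14", "disease": "TB", "order": 2, "cases": 11},
]
pm_by_date = {"2019-01-07": {"PM-0day": 42.0, "PM-7day": 55.5}}
target = build_target_dataset(disease_rows, pm_by_date)
print(target)
```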
S4, performing correlation analysis on the target data set to determine the correlation between PM2.5 concentration and statutory infectious diseases, and screening out the significantly correlated variables.
The relationship between the cumulative number of infections and cumulative number of deaths of the statutory infectious diseases and the reporting order is evaluated first. Because the reporting order does not correspond one-to-one with the cumulative numbers of infections and deaths, correlation analysis is performed on the target data set to determine the relationship between statutory infectious diseases and PM2.5 under different reporting orders.
Optionally, in S4 the correlation analysis of the target data set is performed using the Pearson correlation coefficient;
the correlation coefficient r is:
$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^{2} - \left(\sum X\right)^{2}\right]\left[N\sum Y^{2} - \left(\sum Y\right)^{2}\right]}} \tag{1}$$
in formula (1), r is the correlation coefficient between X and Y; X is PM-0day, PM-7day, PM-15day, PM-30day or PM-60day; Y is the number of statutory infectious disease types, the number of infected persons or the number of deaths; and N is the total number of samples in the target data set; a significantly correlated variable is a variable with r > 0.05.
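For illustration, the Pearson coefficient can be computed directly from the sums over the N samples. The sketch below uses the standard computational form of the coefficient; the function name `pearson_r` and the sample values are hypothetical, and the significance test (p-value) is not included.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient in its computational (N-based) form."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    denom = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant series")
    return (n * sxy - sx * sy) / denom


# Hypothetical values: long-term PM2.5 average vs. number of disease types.
pm_60day = [35.0, 50.0, 62.0, 80.0, 95.0]
n_types = [12, 11, 11, 9, 8]
r = pearson_r(pm_60day, n_types)
print(round(r, 3))
```

Running the function for each pair of X (the five PM2.5 windows) and Y (types, infections, deaths) would give the correlation matrix from which the significantly correlated variables are screened.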
The Pearson correlation analysis between PM2.5 concentrations at different time scales and the statutory infectious disease parameters is shown in FIG. 2. The results show that the number of statutory infectious disease types has a significant negative correlation with PM2.5 at every time scale (r_0day = -0.26, r_7day = -0.23, r_15day = -0.26, r_30day = -0.34, r_60day = -0.43, p < 0.05), and the correlation generally becomes stronger as the time scale increases. That is, the number of reported statutory infectious disease types is negatively correlated with the immediate, short-term, medium-term and long-term PM2.5 concentrations. The total number of infected persons and the number of deaths are very significantly positively correlated with each other (r = 0.82, p < 0.01); the total number of infected persons has a significant negative correlation only with the long-term average PM2.5 concentration (r = -0.17, p < 0.05), and its correlations with the medium-term and short-term average PM2.5 concentrations are not significant; the number of deaths is not significantly correlated with the average PM2.5 concentration at any time scale. These results reveal that PM2.5 concentration has a cumulative effect on the number of statutory infectious disease types: the longer the accumulation time, the more evident the correlation between the two.
S5, constructing, from the significantly correlated variables, a Bayesian regression model of the statutory infectious diseases in different reporting orders, wherein the response variable of the Bayesian regression model is the number of infections, and the predictors are the PM2.5 concentration, the names of the infectious diseases in the different reporting orders, and the interaction between them.
The Bayesian regression model can be constructed from the significantly correlated variables using the brm function of the R package brms.
Optionally, the data distribution of the Bayesian regression model is a Poisson distribution, the number of iterations is 2000, the divergent transition parameter is 0.99, and the maximum iteration depth is 15, so that the Bayesian regression model draws an adequate number of samples and bias in the posterior samples is avoided.
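The structure of such a model can be sketched as an unnormalised log-posterior. This is not the brms implementation: it is a plain Python illustration in which each disease has its own intercept and PM2.5 slope on the log scale (an equivalent reparameterisation of main effects plus interaction), and the Normal(0, 10) priors, the function name `poisson_log_posterior` and the sample data are assumptions, since the source does not state the priors used.

```python
import math


def poisson_log_posterior(params, rows, prior_sd=10.0):
    """Unnormalised log-posterior of a Poisson regression with interaction.

    params: dict disease -> (intercept, pm_slope), both on the log scale.
    rows: iterable of (disease, pm25, cases) observations.
    Assumption: independent Normal(0, prior_sd) priors on every coefficient.
    """
    log_post = 0.0
    for intercept, slope in params.values():
        for coef in (intercept, slope):
            log_post += -0.5 * (coef / prior_sd) ** 2
    for disease, pm25, cases in rows:
        intercept, slope = params[disease]
        eta = intercept + slope * pm25  # log of the expected case count
        # Poisson log-pmf: y * eta - exp(eta) - log(y!)
        log_post += cases * eta - math.exp(eta) - math.lgamma(cases + 1)
    return log_post


# Hypothetical observations: (disease, PM2.5 in ug/m3, weekly cases).
rows = [("TB", 20.0, 13), ("TB", 150.0, 9),
        ("Diarrhea", 20.0, 10), ("Diarrhea", 150.0, 12)]
flat = {"TB": (math.log(11), 0.0), "Diarrhea": (math.log(11), 0.0)}
sloped = {"TB": (2.62, -0.0028), "Diarrhea": (2.27, 0.0014)}
print(poisson_log_posterior(flat, rows), poisson_log_posterior(sloped, rows))
```

A sampler such as the NUTS algorithm used by Stan would explore this posterior; in brms the settings named above presumably correspond to `iter = 2000` and the control options `adapt_delta = 0.99` and `max_treedepth = 15`, although the source does not name them explicitly.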
S6, extracting, based on the Bayesian regression model, marginal effect curves between PM2.5 concentration and the number of infected persons for each infectious disease name, and evaluating the quantitative relationship between PM2.5 concentration and the number of infected persons under different reporting orders according to the marginal effect curves.
The conditional_effects function (provided by the brms package; the ggeffects package offers comparable functionality) can be used to extract the marginal effect curves between PM2.5 concentration and the number of infected persons for each infectious disease name. The marginal effect curves can reveal the marginal effect of the interaction between the lag effect of PM2.5 and the statutory infectious diseases.
Optionally, the marginal effect curve of PM2.5 is calculated by using the Bayesian regression model to predict the number of cases at the mean PM2.5 values of different concentration levels, and the confidence level of the marginal effect between the number of cases and the PM2.5 concentration is 95%.
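How a marginal effect curve with a 95% interval can be derived from posterior draws is sketched below. The sketch assumes the draws for one disease are available as (intercept, PM2.5 slope) pairs on the log scale of a Poisson model; the function names, the draws and the concentration grid are hypothetical, and the interval is taken as the central 95% of the posterior expected counts.

```python
import math


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 1]) of a sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def marginal_effect_curve(draws, pm_grid, level=0.95):
    """Posterior mean and interval of expected cases along a PM2.5 grid.

    draws: list of (intercept, pm_slope) posterior draws for one disease.
    Returns a list of (pm25, mean, lower, upper) tuples.
    """
    tail = (1 - level) / 2
    curve = []
    for pm in pm_grid:
        mu = sorted(math.exp(a + b * pm) for a, b in draws)
        curve.append((pm, sum(mu) / len(mu),
                      percentile(mu, tail), percentile(mu, 1 - tail)))
    return curve


# Hypothetical posterior draws and PM2.5 grid (ug/m3).
draws = [(2.60, -0.0030), (2.55, -0.0025), (2.65, -0.0035)]
curve = marginal_effect_curve(draws, pm_grid=[20, 80, 140])
for pm, mean, lower, upper in curve:
    print(pm, round(mean, 2), round(lower, 2), round(upper, 2))
```

In practice the draws would number in the thousands, and the same calculation would be repeated for each disease and reporting order to obtain one curve per panel.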
FIG. 3 is a graph showing the results of analysis of the variation in the numbers of infections and deaths for the diseases in the first five and the last five reported numbers, and it summarizes the infection and death statistics. As can be seen from FIG. 3, viral hepatitis caused the largest number of infections during the study period (18000) and a relatively high number of deaths (> 10); scarlet fever was next (about 15000 infections and fewer than 7 deaths), followed by bacillary dysentery (about 14000 infections and more than 7 deaths); influenza A H1N1, syphilis and hepatitis B caused the lowest numbers of infections and also the lowest numbers of deaths. In addition, the results show that viral hepatitis in the second reporting order caused the highest number of infections (> 18000) and the highest number of deaths (> 10), whereas influenza in the first reporting order caused only about 5000 infections and fewer than 5 deaths; the cumulative numbers of infections caused by scarlet fever and bacillary dysentery in the fifth reporting order were also relatively high (approximately 14000 and 13000). These results show that the correspondence between the number of infections and the reporting order holds only within a single report; over a period of time, the cumulative numbers of infections and deaths do not strictly correspond to the reporting order, which suggests that all infectious diseases deserve sufficient attention regardless of reporting order.
FIG. 4 shows the interaction between the number of statutory infectious disease infections and PM2.5 concentration in different reporting orders. Based on the results of the Bayesian regression model, the marginal effects show how the number of cases changes along the PM2.5 concentration gradient, with significant trends (P < 0.001). When hand-foot-and-mouth disease and tuberculosis are in the first reporting order, their numbers of infections decrease significantly as the PM2.5 concentration increases (P < 0.001); at a PM2.5 concentration of about 20 μg/m³ (light pollution), the number of cases is 13 (95% CI: 11.358, 14.671). Bacillary dysentery and influenza are relatively less affected by PM2.5 concentration (panel A of FIG. 4); at about 20 μg/m³ (light pollution), the number of cases is approximately 10 (95% CI: 8.620, 13.132). When bacillary dysentery is in the second reporting order, the number of cases increases very significantly as the PM2.5 concentration increases (P < 0.001) (panel B of FIG. 4); at a PM2.5 concentration of 140-160 μg/m³ (severe pollution), the number of bacillary dysentery cases exceeds 12 (95% CI: 10.985, 13.244). When infectious diseases are in the third or fourth reporting order, the number of cases instead decreases significantly as the PM2.5 concentration increases (P < 0.001) (panels C and D of FIG. 4), with case numbers below 10 (95% CI: 8.663, 12.840). In the fifth reporting order, the trend of case numbers with PM2.5 concentration is again similar to that of the first reporting order (panel E of FIG. 4), with case numbers reaching 17 or more (95% CI: 14.651, 20.850).
In addition, tuberculosis and bacillary dysentery appear relatively frequently as the principal infectious diseases, viral hepatitis is next, and hand-foot-and-mouth disease and influenza A H1N1 appear relatively rarely, but their trends with PM2.5 concentration differ; hand-foot-and-mouth disease shows a significant downward trend with a larger decrease in the first reporting order. These results reveal significant differences between the prevalence of each infectious disease and the PM2.5 concentration gradient in different reporting orders, i.e., the interaction between each statutory infectious disease and changes in PM2.5 concentration responds differently in different reporting orders.
According to the embodiment of the invention, by extracting the statutory infectious disease data of the target city and the corresponding immediate, short-term, medium-term and long-term average PM2.5 concentrations, and by using the Bayesian regression model to explore the interaction effect and response relationship between the lag effect of PM2.5 and statutory infectious diseases in different reporting orders, the incidence characteristics and epidemic intensity of statutory infectious diseases can be understood, which provides a scientific basis for governments to formulate infectious disease prevention and control measures and a reference for healthy travel by the public.
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.