Online capacity expansion IO scheduling method based on hot data and delete repeat operation
1. An online capacity expansion IO scheduling method based on hot data and delete repeat operation is characterized by comprising the following steps:
initializing a storage system;
monitoring user access requests over a period of time;
when the user access request is a write operation, comparing the physical address of the user access request with the destination address of the data block to be migrated; if the two address ranges are completely consistent, modifying the destination address of the write operation to the destination address of the data block; if the two ranges are partially consistent, reading the remaining missed data of the data block to be migrated, splicing it into the content of the write operation, and modifying the destination address of the write operation to the destination address of the data block to be migrated;
determining a hot data space with the maximum counter value in the period of time, and then generating a migration request corresponding to a data block needing to be migrated in the hot data space;
sorting the user access request and the migration request;
and executing the sorted user access request and the migration request.
2. The method according to claim 1, wherein the step of initializing the storage system comprises: partitioning hot data spaces sized as integer multiples of a storage stripe of the capacity expansion scheme, each hot data space comprising: physical address information, the size of the physical space, information on the data blocks the space needs to migrate, and a counter.
3. The online capacity expansion IO scheduling method based on hot data and delete repeat operations of claim 1, wherein the step of monitoring user access requests over a period of time comprises: analyzing, item by item, the type of each user access request and the physical address it accesses; and determining from the accessed physical address the hot data space to which the address belongs, and adding 1 to the counter of that hot data space.
4. The online capacity expansion IO scheduling method based on hot data and delete repeat operation of claim 1, further comprising the steps of:
analyzing a user access request and a migration request having the same operation content; if the two are completely identical for the content of a data block, directly modifying the target position at which the user write operation executes to the target position of the data block to be migrated; if the contents are partially identical, that is, part of the data block content is hit, generating a new read request, reading out the data content in the data block not hit by the user write operation, splicing it into the content of the user write operation, and likewise modifying the target position of the user write operation to the target position of the block to be migrated.
5. The online capacity expansion IO scheduling method based on hot data and delete repeat operation of claim 4, further comprising the steps of: releasing the processed user write operation request into the time period to await the next operation, sending a deletion request to the hot data space, and deleting the migration request having the same content as the user write operation from the information on data blocks to be migrated in the hot data space.
Background
With the deep integration of network technology into every industry, enterprise data center storage systems must hold ever larger volumes of data, and the capacity of an existing storage system can no longer meet the demands of the growing data scale. To increase the capacity of a data center storage system, the capacity space of the existing system must be expanded; this is the capacity expansion technology. In the process of storage cluster data expansion, existing expansion schemes mainly consider how to reduce the overhead of the data migration process and rarely consider the influence of user access behavior on expansion.
However, the access behavior of the user has a great influence on the capacity expansion process. First, intensive access behavior may increase the disk load; second, contention between user access requests and migration requests for the disk head causes a large amount of random reading and writing; third, user access requests and migration requests may repeatedly write the same data block. All of these reduce the efficiency of the capacity expansion scheme during online capacity expansion.
Therefore, there is an urgent need to provide an online capacity expansion IO scheduling method based on hot data and delete repeat operations.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide an online capacity expansion IO scheduling method based on hot data and delete repeat operations. The method comprises the following steps:
initializing a storage system;
monitoring user access requests over a period of time;
when the user access request is a write operation, comparing the physical address of the user access request with the destination address of the data block to be migrated; if the two address ranges are completely consistent, modifying the destination address of the write operation to the destination address of the data block; if the two ranges are partially consistent, reading the remaining missed data of the data block to be migrated, splicing it into the content of the write operation, and modifying the destination address of the write operation to the destination address of the data block to be migrated;
determining the hot data space with the maximum counter value in the time period, and then generating migration requests corresponding to the data blocks needing to be migrated in the hot data space;
sorting the user access request and the migration request;
and executing the sorted user access request and the migration request.
In an embodiment, the step of initializing the storage system comprises: partitioning hot data spaces sized as integer multiples of a storage stripe of the capacity expansion scheme, each hot data space comprising: physical address information, the size of the physical space, information on the data blocks the space needs to migrate, and a counter.
In an embodiment, the step of monitoring user access requests over a period of time comprises: analyzing, item by item, the type of each user access request and the physical address it accesses; and determining from the accessed physical address the hot data space to which the address belongs, and adding 1 to the counter of that hot data space.
In an embodiment, the method further comprises the step of: analyzing a user access request and a migration request having the same operation content; if the two are completely identical for the content of a data block, directly modifying the target position at which the user write operation executes to the target position of the data block to be migrated; if the contents are partially identical, that is, part of the data block content is hit, generating a new read request, reading out the data content in the data block not hit by the user write operation, splicing it into the content of the user write operation, and likewise modifying the target position of the user write operation to the target position of the block to be migrated.
In an embodiment, the method further comprises the step of: releasing the processed user write operation request into the time period to await the next operation, sending a deletion request to the hot data space, and deleting the migration request having the same content as the user write operation from the information on data blocks to be migrated in the hot data space.
The invention discloses an online capacity expansion IO scheduling method based on hot data and delete repeat operation, which optimizes the efficiency of the online capacity expansion process. First, the hot data intervals frequently accessed by users are migrated preferentially: because the capacity expansion algorithm migrates in order from the head of the disk to the tail, if users frequently access a region near the tail, the disk head jumps back and forth between migration operations and user accesses, consuming a great deal of time. Migrating the data blocks of the hot data area to the new disk first reduces the access pressure on the old disk, and at the same time brings the new disk into service for user access requests, increasing the bandwidth of the storage system and significantly improving the efficiency of online capacity expansion. On this basis, user write operations are further analyzed so that duplicate write requests are eliminated: when the block targeted by a user write is the same block a migration operation must move, the write only needs to be executed once. All access requests are sorted before execution, further reducing random read and write behavior among these accesses. Finally, considering the timeliness of hot data, all of the above operations are performed within a set time period.
Compared with scheduling methods in the prior art, the scheduling method of the invention has the following advantages:
(1) The bandwidth of the storage system is more fully utilized: the data blocks of the time-sensitive hot data area are migrated preferentially from the old disk to the new disk, which relieves the pressure on the old disk when responding to user access requests, brings the newly added disk into service for user requests as quickly as possible, rapidly increases the bandwidth of the storage system, and balances the load of the storage system.
(2) Fewer write operations: write operations are the most time-consuming part of the capacity expansion algorithm, because each write triggers an update of the parity block. By comparing the contents of user access requests and migration requests, the strategy eliminates some repeated write operations and greatly improves the efficiency of the online migration process.
(3) Faster user response time: the strategy rapidly increases the bandwidth available for user requests through preferential migration of hot data; reduces random reads and writes between user access requests and migration IO through sequential ordering; and, through redirection of write requests, eliminates some write operations, thereby reducing resource contention between user access requests and migration IO. Together these three points give the user a faster response time.
Specific embodiments of the present invention are disclosed in detail with reference to the following description and drawings, indicating the manner in which the principles of the invention may be employed. It should be understood that the embodiments of the invention are not so limited in scope. The embodiments of the invention include many variations, modifications and equivalents within the spirit and scope of the appended claims.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments, in combination with or instead of the features of the other embodiments.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, integers, steps or components but does not preclude the presence or addition of one or more other features, integers, steps or components.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in their description are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from them without inventive effort.
FIG. 1 is a flowchart of an online capacity expansion IO scheduling policy based on hot data and delete repeat operations according to the present invention;
FIG. 2 is a diagram illustrating an online capacity expansion IO scheduling policy architecture based on hot data and delete repeat operations according to the present invention.
Detailed Description
The technical solutions of the present invention are described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that these embodiments merely illustrate the present invention and are not intended to limit its scope; equivalent modifications made by those skilled in the art after reading the present invention fall within the scope of the appended claims.
The invention provides an online capacity expansion IO scheduling method based on hot data and delete repeat operation, which comprises the following steps:
initializing a storage system;
monitoring user access requests over a period of time;
when the user access request is a write operation, comparing the physical address of the user access request with the destination address of the data block to be migrated; if the two address ranges are completely consistent, modifying the destination address of the write operation to the destination address of the data block; if the two ranges are partially consistent, reading the remaining missed data of the data block to be migrated, splicing it into the content of the write operation, and modifying the destination address of the write operation to the destination address of the data block to be migrated;
determining the hot data space with the maximum counter value in the time period, and then generating migration requests corresponding to the data blocks needing to be migrated in the hot data space;
sorting the user access request and the migration request;
and executing the sorted user access request and the migration request.
With reference to fig. 1 and fig. 2, the online capacity expansion IO scheduling method based on hot data and delete repeat operation is further described; the method may include the following implementation steps:
(1) Initialize the storage system: divide new hot data spaces (Space) according to the storage stripe layout designed by the capacity expansion algorithm, and set the initial physical position of each hot data space, the information on the data blocks it needs to migrate, and the initialization parameters of its counter (count).
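As an illustrative sketch of step (1), and not part of the disclosure itself, the hot data space and its initialization can be modeled as follows; the stripe size, the stripe multiple, and all names are assumptions:

```python
from dataclasses import dataclass, field

STRIPE_SIZE = 64 * 1024        # assumed storage-stripe size in bytes (illustrative)
STRIPES_PER_SPACE = 4          # a hot data space is an integer multiple of a stripe

@dataclass
class HotDataSpace:
    start_addr: int                                      # initial physical position
    size: int                                            # size of the physical space
    blocks_to_migrate: set = field(default_factory=set)  # blocks this space must migrate
    counter: int = 0                                     # access counter (count)

def init_spaces(disk_size: int) -> list:
    """Divide the physical address range into hot data spaces (Space)."""
    space_size = STRIPE_SIZE * STRIPES_PER_SPACE
    return [HotDataSpace(start, space_size)
            for start in range(0, disk_size, space_size)]
```

Each space thus carries exactly the four pieces of state named above: its physical address, its size, the blocks it must migrate, and its counter.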
(2) Monitor user access requests within a period of time; the RequestDeal module analyzes and processes each user access request, mainly parsing its access type and access address.
(3) The RequestDeal module computes, from the user access address, the hot data space to which the accessed target position belongs, and adds 1 to the counter of that hot data space.
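Steps (2) and (3) amount to mapping the accessed physical address to its hot data space and incrementing that space's counter. A minimal sketch, assuming equally sized spaces and a plain list of counters (both assumptions for illustration):

```python
def space_of(addr: int, space_size: int) -> int:
    # index of the hot data space containing this physical address
    return addr // space_size

def record_access(counters: list, addr: int, space_size: int) -> None:
    """RequestDeal: locate the hot data space of the accessed address
    and add 1 to its counter."""
    counters[space_of(addr, space_size)] += 1
```

Over a time period, the space with the largest counter is the hottest and is migrated first, as step (7) of claim 1 requires.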
(4) The RequestDeal module judges the type of the user access. If the request is a read request, it is released to the time period module (TimePart) to await the next processing step. If it is a write request, the accessed content range is examined to judge whether the data block it operates on is the same as a data block that needs to be migrated; if not, the write request is released to the time period module to await the next processing step; if so, the write request is sent to the write redirect (WriteRedirect) module for processing.
(5) The WriteRedirect module analyzes the user request and the migration request that have the same operation content. If the two are completely identical for the content of a data block, the target position at which the user write operation executes is directly modified to the target position of the data block to be migrated. If they are partially identical, that is, part of the data block content is hit, WriteRedirect generates a new read request, reads out the data content in the block not hit by the user write operation, splices it into the content of the user write operation, and modifies the target position of the user write operation to the target position of the block to be migrated.
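A sketch of the WriteRedirect decision in step (5). The byte-offset model and the `read_block` callback (which reads still-unmigrated data from the old disk) are assumptions, as is the simplification that the write range lies within one data block:

```python
def redirect_write(write_off, data, blk_off, blk_size, blk_target, read_block):
    """Return (new_target, payload) for a user write that hits a block
    awaiting migration. Complete hit: only the target is changed.
    Partial hit: the missed head/tail of the block is read and spliced
    around the user's data, so the whole block is written once."""
    write_end = write_off + len(data)
    blk_end = blk_off + blk_size
    if write_off == blk_off and write_end == blk_end:
        return blk_target, data                        # complete hit: retarget only
    head = read_block(blk_off, write_off - blk_off)    # missed data before the write
    tail = read_block(write_end, blk_end - write_end)  # missed data after the write
    return blk_target, head + data + tail              # one write covers the block
```

Either way a single write lands at the block's migration target, so the separate migration write for that block becomes unnecessary and can be deleted in step (6).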
(6) The user write operation request processed by WriteRedirect is released to the time period module (TimePart) to await the next operation; at the same time a deletion request is sent to the hot data Space, and the migration request with the same content as the user write operation is deleted from the information on data blocks to be migrated in that hot data space.
(7) The time period module (TimePart) receives the requests (including user access requests and migration requests) passed from RequestDeal, WriteRedirect, and the hot data space, and subjects them to unified sequential ordering: the SortRequest module analyzes the target physical position of each request and orders requests with similar physical positions together as far as possible, so as to achieve sequential reading and writing.
(8) SortRequest puts the sorted requests into the Sort_T_array set, and the IO operations are sent to the storage system one by one in the order of the Sort_T_array set.
(9) After processing of the Sort_T_array set is completed, the time period module sends a counter reset request to the hot data space.
(10) After the reset is completed, the next round of IO scheduling starts according to steps (1) to (10).
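Steps (6) to (10) can be condensed into one scheduling round. Everything below is a sketch under assumed data shapes: a space is a dict with a 'counter' and a set of 'blocks' to migrate, and `dispatch` is a hypothetical callback that issues one IO to the storage system:

```python
def run_round(spaces, user_requests, dispatch):
    """One IO scheduling round: pick the hottest space, generate its
    migration requests, merge and sort everything by physical address
    (the Sort_T_array ordering), issue the IOs, then reset counters."""
    hottest = max(spaces, key=lambda s: s["counter"])   # maximum counter wins
    migrations = [{"addr": b, "op": "migrate"}
                  for b in sorted(hottest["blocks"])]
    batch = sorted(user_requests + migrations,
                   key=lambda r: r["addr"])             # sequential ordering
    for req in batch:
        dispatch(req)                                   # issue one by one
    for s in spaces:
        s["counter"] = 0                                # reset for the next period
    return batch
```

Sorting the merged batch by address is what turns the mix of user IO and migration IO into a near-sequential stream, which is the point of the TimePart/SortRequest stage.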
The invention discloses an IO scheduling method for optimizing the efficiency of the HS6 online capacity expansion process. First, the hot data intervals frequently accessed by users are migrated preferentially: because the HS6 capacity expansion algorithm migrates in order from the head of the disk to the tail, if users frequently access a region near the tail, the disk head jumps back and forth between migration operations and user accesses, consuming a great deal of time. Migrating the data blocks of the hot data area to the new disk first reduces the access pressure on the old disk, and at the same time brings the new disk into service for user access requests, increasing the bandwidth of the storage system and significantly improving the efficiency of online capacity expansion. On this basis, user write operations are further analyzed so that duplicate write requests are eliminated: when the block targeted by a user write is the same block a migration operation must move, the write only needs to be executed once. All access requests are sorted before execution, further reducing random read and write behavior among these accesses. Finally, considering the timeliness of hot data, all of the above operations are performed within a set time period.
Compared with scheduling methods in the prior art, the scheduling method of the invention has the following advantages:
(1) The bandwidth of the storage system is more fully utilized: the data blocks of the time-sensitive hot data area are migrated preferentially from the old disk to the new disk, relieving the pressure on the old disk when responding to user access requests and bringing the newly added disk into service for user requests as quickly as possible.
(2) Fewer write operations: write operations are the most time-consuming part of the HS6 capacity expansion algorithm, because each write triggers an update of the parity block. By comparing the contents of user access requests and migration requests, the method eliminates some repeated write operations and greatly improves the efficiency of the online migration process.
(3) Faster user response time: the method rapidly increases the bandwidth available for user requests through preferential migration of hot data; reduces random reads and writes between user access requests and migration IO through sequential ordering; and, through redirection of write requests, eliminates some write operations, thereby reducing resource contention between user access requests and migration IO. Together these three points give the user a faster response time.
It is to be understood that the above description is intended to be illustrative, and not restrictive. Many embodiments and applications other than the examples provided will be apparent to those of skill in the art upon reading the above description. The scope of the present teachings should therefore be determined not with reference to the above description, but with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. The disclosures of all articles and references, including patent applications and publications, are incorporated by reference for all purposes. The omission from the claims of any aspect of the subject matter disclosed herein is not a disclaimer of such subject matter, nor should it be construed that the inventors have not contemplated such subject matter as part of the disclosed inventive subject matter.