TY - JOUR
T1 - An Efficient-Assembler Whale Optimization Algorithm for DNA Fragment Assembly Problem: Analysis and Validations
AU - Abdel-basset, Mohamed
AU - Mohamed, Reda
AU - Sallam, Karam M.
AU - Chakrabortty, Ripon K.
AU - Ryan, Michael J.
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2020
Y1 - 2020
N2 - The study of deoxyribonucleic acid (DNA) is crucial in many fields, including medicine, biology, zoology, agriculture, and forensics. Since reading a DNA sequence is onerous because of its massive length, it is common in many DNA analysis applications to divide DNA strands into small segments or fragments which, after analysis, must be reassembled. Since this reassembly takes a non-specific polynomial time to solve, the DNA fragment assembly problem (DFAP) is NP-hard. This paper proposes a new assembler for tackling the DFAP based on the overlap-layout-consensus (OLC) approach. The proposed assembler adapts a discrete whale optimization algorithm (DWOA) using standard operators adopted from evolutionary algorithms to simulate the strategy adopted by humpback whales when searching for prey. For the first time, we formulate the behaviors of whales to be applied directly to any discrete optimization problem based on three primary operations: a swap-based best-position operator, an ordered crossover operator, and selection of a random whale operation to perform the exploitation and exploration phases of the algorithm. These operations were carefully designed to preserve the methodology of the original whale algorithm. DFAP is a multi-objective problem that seeks to reach the optimal order of segments that maximizes the overlap score and minimizes the number of contigs (set of overlapping DNA segments) to compose a one-contig DNA strand. Existing local search methods, such as problem aware local search (PALS) many non-conflicting movements (PALS2-many), suffer from being trapped in local optima. Hence, the integration of DWOA with PALS2-many improves the search capability for finding the optimal order of fragments. In addition, we propose a new variation of PALS2-many that achieves simultaneously the two objectives of DFAP. Our proposed DWOA was compared with a number of the most recent robust assemblers: a hybrid crow search algorithm for solving the DFAP (CSA-P2M*Fit), P2M*Fit, and a hybrid genetic algorithm (GA-P2M*Fit). The experimental results and statistical analyses of the proposed DWOA on thirty benchmark instances show that DWOA significantly outperforms those algorithms in reaching fewer contigs, in addition to being competitive with CSA-P2M*Fit and superior to P2M*Fit and GA-P2M*Fit for the overlap score.
AB - The study of deoxyribonucleic acid (DNA) is crucial in many fields, including medicine, biology, zoology, agriculture, and forensics. Since reading a DNA sequence is onerous because of its massive length, it is common in many DNA analysis applications to divide DNA strands into small segments or fragments which, after analysis, must be reassembled. Since this reassembly takes a non-specific polynomial time to solve, the DNA fragment assembly problem (DFAP) is NP-hard. This paper proposes a new assembler for tackling the DFAP based on the overlap-layout-consensus (OLC) approach. The proposed assembler adapts a discrete whale optimization algorithm (DWOA) using standard operators adopted from evolutionary algorithms to simulate the strategy adopted by humpback whales when searching for prey. For the first time, we formulate the behaviors of whales to be applied directly to any discrete optimization problem based on three primary operations: a swap-based best-position operator, an ordered crossover operator, and selection of a random whale operation to perform the exploitation and exploration phases of the algorithm. These operations were carefully designed to preserve the methodology of the original whale algorithm. DFAP is a multi-objective problem that seeks to reach the optimal order of segments that maximizes the overlap score and minimizes the number of contigs (set of overlapping DNA segments) to compose a one-contig DNA strand. Existing local search methods, such as problem aware local search (PALS) many non-conflicting movements (PALS2-many), suffer from being trapped in local optima. Hence, the integration of DWOA with PALS2-many improves the search capability for finding the optimal order of fragments. In addition, we propose a new variation of PALS2-many that achieves simultaneously the two objectives of DFAP. Our proposed DWOA was compared with a number of the most recent robust assemblers: a hybrid crow search algorithm for solving the DFAP (CSA-P2M*Fit), P2M*Fit, and a hybrid genetic algorithm (GA-P2M*Fit). The experimental results and statistical analyses of the proposed DWOA on thirty benchmark instances show that DWOA significantly outperforms those algorithms in reaching fewer contigs, in addition to being competitive with CSA-P2M*Fit and superior to P2M*Fit and GA-P2M*Fit for the overlap score.
KW - DNA fragments assembly problem
KW - DNA sequence
KW - overlap-layout-consensus
KW - whale optimization algorithm
UR - http://www.scopus.com/inward/record.url?scp=85098285365&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2020.3044857
DO - 10.1109/ACCESS.2020.3044857
M3 - Article
SN - 2169-3536
VL - 8
SP - 222144
EP - 222167
JO - IEEE Access
JF - IEEE Access
ER -