TY - JOUR
T1 - Decision tree algorithms for image data type identification
AU - NGUYEN, Khoa
AU - TRAN, Dat
AU - MA, Wanli
AU - SHARMA, Dharmendra
PY - 2017/2
Y1 - 2017/2
N2 - Identifying the file type of file fragments has been investigated for a long time, but it remains a challenge. The literature reports that high-entropy file fragments make the problem more complicated. In particular, popular file types share the same compression algorithms, such as the deflate algorithm, which makes file type identification for file fragments harder. Machine learning and empirical techniques have been applied to deal with this problem. Compression algorithms are used to reduce the size of large files, including image files. Much research on file type identification has been done for the JPEG format, and the Rate of Change feature has been shown to work effectively for it. Conversely, few efforts have been made for PNG, although it is a popular and widely used image format. In this article, we propose a new approach based on deflate-encoded data detection, entropy-based clustering, and decision tree techniques to identify PNG data fragments, which are deflate-encoded fragments. Experiments showed high accuracy rates for the proposed method.
AB - Identifying the file type of file fragments has been investigated for a long time, but it remains a challenge. The literature reports that high-entropy file fragments make the problem more complicated. In particular, popular file types share the same compression algorithms, such as the deflate algorithm, which makes file type identification for file fragments harder. Machine learning and empirical techniques have been applied to deal with this problem. Compression algorithms are used to reduce the size of large files, including image files. Much research on file type identification has been done for the JPEG format, and the Rate of Change feature has been shown to work effectively for it. Conversely, few efforts have been made for PNG, although it is a popular and widely used image format. In this article, we propose a new approach based on deflate-encoded data detection, entropy-based clustering, and decision tree techniques to identify PNG data fragments, which are deflate-encoded fragments. Experiments showed high accuracy rates for the proposed method.
KW - Decision tree algorithm
KW - File fragment identification
KW - Image data type identification
KW - PNG
KW - SVM
KW - Shannon entropy
UR - http://www.scopus.com/inward/record.url?scp=85014717038&partnerID=8YFLogxK
UR - http://www.mendeley.com/research/decision-tree-algorithms-image-data-type-identification
U2 - 10.1093/jigpal/jzw045
DO - 10.1093/jigpal/jzw045
M3 - Article
SN - 1367-0751
VL - 25
SP - 67
EP - 82
JO - Interest Group in Pure and Applied Logics. Logic Journal
JF - Interest Group in Pure and Applied Logics. Logic Journal
IS - 1
ER -