Decision tree algorithms for image data type identification

Research output: Contribution to journal › Article

Abstract

Identifying the file type of file fragments has been studied for a long time, but it remains a challenge. The literature shows that high-entropy fragments make the problem harder. In particular, many popular file types share the same compression algorithms, such as deflate, which makes it more difficult to identify the type of an individual fragment. Machine learning and empirical techniques have been applied to this problem. Compression algorithms are used to reduce the size of large files, including image files. Much of the research on file type identification has focused on the JPEG format, for which the Rate of Change feature has proven effective. By contrast, little work has addressed PNG, even though it is a popular and widely used image format. In this article, we propose a new approach that combines deflate-encoded data detection, entropy-based clustering, and decision tree techniques to identify PNG data fragments, which are deflate-encoded. Experiments showed high accuracy for the proposed method.
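The abstract combines an entropy feature with a decision tree. The sketch below illustrates only that general idea and is not the authors' method: it computes the Shannon entropy of each fragment's byte histogram (in bits per byte) and fits a one-level decision tree (a stump, split by Gini impurity) on that single feature. The helper names (`byte_entropy`, `fit_stump`, `predict`, `make_fragments`), the 512-byte fragment size, and the synthetic training data are all hypothetical assumptions; zlib output stands in for deflate-encoded content such as PNG image data.

```python
import math
import random
import zlib
from collections import Counter


def byte_entropy(fragment: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    n = len(fragment)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(fragment).values())


def gini(labels: list) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels: list):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(values: list, labels: list):
    """Fit a depth-1 decision tree on one numeric feature (hypothetical helper).

    Candidate thresholds are midpoints between adjacent distinct sorted values;
    the threshold with the lowest weighted Gini impurity is kept.
    Returns (threshold, label_if_le, label_if_gt).
    """
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue
        t = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[0]:
            best = (score, t, majority(left), majority(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    _, t, left_label, right_label = best
    return t, left_label, right_label


def predict(stump, fragment: bytes):
    """Classify a fragment by comparing its entropy with the stump threshold."""
    t, left_label, right_label = stump
    return left_label if byte_entropy(fragment) <= t else right_label


def make_fragments(seed: int, size: int = 512):
    """Synthetic data (assumption): raw text slices vs. slices of its deflate stream."""
    rng = random.Random(seed)
    vocab = [f"word{k}" for k in range(64)]
    text = " ".join(rng.choice(vocab) for _ in range(40000)).encode()
    packed = zlib.compress(text, 9)
    frags = []
    # Start past the stream header and take non-adjacent slices.
    for off in range(size * 2, len(packed) - size, size * 4):
        frags.append((packed[off:off + size], "deflate"))
        frags.append((text[off:off + size], "text"))
    return frags


if __name__ == "__main__":
    train = make_fragments(seed=1)
    stump = fit_stump([byte_entropy(f) for f, _ in train], [lab for _, lab in train])
    test = make_fragments(seed=2)
    accuracy = sum(predict(stump, f) == lab for f, lab in test) / len(test)
    print(f"threshold={stump[0]:.2f} bits/byte, accuracy={accuracy:.2f}")
```

A stump like this can only separate low-entropy fragments from high-entropy ones. It cannot tell PNG data apart from other deflate-based formats, because their deflate streams all look similar at the byte-histogram level. That harder distinction is the problem the article targets with its combination of deflate-encoded data detection, entropy-based clustering, and decision trees.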
Original language: English
Pages (from-to): 67-82
Number of pages: 16
Journal: Interest Group in Pure and Applied Logics. Logic Journal
Volume: 25
Issue number: 1
Publication status: Published, 2017