Identifying the file type of file fragments has been investigated for a long time, but it remains a challenge. The literature shows that high-entropy file fragments make the problem more complicated. In particular, many popular file types share the same compression algorithms, such as deflate, which makes file type identification for fragments even harder. Machine learning and empirical techniques are typically applied to deal with this problem. Compression algorithms are used to reduce the size of large files, including image files. Much research on file type identification has been done for the JPEG format, and the Rate of Change feature has been shown to work effectively for it. In contrast, little effort has been devoted to PNG, although it is a popular and widely used image format. In this article, we propose a new approach based on deflate-encoded data detection, entropy-based clustering, and decision tree techniques to identify PNG data fragments, which are deflate-encoded fragments. Experiments showed high accuracy rates for the proposed method.
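To illustrate what "high-entropy" means for a fragment, the sketch below computes the Shannon entropy of a byte string in bits per byte. It is a minimal example under stated assumptions, not the feature extraction used in the article: the function name `byte_entropy` and the sample inputs are hypothetical, and the abstract does not specify how entropy is measured or clustered.

```python
import math
import zlib
from collections import Counter


def byte_entropy(fragment: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0).

    Hypothetical helper for illustration; not taken from the article.
    """
    if not fragment:
        return 0.0
    total = len(fragment)
    counts = Counter(fragment)  # frequency of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Assumed sample data: plain text versus its deflate-compressed form.
    plain = "\n".join(f"record {i}: value {i * i}" for i in range(5000)).encode()
    deflated = zlib.compress(plain)
    print(f"plain text entropy:   {byte_entropy(plain):.2f} bits/byte")
    print(f"deflate data entropy: {byte_entropy(deflated):.2f} bits/byte")
```

Deflate-encoded data generally scores much closer to the 8 bits/byte maximum than uncompressed text does, which is why byte-frequency statistics alone tend to be weak at telling compressed formats apart.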
| Number of pages | 16 |
| Journal | Interest Group in Pure and Applied Logics. Logic Journal |
| Publication status | Published - Feb 2017 |