A proposed approach to compound file fragment identification

Research output: A Conference proceeding or a Chapter in Book › Conference contribution

1 Citation (Scopus)

Abstract

One of the biggest challenges in file fragment classification is the low classification rate for compound files: high-entropy files that contain different types of data, such as images and compressed text. Current methods for file fragment classification may perform poorly on these compound files. In this paper we propose a novel approach that detects deflate-encoded data in compound file fragments and decompresses that data before applying a machine learning technique for classification. We apply the proposed method to classify the Adobe Portable Document Format (PDF) file type. Experiments showed a high classification rate for the proposed method.
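The abstract gives only the outline of the method. The Python sketch below illustrates the detect-then-decompress idea under stated assumptions: it looks only for zlib-wrapped deflate streams (the wrapper used by PDF's FlateDecode filter), validates candidate headers against RFC 1950, and uses a 256-bin byte histogram plus Shannon entropy as features. The function names, the `min_output` threshold, and the feature choice are hypothetical illustrations, not the authors' published implementation; the paper's actual classifier and features are not given in this record, and any classifier (for example an SVM) could consume the resulting vector.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def is_zlib_header(cmf: int, flg: int) -> bool:
    """RFC 1950 check: deflate method (CM = 8), window of at most 32 KiB,
    no preset dictionary, and CMF*256 + FLG divisible by 31."""
    return (
        cmf & 0x0F == 8
        and cmf >> 4 <= 7
        and not flg & 0x20
        and ((cmf << 8) | flg) % 31 == 0
    )


def inflate_embedded_streams(fragment: bytes, min_output: int = 32) -> list[bytes]:
    """Locate zlib-wrapped deflate streams in a fragment and decompress them.

    A stream cut off at the end of the fragment still yields its partial
    output. Candidates that raise zlib.error, or that produce fewer than
    min_output bytes, are treated as false positives (an assumed heuristic).
    """
    streams = []
    i = 0
    while i < len(fragment) - 1:
        if is_zlib_header(fragment[i], fragment[i + 1]):
            d = zlib.decompressobj()
            try:
                out = d.decompress(fragment[i:])
            except zlib.error:
                out = b""
            if len(out) >= min_output:
                streams.append(out)
                # Resume scanning after the bytes this stream consumed.
                i = len(fragment) - len(d.unused_data)
                continue
        i += 1
    return streams


def byte_histogram(data: bytes) -> list[float]:
    """256-bin normalised byte-frequency vector."""
    counts = Counter(data)
    n = len(data) or 1
    return [counts.get(b, 0) / n for b in range(256)]


def fragment_features(fragment: bytes) -> list[float]:
    """Features of the decompressed content when deflate data is found,
    otherwise of the raw fragment, plus normalised entropy (257 values)."""
    streams = inflate_embedded_streams(fragment)
    content = b"".join(streams) if streams else fragment
    return byte_histogram(content) + [shannon_entropy(content) / 8.0]


if __name__ == "__main__":
    text = b"BT /F1 12 Tf 72 712 Td (Hello) Tj ET\n" * 40
    fragment = (
        b"<< /Filter /FlateDecode >>\nstream\n"
        + zlib.compress(text)
        + b"\nendstream\nendobj\n"
    )
    print("raw entropy:", round(shannon_entropy(fragment), 2))
    print("inflated entropy:", round(shannon_entropy(text), 2))
    print("recovered:", inflate_embedded_streams(fragment) == [text])
```

This sketch does not recover streams that begin before the fragment boundary: deflate decoding needs the start of the stream (block headers and back-reference history), so such data would need a different technique.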
Original language: English
Title of host publication: International Conference on Network and System Security
Subtitle of host publication: 8th International Conference, NSS 2014 Xi’an, China, October 15-17, 2014 Proceedings
Editors: Man Ho Au, Barbara Carminati, C.-C. Jay Kuo
Place of Publication: Cham, Switzerland
Publisher: Springer
Pages: 493-500
Number of pages: 8
Volume: 8792
ISBN (Electronic): 9783319116983
ISBN (Print): 9783319116976
DOIs: https://doi.org/10.1007/978-3-319-11698-3_38
Publication status: Published - 2014
Event: 8th International Conference, Network and System Security 2014 - Xian, China
Duration: 15 Oct 2014 to 17 Oct 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8792
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 8th International Conference, Network and System Security 2014
Country: China
City: Xian
Period: 15/10/14 to 17/10/14

Fingerprint

Learning systems
Entropy
Experiments

Cite this

TRAN, D., MA, W., & SHARMA, D. (2014). A proposed approach to compound file fragment identification. In M. H. Au, B. Carminati, & C.-C. J. Kuo (Eds.), International Conference on Network and System Security: 8th International Conference, NSS 2014 Xi’an, China, October 15-17, 2014 Proceedings (Vol. 8792, pp. 493-500). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8792). Cham, Switzerland: Springer. https://doi.org/10.1007/978-3-319-11698-3_38
@inproceedings{666a4a6ed9b24e6bac0574f3fc3b3bb7,
title = "A proposed approach to compound file fragment identification",
abstract = "One of the biggest challenges in file fragment classification is the low classification rate for compound files: high-entropy files that contain different types of data, such as images and compressed text. Current methods for file fragment classification may perform poorly on these compound files. In this paper we propose a novel approach that detects deflate-encoded data in compound file fragments and decompresses that data before applying a machine learning technique for classification. We apply the proposed method to classify the Adobe Portable Document Format (PDF) file type. Experiments showed a high classification rate for the proposed method.",
keywords = "Compound file fragment classification, Digital forensics, File type classification, Network forensics",
author = "Dat TRAN and Wanli MA and Dharmendra SHARMA",
year = "2014",
doi = "10.1007/978-3-319-11698-3_38",
language = "English",
isbn = "9783319116976",
volume = "8792",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer",
pages = "493--500",
editor = "Au, {Man Ho} and Barbara Carminati and Kuo, {C.-C. Jay}",
booktitle = "International Conference on Network and System Security",
address = "Cham, Switzerland",

}


TY - GEN

T1 - A proposed approach to compound file fragment identification

AU - TRAN, Dat

AU - MA, Wanli

AU - SHARMA, Dharmendra

PY - 2014

Y1 - 2014

N2 - One of the biggest challenges in file fragment classification is the low classification rate for compound files: high-entropy files that contain different types of data, such as images and compressed text. Current methods for file fragment classification may perform poorly on these compound files. In this paper we propose a novel approach that detects deflate-encoded data in compound file fragments and decompresses that data before applying a machine learning technique for classification. We apply the proposed method to classify the Adobe Portable Document Format (PDF) file type. Experiments showed a high classification rate for the proposed method.

AB - One of the biggest challenges in file fragment classification is the low classification rate for compound files: high-entropy files that contain different types of data, such as images and compressed text. Current methods for file fragment classification may perform poorly on these compound files. In this paper we propose a novel approach that detects deflate-encoded data in compound file fragments and decompresses that data before applying a machine learning technique for classification. We apply the proposed method to classify the Adobe Portable Document Format (PDF) file type. Experiments showed a high classification rate for the proposed method.

KW - Compound file fragment classification

KW - Digital forensics

KW - File type classification

KW - Network forensics

UR - http://www.scopus.com/inward/record.url?scp=84908680717&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-11698-3_38

DO - 10.1007/978-3-319-11698-3_38

M3 - Conference contribution

SN - 9783319116976

VL - 8792

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 493

EP - 500

BT - International Conference on Network and System Security

A2 - Au, Man Ho

A2 - Carminati, Barbara

A2 - Kuo, C.-C. Jay

PB - Springer

CY - Cham, Switzerland

ER -