Shaping datasets: Optimal data selection for specific target distributions across dimensions

Vassilios Vonikakis, Ramanathan Subramanian, Stefan Winkler

Research output: A Conference proceeding or a Chapter in Book; Conference contribution; peer-review

11 Citations (Scopus)

Abstract

This paper presents a method for dataset manipulation based on Mixed Integer Linear Programming (MILP). The proposed optimization can narrow down a dataset to a particular size, while enforcing specific distributions across different dimensions. It essentially leverages the redundancies of an initial dataset in order to generate more compact versions of it, with a specific target distribution across each dimension. If the desired target distribution is uniform, then the effect is balancing: all values across all different dimensions are equally represented. Other types of target distributions can also be specified, depending on the nature of the problem. The proposed approach may be used in machine learning, for shaping training and testing datasets, or in crowdsourcing, for preparing datasets of a manageable size.
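
The selection problem described in the abstract can be illustrated on a toy dataset. The sketch below is not the paper's MILP formulation: it enumerates every subset of the requested size by brute force and keeps the one whose per-dimension value counts are closest, in summed absolute deviation, to the target counts. The function name select_subset, the absolute-deviation cost, and the toy data are all assumptions made for illustration. Exhaustive search is feasible only for very small datasets; larger problems would call for an optimization solver, such as the MILP approach the paper proposes.

```python
from itertools import combinations


def select_subset(items, k, targets):
    """Pick k items whose per-dimension histograms best match the targets.

    items   : list of tuples, one categorical value per dimension
    targets : list of dicts (one per dimension), value -> desired count
    Returns (indices of the chosen items, total absolute deviation).
    Hypothetical helper: brute force, not a MILP solver.
    """
    best, best_cost = None, None
    for combo in combinations(range(len(items)), k):
        cost = 0
        for d, target in enumerate(targets):
            # Histogram of dimension d over the candidate subset.
            counts = {}
            for i in combo:
                value = items[i][d]
                counts[value] = counts.get(value, 0) + 1
            # Sum of absolute deviations from the target counts.
            values = set(target) | set(counts)
            cost += sum(abs(counts.get(v, 0) - target.get(v, 0)) for v in values)
        if best_cost is None or cost < best_cost:
            best, best_cost = combo, cost
    return best, best_cost


# Toy dataset (assumed): two dimensions, skewed towards "a" and "x".
items = [
    ("a", "x"), ("a", "x"), ("a", "x"), ("a", "y"),
    ("b", "x"), ("b", "y"), ("a", "x"), ("a", "y"),
]

# Uniform target for a subset of size 4: two of each value per dimension.
targets = [{"a": 2, "b": 2}, {"x": 2, "y": 2}]

best, cost = select_subset(items, 4, targets)
print("chosen indices:", best)
print("total deviation:", cost)
```

With a uniform target, as here, the effect is the balancing described in the abstract: the redundant ("a", "x") items are discarded so that every value is equally represented in each dimension. A non-uniform target is expressed simply by changing the desired counts.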

Original language: English
Title of host publication: 2016 IEEE International Conference on Image Processing, ICIP 2016 - Proceedings
Editors: Fernando Pereira, Gaurav Sharma
Place of Publication: United States
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 3753-3757
Number of pages: 5
ISBN (Electronic): 9781467399616
Publication status: Published - 3 Aug 2016
Externally published: Yes
Event: 23rd IEEE International Conference on Image Processing, ICIP 2016 - Phoenix, United States
Duration: 25 Sept 2016 to 28 Sept 2016

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
Volume: 2016-August
ISSN (Print): 1522-4880

Conference

Conference: 23rd IEEE International Conference on Image Processing, ICIP 2016
Country/Territory: United States
City: Phoenix
Period: 25/09/16 to 28/09/16
