Management of time series data

  • Abel Matus Castillejos

    Student thesis: Professional Doctorate

    Abstract

    Every day large volumes of data are collected in the form of time series. Time series are collections of events or observations, predominantly numeric in nature, sequentially recorded on a regular or irregular time basis. Time series are becoming increasingly important in nearly every organisation and industry, including banking, finance, telecommunication, and transportation. Banking institutions, for instance, rely on the analysis of time series for forecasting economic indices, elaborating financial market models, and registering international trade operations. More and more time series are being used in this type of investigation and becoming a valuable resource in today's organisations. This thesis investigates and proposes solutions to some current and important issues in time series data management (TSDM),using Design Science Research Methodology. The thesis presents new models for mapping time series data to relational databases which optimise the use of disk space, can handle different time granularities, status attributes, and facilitate time series data manipulation in a commercial Relational Database Management System (RDBMS). These new models provide a good solution for current time series database applications with RDBMS and are tested with a case study and prototype with financial time series information. Also included is a temporal data model for illustrating time series data lifetime behaviour based on a new set of time dimensions (confidentiality, definitiveness, validity, and maturity times) specially targeted to manage time series data which are introduced to correctly represent the different status of time series data in a timeline. The proposed temporal data model gives a clear and accurate picture of the time series data lifecycle. Formal definitions of these time series dimensions are also presented. In addition, a time series grouping mechanism in an extensible commercial relational database system is defined, illustrated, and justified. The extension consists of a new data type and its corresponding rich set of routines that support modelling and operating time series information within a higher level of abstraction. It extends the capability of the database server to organise and manipulate time series into groups. Thus, this thesis presents a new data type that is referred to as GroupTimeSeries, and its corresponding architecture and support functions and operations. Implementation options for the GroupTimeSeries data type in relational based technologies are also presented. Finally, a framework for TSDM with enough expressiveness of the main requirements of time series application and the management of that data is defined. The framework aims at providing initial domain know-how and requirements of time series data management, avoiding the impracticability of designing a TSDM system on paper from scratch. Many aspects of time series applications including the way time series data are organised at the conceptual level are addressed. The central abstraction for the proposed domain specific framework is the notions of business sections, group of time series, and time series itself. The framework integrates comprehensive specification regarding structural and functional aspects for time series data management. A formal framework specification using conceptual graphs is also explored.
    Date of Award2006
    Original languageEnglish
    SupervisorMasoud MOHAMMADIAN (Supervisor) & Dharmendra Sharma AM PhD (Supervisor)

    Cite this

    '