Random Sampling as a Clutter Reduction Technique to Facilitate Interactive Visualisation of Large Datasets

Within our physical world lies a digital world populated with an ever-increasing number of sizeable data collections. Exploring these large datasets for patterns or trends is a difficult and complex task, especially when users do not always know what they are looking for. Information visualisation can facilitate this task through an interactive visual representation that makes the data easier to interpret. However, there is a limit to the amount of data that can be plotted before the display becomes overcrowded or cluttered, at which point potentially important information may be hidden.

The main theme of this work is to investigate the use of dynamic random sampling for reducing display clutter.

Although randomness has been applied successfully in many areas of computer science, and sampling has been used in data processing, the use of random sampling as a dynamic clutter reduction technique is novel. In addition, random sampling is particularly suitable for exploratory tasks because it reduces the amount of data without the user having to decide which data are important. Sampling-based scatterplot and parallel coordinate visualisations are developed to experiment with various options and tools. These include simple, dynamic sampling controls with density feedback; a method of checking the reality of the representative sample; the option of global and/or localised clutter reduction using a variety of novel lenses; and an auto-sampling option that automatically maintains a reasonable view of the data within the lens. Furthermore, this work showed that sampling can be added to existing tools and used effectively in conjunction with other clutter reduction techniques.

Sampling is evaluated both analytically, using a taxonomy of clutter reduction developed for the purpose, and experimentally, using large datasets. The analytic route was prompted by an exploratory analysis, which showed that evaluating information visualisations through user studies is problematic.

Novel characteristics of the work

  1. Prior to the preliminary papers [Dix and Ellis 02, Ellis and Dix 02], there was no dynamic control for sampling within information visualisations, so the application of random sampling to clutter reduction is innovative. This has led to a systematic exploration and analysis of the highly scalable sampling technique, in particular for scatterplots and parallel coordinate plots.

  2. An effective approach to generating random samples, the z-index method (Sections 2.2 and 4.1), has been devised. It not only ensures the display continuity necessary for interactive clutter reduction but also provides a means for the user to check whether perceived artefacts are real.

  3. Sampling has also been applied as a lens-based focus+context technique with automatic sampling rate adjustment. A noteworthy aspect of this work has been the creation of a metric and accompanying method for calculating occlusion in parallel coordinate plots (Chapter 5). Moreover, the metric is underpinned by a theoretical model developed by the author and supervisor.

  4. Although the classification of clutter reduction techniques was initially devised as a method for evaluating the sampling-based approach, its scope broadened. The outcome is the Clutter-reduction Taxonomy (Section 3.4), a novel criteria-based taxonomy of clutter reduction for information visualisation, which visualisation designers can use to critique existing visualisations and inform new ones. The literature-based analytical approach used in the construction of the taxonomy is also a novel feature of this work.
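Item 2 names the z-index method but does not specify it. As a minimal sketch, assuming the method gives every data item one fixed random key (its "z-index") in [0, 1) and displays the items whose key falls below the current sampling rate, the Python below shows why such a scheme yields display continuity: raising the rate only adds items and lowering it only removes them. Drawing fresh keys gives a different sample of similar size, which is one way a user could check whether a perceived pattern persists. The class `ZIndexSampler` and its methods are hypothetical names used only for illustration.

```python
import random


class ZIndexSampler:
    """Hypothetical sketch: one fixed random key per item, reused at every rate."""

    def __init__(self, n_items, seed=None):
        self.rng = random.Random(seed)
        # random() returns a float in [0.0, 1.0), so rate 1.0 shows every item.
        self.keys = [self.rng.random() for _ in range(n_items)]

    def sample(self, rate):
        """Indices of items whose key is below `rate` (rate 0.0 shows nothing)."""
        return [i for i, key in enumerate(self.keys) if key < rate]

    def resample(self):
        """Replace all keys, giving a new sample of similar size at each rate."""
        self.keys = [self.rng.random() for _ in self.keys]


sampler = ZIndexSampler(10_000, seed=42)
sparse = set(sampler.sample(0.1))
denser = set(sampler.sample(0.2))
print(sparse <= denser)  # True: raising the rate never removes a displayed item
sampler.resample()
print(len(sampler.sample(0.1)))  # about 1,000 items, but a different subset
```

Because the keys are fixed, continuity costs only one random draw per item, and a re-sample is a single pass over the keys; an artefact that survives several re-samples is more likely to reflect the data than the sample.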


Structure of the thesis

Chapter 1 looks at the problems of visualising large datasets and some of the approaches that have been tried. The use of randomness in computer science and data processing is then considered. Having exposed the main issues, the objectives of the work are stated, followed by an outline of the approach taken. The novel characteristics of the work are then described, together with the contributions that have been made to the research area. Finally, the structure of the thesis is outlined.

Chapter 2 investigates the use of random sampling to reduce clutter in overcrowded displays. Issues pertinent to sampling are raised by considering sampling-based star gazing, and the z-index method is proposed as a solution to many of these issues. Three visualisations that use sampling in different ways are then discussed. Relevant statistical sampling methods are examined before current database support for sampling is considered. The sampling-based visualisation proposed in this chapter leads into the design, implementation and experimental phase of this work, which begins in Chapter 4.

Chapter 3 presents the Clutter-reduction Taxonomy for information visualisation and describes the novel method used in its construction. A review of classification schemes for information visualisation is presented, highlighting two schemes that relate specifically to clutter reduction. The clutter reduction techniques and criteria used in the taxonomy are described, and the taxonomy table and accompanying discussion notes are presented. The utility of the taxonomy is demonstrated through several examples of its use to critique existing visualisations and propose new ones. The taxonomy is compared to the two clutter reduction classifications, and the importance of criteria is illustrated.

Chapter 4 documents the development of the first sampling-based scatterplot and parallel coordinate visualisations. The effectiveness of sampling in dynamic clutter reduction is demonstrated and compared with three other techniques: changing opacity, changing point size and filtering. A lens-based sampling visualisation is implemented to provide a focus+context solution to large variations in overplotting density across a plot. The z-index method for generating samples is successfully adapted for the lens and also for re-sampling. The requirement for automatic adjustment of the lens sampling rate is identified; its pursuit and resolution are described in Chapter 5.
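Chapter 4 identifies the need for the lens to adjust its own sampling rate. The thesis drives this from an occlusion metric for parallel coordinates (Chapter 5); as a simpler, hypothetical stand-in, the sketch below chooses the rate inside a circular scatterplot lens so that roughly a target number of points is shown, and samples with one fixed random key per point so that the lens view changes smoothly as the rate changes. The item-count criterion and the function names `points_in_lens`, `auto_rate` and `lens_sample` are assumptions for illustration only.

```python
import random


def points_in_lens(points, cx, cy, radius):
    """Indices of points inside a circular lens centred at (cx, cy)."""
    r2 = radius * radius
    return [i for i, (x, y) in enumerate(points)
            if (x - cx) ** 2 + (y - cy) ** 2 <= r2]


def auto_rate(n_in_lens, target):
    """Rate giving about `target` visible points; show everything if sparse.

    Hypothetical criterion: a simple item count, not the thesis's occlusion metric.
    """
    if n_in_lens <= target:
        return 1.0
    return target / n_in_lens


def lens_sample(points, keys, cx, cy, radius, target):
    """Sample only the points under the lens, using each point's fixed key."""
    inside = points_in_lens(points, cx, cy, radius)
    rate = auto_rate(len(inside), target)
    return rate, [i for i in inside if keys[i] < rate]


rng = random.Random(0)
pts = [(rng.random(), rng.random()) for _ in range(20_000)]
keys = [rng.random() for _ in pts]  # one fixed random key per point
rate, shown = lens_sample(pts, keys, 0.5, 0.5, 0.1, target=200)
print(f"rate={rate:.3f}, shown={len(shown)}")
```

Outside the lens the plot can stay at a global rate; only `auto_rate` would need replacing to use a density- or occlusion-based criterion instead of a point count.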

Chapter 5 describes the work undertaken to facilitate an auto-sampling lens for parallel coordinates. An occlusion metric is first defined and justified through a series of practical experiments and a theoretical model. Three very different methods for calculating occlusion are then assessed for accuracy and efficiency by means of an extensive empirical study. The proposed solution is both very efficient and accurate, which is counterintuitive given its theoretical underpinnings.
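The abstract does not define the occlusion metric. Purely as an illustration, assuming occlusion is measured on a pixel grid as the fraction of line-pixel hits that land on a cell already drawn by another line, the sketch below rasterises normalised parallel coordinate polylines crudely (one sample per column between adjacent axes) and counts the hidden hits. This hypothetical definition and brute-force raster approach are not claimed to be the thesis's metric or any of the three methods it compares.

```python
import random
from collections import Counter


def rasterise_line(values, cols_per_gap, height):
    """Grid cells (gap, column, row) touched by one polyline.

    `values` holds the line's position on each axis, normalised to [0, 1].
    One sample per column, so steep segments are drawn sparsely (a crude raster).
    """
    cells = set()
    for gap in range(len(values) - 1):
        a, b = values[gap], values[gap + 1]
        for col in range(cols_per_gap):
            y = a + (b - a) * col / cols_per_gap
            row = min(height - 1, int(y * height))
            cells.add((gap, col, row))
    return cells


def occlusion(lines, cols_per_gap=100, height=100):
    """Fraction of line-pixel hits beyond the first in each cell (i.e. hidden)."""
    hits = Counter()
    for values in lines:
        hits.update(rasterise_line(values, cols_per_gap, height))
    total = sum(hits.values())
    if total == 0:
        return 0.0
    return (total - len(hits)) / total


rng = random.Random(1)
data = [[rng.random() for _ in range(5)] for _ in range(2_000)]
sample = [row for row in data if rng.random() < 0.1]  # roughly a 10% sample
print(f"full: {occlusion(data):.2f}, 10% sample: {occlusion(sample):.2f}")
```

Sampling lowers this measure because fewer lines compete for the same cells; an auto-sampling lens could search for the rate at which such a measure falls to an acceptable level.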

Chapter 6 presents an evaluation of sampling and considers the use of sampling with different visualisations. We reflect on the difficulties of undertaking effective user studies of information visualisations and assess the objectivity of criteria-based evaluation of sampling. Sampling is compared to other clutter reduction techniques using the Clutter-reduction Taxonomy (Chapter 3). Further exploration of scatterplot visualisations using sampling consolidates the use of global and lens-based sampling, and of the constant-density interface proposed in Chapter 2. This chapter also reflects on the Sampling Lens application and the lessons learnt about sampling as a clutter reduction technique, and explores the integration of sampling into visualisations other than the scatterplot and parallel coordinates.

Chapter 7 reflects on the main issues raised by each chapter and their resolution or outcomes. It also summarises how the objectives of this work were met and notes their outcomes. Finally, some future directions for sampling are considered, with suggestions for further work.

Appendix A presents examples of the clutter reduction techniques used in the Clutter-reduction Taxonomy. The focus is on how each technique manipulates position, visual, association and temporal attributes to reduce display clutter.

Appendix B describes the datasets used in this work.

Appendix C presents details of the experiments in Chapter 5.

Appendix D examines various issues related to the implementation of the sampling-based visualisation, including the software instrumentation devised to carry out the empirical studies, an overview of the visualisation toolkit and the development of an OpenGL version of the Sampling Lens that improved the interactive performance considerably.
