Adaptive Multi-Sensor Fusion for Awareness

in Dynamic Environments

 

- First year report: web version 1.7* -

 

 

author: Kristof Van Laerhoven

supervisor: Prof. Dr. Hans-Werner Gellersen

panel: Prof. Dr. Nigel Davies and Prof. Dr. Alan Dix

http://www.comp.lancs.ac.uk/~kristof/

April 2003

Lancaster University

LA1 4YR Bailrigg, Lancaster

United Kingdom

 

* Notice: this is a web-version that will not be updated after version 1.7

please refer to the pdf document for the very latest version.



Index

1. Structure of this report

1.1. Objectives: What?

1.2. Foundation: Where?

1.3. Motivation: Why?

1.4. Methodology: How?

2. Objectives

2.1. Requirements and Essential Characteristics

2.2. Targets

3. Foundation

3.1. Sensor Networks in Ubiquitous and Wearable Computing

3.1.1. The Ubiquitous Vision

3.1.2. The Wearable Vision

3.1.3. Ubiquitous Sensors

3.2. Signal Processing and Sensor Fusion

3.2.1. Features

Artificial Neural Networks

Tracking and filtering algorithms

Sensor/Feature Fusion

3.2.2. Embedded Sources, Concepts and Phenomena

Topographic Mapping

Cocktail Party Problem

Graphical Models and Bayesian Networks

3.2.3. The Curse of Dimensionality

3.3. Unsupervised Learning Techniques

3.3.1. Clusters and Topological Maps

3.3.2. Concept Drift/Shift and Hidden Contexts

3.3.3. Incremental/Online Learning

3.3.4. Catastrophic Forgetting, Memory Decay and Ageing

3.3.5. Summary

4. Motivation

4.1. Scenarios

4.1.1. DrWhatsOn

4.1.2. Ubicomp scenarios

4.2. Potential Contributions to Multi-Modal Perception

4.3. Novel HCI Focus in Learning Algorithms

5. Methodology

5.1. Approach

5.1.1. Multiple Embedded Sensors, Dynamically Recruited

5.1.2. User-level context descriptions, minimal interaction

5.2. Sub-Goals

5.2.1. Sensors and Sensor Networks

5.2.2. Pre-processing and sensor fusion

5.2.3. Source Detection, Concept Creation and Mapping

5.2.4. Annotation

6. Work Plan

7. Conclusions

8. References



1. Structure of this report

This overview section briefly describes the proposed objectives, background, rationale, and methodology, using non-specific vocabulary to express the general idea. Subsequent chapters will elaborate on each of these four sections and introduce a more exact and appropriate terminology. After covering the methodology, a work plan will conclude this report with expected timeline and milestones.

Figure 1. Structure of this report.

The schematic above (Figure 1) shows the four main chapters that make up this report with a focus on the background, which in this case is a combination of three distinct disciplines: wearable and ubiquitous computing, signal-processing-based machine learning, and incremental-learning-based algorithms. There is also a strong connection between the objectives and methodology sections, where the question of what the system should look like (using a list of requirements in objectives) is answered (in methodology) by specifying an approach.

1.1. Objectives: What?

The objective of the research described in this report is to explore and adapt self-learning computer systems that observe the environment with low-level sensors so they can learn, perceive, and anticipate what the user is doing. The technological aim is to build a bridge between the available sensing infrastructure, and applications in ubiquitous and wearable computing. The context acquisition can be thought of as an abstract sensor for detecting ‘context’ within a set of user situations for which the sensor is configured and trained.

This ability of an algorithm or application to calculate descriptions of the world around it, based on sensor data, has generally been referred to as context awareness, situational awareness, or sensor-based perception. The most correct terminology from a more formal perspective would probably comprise classification or recognition, but context acquisition will be used throughout this report to make the distinction between the proposed method and methodologies in other fields.

The Objectives chapter will be divided into two sections:

  • Requirements will try to describe the envisioned system in detail by giving requirements and listing the characteristics it is expected to have. These will re-appear in the Methodology chapter as each requirement or characteristic has immediate consequences for the context acquisition methods.

1.        The system’s output is a user-intelligible description of the current context.

2.        The system’s information can come from many different sensors, embedded in the user’s clothing or surroundings.

3.        The designer of the system is not expected to have an idea on the specific nature of the contexts.

4.        The user has to be able to re-train the system for contexts that are new, changed, or for contexts for which the system performs badly.

  • Targets details the expected contributions of the proposed research, forming a basis for a work plan that details what will be done. These are, in short:

1.    The main objective is to come up with a reusable software framework for low-cost sensor analysis and a toolkit-like collection of algorithms for sub-symbolic sensor fusion in low-end embedded systems. Simultaneously, custom-built hardware will support the acquisition of real-world data.

2.      Another goal is a number of prototypes of applications that make use of this service, and software architecture and guidelines for sensor integration at application level, to support deployment in wearable/ubiquitous systems, as well as to explore novel applications.

3.      To open up the research to other institutions by setting up a web-mediated database where datasets can be submitted/uploaded to facilitate comparison of algorithms and hardware platforms.

1.2. Foundation: Where?

The focus of this research is on making existing computer systems more aware of the task at hand, to adapt to new situations based on gained knowledge, and to allow them to become more familiar to the user. It is useful to discuss what has already been established, before going ahead with giving a motivation for this work or specifying a methodology. In short, this proposal is founded on observations from three diverse research areas:

1)      Huge sensor networks are becoming a reality. The prospect that both computers and sensors will be embeddable into almost anything, is widely accepted (and even partly realized) for computers, but has only now started to happen for sensors. This also means that the information that is generated by this multitude of objects could be routed and processed somehow through a network. This is a view thoroughly discussed and sought after in the fields of ubiquitous and wearable computing, which thrives on the assumption that computing elements, as well as sensors, will become woven into the fabric of everyday life.  

2)      Plenty of modern algorithms exist to process and abstract raw sensor data. Most modern signal processing approaches, such as those based on connectionist (artificial neural networks) or (Bayesian) statistics views, deal with raw sensor data. Not only are processing techniques for a single sensor well covered, combining data from multiple (and different) sensors has been explored in detail as well. The main objective of these algorithms is to abstract the raw data into higher-level information that is more suitable for a certain application (e.g., speech processing).  

3)      Algorithms also exist that let systems explore the data without supervision from the programmer. The field of behavioural AI in particular has applied these techniques to allow (robotic) agents to explore and learn concepts from their environment without help from the designers. This biology-inspired attitude not only boosted interest in modelling self-sufficient individuals, but also advanced research in unsupervised learning algorithms. Flexibility and emergence are key concepts here, presuming no properties of the environment are known before introduction in that environment.

Each observation comes from a research discipline which somehow links to one of the others: The ubiquitous and wearable sensor networks (1) need adaptive algorithm frameworks, the signal processing and sensor fusion community (2) needs compelling applications, and concept learning research (3) seeks real-world implementation.

In general, substantial background research has been done on abstracting sensor data to a higher level. However, utilising this for the acquisition of contexts that are trainable and intelligible by a human user, and via ubiquitous platforms, is a wide open field where a lot of questions remain unanswered.

1.3. Motivation: Why?

This section of the proposal will provide answers as to why this research is useful and what its benefits are. The main reason why context acquisition is worthwhile is the assumption that systems do a better job when they can harvest information about their user and situations, but potential contributions to the disciplines of sensor fusion and HCI are achievable as well.

Providing contexts as a service to ubiquitous and wearable applications. Because the main purpose of the context acquisition system is to provide a service to other applications, a scenario-based listing of example applications in ubiquitous and wearable computing will be given in the Motivations chapter. The scenarios will be taken from work by other researchers in ubiquitous and wearable computing, to support the claim that such a service would indeed be beneficial. The purpose of these scenarios is thus not to mine for characteristics and requirements for the proposed framework, but to show that research into ubiquitous and wearable computing has come up in several independent cases with a desire for a context acquisition facility.

Potential contribution to multi-modal perception. The novelty of this proposal lies in the application and adaptation of signal processing and AI learning techniques in ubiquitous and wearable computing settings. The combination of diverse, multi-modal sensors for context acquisition has not been considered in perceptual computing research, but has particular application potential in mobile and embedded devices. The real-world, noisy nature of these settings, and the availability of distributed computational elements, mean that this work could offer a fresh view on data abstraction and sensor fusion.

The requirements for the proposed system position it on the border of signal processing techniques (having a particular application in mind, using sensor data and assessable via implementation and experiments) with bio-inspired modelling and behavioural AI, proposing massive parallelism in basic sensors, and complete encapsulation of the classification process of context.

Novel focus on HCI in learning algorithms. Most machine learning methods operate on a fixed set of data, or are trained with data that was chosen beforehand. Tweaking these algorithms is usually done by the designers of the system, and they lose their flexibility as soon as they get used (with the exception of certain limited algorithms that do self-calibration). Many of these adaptive algorithms behave like a black box as they are not transparent to the user. Neural networks are a typical example: their distributed internal representation of the input is hard to track, and there is generally no explanation given of why they produce a particular result. It is therefore an HCI challenge to integrate these machine learning techniques in common environments, and make them transparent, usable, and trainable for human users. This would result in visualising certain internal states of the context acquisition algorithms, but also limits the fundamental choice of learning methods.

1.4. Methodology: How?

The approach will be largely defined by the list of characteristics and requirements in the Objectives section of this proposal, which will be reported on separately in the Methodology chapter. The proposed methodology contains four basic components in order to meet the requirements set out in the objectives section, each operating on a different abstraction level and with its distinctive needs for a work plan: 

1)   Sensors. The main source of information for the system is a large and diverse collection of hardware sensors, to make sure that the distinction can be made between as many situations or contexts as possible.

2)   Sensor-based pre-processing. Sensor data from these sensors can be processed locally, using established algorithms, to abstract the low-level information.

3) Fusion. The (pre-processed) values need to be fused, in general by autonomously mapping them onto a (machine-constructed) concept. This requires that the sensors shouldn't give a complex output, so that their signals can easily be fused together.

4) Annotation. The user needs to annotate (that is, “teach”) these concepts, to make the system’s output intelligible afterwards. The time and nature of these annotations are chosen by the user.

Figure 2. Abstract diagram of the proposed bottom-up methodology.

The general proposed approach is a bottom-up one (see Figure 2), starting with the actual data from the hardware sensors, and generalizing towards the user’s description of the observed data. This method is the opposite of the traditional design of a sensor-based system where a specific application dictates which concepts are useful, and what sensors are required.

 

2. Objectives

In short, the objective is to build up self-learning systems that observe the environment with low-level sensors, and form a bridge between the sensed data, and applications in ubiquitous and wearable computing. But in order to get a description that is as precise as possible, a list of properties and requirements that have to be fulfilled will specify how the proposed system differs from any of the mainstream research. After that, a list of targets follows this requirements phase which further specifies what can be expected as an outcome of this research.

2.1. Requirements and Essential Characteristics

This list contains the most fundamental properties of the system, from which the proposal starts.

The system’s output is a user-intelligible description of the current context. The most important property of the system is that it can be described as a service that provides a description of the current situation, environment, user-activity, or context. The requirement that this output should be intelligible to the user makes it almost a clear-cut classification algorithm, were it not for the absence of restrictions on the information source and output set of classes (see the next two items on this list).

The system’s information can come from many different sensors, embedded in the user’s clothing or surroundings. Only practical limitations are considered on the variety, amount or nature of these sensors. This assumption leans closely towards visions in wearable and ubiquitous computing where sensors are available in abundance throughout the environment and possibly in close proximity of the user.

The designer of the system is not expected to have an idea on the specific nature of the contexts. This is both the strength and the weakness of the system. It allows a layered architecture with unambiguous borders between the user, the application(s), and the context acquisition system. Imposing no restrictions at all on the contexts, on the other hand, makes 100% recognition unachievable, so occasional failures have to be anticipated.

The user has to be able to re-train the system for contexts that are new, changed, or for contexts for which the system performs badly. Many methods that implement incremental training come from the discipline of bottom-up AI, where, for instance, autonomous robots drive around in a special arena and constantly need to adapt to changes in their environment. Since the system produces the description of what it thinks is the current context, user intervention is a straightforward way of correcting the learning process.

2.2. Targets

The best method to examine a system with the aforementioned characteristics is striving towards an actual implementation. Prototyping will therefore be a principal technique to establish whether a certain choice is realistic or not, and how it compares to other existing methods, fitting seamlessly in the proposed bottom-up approach. This covers mainly the analysis algorithms (for combining and examining the sensor signals), and to a lesser extent custom hardware modules (driving the sensors). Out of these objectives, some proofs of concept need to be constructed to evaluate the functioning of both algorithms and hardware platform. Finally, a way to disseminate the research is another obvious target, since this research is aimed at being a service that is linked to other related research.

Platform. The main objective is to come up with a reusable software framework for sensor analysis and a toolkit-like collection of algorithms for sub-symbolic sensor fusion in low-end embedded systems. The focal point of this proposal will be the collection of algorithms that take in multiple streams of sensor values, and use this to learn and estimate what this means in terms of user-annotated descriptions. A large selection of analysis algorithms can be applied to the sensor data, and different solutions may be required in various situations. The proposal is therefore focusing on a software framework – a toolkit – rather than one specific algorithm that will be considered as the optimal solution in any circumstance. Strict limitations and requirements however, such as the source being noisy real-world sensor data and the need for transparency towards the user, will have an undoubtedly narrowing effect on applicability of the vast amount of algorithms that are available.

Simultaneously, custom-built hardware will support the acquisition of real-world data. Although off-the-shelf products are already available for simply logging the recorded data, existing sensor hardware rarely goes beyond off-line collection and conservative analysis of sensor data, and usually imposes tough constraints on the kinds of sensors and on integration into systems other than the intended ones. It is thus necessary to build suitable modules to achieve the sensing network, combining already embedded sensors (like those in the iPAQ PDA, for instance) and novel sensors.

Proof of concept. Another goal is a number of prototype applications that make use of this service and software architecture, as well as a basic set of guidelines for sensor integration at application level to support deployment in wearable/ubiquitous systems.

A crucial part of the methodology is to experiment in a ‘living laboratory’ to further the identification of concepts that could be helpful in identifying contexts. The idea behind a living laboratory is that we will experiment in a routine environment rather than a dedicated lab space. Only in a routine environment will we be able to assess our method’s performance in predicting common user situations from sensor data.

Dissemination. As the core objective is to build a service, it is critical to open up the intermediate results of the research to other institutions by setting up a web-mediated database where datasets can be submitted/uploaded to facilitate a collective comparison of algorithms and hardware platforms.

3. Foundation

This chapter describes the research background for this report, blending topics from the fields of ubiquitous computing and artificial intelligence. Theories and structures from both fields will be discussed separately on an abstract level, to allow easy reference in the methods chapter (chapter 5) when specific algorithms and methods will be covered.

3.1. Sensor Networks in Ubiquitous and Wearable Computing

Interaction with computers can happen in different ways, on different levels of user-involvement, and with different applications in mind. This section will introduce two fields with strong ties to research in human-computer interaction, and an emerging culture of involving sensors in these views.

3.1.1. The Ubiquitous Vision

The term ubiquitous computing was coined by Mark Weiser, who used it to name the third wave of computing. His website clarifies [68]: “First were mainframes, each shared by lots of people. Now we are in the personal computing era, person and machine staring uneasily at each other across the desktop. Next comes ubiquitous computing, or the age of calm technology, when technology recedes into the background of our lives. Alan Kay of Apple calls this ‘Third Paradigm’ computing”. In Weiser’s work, ubiquitous computing is positioned as the opposite of virtual reality: computers are integrated in the real world (instead of putting people in computer-generated environments).

Ubiquitous Computing could be seen as an approach to human-computer interaction, but is also about distributing computation in the environment, as opposed to keeping it bottled in a desktop-bound personal computer.

3.1.2. The Wearable Vision

Wearable computing is closely related to ubiquitous computing, but restricts itself to the computer as a worn, personal technology that makes information instantly available to the wearer. The research was initially aimed at handheld-sized, hip-worn devices with micro-optical displays integrated in spectacles, but has gradually shifted to one or more computing elements actually embedded in clothing.

The fuzzy definition of a wearable computer is that it's a computer that is always with the wearer, is comfortable and easy to keep and use, and is as unobtrusive as clothing.  However, this ‘smart clothing’ definition is unsatisfactory when pushed in the details.  A more specific definition is that wearable computers have many of the following characteristics: Portable while operational, possible to use hands-free, using sensors, attention-getting, and always in on-mode [38].

3.1.3. Ubiquitous Sensors

The sensor is an additional element that is making its way into both the ubiquitous computing and wearable computing fields. It has the potential to give the computer system information without any user-interaction (a light sensor regulating the brightness of a display for instance), or augment the user’s senses (e.g. a thermometer giving the precise temperature). Many sensors can also be combined, using the advances made in wireless ad-hoc networks (and the development of protocols such as Bluetooth or WaveLAN), to give clearer, more precise readings. This section will give an overview of several popular sensor platforms from both commercial and academic groups as typical examples, to give the reader an idea of how this sensing is usually managed.

 

The Active Badge system [66], developed between 1989 and 1992 at the former Cambridge AT&T labs, was one of the first devices that made use of distributed sensors in an office environment, incorporating wearable components. An infrared transceiver is used for communication with other badges, and to pick up signals planted in the environment. This system was the first to be produced in large numbers (about 1500 users) and was used across a variety of labs.

A second well-known system that is based on a distribution of location sensors, is the Xerox ParcTab [67] (1992) project. It was designed to explore the capabilities and impact of mobile computers in an office setting. At the mature state of the platform, it was in use by about 40 members of Xerox Parc (in 1994), with about 50 infrared transceiver base-stations.

The Smart Badge [02] built in 1997 contains an IrDA transceiver, PIC microcontroller and an array of sensors. It periodically polls its sensors, transmits the result of the poll and then waits for commands from the location server. After a time-out period the Smart Badge goes to sleep; when it wakes up it repeats its loop. It is very similar to the Active Badge, but has additional sensors (light, temperature, etc).

The TEA board was developed within the TEA [57] project in 1997 to provide a prototype that enables the acquisition of raw sensor data. The first version of the sensor board is equipped with many different low-level sensors (namely light sensors, accelerometers, infrared sensors, microphones, CO gas sensors, temperature sensors and atmospheric pressure sensors) and a chip that digitizes the signals from the sensors at a certain rate. The second version (1999) of the TEA sensor board has been tailored to slip into the back of a mobile phone and contains two photodiodes, two microphones, a dual axis accelerometer, a digital temperature sensor and a touch sensor, enabling the phone to switch profiles [52].

The ProComp is a series of sensor modules that is geared towards the measuring of bio-feedback and physiological signals, such as EEG, EMG, ECG, blood pressure, skin temperature, and respiration. It communicates via fiber-optic cables to meet strict requirements in terms of noise and accuracy. This module is especially popular in the wearable computing field, as it is one of the few lightweight, multi-sensor systems available with such a range of bio-sensors.

Rockwell Scientific and UCLA developed WINS [36] (1999) or wireless sensor networks, mainly for monitoring and surveillance in rough circumstances (e.g. machinery in navy ships). Special versions were even launched in space to form inter-communicating “pico-satellites”. Pluggable sensors include microphones, accelerometers, magnetometers, and seismic sensors.

The Motes [55] (1999) form self-organizing wireless-sensor networks, a realization of the Pentagon's "smart-dust" concept. They were created by the University of California and Intel, and are being tested out worldwide today. The prototype Motes consist of an application-specific sensor array board married to a generic wireless controller board, both in a hermetically sealed enclosure.

To facilitate the self-organizing of Motes into a sensor network, the researchers created TinyOS and TinyDB as well as a host of Tiny applications and a simulator. In the summer of 2001, eight hundred (800!) motes were connected in a self-organized wireless network [40]. In 2002, a network of motes sensing surrounding temperature, humidity, barometric pressure, and mid-range infrared was deployed on Great Duck Island, Maine, to monitor nesting burrows used by the Leach's Storm Petrel [35].

The MediaCup [3,17] (1999) is a typical example of common objects that get augmented with sensing, in this case an ordinary coffee cup. The MediaCup collects and communicates sensed context information in a given environment. The objective of the MediaCup setup (consisting of several cups and peripheral equipment) is to explore the added value of computerized everyday objects.

The Dallas iButton [26] (1999) has been used in various ubiquitous computing projects, due to its small size and robust steel casing (e.g., [64], [65]). Specific products vary between timers, memory storage, but also miniature data loggers (mainly as temperature sensors). With more than 65 million iButtons circulated in 2002 [26], this is probably the most widespread sensor system, albeit mainly restricted to logging.

The Smart-Its [18] project (2001) had sensor board development as one of its priorities, and therefore produced a wide variety of modules. The aim was to make them as small as possible so they could be attached to common objects to augment them with a “digital self”. The prototypes consist of a communication board and a separate sensor board, just like the Motes, but in a wider variety. Bluetooth and two other types of RF communication boards are available, as well as about a dozen application-driven sensor boards.

 

      

The DrDAQ board [12] from Picotech is not designed to be small or wearable, but has the advantage that it can be connected to a parallel port, which makes it easily deployable in office environments where most PCs have an unused parallel port anyway. The sensors are highly calibrated, and sockets are available for extras.

Sensor boards were until recently considered to be rigid in structure, and miniaturizing them therefore meant reducing the size and number of their components. At Starlab Research’s I-wear project [27], flexible sensor boards (2001) were built that, coated with polyurethane, could be washed and embedded in clothing. Their suppleness allows seamless integration without obstructing the user’s movements.


The SenseWear™ armband from BodyMedia [6] is a wearable body monitor that enables wireless data collection. Worn on the back of the upper arm, it utilizes a combination of sensors that continuously gather biometric data: movement (via a dual-axis accelerometer), heat flow, skin temperature, ambient temperature, and galvanic skin response. It is mainly used for logging the data, with health-oriented applications mining the data afterwards.


3.2. Signal Processing and Sensor Fusion

Every sensor in the system is expected to produce a large number of values over time. Some of them will produce such a large stream of data, giving such a low-level description, that it is almost impossible to use this directly as input for a recognition system. Such streams are therefore usually summarised first: the values of a light sensor can, for instance, be replaced by the mean and the variance over a sliding history window. Another, and perhaps better, example is the microphone, since it produces an even larger number of values. A range of transformations and filters are traditionally applied by default for this purpose (e.g. power spectrum, Fourier transformation, etc.).
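As an illustration of such a transformation, the sketch below reduces a block of microphone samples to a power spectrum. It is a minimal example rather than part of the proposed framework: the function name, the block length of 32 samples and the synthetic tone are all assumptions made here, and a direct discrete Fourier transform is used for readability where a practical system would use an FFT routine.

```python
import cmath
import math


def power_spectrum(block):
    """Return the power in each frequency bin of a real-valued sample block.

    Uses a direct O(n^2) discrete Fourier transform for clarity.
    """
    n = len(block)
    spectrum = []
    for k in range(n // 2 + 1):  # bins from DC up to the Nyquist frequency
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(block)
        )
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


# Hypothetical microphone block: a pure tone completing 3 cycles in 32 samples.
block = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
spectrum = power_spectrum(block)

# The index of the strongest bin summarises the whole block in one number.
dominant_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

Here 32 raw samples become 17 spectral values, and the single value `dominant_bin` already identifies the tone, which is the kind of reduction a recognition system can work with.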

 

Figure 3. A layered view of the different levels in which sensor data is abstracted in this report, starting from the middle layer and gradually abstracting towards the outer layer. Each layer is based upon one or more previous layers. Varying complexity of elements in the same layer is loosely ranked as indicated by the arrow to the left, to show that complexity of elements and abstracting layers do not necessarily mean the same. Arrows within layers or from inner to outer layers depict examples of intermediate abstractions.

Not all these methods are on the same level of abstraction, which is why this section is sub-divided into two different kinds of abstraction: features and embedded sources. They differ mainly in what their output primitives look like, inclining towards either the sub-symbolic or the symbolic respectively. Figure 3 shows these two intermediate levels as transitions between sensor values and context description, including some descriptive examples. Note that complexity between primitives can still vary wildly (e.g. with a power spectrum of microphone data telling more than the maximum).

3.2.1. Features

The most basic way to pre-process a data stream from a sensor is to use common elements from statistics, such as minimum, maximum, average or standard deviation. These values are usually referred to as features, descriptors, or cues, as they describe a stream of data by crunching it down to just one value. This value can be either discrete, scalar, or a binary value. The features can be interpreted as values that are the results of often small and quick calculations on data that is sent in a dense stream.

Features have several benefits:

  1. By using these features instead of the raw stream of sensor data, bandwidth and amount of data can be reduced. This enables any slower adaptive learning algorithms that work on the features instead of the raw sensor data to be as near real-time as possible. The average, for instance, takes samples during a specified period, and describes this data with just one value.
  2. Filters can also be used as features, to cancel out any noise that might be generated by the sensing. A simple example would be the ‘moving average’, where the average is recalculated over the last few samples of sensor data.
  3. Utilizing features can also improve the system’s generalization performance, since a slightly more abstract interpretation of the data is processed. The higher-level interpretation furthermore makes it easier to inspect any rules that adaptive algorithms may form afterwards.
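As an illustration of the first two benefits, the following is a minimal sketch of how a window of samples could be reduced to a handful of statistical features, and how a moving average could act as a simple noise filter. The names window_features and MovingAverage, and the choice of the population standard deviation, are assumptions made for this example only.

```python
import math
from collections import deque


def window_features(samples):
    """Reduce one window of sensor samples to a few statistical features."""
    n = len(samples)
    mean = sum(samples) / n
    # Population variance: every sample in the window is counted equally.
    variance = sum((x - mean) ** 2 for x in samples) / n
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": mean,
        "std": math.sqrt(variance),
    }


class MovingAverage:
    """Average over the last `size` samples of a stream (hypothetical helper)."""

    def __init__(self, size):
        # A bounded deque silently drops the oldest sample once it is full.
        self.window = deque(maxlen=size)

    def update(self, sample):
        self.window.append(sample)
        return sum(self.window) / len(self.window)
```

A learning algorithm would then receive the four feature values per window instead of every raw sample, which is where the reduction in bandwidth comes from.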

Artificial Neural Networks.

Artificial neural networks provide a scheme that is closely related to networks of brain cells, with special concentration on massively distributing information and concurrent processing of that information, while keeping the processing elements as simple as possible. Depending on the kind of neural network, many different sensors, or values from the same sensor that were sensed at different times, might be used as an input. Generally, artificial neural networks learn what they should output from examples, and their output is treated as an approximation. Most neural network models are known to be profoundly robust in the presence of noise (even noise from the example data), and therefore make a good type of algorithm to handle (inherently noisy) sensor data.

A good example [39] is the ALVINN [43] system, which uses backpropagation to learn to steer an autonomous car driving at speeds up to 70 miles per hour, using a grid of camera-pixels as sensors.

Tracking and filtering algorithms.

The kind of algorithm that is probably most often applied to high-speed sensor data is the tracking/filtering algorithm. Kalman filters [30], for instance, are widely applied to predict what values a specific set of sensors will produce next, given previous values. Such a filter reassesses its performance with every array of new values coming in, and changes its model accordingly. Applications range from the tracking of visual markers to re-calibrating positioning sensors.
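The predict-and-correct cycle described above can be sketched for the simplest possible case: a single scalar sensor whose true value is assumed to stay roughly constant between samples. This is an illustrative simplification, not the general matrix form of the Kalman filter; the class name and the variance parameters are assumptions of this example.

```python
class ScalarKalmanFilter:
    """One-dimensional Kalman filter with a constant-value process model."""

    def __init__(self, initial_estimate, initial_variance,
                 process_variance, measurement_variance):
        self.x = initial_estimate        # current estimate of the sensed value
        self.p = initial_variance        # uncertainty of that estimate
        self.q = process_variance        # how much the true value may drift
        self.r = measurement_variance    # how noisy the sensor is assumed to be

    def predict(self):
        # Constant model: the predicted value is the old estimate,
        # but the uncertainty grows by the process variance.
        self.p += self.q
        return self.x

    def update(self, measurement):
        # The gain weighs the prediction against the new measurement.
        gain = self.p / (self.p + self.r)
        self.x += gain * (measurement - self.x)
        self.p *= (1.0 - gain)
        return self.x
```

Calling predict() and then update() for every incoming sample gives an estimate that follows the sensor while damping its noise; a larger measurement_variance makes the filter trust new samples less.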

Sensor/Feature Fusion.

These features do not necessarily calculate description values for a stream of data from just one sensor. (Cross-)correlation, for example, is a basic measurement of how related or alike two data streams are in time. Dimensionality reduction algorithms (like principal component analysis, PCA [28], or independent component analysis, ICA [25]) exist that merge the values from several sensors into fewer values, while trying to preserve most of the original information. The components that PCA and ICA algorithms produce are often linear combinations of the individual data streams. Other common algorithms, such as Sammon mapping [48], Curvilinear Component Analysis and Curvilinear Distance Analysis (CCA and CDA [33]), are not limited to this linear mapping, but are often more demanding on processing resources. See Figure 4 for an example.

Figure 4. Using dimensionality reduction algorithms (Curvilinear Distance Analysis, CDA, in this case) to reduce the dimension of a data set (upper: from 2d to 1d, and lower: from 3d to 2d), while retaining as much information as possible. The objects on the left are non-linear in nature (not describable by linear equations), yet they can be unfolded with CDA. Think in the scope of this proposal of combining the information in two streams of data to one stream (or three to two, etc.) [33]. 
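For the linear case, fusing two streams into one can be sketched as follows. The example computes the zero-lag (Pearson) correlation between two equally long streams and then projects both onto their first principal component, using the closed-form principal axis of a 2x2 covariance matrix. The function names are assumptions of this example; the non-linear methods (Sammon mapping, CCA, CDA) are not covered by it.

```python
import math


def pearson_correlation(a, b):
    """Zero-lag correlation between two equally long data streams."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def first_principal_component(a, b):
    """Fuse two streams into one by projecting onto their principal axis."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    c_aa = sum((x - mean_a) ** 2 for x in a) / n
    c_bb = sum((y - mean_b) ** 2 for y in b) / n
    c_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    # Orientation of the principal axis of a symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * c_ab, c_aa - c_bb)
    weights = (math.cos(theta), math.sin(theta))
    # Each fused value is a linear combination of the two centred streams.
    fused = [weights[0] * (x - mean_a) + weights[1] * (y - mean_b)
             for x, y in zip(a, b)]
    return weights, fused
```

When the two streams are strongly correlated, the single fused stream retains nearly all of their joint variance, which is the sense in which most of the original information is preserved.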

3.2.2. Embedded Sources, Concepts and Phenomena

Many algorithms are able to expand the raw sensor values to symbolic units. It then becomes possible to use algorithms acting on a symbolic level to perform further reflection on the data streaming into the system. Directed graph-based models, for instance, are known to handle temporal and recurring patterns very well, with the hidden Markov model (HMM) as a typical example.

Topographic Mapping.

Topographic mapping, under some constraints, has several properties that make it very descriptive of what is being sensed. The Kohonen Self-Organizing Map, for instance, is known to display structures from the input (sensor) space that are on a very abstract level, including hierarchies. Famous examples are Kohonen’s Spanning Tree example, where a topologic map acts as a taxonomy of difficult-to-describe, artificial data, constructed by the mapping algorithms (Figures 5 and 6), or the world poverty map, where 39 welfare/poverty aspects of countries are mapped onto a 2d space (Figure 7). Topographic maps based on sensor data can thus give indications of patterns that are present in the sensor data but are hard to distinguish with lower-level pre-processing techniques.

Figure 5. a) The input for Kohonen’s Spanning Tree [32] example. b) The minimal spanning tree for the data. Source: [31].

Figure 6. a) The Kohonen Map representation. b) The representation from the grid growing algorithm. Source: [31].

Figure 7. A dataset describing 39 welfare- and poverty-related aspects for 77 countries (source: World Development Report, World Bank 1992), mapped onto a 2d plane by topographic mapping (source: [31]). Note how geographic location is somehow integrated in the mapping.

Cocktail Party Problem.

A very well known problem in source detection is the so-called cocktail party problem (coined by Cherry [9]). Imagine a cocktail party: for humans, it is no problem at all to follow the discussion of neighbours and focus on it, even if there are lots of other equally loud sound sources in the room; other discussions, music in the background and noise from passing traffic seem easy to filter out. The cocktail party problem sets out to do the same, but with microphones attached to a computer. The real difficulty is that a given microphone doesn’t necessarily have to be placed at a given speaker’s mouth and doesn’t have to be shielded from the other speakers (see Figure 8 for an example).

It is not known exactly how we are able to separate the different sound sources, but algorithms exist that are able to do it under almost real-world circumstances. Independent Component Analysis (ICA), for instance, works well if there are at least as many microphones or 'ears' in the room as there are different simultaneous sound sources. Various approaches with their variations have been employed to tackle the cocktail party problem; the most popular seems to be ICA, but many other approaches tackle the same problem (e.g. from multivariate Bayesian statistics [46]). Blind source detection/separation is applicable to any system that tries to detect information from complex concepts with multiple sensors that pick up one or many sources.

Figure 8. Example for the cocktail party problem, in the case of two sources and two microphones (source: [25]). On the right are the time-series plots of the sound patterns that the sources produce and the microphones hear, respectively.

Graphical Models and Bayesian Networks.

Despite what the name might suggest, Bayesian networks do not necessarily commit to Bayesian statistics, but are so called because they use Bayes' rule for probabilistic inference. Graphical models are a marriage between probability theory and graph theory. Graphical models provide a natural tool for dealing with uncertainty and complexity. Fundamental to the idea of a graphical model is the notion of modularity: a complex system is built by combining simpler parts. Probability theory provides the glue that combines the parts, ensuring that the system as a whole is consistent, and providing ways to interface models to data. The graph theoretic side of graphical models provides both an intuitively appealing interface by which humans can model highly-interacting sets of variables as well as a data structure that lends itself naturally to the design of efficient general-purpose algorithms. Examples include mixture models, factor analysis, hidden Markov models, Kalman filters and Ising models [29].
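The use of Bayes' rule for inference can be sketched for the smallest possible network: one hidden cause with one observed sensor reading that depends on it. The function name, the variable names and all probabilities used with it are hypothetical and only serve to show the mechanics.

```python
def posterior(prior, likelihood, observed):
    """Bayes' rule: P(cause | observed) is proportional to
    P(observed | cause) * P(cause).

    prior:      dict mapping each cause to P(cause)
    likelihood: dict mapping each cause to a dict of P(reading | cause)
    observed:   the sensor reading that was actually seen
    """
    unnormalised = {cause: prior[cause] * likelihood[cause][observed]
                    for cause in prior}
    total = sum(unnormalised.values())   # this is P(observed)
    return {cause: value / total for cause, value in unnormalised.items()}


# Hypothetical numbers: a rare fault that usually produces a high reading.
prior = {"fault": 0.01, "ok": 0.99}
likelihood = {
    "fault": {"high": 0.90, "normal": 0.10},
    "ok": {"high": 0.05, "normal": 0.95},
}
belief = posterior(prior, likelihood, "high")
```

Larger networks chain many such conditional tables along the edges of the graph, which is where the modularity mentioned above comes from.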

An interesting fielded application is the Vista [23] system, developed at Microsoft Research. It is a decision-theoretic system that has been used at NASA Mission Control Center in Houston for several years. The system uses Bayesian networks to interpret live telemetry and provides advice on the likelihood of alternative failures of the space shuttle's propulsion systems. It also considers time criticality and recommends actions of the highest expected utility. The Vista system also employs decision-theoretic methods for controlling the display of information to dynamically identify the most important information to highlight. Horvitz has gone on to attempt to apply similar technology to Microsoft products, the Lumiere [24] project for instance [41].

3.2.3. The Curse of Dimensionality

As more sensors are added to the system, the dimension of the input space increases, and any algorithm acting upon it needs more resources to be able to map the bigger input space to the output space. The consequences are that the algorithm gets slower and less fault tolerant. This problem is well known in the domain of machine learning (and beyond) as the “Curse of Dimensionality” (coined by Bellman [4]). This is a fundamental problem here, since it is assumed that the sheer number of sensors (plus the diversity among them) makes the system generally able to learn any context.
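A short calculation makes the growth concrete. Assuming, purely for illustration, that each sensor's range is divided into a fixed number of bins, the number of distinct regions in the input space grows exponentially with the number of sensors, so a fixed amount of training data covers it ever more thinly. The function names are assumptions of this sketch.

```python
def grid_cells(bins_per_sensor, sensor_count):
    """Number of distinct regions when every sensor range is split into bins."""
    return bins_per_sensor ** sensor_count


def samples_per_cell(sample_count, bins_per_sensor, sensor_count):
    """Average number of training samples available per region."""
    return sample_count / grid_cells(bins_per_sensor, sensor_count)


# With 10 bins per sensor, each added sensor multiplies the space by 10.
growth = [grid_cells(10, sensors) for sensors in range(1, 7)]
```

With 10,000 training samples and 10 bins per sensor, two sensors leave 100 samples per region on average, whereas six sensors leave only one sample per hundred regions.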

3.3. Unsupervised Learning Techniques

Normally, classification algorithms need an initial training phase in which the designer introduces labelled sensor data. The algorithm then stops learning and incoming sensor data is classified using the labelled data as examples. The notion of unsupervised learning is a very appealing one if the system needs to stay adaptive, yet user-friendly. These methods are extensively used in what Steels calls Behavioural AI [56], where behaviour in embodied software/hardware components emerges and becomes intelligent and adaptive. The measure of success for these units, or (robotic) ‘agents’, is specifically their ability to function and survive in dynamic and unprepared environments. Other names for this field are bottom-up AI [37], Animal Robotics [7] or the Animat approach [71]. After summing up the various traditional obstacles one can expect in this approach, some examples from Behavioural AI will illustrate this.

3.3.1. Clusters and Topological Maps

A first common technique is to cluster the data from the sensors, grouping similar patterns together into so-called clusters. Critical in this approach is to define a metric that characterizes similarity between two samples of sensor data. Conventional metrics for similarity in sensor data are for instance the Manhattan (or city-block) metric, the Euclidean metric, the Minkowski metric (which is a general case of the previous two), or the Mahalanobis metric (see [59] for a concise introduction). Some metrics (e.g. Mahalanobis) can factor in experience from previous data, for example to incorporate correlation between the data from two sensors. Widely used clustering algorithms are the sequential leader algorithm [21] and k-means [34].
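The following sketch shows the Minkowski metric, which reduces to the Manhattan metric for p = 1 and to the Euclidean metric for p = 2, together with a basic form of sequential leader clustering built on top of it. The distance threshold and the rule of assigning a sample to the first leader within reach are assumptions of this example; published variants of the algorithm differ in such details.

```python
def minkowski(a, b, p):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def sequential_leader(samples, threshold, p=2):
    """Single-pass clustering: the first sample of a cluster is its leader."""
    leaders = []   # one representative sample per cluster
    labels = []    # cluster index assigned to each sample, in order
    for sample in samples:
        for index, leader in enumerate(leaders):
            if minkowski(sample, leader, p) <= threshold:
                labels.append(index)
                break
        else:
            # No existing leader is close enough: start a new cluster.
            leaders.append(sample)
            labels.append(len(leaders) - 1)
    return leaders, labels
```

Because every sample is inspected only once, this kind of clustering suits streams of sensor data, at the cost of being sensitive to the order in which samples arrive.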

Topologic mapping is a more specific kind of clustering, as it also requires a metric between the clusters themselves. This structuring of information in two layers of similarity has the advantage that it makes it easy to let the system generalise after forming the clusters: Neighbouring clusters can be consulted, if a cluster hasn’t been linked to a concept (yet). Typical algorithms are the Kohonen self-organizing map (KSOM), the (growing) neural gas algorithm (GNG), LBG (generalized Lloyd), and derivatives (see [16] for an overview).

The main benefit of using unsupervised learning algorithms is that exploration is extensively used, which results in minimal required user interaction and interruption. Such an eager learning approach allows a loose coupling between the process where the user teaches the system which clustered phenomena are linked to which user-intelligible concepts, and the process where the system tries to abstract the raw sensor information into clusters.

3.3.2. Concept Drift/Shift and Hidden Contexts

In machine learning, a hidden context refers to information that is not perceived, but can be important for classification tasks. “Mild weather means different things in Siberia and in Central Africa; Beatle fans had a different idea of a fashionable haircut than the Depeche-Mode generation. Or consider weather prediction rules, which may vary radically depending on the season.” [69]. 

If the concepts that are to be learned depend on hidden contexts, small changes in a hidden context can result in significant changes in those concepts. In the machine learning literature, this is generally known as concept drift. Concept shift is a similar issue where concepts change more drastically instead of gradually changing over time.

Real concept drift reflects real changes in the world, whereas virtual concept drift occurs in the world model and can usually be traced back to a lack of training of the learning algorithm. Most typically the teacher is to blame, having only a particular context in mind and considering only the related pieces of information to be relevant. Also, some contexts may depend on values that cannot be measured, or required measurements can be too expensive.

Algorithms that attempt to track the concept drift, such as STAGGER [50] or the FLORA framework [69], aim at detecting context changes without being explicitly told about them. Usually the concept description (or hypothesis) is only valid in a window over the stream of examples, and older descriptions (or hypotheses) are stored and re-examined whenever the algorithm suspects a change in context. Previously mentioned tracker algorithms, such as the Kalman filter, would also be applicable to this problem.
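The idea of a hypothesis that is only valid within a window over the example stream can be sketched with a deliberately naive learner that predicts the majority label among the most recent examples. This is not an implementation of STAGGER or FLORA (it neither adapts the window size nor stores old hypotheses); the class name and the fixed window size are assumptions of this example.

```python
from collections import Counter, deque


class WindowedLearner:
    """Predicts from the most recent labelled examples only."""

    def __init__(self, window_size):
        # Examples older than the window are forgotten automatically.
        self.window = deque(maxlen=window_size)

    def learn(self, observation, label):
        self.window.append((observation, label))

    def predict(self, observation):
        votes = Counter(label for seen, label in self.window
                        if seen == observation)
        if not votes:
            return None   # nothing comparable inside the current window
        return votes.most_common(1)[0][0]
```

If the hidden context changes, for instance when the same temperature reading starts to be labelled differently, the old examples drop out of the window and the prediction follows the new concept after a delay that depends on the window size.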

3.3.3. Incremental/Online Learning

Most real-world environments are exceptionally dynamic, which creates problems for traditional classification algorithms that tend to stay truly adaptive only during the initial training phase and are left unchanged afterwards. Since problems like concept shift and concept drift mean that sensor data for a particular context will change after the training phase, recognition accuracy is bound to suffer, gradually worsening.

An obvious solution is to refresh the training every now and then, possibly triggered by a mechanism that monitors how novel the sensor data is, or triggered by the user when faulty prediction is noticed. It would in this case be a shame to throw away the material that was already learned in previous training phases, and incremental learning therefore means updating, instead of completely re-learning from scratch. A second advantage of incremental learning is that generally, during concept shift, less training is needed.
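Both ideas, a novelty measure that could trigger retraining and an update that reuses what was already learned, can be sketched for a single scalar feature. The prototype below keeps a running mean that is updated with each new sample instead of being recomputed from all stored data. The class name, the absolute-difference novelty measure and the scalar setting are assumptions of this example.

```python
class IncrementalPrototype:
    """Running mean of a feature, updated one sample at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def novelty(self, sample):
        # Distance to what has been learned so far; everything is
        # maximally novel before the first sample has been seen.
        if self.count == 0:
            return float("inf")
        return abs(sample - self.mean)

    def update(self, sample):
        # Incremental mean: no earlier samples need to be stored.
        self.count += 1
        self.mean += (sample - self.mean) / self.count
        return self.mean
```

A monitoring mechanism could compare novelty(sample) against a threshold and only ask for renewed training when the threshold is exceeded.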

3.3.4. Catastrophic Forgetting, Memory Decay and Ageing

A major problem that incremental (or online) learning brings along is known as the ‘stability-plasticity dilemma’ or ‘catastrophic forgetting’ (Grossberg, [20]): the system tries to store and remember as much as it can because it does not know what information is really relevant to the user, and can therefore possibly throw away (forget) crucial data. When new sensor data (or, more generally, new examples) are classified incrementally, the learning is described as online learning. Information enters the system only once, after which the system updates its model description and gets ready for a new example. Given that there is only limited memory storage available, information that was once stored during learning could disappear because it was replaced by new information. This is not a problem if it happens in a controlled way and the discarded information was no longer valid.

On the other hand, many modifications of well-established learning techniques use memory decay or forgetting operators to make sure that recent examples in online learning have a stronger influence on the learning than older ones. The momentum function in the delta rule of neural networks [22], or the plasticity-stability mechanisms in Adaptive Resonance Theory (ART) (Carpenter & Grossberg, 1987 [10]) explicitly try to forget outdated information, using time to mark outdated examples. In contrast to this, density-adaptive forgetting (Salganicoff, 1993) [47] discards previously learned information only when similar information can supersede it.
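Time-based forgetting in its simplest form is an exponentially decaying average, in which each new example shrinks the influence of all older ones by a constant factor. The sketch below is a generic illustration of that principle and is not the momentum term, ART, or density-adaptive forgetting; the function name and the decay parameter are assumptions of this example.

```python
def decayed_mean(samples, decay):
    """Exponentially weighted mean; `decay` in [0, 1) is the weight
    kept by the old estimate each time a new sample arrives."""
    estimate = samples[0]
    for sample in samples[1:]:
        # Older information fades by a factor `decay` at every step.
        estimate = decay * estimate + (1.0 - decay) * sample
    return estimate
```

With a decay of 0.5, the stream 0, 0, 10 yields 5.0 whereas the stream 10, 0, 0 yields 2.5, although both have the same ordinary mean: the more recent a sample, the stronger its influence.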

3.3.5. Summary

This section followed a train of thought that started from the ideal way to represent and learn sensor data, with minimal supervision from design or user. As this leads to traditional problems and solutions in machine learning, this section was structured as a concise overview of what is to be expected throughout the research (see Figure 9).

 

 Figure 9. Organization of this section, explaining pitfalls and methodologies in unsupervised learning.

 

4. Motivation

The motivation for investigation of multi-sensor context acquisition is to support systems with an awareness of user situations in the real world. This chapter will discuss the rationale for some choices that were made in this proposal. Firstly, to illustrate the envisioned benefits of the proposed research, several scenarios give an idea of end-user applications. After that, more specific targets are given per abstraction level.

4.1. Scenarios

In order to show the range of applications that could benefit from this specific work, several existing scenarios are listed below that were devised by various research groups. Most of the scenarios come from the fields of ubiquitous and wearable computing, although many also have their roots in broader, human-computer interface (HCI)-related, research.

4.1.1. DrWhatsOn

This scenario is based on previous research carried out in the TEA [57, 51] project and is mainly oriented towards mobile and wearable computing. The title DrWhatsOn comes from a scenario-based presentation of a “context sensitive” phone that was made at Nokia Research [13] after completion of the TEA project.

The context sensitive mobile phone would use pure sensory information from the mobile phone, combined with the state of the hand-held device, personal preferences and activities (from an internal calendar). This would allow better user interfaces, for instance by avoiding disturbing people at the wrong times, giving the right information at the right time, and receding into the background (“peripheral attention”) [13].

The following table (Table 1) shows a scenario where the day of a fictitious Finnish student, Dude, is depicted. As he goes through his daily routines, his phone picks up certain contexts and (sometimes) reacts accordingly. The captions give more in-depth information with the text marked in red where context is being abstracted from the mobile phone’s sensors.

Table 1. A Mobile Phone Scenario using context acquisition: DrWhatsOn. All images and captions are copyright by, and used with permission from: Urpo Tuomela, Nokia Research Laboratories, Oulu, Finland. [13].

Dude stands at bus stop. It's cold outside. The bus arrives soon and he gets in. The mobile phone pays the trip automatically. There are other people in the bus, but otherwise the trip is as boring as usual.

Dude enters the main library of the university. The phone goes to silent mode. He returns some books and the mobile phone gives him a reminder to ask if a reserved book has arrived. It has not. After browsing some books on shelves he leaves to look for Assistant Mr. Smith.

 

Dude walks on a corridor looking for the assistant of Applied Physics. Finally he finds his room, but assistant Mr. Smith is not there. The information tag by the door tells that he has left for a coffee break. Dude checks the location of the assistant with his mobile phone. Server manages to find him: he is in the local café Magma.

 

Dude buys coffee with a donut. He goes to a table where two other guys sit. One of them is the Assistant Mr. Smith. Dude's mobile phone recognizes the fellows, and finds the note related to Mr. Smith: an exercise report that has to be finished soon. They discuss it and a time is agreed for a check. Dude's mobile phone recognizes it and saves the time in the calendar.

 

Dude walks in lecture hall D1 and sits on the back row. The mobile phone goes automatically to silent mode. The lecturer starts speaking, and after a while he sends the correct results of the previous exercise to participants. Then suddenly a message arrives about an interesting girl outside. Profiles of interest match 83%. Dude leaves.

Dude walks to the girl and discusses lively with her. He is very excited. The mobile phone recognizes this from his hand movements, sweating of his palms and from the tone of his voice.  The girl also has a mobile phone and she sends her personal card to him. After a while they leave to different directions.

 

 

Dude arrives at home. After some time the doorbell rings and the girl enters his apartment. It's evening. They share nice moments with talking, drinking tea and playing guitar. Eventually they leave to another room. The mobile phone is left alone on the table.

 

 

4.1.2. Ubicomp scenarios

The following scenario was introduced during the Ubicomp 2003 workshop on User Centred Evaluations for Ubiquitous Computing Systems [54], and covers the distribution of the user’s personal agenda in objects (calendar, cars, alarm, …). The knowledge per object could come from its own sensors (detecting when the user wakes up, for instance):

One ordinary morning in the near future. You live in an apartment at the old town of Gothenburg, Sweden. It is an ordinary autumnal Tuesday morning. Your electronic calendar has noticed that you have to be at the office by 9, because you have a meeting at 9.15. Your car's travelling program has sent a message to calendar, that the trip from your apartment to the office takes 20 minutes. The calendar has sent a message to the alarm system, which actives wake-up at 8 o'clock.

First there are silent voices, birds are singing softly in the background. Your favourite bird is finch. Little by little the curtains are opened, lights turn brighter and the finch's song becomes louder. As the awakening continues, the room fills with coffee smell and with the scent of coffee and with sounds of morning routines. Gradually you start to be aware of your surroundings and sit up at the side of the bed. You notice that you have a morning meeting and you go to the shower... [54]

The next scenario comes from research at Georgia Tech’s Aware Home [42]. The house acts in this case as a single ‘smart object’, and is not only required to know what it is supposed to do, but also who is addressing the house, in what situation the people are, and from where:

From the kitchen, Mom sends Sally down to the basement to get some items from the pantry. Once Sally gets down to the pantry, she cannot find the items Mom sent her down to retrieve. Sally wants to ask for some clarification from Mom, but Mom cannot hear even if Sally yells. So, Sally instructs the house intercom, “House, I want to talk to Mom.” Meanwhile, Mom has set up a baby monitor connection to her younger son, Joey. She can hear Joey crying, so she departs to the family room to care for him. When the house recognizes the request from Sally down in the basement pantry, it then locates Mom, who has now moved to the family room. The house knows that baby Joey is also in the family room, so tells Sally, “Mom is now in the living room with Joey. Do you still wish to speak with Mom?” Sally guesses that Mom is changing Joey’s diaper because he was heard crying before she went down to the basement. Though Mom’s attention will be divided, Sally still wants to speak with her, so she responds, “Yes.” A two-way audio connection is established between Sally in the basement pantry and Mom in the living room. Sally asks Mom to help her determine which items to bring up to the kitchen. During the course of the conversation, Mom finishes with Joey and returns to the kitchen to see what else she needs Sally to bring up from the pantry. The conversation between Sally and Mom continues uninterrupted as both move about the house. As Sally finally returns to the kitchen where Mom is, the house determines that their remote conversation has ended and automatically terminates the audio connection between them. [42]

Finally, Thomas Erickson mentions in [14] a few short scenarios that explore the limits of context acquisition:

  • Spying a newsrack, Tom pulls his rented car to the side of the street and hops out to grab a paper. The car, recognizing the door has just closed and the engine is running, locks its doors.
  • In the midst of her finely honed closing pitch, Susan’s prospective clients watch intently as her screensaver kicks in and the carefully crafted text of her slide slowly morphs into flowing abstract shapes that gradually dissolve into blackness.
  • “What a cretin,” Roger mutters as the CEO finishes his presentation, unaware, for the moment, that the high-tech speaker phone in the table’s center has triangulated on his whisper and upped its gain to broadcast his remark to the meeting’s remote audience.[14]

The scenarios stress the following two aspects of context awareness/acquisition: first, the user is taken out of the loop and processes are expected to act autonomously (with action as an important property); second, the ability to recognize the context and determine the appropriate action requires considerable intelligence [14].

4.2. Potential Contributions to Multi-Modal Perception

The topic of this proposal is positioned in a shared territory between the traditionally theoretic disciplines around signal processing and incremental learning, and the more application-oriented fields of ubiquitous and wearable computing. At first sight, the latter two fields need the algorithms of the former two, to make applications more powerful, pro-active, or generally smarter. It is probable, however, that the real-world situations and multi-sensor-driven devices from ubiquitous and wearable computing will have an impact on the learning frameworks and theories. At the very least, they offer an opportunity to test algorithms for an application that fits closely to their objective, but with certain requirements and limitations.

To aid the development and the in-between evaluations of these signal processing and machine learning algorithms, a portion of the work will be invested in visualisation techniques as well. This includes plotting routines for embedded systems (such as on the smaller iPAQ screens) to monitor sensor values, pre-processing and the behaviour of learning algorithms in real-world settings. This might also have benefits in making any learning methods more transparent to the user.

4.3. Novel HCI Focus in Learning Algorithms

As the system is expected to use the context descriptions of its user, it is inevitable that the user has to be involved in the learning process. User interaction, on the other hand, should be kept to a minimum to avoid time-wasting and annoying interruptions. The result is a trade-off between interaction and learning accuracy.

Figure 10. Context Awareness and Personal Augmentation (source: Nitin Sawhney [49]). The arrow indicates the proposed objective from an HCI perspective.

The goal from a Human Computer Interaction (HCI) perspective is to make an adaptive system as aware as possible and at the same time as passive as possible towards its user. A device that is aware of its situation can be mapped onto a two-dimensional plane [49] that describes the amount of involvement required from the user and the level of situational awareness. Sunglasses are an example of a passive and not very intelligent device: not intelligent, as they have no clue whether the sun is shining or not, and passive because they usually operate in an autonomous way, requiring little user involvement (the wearer only has to decide whether to put them on or take them off). More intelligent sunglasses exist that adapt themselves to the brightness of the sunlight; these would be placed higher on the situational awareness scale, but they remain as passive as their traditional versions.

5. Methodology

In general, a bottom-up approach will be followed, starting from genuine sensors that generate real-world data, rather than using simulation or abstract models. The biggest disadvantage of this tactic is that it requires a slow and tedious development process, in particular since hardware (sensor) platforms are involved. However, this drawback is outweighed by the gain of having actual verifiable demonstrators in the process.

Two fundamental choices define the general direction of this proposal, and will have consequences for the upcoming research. After a rationalization for both options, and an assessment of the alternatives, four sub-goals will be identified.

5.1. Approach

5.1.1. Multiple Embedded Sensors, Dynamically Recruited

Why the number of sensors matters.

Imagine a wearable application where the computer has to react whenever the wearer is sitting. Many researchers ([60, 45, 15]) have 'solved' this problem by placing one or two motion/position sensors on the leg or hip of the wearer.

This approach is far from error-proof, however: if the sensors are just above the knee for instance, lying on the ground would produce the same sensor values as sitting, and so would lifting the leg the sensors are attached to. A solution would be to attach two more sensors on the other leg, but lying horizontally would still look like 'sitting down' to the system. We can go on like this and even after adding many more sensors of the same type, still come up with situations that make the system incorrectly 'detect' sitting.

The same goes for other types of sensors: light, temperature, humidity, or infrared sensors on their own might not be of much use, resulting in a highly faulty system, but a combination of them could be extremely valuable, either to improve recognition of one situation, or to better distinguish many different ones.

The assumption that the sensor platform has a multitude of diverse sensors implies that the right sensor combinations are present in the hardware to link the sensors’ data to the correct context descriptions. The designer of the application should therefore embed as many different sensors as possible (note the earlier observation that sensors are becoming both smaller and cheaper), expecting the software to fuse, mine and abstract the data appropriately.

Low level sensors.

Another important feature is the low-level output of the involved sensors. This suggests that a combination, or fusion, of many sensors is needed to reach a higher-level view of the user’s situation and context. It is therefore more important to focus on learning algorithms that adequately combine the sensor data, rather than exploring the (more traditional) signal processing algorithms that work per data stream.

Integrating changing collections of sensors into the model raises more than one difficulty. It is not possible without providing some descriptive framework for each sensor, specifying what kind of data it provides, in which measurement unit, and how precisely it is calibrated. This introduces serious overhead, which wouldn’t be necessary in a static system where the network of sensors remains the same at all times.
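One possible shape for such a descriptive framework is a small record per sensor plus a registry that sensors can join and leave at run time. The sketch below is purely hypothetical: the field names, the registry interface and the example sensors are assumptions made for illustration and do not describe an existing platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorDescriptor:
    """Self-description that a sensor supplies when it is recruited."""
    name: str          # unique identifier of the sensor
    quantity: str      # what kind of data it provides, e.g. "temperature"
    unit: str          # measurement unit of its values
    precision: float   # calibration precision, expressed in that unit


class SensorRegistry:
    """Keeps track of the sensors that are currently part of the system."""

    def __init__(self):
        self._sensors = {}

    def add(self, descriptor):
        self._sensors[descriptor.name] = descriptor

    def remove(self, name):
        # Removing an unknown sensor is silently ignored.
        self._sensors.pop(name, None)

    def by_quantity(self, quantity):
        """Names of all current sensors that measure the given quantity."""
        return sorted(d.name for d in self._sensors.values()
                      if d.quantity == quantity)


registry = SensorRegistry()
registry.add(SensorDescriptor("wrist-accel", "acceleration", "m/s^2", 0.05))
registry.add(SensorDescriptor("hip-accel", "acceleration", "m/s^2", 0.10))
registry.add(SensorDescriptor("badge-light", "illuminance", "lux", 5.0))
```

A learning algorithm could query the registry whenever its contents change and adjust the dimension of its input space accordingly, which is exactly the overhead a static sensor network avoids.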

Pro and contra.

Most of the benefits of a multi-simple-sensor approach were mentioned in previous research (see for instance [19]):

  • Cheap. The small and simple sensors generally require fewer resources and cost less than, for instance, cameras and GPS systems. An extremely large number of simple sensors could of course invalidate this.
  • Robust. Since the sensors we use are small, they can smoothly be distributed over a larger area, which makes the sensing system less prone to errors. In case a sensor gets blocked or damaged, other sensors will still capture context-relevant information due to the redundancy in the sensors.
  • Distributed. The size also allows the sensors to be integrated much more easily into the environment and especially in clothing.
  • Flexible. The richness and complexity of the identifiable contexts is directly linked to the number, position and kind of sensors. Adding, moving, or improving sensors hence increases the performance of the system.

The advantages of this approach within the framework of this proposal around context acquisition are:

  • Mostly software based design instead of a customised, hybrid, hardware-software design. The assumption that there are plenty of suitable sensors moves the focus to the algorithm that processes the sensor data.
  • Reuse. The system lets all sensors act together to generate descriptions for all applications, instead of having to design a tailored sensor system for each application.
  • Modularity. The concept behind the bottom-up approach is more suitable for ad-hoc sensor networks where new sensors can be added or removed, since it is less based on a pre-designed collection of sensors.
  • Exploratory. The designer of the system is not expected to be aware of the capabilities of the sensors. Unusual and unconventional sensors, and especially combinations of sensors, might be superior and are more easily exploited in the proposed scheme.

But there are also some disadvantages:

  • Complexity. As it is a more application-generic approach, it will be too cumbersome for specific, obvious applications (e.g. one light sensor driving the brightness of a screen).
  • Oversimplification. The recognition performance of this method will at best be as good as, and often worse than, that of a system designed beforehand for that specific task with the same sensors, due to the generic nature of the method. The goal is not to tackle very specific tasks like speech recognition, but to provide a crude mechanism that could provide a service to many applications.

5.1.2. User-level context descriptions, minimal interaction

The training phase, where the algorithm learns the new contexts, is usually governed by the designer of the application. Giving the user the chance to train the system for new contexts, or contexts that aren’t recognized well enough, would result in a flexible solution.

It is fairly straightforward to evaluate how adaptive a learning method is for a fixed set of data, but algorithms that involve a flexible training scheme are harder to assess, especially if the re-training can be invoked by the user. It would be better to choose a slower-learning but transparent algorithm, where the user has direct feedback on how well the recognition goes and re-trains when necessary, than a faster one that doesn’t allow real-time evaluation of its performance.

5.2. Sub-Goals

5.2.1. Sensors and Sensor Networks

The first step in recognizing patterns in the world is the actual sensing itself. Most platforms use a combination of plain hardware sensors. In general, however, almost anything could be considered a sensor as long as it gives a description of the environment or situation; this includes user input or phenomena that the system previously observed. For the sake of simplicity, however, only easily embeddable, low-cost sensors will be targeted.

Many devices already contain sensors, and sensor modules have become small enough to be embedded almost anywhere. The source for the context acquisition is therefore considered to be a collection of sensors that are embedded, or embeddable, in both the environment and the user’s wear.

Working with these embedded sensors results in two properties:

  • The majority of these sensors are cheap and small, and give a low-level output.
  • The user can change both apparel and environments, and the embedded sensor configurations change with them.

A bottom-up approach of collecting and analysing multi-sensor data in a test-bed environment will play a central part in the proposed research. Data needs to be recorded over longer periods of time (typically a matter of weeks, rather than days or hours) and then analyzed to identify clusters and patterns that may correlate with user-level perception of situational context (manually recorded). The bottom-up approach will be designed both to uncover further application opportunities, and to further our experience with sensor-related system requirements (such as required sampling rates).
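One simple way to check whether clusters found in logged data correlate with the manually recorded contexts is to cross-tabulate the two and measure how dominant the majority label is within each cluster. The sketch below assumes that a clustering step has already assigned a cluster identifier to every sample; the function names and the choice of cluster purity as the measure are illustrative assumptions, not the analysis prescribed by this proposal.

```python
from collections import Counter, defaultdict


def cluster_label_table(cluster_ids, labels):
    """Count how often each manual label occurs within each cluster."""
    table = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, labels):
        table[cluster][label] += 1
    return table


def purity(cluster_ids, labels):
    """Fraction of samples that carry the majority label of their cluster."""
    if not labels:
        raise ValueError("at least one annotated sample is required")
    table = cluster_label_table(cluster_ids, labels)
    # For every cluster, take the count of its most frequent label.
    majority = sum(counts.most_common(1)[0][1] for counts in table.values())
    return majority / len(labels)
```

A purity close to 1 suggests that the clusters line up with the user-level annotations, while a value near the share of the most frequent label suggests that they do not.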

The development of a method that relates multi-sensor data to user-level context requires collection of sensor data in real situations. This assumes basic research in special-purpose sensor hardware, which will be kept up-to-date in parallel with the other planned stages. This is to ensure that the end results do not suffer from weak data from outdated sensors.

Iterative prototyping will be used to develop and generalize our context acquisition method. Prototyping at a basic level involves the selection of sensors and device-level integration (i.e. development of drivers for sensor control). It will furthermore involve investigation of feature extraction algorithms for low-powered processing environments, and of sub-symbolic machine learning techniques for sensor fusion.

5.2.2. Pre-processing and sensor fusion

Plenty of examples of pre-processing methods were given in a previous section, but proposing a universal pre-processing approach, or specifying the best pre-processing algorithms per type of sensor, is too large a task to be feasible. A set of widespread algorithms that have proven their worth in handling sensor-based data should be more helpful, especially if they can be applied in a modular fashion to a wide range of different sensors, together with a simple interface to adjust their parameters.
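To illustrate what such a modular set of pre-processing algorithms could look like, the sketch below wraps any summary function in a sliding window whose length is the single adjustable parameter. The class name WindowFeature, the helper make_features, and the choice of mean and standard deviation as example features are assumptions made for this illustration only.

```python
from collections import deque
from statistics import mean, pstdev


class WindowFeature:
    """Applies a summary function to the most recent `size` samples."""

    def __init__(self, func, size):
        self.func = func
        self.window = deque(maxlen=size)  # old samples drop out automatically

    def update(self, sample):
        """Add one sample and return the feature over the current window."""
        self.window.append(sample)
        return self.func(self.window)


def make_features(size):
    """Hypothetical default feature set, usable for any scalar sensor stream."""
    return {
        "mean": WindowFeature(mean, size),
        "std": WindowFeature(pstdev, size),
    }
```

Because each feature only sees a stream of numbers, the same objects can be attached to a light sensor or an accelerometer axis alike, and tuning amounts to changing the window size.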

The approach for development of sensor analysis and fusion techniques will be based on statistical analysis of captured data, and reliability testing with both logged data and live data. Specifically for sensor fusion, the approach will include experimentation with a range of neural network architectures, building on earlier work in which the application of Kohonen self-organizing maps and similar techniques to real-time sensor data analysis was investigated [61, 62, 63].
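For readers unfamiliar with self-organizing maps, the following sketch shows the basic online update of a textbook Kohonen map with a one-dimensional grid. It is a generic illustration, not the configuration used in [61, 62, 63]; the class name TinySOM, the fixed learning rate, and the Gaussian neighbourhood with a fixed radius are all simplifying assumptions.

```python
import math
import random


class TinySOM:
    """One-dimensional self-organizing map trained one sample at a time."""

    def __init__(self, n_units, n_inputs, seed=0):
        rng = random.Random(seed)
        # One weight vector per map unit, randomly initialised in [0, 1).
        self.weights = [[rng.random() for _ in range(n_inputs)]
                        for _ in range(n_units)]

    def winner(self, x):
        """Index of the unit whose weight vector is closest to x."""
        def dist2(w):
            return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        return min(range(len(self.weights)),
                   key=lambda i: dist2(self.weights[i]))

    def update(self, x, rate=0.5, radius=1.0):
        """Move the winner and its grid neighbours towards x."""
        best = self.winner(x)
        for i, w in enumerate(self.weights):
            # Gaussian neighbourhood on the map grid, centred on the winner.
            h = math.exp(-((i - best) ** 2) / (2.0 * radius ** 2))
            for d in range(len(w)):
                w[d] += rate * h * (x[d] - w[d])
        return best
```

In practice the rate and radius are usually decreased over time; they are kept constant here to keep the sketch short.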

5.2.3. Source Detection, Concept Creation and Mapping

The algorithms that operate on a higher, symbolic level have less-defined restrictions than the features, but are harder to construct.

The evaluation of the proposed method requires trials under realistic rather than simulated conditions. Therefore a deployed test-bed of wearable and ubiquitous sensor boards will be important as a ‘sampling’ environment for our research. The test-bed is targeted at both wearable applications and ubiquitous computing scenarios, and it needs to be set up so as not to interfere with the situations we aim to model based on multi-sensor data. Spaces in the work environment are chosen for convenience and controllability. Within these spaces the intention is to focus on common situations (detecting social situations, types of activity, state of environment) that reveal generic sources, rather than applications that are specific to work and office settings only.

We plan a series of experiments each conducted over a time span that involves many changes in situation, as well as recurring situations (typically days, but possibly weeks). The wearable experiments will be targeted at acquisition of context that changes according to people’s mobility. Examples are movement patterns (e.g. strolling, rushing, …), mobile activities (e.g. related to home or work routines), and types of place/event (e.g. supermarket, football pitch, public transport). The ubiquitous computing experiments will be targeted at acquisition of context in a given space. Examples are activities that take place, people that are present, changing physical conditions, and so on.

5.2.4. Annotation

The behavioural AI approach (as described previously) fully depends on self-generation of concepts that are embedded in the environment. In this research proposal, however, a significant shortcut can be made by bringing the user into the learning loop: the system does not have to figure out how to survive in the environment, or how the sensations are to be connected to actions, because the user acts as a teacher [58, 60]. This reduces the objective to a classification problem, but in highly flexible circumstances.

The user should be able to teach the system the name of whatever situation or context it can encounter. This name is just a designation, an annotation, so that both the user and the wearable computer have a common vocabulary. The teaching should furthermore be done on-the-spot: while both user and computer are in that situation or context. The user chooses when, where, and for how long the teaching will occur, in order to avoid interruption or cumbersome human-computer interaction.
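The teaching interaction described here can be sketched as a classifier that only learns when the user supplies a name for the current situation. The example below uses a nearest-prototype scheme with a running mean per label purely as a stand-in; the class TaughtContexts and its methods are hypothetical, and the proposal does not commit to this particular learning algorithm.

```python
class TaughtContexts:
    """Nearest-prototype classifier that learns only when the user teaches."""

    def __init__(self):
        self.prototypes = {}  # label -> running mean feature vector
        self.counts = {}      # label -> number of taught samples

    def teach(self, label, features):
        """Called on-the-spot: the user names the current situation."""
        n = self.counts.get(label, 0) + 1
        old = self.prototypes.get(label, [0.0] * len(features))
        # Incremental mean: no past samples need to be stored.
        self.prototypes[label] = [m + (f - m) / n
                                  for m, f in zip(old, features)]
        self.counts[label] = n

    def recognize(self, features):
        """Return the closest taught label, or None if nothing was taught."""
        if not self.prototypes:
            return None

        def dist2(label):
            return sum((p - f) ** 2
                       for p, f in zip(self.prototypes[label], features))
        return min(self.prototypes, key=dist2)
```

Teaching a new name simply adds a prototype, so the vocabulary grows with the user rather than being fixed by the designer.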

Validation of the annotation in the multi-sensor context acquisition method will be done by prototyping a variety of applications that demonstrate the use of the method. Application scenarios will be developed as part of the initial work on requirements. The aim is specifically to identify and demonstrate scenarios that require context that cannot be inferred from position and/or vision sensors. Furthermore, a quantitative evaluation of the approach is necessary in a range of experiments, to assess correct predictions, wrong predictions, failures of detection, detection delay, stability, etc. Experimental evaluation will also cover research questions such as how well the method scales with the number of sensors, and how it performs when individual sensors become unavailable. For selected scenarios we also intend to reproduce experimental conditions reported in related work to obtain quantitative data for comparison.
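Several of the quantities listed above can be computed directly from a sequence of true contexts and the corresponding sequence of predictions. The sketch below is one possible operationalisation under stated assumptions: both sequences are sampled at the same instants, a prediction of None stands for a failure of detection, and detection delay is counted in samples from the moment the true context changes until its first correct prediction. The function name evaluate is hypothetical.

```python
def evaluate(truth, predicted):
    """Compare per-sample true contexts with predictions (None = no output)."""
    counts = {"correct": 0, "wrong": 0, "missed": 0}
    delays = []        # samples needed to recognise each new true context
    segment_start = 0  # index at which the current true context began
    detected = False   # has the current true context been recognised yet?
    for i, (t, p) in enumerate(zip(truth, predicted)):
        if i > 0 and t != truth[i - 1]:
            segment_start, detected = i, False  # the true context changed
        if p is None:
            counts["missed"] += 1
        elif p == t:
            counts["correct"] += 1
            if not detected:
                delays.append(i - segment_start)
                detected = True
        else:
            counts["wrong"] += 1
    return counts, delays
```

For example, with truth a, a, a, b, b, b and predictions a, a, a, a, None, b, the function reports four correct, one wrong and one missed sample, and delays of 0 and 2 samples for the two contexts.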

An important point to make is that methods based on hardware sensors intrinsically contain noise and extracting these context descriptions from low-level sensors isn’t always reliable. However, incorporating contextual knowledge in user interface designs can be done on several levels, depending on how accurate and critical that information is:

  1. In the case of near-perfect context recognition and/or non-vital applications, contextual information can directly steer the behaviour of the application(s). An example is slowly increasing and decreasing the font size on a display depending on whether the display is resting on a table, being held by someone, or being carried while walking or running [51].
  2. On a less-critical level, choices can be pre-selected or pre-ordered, so that the user is not required to scroll down a list or page before finding what he/she needs. An example here is a mobile phone menu that already orders profiles (like “in a meeting”, “outside”, etc.) according to how likely they are to occur [52].

Logging contextual knowledge or annotating documents with this information, for instance, is completely non-critical. This is generally regarded as bonus information, and adding a confidence measure per estimation is sufficient for this use.
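The three levels above can be read as a policy that decides how strongly an application should act on a context estimate. The sketch below encodes that reading with two confidence thresholds; the function name use_context and the threshold values 0.9 and 0.5 are arbitrary assumptions for illustration, and a real application would also have to take into account how critical the action is.

```python
def use_context(estimates, act_threshold=0.9, suggest_threshold=0.5):
    """Decide how to use context estimates given as {label: confidence}."""
    if not estimates:
        raise ValueError("at least one context estimate is required")
    # Most likely context first.
    ranked = sorted(estimates.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_conf = ranked[0]
    if best_conf >= act_threshold:
        # Level 1: near-perfect recognition steers the application directly.
        return ("act", best_label)
    if best_conf >= suggest_threshold:
        # Level 2: only pre-order the choices; the user still decides.
        return ("preorder", [label for label, _ in ranked])
    # Level 3: non-critical use, log the estimates with their confidence.
    return ("log", ranked)
```

With this policy a wrong estimate at low confidence costs the user nothing more than a slightly mis-ordered menu or an inaccurate log entry.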

6. Work Plan

This work plan details the time-frame in which each objective should be finished. A key distinction is made between the construction, optimization and updating of the hardware-based sensor platforms, and the investigation into the properties of suitable algorithms. Both have an impact on the eventual system’s performance, and thus have to run in parallel to incorporate advances made in sensor systems during the research. On a higher level of detail, demonstrators and performance-measured experiments act as milestones, using pre-set requirements for the hardware platforms, and well-established measures for the performance of adaptive algorithms.

The work to be undertaken is structured into four consecutive phases (preparation, two cycles of prototyping combined with experimentation, and finally consolidation) with a number of goals per phase:

Phase I: Preparation

  • Installation of the test-bed: integration and deployment of sensor boards, sensor control infrastructure and data collection infrastructure in environments; integration of sensor boards and data logging hardware/software in mobile/wearable devices.
  • Initial analysis using scenarios: identification and classification of context descriptions identified in context-aware computing scenarios adopted from collaborative projects.
  • Technology survey: survey of available low-cost sensors, and of sensor analysis techniques suitable for low-powered embedded systems.
  • Data collection and analysis: Logging of multi-sensor data in the test-bed spaces and with wearable setups over the duration of at least a week; statistical analysis of the data and correlation with manually annotated events/situations.

Milestone: Operational test-bed; Initial understanding of user and system requirements; Publication of collected data sets (as a database of downloadable content vi