TUM Kitchen Data Set

Introduction

The TUM Kitchen Data Set is provided to foster research in the areas of markerless human motion capture, motion segmentation, and human activity recognition. It is intended to aid researchers in these fields by providing a comprehensive collection of sensory input data for developing and validating their algorithms. Given the manually annotated “ground truth” labels of the underlying actions, it is also meant to serve as a benchmark for comparative studies. The recorded activities were selected with the intention of providing realistic and natural-looking motions, and consist of everyday manipulation activities in a natural kitchen environment.


Description of the Data

The TUM Kitchen Data Set contains observations of several subjects setting a table in different ways. Some perform the activity as a robot would, transporting the items one by one; other subjects behave more naturally and grasp as many objects as they can at once. In addition, there are two episodes in which the subjects repetitively performed reaching and grasping actions. Applications of the data lie mainly in the areas of human motion tracking, motion segmentation, and activity recognition.

To provide sufficient information for recognizing and characterizing the observed activities, the following multi-modal sensor data were recorded:

- video data from four static cameras, available as avi, raw, and jpg (see the episodes table below)
- full-body motion capture data (bvh and csv)
- RFID tag readings (rfid)
- readings from magnetic sensors in the doors (doors)

More detailed documentation of the data set is provided in the following technical report: The TUM Kitchen Data Set
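As an illustration, the per-frame motion capture csv files could be loaded with a short helper like the following. This is a minimal sketch: the column layout (a header row followed by one row of numeric values per frame) is an assumption here, and the technical report documents the actual format.

```python
import csv

def load_mocap_csv(path):
    """Load a per-frame mocap csv file.

    Assumed layout (see the technical report for the real one):
    a header row, then one row of numeric values per video frame.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        frames = [[float(v) for v in row] for row in reader if row]
    return header, frames
```

Each returned row then corresponds to one tracked video frame of the episode.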

Publications

Workshop Papers

The TUM Kitchen Data Set of Everyday Manipulation Activities for Motion Tracking and Action Recognition (Moritz Tenorth, Jan Bandouch, Michael Beetz), In IEEE International Workshop on Tracking Humans for the Evaluation of their Motion in Image Sequences (THEMIS), in conjunction with ICCV2009, 2009. [bib] [pdf]


Episodes

Each episode provides video from all four camera views (each available as avi, raw, and jpg), motion capture data (bvh and csv), and sensor data (rfid and doors).

ID    Subject  Task        Startframe  Endframe  Labels
0-0   S01      stt robot          240      1480  csv
0-1   S01      stt robot          255      1530  csv
0-2   S01      stt human          234      1190  csv
0-3   S02      stt robot          306      2206  csv
0-4   S02      stt robot          256      1763  csv
0-6   S02      stt robot          281      1645  csv*
0-7   S01      stt robot          240      1720  csv*
0-8   S01      stt robot          237      1520  csv*
0-9   S04      stt robot          254      1910  csv*
0-10  S04      stt robot          215      1850  csv*
0-11  S03      stt robot          290      2100  csv*
0-12  S03      stt human          265      1400  csv*
1-0   S01      stt robot          309      2320  csv
1-1   S01      stt robot          195      2030  csv
1-2   S01      stt robot          228      2000  csv
1-3   S01      stt robot          255      1988  csv
1-4   S01      stt robot          242      1890  csv
1-5   S01      repetitive         205      6710  csv
1-6   S03      stt robot          239      2280  csv
1-7   S03      stt robot          200      2060  csv*

* Labels provided by Angela Yao, ETH Zürich

Tracking starts at the video frame startframe and finishes at endframe, i.e. the first row in the pose files of episode 0-0 corresponds to video frame 240 of that episode.
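This offset can be expressed as a small pair of helpers, using the startframe/endframe values from the episodes table above:

```python
# Pose-file rows and video frames are offset by the episode's startframe:
# row 0 of the pose file corresponds to video frame `startframe`.

def pose_row_to_video_frame(row_index: int, startframe: int) -> int:
    return startframe + row_index

def video_frame_to_pose_row(frame: int, startframe: int, endframe: int) -> int:
    # Pose data only exists for frames inside the tracked range.
    if not (startframe <= frame <= endframe):
        raise ValueError(f"frame {frame} lies outside the tracked "
                         f"range [{startframe}, {endframe}]")
    return frame - startframe

# Episode 0-0 is tracked from frame 240 to 1480:
print(pose_row_to_video_frame(0, 240))            # → 240
print(video_frame_to_pose_row(1480, 240, 1480))   # → 1240
```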

The task descriptions read as follows:

- stt robot: setting the table like a robot, transporting the objects one by one
- stt human: setting the table like a human, carrying several objects at once
- repetitive: repetitively performing reaching and grasping actions

The video data is available in three different formats:

- avi: compressed video streams
- raw: uncompressed raw image data
- jpg: individual JPEG frames
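As a sketch, the JPEG frames of one camera view could be iterated over and restricted to the tracked range like this. The directory layout and the zero-padded frame-number file naming are assumptions made for illustration, not part of the dataset documentation:

```python
from pathlib import Path

def tracked_frames(episode_dir, startframe, endframe):
    """Return the JPEG frame files whose index lies in [startframe, endframe].

    Assumes each file's stem is the (possibly zero-padded) video frame
    number, e.g. "00240.jpg" -- a hypothetical naming scheme.
    """
    frames = sorted(Path(episode_dir).glob("*.jpg"))
    return [p for p in frames if startframe <= int(p.stem) <= endframe]
```

The returned files then line up one-to-one with the rows of the episode's pose files.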

Additional Information and Tools