Keyword [Behavior]
Ehsani K, Bagherinezhad H, Redmon J, et al. Who Let The Dogs Out? Modeling Dog Behavior From Visual Data[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4051-4060.
1. Overview
1.1. Motivation
- most computer vision tasks study components of visual intelligence; this paper instead directly models a visually intelligent agent (a dog)
- input: visual information; output: predicted actions of the agent
- DECADE dataset: ego-centric videos recorded from a dog's perspective
- three tasks:
  - how the dog acts
  - how the dog plans
  - learn from a dog: use the dog-modeling task as representation learning for walkable surface estimation and scene classification
1.2. Definition of the Problems
- understanding visual data to the extent that an agent can take actions and perform tasks in the visual world
1.3. Dataset
- mount Inertial Measurement Units (IMUs) on the joints and body of the dog; record the absolute orientation and compute the relative angles of the dog's main limbs and body (angular displacement represented as a 4-dimensional quaternion vector)
- mount a camera on the dog's head (380 video clips; 24,500 frames: 21,000 for training, 1,500 for validation, and 2,000 for testing; various indoor and outdoor scenes across more than 50 different locations)
- the difference of the angular displacements between two consecutive frames represents the action of the dog in that timestep (see the quaternion sketch after this list)
- connect all IMUs to the same embedded system (Raspberry Pi 3)
- the rates of the joint-movement readings and of the video frames differ; perform interpolation and averaging to compute the absolute angular orientation for each frame
- use K-means clustering to quantize the action space, which formulates the problem as classification rather than regression (see the second sketch below)
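A minimal sketch of the per-frame action computation as I read it: the relative rotation between a joint's absolute orientations at consecutive frames, expressed as a quaternion. The use of scipy and all function names here are my own assumptions, not the authors' code.

```python
# Sketch: relative angular displacement between consecutive frames.
# Assumes each frame stores one absolute orientation per joint as a unit
# quaternion in scipy's scalar-last (x, y, z, w) order; scipy is my
# choice here, not the paper's.
import numpy as np
from scipy.spatial.transform import Rotation as R

def relative_rotation(q_prev, q_next):
    """Quaternion taking a joint from its orientation at frame t to its
    orientation at frame t+1: q_rel = q_prev^{-1} * q_next."""
    return (R.from_quat(q_prev).inv() * R.from_quat(q_next)).as_quat()

def frame_actions(orientations):
    """orientations: (T, J, 4) absolute quaternions for T frames and J
    joints. Returns (T-1, J, 4) per-joint displacements (the 'actions')."""
    T, J, _ = orientations.shape
    return np.stack([
        np.stack([relative_rotation(orientations[t, j], orientations[t + 1, j])
                  for j in range(J)])
        for t in range(T - 1)])
```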
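And a sketch of the K-means quantization that turns these continuous displacements into discrete classes. The cluster count of 8 and clustering separately per joint are assumptions; the paper may choose differently.

```python
# Sketch: quantize per-joint quaternion displacements into discrete
# action classes with K-means, so the model can classify rather than
# regress. n_clusters=8 and per-joint clustering are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def quantize_actions(actions, n_clusters=8):
    """actions: (N, J, 4) displacements. Returns one fitted KMeans per
    joint and an (N, J) array of integer class labels."""
    N, J, _ = actions.shape
    models, labels = [], np.empty((N, J), dtype=int)
    for j in range(J):
        km = KMeans(n_clusters=n_clusters, n_init=10).fit(actions[:, j])
        models.append(km)
        labels[:, j] = km.labels_
    return models, labels
```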
1.4. Related Work
- Visual Prediction (activity forecasting, people's intent)
- Sequence to Sequence Models
- Ego-centric Vision
- Ego-motion estimation
- Action Inference & Planning
- Inverse Reinforcement Learning
- Self-supervision
1.5. Act like a Dog
- input: a sequence of frames (1 ~ t)
- output: a sequence of actions (t+1 ~ N)
- each frame is encoded by a ResNet whose weights are shared across timesteps (see the sketch below)
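A minimal sketch of the acting model under my reading: a shared ResNet-18 encodes each observed frame, an LSTM aggregates the sequence, and per-joint linear heads classify the quantized action at each step. The paper's model decodes actions for the future steps t+1 ~ N with an encoder-decoder; this sketch simplifies that, and all layer sizes are assumptions.

```python
# Sketch: shared ResNet-18 per frame -> LSTM -> per-joint action logits.
# n_joints, n_actions, and hidden size are assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ActLikeADog(nn.Module):
    def __init__(self, n_joints=6, n_actions=8, hidden=512):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()   # keep the 512-d features
        self.encoder = backbone       # one encoder, weights shared over time
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_joints)])

    def forward(self, frames):
        # frames: (B, T, 3, H, W) observed clip
        B, T = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(B, T, -1)
        out, _ = self.lstm(feats)     # (B, T, hidden)
        return [head(out) for head in self.heads]  # per-joint logits
```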
1.6. Plan like a Dog
- input: two non-consecutive frames (1, N)
- output: the sequence of actions (2 ~ N-1) that takes the dog from the first frame to the second (see the sketch below)
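A sketch of the planning variant, mirroring the acting sketch: the start and goal frames go through the same shared encoder, and their concatenated features are unrolled by an LSTM cell for the intermediate steps. The wiring and sizes are my assumptions, not the paper's exact architecture.

```python
# Sketch: plan the actions between a start and a goal frame.
# Sizes and the unrolling scheme are assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class PlanLikeADog(nn.Module):
    def __init__(self, n_joints=6, n_actions=8, hidden=512):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()
        self.encoder = backbone
        self.cell = nn.LSTMCell(1024, hidden)  # start + goal features
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_joints)])

    def forward(self, start, goal, n_steps):
        # start, goal: (B, 3, H, W); n_steps: number of actions to plan
        x = torch.cat([self.encoder(start), self.encoder(goal)], dim=1)
        h = x.new_zeros(x.size(0), self.cell.hidden_size)
        c = x.new_zeros(x.size(0), self.cell.hidden_size)
        plan = []
        for _ in range(n_steps):
            h, c = self.cell(x, (h, c))
            plan.append([head(h) for head in self.heads])
        return plan  # n_steps lists of per-joint logits
```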
1.7. Learn from a Dog
Compare a ResNet-18 pre-trained on DECADE (input: two consecutive frames [t, t+1]; output: the action between them) against an ImageNet-pretrained ResNet-18 on the downstream tasks of walkable surface estimation and scene classification (sketch below).
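A sketch of this pretext task as I read it: a siamese ResNet-18 embeds frames t and t+1, and the concatenated embeddings classify the action between them; the trained backbone is then reused downstream in place of ImageNet initialization. Head widths and class counts are assumptions.

```python
# Sketch: "learn from a dog" pretext task. One shared ResNet-18 embeds
# both frames; the pair predicts the discrete action between them.
# n_joints and n_actions are assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ActionPretext(nn.Module):
    def __init__(self, n_joints=6, n_actions=8):
        super().__init__()
        self.backbone = resnet18(weights=None)
        self.backbone.fc = nn.Identity()
        self.heads = nn.ModuleList(
            [nn.Linear(1024, n_actions) for _ in range(n_joints)])

    def forward(self, frame_t, frame_t1):
        z = torch.cat([self.backbone(frame_t),
                       self.backbone(frame_t1)], dim=1)
        return [head(z) for head in self.heads]

# After pretraining, keep `model.backbone` and fine-tune it on walkable
# surface estimation or scene classification.
```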
1.8. Future Work
- add more input modalities: touch, smell
- collect data from multiple dogs; evaluate generalization across dogs
1.9. Experiments
1.9.1. Metrics
- class accuracy
- perplexity
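A sketch of the two metrics, assuming per-example class probabilities; the exact reductions (per-class averaging for accuracy, mean negative log-likelihood for perplexity) are my reading rather than the paper's stated formulas.

```python
# Sketch: mean per-class accuracy and perplexity.
# probs: (N, C) rows summing to 1; labels: (N,) integer classes.
import numpy as np

def mean_class_accuracy(probs, labels, n_classes):
    preds = probs.argmax(axis=1)
    per_class = [(preds[labels == c] == c).mean()
                 for c in range(n_classes) if (labels == c).any()]
    return float(np.mean(per_class))

def perplexity(probs, labels, eps=1e-12):
    # exp of the mean negative log-likelihood of the true class;
    # lower is better, 1.0 is a perfect, fully confident model
    nll = -np.log(probs[np.arange(len(labels)), labels] + eps)
    return float(np.exp(nll.mean()))
```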