We present an approach that combines passive scene understanding with object search in order to recognize scenes in indoor environments that cannot be perceived from a single point of view. Passive scene recognition is performed using Implicit Shape Models (ISMs) based on spatial relations between objects. ISMs, a variant of the Generalized Hough Transform, are extended to describe scenes as sets of objects with relations between them. Relations are expressed as six-degree-of-freedom (DoF) relative object poses. They are extracted from sensor recordings of human demonstrations of actions that usually take place in the corresponding scene. Within a scene, ISMs solely represent relations of n objects towards a common reference, so violations of other relations are not detectable. To overcome this limitation, we extend our scene model, using hierarchical agglomerative clustering, to a binary tree consisting of ISMs. Active scene recognition aims to simultaneously detect present scenes and look for the objects these scenes consist of. For a pivoting stereo camera rig, we achieve this by performing recognition with ISMs in an object search loop using next-best-view (NBV) estimates. The criterion by which we greedily choose the views the rig shall adopt next is the confidence of detecting objects in them. In each step of the search, confidences for potential positions of objects not yet found are calculated based on the best available scene hypothesis. This is done by reversing the principle of ISMs and using spatial relations to predict potential object positions starting from the objects already detected.
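The reversal of ISM relations and the greedy view choice described above can be illustrated with a minimal sketch. It is not the authors' implementation. All function names are hypothetical, poses are reduced to a position plus a rotation about the vertical axis, the camera is assumed to sit at the origin and only pan, and the confidence is a toy stand-in that counts predicted positions inside an assumed horizontal field of view.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 homogeneous transforms given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert_rigid(t):
    """Invert a rigid transform: rotation R becomes R^T, translation p becomes -R^T p."""
    rt = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [-sum(rt[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [rt[0] + [p[0]], rt[1] + [p[1]], rt[2] + [p[2]], [0.0, 0.0, 0.0, 1.0]]

def pose(x, y, z, yaw):
    """Build a pose from a position and a rotation about the z axis (radians)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, x], [s, c, 0.0, y], [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]

def learn_relation(obj_pose, ref_pose):
    """Relative pose of the reference, expressed in the object's frame."""
    return matmul(invert_rigid(obj_pose), ref_pose)

def vote_reference(obj_pose, relation):
    """Recognition direction: a detected object votes for a reference pose."""
    return matmul(obj_pose, relation)

def predict_object(ref_pose, relation):
    """Reversed direction: a reference hypothesis predicts a missing object's pose."""
    return matmul(ref_pose, invert_rigid(relation))

def view_confidence(pan, predicted_positions, half_fov):
    """Toy confidence (assumption): number of predicted positions inside the view."""
    hits = 0
    for x, y, _ in predicted_positions:
        bearing = math.atan2(y, x)
        # Wrap the angular difference into [-pi, pi] before comparing.
        diff = math.atan2(math.sin(bearing - pan), math.cos(bearing - pan))
        if abs(diff) <= half_fov:
            hits += 1
    return hits

def next_best_view(candidate_pans, predicted_positions, half_fov=math.radians(20)):
    """Greedy choice: the candidate pan angle with the highest toy confidence."""
    return max(candidate_pans,
               key=lambda pan: view_confidence(pan, predicted_positions, half_fov))

# Demonstration data (invented for illustration): a plate serves as reference
# and a cup was observed next to it.
demo_plate = pose(1.0, 0.0, 0.7, 0.0)
demo_cup = pose(1.0, 0.3, 0.7, 0.5)
relation_cup = learn_relation(demo_cup, demo_plate)

# Search time: the plate has been found elsewhere, the cup is still missing.
plate_now = pose(0.0, 2.0, 0.7, math.pi / 2)
cup_pred = predict_object(plate_now, relation_cup)
cup_position = (cup_pred[0][3], cup_pred[1][3], cup_pred[2][3])

candidate_pans = [math.radians(a) for a in (0, 45, 90, 135)]
best_pan = next_best_view(candidate_pans, [cup_position])
```

Feeding `cup_pred` back into `vote_reference` with the same relation returns `plate_now`, which shows that prediction is the exact inverse of voting. A fuller version would score many candidate camera poses in 6-DoF and weight predictions by the rating of the scene hypothesis they come from.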
|Number of pages||7|
|Journal||Proceedings - IEEE International Conference on Robotics and Automation|
|Publication status||Published - 22 Sep 2014|
|Event||2014 IEEE International Conference on Robotics and Automation, ICRA 2014 - Hong Kong, China. Duration: 31 May 2014 → 7 Jun 2014|