Reasoning engines for ontological and rule-based knowledge bases are becoming increasingly important in areas such as the Semantic Web and information integration. It has been acknowledged, however, that judging the performance of such reasoners and their underlying algorithms is difficult because publicly available datasets with large amounts of real-life instance data are lacking. In this paper we describe a framework and a toolbox for creating such datasets, based on extracting instances from the publicly available OpenStreetMap (OSM) geospatial database. To this end, we give a formalization of OSM and present a rule-based language for specifying how instance data is extracted from OSM data. The declarative nature of the approach, in combination with external functions and parameters, allows one to create several variants of a dataset via small modifications of the specification. We describe a highly flexible toolbox that extracts instance data from a given OSM map according to a given set of rules. We have employed our tools to create benchmarks that have already been fruitfully used in practice.