We are developing technology for generating English textual summaries of time-series data, in three domains: weather forecasts, gas-turbine sensor readings, and hospital intensive care data. Our weather-forecast generator is currently operational and being used daily by a meteorological company. We generate summaries in three steps: (a) selecting the most important trends and patterns to communicate; (b) mapping these patterns onto words and phrases; and (c) generating actual texts based on these words and phrases. In this paper we focus on the first step, (a), selecting the information to communicate, and describe how we perform this using modified versions of standard data analysis algorithms such as segmentation. The modifications arose out of empirical work with users and domain experts, and in fact can all be regarded as applications of the Gricean maxims of Quality, Quantity, Relevance, and Manner, which describe how a cooperative speaker should behave in order to help a hearer correctly interpret a text. The Gricean maxims are perhaps a key element of adapting data analysis algorithms for effective communication of information to human users, and should be considered by other researchers interested in communicating data to human users.
|Title of host publication||Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003)|
|Publication status||Published - Aug 2003|