Boris Mirkin




His background is in theoretical computer science. However, after completing his Ph.D. in abstract automata and formal languages, he shifted to data analysis and classification, an area that computer scientists long considered part of statistics and that statisticians considered not to belong to the sciences at all. Things changed with the advent of modern computer systems capable of processing massive data sets. Nowadays, his area of research has become part of computer science under the title of data mining and knowledge discovery.

His earlier work, on revealing order and cluster structures in qualitative data, is reflected in his monographs: Group Choice (in Russian 1974, English translation 1979, Wiley Interscience), Graphs and Genes (with S.N. Rodin, in Russian 1977, English translation 1984, Springer), and Analysis of Qualitative Attributes and Structures (in Russian 1976, 1980).

His later work focuses on cluster analysis, understood as data-driven classification, and is partly described in the monograph Mathematical Classification and Clustering (1996), written while he was at DIMACS, Rutgers University, USA. He maintains that two problems, revealing clusters in data and describing those clusters or groups, form the core of data-driven classification; traditionally, only the former is considered clustering. To deal with these problems, he assumes that it must be possible to use the cluster structure found in data to approximately reconstruct the original data, and that the quality of a cluster structure should be evaluated by the quality of that reconstruction. This idea leads to a class of methods and algorithms that have proven successful in theory as well as in applications such as biomolecular analysis, industrial organizations, and large-scale surveys. In his latest book, Clustering for Data Mining: A Data Recovery Approach (Chapman & Hall/CRC, 2005), he shows how this view, referred to as the "data recovery approach", can be applied to the two most popular methods, K-Means and Ward clustering, leading to a consistent theory of data analysis that provides a wealth of mutually compatible methods and interpretation aids.

He has spent time travelling and working with colleagues in France (1991-1993), the USA (1993-1998), and Germany (1996-1999); this gave him a unique opportunity to keep up with modern developments and deepen his understanding of data-driven classification problems.

Apr 21st 2014

Learn both the theory and the application of basic methods invented either for developing new concepts (principal components or clusters) or for finding interesting correlations (regression and classification). This is preceded by a thorough analysis of 1D and 2D data.
