Keynote Speakers | DeSE - Developments in E-Systems Engineering

Mixture Models Estimation. Application to Data Clustering and Classification

Hani Hamdan

SUPELEC, Department of Signal Processing and Electronic Systems, FRANCE


Dr. HAMDAN Hani has been a Professor in the Department of Signal Processing and Electronic Systems of the Ecole Supérieure d’Electricité (SUPELEC) since 2008. He received his Engineering Diploma in Electricity and Electronics, with an option in Computer Science and Telecommunication, in 2000 from the Faculté de Génie of the Université Libanaise in Beirut, Lebanon. He received his Master's degree in Industrial Control in 2001 from the Faculté de Génie of the Université Libanaise in collaboration with the Université de Technologie de Compiègne (UTC), France, and earned his PhD in Systems and Information Technologies in 2005 from the UTC. Between March 2002 and August 2005, he worked as a Research Engineer in Computer Science at the pole ICM (Ingénierie, Contrôles non destructifs et Mesure) of CETIM (CEntre Technique des Industries Mécaniques) in Senlis, France. During this period, he developed clustering and classification methods for real-time acoustic emission monitoring of pressure vessels. From 2004 to 2005, he was President-elect of the Mouvement Associatif Doctoral of the UTC. From 2005 to 2006, he was a Researcher at CNRS (French National Center for Scientific Research), where he worked on the analysis and synthesis of speech. From 2006 to 2008, he was an Assistant Professor at the Université Paris-Nord (Paris 13), where he conducted research in classification, automatic control, and data analysis. He is the author or co-author of more than 50 scientific papers.
He serves on the international scientific committees of many international conferences, such as the International Conference on Advances in Computing, Communications and Informatics (ICACCI-2014), Delhi, India; the IEEE International Conference on Ultra-Wideband (ICUWB 2014), Paris, France; the International Multi-Conference on Computing in the Global Information Technology (ICCGI 2014), Seville, Spain; the International Conference on Computational Logics, Algebras, Programming, Tools, and Benchmarking (COMPUTATION TOOLS 2014), Venice, Italy; the International Conference on Developments in eSystems Engineering (DeSE 2014), Paphos, Cyprus; the International Conference on Advanced Technologies for Communications (ATC 2014), Hanoi, Vietnam; the IEEE International Conference on Systems, Man, and Cybernetics (IEEE SMC 2014), San Diego, California, USA; the IEEE International Symposium on Applied Computational Intelligence and Informatics (IEEE SACI 2014), Timisoara, Romania; and the International Conference on Bio-inspired Information and Communications Technologies (BICT 2014), Boston, Massachusetts, USA. His current research interests include signal processing, automatic control, and pattern recognition.

Abstract: Mixture models are well known in statistics and data analysis. They are particularly useful for data clustering, where they offer a powerful, flexible, and interpretable framework for many applications. In this context, it is important to estimate the parameters of a mixture model from observed data. Many approaches have been proposed in the literature for this purpose. In particular, two maximum likelihood approaches are commonly used: the mixture approach and the classification approach. Loosely speaking, the mixture approach aims to maximize the likelihood over the mixture parameters, whereas the classification approach aims to maximize the likelihood jointly over the mixture parameters and over the labels identifying the mixture component from which each point originates. These approaches are suitable when clusters are subject to various constraints (different proportions, volumes, orientations, and shapes). In this talk, the fundamental concepts of mixture model estimation will be presented. Special attention will be paid to the Expectation-Maximization (EM) algorithm and the Classification EM (CEM) algorithm. Feedback on the implementation and use of these algorithms will be provided. In addition, some hard and open problems (large amounts of data, new data structures and types, imprecision and variability of data, validation of the obtained partition structure, model selection, choice of the number of clusters, etc.) will be discussed and some promising solutions will be suggested. To show the usefulness of the presented approaches, some examples from real applications will be presented.
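To make the distinction between the mixture approach and the classification approach concrete, the following is a minimal sketch, not the speaker's implementation. It assumes univariate Gaussian components, and the names `gaussian_pdf` and `fit_mixture`, the initialization, and the synthetic data are all hypothetical choices made for illustration. With `classification=False` the loop is a plain EM iteration (soft posterior weights); with `classification=True` a hard assignment step is inserted between the E and M steps, which is the CEM variant.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_mixture(data, means, classification=False, n_iter=50, min_var=1e-6):
    """Fit a univariate Gaussian mixture by EM, or by CEM if classification=True.

    `means` holds the initial component means. Proportions start equal and
    variances start at the overall sample variance (illustrative choices).
    """
    k, n = len(means), len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    means = list(means)
    variances = [overall_var] * k
    props = [1.0 / k] * k

    for _ in range(n_iter):
        # E step: posterior probability that each point came from component j.
        resp = []
        for x in data:
            weighted = [props[j] * gaussian_pdf(x, means[j], variances[j])
                        for j in range(k)]
            total = sum(weighted)
            resp.append([w / total for w in weighted])

        # C step (CEM only): replace the posteriors by a hard 0/1 assignment.
        if classification:
            hard = []
            for r in resp:
                best = r.index(max(r))
                hard.append([1.0 if j == best else 0.0 for j in range(k)])
            resp = hard

        # M step: weighted maximum likelihood updates of the parameters.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0.0:
                continue  # empty cluster: keep its previous parameters
            props[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # guard against a collapsing variance

    # Labels come from the last E (or C) step of the loop.
    labels = [r.index(max(r)) for r in resp]
    return props, means, variances, labels


# Synthetic data: two well-separated groups (an assumption made for the demo).
random.seed(0)
data = ([random.gauss(0.0, 1.0) for _ in range(200)]
        + [random.gauss(5.0, 1.0) for _ in range(200)])

em = fit_mixture(data, means=[-1.0, 6.0])
cem = fit_mixture(data, means=[-1.0, 6.0], classification=True)
```

On well-separated data such as this, the two variants should give very similar estimates. The sketch omits safeguards a real implementation would need, such as log-domain computations to avoid underflow, a convergence test, and multiple initializations.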