Real-time anomaly detection in semiconductor process sensor signals with deep learning

  • Period : 2018.1.22 ~
  • Supported by : Samsung Electronics
  • Associated Member : Nayeong Kim, Jungtaek Kim


In semiconductor manufacturing, detecting equipment failure is critical. As the number of sensors grows, it becomes practically impossible for a human to judge equipment failure by inspecting every sensor. We aim to overcome this problem with deep learning. In this research, we develop an anomaly detection method, based on a deep generative model, that handles multivariate, irregularly sampled time series data.
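The deep generative model itself is beyond a short example, but the surrounding scoring logic can be sketched. In this minimal sketch a per-sensor moving-average predictor stands in for the model, and a point is flagged when its residual exceeds k standard deviations of past residuals (the floor on sigma is only there to avoid a zero threshold on a perfectly clean warm-up):

```python
import statistics

def anomaly_flags(series, window=5, k=3.0):
    """Flag points whose residual from a stand-in prediction exceeds
    k standard deviations of previously observed residuals."""
    history, flags = [], []
    for t in range(len(series)):
        if t < window:
            flags.append(False)  # not enough context yet
            continue
        pred = sum(series[t - window:t]) / window   # stand-in for the model
        r = abs(series[t] - pred)                   # prediction residual
        sigma = statistics.pstdev(history) if len(history) > 1 else 0.0
        flags.append(r > k * max(sigma, 1e-6))
        history.append(r)
    return flags

# A flat signal with one spike: only the spike should be flagged.
signal = [1.0] * 20 + [9.0] + [1.0] * 20
flags = anomaly_flags(signal)
```

A real deployment would replace the moving average with the generative model's predictive distribution and score the residual under it.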

Bayesian inference on random simple graphs with power-law degree distributions

  • Period: 2017.05.01 ~ 2018.04.30
  • Supported by: Naver
  • Associated Member: Juho Lee, Jungtaek Kim, Youngseok Yoon, Minkyo Suh


Many real-world graphs are known to be scale-free, meaning that their degree distributions follow a power law. However, most existing random graph models in the machine learning literature do not guarantee that the degree distributions of generated graphs are power-law. In this project, we propose a novel statistical model that generates random simple graphs with power-law degree distributions. We show that the degree distributions of graphs generated from our model converge to a power law with an arbitrary learnable exponent. We also propose an efficient posterior inference algorithm that scales to large real-world graphs.
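As a small illustration of the power-law setting (not of our model), one can draw Pareto-distributed values standing in for node degrees and recover the tail exponent by maximum likelihood; the parameters below are arbitrary:

```python
import math
import random

random.seed(0)
a, xmin, n = 1.5, 1.0, 10000        # tail index a -> degree exponent a + 1

# Inverse-CDF sampling from a Pareto(a, xmin) distribution;
# integer degrees would be the floors of these draws.
samples = [xmin * (1.0 - random.random()) ** (-1.0 / a) for _ in range(n)]

# Maximum-likelihood estimate of the tail index: a_hat = n / sum(ln(x/xmin)).
a_hat = n / sum(math.log(x / xmin) for x in samples)
```

With 10,000 draws the estimate lands close to the true index 1.5, which is the kind of check one runs on degrees of generated graphs.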


Deep Learning for Sudden Braking Detection

  • Period : 2017.08.16~2017.12.31
  • Supported by : Nomad Connection
  • Associated Member: Jiyuu Yi, Wonbin Kim


With the spread of smartphones, research on sudden braking detection based on inertial sensor data is being actively conducted. Existing methods relied on handcrafted features extracted from segmented time series data. The aim of this project is to find better algorithms that do not require handcrafted feature engineering and are more robust to noisy real-world data. To achieve this goal, we applied deep learning and temporal anomaly detection to sudden braking detection.
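The segmentation step can be sketched as sliding windows over the accelerometer stream; here a hand-tuned mean-deceleration rule serves as a hypothetical baseline that the learned detector replaces (window size, step, and threshold are illustrative):

```python
def windows(signal, size, step):
    """Segment a 1-D signal into overlapping fixed-size windows."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def braking_windows(accel, size=10, step=5, thresh=-4.0):
    """Flag windows whose mean longitudinal acceleration (m/s^2) falls
    below thresh -- a crude stand-in for the learned detector."""
    return [sum(w) / len(w) < thresh for w in windows(accel, size, step)]

# Synthetic trace: cruising, a hard brake, then cruising again.
accel = [0.0] * 30 + [-6.0] * 10 + [0.0] * 30
flags = braking_windows(accel)
```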

Development of Uncertainty Measure System for Open Set Recognition

  • Period : 2017.03.01~2017.10.31
  • Supported by: Software R&D Center, Device Solutions, Samsung Electronics
  • Associated Member: Jungtaek Kim, Minseop Park, Inhyuk Jo, Jiyuu Yi


What really matters in building an efficient semiconductor manufacturing process is detecting new sorts of faults. If we had data for those faults, we could train our model on them, but in practice we cannot gather examples of every class. A new algorithm is needed that detects novel, so-called unknown classes presented to a conventional deep classifier. We have developed two algorithms: regularizing a classifier with generated data that act as fake unknown classes, and measuring uncertainty in time series data, similar to temporal anomaly detection, without using the prediction results of the classifier (an LSTM).
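One common uncertainty measure for flagging unknown-class inputs is the predictive entropy of the classifier's softmax output; this sketch is a generic illustration, not our algorithm, and the threshold value is arbitrary:

```python
import math

def softmax(logits):
    m = max(logits)                       # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def predictive_entropy(logits):
    """Shannon entropy (nats) of the softmax distribution."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0)

def is_unknown(logits, threshold=1.0):
    """Flag an input as unknown when predictive entropy is high."""
    return predictive_entropy(logits) > threshold

confident = [8.0, 0.1, 0.2, 0.1]   # peaked: looks like a known class
uncertain = [1.0, 1.1, 0.9, 1.0]   # near-uniform: likely unknown
```

A near-uniform softmax over four classes has entropy close to ln 4 ≈ 1.39 nats, well above a confidently peaked prediction.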


Basic Software Research in Human-level Lifelong Machine Learning (Machine Learning Center)

  • Period : 2014.04.01~2018.02.28
  • Supported by : IITP
  • Associated Member : 5 universities (13 professors and 150 graduate-level students in POSTECH, KAIST, SNU, Yonsei University, and Korea University) and 4 companies (20 researchers in Swink, So-so, Diquest, Daumsoft)


The Machine Learning Research Center (MLRC) was founded in April 2014 to develop efficient algorithms for lifelong machine learning, an extension of the multi-task learning paradigm. MLRC is funded by a basic research IT program of the Institute for Information & Communications Technology Promotion (IITP). MLRC is broadly divided into two parts: a research-oriented part and an application part for commercialization. For the research-oriented part, MLRC recruited outstanding professors and graduate students from POSTECH, KAIST, SNU, Korea University, and Yonsei University. For the application part, MLRC invited four technically innovative companies: Diquest, Daumsoft, Swink, and So-so. MLRC is designed so that researchers from the research and application parts cooperate closely, in order to transfer research outcomes into software products effectively. The primary goal of MLRC is to release the SMILE (Software for Machine Intelligence with Lifelong machine lEarning) platform. The SMILE platform is composed of four key components: (1) Bayesian learning for lifelong machine learning, (2) deep learning on the Spark platform, (3) multi-modal learning, and (4) a healthcare system. Each component has been carefully chosen to serve the purpose of MLRC: to develop efficient algorithms for lifelong machine learning and to transfer them into real-world software products. We are doing our best to make SMILE the #1 lifelong machine learning platform in the world.

Action Recognition with wearable device

  • Period : 2015.08.~2016.07.
  • Supported by : Samsung Electronics
  • Associated Member : Sojeong Ha, Juho Lee, Jungtaek Kim, Saehoon Kim


Action recognition has been receiving much attention due to the development of smart devices and the rapid growth of interest in healthcare services. In this research, we develop an action recognition classifier for smart devices. Our classifier focuses on classifying aerobic exercise and swimming styles with machine learning algorithms. For swimming, we classify four styles (freestyle, backstroke, breaststroke, and butterfly) and count the number of strokes and turns.
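Stroke counting can be sketched as peak detection on a periodic arm-swing signal; the height and spacing parameters below are illustrative, not the tuned values used on devices:

```python
import math

def count_strokes(signal, height=0.5, min_gap=5):
    """Count local maxima above `height` that are at least `min_gap`
    samples apart -- each accepted peak is treated as one stroke."""
    count, last = 0, -min_gap
    for i in range(1, len(signal) - 1):
        if (signal[i] > height and signal[i] > signal[i - 1]
                and signal[i] >= signal[i + 1] and i - last >= min_gap):
            count += 1
            last = i
    return count

# Synthetic arm-swing signal: 8 oscillations over 200 samples -> 8 strokes.
sig = [math.sin(2 * math.pi * 8 * t / 200) for t in range(200)]
```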

Anomaly detection in high dimensional time-series of broadcast stations

  • Period : 2015.05.~2015.12.
  • Supported by : SK Telecom Corp.
  • Associated Member : Suwon Suh, Jeong-Min Yun


Cellular networks have rapidly grown in capacity. Along with this expansion, there is a strong need to detect anomalies and take preemptive support or maintenance actions to prevent local network failures. The number of broadcast stations is huge and the related data are generated at high velocity, which makes this task hard for human operators to handle. In this research, based on Bayesian learning, we (1) develop an anomaly detection method with representation learning, using deep architectures to handle spatio-temporal data, and (2) visualize the high-dimensional data for ease of human understanding.
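For the visualization part, here is a minimal sketch of linear dimensionality reduction: plain PCA via power iteration, standing in for the deep representation learning described above. The synthetic data are a hypothetical stand-in for station measurements:

```python
import random

def top_component(data, iters=200):
    """Leading principal direction via power iteration on the covariance."""
    d, n = len(data[0]), len(data)
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    v = [1.0] * d
    for _ in range(iters):
        # w = (X^T X) v, computed as X^T (X v) without forming the covariance
        xv = [sum(r[j] * v[j] for j in range(d)) for r in centered]
        w = [sum(xv[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

random.seed(1)
# Synthetic 3-D data whose variance is dominated by the first axis.
data = [[random.gauss(0, 5), random.gauss(0, 1), random.gauss(0, 1)]
        for _ in range(300)]
v = top_component(data)
```

Projecting onto the top two such directions gives a 2-D view of the high-dimensional station data.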

Machine learning methods for multi-modal data analysis

  • Period : 2013.11.~2016.10.
  • Supported by : Ministry of Science, ICT and Future Planning
  • Associated Member : Jeong-Min Yun, Saehoon Kim, Bonkon Koo


The need for multi-modal data analysis of social networks (Twitter, Facebook) and media sharing services (YouTube, Flickr) is increasing. In this research, we develop machine learning methods for multi-modal data analysis. Based on probabilistic graphical models and Bayesian learning, we develop (1) deep networks for multi-modal feature learning and (2) matrix co-factorization for analyzing multiple relational datasets. The products of this research enable systems for automatic image tagging, content-based image search, and product (e.g., movie, music) recommendation.

Development of brain-computer interface based on integrated information processing

  • Period : 2010.05.~2015.04.
  • Supported by : Ministry of Education, Science and Technology
  • Associated Member : Hyohyeong Kang, Yunjun Nam


We develop a brain-computer interface (BCI) system that recognizes multiple commands from patients with quadriplegia or total paralysis, enabling them to communicate with a personal assistance system. We develop simultaneous recording and processing techniques for electroencephalogram (EEG), functional near-infrared spectroscopy (fNIRS), electromyogram (EMG), and electrooculogram (EOG) signals. Moreover, we develop techniques for recognizing users' intentions and integrating the various signals for classification. Using these techniques, we develop an interface system equipped with an automatic patient bed and robot arms.

Development of models for feature extraction from multi-sensory data

  • Period : 2009.07.~2014.06.
  • Supported by : Systems Bio-Dynamics National Core Research Center
  • Associated Member : Yoonseop Kang


The objective of this project is to develop a numerical model capable of extracting useful features from multi-sensory data. We consider several specific aims: (1) developing algorithms that extract useful features from general multi-view data by adopting approaches from learning from multiple sources, using methods such as the restricted Boltzmann machine (RBM) and manifold integration; and (2) developing a feature extraction model for multiple sensory data sources that reflects knowledge from numerical models of cerebral neural networks, in order to understand the dynamic mechanisms by which human brains process massive integrated sensory data.

Probabilistic models for prediction of gene expression patterns

  • Period : 2010.05.01~2013.02.28.
  • Supported by : National Research Foundation
  • Associated Member : Jong Kyoung Kim, Yongsoo Kim, Hwalim Lee


Our goal is to develop a unified model for predicting gene expression patterns using probabilistic models, and to provide the related inference and learning algorithms based on machine learning methods. To this end we propose the following objectives: (1) developing probabilistic models for DNA motif discovery, (2) discovering transcription factor candidates at the transcription initiation level, (3) discovering RNA regulatory motifs at the post-transcriptional level, and (4) establishing and testing a unified gene-regulation function for gene expression. We will finally develop a unified model that quantitatively predicts gene expression patterns from sequences and provides clues to understanding the regulation of gene expression mechanisms in the cell.
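For objective (1), here is a minimal sketch of scanning a sequence with a position weight matrix (PWM), a standard building block of probabilistic motif discovery; the matrix below is a toy example, not a learned motif:

```python
import math

# Toy PWM for a length-4 motif: per-position nucleotide probabilities.
pwm = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]
background = 0.25   # uniform background model

def log_odds(window):
    """Log-odds score of a window under the motif vs. the background."""
    return sum(math.log(pwm[i][b] / background) for i, b in enumerate(window))

def best_site(seq):
    """Highest-scoring motif site found by a simple scan."""
    k = len(pwm)
    return max((log_odds(seq[i:i + k]), i) for i in range(len(seq) - k + 1))

score, pos = best_site("TTTTACGTTTTT")
```

The scan correctly picks out the planted "ACGT" site; real motif discovery learns the PWM itself, e.g. by EM or Gibbs sampling.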

Face clustering system with human tagging

  • 2012.04. ~ 2012.12.
  • Samsung Digital Media & Communications Research & Development center
  • Associated member: Juho Lee, Saehoon Kim


We develop a face clustering system that automatically builds a photo album in which each person is gathered into a single folder. Our system detects faces in pictures, extracts features, and clusters them so that pictures of the same person belong to the same cluster. To improve accuracy, we assume that a human tags some clusters, indicating the names of the people in a subset of clusters. Using these tags, the system recommends names for untagged clusters or improves the clustering by re-ranking based on semi-supervised learning techniques.

User Activity Inference Based on Mobile Sensor Data

  • 2011.05. ~ 2011.11.
  • Samsung Advanced Research Institute
  • Associated member: Saehoon Kim


Our goal is to infer users' behaviors from mobile sensor data such as 3-axis accelerometer and gyroscope readings. We develop a new feature extraction algorithm, Orthogonal Semi-Supervised Non-negative Matrix Factorization (OSSNMF), to extract hidden features from the sensor data. We implement our inference engine on the Android platform.
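OSSNMF adds orthogonality and label terms on top of standard NMF; as a baseline sketch (not OSSNMF itself), plain NMF with Lee-Seung multiplicative updates looks like this:

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf(V, rank, iters=500, eps=1e-9):
    """Factor nonnegative V ~ W H by multiplicative updates."""
    random.seed(0)
    n, m = len(V), len(V[0])
    W = [[random.random() for _ in range(rank)] for _ in range(n)]
    H = [[random.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(Wt, matmul(W, H))
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(matmul(W, H), Ht)
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return W, H

# An exactly rank-2 nonnegative matrix: NMF should reconstruct it closely.
V = matmul([[1, 0], [0, 1], [1, 1], [2, 1]], [[1, 2, 0], [0, 1, 3]])
W, H = nmf(V, 2)
R = matmul(W, H)
err = sum((V[i][j] - R[i][j]) ** 2 for i in range(4) for j in range(3))
```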

Renewable Energy Management System for Smart Grid

  • Period : 2010.11.~2012.04.
  • Supported by : POSCO ICT
  • Associated Member : Jiho Yoo, Jeong-Min Yun


The goal of this project is to develop an electricity generation forecasting system for renewable energy, especially solar and wind power. Obviously, we have no control over how much of this energy is produced, but there is a consensus that production is highly affected by weather conditions. Therefore, we develop a statistical machine learning based generation forecasting system whose predictions are driven by weather forecasts.
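The forecasting idea can be sketched by fitting generation to a single hypothetical weather feature with ordinary least squares; real systems use many weather features and richer models, and the numbers below are made up for illustration:

```python
def fit_line(x, y):
    """Ordinary least squares for y = a*x + b (closed form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    a = sxy / sxx
    return a, my - a * mx

# Hypothetical pairs: forecast solar irradiance vs. measured generation.
irradiance = [1.0, 2.0, 3.0, 4.0, 5.0]
generation = [2.1, 3.9, 6.0, 8.1, 9.9]
a, b = fit_line(irradiance, generation)

# Forecast generation for tomorrow's predicted irradiance of 6.0.
predicted = a * 6.0 + b
```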

Approximation algorithms for large scale face recognition systems

  • Period : 2010.06.~2011.01.
  • Supported by : ETRI (Electronics and Telecommunications Research Institute)
  • Associated Member : Jeong-Min Yun


The aim of this project is to handle the large data matrices in a large-scale face recognition system using well-known approximation techniques such as the Nystrom approximation, CUR decomposition, Compact Matrix Decomposition (CMD), and compressed sensing. Furthermore, we consider a more realistic situation in which the face database is frequently updated (faces added or deleted). To handle this situation, the above approximation algorithms will be combined with incremental learning algorithms.
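The Nystrom method approximates a Gram matrix as K ≈ C W⁻¹ Cᵀ, where C holds the columns of K at m sampled landmark points and W is the landmark-landmark block. A minimal sketch: a linear kernel on 1-D points gives a rank-1 Gram matrix, for which a single landmark already makes the approximation exact:

```python
# 1-D points under a linear kernel k(x, y) = x * y -> rank-1 Gram matrix.
points = [1.0, 2.0, 3.0, 4.0]
n = len(points)
K = [[x * y for y in points] for x in points]

# Nystrom with one landmark (the first point): K ~ C W^-1 C^T.
C = [row[0] for row in K]     # first column of K
W = K[0][0]                   # landmark-landmark block (scalar here)
approx = [[C[i] * C[j] / W for j in range(n)] for i in range(n)]

max_err = max(abs(K[i][j] - approx[i][j]) for i in range(n) for j in range(n))
```

In general K is only approximately low-rank and W must be (pseudo-)inverted, but the structure of the computation is the same.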

Exploiting musical prior knowledge for musical source separation

  • Period : 2010.6.1 ~2011.01.31
  • Supported by : ETRI (Electronics and Telecommunications Research Institute)
  • Associated Member : Jiho Yoo


Musical source separation is an underdetermined source separation problem, in which there are more sources than available mixture signals. In the underdetermined setting, classical source separation methods based on independence between sources cannot be applied. The objective of this project is to develop ways to exploit musical prior knowledge to properly separate musical sources. We developed nonnegative matrix partial co-factorization to separate drum sources from the remaining harmonic sources by exploiting a prior drum signal. We will extend the method to use further musical prior knowledge to enhance separation quality.

A development of a software toolbox to analyze data from hot rolling process (development of software for predicting and analyzing strip thickness quality in the hot-rolling finishing mill)

  • Period : 2010.01.~2010.11.
  • Supported by : POSCO
  • Associated Member : Sunho Park


This project aims to develop a software toolbox for analyzing data gathered during the hot rolling process at POSCO. We are mainly concerned with predicting abnormalities in the thickness of a coil from the input measurements. The toolbox needs to solve two problems: (1) selecting relevant features from the numerous input attributes, and (2) efficiently handling class-imbalanced classification. We use non-parametric hypothesis tests for feature selection and cost-sensitive learning methods for class-imbalanced classification.
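The cost-sensitive step can be sketched as choosing a decision threshold that minimizes expected cost when a missed fault is far costlier than a false alarm; the scores, labels, and cost ratio below are toy values:

```python
def expected_cost(threshold, scores, labels, c_fn=10.0, c_fp=1.0):
    """Cost of classifying score >= threshold as faulty, with a missed
    fault (false negative) c_fn times costlier than a false alarm."""
    cost = 0.0
    for s, y in zip(scores, labels):
        pred = s >= threshold
        if y and not pred:
            cost += c_fn      # missed fault
        elif pred and not y:
            cost += c_fp      # false alarm
    return cost

# Toy scores: faults (label True) score high but overlap with normals.
scores = [0.1, 0.2, 0.35, 0.4, 0.55, 0.7, 0.9]
labels = [False, False, False, True, False, True, True]
best = min([0.3, 0.5, 0.8], key=lambda t: expected_cost(t, scores, labels))
```

With the asymmetric costs, the lowest candidate threshold wins: tolerating two false alarms is cheaper than missing a fault.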

Relational learning with collective nonnegative matrix/tensor factorization for recommendation systems

  • Period : 2008.11.~2010.10.
  • Supported by : Korean Research Foundation Grant
  • Associated Member : Yong-Deok Kim


The objective of this project is to develop collective nonnegative matrix/tensor factorization methods for analyzing relational data. We will apply these methods to a recommendation system based on hybrid collaborative/content-based filtering. Finally, we will develop a movie recommendation system that makes full use of the information from users' ratings of movies (used for collaborative filtering) and movie content such as genre, director, actors, synopsis, and reviews (used for content-based filtering).

A Development of a Handheld System for Motion based Interface Control and Tangible Feedback

  • Period : 2007.03. ~ 2007.11.
  • Supported by : Korea Telecommunication(KT)
  • Associated Member : SangKi Kim


The objective of this project is to develop a handheld controller for motion-based interactive game play. We use HMM-based motion modeling with accelerometer data and controller vision-tracking data to identify the player's motions.
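The decoding step of an HMM-based motion recognizer can be sketched with the Viterbi algorithm; the states, observations, and probabilities below are toy stand-ins for the real motion models:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for an observation sequence."""
    # Each column maps state -> (best path probability, predecessor state).
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for o in obs[1:]:
        prev = V[-1]
        V.append({s: max((prev[r][0] * trans_p[r][s] * emit_p[s][o], r)
                         for r in states) for s in states})
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: V[-1][s][0])]
    for col in reversed(V[1:]):
        path.append(col[path[-1]][1])
    return list(reversed(path))

# Toy gesture model: is the controller swinging or held still?
states = ["swing", "still"]
start_p = {"swing": 0.5, "still": 0.5}
trans_p = {"swing": {"swing": 0.8, "still": 0.2},
           "still": {"swing": 0.2, "still": 0.8}}
emit_p = {"swing": {"high": 0.9, "low": 0.1},
          "still": {"high": 0.1, "low": 0.9}}
obs = ["low", "low", "high", "high", "low"]
path = viterbi(obs, states, start_p, trans_p, emit_p)
```

In the real system, each candidate motion has its own HMM and the motion whose model assigns the observation sequence the highest likelihood is recognized.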

Identifying driver emotion based on physiological signals and developing adaptive driver interfaces

  • Period : 2006.04. ~ 2008.03.
  • Supported by : the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea government (MOST)
  • Associated Member : Hyekyoung Lee, Sunho Lee, Yongdeok Kim

This work is cooperative research with Professor Sung H. Han of the Department of Industrial and Management Engineering, POSTECH, and Professor Gerard Jounghyun Kim of the Department of Computer Science and Engineering, Korea University.


  • The main objectives of this research are three-fold: (1) to computationally model the driver's emotional state using physiological signals and other sensor data, (2) to develop an emotion adaptive driving interface and vehicle control for safe driving, and (3) to develop an emotionally compelling virtual driving simulator and validate the developed model, techniques, and the overall safety effect.
  • Modeling and recognizing a human's emotional state is an important interdisciplinary research topic drawing interest from the AI, HCI, cognitive science, and signal processing communities internationally. One natural application is the emotion adaptive interface, which can have huge implications for safety-critical interactive systems. As this is an emerging area with only scattered research results, comprehensive and consolidated research is needed. An interdisciplinary team with expertise in physiological signals, human-computer interaction, artificial intelligence and signal processing, and virtual training has come together to conduct the research above over three years.

Vision Interface for Mobile Environment

  • Period : 2005.09. ~ 2009.08.
  • Supported by : Ministry of Information and Communication
  • Associated Member : Yong-Deok Kim and KyeHyeon Kim


The project is the service part of the next-generation mobile technologies researched by CMEST (Center for Mobile Embedded Software Technology). Our goal is to develop face recognition methods that (1) are robust to illumination and pose changes and (2) require small computational cost. We study sequential and semi-supervised learning.

Context-awareness for ubiquitous services

  • Period : 2005.09. ~ 2009.08.
  • Supported by : Ministry of Information and Communication
  • Associated Member : KyeHyeon Kim, Yongdeok Kim


The project is the service part of the next-generation mobile technologies researched by CMEST (Center for Mobile Embedded Software Technology). This year (2006.10 ~ 2007.09), we study semi-supervised sensor fusion methods to automatically integrate primitive information from sensors into meaningful contexts. Semi-supervised learning obtains a mapping from a high-dimensional space onto user-specified coordinates by (1) recognizing manifolds from unlabeled data points and (2) unfolding them to the desired coordinates using a few labeled points.

Statistical/probabilistic machine learning methods for cellular network analysis

  • Period : 2004.12.~2011.08.
  • Supported by : Systems Bio-Dynamics National Core Research Center
  • Associated Member : JongKyoung Kim, Yongsoo Kim


The objective of this project is to infer cellular networks using probabilistic graphical models. We consider several specific aims: (1) new theoretical methods to characterize network topology, (2) automated and objective algorithms to identify modules or clusters and their biological functions based on cellular network topology, (3) systematic integration of comprehensive and inhomogeneous datasets, (4) systematic integration of the three major cellular networks (genome, proteome, and metabolome), and (5) a foundation for mathematical modeling of cellular network dynamics by providing the core network structure.


Source Separation and Restoration Based on Human Auditory Models

  • Period : 2004.07. ~ 2008.03.
  • Supported by : Ministry of Commerce, Industry and Energy
  • Associated Member : SunHo Park, Jiho Yoo


The human ear works well even with noisy and mixed signals. In this project, we model the mechanisms by which the human brain represents and processes auditory signals, which leads to generative and representation models for sound sources. Based on these models, we will develop new algorithms for source separation and restoration of mixed speech data in noisy and reverberant environments. This approach is expected to be helpful for other brain research, and the results could be applied in the audio equipment industry.

Design of high density array EEG experiments and development of on-line EEG analysis software for brain computer interface

  • Period : 2004.05.~ 2007.04.
  • Supported by : KOSEF
  • Associated Member : Hyekyoung Lee


A brain-computer interface provides a new communication channel between the human brain and a computer. The goal of our project is to develop software that moves a cursor up, down, left, or right by detecting human intention. Human intention can be obtained by measuring and analyzing EEG. We design stimuli to measure valid EEG and research analysis algorithms based on machine learning techniques.

ICA Algorithms for High-Dimensional Data

  • Supported by : POSTECH Research Fund


We will develop a new optimization technique on Riemannian manifolds and apply it to ICA. In addition, we exploit special structure in the equations for memory efficiency, so that the algorithm is useful especially for high-dimensional data.
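ICA pipelines typically begin by whitening the data (zero mean, identity covariance). A minimal 2-D sketch using the closed-form eigendecomposition of the 2x2 covariance; the synthetic data are illustrative:

```python
import math
import random

def whiten2d(data):
    """Whiten 2-D data via the eigendecomposition of its 2x2 covariance,
    the standard preprocessing step before ICA rotation."""
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    xs = [p[0] - mx for p in data]
    ys = [p[1] - my for p in data]
    a = sum(x * x for x in xs) / n          # covariance entries
    b = sum(x * y for x, y in zip(xs, ys)) / n
    d = sum(y * y for y in ys) / n
    # Closed-form eigenvalues/axis angle of [[a, b], [b, d]].
    tr, det = a + d, a * d - b * b
    gap = math.sqrt(max(tr * tr / 4 - det, 0.0))
    l1, l2 = tr / 2 + gap, tr / 2 - gap
    theta = 0.5 * math.atan2(2 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    out = []
    for x, y in zip(xs, ys):
        u, v = c * x + s * y, -s * x + c * y    # rotate to eigenbasis
        out.append((u / math.sqrt(l1), v / math.sqrt(l2)))  # rescale axes
    return out

random.seed(2)
data = []
for _ in range(500):
    x = random.gauss(0, 2)
    data.append((x, 0.6 * x + random.gauss(0, 0.5)))   # correlated pair
w = whiten2d(data)
```

After whitening, ICA only needs to find a rotation, which is where the Riemannian (orthogonal-group) optimization comes in.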

Continuous Speech Recognition System Based on Selective Attention

  • Supported by : Ministry of Science and Technology


We will develop a preprocessing method for an artificial speech recognition system, that is, a cocktail party processor. This method can localize source sounds and achieve the cocktail party effect through selective attention.

ICA Algorithm and its VLSI Implementation for Efficient Signal Separation

  • Supported by : Ministry of Science and Technology


To develop applications of BSS/ICA algorithms to real data, for example EEG analysis, the cocktail party problem, and DNA microarray data analysis.

EEG Signal Analysis using ICA

  • Supported by : ETRI (Electronics and Telecommunications Research Institute)


To apply ICA algorithms to EEG data analysis as a preprocessor or feature extractor, and to design the whole EEG data analysis system (preprocessing, feature extraction, classification).

Technical development in blind source separation for manufacturing object-based audio-broadcast content

  • Supported by : ETRI (Electronics and Telecommunications Research Institute)


Blind source separation (BSS) is the fundamental problem of separating mixed signals, such as sounds of speech and musical instruments, into their original sources. In real situations such as radio communication, telemetry, radar, sonar, and speech, sources are often nonstationary or quasi-cyclostationary, so the observed signals are usually convolutive mixtures. In this project, we develop a new efficient algorithm for the BSS problem using a relative trust-region method for joint diagonalization in the convolutive mixture case. Moreover, we will suggest a new algorithm for resolving the permutation ambiguity of frequency-domain approaches to convolutive mixtures, because the permutation ambiguity strongly affects the performance of convolutive blind source separation. This approach will produce new algorithms with faster convergence and better performance than L. Parra's algorithm for convolutive blind source separation. These algorithms will be used to manufacture object-based audio-broadcast content.