Welcome to Pengyu Hong's Homepage |
||
My group carries out interdisciplinary research to advance informatics in biology, biochemistry, e-healthcare, fintech, intelligent education, and mobile social. Some of our research results are summarized below. More details on intelligent education (with a startup) and mobile social (online social content analysis) will follow. On the machine learning side, I continue to work on Automatic Spatial and Temporal Pattern Modeling and Detection, which I believe will come to the forefront and deliver useful tools, given the recent rise of Deep Learning and the availability of related large datasets. I have also done some research on Human-Computer Interaction. |
||
Glycomics (A New Frontier) We are developing efficient algorithms to accurately sequence glycans de novo from their tandem mass spectra with little or no help from an oligosaccharide database. Glycosylation is a common modification by which a glycan (or oligosaccharide) is covalently attached to a target macromolecule (such as a protein or lipid). This modification serves various important functions (such as protein folding, protein stability, cell-cell/matrix/environment interaction, and immunity) and is one of the essential factors in optimizing many glycoprotein-based drugs. Hence it is important to profile how glycans change under different conditions. Glycan structure analysis is a challenging and essential task in biological and biomedical research. We have developed an advanced algorithm, named GlycoDeNovo (executable software, source code), for reconstructing glycans from data of pure samples, and have proved theoretically that this problem can be solved with polynomial complexity rather than the NP complexity conventionally assumed. GlycoDeNovo features a Machine Learning based IonClassifier that effectively ranks the reconstructed topology candidates (our experimental results showed that IonClassifier is essential). We have also advanced into uncharted territory, reconstructing glycans from data of mixture samples, and have obtained excellent results under certain experimental conditions. This research is a collaboration with Boston University Medical School (NIH P41). |
||
Intelligent Education Knowledge tracing is an important task in Intelligent Education. It aims to quantify how well students master the knowledge (tags) being tutored by analyzing their learning activities (e.g., coursework interaction data), and it plays an important role in intelligent tutoring systems. In our work, we cast knowledge tracing as a performance-prediction problem, which predicts the performance of students on exercises labeled with multiple knowledge tags, and we tackle this problem using Deep Learning techniques. We applied several Recurrent Neural Network architectures to model complex representations of student knowledge and predict the future performance of students. Our experimental results demonstrate that the neural network architecture based on stacked Long Short-Term Memory (LSTM) layers and residual connections gives superior predictions of the future performance of learners. To model how a student answers a question that contains multiple knowledge tags, we explored three different setups for mapping knowledge states to predictions. Keywords: Recurrent Neural Network, LSTM, Stacked LSTM, Residual Connection. |
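The stacked LSTM with residual connections mentioned above can be illustrated with a minimal forward-pass sketch in plain Python. This is not our published model: the multi-hot input encoding, the layer sizes, the placement of the skip connections, the per-tag sigmoid output, and all names (`encode_interaction`, `LSTMCell`, `ResidualStackedLSTM`) are assumptions made for illustration, and the weights are random and untrained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def encode_interaction(tags, correct, n_tags):
    """Multi-hot encoding (an assumed scheme): the first n_tags slots mark
    tags answered correctly, the last n_tags slots mark tags answered wrongly."""
    x = [0.0] * (2 * n_tags)
    offset = 0 if correct else n_tags
    for t in tags:
        x[offset + t] = 1.0
    return x


class LSTMCell:
    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # One weight row per gate unit, ordered: input, forget, output, candidate.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hidden)]
                  for _ in range(4 * n_hidden)]
        self.b = [0.0] * (4 * n_hidden)

    def step(self, x, h, c):
        z = x + h  # concatenate the input and the previous hidden state
        pre = [sum(w * v for w, v in zip(row, z)) + b
               for row, b in zip(self.W, self.b)]
        n = self.n_hidden
        i = [sigmoid(p) for p in pre[:n]]
        f = [sigmoid(p) for p in pre[n:2 * n]]
        o = [sigmoid(p) for p in pre[2 * n:3 * n]]
        g = [math.tanh(p) for p in pre[3 * n:]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


class ResidualStackedLSTM:
    """Hypothetical, untrained model: stacked LSTM layers with skip
    connections around every layer above the first."""

    def __init__(self, n_tags, n_hidden=8, n_layers=3, seed=0):
        rng = random.Random(seed)
        self.n_tags = n_tags
        self.n_hidden = n_hidden
        self.cells = [LSTMCell(2 * n_tags if layer == 0 else n_hidden, n_hidden, rng)
                      for layer in range(n_layers)]
        self.W_out = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                      for _ in range(n_tags)]

    def predict(self, interactions):
        """Return one probability per tag after reading (tags, correct) pairs."""
        n = self.n_hidden
        states = [([0.0] * n, [0.0] * n) for _ in self.cells]
        top = [0.0] * n
        for tags, correct in interactions:
            inp = encode_interaction(tags, correct, self.n_tags)
            for layer, cell in enumerate(self.cells):
                h, c = cell.step(inp, *states[layer])
                states[layer] = (h, c)
                # Residual connection: add the layer input to its output
                # (skipped for the first layer, whose input size differs).
                inp = h if layer == 0 else [a + b for a, b in zip(h, inp)]
            top = inp
        return [sigmoid(sum(w * v for w, v in zip(row, top)))
                for row in self.W_out]
```

Calling `ResidualStackedLSTM(n_tags=4).predict([([0, 2], True), ([1], False)])` returns four values in (0, 1), one per tag. A real system would train the weights on interaction logs and would need a rule for combining per-tag outputs when an exercise carries several tags.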
||
BioImage Informatics
|
||
Computational Systems Biology Animal organogenesis is a complicated process controlled by a network of intercellular signaling, intracellular signal transduction, and transcriptional regulation. Constructing quantitative models of this network can provide insights into the underlying biological mechanisms. We have developed a mathematical model of the multi-level biological networks (spanning the molecular level to the tissue level) that govern C. elegans vulval induction. The model was automatically learned from heterogeneous biological data and is capable of simulating vulval induction under various genetic conditions (Sun X. and Hong P. 2007, Sun X. and Hong P. 2009). At the cellular level, together with our biological collaborators, we elucidated the topologies of complex molecular networks (Breitkopf S.B., et al. 2016, Kwon Y., et al. 2013, Sun and Hong 2013, Friedman AA. et al. 2011, Vinayagam A. et al. 2016), which advanced our knowledge of complex biological processes and related diseases. |
||
Medical Informatics We applied machine learning techniques to detect medical concepts and the relationships among them in the free-text (unstructured) clinical narratives of electronic healthcare records (Anick et al. presented at I2B2 Challenge 2010 and cited in Uzuner et al. 2011, Warner et al. 2011). In another study, we investigated various psychiatric attributes in the categories of depression, anxiety, and insomnia by comparing patients with liver cirrhosis, patients with non-cirrhosis chronic diseases, and healthy people. A set of psychiatric attributes was found to be uniquely associated with cirrhosis. The literature shows that these attributes are related to some mechanisms underlying liver cirrhosis, such as zinc/selenium deficiency and vitamin D deficiency. In addition, we investigated the relationships between the psychiatric attributes unique to cirrhosis and the “four diagnostic methods” of Traditional Chinese Medicine (TCM), and discovered interesting correlations. These two findings can shed new light on the diagnosis and prognosis of liver cirrhosis, and on the future integration of Traditional Chinese and Western Medicine (Bei et al. 2013). |
||
Other Bioinformatics Efforts We developed fast, memory-efficient algorithms for accurately calculating the isotopic fine structures of large molecules (Li et al. 2008, Li et al. 2010). These algorithms greatly increase our capability to interpret experimental mass spectrometry data. We also developed a new data processing pipeline for top-down mass spectrometry (Karabacak N.M. 2009). In addition, we developed a boosting approach for building better TF-DNA binding models (Hong et al. 2005), which are important for differentiating the true binding targets of transcription factors from spurious ones and for understanding gene regulation. |
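To make "isotopic fine structure" concrete, here is a small brute-force sketch that enumerates the isotopologues of a small molecule and keeps peaks with the same nominal mass but different exact masses separate. It is not the algorithm of Li et al.: it simply adds one atom at a time and is practical only for small formulas. The function names, the pruning threshold `min_prob`, and the rounded isotope masses and abundances in `ISOTOPES` are illustrative assumptions.

```python
from collections import defaultdict

# Approximate isotope masses (Da) and natural abundances; illustrative values.
ISOTOPES = {
    "C": [(12.0, 0.9893), (13.0033548, 0.0107)],
    "H": [(1.0078250, 0.999885), (2.0141018, 0.000115)],
    "N": [(14.0030740, 0.99636), (15.0001089, 0.00364)],
    "O": [(15.9949146, 0.99757), (16.9991317, 0.00038), (17.9991610, 0.00205)],
}


def element_distribution(symbol, n_atoms, min_prob=1e-12):
    """Probability of each isotope-count tuple for n_atoms of one element."""
    isotopes = ISOTOPES[symbol]
    dist = {(0,) * len(isotopes): 1.0}
    for _ in range(n_atoms):
        nxt = defaultdict(float)
        for counts, prob in dist.items():
            for i, (_, abundance) in enumerate(isotopes):
                p = prob * abundance
                if p < min_prob:
                    continue  # prune negligible contributions
                key = counts[:i] + (counts[i] + 1,) + counts[i + 1:]
                nxt[key] += p
        dist = dict(nxt)
    return dist


def fine_structure(formula, min_prob=1e-12):
    """Return sorted (exact_mass, probability) peaks for a formula such as
    {"C": 6, "H": 12, "O": 6}; isotopologues are never merged by nominal mass."""
    peaks = [(0.0, 1.0)]
    for symbol, n_atoms in formula.items():
        isotopes = ISOTOPES[symbol]
        element_peaks = [
            (sum(c * m for c, (m, _) in zip(counts, isotopes)), prob)
            for counts, prob in element_distribution(symbol, n_atoms, min_prob).items()
        ]
        # Every pair is a distinct isotopic composition, so no merging is needed.
        peaks = [
            (m1 + m2, p1 * p2)
            for m1, p1 in peaks
            for m2, p2 in element_peaks
            if p1 * p2 >= min_prob
        ]
    return sorted(peaks)
```

For glucose, `fine_structure({"C": 6, "H": 12, "O": 6})` resolves the peak one nominal mass unit above the monoisotopic peak into three separate isotopologues (one 13C, one 2H, or one 17O). The number of isotopologues grows quickly with molecule size, which is why dedicated memory-efficient algorithms are needed for large molecules.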