Data Science at K-State
The Kansas State University Computer Science (CS) Department is a recognized leader in applied machine learning, which places it at the confluence of several Data Science research areas, including multidisciplinary approaches to predictive analytics, business intelligence, data mining, and visualization of large heterogeneous data. Our core strengths include the analysis of natural language text; linked, spatial, and temporal data; security data; and biological data. CS has two faculty members focused on machine learning applications. Dr. Doina Caragea directs the Laboratory for Machine Learning and Data Science (MLDS), while Dr. William H. Hsu directs the Laboratory for Knowledge Discovery and Databases (KDD).
The MLDS laboratory concentrates on machine learning algorithms and tools for gaining insights from large data sets related to social media, biology, security and user behavior, with applications in crisis informatics, bioinformatics, security informatics, and recommender systems. Each of these applications presents a common challenge to traditional machine learning: labeled data is scarce, while unlabeled data is abundant. To address this challenge, the MLDS laboratory has designed semi-supervised and domain adaptation approaches that leverage unlabeled data for the target problem and labeled data available for related problems. Specifically, the MLDS laboratory has designed domain adaptation algorithms for classifying crisis-related tweets to help disaster response teams sift through large amounts of data generated by affected individuals. The laboratory has also designed semi-supervised and domain adaptation methods for genome annotation and semi-supervised methods for Android malware detection. In addition, it has studied cross-domain recommender systems that improve recommendation accuracy in one domain by leveraging knowledge from additional domains with implicit feedback.
Selected publications
- Li, H., Caragea, D., Caragea, C. and Herndon, N. (2017). Disaster Response Aided by Tweet Classification with a Domain Adaptation Approach. In: Journal of Contingencies and Crisis Management (JCCM), Special Issue on HCI in Critical Systems. In press.
- Herndon, N. and Caragea, D. (2016). An evaluation of approaches for using unlabeled data with domain adaptation. In: Network Modeling Analysis in Health Informatics and Bioinformatics. 5(25):1-12.
- Stanescu, A., Tangirala, K. and Caragea, D. (2016). Predicting Alternatively Spliced Exons Using Semi-supervised Learning. In: International Journal on Data Mining and Bioinformatics (IJDMB) Vol. 14, No. 1, pages 1-21.
- DeLoach, J., Caragea, D. and Ou, X. (2016). Android Malware Detection with Weak Ground Truth Data. In: Proceedings of the 3rd International Workshop on Pattern Mining and Application of Big Data (BigPMA). In conjunction with the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), Washington DC.
- Stanescu, A., and Caragea, D. (2015). An empirical study of ensemble-based semi-supervised learning approaches for imbalanced splice site datasets. In: BMC Systems Biology supplement. 9(Suppl 5):S1.
- Parimi, R. and Caragea, D. (2015). Cross-Domain Matrix Factorization for Multiple Implicit-Feedback Domains. In: Proceedings of the International Workshop on Machine learning, Optimization and big Data (MOD 2015), pages 80-92, Taormina - Sicily, Italy.
The KDD laboratory emphasizes machine learning, data mining and knowledge discovery from large spatial and temporal databases, human-computer intelligent interaction, and high-performance computation in learning and optimization. KDD researchers systematically transform analytical learning problems using information theoretic and probabilistic criteria so that the most appropriate machine learning methods may be applied. A major challenge in this area is the design of unsupervised learning and bias optimization methods that effectively decompose learning tasks. By addressing the high-level control of inductive learning in a statistically sound fashion, techniques for both model selection and model integration (as practiced in multimodal sensor fusion) can be improved significantly. The KDD laboratory has developed and applied such approaches to multistrategy learning and interesting analytical problems in the areas of decision support and control automation. The goal of this work is to understand the interaction between systems that adapt or learn and their users. Important examples of this interaction include data visualization in intelligent displays, software agents for distributed high-performance computation and information retrieval, and virtual environments for simulation and computer-assisted instruction.
- CIS 530: Introduction to Artificial Intelligence
- CIS 560: Database System Concepts
- CIS 590: Top/Introduction to Genomics and Bioinformatics
- CIS 730: Artificial Intelligence
- CIS 732: Machine Learning and Pattern Recognition
- CIS 734: Introduction to Genomics and Bioinformatics
- CIS 798: Data Base Management Systems
- CIS 833: Information Retrieval and Text Mining
Kansas State University
- National Agricultural Biosecurity Center
- Ecological Genomics Group
- Arthropod Genomics Center
- Bioinformatics Center
- K-State EPICENTER