For some time now, machine learning has increasingly focused on the feature space itself: transforming, selecting and engineering features. Feature engineering is the process of generating new features that are not part of the original set of raw features. The goal is better analysis of the data, since raw features may not be well suited to classification, clustering and other downstream analysis tasks.
This work was done in collaboration with Boeing and RSA Labs. We are also working with IBM on other kinds of feature engineering for different applications, and on unsupervised methods for feature engineering.
Our approach to feature engineering relies on Booleanizing the features in the raw data. We then apply operations to these features that are akin to Boolean logic synthesis. Next, we compute the Fourier expansion of the resulting Boolean functions, whose coefficients indicate how predictive a feature is. We applied this method to enterprise log data to detect intrusions. Interestingly, the method consistently achieved much better recall than the other classifiers we compared it against, which suggests it is valuable for imbalanced data sets in which positive instances are rare.
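To make the Fourier step more concrete, here is a minimal Python sketch of textbook Fourier analysis of Boolean functions. It is not the implementation from the paper. It assumes that each raw feature is Booleanized by a median split into {-1, +1} and that the label y is also encoded as ±1; both encodings are assumptions for illustration. For a subset S of features, the coefficient is estimated from m samples as f̂(S) = (1/m) Σ_j y_j · χ_S(x_j), where χ_S(x) = ∏_{i∈S} x_i is the parity of the features in S. A coefficient with large magnitude marks a feature, or combination of features, that is predictive of the label. The function names `booleanize`, `fourier_coefficient` and `rank_subsets` are hypothetical.

```python
from itertools import combinations, product

import numpy as np


def booleanize(X, thresholds=None):
    """Map each raw numeric feature to {-1, +1} by thresholding.

    Assumption: a per-feature median split; the paper's Booleanization may differ.
    """
    X = np.asarray(X, dtype=float)
    if thresholds is None:
        thresholds = np.median(X, axis=0)
    return np.where(X > thresholds, 1, -1), thresholds


def fourier_coefficient(Xb, y, subset):
    """Empirical estimate of f_hat(S) = E[y * chi_S(x)], chi_S(x) = prod_{i in S} x_i."""
    if subset:
        chi = np.prod(Xb[:, list(subset)], axis=1)
    else:
        chi = np.ones(len(y))  # chi of the empty set is the constant 1
    return float(np.mean(y * chi))


def rank_subsets(Xb, y, max_degree=2):
    """Score every feature subset of size 1..max_degree, largest |coefficient| first."""
    n = Xb.shape[1]
    scores = {}
    for d in range(1, max_degree + 1):
        for S in combinations(range(n), d):
            scores[S] = fourier_coefficient(Xb, y, S)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Toy data: all 8 settings of 3 binary raw features.
X_raw = np.array(list(product([0, 1], repeat=3)), dtype=float)
Xb, _ = booleanize(X_raw)
# Hypothetical label: the parity (XOR) of features 0 and 1, encoded as +/-1.
y = Xb[:, 0] * Xb[:, 1]
ranking = rank_subsets(Xb, y, max_degree=2)
print(ranking[0])  # ((0, 1), 1.0)
```

In this toy data neither feature 0 nor feature 1 alone carries any signal (each single-feature coefficient is 0), while the pair scores 1.0. Interactions of this kind are what engineered features built from Boolean combinations can expose and a per-feature analysis would miss.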
- Jiayi Duan, Ziheng Zheng, Alina Oprea, Shobha Vasudevan. Feature Engineering for Detecting Compromises in Enterprise Log Data. IEEE International Conference on Big Data (IEEE BIGDATA) 2018. To appear.