Using Probabilistic Approach to Joint Clustering and Statistical Inference: Analytics for Big Investment Data

UMMS Affiliation

Department of Quantitative Health Sciences



This paper proposes a Contrarian Probabilistic Model (CPM) to evaluate the effectiveness of contrarians' investment in preferred stocks using big data from Tradeline. CPM accommodates the distinctive features of investment data, which are often correlated, nested, heterogeneous, and non-normal, with missing values. CPM integrates clustering and statistical inference, which enables joint recognition of investment behavior trajectory patterns and risk analysis based on the entire variance-covariance structure between and within clusters. The empirical study using CPM provides a finer and more comprehensive evaluation of contrarian investment in preferred stocks. Two distinct investment behavior trajectory clusters were identified: a few high-risk-seeking contrarians achieved high returns over a five-year long-term investment horizon, while the majority of contrarians did not outperform glamour stockholders in preferred stock investment. Although CPM was developed using historical data, it could be developed into an analytical tool for online, near-real-time analysis of big investment data.


Paper presented at the 2015 IEEE International Conference on Big Data (Big Data), held Oct. 29, 2015-Nov. 1, 2015, Santa Clara, CA, USA.

Citation: Fang H, Wang H, Wang C, Daneshmand M. Using Probabilistic Approach to Joint Clustering and Statistical Inference: Analytics for Big Investment Data. IEEE BigData 2015. 2916-2918, DOI: 10.1109/BigData.2015.7364121.