Author : John Robert Yaros
Total Pages : 121 pages
Title : Data Mining Perspectives on Equity Similarity Prediction
Data Mining Perspectives on Equity Similarity Prediction, written by John Robert Yaros, was released in 2014 (121 pages).

Book excerpt: Accurate identification of similar companies is invaluable to the financial and investing communities. To perform relative valuation, a key step is identifying a "peer group" containing the most similar companies. To hedge a stock portfolio, the best results are often achieved by short-selling a hedge portfolio whose future return series most closely matches that of the original portfolio, generally one made up of the most similar companies. To achieve diversification, a common approach is to avoid portfolios in which any stock is highly similar to another stock in the same portfolio. Yet the identification of similar companies is often left in the hands of individual experts, who devise sector/industry taxonomies or other structures to represent and quantify similarity. Little attention (at least in the public domain) has been given to the potential of data-mining techniques. In fact, much existing research treats sector/industry taxonomies as ground truth and evaluates clustering algorithms by how closely their results agree with those taxonomies. This dissertation takes the alternative view that proper identification of relevant features, combined with proper application of machine-learning and data-mining techniques, can achieve results that rival or even exceed the expert approaches. Two representations of similarity are considered: 1) a pairwise approach, wherein a value is computed to quantify the similarity of each pair of companies, and 2) a partition approach analogous to sector/industry taxonomies, wherein the universe of stocks is split into distinct groups such that the companies within each group are highly related to each other.
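The two representations can be illustrated with a small sketch. It is not the dissertation's method: it assumes similarity is measured by the Pearson correlation of historical returns, that returns are given as equal-length lists, and it substitutes a simple threshold-based single-linkage grouping for the partition step. The tickers, return values, and the function names `pearson`, `pairwise_similarity`, `peer_group`, and `partition` are all hypothetical.

```python
import math

# Hypothetical daily returns for four made-up tickers (illustrative data only).
RETURNS = {
    "AAA": [0.01, -0.02, 0.03, 0.00, 0.01],
    "BBB": [0.011, -0.018, 0.029, 0.001, 0.012],
    "CCC": [-0.01, 0.02, -0.03, 0.00, -0.01],
    "DDD": [0.00, 0.01, -0.01, 0.02, 0.00],
}


def pearson(x, y):
    """Pearson correlation of two equal-length return series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumed convention: a constant series counts as unrelated
    return cov / (sx * sy)


def pairwise_similarity(returns):
    """Pairwise representation: one similarity value per unordered ticker pair."""
    tickers = sorted(returns)
    return {
        (a, b): pearson(returns[a], returns[b])
        for i, a in enumerate(tickers)
        for b in tickers[i + 1:]
    }


def peer_group(target, returns, k):
    """The k tickers whose returns correlate most strongly with the target's."""
    scored = sorted(
        ((pearson(returns[target], returns[t]), t) for t in returns if t != target),
        reverse=True,
    )
    return [t for _, t in scored[:k]]


def partition(returns, threshold):
    """Partition representation (simplified stand-in): link every pair whose
    similarity is at least `threshold`, then return the connected components."""
    parent = {t: t for t in returns}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving keeps trees shallow
            t = parent[t]
        return t

    for (a, b), s in pairwise_similarity(returns).items():
        if s >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for t in returns:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())


print(peer_group("AAA", RETURNS, 2))
print(partition(RETURNS, 0.8))
```

A single global threshold is the crudest possible way to turn pairwise scores into groups; it is used here only because it makes the link between the two representations easy to see.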
To generate results for each representation, we consider three main datasets: historical stock-return correlations, equity-analyst coverage, and news-article co-occurrences. The latter two have rarely been considered previously. New algorithmic techniques are devised to operate on these datasets. In particular, a hypergraph partitioning algorithm is designed for imbalanced datasets, with implications beyond company similarity prediction, especially for consensus clustering.
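For analyst coverage and news articles, one simple starting point is to count co-occurrences: two companies covered by the same analyst, or named in the same article, are treated as related. The sketch below assumes each source is supplied as a list of "baskets" (sets of tickers) and applies a cosine-style normalization; the input format, the normalization, the function name `cooccurrence_similarity`, and the sample data are assumptions for illustration, not the dissertation's scoring. Each basket can also be viewed as a hyperedge over the tickers it contains, which is the kind of structure a hypergraph partitioner works on.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_similarity(baskets):
    """Score ticker pairs by how often they appear in the same basket.

    A basket is any group of tickers produced by one source, e.g. the stocks a
    single analyst covers or the companies named in a single news article.
    """
    single = Counter()  # number of baskets containing each ticker
    pair = Counter()    # number of baskets containing each (sorted) ticker pair
    for basket in baskets:
        tickers = sorted(set(basket))
        single.update(tickers)
        pair.update(combinations(tickers, 2))
    # Cosine-style normalization (an assumption): divide the shared count by the
    # geometric mean of the two tickers' individual counts, so widely covered
    # stocks do not score highly on volume alone.
    return {
        (a, b): n / math.sqrt(single[a] * single[b])
        for (a, b), n in pair.items()
    }


# Hypothetical analyst coverage lists: one set of tickers per analyst.
coverage = [
    {"AAA", "BBB", "CCC"},
    {"AAA", "BBB"},
    {"CCC", "DDD"},
]
for pair, score in sorted(cooccurrence_similarity(coverage).items()):
    print(pair, round(score, 3))
```

Pairs that never co-occur are simply absent from the result, which keeps it sparse; a news dataset would use the same function with one basket per article.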