**E. Seregina, "A Basket Half Full: Sparse Portfolios"**

*(Under Review)*

The existing approaches to sparse wealth allocations (1) are suboptimal due to the bias induced by the $\ell_1$-penalty; (2) require the number of assets to be less than the sample size; (3) do not model the factor structure of stock returns in high dimensions. We address these shortcomings and develop a novel strategy that produces unbiased and consistent sparse allocations. We demonstrate that (1) failing to correct for the bias leads to low out-of-sample portfolio returns; (2) only sparse portfolios achieved positive cumulative returns during several economic downturns, including the dot-com bubble of 2000, the financial crisis of 2007-09, and the COVID-19 outbreak.

**TH Lee, E. Seregina, "Optimal Portfolio Using Factor Graphical Lasso"**

*(Under Review)*

Graphical models are a powerful tool for estimating a high-dimensional inverse covariance (*precision*) matrix, and they have been applied to the portfolio allocation problem. These models assume that the precision matrix is sparse. However, when stock returns are driven by common factors, this assumption does not hold. Our paper develops a framework for estimating a high-dimensional precision matrix that combines the benefits of exploiting the factor structure of stock returns with the sparsity of the precision matrix of the factor-adjusted returns. The proposed algorithm is called *Factor Graphical Lasso* (FGL). We study a high-dimensional portfolio allocation problem in which the asset returns admit an approximate factor model. In high dimensions, when the number of assets is large relative to the sample size, the sample covariance matrix of the excess returns is subject to large estimation uncertainty, which leads to unstable solutions for portfolio weights. To resolve this issue, we decompose the covariance matrix into low-rank and sparse components. This strategy allows us to consistently estimate the optimal portfolio in high dimensions, even when the covariance matrix is ill-behaved. We establish consistency of the portfolio weights in a high-dimensional setting *without assuming sparsity on the covariance or precision matrix of stock returns*. Our theoretical results and simulations demonstrate that FGL is robust to heavy-tailed distributions, which makes our method suitable for financial applications. The empirical application uses daily and monthly data for the constituents of the S&P500 to demonstrate the superior performance of FGL compared to the equal-weighted portfolio, the index, and some prominent precision- and covariance-based estimators.
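The link between the factor structure and the precision matrix can be sketched with a standard identity. The notation below is an assumption made for illustration and does not come from the abstract: $r_t$ is the vector of excess returns, $f_t$ the factors, $B$ the loadings, $\varepsilon_t$ the idiosyncratic component, $\Theta_f = \Sigma_f^{-1}$, $\Theta_\varepsilon = \Sigma_\varepsilon^{-1}$, and $\iota$ a vector of ones. If $r_t = B f_t + \varepsilon_t$ with factors uncorrelated with $\varepsilon_t$, the covariance matrix splits into a low-rank part and an idiosyncratic part, and the Sherman-Morrison-Woodbury formula gives the precision matrix of returns in terms of the (possibly sparse) $\Theta_\varepsilon$:

$$
\begin{aligned}
\Sigma &= B \Sigma_f B' + \Sigma_\varepsilon, \\
\Theta = \Sigma^{-1} &= \Theta_\varepsilon - \Theta_\varepsilon B \left( \Theta_f + B' \Theta_\varepsilon B \right)^{-1} B' \Theta_\varepsilon .
\end{aligned}
$$

As one example of how such an estimate might enter the allocation, the textbook global minimum-variance weights are $w = \Theta \iota / (\iota' \Theta \iota)$. Whether the paper uses exactly this portfolio formulation is not stated in the abstract.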

**TH Lee, E. Seregina, "Learning from Forecast Errors: A New Approach to Forecast Combination"**

*(Under Review)*

This paper studies forecast combination (as an expert system) using precision matrix estimation of forecast errors when the latter admit an approximate factor model. This approach incorporates the fact that experts often use common sets of information and hence tend to make common mistakes. This premise is supported by many empirical results. For example, the European Central Bank's Survey of Professional Forecasters on Euro-area real GDP growth demonstrates that professional forecasters tend to *jointly* understate or overstate GDP growth. Motivated by this stylized fact, we develop a novel framework that exploits the factor structure of forecast errors and the sparsity in the precision matrix of the idiosyncratic components of the forecast errors. The proposed algorithm is called *Factor Graphical Model* (FGM). Our approach overcomes the challenge of obtaining forecasts that contain unique information, which was shown to be necessary to achieve a "winning" forecast combination. In simulations, we demonstrate the merits of the FGM in comparison with equal-weighted forecasts and the standard graphical methods in the literature. An empirical application to forecasting macroeconomic time series in a big data environment highlights the advantage of the FGM approach over existing methods of forecast combination.
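To illustrate how a precision matrix of forecast errors can translate into combination weights, here is a minimal sketch of the classical minimum-variance combination $w = \Theta \iota / (\iota' \Theta \iota)$, with weights summing to one. It takes the precision matrix as given and does not implement the FGM estimator itself. The function names and the numbers in the example are hypothetical.

```python
def combination_weights(precision):
    """Minimum-variance weights w = Theta*1 / (1' Theta 1).

    `precision` is a square nested list holding an estimate of the
    precision matrix of forecast errors (assumed given here; the FGM
    estimation step is NOT implemented in this sketch).
    """
    row_sums = [sum(row) for row in precision]  # Theta * 1
    total = sum(row_sums)                       # 1' Theta 1
    if total == 0:
        raise ValueError("1' Theta 1 is zero; weights are undefined")
    return [s / total for s in row_sums]


def combine(forecasts, weights):
    """Weighted sum of the individual forecasts."""
    return sum(w * f for w, f in zip(weights, forecasts))


# Hypothetical example: two forecasters with uncorrelated errors,
# the first four times as precise as the second.
theta = [[4.0, 0.0],
         [0.0, 1.0]]
weights = combination_weights(theta)
combined = combine([2.0, 3.0], weights)
```

With uncorrelated errors the weights are proportional to each forecaster's precision, so the more accurate forecaster receives the larger weight.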

**V. Kutateladze, E. Seregina, "Fast and Efficient Data Science Techniques for Covid-19 Group Testing"**

*(Journal of Data Science (2021), 1-19)*

Researchers and public officials tend to agree that until a vaccine is developed, stopping SARS-CoV-2 transmission is the name of the game. Testing is the key to preventing the spread, especially by asymptomatic individuals. With testing capacity restricted, group testing is an appealing alternative for comprehensive screening and has recently received FDA emergency authorization. This technique tests pools of individual samples, thereby often requiring fewer testing resources while potentially providing a severalfold speedup. We approach group testing from a data science perspective and offer two contributions. First, we provide an extensive empirical comparison of modern group testing techniques based on simulated and real laboratory data. Second, we propose a simple one-round method based on $\ell_1$-norm sparse recovery, which outperforms current state-of-the-art approaches at certain disease prevalence rates.
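The resource savings from pooling can be illustrated with the classical two-stage Dorfman scheme: test each pool once, then retest every member of a positive pool individually. This is not the one-round $\ell_1$-based method proposed in the paper, only a simple baseline. The sketch assumes independent infections and a perfectly accurate test, and the function names are hypothetical.

```python
def dorfman_tests_per_person(prevalence, pool_size):
    """Expected number of tests per person under two-stage Dorfman pooling.

    Assumes independent infections and an error-free test.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    if pool_size < 2:
        raise ValueError("pool_size must be at least 2")
    p_pool_negative = (1.0 - prevalence) ** pool_size
    # One pooled test shared by the group, plus an individual retest
    # for everyone whenever the pool comes back positive.
    return 1.0 / pool_size + (1.0 - p_pool_negative)


def best_pool_size(prevalence, max_pool_size=50):
    """Pool size in [2, max_pool_size] minimizing expected tests per person."""
    return min(range(2, max_pool_size + 1),
               key=lambda k: dorfman_tests_per_person(prevalence, k))


# At 1% prevalence, pools of 10 need roughly 0.196 tests per person,
# compared with 1.0 for individual testing.
per_person = dorfman_tests_per_person(0.01, 10)
```

The savings shrink as prevalence rises, since more pools test positive and trigger retests. This is consistent with the abstract's remark that relative performance depends on the disease prevalence rate.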

**E. Seregina, "Time-Varying Factor Graphical Models"**

At the beginning of the COVID-19 outbreak, the stock market was volatile, exhibiting sudden trend switches. As a result, using a long history of past performance leads to large estimation errors. One efficient way to overcome this difficulty is to use the information extracted from higher-frequency returns, e.g. daily data, to make longer-term predictions of lower-frequency returns, e.g. monthly data. Such a strategy naturally augments the information set for the monthly data, leading to decreased estimation errors and improved performance. This paper proposes to estimate the lower-frequency precision matrix using higher-frequency returns. In addition, we allow the dependence structure between stocks to change over time, which makes the proposed model more flexible. We call the proposed algorithm "Time-Varying Factor Graphical Model". Our model is solved using the alternating direction method of multipliers (ADMM), and we derive closed-form solutions for the ADMM subproblems to further reduce the runtime.
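The abstract does not spell out the ADMM subproblems. As a hedged illustration of what a closed-form ADMM update can look like, the sketch below implements elementwise soft-thresholding, the standard closed-form solution of the $\ell_1$-penalized step in ADMM algorithms for graphical-lasso-type problems. That the paper's subproblems take this form is an assumption, and the function names are hypothetical.

```python
def soft_threshold(value, kappa):
    """Soft-thresholding operator: sign(value) * max(|value| - kappa, 0)."""
    if kappa < 0:
        raise ValueError("kappa must be non-negative")
    if value > kappa:
        return value - kappa
    if value < -kappa:
        return value + kappa
    return 0.0


def soft_threshold_matrix(matrix, kappa):
    """Apply soft-thresholding to every entry of a nested-list matrix."""
    return [[soft_threshold(x, kappa) for x in row] for row in matrix]


# Entries smaller than kappa in absolute value are set exactly to zero,
# which is how an l1 penalty induces sparsity in the estimate.
shrunk = soft_threshold_matrix([[1.5, -0.2], [-2.0, 0.1]], 0.5)
```

Because the update has this explicit form, the corresponding step needs no inner iterative solver, which is the kind of speedup closed-form subproblem solutions provide.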

**E. Seregina, "Projected Factor Graphical Models"**

Fundamental analysis and mean-variance portfolio optimization are traditionally viewed as two alternative approaches to portfolio allocation. In this paper we develop a novel precision matrix estimator that integrates these approaches. The proposed algorithm is called "Projected Factor Graphical Models". Our method allows us to incorporate information on the companies’ fundamentals, such as current earnings, growth in net operating assets, and growth in financing, when deciding which stocks to include in the portfolio and how much to invest in them. Using the fact that, at some point, a stock’s market value will converge to its intrinsic value, we use a partial equilibrium returns model that describes stock returns as a linear function of the firm’s characteristics. This model is used to construct a precision matrix estimator for portfolio weights.