Introduction to Programming with Scientific Applications

Aarhus University, Department of Computer Science

Project II - Portfolio optimization

We consider the problem of choosing a long term stock portfolio, given a set of stocks and their price over some period under risk aversion parameter γ > 0.

Assume there are m stocks to be considered. The portfolio will be represented by a column vector w ∈ ℝm, such that ∑i=1..m wi = 1. If wi > 0, you use a fraction wi of your total money to buy the i‘th stock, while wi < 0 represent shorting that stock. In both cases we assume the stock is bought/shorted for the entire period.

Let pj,i represent the price of the i‘th stock at time step j. If there are n + 1 time steps, then p ∈ ℝ(n+1)×m is a matrix.

We let r ∈ ℝn×m be the matrix, where rj,i represents the fractional reward of stock i at time step j, i.e. rj,i = (pj+1,i − pj,i) / pj,i for 1 ≤ j ≤ n.

By rj we denote the j‘th row of r, viewed as a column vector (rj,1, …, rj,m).

We make the (unrealistic) assumption that we can model r by a random variable, distributed as a multivariate Gaussian, with estimated means

μ ≃ 1/n · ∑j=1..n rj

and estimated covariance matrix

Σ ≃ 1 / n · ∑j=1..n [(rj − μ)(rj − μ)T]

Note that μi and Σi,i are the estimated mean and variance for stock i.

The distribution of returns using some w is then

Rw = Nw, σw2)

μw = wTμ

σw2 = wTΣw

Now, we want to maximize for a balance between high return μw and low risk σw2. This leads to the following optimization problem, where we want to find the value w* of w maximizing the following expresion:

maximize     wTμ − γwTΣw

subject to     ∑i=1..m wi = 1

where γ controls the balance between risk and return. A high value of γ indicate we are willing to take low risk and vise versa.

In this project you should find w* for different values of γ and using real stock values of your choice. The project consists of the following three questions.

  1. We need a module for collecting stock values. For this you can use the module pandas-datareader. Using this you should write a function get_prices([stock1, ..., stockk], step_size, period) that returns a tuple (stocks, p), where p[j, i] represents the opening price of stock i at time step j and stocks[i] is the name of the i‘th stock (adjust the arguments to get_prices to the data available at your data source). Make a plot of p, where each stock is labeled with its name, e.g. MSFT or GOGL. You should use at least five stocks.

  2. Calculate r, μ and Σ using the formulas above and the p calculated in the first question. Plot the probability density function (pdf) of the return of each stock.
    Hint. The method norm.pdf from the module scipy.stats might become convenient.

  3. Solve the optimization problem defined above for different values of γ, e.g. gammas = (np.arange(10) / 5) + 1, and plot the pdf of each solution to a single plot with appropriate legends. Finally create a scatter plot of how w* changes as γ changes. For each value of γ plot the fraction of each stock in the portfolio.