Improve the implementation of WorldQuant 101 alpha factors using NumPy
I am trying to implement the 101 quantitative trading factors published by WorldQuant (https://arxiv.org/pdf/1601.00991.pdf).
A typical factor processes stocks' price and volume information along both the time dimension and the stock dimension. Take alpha factor #4 as an example: (-1 * Ts_Rank(rank(low), 9)). This is a momentum alpha signal. low is a panel of stocks' low prices over a certain time period. rank is a cross-sectional operation that ranks each row of the panel (a time snapshot). Ts_Rank is a time-series operation that computes a moving rank over each column of the panel (a stock) with a specified window.
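To make the two operations concrete, here is a minimal sketch on a made-up 4-date by 3-stock panel. The numbers, the window of 3, and the variable names are illustrative assumptions, not data from the paper. Ranks are left unscaled, and a value's rank is taken as one plus the number of strictly smaller values.

```python
import numpy as np

# Hypothetical panel of low prices: rows are dates, columns are stocks.
low = np.array([[10.0, 12.0, 11.0],
                [ 9.0, 13.0, 11.5],
                [ 9.5, 12.5, 10.5],
                [10.5, 12.2, 10.0]])

# Cross-sectional rank: rank each row (one date) across the stocks.
cs_rank = np.array([[1 + np.sum(row < v) for v in row] for row in low])

# Time-series rank: for each stock, rank the current value within
# its trailing window (current row included); early rows stay NaN.
window = 3
ts_rank_demo = np.full(low.shape, np.nan)
for t in range(window - 1, low.shape[0]):
    win = low[t - window + 1 : t + 1]
    ts_rank_demo[t] = 1 + np.sum(win < win[-1], axis=0)

print(cs_rank)
print(ts_rank_demo)
```

Each row of `cs_rank` only compares stocks on the same date, while each column of `ts_rank_demo` only compares a stock with its own recent history.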
Intuitively, a pandas DataFrame or a NumPy matrix should be a good fit for implementing the 101 alpha factors. Below is the best NumPy implementation I have so far. However, it is too slow: on my Intel Core i7 Windows machine, it takes around 45 seconds to run alpha #4 with a 5000 (trade dates) by 200 (stocks) matrix as input.
I also came across DolphinDB, a time-series database with built-in analytics features (https://www.dolphindb.com/downloads.html). For the same factor, alpha #4, DolphinDB ran in a mere 0.04 seconds, roughly 1000 times faster than the NumPy version. However, DolphinDB is commercial software. Does anybody know of a better Python implementation? Or are there any tips for improving my current Python code to achieve performance comparable to DolphinDB?
Here is the Python implementation.
import numpy as np

def rankdata(a, method='average', *, axis=None):
    # this rankdata refers to scipy.stats.rankdata (https://github.com/scipy/scipy/blob/v1.9.1/scipy/stats/_stats_py.py#L9047-L9153)
    if method not in ('average', 'min', 'max', 'dense', 'ordinal'):
        raise ValueError('unknown method "{0}"'.format(method))
    if axis is not None:
        a = np.asarray(a)
        if a.size == 0:
            np.core.multiarray.normalize_axis_index(axis, a.ndim)
            dt = np.float64 if method == 'average' else np.int_
            return np.empty(a.shape, dtype=dt)
        return np.apply_along_axis(rankdata, axis, a, method)
    arr = np.ravel(np.asarray(a))
    algo = 'mergesort' if method == 'ordinal' else 'quicksort'
    sorter = np.argsort(arr, kind=algo)
    inv = np.empty(sorter.size, dtype=np.intp)
    inv[sorter] = np.arange(sorter.size, dtype=np.intp)
    if method == 'ordinal':
        return inv + 1
    arr = arr[sorter]
    obs = np.r_[True, arr[1:] != arr[:-1]]
    dense = obs.cumsum()[inv]
    if method == 'dense':
        return dense
    # cumulative counts of each unique value
    count = np.r_[np.nonzero(obs), len(obs)]
    if method == 'max':
        return count[dense]
    if method == 'min':
        return count[dense - 1] + 1
    # average method
    return .5 * (count[dense] + count[dense - 1] + 1)

def rank(x):
    return rankdata(x, method='min', axis=1) / np.size(x, 1)

def rolling_rank(na):
    return rankdata(na.transpose(), method='min', axis=0)[-1].transpose()

def ts_rank(x, window=10):
    a_rolled = np.lib.stride_tricks.sliding_window_view(x, window, axis=0)
    return np.append(np.full([window - 1, np.size(x, 1)], np.nan), rolling_rank(a_rolled), axis=0)

def alpha004(data):
    return -1 * ts_rank(rank(data), 9)

import time

# The input is a 5000 by 200 matrix, where the row index represents trade date and the column index represents security ID.
data = np.random.random((5000, 200))
start_time = time.time()
alpha004(data)
print("--- %s seconds ---" % (time.time() - start_time))
Output: 44.85 seconds
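One possible direction, offered as an unbenchmarked sketch rather than a confirmed fix: with method='min', the rank of a value equals one plus the number of strictly smaller values in its group, so both ranking steps can be written with whole-array operations instead of `np.apply_along_axis`, which calls `rankdata` once per 1-D slice. The function names `rank_min_rows`, `ts_rank_last`, and `alpha004_vectorized` are hypothetical and not part of the code above; the sketch assumes a 2-D float array with at least `window` rows.

```python
import numpy as np

def rank_min_rows(x):
    """Cross-sectional 'min' rank of each row, divided by the number of columns."""
    n_cols = x.shape[1]
    order = np.argsort(x, axis=1, kind="stable")
    sorted_x = np.take_along_axis(x, order, axis=1)
    # Flag the first element of every run of equal values in each sorted row.
    is_first = np.ones(x.shape, dtype=bool)
    is_first[:, 1:] = sorted_x[:, 1:] != sorted_x[:, :-1]
    # Tied values share the 0-based position of the first element of their run.
    first_pos = np.maximum.accumulate(
        np.where(is_first, np.arange(n_cols), 0), axis=1)
    ranks = np.empty(x.shape, dtype=np.float64)
    # Scatter the 1-based ranks back to the original column order.
    np.put_along_axis(ranks, order, first_pos + 1.0, axis=1)
    return ranks / n_cols

def ts_rank_last(x, window):
    """'min' rank of each value within its trailing window, per column."""
    # Shape: (rows - window + 1, columns, window); a view, not a copy.
    windows = np.lib.stride_tricks.sliding_window_view(x, window, axis=0)
    # Count the window elements strictly below the newest one, then add 1.
    ranks = (windows < windows[..., -1:]).sum(axis=-1) + 1.0
    pad = np.full((window - 1, x.shape[1]), np.nan)
    return np.concatenate([pad, ranks], axis=0)

def alpha004_vectorized(data):
    return -1 * ts_rank_last(rank_min_rows(data), 9)
```

The comparison in `ts_rank_last` materializes a temporary boolean array of shape (rows - window + 1, columns, window), so memory use grows with the window length. Comparisons involving NaN evaluate to False, so NaN handling may differ from the original functions and should be checked against real price data.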
Asked by Huang WeiFeng
(31 rep)
May 13, 2025, 09:33 AM
Last activity: May 13, 2025, 09:39 AM