Top Records¶
Source: adopted from here
Introduction¶
The first thing to do in most quantitative analysis in the electronic trading is to determine the data scope, which covers two aspects:
- the sample period, i.e. the start and end date/time of the data
- the cross sectional space, i.e. the universe of securities included in the analysis
A common approach to determine the security universe is to include the top n
most actively traded symbols.
Question¶
Assuming a trade
table, which has three columns: date
, ticker
and volume
, is simulated with the function simulateTrades
defined below. This function generates 500000
trades for all week days in March 2020. The stock ticker is generated as three random letters. The column volume
indicates how many shares are traded.
getTradingDays:{
firstDate:2020.03.01;
lastDate:2020.03.31;
dates:firstDate+til (lastDate-firstDate)+1;
dates where not (dates mod 7) in 0 1
};
simulateTrades:{[seed;nTrades]
tradingDays:getTradingDays[];
system "S ",string seed;
dates:nTrades?tradingDays;
system "S ",string seed;
tickers:nTrades?`3;
system "S ",string seed;
volumes:100*nTrades?1+til 100;
([] date:dates;ticker:tickers;volume:volumes)
};
trades:simulateTrades[-314159;5000000]
Find the top 10
symbols that are most actively traded (measured by total traded shares) on each day. The returned table should have 220
rows, i.e. 10
rows for each day and the following schema:
topTradedSymbols:([]
date:`date$();
ticker:`symbol$();
dailyVol:`long$()
)
Answer¶
First we compute the daily volume from the trade tick data and sort the volume in descending order for each date. Note that 0!
is used to remove the key from the table.
dailyVolume:`date`dailyVol xdesc 0!select dailyVol:sum volume by date,ticker from trades;
Below four different approaches are provided to find the top 10
most actively trades symbols.
Use fby
¶
Find the first 10
indices on each day and then filter out these rows.
select from dailyVolume where ({x in 10#x};i) fby date
Use sublist
¶
ungroup select sublist[10] ticker,sublist[10] dailyVol by date from dailyVolume
Use #
¶
Given sublist
is implemented internally with take operator, using #
directly is slightly faster than using sublist
.
ungroup select 10#ticker,10#dailyVol by date from dailyVolume
Use group
¶
select from dailyVolume where i in raze 10#/:group date
Remarks¶
The last approach using group
is preferred due to two reasons:
- it runs fastest
- it is cleaner if multiple columns are present in the table