Data Filter¶
Source: adopted from here
Introduction¶
In many data analysis, we are interested in a subset of data sample, which meet certain conditions. This is represented by the well-known map-reduce paradigm. This programming model is very popular among modern programming languages. Kdb+/q also provides multiple functions to help us perform analysis on a subset of data.
Question¶
The function simTrade
simulates the volume and timestamp of trades for multiple different stocks in a week.
simTrade:{
n:100000;
system "S -314159";
:`date`sym xasc ([]date:n?2020.06.22+til 5;sym:n?`IBM`MSFT`AAPL`MS`GS`C`EDU;volume:n?10000);
};
trades:simTrade[];
Find the trades with largest volume for each stock on each day of the week. The expected result should look like below:
date sym volume
----------------------
2020.06.22 AAPL 9995
2020.06.22 C 9999
2020.06.22 EDU 9991
2020.06.22 GS 9993
2020.06.22 IBM 9999
2020.06.22 MS 9999
2020.06.22 MSFT 9996
2020.06.23 AAPL 9989
2020.06.23 AAPL 9989
2020.06.23 C 9999
2020.06.23 EDU 9998
2020.06.23 GS 9996
2020.06.23 IBM 9999
2020.06.23 MS 9998
2020.06.23 MSFT 9998
2020.06.23 MSFT 9998
Answer¶
The suggested answer is as follows:
select from trades where volume=(max;volume) fby ([] date;sym)
The q keyword learned from this question is fby. Note that when the grouping is performed on multiple columns, these columns need to be put into a table.