
Data Filter


Introduction

In many data analyses, we are interested only in the subset of samples that meet certain conditions. Selecting such a subset is a filter operation, which, together with map and reduce, is at the heart of the well-known map-reduce programming model, a model that many modern programming languages support. Kdb+/q also provides several functions to help us analyse a subset of data.
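As a minimal illustration of filtering in q (the vector x and the table t are hypothetical toy data, not part of the exercise), a boolean mask selects the elements to keep, where turns the mask into indices, and a q-sql where clause applies the same idea to a table:

```q
x:3 8 1 9 4 7            / hypothetical toy vector
x>5                      / boolean mask: 010101b
where x>5                / indices of the 1b positions: 1 3 5
x where x>5              / the filtered subset: 8 9 7
t:([] v:3 8 1 9 4 7)     / hypothetical toy table with one column v
select from t where v>5  / the same filter expressed in q-sql
```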

Question

The function simTrade simulates the date, stock symbol and volume of 100,000 trades in seven stocks over one trading week (2020.06.22 to 2020.06.26).

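simTrade:{
  n:100000;                    / number of simulated trades
  system "S -314159";          / fix the random seed so the data are reproducible
  :`date`sym xasc ([]date:n?2020.06.22+til 5;sym:n?`IBM`MSFT`AAPL`MS`GS`C`EDU;volume:n?10000);  / sorted by date, then sym
  };
trades:simTrade[];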

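Find the trade(s) with the largest volume for each stock on each day of the week. Only the first two days of the expected result are shown below; when several trades tie for the maximum, all of them are kept (for example, AAPL and MSFT on 2020.06.23):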

date       sym  volume
----------------------
2020.06.22 AAPL 9995
2020.06.22 C    9999
2020.06.22 EDU  9991
2020.06.22 GS   9993
2020.06.22 IBM  9999
2020.06.22 MS   9999
2020.06.22 MSFT 9996
2020.06.23 AAPL 9989
2020.06.23 AAPL 9989
2020.06.23 C    9999
2020.06.23 EDU  9998
2020.06.23 GS   9996
2020.06.23 IBM  9999
2020.06.23 MS   9998
2020.06.23 MSFT 9998
2020.06.23 MSFT 9998
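A natural first attempt is to aggregate with a by clause. The sketch below uses a small hypothetical table t (not produced by simTrade) in which two AAPL trades tie, and shows that aggregation returns one row per (date, sym) group, so it cannot reproduce the tied rows in the expected output:

```q
/ hypothetical toy trades: two AAPL trades tie for the day's largest volume
t:([] date:4#2020.06.23; sym:`AAPL`AAPL`AAPL`MSFT; volume:9989 9989 120 9998)
/ a by clause collapses each (date;sym) group to a single row
r:select max volume by date,sym from t
/ r holds AAPL 9989 and MSFT 9998 only: the second tied AAPL trade is gone
```

Keeping every tied trade therefore calls for a filter that compares each trade with its group's maximum, rather than an aggregate.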

Answer

The suggested answer is as follows:

select from trades where volume=(max;volume) fby ([] date;sym)

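The q keyword introduced by this question is fby. The expression (max;volume) fby ([] date;sym) computes the maximum volume within each (date, sym) group and returns that value for every row of the group, so comparing it with volume in the where clause keeps exactly the trades that hold their group's maximum, ties included. Note that when the grouping is performed on multiple columns, these columns need to be put into a table, as in ([] date;sym).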
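To see what fby produces on its own, here is a sketch on hypothetical toy vectors (v, g, k and w are illustrative names, not from the exercise). fby returns a vector as long as its input, holding each row's group aggregate; with a table as the grouping argument, each distinct row of the table forms a group:

```q
v:3 1 4 1 5 9 2 6              / hypothetical values
g:`a`b`a`b`a`b`a`b             / hypothetical single-key groups
(max;v) fby g                  / each row gets its group's max: 5 9 5 9 5 9 5 9
v=(max;v) fby g                / mask of rows equal to their group max: 00001100b
k:([] a:`x`x`y`y; b:1 2 1 1)   / two grouping keys passed as a table
w:10 20 30 40                  / hypothetical values for the two-key case
(max;w) fby k                  / groups (x;1),(x;2),(y;1): 10 20 40 40
```

Applying where to such a mask is exactly what the where clause of the suggested answer does for each (date, sym) group.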