Dataframe.filter(...)
This function returns Dataframe
and it is used to filter out Dataframe
, by checking if key
which is the column label against a filter condition Condition
using value
.
Parameters
-
key: string
- this is the column key or label, if unsure of the labels, it can be retrived by runningDataframe.labels
-
filter: Condition
- is aenum
type that has the following available filtersCondition
Meaning(s) ==
equal to
!=
not equal to
>
greater than
<
less than
>=
greater than or eqaul to
<=
less than or eqaul to
E
contains
oris an element of
is between
is between
lower & upper limitmatches
matches
- a RegEx -
value: unknown
- this can be a number of data types, this is determined based on the type of filter query -
limit?: unknown
- this is an optional argument, used with range filters likeis between
Returns
Dataframe<T>
const filtered: Dataframe<T> = df.filter("salary", ">", 70000);
Multiple / Chained Filter
You can chain the filter ie. filtering the previously filtered Dataframe
, the chained filter can be as long as you need them to be;
const filtered: Dataframe<T> = df.filter("salary", ">", 70000).filter("work_year", "==", 2020);
Range Filters
Range filters filter numerical values in the Dataframe that fall between a certain range (lower limit and upper limit);
const filtered: Dataframe<T> = df.filter("salary", "is between", 70000, 100000);
Regex Filter
Regex filter uses regular expression to perform complex queries like matching certain patterns in a String
, this uses the matches
keyword and takes in a RegExp
(opens in a new tab) as the value for the value
paramter of the .filter(...)
function.
Please using this Regex filter is a big trade off on time, performing a query with a simple regex like /engineer/i
on a dataset of almost 35,000+ rows take ~7.9ms
to ~10ms
, and performing a query with a regex like /(a*)*b/
on
the same dataset can take ~100ms
, as we see a asymptotic
time complexity, it searches n
characters of n
rows,
ie. time grows with growth in search space. We recommend you use "E"
or "=="
, and only use the RegEx when time is
not a factor.
There are two ways to work with the "matches" regex filter
// Create the RegExp object
const iRe = new RegExp(/engineer/, "i");
const filtered = selected_cols.filter("job_title", "matches", iRe);
// or pass the expression directly
const filtered = selected_cols.filter("job_title", "matches", /engineer/i);