Dataframe.filter(...)

This function returns a Dataframe. It filters the Dataframe by checking the column identified by key (the column label) against a filter condition (Condition) using value.

Parameters

  • key: string - the column key or label. If you are unsure of the labels, they can be retrieved by running Dataframe.labels

  • filter: Condition - an enum type that has the following available filters:

    Condition     Meaning(s)
    ==            equal to
    !=            not equal to
    >             greater than
    <             less than
    >=            greater than or equal to
    <=            less than or equal to
    E             contains or is an element of
    is between    is between lower & upper limit
    matches       matches a RegEx

  • value: unknown - can be one of several data types; the expected type is determined by the type of filter query

  • limit?: unknown - an optional argument, used with range filters like is between

Returns

  • Dataframe<T>
const filtered: Dataframe<T> = df.filter("salary", ">", 70000);
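
To make the parameter list concrete, the sketch below models the conditions on a plain array of objects. It is not the library's implementation: matchesCondition and filterRows are hypothetical names, the sample rows are made up, and the semantics chosen for "E" (membership in a list) and "is between" (inclusive limits) are assumptions that the description above does not confirm.

```typescript
// Illustrative model only; matchesCondition and filterRows are hypothetical helpers.
type Condition =
  | "==" | "!=" | ">" | "<" | ">=" | "<="
  | "E" | "is between" | "matches";

type Row = Record<string, unknown>;

function matchesCondition(
  cell: unknown,
  filter: Condition,
  value: unknown,
  limit?: unknown
): boolean {
  switch (filter) {
    case "==": return cell === value;
    case "!=": return cell !== value;
    case ">": return (cell as number) > (value as number);
    case "<": return (cell as number) < (value as number);
    case ">=": return (cell as number) >= (value as number);
    case "<=": return (cell as number) <= (value as number);
    case "E":
      // Assumption: value is a list and the cell must be one of its elements.
      return Array.isArray(value) && value.includes(cell);
    case "is between":
      // Assumption: both the lower and the upper limit are inclusive.
      return (cell as number) >= (value as number)
        && (cell as number) <= (limit as number);
    case "matches":
      return value instanceof RegExp
        && typeof cell === "string"
        && value.test(cell);
    default:
      return false;
  }
}

// Keeps the rows whose column `key` satisfies the condition.
function filterRows(
  rows: Row[],
  key: string,
  filter: Condition,
  value: unknown,
  limit?: unknown
): Row[] {
  return rows.filter((row) => matchesCondition(row[key], filter, value, limit));
}

const rows: Row[] = [
  { job_title: "Data Engineer", salary: 90000, work_year: 2020 },
  { job_title: "Data Analyst", salary: 65000, work_year: 2020 },
  { job_title: "ML Engineer", salary: 120000, work_year: 2021 },
];

const highPaid = filterRows(rows, "salary", ">", 70000);
const inRange = filterRows(rows, "salary", "is between", 70000, 100000);
const engineers = filterRows(rows, "job_title", "matches", /engineer/i);
```

The value argument changes shape with the condition: a number for comparisons, a list for "E" under the assumption above, and a RegExp for "matches". This is why the parameter is typed as unknown.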

Multiple / Chained Filter

Filters can be chained, i.e. each call filters the previously filtered Dataframe. A chain can be as long as you need:

const filtered: Dataframe<T> = df.filter("salary", ">", 70000).filter("work_year", "==", 2020);
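
Chaining behaves like a logical AND of the individual conditions, assuming each call returns a new Dataframe that holds only the matching rows. A minimal sketch on a plain array, where the Employee type and sample data are hypothetical and Array.prototype.filter stands in for the library's method:

```typescript
interface Employee {
  salary: number;
  work_year: number;
}

const staff: Employee[] = [
  { salary: 90000, work_year: 2020 },
  { salary: 65000, work_year: 2020 },
  { salary: 120000, work_year: 2021 },
];

// Two chained passes, mirroring df.filter(...).filter(...).
const chained = staff
  .filter((e) => e.salary > 70000)
  .filter((e) => e.work_year === 2020);

// One pass with both conditions combined; it selects the same rows.
const combined = staff.filter(
  (e) => e.salary > 70000 && e.work_year === 2020
);
```

Because each step only narrows the previous result, the order of the chained conditions does not change which rows survive.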

Range Filters

Range filters select numerical values in the Dataframe that fall within a range, between a lower limit and an upper limit:

const filtered: Dataframe<T> = df.filter("salary", "is between", 70000, 100000);

Regex Filter

The regex filter uses a regular expression to perform complex queries, such as matching patterns in a string. It uses the matches keyword and takes a RegExp as the value parameter of the .filter(...) function.

⚠️

Using the regex filter is a significant trade-off on time. On a dataset of roughly 35,000 rows, a query with a simple regex like /engineer/i takes ~7.9ms to ~10ms, while a query with a regex like /(a*)*b/ on the same dataset can take ~100ms. The cost grows with the search space, because the pattern is tested against the characters of every row. We recommend using "E" or "==" where possible, and using the regex filter only when time is not a factor.

There are two ways to work with the "matches" regex filter:

// Create the RegExp object first
const iRe = new RegExp(/engineer/, "i");
const filtered = selected_cols.filter("job_title", "matches", iRe);

// or pass the expression directly
const filteredInline = selected_cols.filter("job_title", "matches", /engineer/i);
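
Because regex cost depends on both the pattern and the data, it can help to time candidate patterns on a sample before committing to one. The sketch below is a rough stand-in under stated assumptions: it times RegExp.prototype.test over a synthetic array of job titles instead of calling the library, timeRegex is a hypothetical helper, and the timings it reports will differ from the figures quoted in the warning above.

```typescript
// Synthetic data: 35,000 job titles cycling through three made-up values.
const sampleTitles = ["Data Engineer", "Data Analyst", "ML Engineer"];
const titles: string[] = Array.from(
  { length: 35000 },
  (_, i) => sampleTitles[i % sampleTitles.length]
);

// Hypothetical helper: counts matching values and reports elapsed wall-clock time.
function timeRegex(
  values: string[],
  re: RegExp
): { matches: number; ms: number } {
  const start = Date.now(); // millisecond resolution, enough for a rough comparison
  let matches = 0;
  for (const v of values) {
    if (re.test(v)) matches++;
  }
  return { matches, ms: Date.now() - start };
}

const simple = timeRegex(titles, /engineer/i);
const nested = timeRegex(titles, /(a*)*b/);

console.log(`simple: ${simple.matches} matches in ${simple.ms}ms`);
console.log(`nested: ${nested.matches} matches in ${nested.ms}ms`);
```

Avoid passing a regex with the g flag to a helper like this, since test then keeps state in lastIndex between calls and can skip matches.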