Get n-Smallest Values from a Particular Column in Pandas DataFrame

Use the DataFrame.nsmallest() method to retrieve the rows that hold the n smallest values of a given column. Here is an illustration of how it's done:

Code:

import pandas as pd

# Create a new sample DataFrame with different values

new_data = {

    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Ethan'],

    'Age': [28, 32, 20, 24, 38],

    'Salary': [45000, 55000, 42000, 38000, 68000]

}

new_df = pd.DataFrame(new_data)

# Change the value of 'n' to specify the number of smallest values to retrieve

n = 3

# Get the 'n' smallest values from the 'Salary' column of the new DataFrame

new_n_smallest_values = new_df.nsmallest(n, 'Salary')

# Print the updated result, which shows the 'n' smallest salaries

print(new_n_smallest_values)

Output:

Name Age Salary

3 Diana  24 38000

2 Charlie  20 42000

0 Alice  28 45000

The code above builds a sample DataFrame with the columns "Name", "Age", and "Salary", then calls the DataFrame's nsmallest() method to get the rows with the n smallest values in the "Salary" column. In this instance, n is set to 3, so the printed result shows the three rows with the lowest salaries, ordered from lowest to highest.
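DataFrame.nsmallest() returns whole rows. If only the values themselves are needed, Series.nsmallest() can be called on the single column instead. The following is a minimal sketch of that difference; the variable names df, rows, and values are chosen here for illustration and are not part of the original example.

```python
import pandas as pd

# Same names and salaries as the sample DataFrame above
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Ethan'],
    'Salary': [45000, 55000, 42000, 38000, 68000],
})

# DataFrame.nsmallest returns complete rows, ordered by 'Salary'
rows = df.nsmallest(3, 'Salary')

# Series.nsmallest returns only the values; the original index labels are kept
values = df['Salary'].nsmallest(3)

print(rows['Name'].tolist())   # ['Diana', 'Charlie', 'Alice']
print(values.tolist())         # [38000, 42000, 45000]
print(values.index.tolist())   # [3, 2, 0]
```

The index labels (3, 2, 0) can be used to look the matching rows up again with df.loc if they are needed later.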

Syntax:

DataFrame.nsmallest(n, columns, keep='first')

Parameters:

n: The number of rows to return.

columns: The column or columns to order by. This can be a single column name (string) or a list of column names.

keep (optional): How to handle ties when several rows share the same value at the cutoff. The default, 'first', keeps the rows that occur first. The other choices are 'last', which keeps the rows that occur last, and 'all', which keeps every tied row even if that returns more than n rows.

Return:

A new DataFrame containing the n rows with the smallest values in the chosen column(s), sorted in ascending order.
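The effect of the keep parameter is easiest to see on data that actually contains a tie at the cutoff. The sketch below uses a small hypothetical DataFrame (the staff table and its names are invented for illustration) in which three people share the second-lowest salary.

```python
import pandas as pd

# Hypothetical data: Bob, Cara, and Dan all tie at 40000
staff = pd.DataFrame({
    'Name': ['Ann', 'Bob', 'Cara', 'Dan'],
    'Salary': [30000, 40000, 40000, 40000],
})

# keep='first' (the default): the earliest tied row fills the last slot
first = staff.nsmallest(2, 'Salary', keep='first')

# keep='last': the latest tied row fills the last slot
last = staff.nsmallest(2, 'Salary', keep='last')

# keep='all': every row tied at the cutoff is kept, so more than n rows come back
all_ties = staff.nsmallest(2, 'Salary', keep='all')

print(first['Name'].tolist())  # ['Ann', 'Bob']
print(last['Name'].tolist())   # ['Ann', 'Dan']
print(len(all_ties))           # 4
```

With keep='all' the result length is no longer guaranteed to be n, which matters if later code assumes a fixed number of rows.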

Example:

import pandas as pd

# Create a new DataFrame with values

new_data = {

    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Ethan'],

    'Age': [22, 27, 19, 20, 29],

    'Salary': [45000, 55000, 42000, 38000, 68000]

}

new_df = pd.DataFrame(new_data)

# Get the 3 smallest values from the 'Salary' column of the new DataFrame

new_n_smallest_values = new_df.nsmallest(3, 'Salary')

# Print the updated result

print(new_n_smallest_values)

Output:

Name Age Salary

3 Diana  20 38000

2 Charlie  19 42000

0 Alice  22 45000

The nsmallest() method is used in this example to retrieve the three rows with the smallest values in the 'Salary' column. Only the ages differ from the first example, so the same three people appear in the resultant DataFrame, ordered from the lowest salary upward.

You may supply a list of column names as the columns argument if you wish to rank by several columns at once. For instance, the expression new_df.nsmallest(3, ['Age', 'Salary']) returns the three rows with the lowest "Age" values, using "Salary" only to break ties between equal ages.
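The role of the second column only becomes visible when the first column contains a tie. The following sketch uses hypothetical data (the people table is invented for illustration) in which two people share the same age, and compares the result with sort_values() followed by head(), which the pandas documentation describes as equivalent.

```python
import pandas as pd

# Hypothetical data: Ann and Bob are both 20, so 'Salary' decides between them
people = pd.DataFrame({
    'Name': ['Ann', 'Bob', 'Cara', 'Dan'],
    'Age': [20, 20, 19, 25],
    'Salary': [50000, 40000, 60000, 30000],
})

# Ranking by 'Age' alone: the tie is resolved by position (keep='first')
by_age = people.nsmallest(2, 'Age')

# Ranking by 'Age', then 'Salary': the tie is resolved by the lower salary
by_both = people.nsmallest(2, ['Age', 'Salary'])

# Equivalent formulation: sort ascending by both columns, then take the top n
same = people.sort_values(['Age', 'Salary']).head(2)

print(by_age['Name'].tolist())   # ['Cara', 'Ann']
print(by_both['Name'].tolist())  # ['Cara', 'Bob']
print(by_both.equals(same))      # True
```

Dan has the lowest salary overall but is not selected, because "Salary" is consulted only among rows that tie on "Age".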

Pandas offers the nsmallest() function as an easy way to get the smallest values from one or more columns in a DataFrame. Here are some further details:

  • By default, nsmallest() returns the rows in ascending order of the chosen column, smallest value first. The keep option does not change that order; it only decides which rows are kept when several share the same value at the cutoff. 'first' keeps the earliest occurrences, 'last' keeps the latest, and 'all' keeps every tied row, so the result may contain more than n rows.
  • You can supply a list of column names to the columns argument to rank by several columns. In this scenario, later columns act only as tie-breakers for earlier ones, and the result matches sorting the DataFrame by the supplied columns in ascending order and taking the first n rows.
  • The nsmallest() method works with numeric and datetime-like columns. For object (string) columns, pandas raises a TypeError rather than comparing the values lexicographically; df.sort_values(column).head(n) gives the lexicographic result instead.
  • Rows with missing or NaN values in the given column(s) are not treated as small values. They rank after every non-missing value, so they do not displace rows that hold real values.
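The last two points can be checked with a short sketch. It assumes the dtype restriction described in the pandas source and documentation as I understand it (a TypeError for string columns), and it uses an invented table with one missing salary. The try/except keeps the script running either way.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a text column and a numeric column with one missing value
df = pd.DataFrame({
    'Name': ['Dan', 'Bob', 'Cara', 'Ann'],
    'Salary': [50000, np.nan, 30000, 40000],
})

# Ranking by a text column: nsmallest is expected to reject the dtype
try:
    df.nsmallest(2, 'Name')
    raised = False
except TypeError:
    raised = True

# Fallback for text columns: sort lexicographically, then take the first n rows
by_name = df.sort_values('Name').head(2)

# Ranking by a numeric column containing NaN: the NaN row is not selected
lowest = df.nsmallest(2, 'Salary')

print(raised)                    # True on the pandas versions assumed here
print(by_name['Name'].tolist())  # ['Ann', 'Bob']
print(lowest['Name'].tolist())   # ['Cara', 'Ann']
```

If missing salaries should count as zero or be excluded entirely, that has to be done explicitly beforehand, for example with fillna() or dropna().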

Example:

import pandas as pd

# Create a sample DataFrame

data = {

    'Name': ['John', 'Emma', 'Ryan', 'Sophia', 'Michael'],

    'Age': [25, 30, 18, 21, 35],

    'Salary': [50000, 60000, 40000, 35000, 70000],

    'Department': ['HR', 'Marketing', 'Finance', 'Finance', 'IT']

}

df = pd.DataFrame(data)

# Get the 2 smallest values from the 'Salary' column, considering ties

n_smallest_values = df.nsmallest(2, 'Salary', keep='all')

# Get the 3 smallest values based on both 'Age' and 'Salary' columns

n_smallest_values_multiple = df.nsmallest(3, ['Age', 'Salary'])

# Print the results

print("n_smallest_values:\n", n_smallest_values)

print("\nn_smallest_values_multiple:\n", n_smallest_values_multiple)

Output:

n_smallest_values:

       Name Age Salary Department

3 Sophia  21   35000   Finance

2 Ryan     18   40000   Finance

n_smallest_values_multiple:

       Name Age Salary Department

2 Ryan     18   40000   Finance

3 Sophia  21   35000   Finance

0 John     25   50000   HR

This example adds a 'Department' column to the DataFrame. The first call uses nsmallest() with keep='all' to obtain the two smallest values from the "Salary" column. Sophia (35000) and Ryan (40000) have the two lowest salaries and no other row ties with them, so exactly two rows are returned. Had another employee also earned 40000, keep='all' would have included that row in the outcome as well.

The second call uses nsmallest() to retrieve the three rows with the lowest "Age" values, with "Salary" as the tie-breaker. The resultant DataFrame is sorted by 'Age' in ascending order first, followed by 'Salary' in ascending order.