Pandas cut() function
Pandas cut() function is used to segregate array elements into separate bins. The cut() function works only on one-dimensional array-like objects.
The cut() function is useful when we have a large number of scalar data and we want to perform some statistical analysis on it.
For example, let’s say we have an array of numbers between 1 and 20. We want to divide them into two bins of (1, 10] and (10, 20] and add labels such as “Lows” and “Highs”. We can easily perform this using the pandas cut() function.
Furthermore, once the elements are binned, we can apply functions to the elements of a specific bin or label.
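As a minimal sketch of what "applying functions to a bin" can look like, the snippet below groups the numbers by their label and aggregates each group. The sample values and the column name `level` are made up for illustration and are not taken from the article.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
df = pd.DataFrame({"num": [2, 5, 9, 12, 15, 20]})

# Two bins, (1, 10] and (10, 20], labelled "Lows" and "Highs".
df["level"] = pd.cut(df["num"], bins=[1, 10, 20], labels=["Lows", "Highs"])

# Aggregate the numbers separately for each label.
stats = df.groupby("level", observed=False)["num"].agg(["count", "mean"])
print(stats)

# Select only the rows that fall into one specific bin.
highs = df[df["level"] == "Highs"]
print(highs)
```

Passing `observed=False` explicitly keeps every label in the result, including labels that happen to have no rows.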
The cut() function syntax is:
    cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates="raise")
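A few of these parameters are easier to understand by trying them. The following is a small sketch on made-up values, not an exhaustive tour of the signature: it shows how `include_lowest` affects the first bin, and how an integer `bins` combined with `labels=False` and `retbins=True` returns bin indices together with the computed edges.

```python
import pandas as pd

values = [1, 7, 10, 15, 20]  # hypothetical data

# By default the left edge of the first bin is open, so 1 falls outside (1, 10].
default = pd.cut(values, bins=[1, 10, 20])
print(default.isna().tolist())

# include_lowest=True closes the first interval on the left, so 1 is binned.
closed = pd.cut(values, bins=[1, 10, 20], include_lowest=True)
print(closed.isna().tolist())

# An integer asks for that many equal-width bins; labels=False returns bin
# indices instead of intervals, and retbins=True also returns the edges.
codes, edges = pd.cut(values, bins=2, labels=False, retbins=True)
print(codes)
print(edges)
```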
Let’s look into some examples of the pandas cut() function. I will use NumPy to generate random numbers to populate the DataFrame object.
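    import pandas as pd
    import numpy as np

    # Ten random integers from 1 to 99 (the upper bound of randint is exclusive)
    df_nums = pd.DataFrame({'num': np.random.randint(1, 100, 10)})
    print(df_nums)

    # Assign each number to one of four bins
    df_nums['num_bins'] = pd.cut(x=df_nums['num'], bins=[1, 25, 50, 75, 100])
    print(df_nums)

    # Show the distinct bins that actually occur
    print(df_nums['num_bins'].unique())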
Output:
       num
    0   80
    1   40
    2   25
    3    9
    4   66
    5   13
    6   63
    7   33
    8   20
    9   60
       num   num_bins
    0   80  (75, 100]
    1   40   (25, 50]
    2   25    (1, 25]
    3    9    (1, 25]
    4   66   (50, 75]
    5   13    (1, 25]
    6   63   (50, 75]
    7   33   (25, 50]
    8   20    (1, 25]
    9   60   (50, 75]
    [(75, 100], (25, 50], (1, 25], (50, 75]]
    Categories (4, interval[int64]): [(1, 25] < (25, 50] < (50, 75] < (75, 100]]
Notice that 25 is part of the bin (1, 25]. That is because the right edge of each bin is included by default. If you don’t want that, pass right=False to the cut() function.
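To make the edge behaviour concrete, here is a small sketch that bins the same values twice, once with the default and once with right=False. The values are hypothetical and were picked to sit on or near the bin edges.

```python
import pandas as pd

nums = pd.Series([1, 25, 50, 99])  # hypothetical values on or near bin edges
edges = [1, 25, 50, 75, 100]

right_closed = pd.cut(nums, bins=edges)              # (1, 25], (25, 50], ...
left_closed = pd.cut(nums, bins=edges, right=False)  # [1, 25), [25, 50), ...

comparison = pd.DataFrame({
    "num": nums,
    "right=True": right_closed,
    "right=False": left_closed,
})
print(comparison)
```

With the default, the value 1 matches no bin and becomes NaN because the first interval is open on the left. With right=False it lands in [1, 25), and 25 moves to [25, 50).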
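    import pandas as pd
    import numpy as np

    # Ten random integers from 1 to 19 (the upper bound of randint is exclusive)
    df_nums = pd.DataFrame({'num': np.random.randint(1, 20, 10)})
    print(df_nums)

    # Label the bins [1, 10) and [10, 20) as 'Lows' and 'Highs'
    df_nums['nums_labels'] = pd.cut(x=df_nums['num'], bins=[1, 10, 20], labels=['Lows', 'Highs'], right=False)
    print(df_nums)

    # Show the distinct labels that actually occur
    print(df_nums['nums_labels'].unique())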
Since we want 10 to be part of Highs, we are specifying right=False in the cut() function call.
Output:
       num
    0    5
    1   16
    2    6
    3   13
    4    2
    5   10
    6   18
    7   10
    8    2
    9   18
       num nums_labels
    0    5        Lows
    1   16       Highs
    2    6        Lows
    3   13       Highs
    4    2        Lows
    5   10       Highs
    6   18       Highs
    7   10       Highs
    8    2        Lows
    9   18       Highs
    [Lows, Highs]
    Categories (2, object): [Lows < Highs]
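Once the values carry labels, a natural next step is to count how many fall under each one. The sketch below reuses the ten numbers from the output above so the result is reproducible. Using value_counts() here is one possible approach, not something the original example does.

```python
import pandas as pd

# The same ten numbers that appear in the output above.
nums = pd.Series([5, 16, 6, 13, 2, 10, 18, 10, 2, 18])
labels = pd.cut(nums, bins=[1, 10, 20], labels=["Lows", "Highs"], right=False)

# Count the elements per label; sort=False avoids reordering by frequency.
counts = labels.value_counts(sort=False)
print(counts)

# Labels produced by cut() are ordered by default (Lows < Highs),
# so min() and max() are defined on the result.
print(labels.min(), labels.max())
```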
Reposted from: http://qtlzd.baihongyu.com/