Pandas cut() Function Examples - 白红宇

Published: 2019-05-11

1. Pandas cut() Function

The pandas cut() function is used to segregate array elements into separate bins. It works only on one-dimensional array-like objects.

2. Usage of Pandas cut() Function

The cut() function is useful when we have a large amount of scalar data and want to perform some statistical analysis on it.

For example, let's say we have an array of numbers between 1 and 20. We want to divide them into two bins, (1, 10] and (10, 20], and add labels such as "Lows" and "Highs". We can easily do this with the pandas cut() function.
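As a minimal sketch of this idea (the variable names are illustrative and not from the original article), the snippet below bins the integers 1 through 20 with the edges [1, 10, 20]. With the default right=True the first bin is (1, 10], so the value 1 itself is left unbinned unless include_lowest=True is passed.

```python
import numpy as np
import pandas as pd

nums = np.arange(1, 21)  # the integers 1..20

# Default behaviour: bins are (1, 10] and (10, 20], so 1 is left unbinned (NaN).
default_bins = pd.cut(nums, bins=[1, 10, 20], labels=["Lows", "Highs"])

# include_lowest=True closes the first bin on the left, so 1 lands in "Lows".
closed_bins = pd.cut(
    nums, bins=[1, 10, 20], labels=["Lows", "Highs"], include_lowest=True
)

print(pd.Series(default_bins).value_counts(dropna=False))
print(pd.Series(closed_bins).value_counts())
```

Passing a NumPy array returns a Categorical, which is wrapped in a Series here only to count the members of each bin.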

Furthermore, we can apply functions to the elements that fall into a specific bin or carry a specific label.
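One way to act on the elements of a bin is to group by the column that cut() produced. The following is a sketch under assumed data: the column names num and level, the sample values, and the edges [0, 10, 20] are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"num": [3, 8, 12, 15, 19, 7]})
df["level"] = pd.cut(df["num"], bins=[0, 10, 20], labels=["Lows", "Highs"])

# Aggregate the values that fell into each labelled bin.
# observed=False keeps every category in the result, even an empty one.
summary = df.groupby("level", observed=False)["num"].agg(["count", "mean"])
print(summary)

# Select only the rows that belong to one specific bin.
highs = df[df["level"] == "Highs"]
print(highs)
```

Because the result of cut() is categorical, the groups follow the bin order (Lows, then Highs) rather than alphabetical order.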

3. Pandas cut() Function Syntax

The cut() function syntax is:

    cut(
        x,
        bins,
        right=True,
        labels=None,
        retbins=False,
        precision=3,
        include_lowest=False,
        duplicates="raise",
    )

x is the input array to be binned. It must be one-dimensional.

bins defines the bin edges for the segmentation.

right indicates whether each bin includes its right edge. The default value is True, so the edges [1, 10, 20] produce the bins (1, 10] and (10, 20].

labels is used to specify the labels for the returned bins.

retbins specifies whether to return the bin edges along with the binned result.

precision specifies the precision at which to store and display the bin labels.

include_lowest specifies whether the first interval should be left-inclusive or not.

duplicates specifies what to do if the bin edges are not unique: raise a ValueError ("raise", the default) or drop the non-unique edges ("drop").
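The effect of several of these parameters is easiest to see side by side. The snippet below is a small sketch on made-up values (the names values, binned, edges, lowest and dropped are illustrative) that exercises retbins, include_lowest and duplicates.

```python
import pandas as pd

values = pd.Series([1, 5, 10, 15, 20])

# retbins=True returns a tuple: the binned result and the array of bin edges.
binned, edges = pd.cut(values, bins=[1, 10, 20], retbins=True)
print(binned.tolist())  # the value 1 is NaN because the first bin is (1, 10]
print(edges)

# include_lowest=True makes the first interval left-inclusive, so 1 is binned too.
lowest = pd.cut(values, bins=[1, 10, 20], include_lowest=True)
print(lowest.tolist())

# duplicates="drop" discards the repeated edge instead of raising ValueError.
dropped = pd.cut(values, bins=[1, 10, 10, 20], duplicates="drop")
print(dropped.tolist())
```

With the default duplicates="raise", the last call would fail because the edge 10 appears twice.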

4. Pandas cut() Function Examples

Let's look at some examples of the pandas cut() function. I will use NumPy to generate random numbers to populate the DataFrame object.

4.1) Segment Numbers into Bins

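    import pandas as pd
    import numpy as np

    # Ten random integers in the range [1, 100)
    df_nums = pd.DataFrame({'num': np.random.randint(1, 100, 10)})
    print(df_nums)

    # Assign each number to one of the bins (1, 25], (25, 50], (50, 75], (75, 100]
    df_nums['num_bins'] = pd.cut(x=df_nums['num'], bins=[1, 25, 50, 75, 100])
    print(df_nums)
    print(df_nums['num_bins'].unique())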

Output:

       num
    0   80
    1   40
    2   25
    3    9
    4   66
    5   13
    6   63
    7   33
    8   20
    9   60
       num   num_bins
    0   80  (75, 100]
    1   40   (25, 50]
    2   25    (1, 25]
    3    9    (1, 25]
    4   66   (50, 75]
    5   13    (1, 25]
    6   63   (50, 75]
    7   33   (25, 50]
    8   20    (1, 25]
    9   60   (50, 75]
    [(75, 100], (25, 50], (1, 25], (50, 75]]
    Categories (4, interval[int64]): [(1, 25] < (25, 50] < (50, 75] < (75, 100]]

Notice that 25 is part of the bin (1, 25]. That is because the right edge of each bin is included by default. If you don't want that, pass right=False to the cut() function.
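To make the difference concrete, here is a short sketch that bins the same fixed values twice, once with the default and once with right=False. The sample values and the names right_closed and left_closed are chosen for illustration.

```python
import pandas as pd

nums = pd.Series([9, 25, 50, 99])
edges = [1, 25, 50, 75, 100]

right_closed = pd.cut(nums, bins=edges)              # (1, 25], (25, 50], ...
left_closed = pd.cut(nums, bins=edges, right=False)  # [1, 25), [25, 50), ...

# Show both results next to the original numbers.
comparison = pd.DataFrame(
    {"num": nums, "right=True": right_closed, "right=False": left_closed}
)
print(comparison)
```

With right=False the boundary values 25 and 50 move up into the next bin, because each bin now includes its left edge instead of its right edge.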

4.2) Adding Labels to Bins

    import pandas as pd
    import numpy as np

    # Ten random integers in the range [1, 20)
    df_nums = pd.DataFrame({'num': np.random.randint(1, 20, 10)})
    print(df_nums)

    # right=False gives the bins [1, 10) and [10, 20), labelled Lows and Highs
    df_nums['nums_labels'] = pd.cut(x=df_nums['num'], bins=[1, 10, 20], labels=['Lows', 'Highs'], right=False)
    print(df_nums)
    print(df_nums['nums_labels'].unique())

Since we want 10 to be part of Highs, we specify right=False in the cut() function call, which makes the bins [1, 10) and [10, 20).

Output:

       num
    0    5
    1   16
    2    6
    3   13
    4    2
    5   10
    6   18
    7   10
    8    2
    9   18
       num nums_labels
    0    5        Lows
    1   16       Highs
    2    6        Lows
    3   13       Highs
    4    2        Lows
    5   10       Highs
    6   18       Highs
    7   10       Highs
    8    2        Lows
    9   18       Highs
    [Lows, Highs]
    Categories (2, object): [Lows < Highs]

Reposted from: http://qtlzd.baihongyu.com/
