Handling By Groups
If By groups are specified when running a CAS action, the results are returned with the following behaviors.
- A result key named ‘ByGroupInfo’ is returned with all of the By group variable values.
- Each By group table is returned in a separate result key with the prefix ‘ByGroup#.’, where # is the By group number (for example, ‘ByGroup1.Summary’).
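To illustrate the key layout described above, the sketch below mimics a By group result with a plain Python dict rather than a live CASResults object, so it runs without a CAS server. The key names follow the ‘ByGroupInfo’ / ‘ByGroup#.’ convention; the table name Summary, the placeholder values, and the helper iter_bygroup_tables are hypothetical and used only for illustration.

```python
from collections import OrderedDict

# Hypothetical stand-in for a By group result. Each "table" is a list of
# row dicts instead of a pandas.DataFrame; all values are placeholders,
# not real CAS output.
results = OrderedDict([
    ('ByGroupInfo', [
        {'Origin': 'Asia', 'Cylinders': 4},
        {'Origin': 'Asia', 'Cylinders': 6},
    ]),
    ('ByGroup1.Summary', [{'Column': 'MSRP', 'Min': 1.0, 'Max': 2.0}]),
    ('ByGroup2.Summary', [{'Column': 'MSRP', 'Min': 3.0, 'Max': 4.0}]),
])


def iter_bygroup_tables(results, name):
    """Yield (group_number, table) for each 'ByGroup#.<name>' key, in key order."""
    for key, table in results.items():
        # 'ByGroup1.Summary' -> ('ByGroup1', '.', 'Summary');
        # 'ByGroupInfo' has no '.', so sep is empty and it is skipped.
        prefix, sep, table_name = key.partition('.')
        number = prefix[len('ByGroup'):]
        if sep and table_name == name and prefix.startswith('ByGroup') and number.isdigit():
            yield int(number), table


# Handle one By group table at a time instead of collecting them all first.
for number, table in iter_bygroup_tables(results, 'Summary'):
    print(number, table)
```

Because the loop consumes one table per iteration, the same pattern applies when tables are handled as they arrive rather than after the whole result is assembled.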
These behaviors can help when you have a large number of By groups and you want to process them as they arrive at the client rather than trying to hold the entire set of results in memory. However, when your result sets are smaller, you may want to combine all of the By group tables into a single pandas.DataFrame. To help in these situations, the CASResults class defines some helper methods for you.
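As a rough sketch of what combining By group tables involves, the example below merges every ‘ByGroup#.Summary’ table into one list of rows and prepends the By variable values to each row. It uses only the standard library and a hypothetical dict with placeholder values. It also assumes that the N-th row of ‘ByGroupInfo’ holds the By values for ‘ByGroupN’. That positional mapping is an assumption made for this illustration, and concat_bygroup_rows is a hypothetical helper, not part of the CASResults API.

```python
from collections import OrderedDict

# Hypothetical By group result with placeholder values (not real CAS output).
results = OrderedDict([
    ('ByGroupInfo', [
        {'Origin': 'Asia', 'Cylinders': 4},
        {'Origin': 'Asia', 'Cylinders': 6},
    ]),
    ('ByGroup1.Summary', [{'Column': 'MSRP', 'Min': 1.0, 'Max': 2.0}]),
    ('ByGroup2.Summary', [{'Column': 'MSRP', 'Min': 3.0, 'Max': 4.0}]),
])


def concat_bygroup_rows(results, name):
    """Combine all 'ByGroup#.<name>' tables into a single list of rows.

    Assumption: row N-1 of ByGroupInfo holds the By variable values for
    ByGroupN. Those values are placed before the table's own columns.
    """
    info = results['ByGroupInfo']
    combined = []
    for key, table in results.items():
        prefix, sep, table_name = key.partition('.')
        number = prefix[len('ByGroup'):]
        if not (sep and table_name == name and prefix.startswith('ByGroup') and number.isdigit()):
            continue  # skip ByGroupInfo and tables with other names
        by_values = info[int(number) - 1]
        for row in table:
            merged = dict(by_values)  # By variable columns first
            merged.update(row)        # then the table's own columns
            combined.append(merged)
    return combined


rows = concat_bygroup_rows(results, 'Summary')
```

With pandas installed, passing rows to pandas.DataFrame gives one frame with Origin and Cylinders as leading columns. The CASResults helper methods are intended to spare you this manual bookkeeping; the sketch only shows the idea.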
Here is the output of a standard summary action, followed by a summary action with By groups specified. We use this output to demonstrate the By group processing methods.
In [1]: tbl = tbl[['MSRP', 'Horsepower']]
In [2]: tbl.summary(subset=['Min', 'Max'])
Out[2]:
[Summary]
Descriptive Statistics for TMP_X94FHXF
       Column      Min       Max
0        MSRP  10280.0  192465.0
1  Horsepower     73.0     500.0
+ Elapsed: 0.0159s, user: 0.012s, sys: 0.007s, mem: 5.41mb
In [3]: tbl.groupby(['Origin', 'Cylinders']).summary(subset=['Min', 'Max'])