swat.cas.table.CASTable

class swat.cas.table.CASTable(name, **table_params)

Bases: swat.cas.utils.params.ParamManager, swat.cas.utils.params.ActionParamManager

Object for interacting with CAS tables

CASTable objects can be used in several ways. They can serve simply as a container of table parameters that is passed as a CAS action parameter value. If a connection is associated with the object (either by instantiating it from CAS.CASTable() or by calling set_connection()), it can be used to call CAS actions on the table. Finally, CASTable supports much of the pandas.DataFrame API, so you can work with CAS tables in much the same way you work with local data.

The parameters below are a superset of all of the available parameters. Some CAS actions may not support all of them. See the help for each CAS action to find out which parameters it supports.

Parameters:
name : string or CASTable

specifies the name of the table to use.

caslib : string, optional

specifies the caslib containing the table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.

where : string, optional

specifies an expression for subsetting the input data.

groupby : list of dicts, optional

specifies the names of the variables to use for grouping results.

groupbyfmts : list, optional

specifies the format to apply to each group-by variable. To avoid specifying a format for a group-by variable, use "" (no format). Default: []

orderby : list of dicts, optional

specifies the variables to use for ordering observations within partitions. This parameter applies to partitioned tables, or it can be combined with groupby variables when groupbymode is set to REDISTRIBUTE.

computedvars : list of dicts, optional

specifies the names of the computed variables to create. Specify an expression for each variable in the computedvarsprogram parameter.

computedvarsprogram : string, optional

specifies an expression for each variable that you included in the computedvars parameter.

groupbymode : string, optional

specifies how the server creates groups. Default: NOSORT Values: NOSORT, REDISTRIBUTE

computedondemand : boolean, optional

when set to True, the computed variables specified in the computedvars parameter are created when the table is loaded instead of when the action begins. Default: False

singlepass : boolean, optional

when set to True, the data does not create a transient table in the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs. Default: False

importoptions : dict, optional

specifies the settings for reading a table from a data source.

ondemand : boolean, optional

when set to True, table access is less aggressive with virtual memory use. Default: True

vars : list of dicts, optional

specifies the variables to use in the action.

timestamp : string, optional

specifies the timestamp to apply to the table. Specify the value in the form that is appropriate for your session locale. Used only on output table definitions.

compress : boolean, optional

when set to True, data compression is applied to the table. Used only on output table definitions. Default: False

replace : boolean, optional

specifies whether to overwrite an existing table with the same name. Used only on output table definitions. Default: False

replication : int32, optional

specifies the number of copies of the table to make for fault tolerance. Larger values result in slower performance and use more memory, but provide high availability for data in the event of a node failure. Used only on output table definitions. Default: 1 Note: Value range is 0 <= n < 2147483647

threadblocksize : int64, optional

specifies the number of bytes to use for blocks that are read by threads. Increase this value only if you have a large table and CPU utilization by threads shows thread starvation. Used only on output table definitions. Note: Value range is 0 <= n < 9223372036854775807

label : string, optional

specifies the descriptive label to associate with the table.

maxmemsize : int64, optional

specifies the maximum amount of physical memory, in bytes, to allocate for the table. After this threshold is reached, the server uses temporary files and operating system facilities for memory management. Used only on output table definitions. Default: 0

promote : boolean, optional

when set to True, the output table is added with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope. Used only on output table definitions. Default: False

ondemand : boolean, optional

when set to True, table access is less aggressive with virtual memory use. Used only on output table definitions. Default: True
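The input-table parameters above can be pictured as a plain dictionary. The following is a minimal sketch, not part of SWAT: `build_table_params` and its `computed` argument are hypothetical names, and the `variable = expression;` form used for computedvarsprogram is an assumption about the expression syntax. It only shows how where, groupby, computedvars, and computedvarsprogram fit together.

```python
def build_table_params(name, caslib=None, where=None, groupby=None,
                       computed=None):
    """Assemble a plain dict of input-table parameters (hypothetical helper).

    `computed` maps each new variable name to the expression that defines it.
    """
    params = {'name': name}
    if caslib is not None:
        params['caslib'] = caslib
    if where is not None:
        params['where'] = where
    if groupby:
        # groupby is documented as a list of dicts, one per variable
        params['groupby'] = [{'name': var} for var in groupby]
    if computed:
        # computedvars names the variables; computedvarsprogram defines them
        params['computedvars'] = [{'name': var} for var in computed]
        # Assumed syntax: one "variable = expression;" statement per variable
        params['computedvarsprogram'] = ' '.join(
            '{} = {};'.format(var, expr) for var, expr in computed.items())
    return params


iris_params = build_table_params(
    'iris',
    where='petal_length > 1.5',
    groupby=['species'],
    computed={'petal_area': 'petal_length * petal_width'},
)
print(iris_params)
```

With a live connection, the same keys would be given as keyword arguments, following the CASTable(name, **table_params) signature shown at the top of this page.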

Returns:
CASTable

Examples

Create a CASTable registered to conn.

>>> conn = swat.CAS()
>>> iris = conn.CASTable('iris')

Use the table as a CAS action parameter.

>>> summ = conn.summary(table=iris)
>>> print(summ)

Call a CAS action directly on the CASTable.

>>> summ = iris.summary()
>>> print(summ)

Use a CASTable as an output table definition.

>>> summout = conn.summary(table=iris,
...                        casout=swat.CASTable('summout', replace=True))
>>> print(summout)
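
Several parameters in the list above are marked "Used only on output table definitions". The sketch below illustrates that split with plain dictionaries. It is an illustration only, not how SWAT implements to_table_params() or to_outtable_params(): `split_table_params`, `OUTPUT_ONLY`, and `SHARED` are hypothetical names, 'casuser' is just an example caslib value, and label and ondemand are left out because the reference does not mark label and lists ondemand in both forms.

```python
# Parameters the reference marks "Used only on output table definitions".
OUTPUT_ONLY = frozenset(['timestamp', 'compress', 'replace', 'replication',
                         'threadblocksize', 'maxmemsize', 'promote'])
# Assumption: name and caslib identify the table in both roles.
SHARED = frozenset(['name', 'caslib'])


def split_table_params(params):
    """Return (input_params, output_params) from one mixed dict."""
    # Input side: everything that is not marked output-only
    input_params = {key: value for key, value in params.items()
                    if key not in OUTPUT_ONLY}
    # Output side: the output-only keys plus the shared identifiers
    output_params = {key: value for key, value in params.items()
                     if key in OUTPUT_ONLY or key in SHARED}
    return input_params, output_params


mixed = {'name': 'summout', 'caslib': 'casuser', 'where': 'n > 0',
         'replace': True, 'promote': False}
input_params, output_params = split_table_params(mixed)
print(input_params)
print(output_params)
```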

Use a CASTable like a pandas.DataFrame.

>>> print(iris.head())
>>> print(iris[['petal_length', 'petal_width']].describe())

__init__(name, **table_params)

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(name, **table_params) Initialize self.
abs() Return a new CASTable with absolute values of numerics
all([axis, bool_only, skipna, level]) Return whether all elements in the column are True
any([axis, bool_only, skipna, level]) Return whether any elements in the column are True
append(other[, ignore_index, …]) Append rows of other to self
append_columns(*items, **kwargs) Append variable names to action inputs parameter
append_computed_columns(names, code[, inplace]) Append computed columns as specified
append_computedvars(*items, **kwargs) Append variable names to tbl.computedvars parameter
append_computedvarsprogram(*items, **kwargs) Append code to tbl.computedvarsprogram parameter
append_groupby(*items, **kwargs) Append variable names to tbl.groupby parameter
append_orderby(*items, **kwargs) Append orderby parameters
append_where(*items, **kwargs) Append code to where parameter
as_matrix([columns, n]) Convert the CASTable to its Numpy-array representation
boxplot([column, by]) Make a boxplot from the table data
clip([lower, upper, axis]) Clip values at thresholds
clip_lower(threshold[, axis]) Clip values at lower threshold
clip_upper(threshold[, axis]) Clip values at upper threshold
copy([deep, exclude]) Make a copy of the CASTable object
corr([method, min_periods]) Compute pairwise correlation of columns
count([axis, level, numeric_only]) Return total number of non-missing values in each column
css([casout]) Return the corrected sum of squares of the values of each column
cv([casout]) Return the coefficient of variation of the values of each column
datastep(code[, casout]) Execute Data step code against the CAS table
del_action_params(*names) Delete parameters for specified action names
del_param(*keys) Delete parameters
del_params(*keys) Delete parameters
describe([percentiles, include, exclude, stats]) Get descriptive statistics
drop(labels[, axis, level, inplace, errors]) Return a new CASTable object with the specified columns removed
dropna([axis, how, thresh, subset, inplace]) Drop rows that contain missing values
eval(expr[, inplace, kwargs]) Evaluate a CAS table expression
fillna([value, method, axis, inplace, …]) Fill missing values using the specified method
from_csv(connection, path[, casout]) Create a CASTable from a CSV file
from_dict(connection, data[, casout]) Create a CASTable from a dictionary
from_items(connection, items[, casout]) Create a CASTable from (key, value) pairs
from_records(connection, data[, casout]) Create a CASTable from records
get_action_names() Return a list of available CAS actions
get_action_params(name, *default) Return parameters for specified action name
get_actionset_names() Return a list of available actionsets
get_connection() Get the registered connection object
get_dtype_counts() Retrieve the frequency of CAS table column data types
get_fetch_params() Return options to be used during the table.fetch action
get_ftype_counts() Retrieve the frequency of CAS table column data types
get_groupby_vars() Return a list of By group variable names
get_inputs_param() Return the column names for the inputs= action parameter
get_param(key, *default) Return the value of a parameter
get_params(*keys) Return the values of one or more parameters
get_value(index, col, **kwargs) Retrieve a single scalar value
groupby(by[, axis, level, as_index, sort, …]) Specify grouping variables for the table
has_groupby_vars() Does the table have By group variables configured?
has_param(*keys) Return a boolean indicating whether or not the parameters exist
has_params(*keys) Return a boolean indicating whether or not the parameters exist
head([n, columns, bygroup_as_index, casout]) Retrieve first n rows
hist([column, by]) Make a histogram from the table data
info([verbose, buf, max_cols, memory_usage, …]) Print summary of CASTable information
invoke(_name_, **kwargs) Invoke an action on the registered connection
iteritems() Iterate over column names and CASColumn objects
iterrows([chunksize]) Iterate over the rows of a CAS table as (index, pandas.Series) pairs
itertuples([index, chunksize]) Iterate over rows as tuples
kurt([axis, skipna, level, numeric_only, casout]) Return the kurtosis of the values of each column
kurtosis([axis, skipna, level, …]) Return the kurtosis of the values of each column
lookup(row_labels, col_labels) Retrieve values indicated by row_labels, col_labels positions
max([axis, skipna, level, numeric_only, casout]) Return the maximum value of each column
mean([axis, skipna, level, numeric_only, casout]) Return the mean value of each column
median([axis, skipna, level, numeric_only, …]) Return the median value of each numeric column
merge(right[, how, on, left_on, right_on, …]) Merge CASTable objects using a database-style join on a column
min([axis, skipna, level, numeric_only, casout]) Return the minimum value of each column
mode([axis, numeric_only, max_tie, skipna]) Return the mode of each column
next() Return next item in the iteration
nlargest(n, columns[, keep, casout]) Return the n largest values ordered by columns
nmiss([axis, level, numeric_only, casout]) Return total number of missing values in each column
nsmallest(n, columns[, keep, casout]) Return the n smallest values ordered by columns
nth(n[, dropna, bygroup_as_index, casout]) Return the nth row
pop(colname) Remove a column from the CASTable and return it
probt([casout]) Return the p-value of the T-statistics of the values of each column
quantile([q, axis, numeric_only, …]) Return values at the given quantile
query(expr[, inplace, engine]) Query the table with a boolean expression
replace([to_replace, value, inplace, limit, …]) Replace values in the data set
reset_index([level, drop, inplace, …]) Reset the CASTable index
retrieve(_name_, **kwargs) Invoke an action on the registered connection and retrieve results
sample([n, frac, replace, weights, …]) Returns a random sample of the CAS table rows
select_dtypes([include, exclude, inplace]) Return a subset CASTable including/excluding columns based on data type
set_action_params(name, **kwargs) Set parameters for specified action name
set_connection(connection) Set the connection to use for action calls
set_param(*args, **kwargs) Set parameters according to key/value pairs
set_params(*args, **kwargs) Set parameters according to key/value pairs
skew([axis, skipna, level, numeric_only, casout]) Return the skewness of the values of each column
skewness([axis, skipna, level, …]) Return the skewness of the values of each column
slice([start, stop, columns, …]) Retrieve the specified rows
sort(by[, axis, ascending, inplace, kind, …]) Specify sort parameters for data in a CAS table
sort_values(by[, axis, ascending, inplace, …]) Specify sort parameters for data in a CAS table
std([axis, skipna, level, ddof, …]) Return the standard deviation of the values of each column
stderr([casout]) Return the standard error of the values of each column
sum([axis, skipna, level, numeric_only, casout]) Return the sum of the values of each column
tail([n, columns, bygroup_as_index, casout]) Retrieve last n rows
to_clipboard(*args, **kwargs) Write the CAS table data to the clipboard
to_csv(*args, **kwargs) Write CAS table data to comma separated values (CSV)
to_datastep_params() Create a data step table specification
to_dense(*args, **kwargs) Return dense representation of CAS table data
to_dict(*args, **kwargs) Convert CAS table data to a Python dictionary
to_excel(*args, **kwargs) Write CAS table data to an Excel spreadsheet
to_frame([sample_pct, sample_seed, sample, …]) Retrieve entire table as a SASDataFrame
to_gbq(*args, **kwargs) Write CAS table data to a Google BigQuery table
to_hdf(*args, **kwargs) Write CAS table data to HDF
to_html(*args, **kwargs) Render the CAS table data to an HTML table
to_json(*args, **kwargs) Convert the CAS table data to a JSON string
to_latex(*args, **kwargs) Render the CAS table data to a LaTeX tabular environment
to_msgpack(*args, **kwargs) Write CAS table data to msgpack object
to_outtable() Create a copy of the CASTable object with only output table parameters
to_outtable_params() Create a copy of the CASTable parameters using only the output table parameters
to_params() Return the parameters as a dictionary
to_pickle(*args, **kwargs) Pickle (serialize) the CAS table data
to_records(*args, **kwargs) Convert CAS table data to record array
to_sparse(*args, **kwargs) Convert CAS table data to SparseDataFrame
to_sql(*args, **kwargs) Write CAS table records to SQL database
to_stata(*args, **kwargs) Write CAS table data to Stata file
to_string(*args, **kwargs) Render the CAS table to a console-friendly tabular output
to_table() Create a copy of the CASTable object with only input table parameters
to_table_name() Return the name of the table
to_table_params() Create a copy of the table parameters containing only input table parameters
to_view(*args, **kwargs) Create a view using the current CASTable parameters
to_xarray(*args, **kwargs) Return an xarray object from the CAS table data
tvalue([casout]) Return the T-statistics for hypothesis testing of the values of each column
uss([casout]) Return the uncorrected sum of squares of the values of each column
var([axis, skipna, level, ddof, …]) Return the variance of the values of each column
xs(key[, axis, level, copy, drop_level]) Return a cross-section from the CASTable