swat.cas.table.CASTable
class swat.cas.table.CASTable(name, **table_params)
Bases: swat.cas.utils.params.ParamManager, swat.cas.utils.params.ActionParamManager
Object for interacting with CAS tables
CASTable objects can be used in several ways. At their simplest, they are containers of table parameters that can be passed as CAS action parameter values. If a connection is associated with a CASTable (either because it was created with CAS.CASTable() or because set_connection() was called), it can also be used to call CAS actions on the table. Finally, it supports much of the pandas.DataFrame API, so you can interact with CAS tables in much the same way you interact with local data.
The parameters below are a superset of all available parameters. Some CAS actions may not support all of them; see the help for each CAS action to find out which parameters it supports.
Parameters:
- name : string or CASTable
specifies the name of the table to use.
- caslib : string, optional
specifies the caslib containing the table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.
- where : string, optional
specifies an expression for subsetting the input data.
- groupby : list of dicts, optional
specifies the names of the variables to use for grouping results.
- groupbyfmts : list, optional
specifies the format to apply to each group-by variable. To avoid specifying a format for a group-by variable, use “” (no format). Default: []
- orderby : list of dicts, optional
specifies the variables to use for ordering observations within partitions. This parameter applies to partitioned tables, or it can be combined with groupby variables when groupbymode is set to REDISTRIBUTE.
- computedvars : list of dicts, optional
specifies the names of the computed variables to create. Specify an expression for each variable in the computedvarsprogram parameter.
- computedvarsprogram : string, optional
specifies an expression for each variable that you included in the computedvars parameter.
- groupbymode : string, optional
specifies how the server creates groups. Default: NOSORT Values: NOSORT, REDISTRIBUTE
- computedondemand : boolean, optional
when set to True, the computed variables specified in the computedvars parameter are created when the table is loaded instead of when the action begins. Default: False
- singlepass : boolean, optional
when set to True, the server does not create a transient table for the data. Setting this parameter to True can be efficient, but the data might not have stable ordering across repeated runs. Default: False
- importoptions : dict, optional
specifies the settings for reading a table from a data source.
- ondemand : boolean, optional
when set to True, table access is less aggressive with virtual memory use. Default: True
- vars : list of dicts, optional
specifies the variables to use in the action.
- timestamp : string, optional
specifies the timestamp to apply to the table. Specify the value in the form that is appropriate for your session locale. Used only on output table definitions.
- compress : boolean, optional
when set to True, data compression is applied to the table. Used only on output table definitions. Default: False
- replace : boolean, optional
specifies whether to overwrite an existing table with the same name. Used only on output table definitions. Default: False
- replication : int32, optional
specifies the number of copies of the table to make for fault tolerance. Larger values result in slower performance and use more memory, but provide high availability for data in the event of a node failure. Used only on output table definitions. Default: 1 Note: Value range is 0 <= n < 2147483647
- threadblocksize : int64, optional
specifies the number of bytes to use for blocks that are read by threads. Increase this value only if you have a large table and CPU utilization by threads shows thread starvation. Used only on output table definitions. Note: Value range is 0 <= n < 9223372036854775807
- label : string, optional
specifies the descriptive label to associate with the table.
- maxmemsize : int64, optional
specifies the maximum amount of physical memory, in bytes, to allocate for the table. After this threshold is reached, the server uses temporary files and operating system facilities for memory management. Used only on output table definitions. Default: 0
- promote : boolean, optional
when set to True, the output table is added with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope. Used only on output table definitions. Default: False
- ondemand : boolean, optional
when set to True, table access is less aggressive with virtual memory use. Used only on output table definitions. Default: True
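The groupby and groupbyfmts descriptions above read as parallel lists: one format per group-by variable, in the same order, with "" meaning no format. The sketch below builds both values as plain Python data so that alignment is explicit. It assumes each group-by variable is given as a dict with a "name" key; make_groupby_params is a hypothetical helper, not part of swat, and "BEST8." is only an illustrative SAS format.

```python
# Hypothetical helper (not part of swat): build aligned 'groupby' and
# 'groupbyfmts' values. Assumption: each group-by variable is a dict
# with a 'name' key.
def make_groupby_params(names, formats=None):
    """Return a dict with one groupbyfmts entry per group-by variable."""
    formats = dict(formats or {})
    # Reject formats for variables that are not group-by variables.
    unknown = sorted(set(formats) - set(names))
    if unknown:
        raise ValueError(f"formats given for non-group-by variables: {unknown}")
    return {
        "groupby": [{"name": name} for name in names],
        # Same order as groupby; "" means "apply no format".
        "groupbyfmts": [formats.get(name, "") for name in names],
    }


params = make_groupby_params(["species", "petal_width"], {"petal_width": "BEST8."})
print(params)
```

If the assumed dict shape matches what your CAS actions expect, the result could be unpacked into the constructor shown above, for example swat.CASTable('iris', **params).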
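The computedvars and computedvarsprogram parameters work as a pair: the first names the new variables, the second supplies an expression for each of them. The sketch below derives both from a single mapping so the names and expressions cannot drift apart. It assumes the program is written as DATA step style assignment statements, each ending in a semicolon; make_computed_params is a hypothetical helper, not part of swat, and the iris column names are only illustrative.

```python
# Hypothetical helper (not part of swat): derive 'computedvars' and
# 'computedvarsprogram' from {new_column_name: expression}.
# Assumption: the program is a sequence of "name = expr;" statements.
def make_computed_params(exprs):
    """Return matching computedvars and computedvarsprogram values."""
    program = " ".join(f"{name} = {expr};" for name, expr in exprs.items())
    return {
        # dicts keep insertion order, so names match the program order
        "computedvars": [{"name": name} for name in exprs],
        "computedvarsprogram": program,
    }


params = make_computed_params({
    "petal_ratio": "petal_length / petal_width",
    "sepal_area": "sepal_length * sepal_width",
})
print(params["computedvarsprogram"])
# petal_ratio = petal_length / petal_width; sepal_area = sepal_length * sepal_width;
```

Generating both values from one mapping is a design choice for the sketch: it guarantees that every name in computedvars has exactly one expression in computedvarsprogram.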
Examples
Create a CASTable registered to conn.
>>> import swat
>>> conn = swat.CAS()
>>> iris = conn.CASTable('iris')
Use the table as a CAS action parameter.
>>> summ = conn.summary(table=iris)
>>> print(summ)
Call a CAS action directly on the CASTable.
>>> summ = iris.summary()
>>> print(summ)
Use a CASTable as an output table definition.
>>> summout = conn.summary(table=iris,
...                        casout=swat.CASTable('summout', replace=True))
>>> print(summout)
Use a CASTable like a pandas.DataFrame.
>>> print(iris.head())
>>> print(iris[['petal_length', 'petal_width']].describe())
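Because a CASTable can act both as an input table and as an output (casout) table definition, it can help to see which of its parameters belong to which role. The sketch below partitions a plain parameter dict using the "Used only on output table definitions" notes from the parameter list on this page. split_table_params is hypothetical and is not how swat's to_table_params() or to_outtable_params() are implemented; treating name and caslib as shared by both roles, and matching keys case-insensitively, are assumptions.

```python
# Output-only names taken from the "Used only on output table definitions"
# notes in the parameter list on this page (not an exhaustive CAS list).
OUTPUT_ONLY = {"timestamp", "compress", "replace", "replication",
               "threadblocksize", "maxmemsize", "promote"}
# Assumption: these identify the table in both roles.
SHARED = {"name", "caslib"}


def split_table_params(params):
    """Hypothetical helper: split one parameter dict into
    (input-table params, output-table params)."""
    inputs, outputs = {}, {}
    for key, value in params.items():
        key_l = key.lower()  # assumption: names compared case-insensitively
        if key_l in SHARED:
            inputs[key] = value
            outputs[key] = value
        elif key_l in OUTPUT_ONLY:
            outputs[key] = value
        else:
            inputs[key] = value
    return inputs, outputs


tbl = {"name": "iris", "caslib": "casuser", "where": "species = 'setosa'",
       "replace": True, "promote": False}
inp, out = split_table_params(tbl)
print(inp)
print(out)
```

Here where ends up only in the input dict, while replace and promote end up only in the output dict.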
__init__(name, **table_params)
Initialize self. See help(type(self)) for accurate signature.
Methods
- __init__(name, **table_params) : Initialize self.
- abs() : Return a new CASTable with absolute values of numerics
- all([axis, bool_only, skipna, level]) : Return whether all elements in the column are True
- any([axis, bool_only, skipna, level]) : Return whether any elements in the column are True
- append(other[, ignore_index, …]) : Append rows of other to self
- append_columns(*items, **kwargs) : Append variable names to action inputs parameter
- append_computed_columns(names, code[, inplace]) : Append computed columns as specified
- append_computedvars(*items, **kwargs) : Append variable names to tbl.computedvars parameter
- append_computedvarsprogram(*items, **kwargs) : Append code to tbl.computedvarsprogram parameter
- append_groupby(*items, **kwargs) : Append variable names to tbl.groupby parameter
- append_orderby(*items, **kwargs) : Append orderby parameters
- append_where(*items, **kwargs) : Append code to where parameter
- as_matrix([columns, n]) : Convert the CASTable to its NumPy-array representation
- boxplot([column, by]) : Make a boxplot from the table data
- clip([lower, upper, axis]) : Clip values at thresholds
- clip_lower(threshold[, axis]) : Clip values at lower threshold
- clip_upper(threshold[, axis]) : Clip values at upper threshold
- copy([deep, exclude]) : Make a copy of the CASTable object
- corr([method, min_periods]) : Compute pairwise correlation of columns
- count([axis, level, numeric_only]) : Return total number of non-missing values in each column
- css([casout]) : Return the corrected sum of squares of the values of each column
- cv([casout]) : Return the coefficient of variation of the values of each column
- datastep(code[, casout]) : Execute DATA step code against the CAS table
- del_action_params(*names) : Delete parameters for specified action names
- del_param(*keys) : Delete parameters
- del_params(*keys) : Delete parameters
- describe([percentiles, include, exclude, stats]) : Get descriptive statistics
- drop(labels[, axis, level, inplace, errors]) : Return a new CASTable object with the specified columns removed
- dropna([axis, how, thresh, subset, inplace]) : Drop rows that contain missing values
- eval(expr[, inplace, kwargs]) : Evaluate a CAS table expression
- fillna([value, method, axis, inplace, …]) : Fill missing values using the specified method
- from_csv(connection, path[, casout]) : Create a CASTable from a CSV file
- from_dict(connection, data[, casout]) : Create a CASTable from a dictionary
- from_items(connection, items[, casout]) : Create a CASTable from (key, value) pairs
- from_records(connection, data[, casout]) : Create a CASTable from records
- get_action_names() : Return a list of available CAS actions
- get_action_params(name, *default) : Return parameters for specified action name
- get_actionset_names() : Return a list of available actionsets
- get_connection() : Get the registered connection object
- get_dtype_counts() : Retrieve the frequency of CAS table column data types
- get_fetch_params() : Return options to be used during the table.fetch action
- get_ftype_counts() : Retrieve the frequency of CAS table column data types
- get_groupby_vars() : Return a list of By group variable names
- get_inputs_param() : Return the column names for the inputs= action parameter
- get_param(key, *default) : Return the value of a parameter
- get_params(*keys) : Return the values of one or more parameters
- get_value(index, col, **kwargs) : Retrieve a single scalar value
- groupby(by[, axis, level, as_index, sort, …]) : Specify grouping variables for the table
- has_groupby_vars() : Return whether the table has By group variables configured
- has_param(*keys) : Return a boolean indicating whether or not the parameters exist
- has_params(*keys) : Return a boolean indicating whether or not the parameters exist
- head([n, columns, bygroup_as_index, casout]) : Retrieve first n rows
- hist([column, by]) : Make a histogram from the table data
- info([verbose, buf, max_cols, memory_usage, …]) : Print summary of CASTable information
- invoke(_name_, **kwargs) : Invoke an action on the registered connection
- iteritems() : Iterate over column names and CASColumn objects
- iterrows([chunksize]) : Iterate over the rows of a CAS table as (index, pandas.Series) pairs
- itertuples([index, chunksize]) : Iterate over rows as tuples
- kurt([axis, skipna, level, numeric_only, casout]) : Return the kurtosis of the values of each column
- kurtosis([axis, skipna, level, …]) : Return the kurtosis of the values of each column
- lookup(row_labels, col_labels) : Retrieve values indicated by row_labels, col_labels positions
- max([axis, skipna, level, numeric_only, casout]) : Return the maximum value of each column
- mean([axis, skipna, level, numeric_only, casout]) : Return the mean value of each column
- median([axis, skipna, level, numeric_only, …]) : Return the median value of each numeric column
- merge(right[, how, on, left_on, right_on, …]) : Merge CASTable objects using a database-style join on a column
- min([axis, skipna, level, numeric_only, casout]) : Return the minimum value of each column
- mode([axis, numeric_only, max_tie, skipna]) : Return the mode of each column
- next() : Return next item in the iteration
- nlargest(n, columns[, keep, casout]) : Return the n largest values ordered by columns
- nmiss([axis, level, numeric_only, casout]) : Return total number of missing values in each column
- nsmallest(n, columns[, keep, casout]) : Return the n smallest values ordered by columns
- nth(n[, dropna, bygroup_as_index, casout]) : Return the nth row
- pop(colname) : Remove a column from the CASTable and return it
- probt([casout]) : Return the p-value of the T-statistics of the values of each column
- quantile([q, axis, numeric_only, …]) : Return values at the given quantile
- query(expr[, inplace, engine]) : Query the table with a boolean expression
- replace([to_replace, value, inplace, limit, …]) : Replace values in the data set
- reset_index([level, drop, inplace, …]) : Reset the CASTable index
- retrieve(_name_, **kwargs) : Invoke an action on the registered connection and retrieve results
- sample([n, frac, replace, weights, …]) : Return a random sample of the CAS table rows
- select_dtypes([include, exclude, inplace]) : Return a subset CASTable including/excluding columns based on data type
- set_action_params(name, **kwargs) : Set parameters for specified action name
- set_connection(connection) : Set the connection to use for action calls
- set_param(*args, **kwargs) : Set parameters according to key/value pairs
- set_params(*args, **kwargs) : Set parameters according to key/value pairs
- skew([axis, skipna, level, numeric_only, casout]) : Return the skewness of the values of each column
- skewness([axis, skipna, level, …]) : Return the skewness of the values of each column
- slice([start, stop, columns, …]) : Retrieve the specified rows
- sort(by[, axis, ascending, inplace, kind, …]) : Specify sort parameters for data in a CAS table
- sort_values(by[, axis, ascending, inplace, …]) : Specify sort parameters for data in a CAS table
- std([axis, skipna, level, ddof, …]) : Return the standard deviation of the values of each column
- stderr([casout]) : Return the standard error of the values of each column
- sum([axis, skipna, level, numeric_only, casout]) : Return the sum of the values of each column
- tail([n, columns, bygroup_as_index, casout]) : Retrieve last n rows
- to_clipboard(*args, **kwargs) : Write the CAS table data to the clipboard
- to_csv(*args, **kwargs) : Write CAS table data to comma-separated values (CSV)
- to_datastep_params() : Create a DATA step table specification
- to_dense(*args, **kwargs) : Return dense representation of CAS table data
- to_dict(*args, **kwargs) : Convert CAS table data to a Python dictionary
- to_excel(*args, **kwargs) : Write CAS table data to an Excel spreadsheet
- to_frame([sample_pct, sample_seed, sample, …]) : Retrieve entire table as a SASDataFrame
- to_gbq(*args, **kwargs) : Write CAS table data to a Google BigQuery table
- to_hdf(*args, **kwargs) : Write CAS table data to HDF
- to_html(*args, **kwargs) : Render the CAS table data to an HTML table
- to_json(*args, **kwargs) : Convert the CAS table data to a JSON string
- to_latex(*args, **kwargs) : Render the CAS table data to a LaTeX tabular environment
- to_msgpack(*args, **kwargs) : Write CAS table data to msgpack object
- to_outtable() : Create a copy of the CASTable object with only output table parameters
- to_outtable_params() : Create a copy of the CASTable parameters using only the output table parameters
- to_params() : Return the parameters as a dictionary
- to_pickle(*args, **kwargs) : Pickle (serialize) the CAS table data
- to_records(*args, **kwargs) : Convert CAS table data to record array
- to_sparse(*args, **kwargs) : Convert CAS table data to SparseDataFrame
- to_sql(*args, **kwargs) : Write CAS table records to SQL database
- to_stata(*args, **kwargs) : Write CAS table data to Stata file
- to_string(*args, **kwargs) : Render the CAS table to a console-friendly tabular output
- to_table() : Create a copy of the CASTable object with only input table parameters
- to_table_name() : Return the name of the table
- to_table_params() : Create a copy of the table parameters containing only input table parameters
- to_view(*args, **kwargs) : Create a view using the current CASTable parameters
- to_xarray(*args, **kwargs) : Return an xarray object from the CAS table
- tvalue([casout]) : Return the T-statistics for hypothesis testing of the values of each column
- uss([casout]) : Return the uncorrected sum of squares of the values of each column
- var([axis, skipna, level, ddof, …]) : Return the variance of the values of each column
- xs(key[, axis, level, copy, drop_level]) : Return a cross-section from the CASTable