Indexing and Data Selection
Indexing of CASTable objects works in much the same way as indexing of pandas.DataFrame objects. You can select one or more columns by name or by index, and you can select slices of columns. However, data selection does have some limitations. CAS tables can be distributed across a grid of computers, and they do not have a specified row order. Because of this, indexing based on a row index is not possible at this time. It is possible, however, to filter rows by applying a where clause to the table parameters.
There are a few properties that allow indexing a CASTable object in various ways. These properties work just like their pandas.DataFrame counterparts (with the limitations described above).
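The difference between a row position and a value-based filter can be illustrated with a small sketch in plain Python. It does not use SWAT or CAS: the partition lists and the `rows` and `where` helpers are hypothetical stand-ins for data spread across workers and for a row filter. The point is only that "the first row" depends on how the partitions happen to be arranged, while a filter on values selects the same rows either way.

```python
# Hypothetical stand-in for a table whose rows live on two workers.
# Each row is a (name, value) tuple; nothing here calls SWAT or CAS.
layout_a = [
    [("a", 10), ("b", 20)],
    [("c", 30), ("d", 40)],
]
# The same rows, with the workers arranged in a different order.
layout_b = [layout_a[1], layout_a[0]]


def rows(partitions):
    """Concatenate the partitions in whatever order they arrive."""
    return [row for part in partitions for row in part]


def where(partitions, predicate):
    """Keep rows matching the predicate, independently in each partition."""
    return [row for part in partitions for row in part if predicate(row)]


# A positional lookup depends on how the partitions are arranged.
first_a = rows(layout_a)[0]
first_b = rows(layout_b)[0]

# A value-based filter selects the same set of rows in either arrangement.
large_a = set(where(layout_a, lambda row: row[1] > 25))
large_b = set(where(layout_b, lambda row: row[1] > 25))

print(first_a == first_b)  # False
print(large_a == large_b)  # True
```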
Property / Method | Description |
---|---|
o[columns] | Subset table based on column names |
o.loc[:, columns] | Subset table based on column names |
o.iloc[:, columns] | Subset table based on column indexes |
o.ix[:, columns] | Subset table based on mixed column names and indexes |
o.xs(column, axis=1) | Select a cross-section of the table |
o[boolean-column] | Filter data rows based on boolean column values |
o.query('expr') | Apply a filter to the data values |
The Basics
Just as with pandas.DataFrames, CASTable objects implement Python’s __getitem__ method to allow indexing using [ ]. This allows you to subset the columns that are visible in the table.
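As a rough illustration of the mechanism, the sketch below shows how a table-like class can support [ ] through `__getitem__`. `ToyTable` and its `columns` attribute are hypothetical and are not part of SWAT or pandas; the class only demonstrates the protocol, returning a single column for a string key and a narrower table for a list of names.

```python
class ToyTable:
    """Hypothetical in-memory table; not part of SWAT or pandas."""

    def __init__(self, columns):
        # columns maps a column name to a list of values
        self.columns = dict(columns)

    def __getitem__(self, key):
        # toy['name'] -> the values of a single column
        if isinstance(key, str):
            return self.columns[key]
        # toy[['a', 'b']] -> a new table exposing only those columns
        return ToyTable({name: self.columns[name] for name in key})


toy = ToyTable({"x": [1, 2], "y": [3, 4], "z": [5, 6]})
subset = toy[["x", "y"]]

print(list(subset.columns))  # ['x', 'y']
print(toy["z"])              # [5, 6]
```

A CASTable is indexed with the same bracket syntax, using a column name or a list of column names.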
In [1]: tbl = conn.read_csv('https://raw.githubusercontent.com/'
...: 'sassoftware/sas-viya-programming/master/data/cars.csv')
...:
NOTE: Cloud Analytic Services made the uploaded file available as table TMPXUAOS9_X in caslib CASUSER(kesmit).
NOTE: The table TMPXUAOS9_X has been created in caslib CASUSER(kesmit) from binary data uploaded to Cloud Analytic Services.
In [2]: tbl.head()