Loading Data
There are several ways to load data into CAS. They range from parsing client-side files in various formats into pandas.DataFrame objects to loading large data files that are already stored on the server. Which method you choose depends on your needs and on the size of the data. For small data sets, or for data that needs a custom parser, use client-side data. For large data sets, use server-side data files.
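The choice between the two routes can be sketched as a small helper. This is only an illustration under stated assumptions: `load_table` and its `server_side` flag are hypothetical names, not part of SWAT, and `conn` is assumed to be an open `swat.CAS` connection whose `read_csv` method parses on the client and whose `load_path` method loads a file that already resides on the server.

```python
def load_table(conn, path, server_side=False, **kwargs):
    """Load `path` into CAS through a client-side or server-side route.

    Hypothetical helper for illustration; `conn` is assumed to be a
    swat.CAS connection. Extra keyword arguments are forwarded unchanged
    to whichever connection method is called.
    """
    if server_side:
        # Assumption: `path` names a file the CAS server can see, so no
        # data travels through the client.
        return conn.load_path(path, **kwargs)
    # `path` (a local file or a URL) is read and parsed on the client by
    # pandas, and the parsed result is then uploaded to CAS.
    return conn.read_csv(path, **kwargs)
```

On the client-side route the keyword arguments end up in the pandas parser, so a call such as `load_table(conn, 'cars.csv', sep=';')` would behave like `conn.read_csv('cars.csv', sep=';')`.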
Client-Side Data Files and Sources
Using Client-Side Parsers
The easiest way to get data into CAS is to use the data loading methods on the CAS object that parallel the data reading functions in the pandas module, such as pandas.read_csv(), pandas.read_table(), and pandas.read_excel(). The CAS methods use the pandas functions in the background to do the parsing. The only difference is that they return a CASTable object rather than a pandas.DataFrame. Let’s look at an example.
In this example, we point to a URL that references CSV data. You could just as easily point to a local file. Keep in mind that when you use a URL, the data is first downloaded from its source to the client machine, parsed there, and only then uploaded to CAS.
In [1]: conn = swat.CAS(host, port, username, password)
In [2]: cars = conn.read_csv('https://raw.githubusercontent.com/'
   ...:                      'sassoftware/sas-viya-programming/master/data/cars.csv')
   ...:
NOTE: Cloud Analytic Services made the uploaded file available as table TMPKMJX6BEE in caslib CASUSER(kesmit).
NOTE: The table TMPKMJX6BEE has been created in caslib CASUSER(kesmit) from binary data uploaded to Cloud Analytic Services.
In [3]: cars.head()