Catalog

The Catalog API allows for the discovery, creation, and integration of metadata. This allows management of information about other information.

Context

This API is used to define and manage metadata.

Background and Overview

This API provides the basic interface to discover, find, and elaborate on metadata within the environment. Metadata in the context of this API refers to descriptive information about any currently defined asset.

For more information, also see SAS Information Catalog: User's Guide or SAS Information Catalog: Administrator's Guide.

Use Cases
  • Retrieve and search metadata about assets.
  • Define metadata for information consumed, used, or related to assets.
  • Integrate metadata with third-party systems.
  • Build and manage agents to populate the catalog with table metadata.

Concepts

As part of managing metadata, catalog first requires information about the kind of metadata that will be introduced. Consumers describe the information they wish to manage by providing one or more type definitions. Metadata content is added using the Catalog API as instances that specify a type definition. The type definition is used to validate and interpret the data in the instance.

These concepts are described in more detail below.

Type Definitions

Type definitions define the kind of entities and relationships (along with any attributes) that are allowed for an asset. Type definitions are analogous to the concept of a class in an object oriented system; or that of a DDL in a relational model.

Type definition names must be unique.

There are four kinds of type definitions:

Attribute Type Definitions

Attribute Type Definitions describe properties for the remaining kinds of type definitions. For example, an Entity Type Definition that describes an individual may reference attribute type definitions for contact information such as a phone number or mailing address.

Enumerations Attribute definitions can describe a list of allowed values by specifying ENUM as the attributeKind field of the Attribute Type Definition. Enumerations have a list of elementDefinitions that describe the allowed values.

Attribute Definitions Attribute type definitions are referenced in type definitions as the type field in their attributes list; these model elements are known as attribute definitions.

A typeCriteria object can be included within an attribute definition. This inclusion enables the expression of constraints on the content that are allowed within the attribute's value at run time.

  • minLength: the shortest acceptable value; only valid for string types
  • maxLength: the longest acceptable value; only valid for string types
  • minimum: the smallest acceptable value; only valid for numeric types
  • maximum: the largest acceptable value; only valid for numeric types
  • minItems: the smallest number of items allowed in the collection; only valid for collections
  • maxItems: the largest number of items allowed in the collection; only valid for collections
  • required: indicates that the attribute value must be specified

For attribute definitions of type collection, the type criteria must include an items element that contains a list of attribute definitions that constitute the elements in the collection. Note that specifying additional collection valued attributes in this 'items' list is not supported; the model does not support collections of collections within a single attribute.

Entity Type Definitions

Entity Type Definitions describe the nouns in the system. Some examples are data sets or reports.

Entity type definitions can have a base type that simplifies modeling by incorporating properties from the base type into the set of properties of the entity type definition. This acts like inheritance of fields in an object-oriented system.

Classification Type Definitions

Classification Type Definitions describe a kind of annotation that can be related to Instances. For instance, a Security classification can be attached to a dataset to describe the sensitivity of the dataset.

Like Entity type definitions, classification type definitions can have a base type that incorporates properties from the base type into the set of properties of the classification type definition.

Classification Hierarchies Classification instances support arrangement in a hierarchy. This structure is useful, for example, when instances represent part of a taxonomy. It is important to remember, however, that this hierarchy is unrelated to the derivation hierarchy. Instances in a hierarchy of a classification need not adhere to the same hierarchy used to model the classification instances.

Relationship Type Definitions

Relationship Type Definitions describe the kinds of connections that can occur between entity and classification instances.

Relationship type definitions describe directed relationships between entity and classification instances of specific types. Entities and classifications can appear on either endpoint of a relationship type definition in any combination (e.g., entity to entity, entity to classification, classification to classification, and classification to entity). For instance, relationship type definitions may describe:

  • a hierarchical file system with a relationship between parent and child folder entities
  • a taxonomy built from classification instances
  • a security classification between a dataset entity and a 'Sensitive' classification instance

Both entity and classification type definitions support the concept of a base type. Base types specified in the relationship type definition allow any type derived from that base to participate in the relationship.

Cascade Delete Upon the delete of an endpoint of relationship instance, the default behavior is for only the given endpoint and the relationship instance to be deleted. The cascadeDelete attribute on the relationship type definition allows for the specification of whether the deletion of a given endpoint should cascade from one endpoint to the other (or both).

Preserve Reference Conversely, the preserveReference attribute on the relationship type definition allows for the specification of whether the deletion of a given endpoint should replace the endpoint instance with an unresolvedReference instance. This allows for the preservation of relationships across the lifecycle of a given instance.

Instances

Instances are used to capture the metadata of objects in the system. If we extend our object-oriented analogy from type definitions, classes are to type definitions as objects are to instances. For example, a dataSet type definition describes the properties and relationships for any dataset, but a dataset instance captures metadata for a specific dataset.

There are three kinds of instances:

Entities

Entities are records of specific objects that capture the metadata of that object. For instance, a dataset instance may specify a name of 'BASEBALL.sashdat' with row and column counts derived from a table in a particular library within the Viya system.

Uniqueness Constraints Unlike type definitions, entity instance names do not have to be unique.

Entities have a resourceId field that is used to refer to the URI for the original object. If the type definition metaCategory is set to PRIMARY, the resourceId fields must be unique among instances of that type.

Classifications

Classifications serve as annotations that are typically referenced many times. While entities tend to refer to specific resources, classifications are used to add additional dimensions to the information captured by an entity. For instance, many datasets may be associated with a 'Sensitive' security classification.

Uniqueness Constraints Classification instances can be arranged within a hierarchy by specifying a parentID. Names must be unique across classification instances with the same parent. If no parent is specified, names must be unique across all the classification instances without a parent.

Relationships

Relationships serve to identify connections between entities and classifications. For instance, our BASEBALL.sashdat table entity may have a relationship to the SASHELP library entity. The type of the relationship conveys the semantics of the connection (along with any attributes attached to the relationship).

One of the primary use cases for the catalog API is discovering assets that are available for use. The catalog supports the traditional filter semantics that are common across SAS APIs; but these tend to require very precise knowledge about the assets of interest.

To fully support the discovery use case, the Catalog API also supports information retrieval techniques in the form of a search engine that allows retrieval of information using a query syntax that returns results based on relevance to one or more search criteria.

To enable search for different kinds of SAS assets, a search index definition for some assets has been added to the Catalog API. The index definition describes which entity and relationship definitions should be included in the search index, along with the properties to index.

When updates to the catalog are made (such as creating new instances, updating existing instances, or deleting instances), indices that reference the type of the instance (or instances) being manipulated are automatically updated.

Tags

Tags provide a way to attach user-provided labels to entities.

When tags are incorporated into search index definitions, users can use tags like a virtual bookmark and rapidly recall or share entities that have been tagged.

Views

When entities, classifications and relationships are combined, the result is a directed graph that captures the structure of information within the Catalog API. It is frequently useful to be able to query the graph in a way that returns the elements of the graph all once, rather than iteratively asking the Catalog API for more information. Selecting a set of related instances is performed thru a View, which conceptually serves as a saved query.

Views are expressed using a Graph query syntax:

  • () describes an entity
  • -[]-, -[]->, <-[]- describes relationships
  • <> describes a classification.

Each of these can be given a named projection, that allows matching objects to be referenced later in the query.

Information is retrieved by matching graphs against the pattern described by the query and returning the parts that are interesting. For example, matching everything in the catalog uses the query

match (x) return x

Property constraints can further reduce the set of results. The supported properties that can be matched on include: id, definitionId, name, type, description, label, createdBy, creationTimeStamp, modifiedBy, modifiedTimeStamp and resourceId. The resourceId property is exclusive to entities. For example, to return an entity with a specific ID use:

match(x {id:7462f4c3-f225-42cf-bc3d-0fdfcf09b190})
return x

The operators equal to (=), greater than (>), less than (<), greater than or equal to (>=) and less than or equal to (<=) can be used as well. For example, to get all CAS library instances created since the start of the year 2026 use:

match (x:casLibrary {creationTimeStamp >= "2026-01-01T00:00:00-00:00"}) return x

Each of the elements can be further constrained with type information. For example, to return all of the dataSet instances use:

match(x:dataSet) return x

or to return the dataSet and its dataFields use:

match(x:dataSet)-[]->(y:dataField) return x,y

Type constraints can specify lists of types. The supported type fields are name for entities and name or role for relationships. For example:

(y:sasColumn|casColumn)<br> // either one
(x)-[r:eq(role, "Associated)]->(y)<br> // Only association rels
(x)-[r:in(role,"Associated","Composition")]->(y) // either role

Match clauses can reference projections that have been previously defined. For instance, to get a dataset, its columns, and the semantic classifications attached to the columns:

match (t:dataSet {id:"..."})-[r:dataSetDataFields]-(c:dataField)
match (c)-[sc:semanticClassifications]-><sc>
return t,r,c,scr,sc

Relationships can include a path length expression:

-[r*..2]-> // Up to two relationships away
-[r*2..4]-> // between 2 and 4 relationships away

where the lower and upper bounds are expressed in the range following the asterisk. This approach is useful when views have to traverse multiple relationships.

Support for history

Entity instances can be configured to maintain old revisions according to a policy that is configured on either the entity or the entity's type definition. Before any changes are applied to the instance, a copy of the current state is written to the catalog.

Entity instances and type definitions allow you to specify:

  • a policy that controls how history records are retained
  • a parameter that governs the policy
  • a view that describes what content is maintained as part of the history

For instance, a policy of depth and a parameter of 10 means that at most 10 different revisions will be maintained within the catalog. The view is used to determine which entities, relationships, and classifications are to be included within the history record.

The policy none indicates that no history is to be kept.

If no history configuration is set for an instance, the behavior is determined by the type definition. This also allows the configuration to be set on a base type of the type definition, which is then inherited by types derived from the base type.

Agents

Agents are used to introduce content into the catalog.

The Catalog API enables constructing, running, and monitoring agents. Agents write their results back to the catalog using the same concepts and interfaces that are documented here.