A large retail cooperative wanted to make it easier to find and explore data across the organization. While they had a centralized data infrastructure in place, end users had difficulty seeing what data was available and how they could use it for analysis. Knowledge of the data infrastructure was held by a few business intelligence specialists at the company. Without an accessible and easy-to-use data catalog, business groups needed to engage the BI team for even the simplest of questions.
The company’s data experts, despite being a valuable strategic resource, were often bogged down answering emails, looking up information in tables, attending informational meetings, and granting user access. And because business groups were reliant on the BI team’s limited availability, any projects or decisions that required data information were often delayed for multiple days until a resource was available to help.
Prior to this engagement, Keyrus had worked with the company’s data strategy team to create profitability reports using Alteryx. Our team built workflows in Alteryx Designer to extract and transform profitability data. We then published applications to Alteryx Gallery that allow end users to run reports on the data.
Alteryx Connect is a collaborative data governance platform that harvests metadata from a variety of sources to create a centralized data catalog.
Business users can find relevant assets via search functionality and data lineage, and better understand them with features such as asset descriptions, field linkage, glossary terms, and ownership information. Users can also interact by adding comments, requesting access, and certifying and sharing assets.
Our team stepped in to complete product installation, load meta-data from existing data sources, train power users, and identify strategies for the client to maximize the platform.
Installation and Metadata Load
Alteryx Connect has three main architectural components: an H2 database that stores the loaded metadata, a java application that processes web requests and indexing, and a front-end website. While these components must be installed on a stand-alone server, the bulk of the metadata load process is carried out on the Alteryx Server.
Alteryx offers pre-packaged analytic apps that connect to various data sources (including Alteryx Gallery). These are installed and run on the Alteryx Server to scrape metadata and push it from the source systems to Alteryx Connect via a REST API.
The metadata load is critical to the success of any data governance project; a platform needs to have up-to-date information that’s valuable for users. We worked with the client to understand who would be using the platform, and what information would be most valuable for them to discover their analyses.
The organization wanted to prioritize table and field descriptions. The naming conventions, while well organized, offered little insight for those unfamiliar with the database architecture. The client maintained a data dictionary, but it was stored in a database table that few had access to.
While it’s possible to bulk load descriptions from an xml or xls file, we instead customized the metadata loader to pull in the dictionary table and join the descriptions within the analytic app. By scheduling this app to run nightly, we eliminated a manual process and ensured that any changes made to the data dictionary would be reflected in Connect.
The client didn’t need all data sources loaded into Connect, so we further customized the metadata loader so that only certain tables (for example, only those in production) were brought in.
We also disabled Views across all schemas. These customizations reduced the volume of data in Connect, making it easier for users to search and find relevant data.
After the initial metadata load, we scheduled the analytic apps to run on a nightly and weekly basis (depending on the data source and how frequently it’s updated). This ensures that data within Connect is up-to-date.
While Alteryx Connect has prebuilt workflows to schedule metadata loads with the Alteryx Server, Keyrus used a batch file to allow for scheduling through the client’s third-party scheduling system.
We used the bulk uploading from xls/xml method to upload business terms into the Glossary and link them to data assets such as workflows and tables.
At the conclusion of our engagement, we left the client with a list of future opportunities and best practices for enriching data assets on the platform. These included:
Encouraging users to engage in social features such as rating, certifying, commenting, and sharing.
Further personalizing the metadata loaders. For example, bringing in row count last modified date for database tables.
Tagging assets used in common reports.
Adding glossary terms or asset connections via bulk upload.
Before putting Connect in production, Keyrus led training sessions on platform administration and deployment strategy. We covered user administration, backup and restore procedures, and troubleshooting and support resources.
We also onboarded business groups. We created customized permissions for users outside of the data strategy group that allowed user contribution such as comments and sharing, while locking down the ability to edit descriptions and lineages. We enabled SSO with Microsoft AD so that users wouldn’t need to maintain a separate username and password, and new users within the organization would automatically have an account associated with their ID.
Alteryx Connect has made it easy for business users to search for and learn about data resources across the company. Prior to Connect, a generalized question from a business group took days, sometimes weeks, to answer. A database expert from the BI team had to follow up with the business group to better understand the business context and narrow the scope of the question. They then had to find relevant data assets, which could be extremely time consuming, and provide the business group with descriptions and context. Follow-up questions and access requests added additional time to this process.
Business groups no longer rely on a few database experts to provide them with information about the database. If a question comes up regarding a table, workflow, or report, the business group is able to log onto Connect and quickly search for the asset.
They can immediately access information about what data is contained within a table, which workflows this data is used in, and if the data is up-to-date (via certification, comments, and version history). If they have further questions regarding a particular asset, they can quickly reach out to the asset owner via a comment or question within the Connect interface, or an email with a link to the particular asset. One 60-second search replaces a process that could previously take weeks, immensely improving efficiency.