Properly Staffing Your Data Team
In a prior piece, we discussed the growing amount of data in the world and the questions your team should consider before building out a data pipeline. Now that you have right-sized your tech approach and constructed an effective pipeline for aggregating, organizing, and moving the data so that it can be stored, analyzed, and acted upon, it’s time to properly staff your data team.
I like to think of a data pipeline as the data equivalent of UPS or FedEx. These logistics organizations are responsible for moving massive numbers of packages around the world safely and on time. To do this, they don’t assign each package to one person who shepherds it through the entire logistics pipeline. Instead, they have specialists at every point in the process, from intake to warehousing to driving, who make this an efficient and profitable undertaking for the company.
Your data pipeline demands this same level of specialization and precision. We see many organizations new to the field assume that anyone with the word “data” in their title can be plugged in at any point on the pipeline. But just as no one wants an inexperienced back-office employee driving a UPS delivery truck in congested traffic, you cannot afford to have a UI expert playing the role of a data engineer.
Here, then, is a quick primer on the basic titles and the skills best suited to each set of responsibilities along the data pipeline.
Data Architect
Just as the title implies, data architects are responsible for designing the “blueprint” of your overall data system. A data science team leans on a data architect to visualize, organize, and prepare data within a framework that optimizes it for the data scientists, engineers, and analysts who will access it along the pipeline. Architects must understand everyone’s role and skill set, then match them to the technology and business use cases that will produce the best results most efficiently. In the simplest sense, they are the data organizers for your team.
Data Engineer
A data engineer’s primary responsibility is to prepare data for analytical or operational uses. If the data architect is the designer, then the data engineer is the builder. Their specific tasks might vary from organization to organization, but they are typically responsible for building data pipelines that pull together information from different source systems; integrating, consolidating, and cleansing data; and then structuring it for use in individual analytics applications.
Data Scientist
Data scientists are distinct from data engineers. They use the data harnessed, organized, and delivered by data engineers to understand how it might evolve over time. They are experts in modeling data interactions in order to predict how the data might change or behave in the future.
Data Analyst
Like data scientists, analysts work with the data delivered by the pipeline. But where scientists model how the data might behave in the future, analysts assess what it means for the business. They take mountains of data and probe them to spot trends, make forecasts, and extract information that helps their employers make better-informed business decisions.
UI or Front-End Engineer
People in this important role deliver the finished product to business leaders and the broader team. They are responsible for creating the reports and tools that present the data visually and make it actionable for the company.
Business Leader
This role can span a number of working titles within a company, and the people who fill it might not have technology or data-specific skill sets, but they can play an invaluable part on your data team. Often, their input is critical to identifying what is most important in the piles of data your organization has collected. They can also help bridge the divide between the technology and business teams if one exists within your organization. Their job is to put the data to work for the company.
Of course, different companies might have nuances within these titles, skill sets, or roles. However, the premise remains the same. Multiple people along the pipeline will perform specialized tasks that match their technology, analytics, or business capabilities. Every one of them is vital. And just as with UPS or FedEx, ensuring that each is in the proper place along the pipeline is essential to making the overall data system perform to its full potential.