Disclaimer: I’m not a data expert. My business is international growth, and data is usually the first challenge I stumble upon when pushing for scale. As a result, I have had to set up the data/sales ops/BI/CRM value stream from scratch several times. For me, data is the means to a very concrete end: the business needs to prove that it can grow and that its model is resilient across cultures, markets and industries. The data setups I build must get me there by the shortest available path.

The path starts at a point where data is produced and consumed, but it feels incomplete and unreliable. Managers are usually confused about what data is available and where it lives. At the same time, data ownership is unclear, and teams are hesitant to make decisions about gathering, managing and storing data. This confusion prevents alignment on goals and KPIs and creates data silos between functions. Usually, there is no data team, or there is a sole engineer unsure where to start. Making a report takes a lot of time and effort.

We move forward a bit: the decision to set up data as a value stream is made, and it is time to collect expectations. Teams usually do not know what to ask for, so they ask for “✨Everything✨”. You will likely end up with the standard KPIs for the industry, e.g. (recurring) revenue, customer and revenue churn and retention, organic vs. paid traffic, customer acquisition cost, customer lifetime value, margin and payback. Starting with requirements is often painful: there are too many options, and they are too detached from the day-to-day work.

Ideally, start the way Good to Great teaches: with the team. Put together a stellar, lean team. It is not yet time to decide whether it should be centralised or decentralised; a value stream set up from scratch can only be built centrally. The team needs the experience and the ability to keep all stakeholders close, switch between different hats and keep a wide overview of all things data. When this team has the decision-making power and management support to carry out the necessary changes, you are off to a great start.
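To make the standard KPIs mentioned above less abstract, here is a minimal sketch of common textbook definitions for churn, retention, customer acquisition cost and payback. It is an illustration under stated assumptions, not the only valid definitions: the `Period` record and all field names are hypothetical, revenue churn is gross (no expansion revenue), and payback assumes a flat monthly revenue per customer.

```python
from dataclasses import dataclass


@dataclass
class Period:
    """Hypothetical one-month snapshot; field names are illustrative only."""
    customers_start: int      # customers at the start of the month
    customers_lost: int       # customers who cancelled during the month
    mrr_start: float          # monthly recurring revenue at the start of the month
    mrr_lost: float           # recurring revenue lost to cancellations and downgrades
    acquisition_spend: float  # sales and marketing spend in the month
    new_customers: int        # customers acquired in the month


def customer_churn(p: Period) -> float:
    """Share of starting customers lost in the period."""
    return p.customers_lost / p.customers_start


def customer_retention(p: Period) -> float:
    """Complement of customer churn."""
    return 1 - customer_churn(p)


def revenue_churn(p: Period) -> float:
    """Share of starting recurring revenue lost in the period (gross, no expansion)."""
    return p.mrr_lost / p.mrr_start


def customer_acquisition_cost(p: Period) -> float:
    """Acquisition spend divided by customers acquired."""
    return p.acquisition_spend / p.new_customers


def payback_months(cac: float, monthly_revenue_per_customer: float, gross_margin: float) -> float:
    """Months of margin-adjusted revenue needed to recover the acquisition cost."""
    return cac / (monthly_revenue_per_customer * gross_margin)


january = Period(customers_start=200, customers_lost=10, mrr_start=20_000.0,
                 mrr_lost=1_500.0, acquisition_spend=30_000.0, new_customers=25)
cac = customer_acquisition_cost(january)
print(customer_churn(january), revenue_churn(january), cac)  # 0.05 0.075 1200.0
print(payback_months(cac, monthly_revenue_per_customer=100.0, gross_margin=0.8))  # 15.0
```

Writing the definitions down this explicitly is mostly useful for alignment: once every team agrees on what "churn" divides by, reports stop contradicting each other.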

The next step on the path is to differentiate between the operational and the analytical use of data. By moving through the value chain backwards, from the output to the input, the team stays focused and efficient.

This is what I mean:

Imagine you have the data - what will you do with it? 
**this is the output**
For example: you want to make money and not lose it. 
This tells your data team that you need customer lifetime value.
**this is the intermediate product**
and they know what to capture, how to capture it, and where to store it. 
**this is the input**
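The backwards walk from output to intermediate product to input can be written down as a small lookup, which is roughly what the data team does in its head. This is a minimal sketch: the business goal, KPI name and field names are hypothetical, and the CLV formula is one common simplification (monthly gross margin per customer divided by monthly churn), not the only way to compute lifetime value.

```python
# Working backwards: output (business goal) -> intermediate product (KPI) -> input (fields).
# All names below are hypothetical examples, not a standard schema.
OUTPUT_TO_KPI = {
    "make money and not lose it": "customer_lifetime_value",
}

KPI_INPUTS = {
    "customer_lifetime_value": [
        "payments.customer_id",
        "payments.amount",
        "payments.paid_at",
        "subscriptions.cancelled_at",
        "costs.cost_of_revenue",
    ],
}


def plan_capture(output: str) -> tuple[str, list[str]]:
    """Return the KPI an output needs and the raw fields to capture for it."""
    kpi = OUTPUT_TO_KPI[output]
    return kpi, KPI_INPUTS[kpi]


def simple_clv(monthly_revenue_per_customer: float, gross_margin: float, monthly_churn: float) -> float:
    """One common simplification: monthly gross margin per customer divided by monthly churn."""
    if monthly_churn <= 0:
        raise ValueError("monthly_churn must be positive for this formula")
    return monthly_revenue_per_customer * gross_margin / monthly_churn


kpi, fields = plan_capture("make money and not lose it")
print(kpi, len(fields))  # customer_lifetime_value 5
print(round(simple_clv(100.0, 0.8, 0.05), 2))  # 1600.0
```

The point of the lookup is the direction: nothing enters the input list unless some output needs it.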

We are midway now. At this point, you have your data architecture and strategy. Operational tools (e.g. CRM, ERP) and analytical tools (e.g. Domo, Metabase, Tableau) complement each other instead of overlapping, and both take their data from the same clean data layer, which avoids conflicting reports.

The next step is to start implementing the data pipeline with these guiding principles:

  • Balancing cost & speed: minimum transform time, minimum response time for each use case, minimum cost.
  • Automation to achieve an 80/20 split of work: moving away from putting out fires and towards adding value. The team should spend 20% of its time on maintenance and ad hoc analysis and 80% on needle-moving activities (e.g. informing which products the company should build next). This is achieved by automating data ingestion and providing a clean data layer (the data modelled in a way that makes sense for the business), so that reporting can be self-service.
  • Streamlining data pipelines: data across teams is connected and serves the same end goal (i.e. company growth), so there is no need to customise a dataset for each use case. Once the clean data layer is exposed and validated, users can answer questions themselves, which gives them autonomy and speed.

In the data pipeline, data is extracted from upstream sources and loaded into a data warehouse. At first, this process runs one way only. Ideally, the first source is your PSP (payment service provider), i.e. the payment data, connected through a direct integration. Once the data is in the warehouse, the transformations are defined in SQL and executed there. How much time and effort these transformations take depends largely on the input data sources.
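The extract, load and transform steps can be sketched end to end with Python's built-in `sqlite3` standing in for the warehouse. This is a minimal sketch under loud assumptions: `extract_payments` is a hypothetical placeholder for a real PSP integration and returns made-up sample rows, and the table and column names are illustrative. What it does show is the shape described above: raw data is loaded as-is, and the business logic lives in SQL that runs inside the warehouse.

```python
import sqlite3


def extract_payments() -> list[tuple]:
    """Hypothetical stand-in for a direct PSP integration; a real one would page
    through the provider's API. The rows are made-up sample data."""
    return [
        ("pay_1", "cus_a", 4900, "EUR", "2024-01-05", "succeeded"),
        ("pay_2", "cus_b", 9900, "EUR", "2024-01-17", "succeeded"),
        ("pay_3", "cus_a", 4900, "EUR", "2024-02-05", "succeeded"),
        ("pay_4", "cus_b", 9900, "EUR", "2024-02-17", "failed"),
    ]


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load raw rows as-is; INSERT OR REPLACE keeps re-runs idempotent by payment id."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS raw_payments (
               id TEXT PRIMARY KEY, customer_id TEXT, amount_cents INTEGER,
               currency TEXT, paid_at TEXT, status TEXT)"""
    )
    conn.executemany("INSERT OR REPLACE INTO raw_payments VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


# The transformation is plain SQL and runs inside the warehouse.
TRANSFORM_SQL = """
CREATE TABLE monthly_revenue AS
SELECT substr(paid_at, 1, 7)       AS month,
       currency,
       SUM(amount_cents) / 100.0   AS revenue,
       COUNT(DISTINCT customer_id) AS paying_customers
FROM raw_payments
WHERE status = 'succeeded'
GROUP BY month, currency
"""


def transform(conn: sqlite3.Connection) -> None:
    """Rebuild the derived table from raw data on every run."""
    conn.execute("DROP TABLE IF EXISTS monthly_revenue")
    conn.execute(TRANSFORM_SQL)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stands in for the data warehouse
load(conn, extract_payments())
transform(conn)
print(conn.execute("SELECT * FROM monthly_revenue ORDER BY month").fetchall())
# [('2024-01', 'EUR', 148.0, 2), ('2024-02', 'EUR', 49.0, 1)]
```

Rebuilding the derived table from raw data on each run keeps the transformation reproducible; with messier sources, most of the effort moves into the SQL.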

After this first round of work, it is time to think about the stability and compliance of the pipeline. Making the handover points between sources, the DWH and end-user interfaces stable is time well spent. Staging and production layers in the DWH are a must-have, and so are backups of both the data and the code. If it makes sense, the pipeline should be able to run both ways: from the data sources to, say, the CRM, and from the CRM back to the data sources.
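One way to picture the staging/production split: new data lands in a staging table, is checked, and only then replaces the production table in a single transaction, so end users never see a half-loaded or failed batch. This is a minimal sketch using `sqlite3` as a stand-in warehouse; the table names and the three validation rules are hypothetical examples, and real setups usually rely on a dedicated testing or orchestration tool.

```python
import sqlite3


def _count(conn: sqlite3.Connection, table: str, where: str = "1=1") -> int:
    # table and where come from trusted code here, never from user input
    return conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {where}").fetchone()[0]


def validate(conn: sqlite3.Connection, table: str) -> list[str]:
    """Example checks only; which checks matter is a business decision."""
    errors = []
    if _count(conn, table) == 0:
        errors.append("table is empty")
    missing = _count(conn, table, "customer_id IS NULL")
    if missing:
        errors.append(f"{missing} rows without customer_id")
    negative = _count(conn, table, "amount_cents < 0")
    if negative:
        errors.append(f"{negative} rows with a negative amount")
    return errors


def promote(conn: sqlite3.Connection, staging: str = "staging_payments",
            production: str = "prod_payments") -> None:
    """Replace production with staging only if staging passes validation.
    Assumes the connection uses isolation_level=None, so BEGIN/COMMIT are explicit."""
    errors = validate(conn, staging)
    if errors:
        raise ValueError("promotion blocked: " + "; ".join(errors))
    conn.execute("BEGIN")
    try:
        conn.execute(f"DROP TABLE IF EXISTS {production}")
        conn.execute(f"CREATE TABLE {production} AS SELECT * FROM {staging}")
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # production keeps its previous contents
        raise


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE staging_payments (id TEXT, customer_id TEXT, amount_cents INTEGER)")
conn.executemany("INSERT INTO staging_payments VALUES (?, ?, ?)",
                 [("pay_1", "cus_a", 4900), ("pay_2", "cus_b", 9900)])
promote(conn)

conn.execute("INSERT INTO staging_payments VALUES ('pay_3', NULL, 100)")  # a bad row arrives
try:
    promote(conn)
except ValueError as exc:
    print(exc)  # promotion blocked: 1 rows without customer_id
print(conn.execute("SELECT COUNT(*) FROM prod_payments").fetchone()[0])  # 2
```

The bad batch stays in staging for investigation while production keeps serving the last good version.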

We are almost at the end of the path: it is time to negotiate a Data Governance Framework and a clear Code of Conduct. Data lineage, another byproduct of a well set-up data value stream, gives the overview needed to provide audit trails and to answer deletion and information requests. Data lineage depends on an accurate data inventory that can track:

Where is data stored? How is data protected? How is data updated? Who has access and who is accountable?
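A data inventory that answers these four questions can start as one record per dataset. This is a minimal sketch with hypothetical field and dataset names: it flags which questions an entry still leaves open, and it lists the datasets holding personal data, which is the starting point for a deletion or information request.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryEntry:
    """One dataset in a hypothetical data inventory; fields mirror the questions above."""
    dataset: str
    stored_in: str = ""                                  # Where is data stored?
    protection: list[str] = field(default_factory=list)  # How is data protected?
    update_process: str = ""                             # How is data updated?
    access: list[str] = field(default_factory=list)      # Who has access?
    accountable: str = ""                                # Who is accountable?
    personal_data: bool = False  # flags datasets in scope for deletion/information requests


def unanswered(entry: InventoryEntry) -> list[str]:
    """List the inventory questions this entry cannot answer yet."""
    answers = {
        "Where is data stored?": entry.stored_in,
        "How is data protected?": entry.protection,
        "How is data updated?": entry.update_process,
        "Who has access?": entry.access,
        "Who is accountable?": entry.accountable,
    }
    return [question for question, answer in answers.items() if not answer]


def in_scope_for_deletion(inventory: list[InventoryEntry]) -> list[str]:
    """Datasets to check when a customer asks for their data to be deleted."""
    return [entry.dataset for entry in inventory if entry.personal_data]


inventory = [
    InventoryEntry("prod_payments", "warehouse / production", ["encrypted at rest", "role-based access"],
                   "daily load from the PSP", ["finance", "data team"], "Head of Finance",
                   personal_data=True),
    InventoryEntry("web_traffic", "warehouse / production", update_process="hourly load",
                   access=["marketing"]),
]
print(unanswered(inventory[1]))  # ['How is data protected?', 'Who is accountable?']
print(in_scope_for_deletion(inventory))  # ['prod_payments']
```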

Typical questions that data lineage should be able to answer when an error is identified in a report are: Where did it come from? When did the error occur? Who is accountable for that? How can we solve it?
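Answering "where did it come from?" and "who is accountable?" is essentially a walk up a lineage graph, from the broken report to every dataset and source feeding it. This is a minimal sketch: the graph, node names and owners are hypothetical, and in practice the lineage would be generated from the pipeline code rather than typed by hand. Nodes without an owner show up as gaps to fix in the inventory.

```python
from collections import deque

# Hypothetical lineage graph: each node maps to the nodes it is built from.
UPSTREAM = {
    "revenue_dashboard": ["monthly_revenue"],
    "monthly_revenue": ["prod_payments", "fx_rates"],
    "prod_payments": ["staging_payments"],
    "staging_payments": ["psp_api"],
    "fx_rates": ["fx_provider"],
}

# Hypothetical owners; fx_rates and fx_provider are deliberately missing.
OWNERS = {
    "revenue_dashboard": "BI lead",
    "monthly_revenue": "data team",
    "prod_payments": "data team",
    "staging_payments": "data engineer",
    "psp_api": "Head of Finance",
}


def trace_upstream(node: str) -> list[str]:
    """Breadth-first walk from a report to every dataset and source it depends on."""
    order, seen, queue = [], {node}, deque([node])
    while queue:
        current = queue.popleft()
        order.append(current)
        for parent in UPSTREAM.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return order


def accountability(report: str) -> list[tuple[str, str]]:
    """Pair every upstream node with its owner, flagging gaps in the inventory."""
    return [(node, OWNERS.get(node, "UNASSIGNED")) for node in trace_upstream(report)]


for node, owner in accountability("revenue_dashboard"):
    print(f"{node:<18} {owner}")  # fx_rates and fx_provider print as UNASSIGNED
```

The "when" question needs one more ingredient this sketch leaves out: load timestamps per node, so the walk can stop at the first dataset whose last run looks wrong.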

Notice that we are already talking about spotting errors in reports and being able to tell that they are errors. This means the data value stream is set up. The business is now able to show its growth, its resilience and its place in the market. It can manage its customer relationships efficiently and direct its actions strategically.

Where to now? Once the data is set up, it is a great time to discuss whether the team should stay small and centralised or move towards dedicated capacity, or towards coordinated or independent data units. There are best practices for structuring the team for scale, depending on the specifics of the organisation. You can also start thinking about sentiment analysis, predictive analytics, DataOps, machine learning, data products, etc.