How to Use Modern Data Analytics to Create New Value Streams: Interview with Data Wiz Tareq Abedrabbo
Tareq Abedrabbo is a highly experienced consultant specialising in data engineering and microservices architectures. He was formerly CTO and CEO of OpenCredo and co-founder of CloudCredo.
Hi Tareq, how did you get involved in data and analytics?
When I was seven my family bought a computer for my brother. I didn’t know what it was but was quickly fascinated by the video games. I started learning to program—in BASIC and Logo (a language for children)—through the books that I could find.
Studying computer science at university felt like a natural move, after which I started my professional career as a Java backend developer building enterprise systems for a telco. I couldn’t help but notice how complex these systems were, so I started looking for ways to simplify. That’s when I came across the Spring framework for Java, shortly after its initial release in 2004. It was a real eye-opener to see that there are different ways of doing things.
As a result, I became an early contributor to Spring Web Services, a part of the Spring portfolio. This opened up the whole community/open-source angle, which was a new world for me. I jumped into blogging and attending conferences and eventually joined OpenCredo, an independent, vendor-agnostic, open-source software consultancy, which was founded by Spring contributors.
These were also the early days of NoSQL and the idea that you could use non-traditional databases for specific use cases. I quickly became interested in the data aspect, particularly databases like Cassandra, Neo4j, Redis, MongoDB and so on.
From there I got the opportunity to work with really great people designing and implementing real-time data platforms using stream processing, NoSQL databases and so on. Because of my background in Spring and service-oriented architecture, over time I also got involved in the whole microservices trend, as well as blogging about the point where microservices and data intersect.
How did you make the transition to Contino?
My last role before joining Contino was as CTO then CEO of an engineering consultancy. My role involved trying to understand how customers run their businesses: exploring the human aspect, not just the technology aspect. At the end of the day you are solving people problems, and your technical solutions need to work for them.
I had decided to move on but knew that my priority was to work with like-minded people on interesting new projects. That’s when Tom Cunliffe got in touch. I was intrigued by what Contino are trying to do; focusing on enterprise DevOps is a very powerful proposition.
I immediately felt the connection with people like Rich Wadsworth (VP) and Brendan Foxen (CTO) and could relate to their desire for growth. But most importantly I simply found the people to be genuinely passionate and friendly.
I also really appreciated that they didn’t try to pigeonhole me into a specific template. There was just sufficient understanding on both sides that I could be of some help.
How has data changed in an enterprise context over the years?
Interest in data experienced a resurgence around ten years ago, with Google publishing its MapReduce paper and Amazon its Dynamo paper. The open-source versions of these technologies then really opened them up to a wider audience.
The NoSQL wave further piqued interest in what could be achieved with these new tools compared to traditional enterprise technologies like relational databases that had previously been used for pretty much everything.
More waves of change followed. Tools like Neo4j—a graph database management system—allowed us to look at data in new ways and approach it in ways that weren’t practical before. Open-source tools like Apache Storm then opened the door for real-time use cases like analytics and data ingestion. Cloud providers got involved as well, offering database technologies ‘as-a-Service’.
So the range of problems you can solve with these modern data technologies has broadened considerably.
As the value of data began to be recognised, the focus shifted to data engineering: how you build well-architected systems and distribute data across an organisation.
And how does this shift change how enterprises work with data?
The traditional way of organising data has been to create and manage different siloes. Organisations started introducing data scientists to deal with this, but one effect has been to create new siloes.
The traditional siloes used to be between the business, development and operations. As data and DevOps gain traction, new siloes arise between the business, DevOps and data. And this is where most organisations are today: still struggling with a lack of end-to-end coherence.
They are busy introducing new technologies (stream processing engines, real-time analytics etc.) that are becoming increasingly available as the barrier to entry lowers. But the real challenge is to bring these together into a coherent system that delivers actual value for the organisation.
If you introduce all these technologies (each individually complex) you can end up with something sophisticated...but more often than not you just end up with something complicated.
So the same people challenges remain as before: how to address siloes across the organisation to get an end-to-end solution.
Why is it so tricky then to work with data?
The problem is that data-based pipelines are complex, while the technologies they are composed of are relatively young with varying levels of maturity. In this context, what’s most important is to build something simple but evolutionary that works for everyone, from the business through Dev and Ops to data scientists.
Building a data pipeline is not as simple as building an application! It requires significant investment. Not only financially, but in terms of taking the time to do the design work to understand how it can be used in the long term. Investment is required because a pipeline, unlike an application, doesn’t have a single use case. A data pipeline represents the capability for not only current but also future use cases that you are not aware of now.
Crucially, you need to invest at the same time as you climb a daunting learning curve, because the technologies are not mature. So designing and prototyping are very important: they let you build things that are simple and flexible enough to be usable now, but that can still evolve as new data-related use cases emerge.
What are the data use cases now and in the future?
Firstly, processing data for the Internet of Things: a constant flow of events generated and sent by mobile devices to be ingested, processed and organised by backend (enterprise) systems.
Another is dealing with unreliable data e.g. missing or delayed data. It’s interesting to think of architectural and modelling techniques for dealing with this data. That’s a common use case.
A third is transitioning from traditional batch-based data processing to real-time processing. There are many organisations in which data is received from external sources in infrequent batches, perhaps once a day. Having a real-time system that can break the batches down into individual events lends massive flexibility.
Lastly, building multiple views of the same data set is useful. When data events are coming in, the traditional approach is to put everything in one normalised database. With modern data technologies you can create multiple views of the data, each optimised for a very different use case. So, for example, you can take one set of data and simultaneously use it to update a dashboard and make it available to data scientists, as in the sketch below.
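As a rough illustration of what ‘multiple views’ means in practice, here is a minimal Python sketch (all names are hypothetical) in which a single stream of order events feeds both an aggregated view for a dashboard and a raw, append-only view for data scientists:

```python
# Minimal sketch: one stream of order events fanned out into two views.
# Names and structures are illustrative, not any specific product's API.
from collections import defaultdict

dashboard_view = defaultdict(float)   # aggregated revenue per product, for a dashboard
analytics_view = []                   # raw, append-only events for data scientists

def handle_event(event):
    """Apply a single incoming event to every materialised view."""
    dashboard_view[event["product"]] += event["amount"]  # optimised for fast reads
    analytics_view.append(event)                         # optimised for exploration

for event in [
    {"product": "book", "amount": 12.0},
    {"product": "pen", "amount": 1.5},
    {"product": "book", "amount": 8.0},
]:
    handle_event(event)

print(dict(dashboard_view))  # {'book': 20.0, 'pen': 1.5}
print(len(analytics_view))   # 3
```

The point is that each view is derived from the same events but shaped for its own consumer, rather than forcing every consumer onto one normalised schema.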
What does this enable businesses to do?
Modern approaches to data enable businesses to build new kinds of services to provide new value to customers.
For example, businesses with modern applications that need to ingest data from a large number of concurrent users, while simultaneously remaining available and scalable, cannot rely on traditional approaches to processing data.
Take modern logistics companies. They track tens (hundreds?) of thousands of packages in near real-time, distributed via thousands of couriers who get the data they need (what to deliver, where and when) via an app on their phones connecting over the public internet. This would simply not be possible with a traditional application using traditional data back-ends.
Car-sharing and taxi-hailing apps face the same issue of coordinating vehicles, drivers and customers in real-time via thousands of mobile app instances. From the user’s perspective, these data-driven experiences must be completely reliable, scalable and near real-time; these companies stand to lose huge amounts of money if their database crashes.
These approaches also open up other possibilities that were not practical in the past, such as simultaneously using multiple views on the same data set across the business. This might allow you to perform machine learning on a data set while also using it to update your dashboards.
Where things get really interesting is when streaming data crosses paths with microservice architectures. Distributed microservices emerged to allow more efficient delivery and scaling of new kinds of services. Now we are finding that these modular architectures tie in very well with real-time data streams between the disparate microservices.
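As a sketch of that intersection (assuming, purely for illustration, a Kafka topic called "orders" and the kafka-python client; any streaming platform could play the same role), an individual microservice might subscribe to the shared event stream like this:

```python
# Rough sketch: a small microservice consuming a real-time event stream.
# The topic name, consumer group and event fields are hypothetical.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="billing-service",  # each microservice consumes the stream independently
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    order = message.value
    # The billing service reacts to each event as it arrives rather than waiting
    # for a nightly batch; other services can subscribe to the same topic.
    print(f"Billing order {order['id']} for {order['amount']}")
```

Each service stays small and focused, while the stream acts as the shared, real-time backbone between them.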
But all of the above requires very solid cloud and DevOps capability. It’s impossible to manage huge volumes of data across complex distributed systems without the ability to automate, to deploy quickly and so on.
Do you find that challenges vary between businesses?
With most customers, the problem is finding out what the real problem is!
From my experience there are two things to note here:
- Each organisation is unique to a certain extent, but…
- There are always common challenges where clients can benefit from the experience that consultants bring, or from seeing how the wider industry is addressing a problem.
The first common challenge is how to implement a sustainable solution that will outlive whatever project we are working on with a customer.
This is linked to another common challenge: organisations often lack expertise in the technology they want to adopt. Just as common is a focus on technology as the solution to every problem, including those rooted in people or process issues. Or the belief that the next generation of some technology will make things better, when the changes needed are more fundamental. Take moving to the cloud, for example. If you lift-and-shift you are just moving your problems around. You need to have solid, modern infrastructure practices in place.
There is also the unanticipated impact of new technologies in terms of how people organise themselves. For example, automation means that time is freed up to focus on other things, which then poses the question of how organisations continue to train and upskill their employees.
Every organisation has specific challenges in balancing its business lifecycle from a tech perspective: whether to grow or stabilise an existing system or to build a new one, which technologies it is comfortable with and the scale it requires.
This balance is ultimately achieved through understanding and communication, i.e. human relationships. This is why it is important to look at code but talk to people: to understand what they are trying to achieve!
What is the future of data in the enterprise?
Many organisations, when trying to implement a new data platform, will try to build everything from scratch. This is a massive investment due to the steep learning curve, the varying levels of maturity of the technologies involved in creating a working pipeline and the sheer breadth of the solutions available.
In addition, many focus only on the tech without looking at the end-to-end architecture. They then rush to build something without addressing more fundamental issues.
Instead, you need to focus on the data engineering and modelling aspect, on the one hand, and the business problems you’re trying to solve and how you structure your business around them, on the other.
Importantly, you’re not introducing some trivial IT system. You’re trying to build entirely new capabilities using new tools. This is where the cloud becomes very useful. Even if you can’t host your data in the cloud, all the public cloud providers supply their own flavour of stream processing, NoSQL databases and other tools. These can be used to experiment and test in order to verify which technologies you need.
You can use the cloud to do very efficient prototyping for your data pipeline. For example, you can build a functioning but simple data platform using AWS Kinesis, DynamoDB, Lambda, S3 and OpenShift. All of these are available as a service, so no heavy upfront investment is required.
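To make that concrete, here is a minimal sketch of the kind of prototype described above (the table and bucket names are made up): an AWS Lambda handler triggered by a Kinesis stream that writes each event to DynamoDB for low-latency queries and to S3 as a raw archive.

```python
# Sketch of a prototype pipeline: Kinesis -> Lambda -> DynamoDB + S3.
# Resource names are hypothetical placeholders for illustration only.
import base64
import json

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("prototype-events")   # hypothetical table name
s3 = boto3.client("s3")
BUCKET = "prototype-event-archive"           # hypothetical bucket name

def handler(event, context):
    """Process a batch of Kinesis records delivered to the Lambda function."""
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded; assume JSON events that
        # already contain the DynamoDB table's partition key.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))

        # Low-latency view for applications and dashboards
        table.put_item(Item=payload)

        # Raw copy kept for later, as-yet-unknown use cases
        s3.put_object(
            Bucket=BUCKET,
            Key=f"events/{record['kinesis']['sequenceNumber']}.json",
            Body=json.dumps(payload),
        )
    return {"processed": len(event["Records"])}
```

Because every piece is managed, a prototype like this can be stood up, measured and thrown away cheaply before any long-term architectural commitment is made.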
The cloud is such a fantastic tool for learning and experimenting. I’ve seen so many customers overlook this and jump straight into a solution that ends up not being fit for purpose. My main piece of advice for enterprises would be to test in the cloud first, learn from that, and only then commit to a longer-term solution.
Thanks, Tareq!