The way we build software has changed quite a lot over the years. The VMware Cloud Marketplace built its team and product using new tools and technologies. One thing that hasn’t changed? Our belief that VMware teams should use their own products. We use VMware to power VMware.
In a previous article, we looked at why developers should care about marketplaces to begin with. In that article, we spoke about how marketplaces accelerate time to market for apps by providing secure access to the components developers need to build world-class solutions. As our teams build out the VMware Cloud Marketplace, we wanted to give you some insights into our journey to build the next-generation digital marketplace for all developers.
In this blog post, we’ll cover four foundational principles that are key to our journey and our philosophy. For each principle, we’ll use the ACME Fitness Shop as an example and show how we’ve built that principle into our core engineering practice in the VMware Cloud Marketplace.
Principle #1: Build in MVPs
When you’re building software products, you have a choice in how to build them. You could give the development team all the requirements and, over time, watch the pieces of the product being built. The team starts from the ground and works its way upward, which means you can only use the product once the last phase is complete. The other option is to let the teams build the product incrementally, making sure that at the end of each increment there is a working product that has value for the business. This means you’ll start with a very minimal product (a Minimum Viable Product, or MVP), but it lets your most important stakeholders see directly what you’re building and the direction you’re going, opening up the process for feedback and the chance to steer requirements to solve more business needs. The world-renowned Agile and Lean coach Henrik Kniberg summarizes this in one of his presentations with the following image.
Source: Henrik Kniberg, Crisp
For example, our Cloud Developer Advocate team built out the ACME Fitness Shop in very much the same way. We built everything incrementally. We added new services, like the recent point-of-sales app, over time. New functionality, such as JWT authentication, was also introduced later. We could have spent a few months building out the entire project and only unveiled it when all services and functionality were complete, but that would have taken too much time. Instead, we built it incrementally and made sure our stakeholders (our team and other teams who use this project for their demos) could see what we built and influence the direction.
The VMware Cloud Marketplace team started with a very similar journey. From the start, we had in mind where our journey to build the single VMware Marketplace should take us. To make sure that everyone else got to see that as soon as possible, we began developing our prototypes as MVPs. At the end of every sprint, we made sure that we had a fully working product that solved pain points for our customers and ISV partners. These MVPs also help to scale the team and onboard new developers. Every developer on the team always knows what we’re building and how it impacts the product. Our microservice architecture was created in such a way that developers can add features or fix bugs in a single microservice without impacting the broader scope. As more and more microservices were needed, and more developers joined the team to build them, we saw that building MVPs not only helped keep us on track, but also helped keep the amount of rework very low. The microservices architecture let us reduce the scope of the problem for each developer, helping them become productive quickly. However, this created a new set of challenges related to misalignments between various services as well as between business logic and operational concerns.
Principle #2: Avoid Misalignments
APIs play an important part in building modern applications. Without APIs, it would be incredibly difficult to share data between parts of your application. It would also be difficult to reuse parts of your code or apps in other, often larger, applications. APIs make sure that developers can work together without having to worry about reinventing the wheel.
When we built monoliths, developers already used APIs. Every function or method call in your software is an API call. The functionality for “GetProductsFromDatabase” shouldn’t be repeated in every part of the app where you need to get products. Instead, you build it once and reuse that method in other parts of your codebase.
The ideas behind service-oriented architecture (SOA) promoted loosely coupled but tightly integrated services that perform “business logic”. The architecture describes breaking down large, monolithic applications into services that share data in a well-defined format. When SOA became popular in enterprises, that well-defined format was SOAP (Simple Object Access Protocol), with services described in WSDL (Web Services Description Language) files. Later on, when microservices became increasingly popular, we shifted to REST services described with the OpenAPI specification.
In 2015, a team at Google open-sourced gRPC, a project that helps large microservice landscapes communicate. The transport protocol was updated to HTTP/2 which, among other things, improves speed. Another major change was the use of Protocol Buffers to describe the messages that flow between services. Protocol Buffers, or Protobuf for short, serialize messages into a compact binary format. Unlike JSON and XML, that format cannot be read by humans, but it reduces the amount of data that is transferred over the network.
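To make the size difference concrete, here is a minimal sketch that encodes the same record as JSON and as a fixed binary layout. It uses Python's standard `struct` module, not Protobuf itself, and the product record and its field names are invented for this illustration. The point is only that a layout agreed on up front lets you send values without field names or punctuation.

```python
import json
import struct

# Hypothetical product record, invented for this illustration.
product = {"id": 1042, "price_cents": 2599, "in_stock": True}

# Text encoding: field names and punctuation travel with every message.
json_bytes = json.dumps(product).encode("utf-8")

# Binary encoding: both sides agree on the layout up front
# (unsigned 32-bit id, unsigned 32-bit price, one boolean byte),
# so only the values are sent. This is NOT the Protobuf wire format.
LAYOUT = "<II?"
binary_bytes = struct.pack(
    LAYOUT, product["id"], product["price_cents"], product["in_stock"]
)


def decode(payload: bytes) -> dict:
    """Rebuild the record from the binary payload using the shared layout."""
    product_id, price_cents, in_stock = struct.unpack(LAYOUT, payload)
    return {"id": product_id, "price_cents": price_cents, "in_stock": in_stock}
```

Comparing `len(json_bytes)` with `len(binary_bytes)` shows the binary payload is several times smaller for this record. The trade-off is that you can no longer read the payload without knowing the layout, which is why a shared schema matters.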
Just like the OpenAPI specification, Protobufs describe both the messages (data structures) and operations that work on the messages. Writing down the touchpoints between services first is called API-first or contract-first development. Rather than long drawn-out development cycles where you might end up misunderstanding each other, all teams know exactly how the services will communicate.
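As a rough illustration of contract-first development, the sketch below writes the contract down before any real service exists. It is plain Python rather than a Protobuf or OpenAPI definition, and every name in it (`Product`, `CatalogService`, `InMemoryCatalog`, `cheapest`) is hypothetical. Once the contract is agreed on, one team can code against it with a stand-in while another team builds the real implementation.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Product:
    """The message: the data structure both sides agree on."""
    id: str
    name: str
    price_cents: int


class CatalogService(Protocol):
    """The operations: what any catalog implementation must offer."""

    def get_product(self, product_id: str) -> Product: ...

    def list_products(self) -> list[Product]: ...


class InMemoryCatalog:
    """A stand-in that satisfies the contract until the real service exists."""

    def __init__(self, products: list[Product]) -> None:
        self._products = {p.id: p for p in products}

    def get_product(self, product_id: str) -> Product:
        return self._products[product_id]

    def list_products(self) -> list[Product]:
        return list(self._products.values())


def cheapest(catalog: CatalogService) -> Product:
    """Consumer code depends only on the contract, not on an implementation."""
    return min(catalog.list_products(), key=lambda p: p.price_cents)
```

Because `cheapest` only knows about `CatalogService`, swapping `InMemoryCatalog` for a client that talks to a real service would not change the consumer code.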
Within the ACME Fitness Shop, we rely heavily on APIs, and to make development easier, we’ve created the API specifications as well. This helps our team understand exactly what data and which operations exist in another service and gives our team the flexibility to choose the programming language they want to use for the job. I write some of my code in Go, while others on the team prefer Java.
As the number of microservices grew in the VMware Cloud Marketplace, misalignment was increasingly impeding our progress. We initially started with a repo for our Go backend and a separate repo for our React frontend, with gRPC for inter-service communication. Pretty soon, it was quite evident that the misalignment between backend and frontend protobufs was wasting up to 30% of our development cycles. Moving to a single repo with a common protobuf directory allowed us to significantly reduce that waste.
Another challenge we faced early on was consistency of operational thresholds. For example, what should be the threshold for circuit breakers to trip? How do we make this consistent across all microservices? And what should be this threshold for third-party APIs? Code reviews to maintain such consistency are not scalable. So, early on we settled on a service mesh in our Kubernetes cluster to separate the developer concerns from DevOps concerns. At VMware, we believe that we should be our own customer. This means if there is a specific product within the VMware ecosystem that you could use to solve your problem, you should. For us that meant we got to use the VMware Tanzu Service Mesh as customer zero. That adoption allowed us to separate concerns for developers. Our developers can focus solely on adding value to our marketplace, while the service mesh takes care of operational functionality, like logging, circuit breaking, routing, and service discovery.
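To show what a circuit-breaker threshold actually controls, here is a minimal sketch of the pattern. In our setup the service mesh handles this outside the application code, so this is not how the mesh implements it. The class names, the `BreakerPolicy` fields, and the threshold values are assumptions made up for this example. What it illustrates is that the policy is defined once and shared, instead of being re-decided in every service.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class BreakerPolicy:
    """One shared definition of the thresholds (values are hypothetical)."""
    max_failures: int
    reset_after_seconds: float


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, policy: BreakerPolicy, clock=time.monotonic) -> None:
        self._policy = policy
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            elapsed = self._clock() - self._opened_at
            if elapsed < self._policy.reset_after_seconds:
                raise CircuitOpenError("circuit is open; call skipped")
            # Cool-down elapsed: allow a trial call. A single failure
            # now re-opens the circuit immediately.
            self._opened_at = None
            self._failures = self._policy.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self._policy.max_failures:
                self._opened_at = self._clock()
            raise
        # A success closes the circuit and clears the failure count.
        self._failures = 0
        return result
```

Every service that wraps its outbound calls in a breaker built from the same `BreakerPolicy` trips at the same point, which is the kind of consistency that code reviews alone struggle to enforce.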
Principle #3: Have an SRE mindset
Companies that have very separate development and operations teams tend to see more conflicts between those two teams. Developers want to push their code to production as fast as possible, while the operations team wants to make sure the services don’t break, especially when they’re the ones that are on call.
Google took a different approach to running its services in production, calling it Site Reliability Engineering (SRE). Google describes SRE as “what happens when you ask a software engineer to design an operations team”. The fundamental idea behind SRE is that these teams should design and implement automation to replace the human labor of keeping services running.
The SRE team is usually responsible for things like performance management, emergency response, monitoring, and a lot more tasks that come from the traditional operations side of running software. With the goal of SRE being to automate most, if not all, of those tasks, they are software engineers at heart too. To be able to automate that, the apps will need to send metrics, logs, and trace data. These three categories of data are the main pillars of observability:
- Metrics are a numeric representation of data measured over intervals of time. You can use mathematical modeling and prediction to derive the behavior of the system.
- Logs are human-readable, structured bits of information from a specific point in time.
- Tracing tells an end-to-end story for an application by “stitching” bits of a flow into an easily digestible format.
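The three pillars above can be sketched with nothing but the Python standard library. This is an illustration under stated assumptions, not how any observability product collects data: the function names, the metric names, and the in-memory `metrics`, `log_lines`, and `spans` containers are all invented here and stand in for a real collector.

```python
import json
import time
import uuid
from collections import Counter

# In-memory stand-ins for a real collector (hypothetical).
metrics = Counter()   # metrics: numbers aggregated over time
log_lines = []        # logs: structured events from a point in time
spans = []            # traces: timed steps stitched together by a trace ID


def charge_payment(order_id: str, trace_id: str) -> None:
    start = time.monotonic()
    metrics["payments_charged_total"] += 1
    spans.append({
        "trace_id": trace_id,
        "name": "charge_payment",
        "duration_s": time.monotonic() - start,
    })


def handle_order(order_id: str) -> str:
    # One trace ID follows the request through every step it touches.
    trace_id = uuid.uuid4().hex
    start = time.monotonic()
    metrics["orders_handled_total"] += 1
    log_lines.append(json.dumps({
        "ts": time.time(),
        "level": "info",
        "msg": "order received",
        "order_id": order_id,
        "trace_id": trace_id,
    }))
    charge_payment(order_id, trace_id)
    spans.append({
        "trace_id": trace_id,
        "name": "handle_order",
        "duration_s": time.monotonic() - start,
    })
    return trace_id
```

After a call such as `handle_order("o-1")`, the counters answer "how many", the log line answers "what happened to this order", and the two spans sharing one `trace_id` answer "where did the time go".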
While these terms themselves sound great and perhaps simple to implement, a lot of companies are struggling to do just that. To make a change in the mindset of not only developers, but the entire business, takes time and effort. You’ll have to make a conscious choice and potentially hire new talent to make this shift a success.
The VMware Cloud Marketplace team has taken the “you build it, you run it” adage to heart. All our dev teams are responsible for the microservices they’ve built and deployed to production. There are clear escalation paths that help mitigate issues and help set expectations for both the team and our customers.
For observability, we heavily rely on VMware Tanzu Observability by Wavefront and VMware Tanzu Service Mesh. Our microservices emit data to a collector in the Kubernetes cluster we run, and that collector agent sends the data to VMware Tanzu Observability. From there, our teams have a complete overview of what goes on in the application. It’s also set up to send alerts to our internal Slack channels in case something does go wrong. Being “VMware on VMware” means that we feel the same pain as our customers, but in this case it definitely means we get the same power and functionality from our observability platform as a lot of our customers do.
Principle #4: It’s all about people
I realize that I could fill an entire blog post with management quotes on how every business is always about the people. In DevOps, that’s fundamentally true as well. DevOps isn’t a role that you can hire for, but rather a culture where developers and systems engineers work together to not only keep the service running, but also provide value to the business.
The world-famous quote “culture eats strategy for breakfast”, widely attributed to Peter Drucker, is very much true for the mindset of DevOps and points to one of the most common reasons why companies struggle to implement DevOps and SRE mindsets. When “this is the way we do things around here” is a common response to questions or ideas for improvement, there won’t be any culture where people want to work towards the greater benefit of the company. When your teams are remote or distributed, having a good culture is even more important. Just do a Bing search to see how many articles there are on building and maintaining a great culture in remote teams.
One of the pillars of any culture is a shared set of values. These values should align with the core values of the company, but they also should reflect what defines the team. At the Copenhagen Techfestival, there is a think tank that convenes every year to discuss specific topics. The result of those discussions is captured in poster format in the Copenhagen Catalog. If you need inspiration or discussion starters for your culture, this is a great place to start.
At VMware, our core values are quite literally epic, or actually EPICC (there are two Cs in there). The VMware Cloud Marketplace team fundamentally believes in the power of teamwork, and with a distributed team and varying skill levels, that can be a challenge. The engineering leadership team started a discussion to find out what was important to the team members to do their best work. In the end, that resulted in using two posters from the Copenhagen Catalog and one important quote from a movie.
Execution is an important part of the culture at VMware, and to us, great execution starts with being a great team. The ideas that we think of together with our amazing customers to improve the marketplace with every single release are nothing without a team to work on those.
As software engineers, we all have a passion to build the best possible product for our customers, and with passion comes discussion. As a team, we foster discussion and we always believe that every single person on the team acts with positive intent. It also means that, regardless of their position, everyone can suggest ideas and improvements for the marketplace.
One of the most prominent themes from the movie Wonder is being kind. One of the Cs in our core values represents the word community. To us, the community of VMware encompasses every customer, every single person that uses or works with our technology, and our employees. With such a large community, it’s important to be kind, especially with regards to our cultural differences. That’s why we believe you don’t always have to be right, but you always need to be kind!
We wanted to give you a little glimpse of how we built the VMware Cloud Marketplace. Over time, we’ll share more details, including exciting technical details. In the meantime, let us know your thoughts by sending Arun, Leon, or the team a note on Twitter.