devops lessons from the trenches

AUGUST 15, 2024 DEVOPS

As a DevOps intern at Turkcell, I've been immersed in a world where software development and IT operations converge. Recently, I've been reading "The Phoenix Project" by Gene Kim, Kevin Behr, and George Spafford, which has given me remarkable insight into the challenges modern IT organizations face and how they solve them. The novel is a work of business fiction, yet it mirrors many of the real-world scenarios I'm encountering in my internship, which makes it an invaluable resource for understanding DevOps principles in action.

The Book in Context

"The Phoenix Project" follows Bill Palmer, an IT manager at the fictional company Parts Unlimited, who is unexpectedly promoted to VP of IT Operations. The company is struggling with a critical project (the Phoenix Project) that's over budget, behind schedule, and threatening the company's future. Through Bill's journey, the book illustrates how DevOps principles can transform dysfunctional IT organizations into high-performing teams.

What makes this book particularly powerful is that it doesn't just preach theory—it shows DevOps principles applied in situations that IT professionals face daily. As someone at the beginning of my DevOps career at Turkcell, I found several parallels between the fictional Parts Unlimited and real-world enterprise environments.

The Three Ways of DevOps

The book introduces "The Three Ways," which are the underlying principles of DevOps. These principles have shaped my understanding of the work we do at Turkcell:

The First Way: Systems Thinking

This emphasizes the performance of the entire system, not just a specific silo or department. It's about understanding how work flows from development to operations to customers.

At Turkcell, I've observed how our teams are organized to promote end-to-end visibility. Rather than having completely siloed development and operations teams, we have cross-functional teams responsible for specific services. This approach helps everyone understand the impact of their work on the entire system and encourages collaboration.

For example, our CI/CD pipelines provide visibility into code changes from commit to production deployment. This allows us to identify bottlenecks in the delivery process and optimize the flow of work through the system. Implementing monitoring tools that track the entire customer journey, rather than just individual components, has been eye-opening for understanding system-wide performance.

Example of the structured event we use for tracking a request through our system:
{
  "requestId": "a1b2c3d4",
  "service": "customerAPI",
  "timestamp": "2024-08-10T13:45:22Z",
  "eventType": "requestReceived",
  "data": {
    "endpoint": "/users/profile",
    "method": "GET",
    "sourceIP": "10.0.0.1"
  }
}
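
To make the pattern concrete, here is a minimal Python sketch of how events in this shape could be produced and then stitched back into a single request trace. It is illustrative only and not Turkcell's actual tooling: the function names (`new_request_id`, `build_event`, `trace`) and the second service name `billingAPI` are assumptions made up for the example.

```python
import json
from datetime import datetime, timezone
from typing import Optional
from uuid import uuid4


def new_request_id() -> str:
    """Return a short random ID to attach to one incoming request."""
    return uuid4().hex[:8]


def build_event(request_id: str, service: str, event_type: str,
                data: dict, now: Optional[datetime] = None) -> dict:
    """Build one structured event in the same shape as the example above."""
    now = now or datetime.now(timezone.utc)
    return {
        "requestId": request_id,
        "service": service,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "eventType": event_type,
        "data": data,
    }


def trace(events: list, request_id: str) -> list:
    """Return the events that belong to one request, ordered by timestamp."""
    matching = [e for e in events if e["requestId"] == request_id]
    # Timestamps share one fixed-width UTC format, so string order is time order.
    return sorted(matching, key=lambda e: e["timestamp"])


if __name__ == "__main__":
    rid = new_request_id()
    events = [
        build_event(rid, "customerAPI", "requestReceived",
                    {"endpoint": "/users/profile", "method": "GET"}),
        # "billingAPI" is a hypothetical downstream service.
        build_event(rid, "billingAPI", "requestReceived", {"method": "GET"}),
    ]
    print(json.dumps(trace(events, rid), indent=2))
```

The key design choice is that every service reuses the same `requestId`, so filtering on that one field reconstructs the request's path across components.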

The Second Way: Amplify Feedback Loops

This focuses on creating fast feedback loops at all stages of the software development lifecycle, allowing teams to detect and address issues quickly.

One of the most impactful implementations at Turkcell has been our automated testing and quality gates. When developers commit code, they get immediate feedback on build status, test results, code quality, and security vulnerabilities. This fast feedback prevents defects from propagating downstream and reduces the time to fix issues.
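
A quality gate of this kind boils down to a handful of pass/fail checks on each commit. The sketch below is a simplified assumption of what such a gate might look like, not the actual pipeline configuration: the `BuildReport` fields, the use of test coverage as a stand-in for code quality, and the 80% threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BuildReport:
    """Results collected by the pipeline for one commit (hypothetical fields)."""
    build_passed: bool
    tests_failed: int
    coverage_percent: float
    critical_vulnerabilities: int


def evaluate_gate(report: BuildReport, min_coverage: float = 80.0) -> list:
    """Return human-readable failures; an empty list means the gate passes."""
    failures = []
    if not report.build_passed:
        failures.append("build failed")
    if report.tests_failed > 0:
        failures.append(f"{report.tests_failed} test(s) failed")
    if report.coverage_percent < min_coverage:
        failures.append(
            f"coverage {report.coverage_percent:.1f}% is below {min_coverage:.1f}%"
        )
    if report.critical_vulnerabilities > 0:
        failures.append(
            f"{report.critical_vulnerabilities} critical vulnerability finding(s)"
        )
    return failures


if __name__ == "__main__":
    problems = evaluate_gate(BuildReport(True, 2, 72.5, 1))
    print("gate passed" if not problems else "gate failed: " + "; ".join(problems))
```

Returning the full list of failures, rather than stopping at the first one, gives the developer all the feedback in a single pipeline run.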

We've also implemented ChatOps to centralize notifications from our CI/CD pipelines, monitoring systems, and incident management tools. This provides the team with real-time feedback on system health and deployment status, allowing us to respond quickly to issues.

Implementation Example: Feedback Dashboards

At Turkcell, we built dashboards that show real-time metrics about our systems, including:

  • Deployment frequency and success rate
  • Mean time to detect (MTTD) and mean time to resolve (MTTR) for incidents
  • Test coverage and test failure trends
  • Performance metrics like response time and error rates

These dashboards create visibility and promote a culture of continuous improvement by making it easy to spot trends and identify areas for optimization.
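
As a rough illustration of how a few of these dashboard numbers can be derived, here is a small Python sketch. The `Incident` record and function names are assumptions for the example, and it measures both MTTD and MTTR from the moment the incident started; some teams measure MTTR from detection instead, so treat the definitions as one possible convention.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Timestamps for one incident (hypothetical record shape)."""
    started: datetime
    detected: datetime
    resolved: datetime


def mean_minutes(deltas) -> float:
    """Average a series of timedeltas and express the result in minutes."""
    return mean(d.total_seconds() for d in deltas) / 60


def mttd_minutes(incidents) -> float:
    """Mean time to detect: average gap between start and detection."""
    return mean_minutes(i.detected - i.started for i in incidents)


def mttr_minutes(incidents) -> float:
    """Mean time to resolve: average gap between start and resolution."""
    return mean_minutes(i.resolved - i.started for i in incidents)


def success_rate(outcomes) -> float:
    """Fraction of deployments that succeeded; outcomes is a list of booleans."""
    return sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    t0 = datetime(2024, 8, 10, 13, 0)
    incidents = [
        Incident(t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=35)),
        Incident(t0, t0 + timedelta(minutes=15), t0 + timedelta(minutes=65)),
    ]
    print(mttd_minutes(incidents), mttr_minutes(incidents))
    print(success_rate([True, True, False, True]))
```

The functions assume at least one incident and one deployment; an empty input raises an error rather than silently reporting zero.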

The Third Way: Culture of Experimentation and Learning

This emphasizes continuous experimentation, taking risks, and learning from success and failure to improve daily work.

At Turkcell, we've embraced this principle by allocating time for innovation and experimentation. Every two weeks, we have a "DevOps Dojo" session where team members can explore new tools, techniques, or improvements to our existing processes.

We've also implemented a blameless post-mortem process for incidents. Rather than pointing fingers, we focus on understanding what happened, why it happened, and how we can improve our systems to prevent similar incidents in the future. This approach has fostered a culture where team members feel safe to take calculated risks and learn from failures.

The Four Types of Work

Another crucial concept from "The Phoenix Project" is the four types of work:

  1. Business Projects: Initiatives that drive business outcomes
  2. Internal IT Projects: Improvements to internal systems and processes
  3. Changes: Updates to existing systems
  4. Unplanned Work: Firefighting, incidents, and other unplanned activities

Understanding and balancing these types of work has been crucial in my role at Turkcell. As an intern, I've witnessed how unplanned work can disrupt planned work and lead to a cycle of technical debt and more unplanned work. To break this cycle, our team has implemented several practices:

  • Allocating dedicated capacity for internal projects and technical debt reduction
  • Implementing change management processes to reduce risk
  • Using incident patterns to identify systemic issues
  • Visualizing all work in progress to ensure we're not overcommitting
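
One simple way to make the balance between the four types visible is to tally where the hours actually go and to check work in progress against a limit. The following Python sketch assumes a hypothetical list of `(WorkType, hours)` entries; the names and the sample numbers are invented for illustration and are not real team data.

```python
from collections import Counter
from enum import Enum


class WorkType(Enum):
    BUSINESS_PROJECT = "business project"
    INTERNAL_PROJECT = "internal project"
    CHANGE = "change"
    UNPLANNED = "unplanned work"


def share_by_type(items) -> dict:
    """items is a list of (WorkType, hours) pairs; returns each type's share of total hours."""
    hours = Counter()
    for work_type, spent in items:
        hours[work_type] += spent
    total = sum(hours.values())
    # Types with no logged hours still appear, with a share of 0.
    return {t: hours[t] / total for t in WorkType}


def wip_exceeded(in_progress: int, wip_limit: int) -> bool:
    """True when more items are in progress than the agreed limit allows."""
    return in_progress > wip_limit


if __name__ == "__main__":
    logged = [
        (WorkType.BUSINESS_PROJECT, 20),
        (WorkType.INTERNAL_PROJECT, 5),
        (WorkType.CHANGE, 5),
        (WorkType.UNPLANNED, 10),
        (WorkType.UNPLANNED, 10),
    ]
    for work_type, share in share_by_type(logged).items():
        print(f"{work_type.value}: {share:.0%}")
```

A rising share of unplanned work over successive weeks is the signal worth watching, since it is the type that crowds out the other three.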

Applying "The Phoenix Project" Lessons at Turkcell

Constraint Identification

In the book, Erik (the mysterious mentor) introduces Bill to the Theory of Constraints, teaching him to identify and manage the bottlenecks in his IT systems. At Turkcell, we've applied this concept by:

  • Mapping our value streams to identify bottlenecks
  • Collecting metrics at each stage to quantify flow and identify constraints
  • Implementing WIP (work-in-progress) limits to prevent overloading constrained resources
  • Continuously reassessing our constraints as we optimize processes

Example deployment pipeline analysis showing our constraint identification:

[Code Commit] → [Build] → [Automated Tests] → [Security Scan] → [Deployment]
    5 min        3 min       25 min (!)           10 min           2 min

The automated test stage is our current constraint at 25 minutes. We're focusing on optimizing test execution through parallelization and selective testing based on changed components.
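
The stage timings above can be turned into a small calculation. This Python sketch uses the durations from the pipeline diagram and treats the slowest stage as the constraint, which is a reasonable proxy for a strictly serial pipeline but not a full Theory of Constraints analysis. The function names and the choice of five parallel workers are assumptions for illustration.

```python
# Stage durations in minutes, taken from the pipeline analysis above.
PIPELINE = {
    "Code Commit": 5,
    "Build": 3,
    "Automated Tests": 25,
    "Security Scan": 10,
    "Deployment": 2,
}


def find_constraint(stage_minutes: dict) -> tuple:
    """Return (stage, minutes) for the slowest stage in the pipeline."""
    stage = max(stage_minutes, key=stage_minutes.get)
    return stage, stage_minutes[stage]


def share_of_total(stage_minutes: dict, stage: str) -> float:
    """Fraction of end-to-end pipeline time spent in one stage."""
    return stage_minutes[stage] / sum(stage_minutes.values())


def parallel_estimate(minutes: float, workers: int) -> float:
    """Best-case duration if the stage splits evenly across workers.

    Ignores scheduling overhead and uneven test durations, so real
    gains will be smaller.
    """
    return minutes / workers


if __name__ == "__main__":
    stage, minutes = find_constraint(PIPELINE)
    print(f"constraint: {stage} ({minutes} min, "
          f"{share_of_total(PIPELINE, stage):.0%} of total)")
    print(f"best case with 5 workers: {parallel_estimate(minutes, 5)} min")
```

Once the test stage is shortened, the same calculation should be rerun, because the constraint moves to whichever stage is then the slowest.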

Automating Repetitive Tasks

One of the transformative aspects of the Phoenix Project story is how the team gradually automates manual, error-prone processes. At Turkcell, we've been on a similar journey:

  • Creating self-service portals for common requests
  • Implementing infrastructure as code (IaC) using Terraform
  • Building automated deployment pipelines with Jenkins and GitLab CI
  • Using Ansible for configuration management

These automations have significantly reduced lead time for changes and decreased the number of incidents caused by manual errors.

Security as a Shared Responsibility

In the book, security (represented by John) is initially seen as an adversary, blocking progress. Eventually, John becomes integrated into the development process. At Turkcell, we've worked to shift security left by:

  • Integrating security scanning into our CI/CD pipelines
  • Implementing policy-as-code with tools like OPA (Open Policy Agent)
  • Conducting regular security training for developers
  • Creating reusable security patterns that developers can easily adopt

This approach has improved our security posture while maintaining development velocity.

Challenges and Lessons Learned

While "The Phoenix Project" provides an excellent framework for DevOps transformation, implementing these principles in a large organization like Turkcell comes with challenges:

Cultural Resistance

Like Parts Unlimited in the book, cultural resistance has been one of the biggest challenges. Teams comfortable with traditional ways of working are often hesitant to adopt new practices. We've addressed this by:

  • Starting with small wins to demonstrate value
  • Identifying and empowering champions within each team
  • Providing training and support for new tools and practices
  • Celebrating successes and sharing lessons learned

Tool Proliferation

As we've automated more processes, we've faced the challenge of tool proliferation. To manage this complexity, we've:

  • Created a DevOps platform team to standardize tools and practices
  • Implemented an internal developer platform for self-service capabilities
  • Documented our toolchain and integration points
  • Regularly evaluated and consolidated tools where appropriate

Balancing Speed and Stability

Finding the right balance between speed and stability is an ongoing challenge. We've implemented several practices to maintain this balance:

  • Feature flags to separate deployment from release
  • Canary deployments to limit the impact of changes
  • Automated rollback capabilities for deployments
  • SLOs (Service Level Objectives) to objectively measure reliability
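
Two of these practices are easy to illustrate in a few lines of Python: a percentage-based feature flag and an SLO error budget. This is a minimal sketch under stated assumptions, not a description of any specific feature-flag product or of Turkcell's setup. The function names, the flag name, and the sample figures are hypothetical.

```python
import hashlib


def in_rollout(flag_name: str, user_id: str, percent: int) -> bool:
    """Deterministically place a user in or out of a percentage rollout.

    Hashing flag and user together gives each user a stable bucket from
    0 to 99, so the same user always sees the same behavior for a flag.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent


def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative when overspent).

    slo_target is the success ratio the service promises, e.g. 0.999,
    and must be below 1.0 so that some failures are allowed.
    """
    allowed_failures = (1 - slo_target) * total_requests
    return 1 - failed_requests / allowed_failures


if __name__ == "__main__":
    # "new-profile-page" is a made-up flag name.
    print(in_rollout("new-profile-page", "user-42", 10))
    print(round(error_budget_remaining(0.999, 1_000_000, 250), 3))
```

The two pieces work together: a shrinking error budget is an objective signal to slow a rollout down, while a healthy one is a signal that the percentage can be raised.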

Conclusion

As a DevOps intern at Turkcell, reading "The Phoenix Project" has given me a broader perspective on the challenges and opportunities in enterprise IT. The principles outlined in the book—The Three Ways, the Theory of Constraints, and the Four Types of Work—have provided a framework for understanding and improving our processes.

What makes "The Phoenix Project" powerful is that it's not just a technical manual; it's a story about people, processes, and culture. As I continue my internship at Turkcell, I'm applying these lessons to become a more effective DevOps practitioner, focusing not just on tools and technology but on the human and organizational aspects of IT transformation.

If you're working in IT, especially in a large enterprise environment, I highly recommend reading "The Phoenix Project." It will provide you with both the technical understanding and the vocabulary to drive meaningful change in your organization.

And if you're an intern like me, this book offers an invaluable shortcut to understanding the complex dynamics of enterprise IT and DevOps transformation—knowledge that might otherwise take years of experience to acquire.