Highlights
- What this covers: 30 real cloud interview questions that apply across AWS, Azure, and GCP.
- Format: each answer is what a strong candidate says, plus what the interviewer is really testing.
- Who it is for: aspiring cloud engineers and anyone at Cloud Practitioner level and up.
- Provider-agnostic: concepts first, with AWS/Azure/GCP only as examples, so the knowledge transfers.
- The big one: the shared responsibility model comes up in almost every cloud interview, so it is here in detail.
- How to use it: understand the concepts, then map them to your target provider's names. The ideas outlast the service menus.
The job ad says "AWS" or "Azure" or "GCP", and you worry you studied the wrong one. Then the interview starts and the questions are not really about any single provider: what is the shared responsibility model, when does elasticity matter, how do you design for high availability. The vendors differ in naming, but the concepts underneath are the same, and those concepts are what cloud interviews actually test.
That is the good news for preparing. Learn the provider-agnostic fundamentals (the service models, the shared responsibility boundary, scaling, storage types, availability, security, and cost) and you can speak intelligently about any cloud, then map the specifics to whichever one the job uses. The questions below are the ones that come up across providers and roles, grouped from the fundamentals you must not fumble to the design scenarios that show you can think like a cloud engineer. Each answer is written the way you would say it in the room, with a note on what the interviewer is really probing. These are conceptual answers, verified against authoritative cloud references rather than command output.
How Cloud Interviews Actually Work
Cloud interviews tend to come in three shapes, often mixed together.
Fundamentals. Cloud models, benefits, the shared responsibility model, scaling. Crisp, one or two sentences each, and they filter out people who only know one provider's buttons.
Concept depth and selection. When to use object versus block storage, high availability versus disaster recovery, serverless versus containers. This is where the signal is, because the terms sound similar until you understand them.
Architecture scenarios. "Design a scalable web app", "a cost bill spiked, what do you do", "how would you secure this environment". These reward judgment and naming trade-offs.
One anchor before the questions: cloud is about renting capability on demand and being deliberate about the trade-offs (cost versus resilience, control versus convenience, speed versus durability). The strong answers name those trade-offs. If you are early in the journey, KodeKloud's What Is Cloud Computing? guide is a solid grounding.
Fundamentals
Q1. What is cloud computing?
Cloud computing is the delivery of computing resources (servers, storage, databases, networking, software) over the internet on demand, where you pay only for what you use instead of buying and running your own hardware. Rather than waiting weeks for a server, you provision one in minutes and release it when you are done. The framing that lands: the cloud turns computing into a utility you rent and scale on demand, shifting a big upfront investment into a flexible operating cost.
What they're really testing: that you understand the on-demand, pay-as-you-go model, not just "servers on the internet."
Q2. What is the difference between IaaS, PaaS, and SaaS?
Three levels of how much the provider manages. IaaS gives you raw infrastructure (virtual machines, storage, networking) and you manage the OS and up. PaaS gives you a managed platform to deploy apps without managing the OS or runtime. SaaS is finished software you simply use. As you move from IaaS to SaaS you trade control for less operational burden. Concrete examples (a cloud VM, a managed app platform, a web email service) make it land, and this model is the foundation for the shared responsibility question (Q6).
Q3. What is the difference between public, private, hybrid, and multi-cloud?
Public cloud uses a provider's shared infrastructure. Private cloud is dedicated to one organization (on-premises or hosted). Hybrid connects public and private so workloads can span both, common when some systems must stay on-premises for compliance. Multi-cloud means using more than one public provider, often to avoid lock-in or use each one's strengths. The nuance to add: hybrid is about public plus private, multi-cloud is about multiple public providers, and they are not the same thing, which candidates often blur.
Q4. What are the main benefits of the cloud?
A handful worth naming: elasticity (scale up and down with demand), agility (provision in minutes, experiment cheaply), global reach (deploy near users worldwide), no capital expense (pay as you go instead of buying hardware), and managed reliability (the provider runs the data centers). The one that matters most depends on the business, but the through-line is converting fixed cost and slow provisioning into flexible, on-demand capacity. Tying benefits to business outcomes (speed, cost, scale) is stronger than listing adjectives.
Q5. What is the difference between CapEx and OpEx, and how does the cloud change it?
CapEx (capital expenditure) is large upfront spending on owned assets, like buying servers. OpEx (operating expenditure) is ongoing spending on consumed services, like a monthly cloud bill. The cloud shifts IT from CapEx to OpEx: no big purchase, you pay for usage and stop paying when you stop using. The business angle interviewers like: this lowers the barrier to start, matches spend to actual demand, and avoids over-provisioning for a peak that may never come.
Q6. What is the shared responsibility model?
It is the framework that splits security duties between the provider and you. The provider is responsible for the security of the cloud (the physical data centers, hardware, and core infrastructure), and you are responsible for security in the cloud (your data, access management, configuration, and, depending on the model, the OS and applications). Crucially, the boundary moves with the service model: with IaaS you secure the OS and up, with PaaS the provider handles the runtime, and with SaaS your responsibility narrows to data and user access. Most cloud breaches are customer-side misconfigurations, not provider failures, which is exactly why this is asked so often. Our shared responsibility model explainer goes deeper.
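The moving boundary can be sketched as a small lookup. This is an illustrative model only: the layer names and the cut-off chosen for each service model are simplifying assumptions based on the description above, and real provider documentation draws the lines in more detail.

```python
# Layers from the bottom of the stack to the top (illustrative names, not a provider's list).
LAYERS = ["physical", "virtualization", "os", "runtime", "application", "data_and_access"]

# The first layer the customer secures under each service model (assumed mapping).
CUSTOMER_STARTS_AT = {"iaas": "os", "paas": "application", "saas": "data_and_access"}


def responsibilities(model: str) -> dict[str, str]:
    """Return who secures each layer for the given service model."""
    start = LAYERS.index(CUSTOMER_STARTS_AT[model.lower()])
    return {
        layer: ("customer" if index >= start else "provider")
        for index, layer in enumerate(LAYERS)
    }


for model in ("iaas", "paas", "saas"):
    print(model, responsibilities(model))
```

Whatever the model, the last layer (your data and who can access it) stays with you, which is the point to land in the interview.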
Q7. What is the difference between scalability and elasticity?
They sound alike but differ. Scalability is the ability to handle growth by adding capacity, which can be planned and gradual. Elasticity is scaling out and back in automatically, in real time, as demand fluctuates, so you add instances during a traffic spike and remove them after. The distinction interviewers want: scalability is about being able to grow, elasticity is about doing it automatically and in both directions, which is what keeps you from paying for idle capacity. Elasticity is one of the defining advantages of cloud over a fixed data center.
Q8. What is the difference between vertical and horizontal scaling?
Vertical scaling (scaling up) means a more powerful machine, more CPU or memory, which is simple but capped and often needs downtime. Horizontal scaling (scaling out) means adding more machines behind a load balancer, which scales much further and is the cloud-native approach, usually paired with auto-scaling. The trade-off: vertical is easy but limited, horizontal is more complex (your app must support running as multiple instances) but far more scalable and resilient.
Q9. What is the difference between a region and an availability zone?
A region is a geographic location containing data centers. An availability zone is one or more physically isolated data centers within a region, each with independent power, cooling, and networking. You deploy across multiple availability zones for high availability (surviving a single data center failure) and across regions for disaster recovery or to serve users closer to where they are. The pattern to state: multiple zones for resilience within a region, multiple regions for geographic reach and disaster recovery.
Compute, Storage, and Networking
Q10. What are the main compute options in the cloud?
Three broad models. Virtual machines give you full control of an OS, the most flexible and the most to manage. Containers package an app with its dependencies to run consistently anywhere, usually orchestrated (Kubernetes) for scale. Serverless (functions) runs your code on demand with no servers to manage at all, scaling automatically. The choice is about control versus operational overhead: VMs for full control or legacy apps, containers for portable microservices, serverless for event-driven workloads where you want zero infrastructure management.
Q11. What is the difference between object, block, and file storage?
Three storage types for different needs.
The one-liner: object for unstructured data at scale, block for the disk behind a VM or database, file for a shared filesystem multiple machines mount. Picking the right one by data shape is the whole point.
Q12. What is a CDN?
A content delivery network is a globally distributed set of edge servers that cache your content close to users, so a request is served from a nearby location instead of crossing the world to your origin. The benefits: lower latency, less load on your origin servers, and resilience to traffic spikes. You use it for static assets (images, video, scripts) and increasingly for dynamic content too. The point to make: a CDN improves user experience and offloads your backend by serving cached copies from the edge.
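The caching behaviour at one edge location can be sketched in a few lines. This is a toy model, not how any particular CDN is implemented: `EdgeCache`, `fetch_from_origin`, and the fixed TTL are hypothetical names and simplifications chosen for illustration.

```python
import time


class EdgeCache:
    """Toy edge node: serve from cache while fresh, otherwise go back to the origin."""

    def __init__(self, fetch_from_origin, ttl_seconds=60.0, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin  # hypothetical callable: path -> body
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.store = {}  # path -> (expires_at, body)
        self.origin_hits = 0

    def get(self, path):
        now = self.clock()
        entry = self.store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the origin is not contacted
        # Cache miss or expired entry: fetch once and keep a copy for later requests.
        self.origin_hits += 1
        body = self.fetch_from_origin(path)
        self.store[path] = (now + self.ttl_seconds, body)
        return body
```

Every request answered from `store` is one the origin never sees, which is the "offloads your backend" half of the answer.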
Q13. What does a load balancer do?
It distributes incoming traffic across multiple servers so no single one is overwhelmed, which is what makes horizontal scaling and high availability work. It also health-checks targets and stops sending traffic to unhealthy ones, so a failed instance is bypassed automatically. The two wins to name: it spreads load for scale, and it routes around failures for availability. It is the piece that sits in front of your fleet and makes many instances look like one endpoint.
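Both wins, spreading load and routing around failures, show up in even a minimal round-robin sketch. Real load balancers run their own health probes and offer other algorithms; here `mark` stands in for the result of a health check, and all names are hypothetical.

```python
class RoundRobinBalancer:
    """Rotate through targets in order, skipping any that are marked unhealthy."""

    def __init__(self, targets):
        self.targets = list(targets)
        self.healthy = set(self.targets)
        self._next = 0

    def mark(self, target, healthy):
        """Record the outcome of a health check for one target."""
        if healthy:
            self.healthy.add(target)
        else:
            self.healthy.discard(target)

    def pick(self):
        """Return the next healthy target, or fail if none are left."""
        for _ in range(len(self.targets)):
            target = self.targets[self._next]
            self._next = (self._next + 1) % len(self.targets)
            if target in self.healthy:
                return target
        raise RuntimeError("no healthy targets")
```

Callers only ever talk to `pick`, which is the "many instances look like one endpoint" idea in miniature.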
Q14. What is auto-scaling?
Auto-scaling automatically adjusts the number of running instances based on real-time metrics like CPU, memory, or request count, adding capacity when demand rises and removing it when demand falls. It is how you achieve elasticity (Q7) in practice. The benefits are both performance (you handle spikes without manual intervention) and cost (you are not paying for idle capacity off-peak). Pairing it with a load balancer is the standard pattern for a scalable, cost-efficient tier.
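A common way to turn a metric into an instance count is proportional: scale the current count by how far the metric is from its target, then clamp to a minimum and maximum. The sketch below assumes that rule; the function name, parameters, and default bounds are hypothetical, and real auto-scalers add cooldowns and smoothing on top.

```python
import math


def desired_instances(current, metric, target, min_size=1, max_size=10):
    """Proportional scaling: grow or shrink so the per-instance metric approaches the target."""
    raw = math.ceil(current * metric / target)
    # Clamp so a metric spike cannot scale without limit and a quiet period never reaches zero.
    return max(min_size, min(max_size, raw))


# 4 instances at 90% average CPU with a 60% target suggests scaling out to 6.
print(desired_instances(current=4, metric=90, target=60))
# The same fleet at 30% CPU suggests scaling in to 2.
print(desired_instances(current=4, metric=30, target=60))
```

The `max_size` bound is also a cost guardrail: it caps what a runaway scale-out can spend.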
Q15. What is a VPC (virtual private cloud)?
A VPC is your own logically isolated network within the public cloud, where you control the IP address range, subnets, routing, and gateways, and place resources like VMs securely. It is the networking foundation: you decide what is reachable from the internet and what stays private. Every provider has its own name (VPC on AWS and GCP, Virtual Network on Azure), but the concept is the same, a private slice of the cloud network that you define and secure.
Q16. What is the difference between a public and a private subnet?
It comes down to internet reachability. A public subnet has a route to an internet gateway, so resources in it (a load balancer, a bastion host) can be reached from or reach the internet. A private subnet has no direct internet route, so resources in it (databases, application servers) are isolated from inbound internet traffic and reach out only through a NAT gateway if needed. The design principle: put only what must be internet-facing in public subnets and keep everything else private, which is a core security pattern (Q30).
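The split is easy to see with Python's standard `ipaddress` module. The address range, the two subnets, and the route-table dictionary below are made-up examples for illustration, not any provider's API.

```python
import ipaddress

# Hypothetical VPC range, carved into /24 subnets.
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = list(vpc.subnets(new_prefix=24))
public_subnet, private_subnet = subnets[0], subnets[1]

# Simplified route tables: what matters is where the default route (0.0.0.0/0) points.
route_tables = {
    public_subnet: {"0.0.0.0/0": "internet-gateway"},
    private_subnet: {"0.0.0.0/0": "nat-gateway"},  # outbound only, no inbound path
}


def is_internet_facing(subnet):
    """A subnet is public when its default route goes to an internet gateway."""
    return route_tables[subnet].get("0.0.0.0/0") == "internet-gateway"


print(public_subnet, is_internet_facing(public_subnet))
print(private_subnet, is_internet_facing(private_subnet))
```

Nothing about the subnet's addresses makes it public or private; the route table does.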
Q17. What is serverless computing?
Serverless lets you run code without provisioning or managing any servers: you deploy a function, the provider runs it on demand, scales it automatically (including down to zero when idle), and you pay only per execution. It is ideal for event-driven and spiky workloads. The trade-offs to name honestly: you give up control of the environment, there can be cold-start latency when a function spins up after being idle, and long-running or steady high-throughput workloads can be cheaper on containers or VMs. "No servers to manage, pay per use, but watch cold starts" is the balanced answer.
Reliability, Security, and Cost
Q18. What is the difference between high availability, fault tolerance, and disaster recovery?
Three related but distinct goals. High availability minimizes downtime through redundancy and failover, so the system stays up through most failures (but a failover may cause a brief blip). Fault tolerance is stronger: the system continues with zero interruption even when a component fails, usually through full redundancy, at higher cost. Disaster recovery is the plan and tooling to restore service after a major event (a region outage), often involving backups and a secondary region. The progression: HA reduces downtime, fault tolerance eliminates it for covered failures, DR gets you back after a catastrophe.
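A quick calculation shows why redundancy matters. It assumes replicas fail independently, which real systems only approximate (shared dependencies and correlated failures lower the real figure), and the function names are illustrative.

```python
def combined_availability(single, replicas):
    """Availability of N redundant replicas, assuming independent failures."""
    return 1 - (1 - single) ** replicas


def downtime_minutes_per_year(availability):
    """Expected downtime per year implied by an availability figure."""
    return (1 - availability) * 365 * 24 * 60


one_zone = 0.99
two_zones = combined_availability(one_zone, 2)
print(f"one zone:  {downtime_minutes_per_year(one_zone):.0f} minutes of downtime per year")
print(f"two zones: {downtime_minutes_per_year(two_zones):.0f} minutes of downtime per year")
```

Under these assumptions, a second zone turns 99% into 99.99%. That is the arithmetic behind "spread across multiple availability zones".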
Q19. What is the difference between RTO and RPO?
Two disaster-recovery targets people swap. RTO (recovery time objective) is the maximum acceptable time to restore service after an outage, how long you can be down. RPO (recovery point objective) is the maximum acceptable data loss, measured as a time window, how much recent data you can afford to lose. A four-hour RTO means recover within four hours; a 15-minute RPO means lose at most 15 minutes of data, which dictates how often you back up or replicate. Getting these the right way round (RTO is time to recover, RPO is data loss) is the whole test.
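The two targets turn into two simple checks. In this sketch the worst-case data loss is taken to be one full backup interval, a simplification that ignores how long the backup itself takes; the function names are hypothetical.

```python
from datetime import timedelta


def meets_rpo(backup_interval, rpo):
    """Worst case, you lose everything written since the last backup or replication."""
    return backup_interval <= rpo


def meets_rto(estimated_restore_time, rto):
    """The restore has to finish within the time you can afford to be down."""
    return estimated_restore_time <= rto


rpo = timedelta(minutes=15)
rto = timedelta(hours=4)

print(meets_rpo(timedelta(hours=1), rpo))    # hourly backups cannot meet a 15-minute RPO
print(meets_rpo(timedelta(minutes=5), rpo))  # replication every 5 minutes can
print(meets_rto(timedelta(hours=3), rto))    # a 3-hour restore fits a 4-hour RTO
```

RPO drives how often you copy data; RTO drives how fast you can bring it back.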
Q20. What are the basics of cloud security and IAM?
Identity and access management controls who can do what on which resources, and the governing principle is least privilege: grant only the permissions needed, nothing more. Beyond IAM, the basics include using roles and temporary credentials instead of long-lived keys, enabling multi-factor authentication, segmenting networks (Q16), encrypting data (Q21), and logging everything for audit. The mindset to convey: in the cloud, identity is the new perimeter, so tight, least-privilege IAM is the single highest-impact security control.
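Least privilege is easier to explain with a tiny evaluator in front of you. The statement format below is hypothetical and far simpler than any provider's policy language; it assumes the common rules that nothing is allowed unless a statement allows it and that an explicit deny wins over any allow.

```python
from fnmatch import fnmatchcase


def is_allowed(statements, action, resource):
    """Default deny; an explicit deny beats any allow; '*' in a pattern is a wildcard."""
    allowed = False
    for statement in statements:
        matches = (fnmatchcase(action, statement["action"])
                   and fnmatchcase(resource, statement["resource"]))
        if not matches:
            continue
        if statement["effect"] == "deny":
            return False
        allowed = True
    return allowed


# Hypothetical role: may read reports, and nothing else.
report_reader = [
    {"effect": "allow", "action": "storage:GetObject", "resource": "reports/*"},
]

print(is_allowed(report_reader, "storage:GetObject", "reports/q1.csv"))     # read is granted
print(is_allowed(report_reader, "storage:DeleteObject", "reports/q1.csv"))  # delete was never granted
```

The second call is refused because no statement ever granted it, not because anything blocked it. That is least privilege in practice.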
Q21. What is the difference between encryption at rest and in transit?
At rest protects stored data (in databases, object storage, disks) by encrypting it on disk, so a stolen drive or storage account is unreadable. In transit protects data moving across the network (between client and server, or service to service) using TLS, so it cannot be intercepted or tampered with in flight. You want both, since they defend different attack points, and most providers make at-rest encryption easy to enable by default. Naming both, and that they cover different threats, is the complete answer.
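For the in-transit half, Python's standard `ssl` module shows what "using TLS" means on the client side. This sketch covers only in-transit protection; at-rest encryption is usually a provider setting or a dedicated library and is not shown. The helper name is hypothetical.

```python
import ssl


def strict_client_context():
    """Client-side TLS settings: verify the server certificate and refuse old protocol versions."""
    context = ssl.create_default_context()  # certificate and hostname checks are on by default
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = strict_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

Certificate verification is what stops interception. Encryption without it only protects you against passive listeners.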
Q22. How do you control cloud costs?
Several levers, and a good answer names a few. Right-size over-provisioned resources and shut down non-production environments off-hours. Use auto-scaling so you pay for capacity only when needed. Buy reservations or savings plans for steady workloads and spot/preemptible instances for interruptible ones. Move cold data to cheaper storage tiers. And set up cost monitoring, budgets, and alerts with tagging so you can see where money goes. The mindset to convey: cost optimization is continuous and everyone's responsibility, not a one-time cleanup, because the same on-demand ease that helps you scale also makes it easy to overspend.
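The "reservations for steady workloads" lever comes down to a break-even calculation. The hourly rates below are invented for illustration and are not real provider prices; the sketch assumes a reservation is billed for every hour while on-demand is billed only for hours used.

```python
def break_even_utilization(on_demand_hourly, reserved_hourly):
    """Fraction of the time an instance must run before a reservation becomes cheaper."""
    return reserved_hourly / on_demand_hourly


def cheaper_option(on_demand_hourly, reserved_hourly, utilization):
    """Pick the cheaper pricing model for a given utilization (0.0 to 1.0)."""
    if utilization > break_even_utilization(on_demand_hourly, reserved_hourly):
        return "reserved"
    return "on-demand"


# Hypothetical rates: 0.10 per hour on demand, 0.06 per hour effective when reserved.
print(cheaper_option(0.10, 0.06, utilization=0.9))  # steady workload
print(cheaper_option(0.10, 0.06, utilization=0.3))  # runs only part of the time
```

With these made-up numbers the break-even is 60% utilization, which is why reservations suit always-on workloads and not occasional ones.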
Q23. What is infrastructure as code, and why does it matter in the cloud?
Infrastructure as code means defining your cloud resources in version-controlled configuration files (Terraform, CloudFormation, Bicep) instead of clicking through a console. You describe the desired state and a tool provisions it. It matters because it makes infrastructure repeatable, reviewable, and auditable: you can recreate an identical environment from a file, see every change in version history, and avoid the "it works in prod only because someone configured it by hand" drift. In the cloud, where you create and destroy resources constantly, IaC is what keeps it all reproducible.
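The "describe the desired state and a tool provisions it" idea boils down to a diff between what the files say and what exists. The sketch below is a conceptual model of that planning step, not how Terraform or any other tool is implemented; the resource names and attributes are invented.

```python
def plan(desired, actual):
    """Compare desired state (from files) with actual state (from the cloud) and list the changes."""
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "update": sorted(name for name in desired.keys() & actual.keys()
                         if desired[name] != actual[name]),
        "delete": sorted(actual.keys() - desired.keys()),
    }


# Hypothetical resources: what the repository declares versus what is running.
desired = {"web": {"size": "small", "count": 3}, "db": {"size": "large"}}
actual = {"web": {"size": "small", "count": 2}, "old-worker": {"size": "small"}}

print(plan(desired, actual))
```

When the plan comes back empty, the environment matches the files, and that is the property that rules out hand-configured drift.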
Q24. What is the difference between AWS, Azure, and GCP at a high level?
They are the three major public clouds offering similar core services (compute, storage, networking, databases) under different names. AWS is the largest with the broadest service catalog and longest track record. Azure is strong in the enterprise and hybrid space with deep Microsoft ecosystem integration. GCP is known for data, analytics, and Kubernetes (which Google originated). The smart framing: the fundamentals transfer across all three, the differences are naming, ecosystem fit, and specific strengths, and which is "best" depends on the organization's existing stack and needs. Our cloud certification roadmap compares the paths, and for a provider-specific deep dive our Azure interview guide drills into one stack.
Advanced and Scenarios
Q25. What is cloud-native, and how do microservices differ from a monolith?
Cloud-native describes applications designed to exploit the cloud: built as small, independent services, packaged in containers, scaled elastically, and automated through CI/CD. A monolith is a single deployable unit where all functionality is tightly coupled, simple to start but hard to scale and change as it grows. Microservices split the app into independently deployable services that scale and fail in isolation, at the cost of operational and networking complexity. The honest framing: microservices are not automatically better, they trade simplicity for independent scaling and deployment, and a monolith is often the right call early on.
Q26. What is a managed service, and why would you use one?
A managed service is one where the provider handles the operational heavy lifting (provisioning, patching, scaling, backups, availability) so you consume the capability without running the underlying software, for example a managed database instead of installing one on a VM. You use it to offload undifferentiated operational work and reduce risk, trading some control and potentially higher cost for far less maintenance. The judgment to show: managed services let a small team punch above its weight, and the trade-off is less control and possible lock-in, which you weigh per workload.
Q27. What are the common cloud migration strategies?
The well-known set is the "R" strategies. Rehost ("lift and shift") moves an app to the cloud largely unchanged, which is fast but gains few cloud-native benefits. Replatform makes a few optimizations (such as moving to a managed database) without rearchitecting. Refactor (re-architect) redesigns the app to be cloud-native, the most effort but the most benefit. There are others (repurchase, retire, retain), but those three are the spectrum to know. The reasoning to convey: you choose per application based on its value and how much benefit justifies the effort, and most real migrations are a mix.
Q28. Design a scalable, highly available web application in the cloud. What are the building blocks?
Walk the layers. Put your app instances (or containers) behind a load balancer, spread across multiple availability zones so one data center failing does not take you down. Add auto-scaling so capacity follows demand. Use a managed, replicated database (multi-AZ) for the data tier, and object storage plus a CDN for static assets. Keep app and database tiers in private subnets, exposing only the load balancer. Define it all with infrastructure as code, and for disaster recovery replicate to a second region. The structure they want: redundancy at every layer, automatic scaling, and no single point of failure. You can build patterns like this hands-on in the AWS sandbox playground.
Q29. A cloud bill suddenly spiked. How do you investigate?
Methodically, using the provider's cost tooling. Open the cost management dashboard and break the spend down by service, region, and tag to find what jumped. Common culprits: a forgotten resource left running, auto-scaling that ran away under load or attack, data egress charges, a misconfigured backup, or untagged resources from a new deployment. Once you find it, fix the immediate cause, then prevent recurrence with budgets and alerts, tagging policies, and guardrails. The signal here is a calm, data-driven process (look at the breakdown first) rather than guessing, plus the instinct to add alerts so it cannot silently happen again.
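"Look at the breakdown first" can be sketched as grouping two billing periods by a dimension and ranking the increases. The record shape here (dictionaries with a `cost` field plus dimensions such as `service`) is an assumption for illustration; real billing exports have their own schemas.

```python
from collections import defaultdict


def cost_by(records, key):
    """Total cost per value of one dimension; records missing the dimension count as 'untagged'."""
    totals = defaultdict(float)
    for record in records:
        totals[record.get(key, "untagged")] += record["cost"]
    return dict(totals)


def biggest_increases(previous, current, key="service", top=3):
    """Rank the values of a dimension by how much their cost grew between two periods."""
    before, after = cost_by(previous, key), cost_by(current, key)
    deltas = {name: after.get(name, 0.0) - before.get(name, 0.0)
              for name in before.keys() | after.keys()}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)[:top]


# Hypothetical billing records for two months.
last_month = [{"service": "compute", "cost": 100.0}, {"service": "storage", "cost": 40.0}]
this_month = [{"service": "compute", "cost": 110.0}, {"service": "storage", "cost": 40.0},
              {"service": "egress", "cost": 250.0}]

print(biggest_increases(last_month, this_month))
```

Running the same ranking with `key="region"` or a tag name narrows the spike down further, and a large "untagged" bucket is itself a finding.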
Q30. How would you secure a cloud environment?
Layer the controls, because security is not one setting. Start with identity: least-privilege IAM, roles over long-lived keys, and MFA. Then network: keep databases and app servers in private subnets, expose only what must be public, and use security groups and firewalls to restrict traffic. Protect data with encryption at rest and in transit. Manage secrets in a dedicated vault, never in code. And monitor and audit with logging and alerting so you can detect and respond. Tie it back to the shared responsibility model (Q6): the provider secures the infrastructure, but configuring all of this correctly is on you, which is where most breaches actually happen.
Quick-Revision Cheat Sheet
The night before, scan this instead of rereading the guide.
Conclusion
The thread through every answer is that cloud is about renting capability on demand and choosing trade-offs deliberately: cost against resilience, control against convenience, simplicity against scale. The vendors rename things, but the fundamentals (the service models, the shared responsibility boundary, scaling and elasticity, storage types, availability, and security) are stable and provider-agnostic, which is exactly why interviews lean on them. Learn the concepts and the provider specifics become a vocabulary you map onto them.
In the last 48 hours, do not try to memorize one provider's whole service list. Make sure you can explain the shared responsibility model and walk through one highly available design out loud, because those two come up constantly. Then get hands on a free tier and actually build a small piece of it. KodeKloud's 100 Days of Cloud challenge is built for exactly that, with daily hands-on tasks that build AWS and Azure side by side, and the Cloud learning path and AWS Cloud Practitioner course build the fundamentals with hands-on labs.
Ready to Build in the Cloud, Not Just Read About It?
Reading cloud answers is one thing. Standing up a network, deploying across availability zones, and locking down access with least-privilege IAM are different skills, and they only come from doing the work. KodeKloud's Cloud learning path takes you from fundamentals through hands-on labs across the major providers, so these concepts become things you have actually built, not just terms you can define.
FAQs
Q1: Should I learn AWS, Azure, or GCP first?
Pick the one the jobs you want use most (AWS has the largest market share, Azure dominates Microsoft-heavy enterprises). The good news from this guide is that the fundamentals transfer, so once you know one well, learning the next is mostly mapping new names onto concepts you already understand.
Q2: Do I need a certification to get a cloud job?
It helps, especially for breaking in: a Cloud Practitioner or Associate-level cert proves baseline knowledge and gets you past some filters. But interviews still test whether you can apply it, so pair the cert with hands-on practice on a free-tier account. A cert plus a small project you can talk about beats a cert alone.
Q3: How technical do cloud interviews get?
It depends on the role. Cloud-adjacent roles stay conceptual (models, benefits, security basics), while cloud engineer and architect roles go deep into design, networking, and trade-offs, often with a "design this system" scenario. Match your depth to the role, and always be ready to reason about trade-offs rather than recite service names.
Q4: Are these questions enough on their own?
They cover the concepts that come up most across providers, but cloud rewards hands-on practice. Spin up a free-tier account, build a small architecture, and secure it. An interviewer can tell within one follow-up whether you have actually deployed something or only read about it.
Sources: Cloud Shared Responsibility Model Explained; What Is Cloud Computing?; Cloud Certification Roadmap: AWS vs Azure vs GCP; Cloud Learning Path; AWS Cloud Practitioner course. Concepts verified against the NIST cloud definition and major-provider shared-responsibility documentation.