Serverless Computing for HPC and AI: A Systematic Review of 122 Research Studies

📌 Key Takeaways

  • 122 Studies Reviewed: The most comprehensive systematic literature review covering serverless computing for HPC, AI, and big data from 2018 to early 2025, published in IEEE Access.
  • Eight Research Directions: A taxonomy identifies eight primary research areas including resource management, workflow orchestration, accelerator integration, and performance optimization.
  • Nine Use Case Domains: Applications span scientific simulation, machine learning training, genomics, video processing, linear algebra, data analytics, graph processing, geospatial computing, and IoT pipelines.
  • Cloud-HPC Convergence: Cloud providers now integrate GPU accelerators, high-speed interconnects, and HPC-tailored services like AWS Parallel Computing Service, blurring the boundary with traditional supercomputing.
  • Growing Research Interest: Publication trends show exponential growth in serverless HPC research, with increasing collaboration networks among academic and industry researchers worldwide.

What Is Serverless Computing for HPC?

Serverless computing for high-performance computing represents a paradigm shift in how compute-intensive workloads are deployed and executed. Traditional HPC systems rely on rigid resource allocation, dedicated supercomputers, and batch scheduling systems that often lead to inefficient utilization and prolonged job queuing times. Serverless computing flips this model by offering automatic scaling, pay-per-use billing, and complete infrastructure abstraction—allowing researchers and engineers to focus entirely on their computational logic.

The concept, first popularized by Amazon with the launch of AWS Lambda in 2014, has evolved dramatically over the past decade. What began as a simple event-driven function execution platform has matured into a versatile computing paradigm capable of supporting massively parallel workloads. In the context of HPC, serverless computing enables applications to scale from zero to thousands of concurrent function instances on demand, then scale back to zero when computation completes—eliminating the idle resource costs that plague traditional cluster deployments.

A landmark systematic literature review published in IEEE Access by researchers at the University of Pisa and Italy’s National Research Council analyzed 122 research articles to map this emerging field. The review, conducted by Valerio Besozzi, Matteo Della Bartola, Patrizio Dazzi, and Marco Danelutto, introduces the term “High-Performance Serverless Computing” to describe research at the intersection of serverless and HPC paradigms. As organizations increasingly rely on AI, machine learning, and big data analytics, understanding how serverless architectures can support these workloads becomes essential for both cloud architects and HPC practitioners. The convergence between cloud and HPC infrastructures—exemplified by Microsoft’s Azure Eagle cloud supercomputer appearing on the TOP500 supercomputer list—signals a fundamental transformation in how we approach compute-intensive problems.

Why Serverless Computing Matters for AI and Big Data

The explosive growth of artificial intelligence and big data applications has created unprecedented demand for elastic, scalable computing resources. Traditional approaches require organizations to provision and maintain dedicated GPU clusters, manage complex scheduling systems, and absorb significant costs for idle resources. Serverless computing addresses these pain points by providing three core characteristics that align well with modern AI and big data workloads: no operational management (NoOps), utilization-based billing, and automatic scaling from zero to near-infinite capacity.

For AI workloads specifically, serverless computing enables researchers to parallelize hyperparameter searches across thousands of concurrent functions, distribute model training across elastic compute resources, and deploy inference endpoints that scale automatically with demand. As the Stanford AI Index Report 2025 documents, the scale of AI model training continues to grow exponentially, making elastic infrastructure increasingly critical.

Big data processing benefits similarly from the serverless model. Map-reduce operations, ETL pipelines, and real-time analytics can leverage thousands of parallel function invocations, processing terabytes of data in minutes rather than hours. The pay-per-invocation model means organizations only pay for actual computation time, eliminating the waste inherent in maintaining always-on Hadoop or Spark clusters for intermittent workloads. The review identifies big data analytics as one of the most active application domains in serverless HPC research, with frameworks like Lithops and PyWren specifically designed to bring serverless parallelism to data-intensive scientific computing.
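The map-reduce style these frameworks expose can be sketched in plain Python. In this minimal sketch, `ThreadPoolExecutor` stands in for parallel function invocations; with Lithops or PyWren, the map step would instead fan out to cloud functions:

```python
from concurrent.futures import ThreadPoolExecutor

# Each "function invocation" counts words in one shard of text;
# a reduce step merges the partial counts afterwards.
def count_words(shard):
    counts = {}
    for word in shard.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def merge_counts(partials):
    total = {}
    for partial in partials:
        for word, n in partial.items():
            total[word] = total.get(word, 0) + n
    return total

shards = ["serverless scales to zero", "serverless scales on demand"]
with ThreadPoolExecutor(max_workers=len(shards)) as pool:
    partials = list(pool.map(count_words, shards))
result = merge_counts(partials)
# result["serverless"] == 2 after the reduce step
```

Because each shard is processed statelessly and independently, the same map function drops directly into a FaaS runtime with no coordination between invocations.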

Systematic Review Methodology and Scope

The systematic literature review follows the rigorous methodology proposed by Kitchenham et al., a well-established framework for conducting structured reviews in software engineering and computer science. The researchers analyzed 122 research articles published between 2018 and early 2025, applying strict inclusion and exclusion criteria to ensure comprehensive coverage of the high-performance serverless computing landscape.

The review scope encompasses three primary application domains: high-performance computing (HPC), artificial intelligence (AI), and big data processing. Articles were sourced from major academic databases and conference proceedings, covering venues such as IEEE, ACM, and top cloud computing conferences. The methodology enabled the researchers to propose a comprehensive taxonomy comprising eight primary research directions and nine targeted use case domains—providing the first structured map of this emerging field.

What makes this review particularly valuable is its bibliometric analysis component. Beyond classifying research directions, the authors analyzed publication trends over the seven-year period, revealing exponential growth in serverless HPC research. They also mapped collaboration networks among authors, identifying key research clusters and influential contributions that are shaping the field. This dual approach—combining technical classification with community analysis—offers readers both a practical understanding of current capabilities and insight into where the field is heading. For practitioners evaluating serverless architectures for compute-intensive applications, this review provides the most current and comprehensive reference available, complementing broader analyses of enterprise cloud adoption such as the McKinsey State of AI 2025 report.

Taxonomy of Serverless HPC Research Directions

The review’s most significant contribution is a comprehensive taxonomy that classifies the 122 analyzed studies into eight primary research directions. This classification framework provides researchers and practitioners with a structured understanding of where efforts are concentrated and where gaps remain in the serverless HPC landscape.

The eight research directions identified in the taxonomy encompass the full spectrum of challenges in adapting serverless computing for high-performance workloads. Resource provisioning and management addresses how serverless platforms allocate compute, memory, and storage resources for demanding HPC applications—a fundamentally different challenge than managing lightweight event-driven functions. Workflow orchestration focuses on composing and coordinating complex computational pipelines expressed as directed acyclic graphs (DAGs), enabling multi-step scientific simulations and AI training pipelines.
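The DAG-style orchestration described above can be sketched with Python's standard `graphlib`; the step names (`fetch`, `preprocess`, `train`, `report`) are illustrative placeholders, not taken from the review:

```python
from graphlib import TopologicalSorter

# A toy pipeline expressed as a DAG: each node is a step function,
# and the deps dict declares which results a step consumes.
steps = {
    "fetch":      lambda results: [3, 1, 2],
    "preprocess": lambda results: sorted(results["fetch"]),
    "train":      lambda results: sum(results["preprocess"]),
    "report":     lambda results: f"total={results['train']}",
}
deps = {"preprocess": {"fetch"}, "train": {"preprocess"}, "report": {"train"}}

# Execute steps in dependency order, threading results through;
# a serverless orchestrator would run independent nodes in parallel.
results = {}
for name in TopologicalSorter(deps).static_order():
    results[name] = steps[name](results)
# results["report"] == "total=6"
```

A production orchestrator (e.g., AWS Step Functions) adds retries, fan-out, and state passing between invocations, but the underlying model is the same topological execution of a DAG.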

Performance optimization represents a critical direction, tackling issues like cold start mitigation, execution time limits, and inter-function communication overhead that directly impact HPC workload viability. Accelerator integration examines how serverless platforms can leverage GPUs, FPGAs, and other hardware accelerators essential for AI training and scientific computing. Additional research directions include programming models and abstractions for expressing parallel computations, data management strategies for handling large datasets across stateless functions, scheduling and placement algorithms optimized for compute-intensive tasks, and security and isolation mechanisms that balance protection with performance in multi-tenant serverless environments.

This taxonomy reveals that while resource management and workflow orchestration are well-explored areas with numerous published solutions, accelerator integration and heterogeneous computing support remain comparatively underdeveloped—representing significant opportunities for future research and commercial innovation.

Serverless Platforms and Architectures for HPC

The serverless platform ecosystem has evolved significantly since AWS Lambda’s introduction, with both commercial and open-source solutions offering varying capabilities for compute-intensive workloads. Understanding the architectural differences between these platforms is essential for organizations evaluating serverless approaches for HPC, AI, and big data applications.

Among commercial platforms, AWS Lambda remains the most widely used, offering integration with the broader AWS ecosystem including GPU instances, S3 storage, and Step Functions for workflow orchestration. Azure Functions provides similar capabilities within Microsoft’s cloud ecosystem, while Google Cloud Run and Cloud Run functions offer container-based serverless execution with strong Kubernetes integration. Each platform imposes its own constraints—memory limits typically range from 128MB to 10GB, execution timeouts from 5 to 15 minutes, and billing granularity from 1ms to 100ms increments.
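The impact of billing granularity is easy to quantify. The sketch below rounds a function's duration up to the platform's billing increment before charging per GB-second; the price used is an illustrative placeholder, not a quoted rate:

```python
import math

def faas_cost(duration_ms, memory_gb, granularity_ms, price_per_gb_s):
    """Estimate the cost of one invocation: duration is rounded up
    to the billing granularity, then charged per GB-second."""
    billed_ms = math.ceil(duration_ms / granularity_ms) * granularity_ms
    return billed_ms / 1000 * memory_gb * price_per_gb_s

# A 42 ms function billed at 1 ms vs 100 ms granularity
# (0.0000167 per GB-second is a placeholder price):
fine   = faas_cost(42, 1.0, granularity_ms=1,   price_per_gb_s=0.0000167)
coarse = faas_cost(42, 1.0, granularity_ms=100, price_per_gb_s=0.0000167)
# coarse / fine ≈ 2.38: coarse granularity more than doubles the bill
```

For the short, massively parallel invocations typical of serverless HPC, this rounding effect is why 1 ms billing matters far more than it would for long-running jobs.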

The open-source ecosystem offers critical alternatives for organizations requiring customization or avoiding vendor lock-in. Apache OpenWhisk provides an event-driven platform deployable on private infrastructure. Knative extends Kubernetes with serverless capabilities, making it ideal for hybrid HPC-cloud deployments. Nuclio specifically targets high-performance data science workloads with optimized function routing and GPU support. OpenFaaS offers a developer-friendly platform with support for any containerized workload.

The review also highlights three serverless service models relevant to HPC: Function-as-a-Service (FaaS), Backend-as-a-Service (BaaS), and Container-as-a-Service (CaaS). While FaaS dominates current research, serverless-enabled CaaS platforms like AWS Fargate and Azure Container Apps are gaining traction for HPC workloads because they support longer execution times and provide more control over the execution environment. Execution environments have also diversified beyond traditional Linux containers, with emerging alternatives including WebAssembly (WASM) for lightweight sandboxed execution and unikernels for enhanced isolation without kernel-sharing overhead.

Use Cases Across Nine Application Domains

The systematic review identifies nine distinct application domains where serverless computing has been applied to compute-intensive workloads, providing concrete evidence of the paradigm’s versatility and growing adoption across scientific and industrial computing.

Scientific simulation represents one of the most explored domains, with researchers leveraging serverless parallelism for Monte Carlo simulations, computational fluid dynamics, and molecular dynamics. These workloads benefit from serverless elasticity because they involve embarrassingly parallel computations that can be distributed across thousands of independent function invocations.
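As a minimal illustration of why such workloads decompose so cleanly, the Monte Carlo estimate of π below is split into independent, seeded chunks, each of which could in principle run as its own function invocation:

```python
import random

# Each chunk is fully independent: its own RNG, no shared state,
# results merged only at the end. This is the embarrassingly
# parallel shape that maps naturally onto serverless fan-out.
def mc_chunk(samples, seed):
    rng = random.Random(seed)
    return sum(1 for _ in range(samples)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)

n_chunks, samples = 8, 10_000          # 8 "invocations"
hits = [mc_chunk(samples, seed) for seed in range(n_chunks)]
pi_est = 4 * sum(hits) / (n_chunks * samples)
# pi_est lands near 3.14 (a statistical estimate, not exact)
```

Scaling out is just a matter of invoking more chunks; no inter-function communication is needed until the final reduction.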

Machine learning and AI workloads encompass distributed model training, hyperparameter optimization, and inference serving. Frameworks like Cirrus and specialized serverless ML platforms enable researchers to parallelize training across cloud functions while managing model state through external storage. The Gemini 2.5 Technical Report illustrates the scale of modern AI infrastructure that drives demand for elastic serverless approaches.

Genomics and bioinformatics has emerged as a particularly active domain, with serverless architectures powering genome assembly, variant calling, and large-scale sequence alignment. The bursty nature of genomics workloads—requiring massive parallelism during analysis phases followed by minimal compute during data preparation—makes them ideal candidates for serverless execution.

Additional domains include video and image processing for tasks like transcoding and computer vision pipelines, linear algebra computations for matrix operations fundamental to scientific computing, big data analytics for ETL and real-time stream processing, graph processing for social network analysis and knowledge graphs, geospatial computing for satellite imagery and climate modeling, and IoT data pipelines for processing sensor streams at scale. Each domain presents unique requirements for execution time, memory, communication patterns, and data locality that influence serverless platform selection and architecture design.

Cold Start and Serverless Performance Challenges

Despite its promise, serverless computing for HPC faces several significant technical challenges that the review systematically catalogues. Understanding these limitations is critical for architects designing serverless solutions for compute-intensive applications.

The cold start problem remains the most widely discussed challenge. When a serverless function has not been invoked recently, its execution environment must be provisioned from scratch—pulling container images, initializing runtimes, and loading application dependencies. This startup latency, which can range from hundreds of milliseconds to several seconds depending on the platform and function complexity, is particularly problematic for HPC workloads where thousands of functions must start simultaneously for parallel computations. The review identifies cold start mitigation as one of the most active research areas, with solutions including pre-warming strategies, snapshot-based restoration, and lightweight execution environments using WebAssembly or unikernels.
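The pre-warming strategy can be illustrated with a toy warm pool; the setup cost here is simulated with a sleep, whereas real cold starts come from container image pulls and runtime initialization:

```python
import time

class WarmPool:
    """Keep a pool of initialized environments so an invocation
    can skip the expensive setup path (a pre-warming sketch)."""

    def __init__(self):
        self._pool = []

    def _init_env(self):
        time.sleep(0.01)           # stand-in for container/runtime setup
        return {"ready": True}

    def prewarm(self, n):
        self._pool.extend(self._init_env() for _ in range(n))

    def invoke(self, fn):
        # Reuse a warm environment if one exists, else cold-start.
        env = self._pool.pop() if self._pool else self._init_env()
        result = fn(env)
        self._pool.append(env)     # return the environment for reuse
        return result

pool = WarmPool()
pool.prewarm(2)                    # pay the startup cost ahead of time
out = pool.invoke(lambda env: "warm" if env["ready"] else "cold")
```

Real platforms implement the same idea as provisioned concurrency or snapshot restore, trading standing cost for predictable latency.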

Execution time limits pose another fundamental constraint. Most commercial FaaS platforms cap function execution at 5-15 minutes, which is insufficient for many HPC workloads. While serverless-enabled CaaS platforms offer longer timeouts, they sacrifice some of the fine-grained scaling benefits of pure FaaS. Researchers have developed checkpointing and task decomposition strategies to work within these constraints, but execution limits remain a significant barrier for long-running scientific computations.
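Checkpoint-and-resume task decomposition can be sketched as follows; a plain dict stands in for the external object store a real deployment would use:

```python
# External storage surrogate: a real function would read/write
# its checkpoint from an object store between invocations.
storage = {}

def run_within_limit(data, max_chunks_per_call):
    """Process at most max_chunks_per_call items, checkpointing
    progress so a later invocation can resume where this one stopped."""
    state = storage.get("ckpt", {"index": 0, "total": 0})
    done_this_call = 0
    while state["index"] < len(data) and done_this_call < max_chunks_per_call:
        state["total"] += data[state["index"]]
        state["index"] += 1
        done_this_call += 1
    storage["ckpt"] = state        # checkpoint before "timing out"
    return state["index"] == len(data), state["total"]

data = list(range(10))
done, total = run_within_limit(data, 4)    # first invocation: partial
while not done:                            # later invocations resume
    done, total = run_within_limit(data, 4)
# total == 45 once every chunk has been processed
```

The cap on chunks per call plays the role of the platform timeout: each invocation does bounded work, and correctness depends only on the checkpoint surviving between calls.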

Inter-function communication overhead is a third critical challenge. Traditional HPC applications rely on high-speed interconnects like InfiniBand for inter-process communication, with latencies measured in microseconds. Serverless functions communicate through external services like message queues or cloud storage, introducing latencies measured in milliseconds—orders of magnitude slower. This makes serverless computing currently unsuitable for tightly coupled parallel applications that require frequent synchronization, though research on optimized communication layers is progressing rapidly.

Additional challenges include limited GPU and accelerator support in most serverless platforms, the overhead of managing state externally for stateless functions, vendor lock-in with proprietary platforms, and the difficulty of debugging distributed serverless applications. The review notes that while these challenges are significant, the research community is making steady progress, with each successive year bringing new solutions and architectural innovations. For context on how AI infrastructure challenges are being addressed more broadly, the NVIDIA FY2025 Annual Report details the hardware acceleration landscape that underlies these serverless computing advances.

Cloud-HPC Convergence and Future Trends

The review reveals a clear and accelerating trend toward convergence between cloud computing and high-performance computing infrastructures—a development that positions serverless computing as a key enabling technology for next-generation compute-intensive applications.

Cloud providers are increasingly integrating HPC capabilities into their infrastructure. AWS now offers Parallel Computing Service for simplified HPC deployment, while Microsoft’s Azure Eagle cloud supercomputer demonstrates that cloud infrastructure can compete with purpose-built supercomputers. GPU instances, high-speed networking, and HPC-optimized storage are now standard offerings across major cloud platforms. This infrastructure evolution makes serverless HPC increasingly viable as the underlying hardware capabilities close the gap with traditional supercomputing centers.

Simultaneously, HPC practitioners are adopting cloud-native approaches to improve resource efficiency. Traditional batch scheduling systems like Slurm often leave significant compute capacity idle between jobs. Serverless architectures, with their ability to scale to zero and provision resources on-demand, offer a path to dramatically improved utilization rates. The review documents growing interest from national research laboratories and academic HPC centers in hybrid approaches that combine traditional batch scheduling with serverless elasticity.

Publication trends analyzed in the review show exponential growth in serverless HPC research, with the number of published studies roughly doubling every two years from 2018 to 2025. Collaboration network analysis reveals increasing cross-pollination between cloud computing and HPC research communities, with several active research clusters spanning multiple institutions and countries. This suggests the field is maturing from isolated experiments to a coordinated research program with shared frameworks and benchmarks.

Looking forward, the review identifies several promising directions: deeper integration of hardware accelerators (GPUs, TPUs, FPGAs) with serverless platforms, development of serverless-native parallel programming models that abstract away distributed system complexity, improved support for stateful computation through disaggregated memory architectures, and standardization efforts to reduce vendor lock-in. Meta’s deployment of hyperscale serverless infrastructure for both general and HPC workloads offers an early glimpse of what this converged future might look like at industry scale.

Key Takeaways for Practitioners and Researchers

This systematic literature review of 122 studies provides the most comprehensive mapping yet of serverless computing’s application to HPC, AI, and big data workloads. For practitioners evaluating serverless architectures for compute-intensive applications, several actionable insights emerge from the analysis.

Start with embarrassingly parallel workloads. The review demonstrates that serverless computing delivers the strongest value for workloads that can be decomposed into independent parallel tasks—hyperparameter searches, Monte Carlo simulations, map-reduce operations, and batch inference. These applications naturally fit the stateless, event-driven execution model and benefit most from elastic auto-scaling.

Choose platforms strategically. For short-running, highly parallel tasks, FaaS platforms like AWS Lambda provide the finest granularity and fastest scaling. For workloads requiring longer execution times or custom runtime environments, serverless-enabled CaaS platforms like AWS Fargate or Google Cloud Run offer more flexibility. Open-source platforms like Knative provide escape from vendor lock-in for organizations with private infrastructure. The massive distributed training behind models like DeepSeek R1 illustrates why platform choice matters for AI workloads.

Plan for data locality. The stateless nature of serverless functions means data must be managed externally. For data-intensive workloads, choosing storage services with low-latency access (like in-memory caches or region-local object storage) can significantly reduce the overhead of external state management.

Monitor the convergence trend. The cloud-HPC convergence documented in this review is accelerating. Organizations investing in serverless HPC today will benefit from rapidly improving platform capabilities, including better GPU support, longer execution times, and optimized inter-function communication. Established frameworks such as the NIST AI Risk Management Framework can complement the review's taxonomy when evaluating these emerging architectures.

For researchers, the taxonomy identifies accelerator integration, heterogeneous computing support, and serverless-native programming models as the areas with the greatest need for further investigation. The bibliometric analysis in the review can help identify potential collaborators and active research groups working on complementary problems.

The field of high-performance serverless computing stands at an inflection point. With cloud infrastructure approaching HPC-grade performance, serverless platforms maturing rapidly, and research output growing exponentially, the next five years will likely see serverless architectures become a standard approach for many classes of compute-intensive applications. This review provides the essential roadmap for navigating that transition.

Frequently Asked Questions

What is serverless computing for high-performance computing?

Serverless computing for HPC adapts the serverless execution model—featuring automatic scaling, pay-per-use billing, and zero infrastructure management—to support compute-intensive workloads such as scientific simulations, AI training, and big data analytics. It enables elastic resource allocation that scales from zero to thousands of parallel functions on demand.

Can serverless handle AI and machine learning workloads?

Yes. Research shows serverless platforms are increasingly used for AI workloads including distributed model training, hyperparameter tuning, and ML inference pipelines. Platforms like AWS Lambda and specialized frameworks such as Lithops enable parallel execution of AI tasks with automatic scaling and fine-grained billing.

What are the main challenges of serverless computing for HPC?

Key challenges include cold start latency that impacts time-sensitive computations, execution time limits on commercial platforms (typically 5-15 minutes), limited support for GPU and hardware accelerators, stateless architecture requiring external storage for large datasets, and vendor lock-in with proprietary cloud platforms.

How does Function-as-a-Service differ from traditional cloud computing?

FaaS fully abstracts infrastructure management, offers automatic scaling from zero to near-infinite capacity, and uses utilization-based billing measured in milliseconds. Traditional cloud computing requires manual provisioning, charges for allocated resources even when idle, and demands ongoing operational management of VMs or containers.

What serverless platforms support HPC and scientific computing?

Commercial platforms include AWS Lambda, Azure Functions, and Google Cloud Run. Open-source alternatives like Nuclio, Apache OpenWhisk, Knative, and OpenFaaS offer self-hosted options. Specialized frameworks such as Lithops, PyWren, and MARLA extend these platforms specifically for parallel scientific and HPC workloads.

Is serverless computing cost-effective for big data processing?

Serverless can be highly cost-effective for bursty or intermittent big data workloads because you pay only for actual compute time. For sustained, long-running processing jobs, traditional provisioned infrastructure may be more economical. The pay-per-invocation model eliminates idle resource costs that plague conventional cluster deployments.
