Optimal Overbooking Strategy of Airlines using Statistical/Machine Learning Algorithms & their use in the post-pandemic situation – an expert view

Before COVID-19, airlines across the globe were striving to achieve profitable growth in a highly volatile and competitive business environment, while facing declining yield and RASK (revenue per available seat kilometre, or RASM when measured per seat mile). Passenger bookings were also growing sharply, leading to a capacity crunch for major airlines across the world. The aviation industry, like many others, leverages data-driven analytics to shape efficient business strategy across multiple functions (such as planning, commercial, operations and engineering) and to overcome such hindrances.

Adopt an ML-based optimal booking strategy to maximize yield and revenue

Overbooking flights is a common practice in the aviation industry, based on the expectation that some fraction of booked passengers will either cancel at the eleventh hour or fail to show up at the departure gate (commonly known as a no-show). Accurate forecasts of the number of no-show passengers for each flight help the airline earn more revenue by reducing the number of spoiled seats (seats that depart empty although they could have been sold) while also lowering the cases of involuntary denied boarding, which carry a significant cost penalty. Optimal booking policies seek to maximize yield by trading off the revenue from additional sales against the cost of any denied boarding that might result. Flights that depart fully occupied are not only of economic interest but of ecological importance as well.
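The tradeoff described above can be illustrated with a small sketch. It is not the model used in this work: it assumes that every accepted booking is sold, that each booked passenger shows up independently with the same probability, that fare is earned only for boarded passengers, and that each denied boarding costs a fixed penalty. The function names and all numbers are hypothetical.

```python
from math import comb


def expected_net_revenue(limit, capacity, show_prob, fare, denied_cost):
    """Expected revenue when `limit` bookings are accepted on a `capacity`-seat cabin."""
    total = 0.0
    for shows in range(limit + 1):
        # Binomial probability that exactly `shows` booked passengers turn up.
        prob = comb(limit, shows) * show_prob**shows * (1 - show_prob) ** (limit - shows)
        boarded = min(shows, capacity)
        denied = shows - boarded
        # Assumption: fare is earned per boarded passenger, penalty paid per denial.
        total += prob * (fare * boarded - denied_cost * denied)
    return total


def best_booking_limit(capacity, show_prob, fare, denied_cost, max_extra=30):
    """Booking limit in [capacity, capacity + max_extra] with the highest expected revenue."""
    candidates = range(capacity, capacity + max_extra + 1)
    return max(
        candidates,
        key=lambda limit: expected_net_revenue(limit, capacity, show_prob, fare, denied_cost),
    )


# Illustrative numbers only: 100 seats, 90% show-up rate, fare 100, penalty 400.
print(best_booking_limit(capacity=100, show_prob=0.9, fare=100.0, denied_cost=400.0))
```

Raising the penalty relative to the fare pulls the chosen limit back towards the cabin capacity, while a lower show-up probability pushes it up.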

Relevant statistical and machine learning algorithms can provide an advanced and robust prediction of the no-show numbers for each scheduled flight, which helps the airline accept the optimum number of additional bookings over the inventory capacity. The solution comprises two steps: first, predicting cabin-level no-show rates using specific information on the individual passengers booked on each flight; and second, combining these predicted no-show rates with features extracted from historically similar flights to arrive at the final no-show estimate.
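As a rough illustration of how the first step can feed the second, the sketch below rolls passenger-level no-show probabilities up to a cabin-level rate by simple averaging. The averaging rule, the record fields (flight, cabin, no_show_prob) and the sample bookings are assumptions made for this example, not details of the actual solution.

```python
from collections import defaultdict


def cabin_no_show_rates(passengers):
    """Average the passenger-level no-show probabilities within each (flight, cabin)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for p in passengers:
        key = (p["flight"], p["cabin"])
        totals[key] += p["no_show_prob"]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical bookings with probabilities from a passenger-level classifier.
bookings = [
    {"flight": "XX101", "cabin": "Y", "no_show_prob": 0.25},
    {"flight": "XX101", "cabin": "Y", "no_show_prob": 0.75},
    {"flight": "XX101", "cabin": "J", "no_show_prob": 0.125},
]
print(cabin_no_show_rates(bookings))
```

The resulting cabin-level rate is the kind of quantity that can then enter a flight-level model as a single explanatory variable.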

Build a robust prediction framework to identify no-shows

First, individual passenger details from historical flights were used to capture booking trends and understand travel patterns. Several classification models, each with a different bias-variance profile, were fitted and then ensembled into a more robust prediction framework. The key is to identify the individual learners that eventually lead to lower misclassification. The misclassification table is shown below.



                   Predicted no-show    Predicted flown
Actual no-show     A1                   A2
Actual flown       B1                   B2

Here, B2 counts the passengers who were predicted to fly and actually flew, and A1 counts those who were predicted as no-shows and did not turn up. A2 counts the passengers who were predicted to fly but in reality did not turn up, and B1 counts those who were predicted not to turn up but eventually did. Although both A2 and B1 are misclassifications, they are not equally costly: a higher number of B1 cases means the no-show count is overestimated, which results in more denied boardings and escalates cost in the form of penalties.
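To make the ensembling and the four cells concrete, here is a minimal sketch. It assumes soft voting (a weighted average of each learner's no-show probability), which is only one of several possible ways to ensemble; the text does not say which scheme was used. The learners are passed in as plain functions, and the function names, labels and cost figures are hypothetical.

```python
def soft_vote(learners, passenger, weights=None):
    """Weighted average of the no-show probabilities returned by each learner."""
    if weights is None:
        weights = [1.0] * len(learners)
    weighted = sum(w * learner(passenger) for w, learner in zip(weights, learners))
    return weighted / sum(weights)


def classify(learners, passenger, threshold=0.5, weights=None):
    """Label a passenger 'no_show' when the ensembled probability reaches `threshold`."""
    prob = soft_vote(learners, passenger, weights)
    return "no_show" if prob >= threshold else "flown"


def misclassification_table(actual, predicted):
    """Tally the cells A1, A2, B1, B2 (rows: actual, columns: predicted)."""
    cells = {"A1": 0, "A2": 0, "B1": 0, "B2": 0}
    for a, p in zip(actual, predicted):
        row = "A" if a == "no_show" else "B"   # A = actual no-show, B = actual flown
        col = "1" if p == "no_show" else "2"   # 1 = predicted no-show, 2 = predicted flown
        cells[row + col] += 1
    return cells


def misclassification_cost(cells, spoiled_seat_cost, denied_boarding_cost):
    """Weight A2 by a spoiled-seat cost and B1 by a denied-boarding cost."""
    return cells["A2"] * spoiled_seat_cost + cells["B1"] * denied_boarding_cost


actual = ["no_show", "no_show", "flown", "flown", "flown"]
predicted = ["no_show", "flown", "no_show", "flown", "flown"]
cells = misclassification_table(actual, predicted)
print(cells, misclassification_cost(cells, spoiled_seat_cost=100.0, denied_boarding_cost=400.0))
```

Comparing candidate ensembles on such a cost-weighted total, rather than on raw error counts, is one way to reflect that B1 errors hurt more than A2 errors.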

In the next step, generalized linear model (GLM) concepts were used to arrive at the estimated no-show numbers. A GLM for the i-th unique Flight-Market-Departure Date combination is of the form:

g(y_i) = f(x_i) + ε_i

where g() is the appropriate link function applied to the response variable, which allows it to be modelled as a linear function of the explanatory variables. In this case, f() is a linear function and ε is the corresponding error vector.

While experts will agree that individual passenger information for a particular flight is of immense importance, airlines usually accept bookings until a few hours before departure. Any model built on passenger data alone will therefore be too volatile and always based on incomplete data. Instead, we assume that the pattern of travelers in a particular Flight-Market is similar on given days of the week. We therefore use the cabin-level no-show rate from the first model as an auxiliary variable standing in for the passenger-level information in a GLM (Generalized Linear Model). We chose a GLM over other models for the following reasons:

  • The response variable is a count variable (more than 0 but less than a certain number, which may vary from Market to Market)
  • The variance structure of the error guides the selection of the appropriate link function
  • The response variable can alternatively be taken as a ratio (the no-show rate), which may lead to the selection of an entirely different link function
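For a count response, one common choice is a Poisson family with a log link. The sketch below fits such a model with a single explanatory variable (the cabin-level no-show rate) using Newton's method. The family, the link, the single-feature setup and the data are assumptions made for illustration; the text does not state which specification was finally chosen, and a binomial family would be the natural alternative if the upper bound on the count needs to be respected.

```python
from math import exp, log


def fit_poisson_glm(x, y, iterations=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by Newton's method (single feature)."""
    b0 = log(sum(y) / len(y))  # start at the intercept-only solution
    b1 = 0.0
    for _ in range(iterations):
        mu = [exp(b0 + b1 * xi) for xi in x]
        # Gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Entries of the negated Hessian, a symmetric 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step via the closed-form inverse of the 2x2 matrix.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Made-up data: predicted cabin-level no-show rate and observed no-show count.
rates = [0.04, 0.06, 0.08, 0.10, 0.12, 0.15]
no_shows = [3, 5, 6, 9, 10, 14]
b0, b1 = fit_poisson_glm(rates, no_shows)
fitted = [exp(b0 + b1 * r) for r in rates]
print(round(b0, 3), round(b1, 3))
```

In practice a statistics library would be used for fitting, and further explanatory variables such as day of week or features from historically similar flights would enter the linear predictor in the same way.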

The GLM yielding the lowest RMSE (Root Mean Square Error) is taken as the appropriate prediction equation.
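A small sketch of this selection rule follows. It assumes each candidate GLM has already produced predictions for the same set of flights, ideally flights held out from fitting; the candidate names and all numbers are made up for illustration.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error between observed and predicted no-show counts."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def select_by_rmse(actual, candidates):
    """Return the name of the candidate whose predictions have the lowest RMSE."""
    return min(candidates, key=lambda name: rmse(actual, candidates[name]))


# Illustrative data; the model names and numbers are hypothetical.
observed = [3, 5, 6, 9]
predictions = {
    "count_response_glm": [3.5, 4.5, 6.5, 8.5],
    "rate_response_glm": [2.0, 6.0, 8.0, 7.0],
}
print(select_by_rmse(observed, predictions))
```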

The general workflow can be summarized by the diagram below:

Improved inventory visibility and intelligent network planning for airlines

Some benefits reaped by airlines that have implemented an Optimal Overbooking Strategy are:

  • Fewer spoiled seats and fewer involuntary denied boardings (DNBs), which not only maximized airline revenue through additional bookings but also curtailed DNB compensation costs and boosted the company's reputation
  • Accurate prediction, teamed with historical booking patterns, helped the client in network planning and inventory management

How the scenario changes post-pandemic

COVID-19 has changed how industries operate, and aviation is no exception. Most airlines worldwide were grounded, and the few still flying operate on limited routes. Some Indian airlines have restarted their domestic and international operations, but the number of flights deployed is just a fraction of their pre-pandemic frequency. Center for Aviation (CAPA) India has indeed stated that, “Indian Aviation Sector may lose $4 billion in FY 21.”* Currently, with limited transportation options, airlines are the only viable mode of transportation, as people stranded away from home feel an urge to return to their near and dear ones.

In such trying times, having an Optimal Booking Policy in place can truly help sustain airlines. Although predicting no-shows for overbooking becomes more difficult in such an unprecedented situation, it is even more important for airlines to have a robust and accurate demand forecast along with an Overbooking Strategy. Smart deployment of inventory, together with a well-set Optimum Booking Limit, may help yield management more than ever.

New factors must also be considered, such as the sudden relaxation or imposition of lockdowns in response to rising COVID-19 cases in the respective cities. These may lead some passengers to cancel their tickets for lack of transportation facilities, which leaves room for overbooking.

Though one might say that the future of aviation looks bleak in the current scenario, with the right changes applied at the right time to the existing model, we have the potential to maximize the revenue of an airline today and in the future.

We are building capabilities like these, using AI/ML for optimal overbooking strategy, in our platform of intelligence for airlines. Our platform can help airlines accelerate value realization: 2% incremental PLF, 4% savings in fuel, 3% upside in crew utilization and 2% improvement in TAT.

Stay tuned to know more about our platform of intelligence for airlines.


Devangana Dasgupta
Data Scientist, DATA

Akshay Vijay Medhane
Junior Data Scientist, DATA

Anindya Neogi
GM, Chief Data Scientist, Digital Experience


