In the last post, the data from the on-time flight database was
loaded into a column-oriented storage engine. Now the numbers can
be crunched.
The original goal of this exercise was to find the flight from
Los Angeles International Airport, LAX, to Dallas Fort Worth
International Airport, DFW, that was the most likely to arrive
on-time.
The data is 'opportunity rich' in that it holds a lot of information.
It is easy to start wondering about the various nuggets
buried in it. Are there certain aircraft (tail
numbers) that are routinely bad performers? Are some days of the
week better than others? Do national holidays have an effect on
the on-time performance? If you are delayed, is there a 'regular
amount' of delay? Does early departure make for an early arrival?
Can the flight crew make up for a late departure? How much time
is usually spent on runways?
But to look for the flight from LAX …
This will be a quick tutorial on looking at on-time flight
analysis. This material will be part of a lab for a class on
InfiniDB that I am developing. The information is from the Data.gov
website and you are free to follow the steps presented.
What I want to know is which flight from a certain airport arrives
at my local airport on time most frequently. Traveling from
LAX to DFW can often be a combination of cancellations, flight
delays, and being the nth plane in line for takeoff. So what is
the best flight choice for that route?
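
To make the question concrete, here is a minimal sketch of the kind of query that could answer it. Everything in it is an assumption made for illustration: the table name ontime, the column names, the carrier codes and the handful of sample rows are invented, and the real field names should come from the readme.html in the download. SQLite stands in for InfiniDB only so the sketch runs on its own, and "on time" is taken to mean an arrival delay of zero minutes or less, which may not match the official definition.

```python
import sqlite3

# Made-up rows for illustration only; these are NOT from the real data set.
# Columns: (carrier, flight_num, origin, dest, arrival delay in minutes)
SAMPLE_ROWS = [
    ("XX", "2400", "LAX", "DFW", -5),
    ("XX", "2400", "LAX", "DFW", 3),
    ("XX", "2400", "LAX", "DFW", 42),
    ("XX", "2410", "LAX", "DFW", -2),
    ("XX", "2410", "LAX", "DFW", 0),
    ("YY", "1100", "LAX", "DFW", 30),
    ("YY", "1100", "LAX", "DFW", -1),
    ("XX", "100", "LAX", "JFK", -10),
]


def on_time_ranking(conn, origin, dest, max_delay=0):
    """Rank flights on one route by the share of arrivals that were on time.

    A flight counts as on time when its arrival delay is at most
    max_delay minutes (an assumed definition).
    """
    query = """
        SELECT carrier,
               flight_num,
               COUNT(*) AS flights,
               AVG(CASE WHEN arr_delay <= ? THEN 1.0 ELSE 0.0 END)
                   AS on_time_rate
        FROM ontime
        WHERE origin = ? AND dest = ?
        GROUP BY carrier, flight_num
        ORDER BY on_time_rate DESC, flights DESC
    """
    return conn.execute(query, (max_delay, origin, dest)).fetchall()


# Hypothetical schema; the real table has many more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ontime ("
    "carrier TEXT, flight_num TEXT, origin TEXT, dest TEXT, arr_delay INTEGER)"
)
conn.executemany("INSERT INTO ontime VALUES (?, ?, ?, ?, ?)", SAMPLE_ROWS)

ranking = on_time_ranking(conn, "LAX", "DFW")
for carrier, flight_num, flights, rate in ranking:
    print(f"{carrier} {flight_num}: {flights} flights, {rate:.0%} on time")
```

The flight count is kept next to the rate because a flight that ran twice and was on time both times is weaker evidence than one with hundreds of arrivals behind it.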
The first step is getting the data. It is available for free
from Airline On-Time Performance and Causes of Flight
Delays. Be sure to select the check box for documentation so
that there will be a readme.html to describe the file …