In France, mobility is managed by numerous transport operators, each handling various modes (metro, RER, bus, tram, etc.) on shared infrastructure. To ensure smooth travel, operators must collaborate to offer coordinated and integrated services.
Close collaboration is essential to define data sharing conditions, financial agreements, and regulatory modalities. As the transport sector opens up to competition, coordination between new and existing operators is expected to be strengthened.
Effective data sharing enables MaaS (Mobility as a Service) platforms to provide accurate information and promote multimodal transport use. However, the presence of diverse sources, formats and tools complicates this process. Indeed, the integration of diverse public transport networks often introduces redundancies, inconsistencies, and errors, ultimately undermining the reliability of the information provided.
Various tools, such as GTFS Validator, help ensure the accuracy of public transport data by detecting format errors and structural issues. However, these tools cannot identify temporal anomalies, such as fluctuations in journey volumes or time-based trends. To address this, we developed an AI-Based Transit Data Validator, a tool that detects such irregularities early, enabling transport operators to take timely action.
1. Use case
RATP Smart Systems is a MaaS operator offering a wide range of transport options in a single application. We integrate multiple transport offers covering all train, RER, metro, tramway and bus lines in the Île-de-France region. A theoretical transport offer is a set of files describing both the structure of the public transport network and the journeys scheduled for the next 30 days. We have observed that the theoretical offer can contain errors, e.g., the absence of service on a line during short periods or unexplained variations in the number of trips on a line. These errors result in inaccurate route searches and a poorer user experience on our MaaS application. To address this, we propose an AI-based automated anomaly detection tool designed to ensure the quality of the theoretical offer.
2. AI-Based Transit Data Validator
Our solution is an anomaly detector that monitors public transport planning and identifies unusual behaviors, outliers, and deviations in transport line schedules. We propose a first model that compares the planned service of each transport line across the days of the offer period (30 days). Any date whose service differs markedly from the others is considered an anomaly. We also propose a second model that analyzes the overall behavior of each line by comparing it to other lines within the same mode of transport. If a line's offer differs significantly from the others, it is reported as an anomaly. The following figure shows the different steps of our pipeline.
Anomaly Detection Pipeline
2.1. Data preparation
The first step in our pipeline is data preparation. After data extraction, we proceed with the calculation of time series. For each day, for each line, we construct a signal representing the number of trips for every 30-minute interval.
The figure shows the variations in the number of trips on Metro Line 12 throughout the day for each day of December 2024. We can observe that working days generally have a similar number of trips, while weekends follow a different schedule. From this, we can conclude that each type of day follows a regular pattern, and that the schedule of a given day should not deviate significantly from it.
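The binning step described in this section can be sketched as follows. This is a minimal illustration, not the production code: the function name `daily_signals` and the input shape (a flat list of `(line_id, departure_datetime)` pairs) are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

BIN_MINUTES = 30
BINS_PER_DAY = 24 * 60 // BIN_MINUTES  # 48 half-hour slots per day

def daily_signals(departures):
    """Count trips per 30-minute slot, keyed by (line_id, date).

    `departures` is a hypothetical flattened view of the schedule:
    an iterable of (line_id, departure_datetime) pairs.
    """
    counts = Counter()
    for line_id, dt in departures:
        slot = (dt.hour * 60 + dt.minute) // BIN_MINUTES
        counts[(line_id, dt.date(), slot)] += 1

    signals = {}
    for (line_id, day, slot), n in counts.items():
        # Every (line, day) gets a fixed-length signal, zero where no trip runs.
        signal = signals.setdefault((line_id, day), [0] * BINS_PER_DAY)
        signal[slot] = n
    return signals

# Three illustrative departures on one line and one day.
sample = [
    ("M12", datetime(2024, 12, 2, 8, 5)),
    ("M12", datetime(2024, 12, 2, 8, 20)),
    ("M12", datetime(2024, 12, 2, 8, 40)),
]
signal = daily_signals(sample)[("M12", datetime(2024, 12, 2).date())]
print(signal[16], signal[17])  # 2 1 (slots 08:00-08:30 and 08:30-09:00)
```

Using a fixed length of 48 slots keeps every daily signal comparable, including the slots in which no trip is scheduled.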
2.2. Model 1 – Outlier dates
The Outlier dates model analyzes each transport line individually. For a specific line, it evaluates the discrepancy between the service offered on a given date and the service offered on the other days for the same line. The greater the difference from the values observed on other days, the more likely the date is to be flagged as an anomaly.
Distance measure
To identify anomalies, defining an appropriate distance measure is crucial, as it directly influences the accuracy of outlier detection. In our approach, we employed Dynamic Time Warping (DTW) to calculate the distances between pairs of time series.
For each transport line, we constructed a distance matrix that quantifies the differences between daily time series, representing the number of trips recorded every 30 minutes throughout each day. This matrix was then used to identify atypical days for each transport line.
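To make the distance matrix concrete, here is a minimal sketch of the textbook DTW recurrence with an absolute-difference cost, applied pairwise to a few daily signals. The function names and the toy signals are illustrative assumptions; the source does not say which DTW variant or library is used in the tool.

```python
def dtw_distance(a, b):
    """Textbook dynamic-programming DTW with |x - y| as the local cost."""
    inf = float("inf")
    m = len(b)
    prev = [0.0] + [inf] * m  # row 0 of the cumulative cost table
    for i in range(1, len(a) + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, match.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]

def distance_matrix(series):
    """Symmetric matrix of pairwise DTW distances between daily signals."""
    size = len(series)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            d = dtw_distance(series[i], series[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix

# Toy daily signals (shortened to 5 slots for readability).
days = [
    [0, 2, 5, 2, 0],
    [0, 0, 2, 5, 2],   # same shape, shifted by one slot
    [0, 4, 10, 4, 0],  # same timing, doubled service
]
matrix = distance_matrix(days)
print(matrix[0][1], matrix[0][2])  # 2.0 9.0
```

In this toy case the shifted day stays close to the reference while the day with doubled service is far from it, which illustrates why DTW suits schedules whose peaks may move slightly from one day to the next.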
Date anomaly detection
Once the distance matrix is established, we calculate the average distance between the time series of a given day and those of all other days for the same transport line. This calculation takes into account the nature of the day in question, whether it is a weekday, weekend, public holiday, or during school vacations.
Subsequently, analyzing the average distances allows us to identify the most similar days. If the behavior of a particular day deviates significantly from that of its neighboring days, it is flagged as an anomaly. A threshold is set to determine what qualifies as an anomaly: if the distance of a day to its neighbors exceeds this threshold, the day is classified as an anomaly.
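The flagging rule can be sketched as follows, under a simplifying assumption: the "neighbors" of a day are taken to be all other days of the same type, and the threshold is a fixed value chosen by hand. The function name, the toy distances, and the threshold of 6 are all hypothetical.

```python
def flag_outlier_dates(matrix, day_types, threshold):
    """Return indices of days whose mean distance to same-type days exceeds threshold.

    `matrix[i][j]` is the distance between day i and day j;
    `day_types[i]` is a label such as "weekday" or "weekend".
    """
    flagged = []
    for i, row in enumerate(matrix):
        peers = [
            row[j]
            for j in range(len(row))
            if j != i and day_types[j] == day_types[i]
        ]
        # A day with no peer of the same type cannot be judged and is skipped.
        if peers and sum(peers) / len(peers) > threshold:
            flagged.append(i)
    return flagged

# Toy distance matrix for five days (values are illustrative only).
matrix = [
    [0, 1, 2, 9, 20],
    [1, 0, 1, 10, 21],
    [2, 1, 0, 8, 19],
    [9, 10, 8, 0, 25],
    [20, 21, 19, 25, 0],
]
day_types = ["weekday", "weekday", "weekday", "weekday", "weekend"]
flagged = flag_outlier_dates(matrix, day_types, threshold=6)
print(flagged)  # [3]
```

Day 3 is flagged because its mean distance to the other weekdays is 9, while the single weekend day is left alone because it has no comparable day in the sample.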
The daily trip counts for the public transport line reflect the typical weekday and weekend patterns in the theoretical schedule. However, during the weekend of November 23, there is an increase in service levels compared to other weekends in the schedule. As a result, November 23 and 24 are identified as anomalies.
2.3. Model 2 – Outlier lines
For a set of lines within the same mode of transport (train, tram, metro, etc.), the Outlier Lines model identifies the lines whose service differs from that of other lines within the same mode.
This model accounts for the full service of the lines over the next 30 days. The time series used therefore represents the number of trips per 30-minute interval across the entire service period.
By comparing the theoretical services of Parisian trams in October 2024, we observe that Tramway T1 exhibits behavior different from the other lines. Unlike the others, it ceases operation starting October 23. Consequently, it is identified as an anomaly.
Prophet
Prophet is a forecasting tool developed by Facebook, designed for time series that exhibit seasonality and potential holiday effects. Because it decomposes a time series into trend and seasonality components, Prophet is useful for anomaly detection and trend analysis. Using Prophet, we decompose the transport service of a line over the next 30 days into four distinct components:
- general trend
- weekly seasonality
- weekday daily seasonality
- weekend daily seasonality
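One way to read this decomposition, assuming Prophet's default additive mode and assuming the two daily seasonalities are each switched on by a day-type indicator, is the following (the symbols are introduced here for illustration and do not come from the source):

```latex
y(t) = g(t) + s_{\text{week}}(t)
     + \mathbb{1}_{\text{wd}}(t)\, s_{\text{wd}}(t)
     + \mathbb{1}_{\text{we}}(t)\, s_{\text{we}}(t)
     + \varepsilon(t)
```

Here $y(t)$ is the number of trips in the 30-minute interval $t$, $g(t)$ is the general trend, $s_{\text{week}}$ is the weekly seasonality, $s_{\text{wd}}$ and $s_{\text{we}}$ are the weekday and weekend daily seasonalities, $\mathbb{1}_{\text{wd}}$ and $\mathbb{1}_{\text{we}}$ indicate whether $t$ falls on a weekday or a weekend, and $\varepsilon(t)$ is the residual.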
Anomaly Detection with Unsupervised Machine Learning
For each mode of transport (train, tram, etc.), we assess the similarity between lines by analyzing the components derived from Prophet for each line. For each of the four components, we compute the average DTW distance between a given line and the other lines within the same transport mode. Using these four calculated features, we then apply DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to identify lines whose behaviors are anomalous relative to the others.
DBSCAN is a clustering algorithm that groups data based on their density. DBSCAN identifies high-density regions as clusters and marks isolated points with low density as noise or anomalies.
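To show how noise points emerge, here is a minimal, dependency-free sketch of DBSCAN applied to one feature vector per line. The line names, feature values, and the `eps` and `min_samples` settings are hypothetical; the source does not state the parameters or the implementation used in the tool.

```python
from math import dist

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN; returns one label per point, with -1 meaning noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Includes the point itself, so min_samples counts it too.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels

# Hypothetical features: average DTW distance per component
# (trend, weekly, weekday daily, weekend daily) for four lines of one mode.
features = {
    "A": (1.0, 0.5, 0.8, 0.6),
    "B": (1.1, 0.6, 0.7, 0.6),
    "C": (0.9, 0.5, 0.9, 0.7),
    "D": (6.0, 3.5, 4.0, 3.8),
}
labels = dbscan(list(features.values()), eps=0.5, min_samples=2)
outliers = [name for name, label in zip(features, labels) if label == -1]
print(outliers)  # ['D']
```

Lines A, B and C fall into a single dense cluster, while D has no neighbor within `eps` and is therefore reported as noise. In practice a library implementation such as scikit-learn's `DBSCAN`, which takes the same `eps` and `min_samples` parameters and also labels noise as -1, would be a natural choice.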
3. The Time to Act is Now
Our AI-Based Transit Data Validator has been successfully employed for several months across various theoretical public transport offers. We have implemented intuitive dashboards that provide easy access to the results. This has allowed for the early detection of irregularities on multiple occasions, enabling operational teams to take timely, proactive action.
4. Try it out now!
The proposed AI-Based Transit Data Validator easily integrates into various applications and adapts to different transport data formats. With this tool, you can not only optimize your services but also significantly enhance the user experience by providing accurate and reliable information.