This experimental study was conducted in multiple dental hospitals and clinics to evaluate the accuracy of supervised machine learning (ML) in predicting treatment durations, compared with the actual durations, and to assess the impact of ML prediction on clinical workflow efficiency. All methods were performed in accordance with the relevant guidelines and regulations of the Ethics Committee at the College of Dentistry, University of Sulaimani, which approved the research project (Code No. COD-EC-25-0077) on March 24, 2025.
Inclusion criteria
Patients undergoing common dental treatments, such as fillings, root canal treatment, periodontal treatment, tooth preparation, implant visits, orthodontic treatment, extractions, or other dental procedures, were included. Eligible participants were 18 years or older and gave informed consent for their data to be used for research purposes. Some patients younger than 18 were also included, in which case informed consent was obtained from their parents.
Exclusion criteria
Patients with incomplete records, emergency cases requiring immediate attention, or treatments involving complex or rare procedures were excluded.
Records from 2500 patients were used to train the machine learning model, and a further 250 cases were used to assess its accuracy (Fig. 1).

Fig. 1. A flow diagram illustrating the patient selection and data analysis.
ML model description
This study presents a supervised and optimized hybrid system for predicting dental procedure durations. It combines machine learning models with real-time online data retrieval and clinical domain knowledge. The system architecture consists of three interconnected modules designed for accuracy, efficiency, and clinical practicality.
Machine learning core
The prediction engine employs a two-tiered modeling approach built on the scikit-learn (sklearn) library and its LinearRegression estimator:
X_encoded = pd.get_dummies(X, columns=cat_cols, drop_first=True)
self.models[proc_name] = LinearRegression().fit(X_encoded, y)
Features include:
- Numerical: dentist experience (years), patient age.
- Categorical: patient sex, dentist specialty (one-hot encoded).
Lookup Table Fallback: Procedures with sparse data (< 5 records) use precomputed averages, avoiding unreliable model fits.
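The two tiers described above (one regression model per procedure, with an average-duration lookup for procedures that have fewer than 5 records) can be sketched as follows. This is a minimal illustration, not the authors' code: the column names `procedure`, `duration_min`, `experience_years`, `patient_age`, `patient_sex`, and `specialty` and the function name `train_procedure_models` are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumed schema (hypothetical column names; the study does not publish them)
NUM_COLS = ["experience_years", "patient_age"]
CAT_COLS = ["patient_sex", "specialty"]
MIN_RECORDS = 5  # procedures with fewer records use the lookup-table average


def train_procedure_models(df: pd.DataFrame):
    """Fit one LinearRegression per procedure, or store a mean for sparse ones."""
    models, model_columns, lookup = {}, {}, {}
    for proc_name, group in df.groupby("procedure"):
        y = group["duration_min"]
        if len(group) < MIN_RECORDS:
            # Lookup-table fallback: a precomputed average instead of a fragile fit
            lookup[proc_name] = float(y.mean())
            continue
        X = group[NUM_COLS + CAT_COLS]
        # One-hot encode categoricals; drop_first avoids a redundant dummy column
        X_encoded = pd.get_dummies(X, columns=CAT_COLS, drop_first=True, dtype=float)
        models[proc_name] = LinearRegression().fit(X_encoded, y)
        # Remember the training schema for aligning prediction inputs later
        model_columns[proc_name] = list(X_encoded.columns)
    return models, model_columns, lookup
```

Because each procedure is fitted independently inside the loop, a new procedure can be added to the catalog without refitting the others.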
Efficiency optimizations:
1. Column Alignment: Prediction inputs are dynamically reindexed to match the training schema (reindex(columns=model_columns, fill_value=0)), eliminating full dataset reprocessing.
2. Memory Management: Minimal DataFrames are constructed during real-time prediction.
3. Parallel Training: Models are built independently per procedure, enabling scalable additions to the procedure catalog.
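The column-alignment step can be illustrated with a one-row prediction. In this sketch (hypothetical function names `fit_model` and `predict_duration`; assumed categorical columns `patient_sex` and `specialty`), the incoming case is encoded on its own and then reindexed to the stored training columns, so the 2500-record training set never has to be re-encoded.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

CAT_COLS = ["patient_sex", "specialty"]  # assumed categorical column names


def fit_model(X: pd.DataFrame, y):
    """Fit on the encoded training data and keep its column order (the schema)."""
    X_encoded = pd.get_dummies(X, columns=CAT_COLS, drop_first=True, dtype=float)
    return LinearRegression().fit(X_encoded, y), list(X_encoded.columns)


def predict_duration(model, model_columns, case: dict) -> float:
    """Predict one case without re-encoding the training dataset."""
    # Minimal one-row DataFrame for the incoming case
    row = pd.DataFrame([case])
    # Encode WITHOUT drop_first: with a single row, drop_first would drop the
    # only category present and silently turn every case into the baseline
    row = pd.get_dummies(row, columns=CAT_COLS, dtype=float)
    # Align to the training schema: missing dummies become 0, and the baseline
    # category's dummy (absent from the schema) is dropped
    row = row.reindex(columns=model_columns, fill_value=0)
    return float(model.predict(row)[0])
```

The comment about `drop_first` reflects a design choice of this sketch: encoding the single prediction row with all categories and letting `reindex` discard the baseline column keeps non-baseline categories (e.g. a male patient or a specialist) from being mapped to zero.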
Online data retrieval module
The system integrates live searches for dental guideline information via SerpApi:
params = {
    "q": f'"{procedure_name}" dental procedure duration minutes',
    "engine": "google",
    "api_key": api_key,
}
time_pattern = re.compile(r'(\d{1,3}(?:\s*to\s*|-)?\d{1,3})\s*(minute|hour)s?', re.IGNORECASE)
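Once search results have been retrieved, the time mentions in their text still have to be converted into a single online estimate. The sketch below assumes the result snippets are already available as plain strings (fetching them from SerpApi is not shown). It uses a variant of the pattern above, an assumption on our part, in which the range part is optional as a whole so that single-digit values such as "1 hour" also match; ranges are summarized by their midpoint and hours are converted to minutes. The function names are hypothetical.

```python
import re
from statistics import mean

# Variant of the study's pattern (assumption): "(to|-) high" is optional as a
# whole, so "1 hour" matches as well as "30-60 minutes" and "1 to 2 hours"
TIME_PATTERN = re.compile(
    r"(\d{1,3})(?:\s*(?:to|-)\s*(\d{1,3}))?\s*(minute|hour)s?", re.IGNORECASE
)


def durations_from_snippets(snippets):
    """Convert every time mention in the search-result text to minutes."""
    minutes = []
    for text in snippets:
        for low, high, unit in TIME_PATTERN.findall(text):
            # A range such as "30-60 minutes" is summarized by its midpoint
            value = (int(low) + int(high)) / 2 if high else float(low)
            minutes.append(value * 60 if unit.lower() == "hour" else value)
    return minutes


def online_estimate(snippets):
    """Average of all extracted durations, or None when nothing was found."""
    values = durations_from_snippets(snippets)
    return mean(values) if values else None
```

For example, the snippets "A root canal usually takes 30-60 minutes." and "It may take 1 to 2 hours in molars." yield 45 and 90 minutes, for an online estimate of 67.5 minutes.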
Clinical safety mechanisms
Procedure Minimums Dictionary: enforces a biologically plausible minimum duration for each procedure, regardless of model output.
Specialty-Adjusted Predictions:
- Endodontists receive a 40% time reduction for root canal treatment.
- General dentists use baseline estimates.
- Mismatched specialties trigger conservative multipliers.
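The two safety mechanisms can be combined into one post-processing step. In this sketch only the 40% endodontist reduction comes from the text; the minimum durations, the procedure-to-specialty mapping, the mismatch multiplier of 1.2, and all names are hypothetical placeholders, since the study does not report its values.

```python
# Hypothetical minimum durations (minutes); the study's values are not reported
PROCEDURE_MINIMUMS = {"root canal": 30.0, "extraction": 10.0, "filling": 15.0}
DEFAULT_MINIMUM = 5.0
ENDO_REDUCTION = 0.40       # 40% reduction for root canals by endodontists (from the text)
MISMATCH_MULTIPLIER = 1.2   # hypothetical "conservative" multiplier

# Hypothetical mapping of procedures to the specialty that normally performs them
PROCEDURE_SPECIALTY = {"root canal": "endodontist", "extraction": "oral surgeon"}


def adjust_for_specialty(proc_name, specialty, base_minutes):
    """Apply the specialty rules to a raw model or lookup estimate."""
    owner = PROCEDURE_SPECIALTY.get(proc_name)
    if proc_name == "root canal" and specialty == "endodontist":
        return base_minutes * (1 - ENDO_REDUCTION)
    if specialty == "general dentist" or owner is None:
        return base_minutes  # baseline estimate
    if specialty != owner:
        return base_minutes * MISMATCH_MULTIPLIER  # conservative: allow more time
    return base_minutes


def safe_prediction(proc_name, specialty, base_minutes):
    """Specialty adjustment followed by the procedure-minimum floor."""
    adjusted = adjust_for_specialty(proc_name, specialty, base_minutes)
    return max(PROCEDURE_MINIMUMS.get(proc_name, DEFAULT_MINIMUM), adjusted)
```

Applying the minimum last means that no specialty reduction can push a prediction below the plausible floor for that procedure.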
User interface & workflow integration
Fig. 2. The graphical user interface of the prediction software.
The Tkinter GUI implements:
- Dynamic Form Validation: real-time checks for input completeness.
- Progressive Disclosure: interface elements are enabled or disabled based on system state.
- Comparative Insights: model and online estimates are displayed simultaneously.
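The completeness check and the progressive-disclosure rule can be kept independent of the widgets, which makes them easy to test. This minimal sketch assumes hypothetical field names and returns the Tkinter state strings "normal" and "disabled".

```python
# Hypothetical required form fields; the study does not list its widget names
REQUIRED_FIELDS = ("procedure", "specialty", "experience_years", "patient_age", "patient_sex")


def missing_fields(form: dict) -> list:
    """Names of required fields that are absent or blank."""
    return [
        name for name in REQUIRED_FIELDS
        if form.get(name) is None or not str(form[name]).strip()
    ]


def predict_button_state(form: dict) -> str:
    """Tkinter state for the Predict button: enabled only when the form is complete."""
    return "normal" if not missing_fields(form) else "disabled"
```

In a Tkinter form, these functions could be called from a `StringVar.trace_add("write", ...)` callback on each input variable, passing the result to `button.config(state=...)` so that the button updates as the user types.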
Computational performance
Benchmarking on an Intel Core i7-1185G7 showed:
- Model Training: 120 to 450 ms per procedure, depending on sample size (the study sample comprised 2500 records).
- Prediction Latency: < 15 ms for dedicated models and < 2 ms for the lookup table.
- Memory Footprint: < 45 MB with 10,000 procedure records.
Validation framework
The system employs three-tier validation:
1. Input Sanitization: type checking and range validation for numerical inputs.
2. Model Confidence Checking: fallback to averages when prediction variance exceeds thresholds.
3. Clinical Plausibility Gates: final predictions are constrained by:
final_time = max(self._get_safety_minimum(proc_name), prediction)
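The three tiers can be sketched as two small functions. Everything numeric here is an assumption: the input ranges and the 15-minute threshold are placeholders, and the study does not say how "prediction variance" is measured, so this sketch uses the residual standard deviation of the procedure's training fit (e.g. `numpy.std(y - model.predict(X_encoded))`) as a stand-in. Function names are hypothetical.

```python
# Hypothetical input ranges; the study does not report its bounds
RANGES = {"experience_years": (0, 60), "patient_age": (0, 110)}
MAX_RESIDUAL_SD = 15.0  # hypothetical confidence threshold, in minutes


def sanitize(raw: dict) -> dict:
    """Tier 1: type-check and range-check the numerical inputs."""
    clean = {}
    for name, (low, high) in RANGES.items():
        try:
            value = float(raw[name])
        except (KeyError, TypeError, ValueError):
            raise ValueError(f"{name} must be a number") from None
        # NaN fails this chained comparison and is rejected as well
        if not low <= value <= high:
            raise ValueError(f"{name}={value} outside [{low}, {high}]")
        clean[name] = value
    return clean


def choose_estimate(prediction, residual_sd, lookup_average, safety_minimum):
    """Tiers 2 and 3: confidence fallback, then the plausibility gate."""
    # Tier 2 (assumed criterion): an unreliable model falls back to the average
    estimate = lookup_average if residual_sd > MAX_RESIDUAL_SD else prediction
    # Tier 3: same rule as final_time = max(safety minimum, prediction)
    return max(safety_minimum, estimate)
```

Rejecting bad inputs with an exception, rather than silently clamping them, lets the interface tell the user which field to correct.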
This architecture illustrates how hybrid AI systems can balance computational efficiency with clinical reliability in healthcare applications. The modular design permits integration of additional data sources (e.g., electronic health record [EHR] systems) while maintaining the sub-second response times critical for clinical workflows.
Data collection
Data were entered directly into a computer at public and private dental clinics covering different specialties. The following variables were collected:
- Actual treatment duration (observed and recorded by the clinician).
- Type of dental procedure.
- The operator's specialty (specialist in the field of the procedure, general practitioner, postgraduate student, or undergraduate student).
- Dentist's years of experience.
- Patient demographic information (age and sex).
Actual treatment durations were recorded manually by dentists or clinic staff. All dentists were trained to record the duration from history taking to patient dismissal; patients with continuing visits to the same clinic were excluded from the history-taking step. These records served as the training material for the software, which was trained once 2500 cases had been collected. A further 250 cases were then used for testing: for each case, the software predicted the duration and the clinician measured the actual duration, yielding two readings per case.
The data were preprocessed manually to unify treatment names and to handle missing fields and misspellings, using two methods:
1. Records with irrecoverable missing data (any missing field in a row) were excluded.
2. Minor gaps, misspellings, and different names for the same procedure were resolved computationally by find-and-replace; for example, "filling" and "restoration" are synonymous, and both appeared in the dataset, so they were unified under a single procedure name.
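The two preprocessing rules could be automated with pandas roughly as follows. This is a sketch, not the authors' procedure: the column names are assumed, and only the "restoration" to "filling" mapping comes from the text; the misspelling entry is a hypothetical example.

```python
import pandas as pd

# Only "restoration" -> "filling" is from the text; the second entry is hypothetical
SYNONYMS = {"restoration": "filling", "extration": "extraction"}


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # 1. Exclude rows with any missing field (irrecoverable records)
    df = df.dropna().copy()
    # 2. Normalize spacing and case, then map synonyms and misspellings
    #    onto one canonical procedure name (whole-value replacement)
    names = df["procedure"].str.strip().str.lower()
    df["procedure"] = names.replace(SYNONYMS)
    return df.reset_index(drop=True)
```

Normalizing case and whitespace before the dictionary lookup keeps the synonym table small, because variants such as " Restoration" and "restoration" collapse to the same key.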
Statistical analysis
The data were collected in Microsoft Excel, saved as a .csv file, and analyzed using IBM SPSS Statistics version 29 and DATAtab (DATAtab Team, 2024. DATAtab: Online Statistics Calculator. DATAtab e.U., Graz, Austria). The following statistical methods were employed:
- Descriptive statistics to summarize predicted and actual treatment durations.
- Paired t-tests to determine the significance of differences between predicted and actual durations with respect to patient sex, patient age, the practitioner's years of experience, and the practitioner's specialty.
- R² score: the proportion of the variance in the actual durations that is explained by the model's predictions (one minus the ratio of the residual sum of squares to the total sum of squares).
- Mean Absolute Error (MAE): the average absolute difference between predicted and actual durations.
- A p-value < 0.05 was considered statistically significant.
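The same metrics can be reproduced outside SPSS for a quick cross-check. This minimal sketch (hypothetical function name `evaluate`) computes the descriptive means, MAE, R², and a paired t-test on matched actual and predicted durations; for the subgroup comparisons, it would be applied separately to each subgroup (e.g. each specialty).

```python
import numpy as np
from scipy import stats
from sklearn.metrics import mean_absolute_error, r2_score


def evaluate(actual, predicted, alpha=0.05):
    """Agreement between clinician-measured and predicted durations."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Paired t-test: both readings come from the same case
    t_stat, p_value = stats.ttest_rel(actual, predicted)
    return {
        "mean_actual": float(actual.mean()),
        "mean_predicted": float(predicted.mean()),
        "mae": float(mean_absolute_error(actual, predicted)),
        # r2_score expects (y_true, y_pred) in that order
        "r2": float(r2_score(actual, predicted)),
        "t": float(t_stat),
        "p": float(p_value),
        "significant": bool(p_value < alpha),
    }
```

For instance, actual durations [30, 45, 60, 20] against predictions [28, 47, 58, 22] give an MAE of 2 minutes, R² = 1 - 16/918.75 (about 0.983), and a paired t statistic of 0 (p = 1.0), because the errors cancel out on average. If every difference were exactly zero, the t-test would be undefined and SciPy would return NaN.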

