Forward Head Posture Classification Using Deep Learning Models on Facial Recognition Data

Abstract: Forward Head Posture (FHP) is a condition in which the head protrudes forward; it contributes significantly to neck pain and is associated with decreased productivity and psychological distress. This study investigates the fine-grained classification of FHP and proposes a generally applicable deep learning methodology for its identification and analysis. Using the Korean Facial Image (K-FACE) dataset, images were preprocessed with the Yolo-v8 model to support accurate measurement of the CranioVertebral Angle (CVA) from various perspectives. The study evaluated the classification performance of three deep learning models: EfficientNet-B7, NFNet-F7, and ResNet-152. Among these, EfficientNet-B7 performed best, with an accuracy of 0.69 and a recall of 0.69. A comparison across camera angles within the EfficientNet-B7 model showed the strongest results at the ±75° angle. Grad-CAM analysis of EfficientNet-B7 confirmed which image regions mattered, emphasizing the critical role of the neck region in classifying FHP images. This performance comparison and the proposed classification methodology indicate the potential for generalization in FHP classification. By combining a unique dataset with state-of-the-art classification techniques, this research also offers a new perspective on FHP. Future research could integrate expanded facial image datasets and apply transfer learning to further improve the precision of FHP classification, thereby improving diagnostic accuracy and supporting targeted interventions for individuals experiencing neck pain associated with FHP.


I. INTRODUCTION
Neck pain is a prevalent musculoskeletal problem worldwide and a significant public health concern [1]. It is primarily linked to Forward Head Posture (FHP), which detrimentally affects quality of life and occupational productivity [2]. FHP is a common upper-body postural problem in which the head's center of gravity shifts forward. Prolonged incorrect posture during computer work or smartphone use can exacerbate FHP and lead to text-neck syndrome, a condition that arises from keeping the neck bent and looking down for extended periods. Such posture strains the muscles and structures of the neck and can result in long-term pain and disability associated with neck deformities. Recognizing the potential consequences of neglecting FHP is therefore crucial [3], [4], [5].
A systematic review has emphasized the correlation between increasing FHP and the incidence of neck pain, highlighting how long such pain persists and underscoring the urgency of prevention [6]. Neck pain occurs more frequently in sedentary jobs, where individuals are exposed to sustained improper postures. It has also been reported that FHP may be influenced by the duration of smartphone use [7]. According to the literature, neck pain caused by FHP is strongly associated with psychological problems such as stress, mood disorders, anxiety, depression, and emotional disturbances [8], which has made it a significant social issue as well. FHP is measured through the CranioVertebral Angle (CVA), the angle defined by the 7th Cervical Vertebra (C7) and the tragus of the ear; a smaller CVA corresponds to more severe FHP [9], [10], [11]. Fig. 1 illustrates how the CVA has been used in previous studies, drawing on a systematic review of the methods adopted to measure and analyze the impact of FHP on individuals [17]. Many studies have assessed the effectiveness of visual feedback-based approaches or exercise interventions for correcting posture, specifically targeting FHP [12], [13], [14].
Recent advances in deep learning have given rise to models aimed at FHP prevention. Preliminary studies used smartphone cameras and apps with CNN-based models such as EfficientNet-B0 to detect FHP, while others used the Kinect webcam, originally designed for facial and posture recognition, for similar purposes [15], [16]. Nevertheless, previous research has limitations. First, these studies only discerned whether or not the posture was forward-leaning. Second, FHP detection was attempted solely from a frontal or lateral perspective. Lastly, although they reported high accuracy, these studies often relied on small samples of fewer than 20 participants, limiting their generalizability. In real-world work settings, the neck protrudes forward progressively and webcams are positioned in diverse ways, so it is imperative to understand FHP in detail. This study therefore aims to verify the feasibility of a detailed FHP classification and to propose and validate a universally adaptable model-building methodology.

A. Dataset
This study used the Korean Facial Image Data collected by Ai-hub, initially established by the 'Korea Intelligence Information Society Promotion Agency'. The dataset comprises images of individuals aged 20 to 60 that include both the upper body and the face, which makes it possible to measure the CVA. It was curated by varying parameters such as lighting, shooting angle, accessories (such as glasses and sunglasses), and resolution. Approximately 32,000 images (864x576 pixels) were collected per individual. For this research, data from a total of 318 participants were used. Fig. 2 presents a sample line graph of the classification performance of various models across camera angles, using colors that contrast well both on screen and on a black-and-white hardcopy. The camera equipment is configured to capture images from at least 20 angles, with the reference camera centered on the subject at a vertical angle of 0°. The layout comprises 13 horizontal directions (±90°, ±75°, ±60°, ±45°, ±30°, ±15°, 0°) at a vertical angle of 0°, 5 horizontal directions (±45°, ±15°, 0°) at a vertical angle of 30° upward, and 2 horizontal directions (±40°) at a vertical angle of 15° downward. The cameras are synchronized so that images from all angles are acquired simultaneously in a single capture session. Fig. 3 depicts the equipment used to capture the Korean Facial Images dataset, highlighting how the cameras are positioned to acquire images from multiple angles.
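As a quick check of the camera layout described above, the positions can be enumerated in a few lines. This is only an illustrative sketch: the function name `camera_positions` and the `(vertical, horizontal)` tuple convention are assumptions made here, not part of the dataset's documentation.

```python
# Hypothetical enumeration of the camera layout described in the text.
# Each position is (vertical_deg, horizontal_deg); positive vertical means upward.

def camera_positions():
    level = [(0, h) for h in range(-90, 91, 15)]      # 13 horizontal directions at 0 degrees
    upper = [(30, h) for h in (-45, -15, 0, 15, 45)]  # 5 directions at 30 degrees upward
    lower = [(-15, h) for h in (-40, 40)]             # 2 directions at 15 degrees downward
    return level + upper + lower

positions = camera_positions()
print(len(positions))  # 13 + 5 + 2 = 20
```

Only the 13 level (0° vertical) views are used for classification in the later sections.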

B. Data Annotation
The 'Korean Facial Recognition Image' dataset includes both the face and the upper body, and its distinctive feature is that the face is captured from multiple angles simultaneously. In this study, a physical therapist researcher annotated the two side-view images (±90°) among the 13 shots taken at horizontal camera angles (±90°, ±75°, ±60°, ±45°, ±30°, ±15°, 0°). An accompanying figure compares an original image with its preprocessed counterpart, illustrating how the preprocessing techniques improve image clarity and focus for FHP analysis. Markers were placed on C7 and the tragus, and the angle formed by the line connecting them, termed the CVA, was recorded as the label. Because all 13 shots in a capture session were taken simultaneously, the CVA label obtained from the side views was applied consistently to every shot in that session [18].
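The angle computation behind the label can be sketched as follows. The sketch assumes the conventional definition of the CVA (the angle between a horizontal line through C7 and the line from C7 to the tragus) and assumes marker positions given as pixel coordinates with the y-axis pointing downward; the function name and argument layout are hypothetical and not taken from the authors' annotation tool.

```python
import math

def craniovertebral_angle(c7, tragus):
    """Return the CVA in degrees from two (x, y) marker positions.

    Assumes image pixel coordinates: x grows to the right, y grows downward.
    """
    # Horizontal offset; abs() lets the same code handle left and right profiles.
    dx = abs(tragus[0] - c7[0])
    # Vertical rise of the tragus above C7 (sign flipped because y grows downward).
    dy = c7[1] - tragus[1]
    return math.degrees(math.atan2(dy, dx))

# Tragus 100 px forward of and 100 px above C7.
print(round(craniovertebral_angle((100, 300), (200, 200)), 1))  # 45.0
```

A head held further forward increases the horizontal offset relative to the vertical rise, so the computed angle shrinks, which matches the statement that a smaller CVA indicates more severe FHP.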

C. Data Preprocessing and Sampling
Data preprocessing used the Yolo-v8 model to segment the upper body of the individual in each image, with the remaining background set to black. The Yolo-v8 object detection model was then used to crop each photo to the bounding box of the individual, and the crop was resized to 633x600 pixels. Images that were damaged during preprocessing or had insufficient lighting were excluded. Approximately 110,000 images were used for the study. For the detailed classification of FHP, CVA values were grouped into 5-degree intervals. Classes accounting for more than 10% of the entire dataset were selected, yielding three CVA categories: 55°-59°, 60°-64°, and 65°-69°. To address class imbalance, under-sampling was performed down to the size of the least frequent class. The dataset was then divided into 60% for training, 20% for validation, and 20% for testing.
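The binning, class selection, under-sampling, and split steps can be sketched in plain Python. This is a minimal illustration under stated assumptions: the function names, the `(image_id, cva)` input format, and the fixed random seed are hypothetical, and the split shown is a simple per-image split because the text does not state whether the authors split by participant.

```python
import random
from collections import Counter, defaultdict

def cva_bin(cva, width=5):
    """Lower edge of the 5-degree bin, e.g. 57.3 -> 55."""
    return int(cva // width) * width

def select_and_balance(samples, min_share=0.10, seed=0):
    """samples: list of (image_id, cva). Keep bins holding more than
    min_share of the data, then under-sample each to the smallest kept bin."""
    binned = [(img, cva_bin(cva)) for img, cva in samples]
    counts = Counter(b for _, b in binned)
    kept = {b for b, n in counts.items() if n / len(binned) > min_share}
    by_class = defaultdict(list)
    for img, b in binned:
        if b in kept:
            by_class[b].append(img)
    n_min = min(len(imgs) for imgs in by_class.values())
    rng = random.Random(seed)
    balanced = []
    for b in sorted(by_class):
        balanced += [(img, b) for img in rng.sample(by_class[b], n_min)]
    return balanced

def split_60_20_20(items, seed=0):
    """Shuffle and split into train / validation / test portions."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(0.6 * len(items))
    n_val = int(0.2 * len(items))
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]
```

If several images of the same participant can end up in different splits, test performance may be optimistic; grouping the split by participant would be a natural variation of this sketch.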

D. Model Selection
1) Yolo-v8 [19]: The deep learning-based Yolo model is a one-shot object detector: it predicts object locations and classes simultaneously by observing the image only once, which speeds up image analysis. This research adopted the latest version, Yolo-v8, to segment the human form and perform object detection.
2) EfficientNet-B7 [20]: EfficientNet achieves strong classification performance by jointly adjusting the depth, width, and input resolution of the neural network. It offers models of various sizes (from B0 to B7) designed to reach high accuracy at low computational cost on complex image datasets. The B7 model, which can handle large image input sizes and substantial parameter computations, was chosen for the classification tasks in this study.
3) NFNet-F7: NFNet provides models of varying sizes (from F1 to F7) that deliver both high accuracy and computational efficiency. For this research, the F7 model, suited to large image input sizes and significant parameter operations, was applied to the classification tasks.
4) ResNet-152 [22]: ResNet (Residual Network) is characterized by residual connections, which add the input of a block to that block's output before passing it on to the next layer. ResNet architectures vary in depth (e.g., 18, 50, or 152 layers). These residual connections mitigate the vanishing gradient problem, enabling stable learning even as the network becomes deeper. In this study, considering the image input size and extensive parameter operations, the ResNet-152 model was chosen and applied to the classification task.
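The residual connection used by ResNet can be illustrated with a toy block. This is a deliberately simplified sketch in plain Python with fully connected layers instead of convolutions; the helper names and the two-layer layout are assumptions for illustration and do not reproduce the ResNet-152 architecture.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def residual_block(x, w1, w2):
    """Compute relu(x + F(x)), where F is two linear layers with a ReLU between."""
    h = relu(matvec(w1, x))
    f = matvec(w2, h)
    # Skip connection: the block's input is added to its output.
    return relu([xi + fi for xi, fi in zip(x, f)])

zeros = [[0.0] * 3 for _ in range(3)]
print(residual_block([1.0, -2.0, 3.0], zeros, zeros))  # [1.0, 0.0, 3.0]
```

With all weights at zero the block still passes its input through the skip path, which is the intuition behind why gradients can keep flowing through very deep stacks of such blocks.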

E. Training Image Classification Models
All classification models were standardized to accept 633x600-pixel images as input. Optimization used the Adam optimizer with a learning rate that was lowered from 0.0001 toward a floor of 0.00001 whenever validation performance stopped improving. The categorical cross-entropy loss function was used for multi-class classification. Early stopping terminated training if validation performance failed to improve for ten consecutive iterations.
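The learning-rate reduction and early-stopping logic can be sketched independently of any deep learning framework. Only the learning-rate bounds (0.0001 and 0.00001) and the ten-iteration stopping patience come from the text; the class name, the reduction factor of 0.1, the learning-rate patience of 3, and the use of validation loss as the monitored quantity are assumptions made for this illustration.

```python
class PlateauController:
    """Tracks validation loss, lowers the learning rate on plateaus,
    and signals when training should stop."""

    def __init__(self, lr=1e-4, min_lr=1e-5, factor=0.1,
                 lr_patience=3, stop_patience=10):
        self.lr = lr
        self.min_lr = min_lr
        self.factor = factor                # assumed reduction factor
        self.lr_patience = lr_patience      # assumed; not given in the text
        self.stop_patience = stop_patience  # ten iterations, as in the text
        self.best = float("inf")
        self.since_best = 0
        self.since_lr_change = 0

    def update(self, val_loss):
        """Record one validation result; return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.since_best = 0
            self.since_lr_change = 0
            return False
        self.since_best += 1
        self.since_lr_change += 1
        if self.since_lr_change >= self.lr_patience and self.lr > self.min_lr:
            # Reduce the learning rate, but never below the floor.
            self.lr = max(self.min_lr, self.lr * self.factor)
            self.since_lr_change = 0
        return self.since_best >= self.stop_patience
```

In a training loop, `update` would be called once per validation pass, with the returned flag ending training and `lr` passed to the optimizer for the next iteration.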

F. Image Importance Confirmation Using Grad-CAM
Grad-CAM (Gradient-weighted Class Activation Mapping) is a technique for visualizing which parts of an image are crucial to a deep learning model's classification decision. By leveraging the gradients of the trained model, Grad-CAM highlights the important regions within the image. Using Grad-CAM enables a better understanding of the decision-making process of classification models, particularly for complex CNN-based classification problems such as Forward Head Posture (FHP) [23].
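The core Grad-CAM computation can be sketched in plain Python, assuming the feature maps of the last convolutional layer and the gradients of the target class score with respect to them have already been extracted from the model. The function name and the nested-list `[K][H][W]` layout are hypothetical choices for this illustration; in practice these tensors would come from the deep learning framework in use.

```python
def grad_cam(activations, gradients):
    """Return an [H][W] heatmap scaled to [0, 1].

    activations, gradients: nested lists shaped [K][H][W], holding the feature
    maps of one conv layer and the class-score gradients w.r.t. those maps.
    """
    K = len(activations)
    H = len(activations[0])
    W = len(activations[0][0])
    # Channel weights: global average of each channel's gradient map.
    weights = [sum(sum(row) for row in gradients[k]) / (H * W) for k in range(K)]
    # Weighted sum over channels, followed by ReLU to keep positive evidence only.
    cam = [[max(0.0, sum(weights[k] * activations[k][i][j] for k in range(K)))
            for j in range(W)] for i in range(H)]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam
```

The resulting low-resolution map is typically upsampled to the input size and overlaid on the image, which is how a region such as the neck can be identified as important for a prediction.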

III. RESULTS AND DISCUSSION
This study classified FHP at different camera angles (±90°, ±75°, ±60°, ±45°, ±30°, ±15°, 0°) using three models: EfficientNet-B7, NFNet-F7, and ResNet-152. CVA values were grouped into three classes (55°-59°, 60°-64°, and 65°-69°, referred to as the 55°, 60°, and 65° classes) for multi-class image classification. In addition, Grad-CAM (Gradient-weighted Class Activation Mapping) was applied to visualize the regions that were significant in the classification process.

A. Performance Metrics Evaluation
1) Comparison of Three Model Performances: Table 1 presents the average performance metrics across all camera angles. Comparing the mean performance metrics of the three models, EfficientNet-B7 achieved the highest performance on all indicators, and NFNet-F7 outperformed ResNet-152.
2) Comparison of Model Performance by Camera Angle: The performance metrics of the three models were compared by camera angle. Fig. 5 provides a detailed comparison, plotting the accuracy, precision, recall, and F1-score of the selected deep learning models across perspectives. Using the best-performing model, EfficientNet-B7, as a reference, Table 2 presents the results for each camera angle. Accuracy peaked at ±75° (0.75) and was lowest at 0° (0.63). Precision was highest at ±90° (0.76) and lowest at 0° (0.69). Recall was highest at ±75° (0.75) and lowest at 0° (0.63). The F1-score was highest at ±75° (0.75) and lowest at 0° (0.63).

Fig. 6 shows the regions activated during image classification with the EfficientNet-B7 model, as revealed by Grad-CAM analysis. For most camera angles, the models assigned importance to the neck region. For the 0° (frontal) model, however, activation was also observed in areas outside the neck, such as the background. Images with worn accessories or specific facial expressions likewise showed a pattern in which importance was dispersed.

This study aimed to confirm the feasibility of a detailed classification of FHP and to propose a generalized approach. Using Korean facial recognition data, we trained models to classify FHP from various perspectives, with camera angles from ±90° to 0°. The EfficientNet-B7 model showed the best performance, especially at the ±75° angles, while the 0° angle performed worse, possibly due to inherent image characteristics. The Grad-CAM analysis highlighted the importance of the neck region in the classification process. However, images taken from the 0° (frontal) angle or containing accessories presented challenges, with activations detected outside the neck region, which affected classification accuracy.
A previous study used the POM-Checker to evaluate subjects' forward neck tilt angle in the sagittal plane and obtained radiographic images for comparison. The findings showed high accuracy, with a 95.35% agreement rate between the assessed level of FHP and the radiographic diagnosis. However, because that study classified FHP in clinical settings using inspection equipment rather than webcams, its results are not directly comparable with ours [24].
Another study used the CranioVertebral Angle (CVA) to classify the severity of FHP, evaluating head and neck posture in a standing position with a digital camera. It classified FHP into mild and moderate-to-severe categories and reported high sensitivity (0.93) and area under the curve (0.88) in distinguishing between the two groups. That study also attempted classification with the Forward Shoulder Angle (FSA) in addition to the CVA, but the FSA could not clearly differentiate the two groups. In contrast, our study classified FHP into three classes, which may allow a finer-grained distinction [25].
Recent research has actively explored posture-correction feedback systems for office work settings. These systems use smartphones, computers, and wearable posture-correction sensors to alert users and encourage them to maintain proper posture, including correcting FHP. The FHP model developed in this study could be applied to such systems as a criterion for providing feedback [26], [27], [28], [29].
In particular, one wearable-sensor method for detecting FHP uses a 3-axis magnetometer to track directional changes in the magnetic field emanating from a magnet, combined with an accelerometer to refine the head posture estimate. The sensor data are processed by machine learning algorithms to determine the FHP risk level, with a reported accuracy of 95.6%. With further development, combining the webcam-based FHP classification model from this study with FHP data collected from wearable sensors might make a more detailed classification of CVA values achievable [30].
This study addresses several previous challenges by using a unique dataset. Limitations nevertheless remain, including a lack of comparable studies for benchmarking, the need for under-sampling due to data imbalance, and the restricted range of CVA values used for classification.

IV. CONCLUSION
This study proposed and validated a generalizable method for classifying FHP using Korean facial recognition data and deep learning techniques. The methodology, centered on measuring the CVA in the Korean Facial Image dataset and preprocessing images with the Yolo-v8 model, provides a framework for assessing FHP from diverse camera perspectives. A comparative analysis of three deep learning models (EfficientNet-B7, NFNet-F7, and ResNet-152) identified EfficientNet-B7 as the best-performing model in terms of classification accuracy. Grad-CAM analysis supported this finding by identifying the neck region as a pivotal factor in the classification process, which points to the practical utility of the approach for recognizing and addressing posture-related issues.
This research contributes to the study of FHP by improving the ability to classify posture-related disorders in finer detail. Looking ahead, incorporating an expanded dataset and applying transfer learning techniques offer clear room for improvement. Such advances are expected to refine classification accuracy further, thereby improving diagnostic precision and supporting the development of more effective interventions for individuals suffering from neck pain attributable to FHP. This work adds to the current literature and lays the groundwork for future investigations into posture-induced neck pain.

Fig. 1 Utilization of CranioVertebral Angle in Previous Studies

Fig. 2 A sample line graph using colors which contrast well both on screen and on a black-and-white hardcopy

Fig. 3 Equipment Used for Capturing the Dataset of Korean Facial Images

Fig. 5 Comparison of Model Performance by Camera Angle

Fig. 6 Assessment of Image Feature Importance

TABLE II PERFORMANCE METRICS OF EFFICIENTNET-B7 FOR EACH CAMERA ANGLE