Abstract
Smartphone-based Human Activity Recognition (HAR) typically relies on deep learning models, but performance varies with the encoder architecture and the availability of labeled data. To address label scarcity, Self-Supervised Learning (SSL) exploits unlabeled data. Yet existing benchmarks evaluate only narrow combinations of encoders, SSL techniques, and refinement strategies, leaving the joint effects of encoder, SSL paradigm, and data availability underexplored under standardized conditions. We address this gap with a benchmark on the curated DAGHAR dataset, enabling reproducible evaluation across supervised and SSL settings from few-shot to full-data regimes.
Across 11,232 trained models combining six encoders and four SSL techniques, we find that full fine-tuning consistently outperforms freezing, and that encoder architecture and SSL technique have comparable influence on accuracy, highlighting the importance of selecting effective combinations. While CNN-PFF with Time-Frequency Consistency (TF-C) achieves peak performance in most configurations, ResNet-SE-5 is the most robust overall encoder, delivering consistent results across datasets, SSL techniques, and data regimes, particularly in few-shot scenarios.
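The two refinement strategies compared above can be sketched in PyTorch. This is a minimal illustration, not the paper's code: the tiny encoder and classifier head are hypothetical stand-ins, and only the parameter-selection logic matters. Freezing keeps the pretrained encoder fixed and trains the head alone; full fine-tuning updates both.

```python
import torch.nn as nn

# Hypothetical encoder/head pair (illustrative only, not the benchmark's models):
# 6 input channels (e.g., accelerometer + gyroscope axes), 6 activity classes.
encoder = nn.Sequential(
    nn.Conv1d(6, 16, kernel_size=3), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
)
head = nn.Linear(16, 6)

def trainable_params(enc, clf, freeze_encoder):
    """Select the parameters to optimize under each refinement strategy."""
    for p in enc.parameters():
        # Frozen: encoder weights receive no gradients and stay fixed.
        p.requires_grad = not freeze_encoder
    params = list(clf.parameters())
    if not freeze_encoder:
        # Full fine-tuning: encoder weights are updated alongside the head.
        params += list(enc.parameters())
    return params

frozen = trainable_params(encoder, head, freeze_encoder=True)
full = trainable_params(encoder, head, freeze_encoder=False)
print(len(frozen), len(full))  # the frozen setting optimizes fewer tensors
```

The returned list would be handed to an optimizer (e.g., `torch.optim.Adam(params)`); everything else in the training loop is identical between the two strategies.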
Among the evaluated SSL techniques, TF-C demonstrates substantial data efficiency, with models reaching over 95% of peak accuracy using only 25–50 labeled samples per class, a performance level unattainable by supervised models. At full-data regimes, the best encoder varies by dataset, alternating between CNN-PFF and TS2Vec, while SSL, typically using TF-C or Learning from Randomness (LFR), can still outperform supervised learning. This standardized large-scale benchmark shows that SSL can outperform conventional supervised approaches with less labeled training data, and it can be extended in the future with new encoders, SSL techniques, and datasets.
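A few-shot regime like the 25–50 labeled samples per class mentioned above is typically built by subsampling a fully labeled set. The helper below is an illustrative sketch (not the benchmark's actual sampling code) of drawing k labeled indices per class with a fixed seed for reproducibility.

```python
import random
from collections import defaultdict

def few_shot_subset(labels, k, seed=0):
    """Return sorted indices of up to k samples per class (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    chosen = []
    for idxs in by_class.values():
        # Sample without replacement; take the whole class if it has < k samples.
        chosen += rng.sample(idxs, min(k, len(idxs)))
    return sorted(chosen)

# Toy label list with three activity classes of 100 samples each.
labels = ["walk"] * 100 + ["sit"] * 100 + ["stand"] * 100
subset = few_shot_subset(labels, k=25)
print(len(subset))  # 75 -> 25 samples per class
```

Fine-tuning on such a subset, versus on all labels, is what separates the few-shot and full-data regimes compared in the benchmark.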
Benchmark Analysis Tool
Explore all 11,232 combinations of encoders, SSL techniques, datasets, and refinement strategies with interactive splits.
If you find this work useful, cite:
@article{daluz2026benchmarking,
  title={Benchmarking Encoders and Self-Supervised Learning for Smartphone-Based Human Activity Recognition},
  author={da Luz, Gustavo P. C. P. and Soto, Darlinne H. P. and Napoli, Otávio O. and Rocha, Anderson and Boccato, Levy and Borin, Edson},
  journal={IEEE Access},
  year={2026},
  publisher={IEEE}
}