Linguistik online
 
     


 

Studying Written SMS: A Goal of the sms4science Project

Louise-Amélie Cougnon (UCLouvain, ILC, Cental) &
Thomas François (Aspirant FNRS, UCLouvain, ILC, Cental)


 

Abstract

This paper presents sms4science, an international project that aims to collect text message corpora (hereafter "SMS corpora") from across the globe for scientific research. Ten regions already participate, including Belgium, Réunion, Switzerland and Quebec. The article first presents the initial corpora collected in these four areas (116,000 text messages in total) and the methodology used to gather them. It then sets out the research possibilities these corpora open up: corpus-based studies pertain as much to linguistics and sociolinguistics as to natural language processing and statistics. One such statistical study is presented here; its tentative conclusions outline regional differences in SMS practices, notably in abbreviation rate and message length. Finally, the paper describes the obstacles the project faces and proposes new perspectives for the current year (2011).

