Guest editors: Marie Candito (Université Paris Diderot) and Mark Liberman (University of Pennsylvania)
Annotated corpora are crucial resources for both NLP and computational linguistics, whether they are created by hand or by partly or fully automatic methods. Such corpora are required for supervised learning of statistical NLP systems and are useful for semi-supervised learning. Furthermore, even unsupervised methods usually need gold-standard data for evaluation. For some applications, such as machine translation or speech recognition, linguistically annotated corpora compete with data that contain only the application’s input/output pairs, without any linguistic annotation; linguistic annotations are nevertheless intended to be more generic, and they make it possible to train models that can be more easily interpreted. In linguistics research, corpus annotation can help validate a given formalization, and statistical modeling of linguistic phenomena is based on annotated corpora. The increasing use of distributed representations for atomic linguistic symbols (i.e. word embeddings, POS embeddings, dependency label embeddings) has an impact on the kind of annotated resources used in NLP and may, in turn, promote a quantitative assessment of the boundaries traditionally drawn for linguistic concepts.
We welcome submissions on any aspect of annotated corpora for NLP and linguistics. Discussion of French-language resources, or of multilingual resources that include French, is especially welcome, in order to give researchers working on French an overview of currently available resources.
We encourage submissions that either present innovative research or provide an overview and comparison of previous research, and that pertain to the following topics (non-exhaustive list):
- Creation of corpora annotated with any kind of meta-linguistic information, especially those including French data
- Interoperability of annotations
- Multilinguality issues
- Annotation procedures, including projection from resources in other languages
- Qualitative and quantitative annotation evaluation
- Linguistic issues (pros and cons of annotation schemes from the point of view of linguistic description, cost of linguistic simplification)
- Corpora augmented with context-aware distributed representations of words (e.g. sense embeddings)
- Comparison of discrete linguistic annotations and distributed representations
- Comparison of linguistic formalization in corpus annotation and in linguistic theories
- Comparison and evaluation of annotation schemes for specific tasks
- Challenges in annotation (e.g. cross-sentence annotation, multilingual annotation, code-switching, construction annotation, multimodal annotations, annotation of non-canonical language…)
- Maintenance of annotated corpora
- Innovative use of annotated corpora
- Annotation tools and tools to explore annotated resources (development, evaluation)
INFORMATION
IMPORTANT DATES
Submission deadline: November 9th, 2018 (originally November 2nd)
Notification to the authors after first review: February 8th, 2019
Notification to the authors after second review: April 12th, 2019
Publication: September 2019
THE JOURNAL
TAL (Traitement Automatique des Langues / Natural Language Processing) is an international journal published by ATALA (French Association for Natural Language Processing, http://www.atala.org) since 1959 with the support of CNRS (National Centre for Scientific Research). It has moved to an electronic mode of publication, with printing on demand.