TAL Journal: Special Issue on annotated corpora

Guest editors: Marie Candito (Université Paris Diderot) and Mark Liberman (University
of Pennsylvania)

Annotated corpora are crucial resources for both NLP and computational
linguistics, whether they were created by hand or by partly or fully
automatic methods. Such corpora are required for supervised learning of
statistical NLP systems and are useful for semi-supervised learning.
Furthermore, even unsupervised methods usually need gold-standard data
for evaluation. Although for some applications, such as machine
translation or speech recognition, linguistically annotated corpora
compete with data containing only the application's input/output pairs,
without any linguistic annotation, linguistic annotations are intended
to be more generic and make it possible to train models that are more
easily interpreted. In linguistics research, corpus annotation can help
validate a given formalization, and statistical modeling of linguistic
phenomena relies on annotated corpora. The increasing use of distributed
representations for atomic linguistic symbols (e.g. word embeddings, POS
embeddings, dependency label embeddings) has an impact on the kind of
annotated resources used in NLP and may, in turn, promote a quantitative
assessment of the boundaries traditionally drawn for linguistic
categories.

We welcome submissions on any aspect of annotated corpora for NLP and
linguistics. Submissions discussing French-language resources, or
multilingual resources that include French, are especially welcome, as
they provide insight into the resources currently available to
researchers working on French.

We encourage submissions that either present innovative research or
provide an overview and comparison of previous research, and pertain to
the following topics (non-exhaustive list):

- Creation of corpora annotated with any kind of meta-linguistic
  information, especially those including French data

- Interoperability of annotations

- Multilinguality issues

- Annotation procedures, including projection from resources in other
  languages

- Qualitative and quantitative annotation evaluation

- Linguistic issues (pros and cons of annotation schemes from the point
  of view of linguistic description, cost of linguistic simplification)

- Corpora augmented with context-aware distributed representations of
  words (e.g. sense embeddings)

- Comparison of discrete linguistic annotations and distributed
  representations

- Comparison of linguistic formalization in corpora annotation and in
  linguistic theories

- Comparison and evaluation of annotation schemes for specific tasks

- Challenges in annotation (e.g. cross-sentence annotation, multilingual
  annotation, code-switching, construction annotation, multimodal
  annotations, annotation of non-canonical language…)

- Maintenance of annotated corpora

- Innovative use of annotated corpora

- Annotation tools and tools to explore annotated resources
  (development, evaluation)

Important dates

  • Submission deadline: November 9th, 2018 (extended from November 2nd, 2018)
  • Notification to the authors after first review: February 8th, 2019
  • Notification to the authors after second review: April 12th, 2019
  • Publication: September 2019


TAL (Traitement Automatique des Langues / Natural Language Processing) is an international journal published by ATALA (French Association for Natural Language Processing, http://www.atala.org) since 1959 with the support of CNRS (National Centre for Scientific Research). It has moved to an electronic mode of publication, with printing on demand.
