Universal Dependencies

Universal Dependencies, frequently abbreviated as UD, is an international cooperative project to create treebanks of the world's languages.[1] These treebanks are openly available. Their core applications are automated text processing in the field of natural language processing (NLP) and research into natural language syntax and grammar, especially within linguistic typology. The project's primary aim is to achieve cross-linguistic consistency of annotation, while still permitting language-specific extensions when necessary. The annotation scheme has its roots in three related projects: Stanford Dependencies,[2] Google universal part-of-speech tags,[3] and the Interset interlingua[4] for morphosyntactic tagsets. The UD annotation scheme represents sentences as dependency trees rather than phrase structure trees. As of January 2022, there are just over 200 treebanks of more than 100 languages available in the UD inventory.
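
The dependency-tree representation can be illustrated with a minimal Python sketch. The example sentence, the `Token` class, and the helper functions `root_of` and `dependents_of` are illustrative assumptions and are not part of any UD tool or file format; only the relation labels (`det`, `nsubj`, `obj`, `root`) and the part-of-speech tags (`DET`, `NOUN`, `VERB`) come from the UD inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One word in a dependency tree (hypothetical helper type)."""
    index: int   # 1-based position in the sentence
    form: str    # the word as written
    upos: str    # universal part-of-speech tag
    head: int    # index of the governing word, 0 for the root
    deprel: str  # dependency relation to the head


# Hand-built analysis of "The dog chased the cat" (illustrative example).
SENTENCE = [
    Token(1, "The", "DET", 2, "det"),
    Token(2, "dog", "NOUN", 3, "nsubj"),
    Token(3, "chased", "VERB", 0, "root"),
    Token(4, "the", "DET", 5, "det"),
    Token(5, "cat", "NOUN", 3, "obj"),
]


def root_of(tokens):
    """Return the single token whose head is 0."""
    roots = [t for t in tokens if t.head == 0]
    if len(roots) != 1:
        raise ValueError("a dependency tree must have exactly one root")
    return roots[0]


def dependents_of(tokens, head_index):
    """Return the tokens directly governed by the word at head_index."""
    return [t for t in tokens if t.head == head_index]


if __name__ == "__main__":
    root = root_of(SENTENCE)
    print(root.form)
    for dep in dependents_of(SENTENCE, root.index):
        print(f"{dep.deprel}({root.form}, {dep.form})")
```

In this sketch every word points to exactly one head word, so the structure contains only words and labelled relations between them. A phrase structure tree would instead introduce additional nodes for phrases such as noun phrases and verb phrases.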

  1. de Marneffe, Marie-Catherine; Manning, Christopher D.; Nivre, Joakim; Zeman, Daniel (13 July 2021). "Universal Dependencies". Computational Linguistics. 47 (2): 255–308. doi:10.1162/coli_a_00402. S2CID 219304854.
  2. "Stanford Dependencies". nlp.stanford.edu. The Stanford Natural Language Processing Group. Retrieved 8 May 2020.
  3. Petrov, Slav (11 April 2011). "A Universal Part-of-Speech Tagset". arXiv:1104.2086 [cs.CL].
  4. "Interset". cuni.cz. Institute of Formal and Applied Linguistics (Czech Republic). Retrieved 8 May 2020.

From Wikipedia, the free encyclopedia
