Upgrading the linguistic ORD-ecosystem - UpLORD

About this project

The project “Upgrading the linguistic ORD ecosystem” (March 2023 - June 2025) tackles the issue of making Swiss language data and the corresponding infrastructure components compliant with Open Science principles. Concretely, it proposes technical implementations, along with a series of community-oriented measures.

Scientific summary

The UpLORD project identifies a series of shortcomings in existing infrastructures for language data management in the era of Open Science and the FAIR principles, and proposes concrete improvements to meet the needs of researchers from the CLARIN-CH and NCCR Evolving Language scientific communities. The proposed solutions comprise, on the one hand, technical developments of existing infrastructure components and, on the other hand, documentation, tutorials and training offered to the members of the target scientific communities.

Challenges and goals

The main challenges addressed by the UpLORD project are related to the application of the FAIR principles to Swiss language resources and the infrastructure components necessary for their management. More specifically,

  • most Swiss language corpora do not satisfy all four FAIR principles
  • individual corpus platforms across Switzerland do not satisfy the interoperability principle
  • a lack of the APIs needed for automated access to data
  • a lack of meaningful metadata and of infrastructures to manage diverse annotations of linguistic data, which affects the findability and reusability principles
  • the need for proper management of sensitive data, informed consent, copyright and intellectual property issues, which is required for datasets to meet the FAIR requirements
  • the need for training so that the target scientific communities adopt best-practice standards, good habits and mindsets in linguistic data management and collaboration

Results and Output

The UpLORD project enhances the national technology platform LiRI and its services, as well as the national repository for publishing and archiving linguistic data LaRS@SWISSUbase, by:

  • building a national corpus platform, the LiRI Corpus Platform (LCP), thanks to which Swiss corpora of several types (text, audio and video) become findable and accessible (public launch planned for August 2024)
  • building data converters, for example from (TEI-)XML, generic XML and CWB format to CoNLL-U+
  • automating workflows, namely via APIs between the NCCR data infrastructure and SWISSUbase, as well as between the LiRI LCP and SWISSUbase
  • ensuring metadata-level interoperability of existing infrastructure services at the national and European levels, and implementing the harvesting of SWISSUbase by the CLARIN Virtual Language Observatory (planned for November 2024)
  • establishing national working groups to foster exchange, to inform and to produce documentation, such as the CLARIN-CH working groups on the management of sensitive and personal data, legal and ethical issues, and on Swiss learner corpora
  • documenting and promoting best practices, for instance for each step of the data lifecycle, copyright, licenses, data access and security, data and metadata standards
  • raising awareness of, and providing training on, ORD practices in the context of teaching and research, specifically in relation to data protection, copyright and licenses, from both technical and legal perspectives
  • building a robust data curation practice within the LaRS@SWISSUbase Data Service Unit by providing documentation and tutorials
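
To illustrate the kind of conversion the data converters listed above perform, the following is a minimal sketch of a CWB-to-CoNLL-U+ converter. It is not the project's implementation, which is not described here. It assumes the input is CWB "vertical" text with one token per line, tab-separated attributes in the order word, part-of-speech tag, lemma, and sentences delimited by <s> ... </s> tags; the function name vrt_to_conllup and the column mapping are hypothetical.

```python
import re

# Matches sentence boundary tags such as <s>, </s> or <s id="1">, but not <seg>.
SENTENCE_TAG = re.compile(r"</?s[\s>]")


def vrt_to_conllup(vrt_text, columns=("FORM", "XPOS", "LEMMA")):
    """Convert CWB vertical text to a CoNLL-U Plus string (illustrative sketch).

    `columns` names the tab-separated attributes of the input, in order.
    This mapping is an assumption; real corpora declare their own attributes.
    """
    # CoNLL-U Plus requires the column layout to be declared on the first line.
    out = ["# global.columns = ID " + " ".join(columns)]
    sentence = []

    def flush():
        # Write the buffered tokens as one sentence, numbering IDs from 1,
        # followed by the blank line that terminates a sentence.
        if sentence:
            for token_id, fields in enumerate(sentence, start=1):
                out.append("\t".join([str(token_id)] + fields))
            out.append("")
            sentence.clear()

    for line in vrt_text.splitlines():
        if not line.strip():
            continue
        # Simplification: any line of the form <...> is treated as a structural
        # tag. Only sentence tags end a sentence; other tags are dropped.
        if line.startswith("<") and line.endswith(">"):
            if SENTENCE_TAG.match(line):
                flush()
            continue
        fields = line.split("\t")
        # Pad missing attributes with "_" and ignore surplus ones.
        fields = (fields + ["_"] * len(columns))[: len(columns)]
        sentence.append([f if f else "_" for f in fields])

    flush()  # tokens after the last sentence tag, if any
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = '<text id="t1">\n<s>\nDas\tART\tder\nHaus\tNN\tHaus\n</s>\n</text>\n'
    print(vrt_to_conllup(sample), end="")
```

In this sketch, structural information other than sentence boundaries (for example text-level metadata on <text> tags) is discarded; a production converter would more likely carry it over as CoNLL-U+ comment lines or additional columns.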

Impact on Open Science practices

This project represents a crucial stepping stone towards applying a sustainable ORD strategy for linguistic data in Switzerland. The technical implementations that increase the degree of FAIRness, with regard to both the infrastructure and the language data, along with the community-oriented measures, provide a solid basis for carrying out research projects in accordance with Open Research Data principles. Thanks to the UpLORD project, researchers benefit from information, documentation, support and infrastructure components that help them manage their data according to ORD principles and make their language data findable, accessible, interoperable and reusable.

Additional Information

Contact

CLARIN-CH
https://clarin-ch.ch/contact 

Linguistic Research Infrastructure LiRI
https://www.liri.uzh.ch/en.html

LaRS@SWISSUbase
https://www.lars.uzh.ch/en.html

Further information

LiRI LCP:
https://lcp.linguistik.uzh.ch/

LaRS@SWISSUbase Documentation and tutorials:
https://resources.swissubase.ch/help/user-guide/linguistics/

CLARIN-CH Documentation platform:
https://clarin-ch.ch/documentation-platform/start

CLARIN-CH Day 2024: Open Research Data – Challenges and Opportunities
https://clarin-ch.ch/clarin-day-2024

Related project

Moving towards a national FAIR-compliant ecosystem of Federated Infrastructure for Language Data (FAIR-FI-LD)
https://www.liri.uzh.ch/en/projects/FAIR-FI-LD.html