Corpus Juridisch Nederlands

About the Corpus application

The corpus application is developed by the Dutch Language Institute (Instituut voor de Nederlandse Taal, or INT). The backend of the application is BlackLab, a Lucene-based search engine developed for corpora with token-based annotation (https://blacklab.ivdnt.org/). The web-based frontend is a further development of the corpus-frontend application developed by INT (https://github.com/instituutnederlandsetaal/blacklab-frontend) in CLARIN and CLARIAH projects. Its design is inspired by the first version of the OpenSoNaR user interface by Tilburg University and Radboud University (https://github.com/Taalmonsters/WhiteLab2.0).

About the Corpus Juridisch Nederlands

The Corpus Juridisch Nederlands comprises a collection of 5,856 legal texts that could be consulted from the mid-1980s until 1992 as N-Lex, a database of current Dutch legislation. The material has been made available by the Centre for Informatics and Law of the Erasmus University in Rotterdam. The files have been compiled per year and run from 1814 to 1989. Only a few French-language texts and some undated texts have been left out of the corpus. [Note that the current N-Lex website contains the consolidated Dutch legislation which is or has been in force since 1 May 2002.]

The documents that now make up the Corpus Juridisch Nederlands were originally part of the Corpus Hedendaags Nederlands (Corpus of Contemporary Dutch). Because these texts date from 1814 to 1989, they are out of place in the latest version of that corpus. For this reason, they have been moved into a separate Corpus Juridisch Nederlands.

Linguistic Annotation

Part-of-speech tagging has been done using the tagset and tagging principles for the annotation of diachronic corpora of historical Dutch, developed in the context of the CLARIAH+ project. This annotation layer has been added to the corpus and can also be used to search the online corpus. A detailed description can be found here.

The historical word forms all have a modern Dutch lemma. For words no longer used in modern Dutch, a modern lemma has been constructed using the same linguistic principles that apply to words still in use. More information about the lemmatisation principles used can be found in Lemmatiseerprincipes voor GiGaNT, het centrale lexicon van het INT.

Credits

When referring to the Corpus Juridisch Nederlands, please use the following reference:

Corpus Juridisch Nederlands (Version 1.0) (September 2021) [Online service]. Available at the Dutch Language Institute: http://hdl.handle.net/10032/tm-a2-u2

For BlackLab:

Software available at: https://github.com/instituutnederlandsetaal/BlackLab

Does, Jesse de, Jan Niestadt & Katrien Depuydt (2017), Creating research environments with BlackLab. In: Jan Odijk and Arjan van Hessen (eds.) CLARIN in the Low Countries, pp. 151-165. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi

For the corpus frontend:

Software available at: https://github.com/instituutnederlandsetaal/blacklab-frontend

Logo provenance:

Antonio Canova (1757–1822), La Giustizia (1792), via Wikimedia Commons.