Analyzing Historical Legal Textcorpora: German VET and CVET regulations
dc.contributor.author | Reiser, Thomas | |
dc.contributor.author | Dörpinghaus, Jens | |
dc.contributor.author | Steiner, Petra | |
dc.contributor.editor | Klein, Maike | |
dc.contributor.editor | Krupka, Daniel | |
dc.contributor.editor | Winter, Cornelia | |
dc.contributor.editor | Gergeleit, Martin | |
dc.contributor.editor | Martin, Ludger | |
dc.date.accessioned | 2024-10-21T18:24:18Z | |
dc.date.available | 2024-10-21T18:24:18Z | |
dc.date.issued | 2024 | |
dc.description.abstract | The digitization of historical documents has gained particular interest in recent years. The majority of research endeavors aim at digitizing historical documents by extracting text from scanned images. A pipeline that transcribes scanned documents into fully structured texts was utilized to digitize over 900 German VET and CVET regulations. As a preliminary investigation, a basic corpus analysis was conducted to assess the usability of the digitized documents and the necessity for document digitization methods that can generate transcripts that maintain the logical text structure and hierarchy. This paper focuses on the processing of the transcripts created from German VET and CVET regulation images to demonstrate the advantages of fully structured text over plain OCR results and to illustrate that even simple analyses require more information for more comprehensive document understanding. | en |
dc.identifier.doi | 10.18420/inf2024_174 | |
dc.identifier.eissn | 2944-7682 | |
dc.identifier.isbn | 978-3-88579-746-3 | |
dc.identifier.issn | 2944-7682 | |
dc.identifier.pissn | 1617-5468 | |
dc.identifier.uri | https://dl.gi.de/handle/20.500.12116/45152 | |
dc.language.iso | en | |
dc.publisher | Gesellschaft für Informatik e.V. | |
dc.relation.ispartof | INFORMATIK 2024 | |
dc.relation.ispartofseries | Lecture Notes in Informatics (LNI) - Proceedings, Volume P-352 | |
dc.subject | Document digitization | |
dc.subject | OCR | |
dc.subject | Legal texts | |
dc.subject | Corpus analysis | |
dc.title | Analyzing Historical Legal Textcorpora: German VET and CVET regulations | en |
dc.type | Text/Conference Paper | |
gi.citation.endPage | 2018 | |
gi.citation.publisherPlace | Bonn | |
gi.citation.startPage | 2007 | |
gi.conference.date | 24.-26. September 2024 | |
gi.conference.location | Wiesbaden | |
gi.conference.sessiontitle | Digitalization and AI for and in Education and Educational Research (DAI-EaR'24) |
Files
Original bundle
- Name: Reiser_et_al_Analyzing_Historical_Legal_Textcorpora.pdf
- Size: 1.37 MB
- Format: Adobe Portable Document Format