<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href='static/style.xsl' type='text/xsl'?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-09T10:20:36Z</responseDate><request verb="GetRecord" identifier="oai:clarin-pl.eu:11321/932" metadataPrefix="oai_dc">https://clarin-pl.eu/oai/request</request><GetRecord><record><header><identifier>oai:clarin-pl.eu:11321/932</identifier><datestamp>2024-05-22T06:55:52Z</datestamp><setSpec>hdl_11321_3</setSpec><setSpec>hdl_11321_4</setSpec></header><metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:doc="http://www.lyncode.com/xoai" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://purl.org/dc/elements/1.1/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>The LnNor Corpus: A spoken multilingual corpus of non-native and native Norwegian, English and Polish (Part 2)</dc:title>
<dc:creator>Wrembel, Magdalena</dc:creator>
<dc:creator>Hwaszcz, Krzysztof</dc:creator>
<dc:creator>Pludra, Agnieszka</dc:creator>
<dc:creator>Skałba, Anna</dc:creator>
<dc:creator>Weckwerth, Jarosław</dc:creator>
<dc:creator>Malarski, Kamil</dc:creator>
<dc:creator>Cal, Zuzanna</dc:creator>
<dc:creator>Kędzierska, Hanna</dc:creator>
<dc:creator>Czarnecki-Verner, Tristan</dc:creator>
<dc:creator>Balas, Anna</dc:creator>
<dc:creator>Kaźmierski, Kamil</dc:creator>
<dc:creator>Żychliński, Sylwiusz</dc:creator>
<dc:creator>Gruszecka, Justyna</dc:creator>
<dc:subject>L2 English</dc:subject>
<dc:subject>L3 Norwegian</dc:subject>
<dc:subject>L1 Polish</dc:subject>
<dc:subject>spoken data</dc:subject>
<dc:description>The LnNor corpus was created as part of the data collection in two projects: CLIMAD&#xd;
(Crosslinguistic influence in multilingualism across domains: phonology and syntax) and&#xd;
ADIM (Across-domain Investigations in Multilingualism: Modeling L3 Acquisition in Diverse&#xd;
Settings), led by Prof. Magdalena Wrembel at Adam Mickiewicz University in Poznań, Poland&#xd;
and by Prof. Marit Westergaard at the Arctic University of Norway, from December 2021 to&#xd;
April 2024 with funding from the National Science Centre (NCN) in Poland and Norway&#xd;
Grants.&#xd;
&#xd;
The CLIMAD and ADIM projects explored cross-linguistic influence (CLI) in the&#xd;
acquisition, processing, and use of a third language (L3/Ln) across various language domains&#xd;
and focused on different settings and stages of acquisition from a multilingual perspective. A&#xd;
range of methods, including perception and production tests, grammaticality&#xd;
judgement tasks and online brain imaging techniques such as EEG, was used to investigate&#xd;
multilingual processing. By capturing real-time insights into the interplay of&#xd;
cross-linguistic influences, the projects not only provided valuable contributions to the&#xd;
understanding of L3/Ln acquisition but also advanced theoretical frameworks in this field.&#xd;
&#xd;
Corpus data collection covered a broad range of speech elicitation tasks. The recordings&#xd;
consist of word, sentence and text reading, picture story description, video story retelling,&#xd;
spontaneous speech and socio-phonetic interviews in Polish, English and Norwegian. The&#xd;
corpus contains metadata based on the Language History Questionnaire (Li et al. 2020) such as&#xd;
age, gender, native languages, proficiency level, length of language exposure and age of onset.&#xd;
&#xd;
Data were collected from different groups of speakers:&#xd;
• L1 Polish learners of Norwegian as L3/Ln, attending Scandinavian studies at Poznań College&#xd;
of Modern Languages and the University of Szczecin (instructed learners);&#xd;
• L1 Polish learners of Norwegian as L3/Ln, living in Norway (naturalistic learners);&#xd;
• L1 English native speakers as controls;&#xd;
• L1 Norwegian native speakers as controls.&#xd;
&#xd;
The following types of speech tasks were recorded in Norwegian, English and Polish:&#xd;
• word reading&#xd;
• sentence reading&#xd;
• text reading (“The North Wind and the Sun”)&#xd;
• story telling (spontaneous)&#xd;
• picture description&#xd;
• picture story telling&#xd;
• video story telling&#xd;
• translation from Polish/English to Norwegian&#xd;
&#xd;
Metadata corresponding to the recordings include the following information:&#xd;
• speaker ID, age, gender, education, current residence, speaker status&#xd;
(instructed/naturalistic/native), native language, additional languages spoken&#xd;
• recording ID&#xd;
• language: PL (Polish), EN (English), NO (Norwegian)&#xd;
• status: L1, L2, L3/Ln&#xd;
• speech task: WR (word reading), SR1/2/... (sentence reading), TR1/2/... (text reading), PD&#xd;
(picture description), ST (story telling), VT (video story telling)&#xd;
• recording date, recording place, iteration, recording environment, recording device, type of&#xd;
microphone, noise level, etc.&#xd;
&#xd;
The labels of the recordings adhere to a structured format:&#xd;
PROJECT_SPEAKER ID_LANGUAGE STATUS_TASK, wherein:&#xd;
• PROJECT corresponds to the project within which the data were collected (A for ADIM, C&#xd;
for CLIMAD)&#xd;
• SPEAKER ID corresponds to a unique speaker ID consisting of 8 characters&#xd;
• LANGUAGE STATUS represents the language in which the task was recorded and its status&#xd;
for the speaker (e.g., L1PL, L2EN, L3NO)&#xd;
• TASK corresponds to the type of speech task recorded (e.g., TR, SR, WR, etc.)&#xd;
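
The label format above can be split into its components programmatically. The following Python sketch is illustrative only and is not part of the corpus tooling. It assumes that speaker IDs are 8 alphanumeric characters, that the status is written as L1, L2, L3 or Ln directly before the language code, and that the task code may carry a trailing number. The names parse_label and RecordingLabel and the sample label are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed label shape: PROJECT_SPEAKERID_LANGUAGESTATUS_TASK,
# e.g. the made-up label "C_ab12cd34_L3NO_TR1".
# The speaker ID alphabet and the status spellings are assumptions.
LABEL_RE = re.compile(
    r"^([AC])_([A-Za-z0-9]{8})_(L[0-9n])(PL|EN|NO)_([A-Z]+)([0-9]*)$"
)

# Project letters as given in the description.
PROJECTS = {"A": "ADIM", "C": "CLIMAD"}


class RecordingLabel(NamedTuple):
    project: str
    speaker_id: str
    status: str
    language: str
    task: str
    task_index: Optional[int]


def parse_label(label):
    """Split a recording label into its parts, or raise ValueError."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    project, speaker_id, status, language, task, index = match.groups()
    return RecordingLabel(
        project=PROJECTS[project],
        speaker_id=speaker_id,
        status=status,
        language=language,
        task=task,
        task_index=int(index) if index else None,
    )
```

For the hypothetical label "C_ab12cd34_L3NO_TR1", parse_label returns the project CLIMAD, the speaker ID ab12cd34, the status L3, the language NO, the task TR and the task index 1. Labels that do not match the assumed pattern raise ValueError, so the pattern should be checked against the actual file names before use.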
&#xd;
The LnNor corpus has been created to represent multilingual speech with a focus on L3/Ln&#xd;
Norwegian learners as well as native controls of Norwegian, English and Polish. The corpus is&#xd;
designed to study linguistic variation in learners acquiring Norwegian as a foreign language in&#xd;
instructed and naturalistic settings. Additionally, a subcorpus of native speech is&#xd;
provided to serve as a benchmark against which the learners' productions can be compared.&#xd;
Furthermore, part 2 of the corpus contains word alignment with orthographic transcriptions of&#xd;
speech to facilitate subsequent analyses across various linguistic domains.&#xd;
&#xd;
All speech samples were recorded with the use of Shure SM-35 unidirectional cardioid&#xd;
head-worn condenser microphones, using portable Marantz PMD620 solid state recorders with&#xd;
signal digitized at 48 kHz, 16-bit. This set-up was selected to minimize ambient noise and&#xd;
provide clear and focused recordings.&#xd;
&#xd;
The LnNor corpus part 2 consists of 1671 annotated files from 164 speakers: 113 L1 speakers&#xd;
of Polish, 33 L1 speakers of Norwegian and 18 L1 speakers of English. The total&#xd;
recording time is approximately 59 hours and the full size is 26 GB. The recordings in the&#xd;
released LnNor corpus part 2 cover data collected between 2023 and 2024.</dc:description>
<dc:date>2024-05-15</dc:date>
<dc:type>corpus</dc:type>
<dc:identifier>http://hdl.handle.net/11321/932</dc:identifier>
<dc:language>nor</dc:language>
<dc:language>eng</dc:language>
<dc:language>pol</dc:language>
<dc:rights>Creative Commons - Attribution 4.0 International (CC BY 4.0)</dc:rights>
<dc:rights>https://creativecommons.org/licenses/by/4.0/</dc:rights>
<dc:rights>CC</dc:rights>
<dc:format>application/pdf</dc:format>
<dc:format>application/vnd.openxmlformats-officedocument.spreadsheetml.sheet</dc:format>
<dc:format>application/zip</dc:format>
<dc:format>application/zip</dc:format>
<dc:format>downloadable_files_count: 4</dc:format>
<dc:publisher>Adam Mickiewicz University</dc:publisher>
<dc:source>https://adim.web.amu.edu.pl/en/</dc:source>
</oai_dc:dc>
</metadata></record></GetRecord></OAI-PMH>