DSC Multilingual Mystery 2: Beware, Lee and Quinn!

By Lee Skallerup-Bessette and Quinn Dombrowski
February 27, 2020
Quinn#
The differences leap out at you before you even open any of the books. “Christine a une idée géniale”. Or was the Baby-Sitters Club “L’idée géniale de Valérie”? Does “Bruno aime Mélanie”, or is he instead “Un amoureux pour Anne-Marie”? There’s a case study on the Dutch translation of the Baby-Sitters Club books in Babysitting the reader: translating English narrative fiction for girls into Dutch (1946-1995) by Mieke K T Desmet that gets into strategies for localizing a story that takes place in a different cultural context. The article “Cultural Understanding in the Indonesian Translation of The Baby-sitters Club” by Halida Aisyah discusses how the Indonesian translation took a different approach, maintaining the protagonists’ foreign names and locations, and only adopting Indonesian cultural references when the American equivalent would’ve been incomprehensible without some kind of extensive explanation. But I hadn’t come across any scholarly literature on translation strategies for The Baby-Sitters Club in French (in any of the translations: Québécois, Belgian, or French from France).
I had questions, and not just “what did they decide to call Mallory?” (Spoiler alert: in Québécois it’s Marjorie, and just like in English, it’s the most frequently screwed up name when OCRing the books.) The ghostwriters in the US were working with an extensive “BSC Bible” that had the description and background of every character in Stoneybrook, and further afield in the BSC universe. (This was adapted and published as The complete guide to The Baby-sitters Club.) But in DSC Mystery #1: Lee and the Missing Metadata, Lee discovered that at least in Quebec, they were throwing Baby-Sitters Club books at multiple translators, who turned them around in no time at all. How careful were the translators about consistency, in terms of what they called various peripheral characters and places? This was the making of another Data-Sitters Club Multilingual Mystery. (Who are the data-sitters? So glad you asked. Check out Chapter 2.)
Lee and I put our heads together about how we’d start looking into this mystery. We needed a book that we had on hand in all the translations: Québécois, Belgian, French from France (the last of these being the source of the recent French re-releases). We settled on Jessi’s Secret Language, on the thought that all the major characters had been established by that point, as well as many peripheral ones. We’d need to compare with some of the other translations, but that would be our starting point.
Here’s the thing, though: Lee reads French. I don’t. I mean, I could probably pick my way through the text and come up with a list of characters and places, but I had other ideas. I wanted to see how French named-entity recognition performed compared to English, when applied to The Baby-Sitters Club.
What’s named-entity recognition?#
Named-entity recognition (often abbreviated NER) is a kind of information extraction task – basically, trying to identify particular things (like names of people, places, and organizations) in unstructured text, like a novel. (Yeah, I know that novels have structure, but your average plain-text file of a novel’s text – even if it maintains chapter headers and such – doesn’t have the kind of structure that a computer can easily read. I mean, it’s not like it’s a spreadsheet or something.) There are two major technical approaches: one uses grammar-based rules to identify the things of interest, and the other uses statistical models like machine learning, and requires a ton of labeled data (e.g. texts where a human has already gone through and correctly identified all the things of interest) upfront. Particularly for statistical models, the more your texts resemble the example texts that the model was trained on, the better the NER will perform. These models are most commonly trained on news corpora, or Wikipedia – not 80’s and 90’s girls’ literature. This sort of thing is a problem in DH more broadly, not just for us Data-Sitters. David Bamman’s LitBank project (a dataset of annotated excerpts from public domain literature) is one example of how DH scholars can significantly improve the effectiveness of natural-language processing (NLP) by training models on data that looks more like what we’re trying to apply it to. But I’ll save the question of how, exactly, one goes about training a model for a future Data-Sitters Club Multilingual Mystery. For the moment, let’s see how some commonly-used tools perform “out of the box”.
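To make the rule-based approach a little more concrete, here is a deliberately naive sketch in Python. It treats any run of capitalized words that doesn't begin a sentence as a candidate entity. The function name, the regular expression, and the sample sentence are all illustrative assumptions; this is not how spaCy or any real NER tool works, and a rule this crude will miss or mislabel plenty.

```python
import re

# Hypothetical toy rule: one or more capitalized words in a row
# (the character ranges cover A-Z plus common accented capitals such as É)
CAPITALIZED_RUN = re.compile(r"[A-ZÀ-ÖØ-Þ][\w'-]*(?:\s+[A-ZÀ-ÖØ-Þ][\w'-]*)*")

def naive_entities(text):
    """Return runs of capitalized words that do not begin a sentence."""
    found = []
    for match in CAPITALIZED_RUN.finditer(text):
        before = text[:match.start()].rstrip()
        # Skip matches at the very start of the text or right after
        # sentence-ending punctuation, where any word would be capitalized
        if not before or before[-1] in '.!?"':
            continue
        found.append(match.group())
    return found

sample = "After class, Jessi met Matt Braddock in Stoneybrook. Then she went home."
print(naive_entities(sample))
# ['Jessi', 'Matt Braddock', 'Stoneybrook']
```

Notice that the rule can't tell a person from a place, and it throws away any name that happens to start a sentence. Those are exactly the kinds of gaps that statistical models try to close.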
The tools#
The two major NLP tools with multilingual coverage are spaCy and Stanford NLP. To use spaCy, you load it into Python and run it that way. While there’s a Python version of Stanford NLP, as of February 2020 it doesn’t cover everything – and entities are one thing that’s currently left out. To get entities with Stanford NLP, you have to run a memory-hungry Java program from the command line, with all the joy that comes from setting that up. To make matters worse, Stanford NLP doesn’t have an NER model for French: just English, Spanish, German, and Chinese. It’s a better comparison to look at English vs. French with the same tool, rather than English with one and French with the other, so for this mystery, we’ll be using spaCy.
The texts#
To make it easier to compare the entities from each text, I split up each translation plus the English original into 15 plain text files, one for each chapter. Everything else I left as I got it from ABBYY FineReader (as discussed in DSC #2: Katia and the Phantom Corpus), plus the corrections to my (often bad) attempt to transcribe the “handwritten text” portions. I didn’t appreciate some implications of that – I’ll get back to it in a bit.
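The post doesn't say how the chapter files were produced, so the following is only a sketch of one way to do that kind of split in Python. It assumes each chapter begins with a heading on its own line such as "CHAPTER 1" or "CHAPITRE 1"; that heading pattern, the function names, and the output file naming are all hypothetical.

```python
import re
from pathlib import Path

# Assumption: chapter headings sit on their own line, e.g. "CHAPTER 1" or "CHAPITRE 1"
CHAPTER_HEADING = re.compile(
    r"^(?:CHAPTER|CHAPITRE)\s+\d+\s*$", re.MULTILINE | re.IGNORECASE
)

def split_chapters(text):
    """Split a book's text into chapter strings, dropping anything before the first heading."""
    starts = [m.start() for m in CHAPTER_HEADING.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]

def write_chapters(text, outdir, prefix):
    """Write one numbered .txt file per chapter and return the paths written."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, chapter in enumerate(split_chapters(text), start=1):
        path = outdir / f"{prefix}_{number:02d}.txt"
        path.write_text(chapter, encoding="utf-8")
        paths.append(path)
    return paths
```

OCR output rarely has headings this tidy, so a split like this would need a manual check afterwards.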
Getting started with spaCy#
SpaCy is run via Python, so it can seem a little intimidating if you’ve never worked with a programming language before. For this mystery, I set up a Jupyter notebook in the Data-Sitters Club GitHub repo that you can download and use for your own texts. (If you’re not familiar with Jupyter notebooks, here’s a Programming Historian tutorial.)
You can’t run the exact same experiment I did without access to the same texts I have (which I can’t share for copyright reasons), but the Jupyter notebook on GitHub has all the output I got running it on the Baby-Sitters Club corpus, so you can see the results of the process one step at a time.
1. Downloading spaCy models#
The first step is to download the spaCy models; this notebook has been updated for spaCy 3. These models have been pre-trained on annotated French and English corpora, respectively. You only have to run these code cells below the first time you run the notebook; after that, you can skip right to step 2 and carry on from there. (If you run them again later, nothing bad will happen; it’ll just download again.) You can also run spaCy in other notebooks on your computer in the future, and you’ll be able to skip the step of downloading the models.
#Imports the module you need to download and install the spaCy French and English models
import sys
#Installs the French spaCy model
!{sys.executable} -m spacy download fr_core_news_sm
Collecting fr-core-news-sm==3.0.0
Downloading https://github.com/explosion/spacy-models/releases/download/fr_core_news_sm-3.0.0/fr_core_news_sm-3.0.0-py3-none-any.whl (17.2 MB)
Requirement already satisfied: spacy<3.1.0,>=3.0.0 in /Users/qad/anaconda3/lib/python3.8/site-packages (from fr-core-news-sm==3.0.0) (3.0.6)
Requirement already satisfied: pathy>=0.3.5 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (0.6.0)
Requirement already satisfied: typer<0.4.0,>=0.3.0 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (0.3.2)
Requirement already satisfied: catalogue<2.1.0,>=2.0.3 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (2.0.4)
Requirement already satisfied: numpy>=1.15.0 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (1.19.5)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (1.0.5)
Requirement already satisfied: blis<0.8.0,>=0.4.0 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (0.7.4)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (2.27.1)
Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.4 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (3.0.6)
Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (4.64.0)
Requirement already satisfied: srsly<3.0.0,>=2.4.1 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (2.4.1)
Requirement already satisfied: pydantic<1.8.0,>=1.7.1 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (1.7.4)
Requirement already satisfied: setuptools in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (52.0.0.post20210125)
Requirement already satisfied: wasabi<1.1.0,>=0.8.1 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (0.8.2)
Requirement already satisfied: thinc<8.1.0,>=8.0.3 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (8.0.7)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (2.0.5)
Requirement already satisfied: packaging>=20.0 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (20.9)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (3.0.5)
Requirement already satisfied: jinja2 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (2.11.3)
Requirement already satisfied: pyparsing>=2.0.2 in /Users/qad/anaconda3/lib/python3.8/site-packages (from packaging>=20.0->spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (2.4.7)
Requirement already satisfied: smart-open<6.0.0,>=5.0.0 in /Users/qad/anaconda3/lib/python3.8/site-packages (from pathy>=0.3.5->spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (5.1.0)
Requirement already satisfied: charset-normalizer~=2.0.0 in /Users/qad/anaconda3/lib/python3.8/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (2.0.11)
Requirement already satisfied: idna<4,>=2.5 in /Users/qad/anaconda3/lib/python3.8/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (2.10)
Requirement already satisfied: certifi>=2017.4.17 in /Users/qad/anaconda3/lib/python3.8/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (2020.12.5)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/qad/anaconda3/lib/python3.8/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (1.26.4)
Requirement already satisfied: click<7.2.0,>=7.1.1 in /Users/qad/anaconda3/lib/python3.8/site-packages (from typer<0.4.0,>=0.3.0->spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (7.1.2)
Requirement already satisfied: MarkupSafe>=0.23 in /Users/qad/anaconda3/lib/python3.8/site-packages (from jinja2->spacy<3.1.0,>=3.0.0->fr-core-news-sm==3.0.0) (1.1.1)
✔ Download and installation successful
You can now load the package via spacy.load('fr_core_news_sm')
#Installs the English spaCy model
!{sys.executable} -m spacy download en_core_web_sm
Collecting en-core-web-sm==3.0.0
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0-py3-none-any.whl (13.7 MB)
Requirement already satisfied: spacy<3.1.0,>=3.0.0 in /Users/qad/anaconda3/lib/python3.8/site-packages (from en-core-web-sm==3.0.0) (3.0.6)
Requirement already satisfied: pydantic<1.8.0,>=1.7.1 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (1.7.4)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (3.0.5)
Requirement already satisfied: packaging>=20.0 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (20.9)
Requirement already satisfied: wasabi<1.1.0,>=0.8.1 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (0.8.2)
Requirement already satisfied: setuptools in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (52.0.0.post20210125)
Requirement already satisfied: catalogue<2.1.0,>=2.0.3 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (2.0.4)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (1.0.5)
Requirement already satisfied: jinja2 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (2.11.3)
Requirement already satisfied: blis<0.8.0,>=0.4.0 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (0.7.4)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (2.0.5)
Requirement already satisfied: srsly<3.0.0,>=2.4.1 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (2.4.1)
Requirement already satisfied: pathy>=0.3.5 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (0.6.0)
Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (4.64.0)
Requirement already satisfied: typer<0.4.0,>=0.3.0 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (0.3.2)
Requirement already satisfied: numpy>=1.15.0 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (1.19.5)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (2.27.1)
Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.4 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (3.0.6)
Requirement already satisfied: thinc<8.1.0,>=8.0.3 in /Users/qad/anaconda3/lib/python3.8/site-packages (from spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (8.0.7)
Requirement already satisfied: pyparsing>=2.0.2 in /Users/qad/anaconda3/lib/python3.8/site-packages (from packaging>=20.0->spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (2.4.7)
Requirement already satisfied: smart-open<6.0.0,>=5.0.0 in /Users/qad/anaconda3/lib/python3.8/site-packages (from pathy>=0.3.5->spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (5.1.0)
Requirement already satisfied: idna<4,>=2.5 in /Users/qad/anaconda3/lib/python3.8/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (2.10)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/qad/anaconda3/lib/python3.8/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (1.26.4)
Requirement already satisfied: charset-normalizer~=2.0.0 in /Users/qad/anaconda3/lib/python3.8/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (2.0.11)
Requirement already satisfied: certifi>=2017.4.17 in /Users/qad/anaconda3/lib/python3.8/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (2020.12.5)
Requirement already satisfied: click<7.2.0,>=7.1.1 in /Users/qad/anaconda3/lib/python3.8/site-packages (from typer<0.4.0,>=0.3.0->spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (7.1.2)
Requirement already satisfied: MarkupSafe>=0.23 in /Users/qad/anaconda3/lib/python3.8/site-packages (from jinja2->spacy<3.1.0,>=3.0.0->en-core-web-sm==3.0.0) (1.1.1)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
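If you're not sure whether you've already downloaded the models, you can check before running the download cells again. The sketch below relies on the fact that each spaCy model is installed as an ordinary Python package (which is why step 2 can import it by name); the helper function name is made up for this example.

```python
import importlib.util

def model_installed(package_name):
    """Return True if a top-level Python package with this name can be found."""
    return importlib.util.find_spec(package_name) is not None

# Report on the two models this notebook uses
for model in ("fr_core_news_sm", "en_core_web_sm"):
    status = "already installed" if model_installed(model) else "not installed yet"
    print(model, status)
```

If both come back as already installed, you can skip straight to step 2.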
2. Importing spaCy and setting up NLP#
Run the code cell below to import the spaCy module, and create two functions: one which loads the French model and runs the NLP algorithms (which include named-entity recognition), and one which does the same for English.
#Imports spaCy
import spacy
#Imports the French model
import fr_core_news_sm
#Sets up a function so you can run the French model on texts
frnlp = fr_core_news_sm.load()
#Imports the English model
import en_core_web_sm
#Sets up a function so you can run the English model on texts
ennlp = en_core_web_sm.load()
3. Importing other modules#
There are various other modules that will be useful in this notebook. The code comments explain what each one is for. This code cell imports all of those.
#io is used for opening and writing files
import io
#glob is used to find all the pathnames matching a specified pattern (here, all text files)
import glob
#os is used to navigate your folder directories (e.g. change folders to where your files are stored)
import os
4. Directory setup#
Assuming you’re running Jupyter Notebook from your computer’s home directory, this code cell gives you the opportunity to change directories, into the directory where you’re keeping your French text files. (This notebook is designed to deal with one language at a time, and assumes your French text files are in one folder, and English are in another.)
Replace /Users/qad/Documents/dsc/dscm2
with the full path to the directory with your files.
For instance, the default path to the Documents directory is (substituting your user name on the computer for YOUR-USER-NAME):
On Mac: ‘/Users/YOUR-USER-NAME/Documents’
On Windows: ‘C:\Users\YOUR-USER-NAME\Documents’
#Define the file directory here
filedirectory = '/Users/qad/Documents/dsc/dscm2'
#Change the working directory to the one you just defined
os.chdir(filedirectory)
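Changing the working directory affects every cell that runs afterwards. As an alternative, here is a sketch that leaves the working directory alone and builds full paths with Python's pathlib module instead. The helper name text_files is an assumption for illustration, and the rest of this notebook still uses the os.chdir approach above.

```python
from pathlib import Path

def text_files(directory):
    """Return the .txt files in a directory as full paths, sorted by name.

    Returns an empty list if the directory does not exist.
    """
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.glob("*.txt") if p.is_file())

# Assumption: replace this with the folder that holds your own files
filedirectory = '/Users/qad/Documents/dsc/dscm2'
for path in text_files(filedirectory):
    print(path.name)
```

Because each result is a full path, you can pass it straight to open() without caring which folder the notebook was started from.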
Running spaCy#
5. French NER, first try#
The code cell in step 5 in the Jupyter notebook iterates through the files in the folder you specified up in step 4, after sorting them alphabetically. For every file that ends in .txt, the code defines an output file name by replacing the ‘.txt’ at the end of the input filename with ‘_ner_per.txt’. (The .txt check is an important limitation: you’ll get an error if you try to have Python read a file that isn’t a text file, including those pesky invisible .DS_Store files in just about every Mac folder.)
Opening the input file (i.e. each file in turn, one at a time) and the newly-created, empty output file, the code reads in the text of the input file, and runs the spaCy French NLP. Then, for every word recognized as an entity, as long as it’s an entity labeled ‘PER’ (a person), the entity is written to the screen (with a print command) and to the output file. I thought it’d be easiest to work through the entities one type at a time, starting just with the character names.
I wrote this code, a couple times pulling up previous notebooks I’d written that did similar things, and consulting the spaCy documentation and examples for how to display the entities.
#Sort all the files in the directory you specified above, alphabetically.
#For each of those files...
for filename in sorted(os.listdir(filedirectory)):
    #If the filename ends with .txt (i.e. if it's actually a text file)
    if filename.endswith('.txt'):
        #Write out below the name of the file
        print(filename)
        #The file name of the output file adds _ner_per to the end of the file name of the input file
        outfilename = filename.replace('.txt', '_ner_per.txt')
        #Open the input filename
        with open(filename, 'r') as f:
            #Create and open the output filename
            with open(outfilename, 'w') as out:
                #Read the contents of the input file
                chaptertext = f.read()
                #Do French NLP on the contents of the input file
                chapterner = frnlp(chaptertext)
                #For each recognized entity
                for ent in chapterner.ents:
                    #If that entity is labeled as a person
                    if ent.label_ == 'PER':
                        #Print the entity, and the label (which should be PER)
                        print(ent.text, ent.label_)
                        #Write the entity to the output file
                        out.write(ent.text)
                        #Write a newline character to the output file
                        out.write('\n')
016_01_bg.txt
Maman PER
Roseline PER
Victor PER
Gringalet PER
Justine Victoire PER
Johanna PER
Marjorie Levêque PER
Johanna PER
Marjorie PER
Roseline PER
Roseline PER
Gringalet PER
J' PER
Justine PER
J' PER
Coppélia PER
J' PER
Coppélia PER
Roseline PER
Coppélius PER
Coppélia PER
Franz PER
Roseline PER
Franz PER
Coppélia PER
Coppé PER
Franz PER
Roseline PER
Allez PER
Roseline PER
Maman PER
Roseline PER
Roseline PER
Roseline PER
016_01_bg_ner_loc.txt
016_01_bg_ner_loc_ner_loc.txt
016_01_bg_ner_loc_ner_loc_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-7-a9f4cbae4497> in <module>
11 with open(filename, 'r') as f:
12 #Create and open the output filename
---> 13 with open(outfilename, 'w') as out:
14 #Read the contents of the input file
15 chaptertext = f.read()
OSError: [Errno 63] File name too long: '016_01_bg_ner_loc_ner_loc_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per_ner_per.txt'
And it worked… mostly? It was super weird that J’ai kept getting listed, but I wasn’t too worried. A quirk of the model, plus the source text! Probably the model wasn’t trained on first-person narratives like The Baby-Sitters Club. Yeah, there was also an example of C’ that was harder to explain, but it wasn’t until I saw an example of a double-curly-quote character (“) identified as an entity that I started getting suspicious. Could those be messing things up somehow?
(Note from July 19, 2021: these were errors that the spaCy 2 French model made. Almost all disappeared by spaCy 3, which is the source of the results you see above.)
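A word on the ever-longer file names and the OSError at the end of the output above. They appear to come from running the cell more than once: the output files also end in .txt and are saved in the same folder as the input, so each new run treats the earlier output as input and tacks on another suffix until the file name gets too long. Here is one possible way to guard against that, as a sketch only. The function names are hypothetical, and the suffix list simply mirrors the file names visible in the output.

```python
from pathlib import Path

# Suffixes of files produced by earlier NER runs (taken from the output names above)
GENERATED_SUFFIXES = ("_ner_per.txt", "_ner_loc.txt")

def source_texts(directory):
    """Return the .txt files that are original inputs, not earlier NER output."""
    return sorted(
        p for p in Path(directory).glob("*.txt")
        if not p.name.endswith(GENERATED_SUFFIXES)
    )

def output_path(source, outdir, suffix="_ner_per.txt"):
    """Build the output path inside a separate folder, creating the folder if needed."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    return outdir / (Path(source).stem + suffix)
```

Either safeguard on its own would be enough: skipping generated files keeps them out of the loop, and writing to a separate folder means they never show up in the input listing in the first place.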
6. Data cleaning#
Time for some data cleaning! When Lee brought the ABBYY FineReader output .txt files into Word to correct my bad transcriptions, Word “helpfully” replaced all the regular, straight single and double quotes with their curly equivalents.
I wrote some code that opened every text file in my folder, searched for opening and closing curly quotes and replaced them with the “straight quote” character (a quotation mark that doesn’t differentiate opening and closing quotes). While I was at it, I saw that some of the texts weren’t using the straight single quote for the apostrophe, so I put that in there, too. This code overwrites the text files in the folder (rather than creating a new version) so if you want to keep your originals, make sure you have a copy elsewhere.
# Look for files in the source directory that end in .txt
for filename in os.listdir(filedirectory):
    if filename.endswith(".txt"):
        # Open each file that ends in .txt and read the text
        with open(filename, 'r') as f:
            text = f.read()
        # Replace curly double quotes with straight double quotes
        text = text.replace('“', '"')
        text = text.replace('”', '"')
        # Replace the curly single quote (apostrophe) with a straight single quote
        text = text.replace('’', "'")
        # Write the output to a file with the same name as the original, overwriting the original file
        with open(filename, 'w') as out:
            out.write(text)
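The same replacement can also be done in a single pass with a translation table, which makes it easy to add the opening curly single quote as well. This is a minimal sketch rather than the code used for this chapter; `QUOTE_TABLE` and `straighten_quotes` are hypothetical names.

```python
# Map each curly quote (including the opening single quote) to its straight equivalent
QUOTE_TABLE = str.maketrans({
    "“": '"',
    "”": '"',
    "‘": "'",
    "’": "'",
})

def straighten_quotes(text):
    """Return text with curly quotes replaced by straight ones."""
    return text.translate(QUOTE_TABLE)

# Hypothetical sample string, not taken from the corpus
print(straighten_quotes("J’ai dit : “Bonjour”"))
```

Because the function works on a string rather than a file, you can try it on a sentence or two before letting it overwrite anything.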
7. French NER, second try#
I didn’t make any changes to the code from step 5, but check out the difference in the results. Gone are those quotation marks as so-called entities – along with all the examples of j’ai, c’, etc. All of those were showing up because they were using the curly single quote character, and that was messing up spaCy’s model.
#Sort all the files in the directory you specified above, alphabetically.
#For each of those files...
for filename in sorted(os.listdir(filedirectory)):
    #If the filename ends with .txt (i.e. if it's actually a text file)
    if filename.endswith('.txt'):
        #Write out below the name of the file
        print(filename)
        #The file name of the output file adds _ner_per to the end of the file name of the input file
        outfilename = filename.replace('.txt', '_ner_per.txt')
        #Open the input filename
        with open(filename, 'r') as f:
            #Create and open the output filename
            with open(outfilename, 'w') as out:
                #Read the contents of the input file
                chaptertext = f.read()
                #Do French NLP on the contents of the input file
                chapterner = frnlp(chaptertext)
                #For each recognized entity
                for ent in chapterner.ents:
                    #If that entity is labeled as a person
                    if ent.label_ == 'PER':
                        #Print the entity, and the label (which should be PER)
                        print(ent.text, ent.label_)
                        #Write the entity to the output file
                        out.write(ent.text)
                        #Write a newline character to the output file
                        out.write('\n')
016_01_bg.txt
Maman PER
Roseline PER
Victor PER
Gringalet PER
Justine Victoire PER
Johanna PER
Marjorie Levêque PER
Johanna PER
Marjorie PER
Roseline PER
Roseline PER
Gringalet PER
J' PER
Justine PER
J' PER
Coppélia PER
J' PER
Coppélia PER
Roseline PER
Coppélius PER
Coppélia PER
Franz PER
Roseline PER
Franz PER
Coppélia PER
Coppé PER
Franz PER
Roseline PER
Allez PER
Roseline PER
Maman PER
Roseline PER
Roseline PER
Roseline PER
016_01_frmod.txt
Stonebrook PER
Stonebrook PER
John Philip Ramsey Junior PER
P'tit Bout PER
Jessica Ramsey PER
Bout PER
Keisha PER
Mes PER
Stonebrook PER
Heureusement PER
Mallory Pike PER
P'tit Bout PER
Bout PER
Bout PER
P'tit Bout PER
J' PER
Jessica PER
J' PER
Coppélia PER
Coppélia PER
Coppélius PER
Coppélia PER
Franz PER
Franz PER
Coppélia PER
Coppélius PER
Franz PER
Maman PER
Mme Noelle PER
Maman PER
016_01_qu.txt
Jaja PER
Jessica Raymond PER
Jaja PER
Noirs PER
Kara PER
Oakley PER
Marjorie Picard PER
Kara PER
Marjorie PER
Jaja PER
Jaja PER
Coppélia PER
Coppélia PER
Coppélia PER
Franz PER
Franz PER
Coppélia PER
Franz PER
016_02_bg.txt
Valérie PER
Marjorie Levêque PER
Valérie PER
Valérie PER
Marjorie PER
Valérie PER
Madame Demoulin PER
Valérie PER
Mélanie Moreau PER
Julie Kishi PER
Sophie Lambert PER
Sophie PER
Carole Leroy PER
Valérie PER
Valérie PER
Valérie PER
Valérie PER
Valérie PER
Stéphane PER
Valérie PER
Yvan Arnould PER
Arnaud PER
Valérie PER
Valérie PER
Julie Kishi PER
Yvan PER
Demoulin PER
Yvan PER
Yvan PER
Valérie PER
Valérie PER
Julie Kishi PER
Marjorie PER
Julie PER
Julie PER
Laurence PER
Julie PER
Mimi PER
Carole PER
Mélanie PER
Valérie PER
Valérie PER
Mélanie PER
Valérie PER
Mélanie PER
Moreau PER
Sophie PER
Sophie PER
Carole PER
Valérie PER
Mélanie PER
Marjorie PER
David PER
David PER
Marjorie PER
Marjorie PER
Marjorie PER
Bruno Lejeune PER
Cécile Gauthier PER
Valérie PER
Marjorie PER
Valérie PER
Valérie PER
Mélanie PER
Carole PER
Marjorie PER
Valérie PER
Laissez PER
Valérie PER
Agnès PER
Mathieu PER
Mathieu PER
Carole PER
Mélanie PER
Valérie PER
Marjorie PER
Marjorie PER
Garder PER
Marjorie PER
Justine PER
016_02_frmod.txt
Stonebrook PER
Laissez PER
David Michael PER
Mme Parker PER
Mary PER
Anne Cook PER
Claudia Koshi PER
Carla Schafer PER
Claudia PER
Mary Anne PER
Samuel PER
David Michael PER
Jim Lelland PER
Karen PER
Claudia Koshi PER
Claudia PER
Mary PER
Anne PER
Jim PER
Parker PER
Jim PER
Samuel PER
Claudia PER
Stonebrook PER
Mary Anne PER
Claudia Koshi PER
Claudia PER
Claudia PER
Claudia PER
Mary PER
Anne PER
Claudia PER
Claudia PER
Jane PER
Claudia PER
Claudia PER
Mimi PER
Mary PER
Anne Cook PER
Carla PER
Mary PER
Anne PER
Mary PER
Anne PER
Mary Anne PER
Claudia PER
Mary PER
Anne PER
M. Cook PER
Lucy MacDouglas PER
Ramsey PER
Carla Schafer PER
Mary PER
Anne PER
Carla PER
Claudia PER
David PER
David PER
Mallory PER
Mary PER
Anne PER
Louisa Kilbourne PER
Laissez PER
Mme Braddock PER
Helen PER
Mme Braddock PER
Carla PER
Mary PER
Anne PER
Claudia PER
Mallory PER
Mallory PER
Garder PER
016_02_qu.txt
Christine Thomas PER
Marjorie Picard PER
Christine Thomas PER
II PER
Marjorie PER
David PER
Sophie Ménard PER
Anne PER
Claudia PER
Sophie PER
Marjorie PER
Diane Dubreuil PER
Claudia PER
Anne-Marie PER
Diane PER
Marjorie PER
Pourquoi PER
Charles PER
David PER
Guillaume PER
Karen PER
Claudia PER
Anne PER
Marie PER
Guillaume PER
Guillaume PER
Charles PER
Anne-Marie PER
Claudia Kishi PER
Marjo PER
Claudia PER
Claudia PER
Claudia PER
Claudia PER
Marjorie PER
Anne PER
Marie PER
Claudia PER
Claudia PER
Josée PER
Claudia PER
Mimi PER
Anne PER
Diane PER
Anne PER
Marie PER
Anne PER
Marie PER
Claudia PER
Anne PER
Marie PER
Tigrou PER
Monsieur Lapierre PER
Sophie Ménard PER
Diane PER
Diane PER
Anne PER
Marie PER
Diane PER
Julien PER
Julien PER
Diane PER
Marjorie PER
Louis Brunet PER
Anne PER
Marie PER
Chantal Chrétien PER
Diane PER
Claudia PER
Anne-Marie PER
Diane PER
Marjorie PER
madame Biron PER
Matthieu PER
Madame Biron PER
Matthieu PER
Madame Biron PER
Diane PER
Anne PER
Marie PER
Claudia PER
Marjorie PER
madame Biron PER
Marjorie PER
016_03_bg.txt
Jacqué PER
Pliez PER
Coppélia PER
Maman PER
Madame Dillon PER
Hélène PER
Catherine PER
Hélène PER
Catherine PER
Victor PER
Victoire PER
Catherine PER
Madame Dillon PER
Hélène PER
Catherine PER
Madame Dillon PER
Coppélia PER
Madame Dillon PER
Hélène PER
Catherine PER
Madame Dillon PER
Marie Bernstein PER
Lise Jacqué PER
Carine Schmitt PER
Catherine PER
Coppélia PER
Madame Dillon PER
Hélène PER
Catherine PER
Coppélia PER
Coppélia PER
Justine Victor PER
Justine Victor PER
Justine Victoire PER
Justine PER
Justine PER
Justine PER
Marie Bernstein PER
Lise Jacqué PER
Justine PER
Marie PER
Lise PER
Hélène PER
Catherine PER
Coppélia PER
Catherine PER
Catherine PER
Coppélia PER
Catherine PER
Hélène PER
Hélène PER
Catherine PER
Catherine PER
Hélène PER
Avais PER
Coppélia PER
016_03_frmod.txt
mademoiselle Jones PER
Pliez PER
Mme Noelle PER
Mme Noelle PER
Coppélia PER
Maman PER
Mme Noelle PER
Mlle Romsey PER
Ramsey PER
Mme Noelle PER
Mme Noelle PER
Mme Noelle PER
Allez PER
Sachant PER
Coppélia PER
Mme Noelle PER
Mme Noelle PER
Mary Bramstedt PER
Lisa Jones PER
Carrie Steinfeld PER
Hilary PER
Coppélia PER
Mme Noelle PER
Coppélia PER
Coppélia PER
Mme Noelle PER
Jessica Romsey PER
Jessica Ramsey PER
Mme Noelle PER
Mary Bramstedt PER
Lisa Jones PER
Mary PER
Hilary PER
Coppélia PER
Coppélia PER
Mme Noelle PER
Avais PER
Mme Noelle PER
016_03_qu.txt
Marcil PER
Pliez PER
Mademoiselle PER
Catherine PER
Élizabeth PER
Raymond! PER
Catherine PER
Élizabeth PER
Mademoiselle PER
Catherine PER
Élizabeth PER
Mademoiselle Noëlle PER
Coppélia PER
Mademoiselle Noëlle PER
Catherine PER
Élizabeth PER
Marie Brazeau PER
Lise Jordan PER
Catherine PER
Coppélia PER
Catherine PER
Coppélia PER
Jessica Raymond PER
Jessica Raymond PER
Ton PER
Marie PER
Marie PER
Catherine PER
Élizabeth PER
Coppélia PER
Élizabeth PER
Allez PER
Élizabeth PER
Coppélia PER
Catherine PER
Catherine PER
016_04_bg.txt
Hélène PER
Catherine PER
Mathieu Brinbeuf PER
Mathieu PER
Agnès PER
Justine PER
Agnès PER
Agnès PER
Mathieu PER
Mathieu PER
Agnès PER
Agnès PER
Agnès PER
Mathieu PER
Agnès PER
Justine PER
Agnès PER
Mathieu PER
Mathieu PER
Mathieu PER
Mathieu PER
Mathieu PER
Agnès PER
C. Le PER
Roseline PER
Brinbeuf PER
Mathieu PER
Mathieu PER
Mathieu PER
Agnès PER
Madame Brinbeuf PER
Mathieu PER
Mathieu PER
Madame Brinbeuf PER
Mathieu PER
Mathieu PER
Justine PER
Mathieu PER
Madame Brinbeuf PER
M PER
Mathieu PER
Mathieu PER
Madame Brinbeuf PER
Agnès PER
Mathieu PER
Agnès PER
Mathieu PER
Demande PER
Mathieu PER
Agnès PER
Agnès PER
Mathieu PER
Agnès PER
Mathieu PER
Agnès PER
Mathieu PER
Agnès PER
madame Brinbeuf PER
Madame Brinbeuf PER
016_04_frmod.txt
Coppélia PER
Bout PER
Mme Noelle PER
– Y PER
Ben PER
Helen PER
Helen PER
Mme Braddock PER
Mme Braddock PER
Helen PER
Helen PER
Mme Braddock PER
Helen PER
Matthew grandira PER
Helen PER
C. Le PER
Mme Braddock PER
Mme Braddock PER
Mme Braddock PER
Mme Braddock PER
Matthew PER
Sa mère PER
Helen PER
Demande PER
Helen PER
Helen PER
Heureusement PER
Mme Braddock PER
Mme Braddock PER
016_04_qu.txt
Jaja PER
Catherine PER
Élizabeth PER
madame Biron PER
Matthieu PER
Lundi PER
Jessica PER
madame Biron PER
madame Biron PER
Matthieu PER
madame Biron PER
Matthieu PER
Matthieu PER
Matthieu PER
madame Biron PER
Matthieu PER
Matthieu PER
Imagine PER
Matthieu PER
Matthieu PER
Madame Biron PER
Matthieu PER
madame Biron PER
Matthieu PER
madame Biron PER
Madame Biron PER
Matthieu PER
madame Biron PER
Matthieu PER
Madame Biron PER
Hélène PER
madame Biron PER
Madame Biron PER
Matthieu PER
Matthieu PER
Hélène PER
Matthieu PER
Matthieu PER
Hélène PER
Matthieu PER
madame Biron PER
Fais PER
Madame Biron PER
J' PER
016_05_bg.txt
Mercredi PER
Justine PER
Mélanie Moreau PER
Aurélie Precisio PER
Mélanie PER
Madame Precisio PER
Mélanie PER
Aurélie PER
Roseline PER
Roseline PER
Aurélie PER
Madame Precisio PER
Mélanie PER
Aurélie PER
Termine PER
Mélanie PER
Aurélie PER
Mélanie PER
Piqu' PER
Mélanie PER
Ricky PER
Aurélie PER
Mélanie PER
Mélanie PER
Aurélie PER
Veux PER
Aurélie PER
Aurélie PER
Aurélie PER
Mélanie PER
Mathieu PER
Agnès PER
Brinbeuf PER
Mathieu PER
Agnès PER
Agnès PER
Mathieu PER
Aurélie PER
Agnès PER
Mathieu PER
Mathieu PER
Mathieu PER
Agnès PER
Mathieu PER
Aurélie PER
Agnès PER
Mathieu PER
Agnès PER
Agnès PER
Justine PER
Mélanie PER
Agnès PER
Mathieu PER
Mathieu PER
Mathieu PER
Agnès PER
Agnès PER
Mathieu PER
Agnès PER
Mathieu PER
016_05_frmod.txt
Mary PER
Anne Cook PER
Jenny Prezzioso PER
Mary PER
Anne PER
Jenny PER
Mary Anne PER
Mme Prezzioso PER
Jenny PER
Mme Prezzioso PER
Mary Anne PER
Jenny PER
Termine PER
Jen PER
Mary PER
Anne PER
Mary PER
Anne PER
Jenny PER
Mary Anne PER
Mary Anne PER
Jenny PER
Bambi PER
Mary Anne PER
Mary PER
Anne PER
Jenny PER
Mary Anne PER
Mary Anne PER
Jenny PER
Mary PER
Anne PER
Mme Braddock PER
Helen PER
Mary Anne PER
Helen PER
Jenny PER
Jenny PER
Helen PER
Jenny PER
Helen PER
Jenny PER
Mary PER
Anne PER
Helen PER
Helen PER
Jessica PER
Mary Anne PER
Helen PER
016_05_qu.txt
Mercredi PER
Jeanne PER
Jeanne PER
Anne PER
Marie PER
Jeanne Prieur PER
Anne PER
Marie PER
Jeanne PER
Anne PER
Marie PER
madame Prieur PER
Jeanne PER
Connaissez PER
Jeanne PER
Anne PER
Marie PER
Prieur PER
Jeanne PER
Jeanne PER
madame Prieur PER
Anne-Marie PER
Jeanne PER
Jeanne PER
Jeanne PER
Pensez PER
Anne PER
Marie PER
Jeanne PER
Anne PER
Marie PER
Jeanne PER
Anne PER
Marie PER
Jeanne PER
Jeanne PER
Anne PER
Marie PER
Anne-Marie PER
Jeanne PER
Anne-Marie PER
Anne PER
Marie PER
Jeanne PER
Anne PER
Marie PER
Anne PER
Marie PER
Anne PER
Marie PER
Jeanne PER
Jeanne PER
Anne PER
Marie PER
Anne-Marie PER
Jeanne PER
Anne-Marie PER
Matthieu PER
madame Biron PER
Anne PER
Marie PER
Jeanne PER
Anne PER
Marie PER
Jeanne PER
Jeanne PER
Anne-Marie PER
Jeanne PER
Jeanne PER
Matthieu PER
Matthieu PER
Hélène PER
Jeanne PER
Jeanne PER
Jeanne PER
Hélène PER
Matthieu PER
Jeanne PER
Anne-Marie PER
Hélène PER
Anne-Marie PER
Hélène PER
Matthieu PER
Matthieu PER
Matthieu PER
Matthieu PER
016_06_bg.txt
Agnès PER
Mathieu PER
Coppélia PER
Ordinairement PER
Mathieu PER
Agnès PER
Agnès PER
Mathieu PER
Mathieu PER
Madame Brinbeuf PER
Mathieu PER
Agnès PER
Agnès PER
Madame Brinbeuf PER
Mathieu PER
Mathieu PER
Agnès PER
Justine PER
Madame Brinbeuf PER
Agnès PER
Mathieu PER
Mathieu PER
Mathieu PER
Mathieu PER
Agnès PER
Mathieu PER
Mathieu PER
Mathieu PER
Agnès PER
Mélanie PER
Aurélie Precisio PER
Agnès PER
Mathieu PER
Charlotte Cuvelier PER
Roseline PER
Mathieu PER
Agnès PER
Agnès PER
Agnès PER
Mathieu PER
Agnès PER
Agnès PER
Mathieu PER
Vanessa PER
Mathieu PER
Agnès PER
Mathieu PER
Agnès PER
Mathieu PER
Dis PER
Agnès PER
Mathieu PER
Mathieu PER
Marjorie PER
Antoine PER
Agnès PER
Mathieu PER
Agnès PER
Marjorie PER
Mathieu PER
Pensez PER
Marjorie PER
Laurent PER
Vanessa PER
Agnès PER
Juliette PER
Mathieu PER
Juliette PER
Agnès PER
Juliette PER
Mathieu PER
Agnès PER
Mathieu PER
Antoine Godefroid PER
Agnès PER
Mathieu PER
Antoine PER
Mathieu PER
Marjorie PER
Brinbeuf PER
Agnès PER
Mathieu PER
Mathieu PER
Agnès PER
016_06_frmod.txt
Coppélia PER
Helen PER
Helen PER
Mme Braddock PER
Mme Braddock PER
Helen PER
Helen PER
Mme Braddock PER
Helen PER
Helen PER
Mme Braddock PER
Manger PER
Mary Anne PER
Jenny Prezzioso PER
Helen PER
Charlotte Johanssen PER
Helen PER
Helen PER
Helen PER
Vanessa PER
Helen PER
Helen PER
Dis PER
K PER
Barrett PER
Buddy PER
Barrett PER
Helen PER
Margot PER
Helen PER
Claire PER
Helen PER
Helen PER
Buddy Barrett PER
Helen PER
Buddy PER
Mme Braddock PER
Helen PER
016_06_qu.txt
Hélène PER
Coppélia PER
Matthieu PER
Matthieu PER
Matthieu PER
madame Biron PER
Rappelle PER
Matthieu PER
Madame Biron PER
madame Biron PER
madame Biron PER
Matthieu PER
Hélène PER
madame Biron PER
Matthieu PER
madame Biron PER
Matthieu PER
Matthieu PER
madame Biron PER
Matthieu PER
Matthieu PER
Jeanne Prieur PER
Matthieu PER
Charlotte PER
Picard PER
Matthieu PER
Hélène PER
Vanessa PER
Matthieu PER
Matthieu PER
Hélène PER
Matthieu PER
Picard PER
Matthieu PER
Marjorie PER
Matthieu PER
Marjorie PER
Picard PER
Matthieu PER
Margot PER
Matthieu PER
Claire PER
Picard PER
Claire PER
Hélène PER
Bruno Barrette PER
Bruno PER
Matthieu PER
madame Biron PER
Matthieu PER
016_07_bg.txt
Justine PER
Marjorie PER
Justine PER
Monsieur PER
Levêque PER
Marjorie PER
Alain PER
Loïc PER
Samuel PER
Laurent PER
Levêque PER
Marjorie PER
Marjorie PER
Carole PER
Alain PER
Alain PER
Hé PER
Hé PER
Juliette PER
Alain PER
Alain PER
Loïc PER
Samuel PER
Carole PER
Alain PER
Marjorie PER
Laurent PER
Laurent PER
Marjorie PER
Laurent PER
Carole PER
Samuel PER
Vanessa PER
Alain PER
Anaïs PER
Laurent PER
Marjorie PER
Laurent PER
Carole PER
Laurent PER
Samuel PER
Juliette PER
Marjorie PER
Marjorie PER
Vanessa PER
Marjorie PER
Carole PER
Marjorie PER
Carole PER
Marjorie PER
Agnès PER
Mathieu PER
Marjorie PER
Mathieu PER
Marjorie PER
Mathieu PER
Agnès PER
Agnès PER
016_07_frmod.txt
Mallory PER
Peut PER
Peut PER
Laissez PER
Mme Pike PER
Mallory PER
Byron PER
Adam PER
Vanessa PER
Margot PER
Claire PER
M. PER
Mme Pike PER
Mme Pike PER
Carla PER
Mallory PER
Adam PER
Adam PER
Claire PER
Adam PER
Adam PER
Byron PER
Carla PER
Adam PER
Mallory PER
Carla PER
Byron PER
Margot PER
Vanessa PER
Adam PER
Margot PER
Carla PER
Vanessa PER
Claire PER
Vanessa PER
Mallory PER
Mallory PER
Helen PER
Helen PER
016_07_qu.txt
Marjorie PER
Picard PER
Diane PER
Monsieur PER
Picard PER
Diane PER
Marjorie PER
Picard PER
Claire PER
Diane PER
Picard PER
Picard PER
Diane PER
Picard PER
Marjorie PER
Diane PER
Picard PER
Diane PER
Antoine PER
Antoine PER
Claire PER
Antoine PER
Antoine PER
Antoine PER
Donne PER
Diane PER
Antoine PER
Marjorie PER
Antoine PER
Diane PER
Marjorie PER
Marjorie PER
Diane PER
Diane PER
Vanessa PER
Antoine PER
Margot PER
Diane PER
Marjorie PER
Diane PER
Diane PER
Allez PER
Picard PER
Diane PER
Diane PER
Vanessa PER
Margot PER
Claire PER
Diane PER
Marjorie PER
Marjorie PER
Vanessa PER
Cerveau PER
Diane PER
Marjorie PER
Diane PER
Diane PER
Diane PER
Matthieu PER
Marjorie PER
Picard PER
Hélène PER
Hélène PER
016_08_bg.txt
Mes PER
Interpréter PER
ONNE PER
Mademoiselle Perron PER
OK PER
Mademoiselle Bersnstein PER
Prochaine PER
Catherine PER
Hélène PER
Catherine PER
Catherine PER
Catherine PER
Agnès PER
Catherine PER
Roseline PER
Catherine PER
Adeline PER
Catherine PER
Adeline PER
Catherine PER
Mes PER
Catherine PER
Adeline PER
Catherine PER
Catherine PER
Catherine PER
Adeline PER
Catherine PER
Catherine PER
Catherine PER
Catherine PER
Mathieu PER
Mathieu PER
Mathieu PER
Adeline PER
Catherine PER
Catherine PER
Regarde PER
Catherine PER
Perron PER
Catherine PER
Adeline PER
Catherine PER
Adeline PER
Catherine PER
Mes PER
Johanna PER
Marjorie PER
Justine PER
Coraline PER
Valérie PER
Valérie PER
Stéphane PER
016_08_frmod.txt
Mes PER
Interpréter PER
Coppélia PER
Mme Noelle PER
Mademoiselle Parson PER
Mademoiselle Bramstedt PER
Prochaine PER
Mme Noelle PER
Était PER
Adèle PER
Mme Noelle PER
Adèle PER
Adèle PER
Adèle PER
Adèle PER
Adèle PER
Adèle PER
Adèle PER
Adèle PER
Regarde PER
Adèle PER
Adèle PER
Adèle PER
Adèle PER
Adèle PER
Mes PER
016_08_qu.txt
Coppélia PER
Mademoiselle Croteau PER
Marie PER
Mademoiselle Raymond PER
Élizabeth PER
Catherine PER
Catherine PER
Élizabeth PER
Élizabeth PER
Hélène PER
Adèle PER
Adèle PER
Adèle PER
Mes PER
Essaye PER
Adèle PER
Adèle PER
Élizabeth PER
Adèle PER
Adèle PER
Adèle PER
Adèle PER
Matthieu PER
Matthieu PER
Adèle PER
Matthieu PER
Matthieu PER
Adèle PER
Regarde PER
Élizabeth PER
Adèle PER
Élizabeth PER
Adèle PER
Adèle PER
Élizabeth PER
Adèle PER
Adèle PER
Adèle PER
Kara PER
Marjorie PER
016_09_bg.txt
Justine PER
Valérie PER
Valérie PER
Stéphane PER
Mathieu PER
Arnaud PER
Valérie PER
Arnould PER
Julie PER
Yvan PER
Arnaud PER
Arnaud PER
Benoît Arnould PER
Arnaud PER
Valérie PER
Stéphane PER
Valérie PER
Valérie PER
Stéphane PER
Valérie PER
Valérie PER
madame Arnaud PER
Monsieur Arnaud PER
Arnaud PER
Arnaud PER
Valérie PER
Julie PER
Arnaud PER
Arnaud PER
Arnaud PER
Arnaud PER
Veux PER
Arnaud PER
Julie PER
Arnaud PER
Arnould PER
Arnaud PER
Hum PER
Arnaud PER
Julie PER
Julie PER
Justine Victoire PER
Roseline PER
Julie PER
Marjorie PER
Julie PER
Arnaud PER
Julie PER
Arnaud PER
Arnaud PER
Arnaud PER
Arnaud PER
Arnaud PER
Arnaud PER
Arnaud PER
Mademoiselle Minet PER
Julie PER
Julie PER
Justine PER
Arnaud PER
Arnaud PER
016_09_frmod.txt
Karen PER
Andrew PER
David Michael PER
Samuel PER
Karen PER
Claudia PER
Claudia PER
Claudia PER
Karen PER
Claudia PER
Claudia PER
Claudia PER
Karen PER
Jim PER
Karen PER
Karen PER
Mme Porter PER
Morbidda Destiny PER
Ben Lelland PER
Karen PER
Claudia PER
Karen PER
Andrew PER
David Michael PER
Charlie PER
Samuel PER
Claudia PER
Claudia PER
Karen PER
Claudia PER
Karen PER
Claudia PER
Claudia PER
Mme Lelland PER
Karen PER
Mme Lelland PER
Claudia PER
Andrew PER
Karen PER
David Michael PER
Karen PER
Claudia PER
Karen PER
David Michael PER
Andrew PER
Mme Lelland PER
Claudia PER
Karen PER
Claudia PER
Andrew PER
David Michael PER
Claudia PER
Claudia PER
Karen PER
Claudia PER
Lego PER
Karen PER
David Michael PER
Claudia PER
Calmez PER
– Karen PER
Claudia PER
Claudia PER
Karen PER
Claudia PER
Karen PER
David Michael PER
Karen PER
David Michael PER
Claudia PER
Karen PER
Karen PER
Jessica Ramsey PER
Claudia PER
Claudia PER
Claudia PER
Karen PER
Karen PER
Karen PER
Claudia PER
Karen PER
Claudia PER
Karen PER
Karen PER
Claudia PER
Karen PER
– Écoute PER
Karen PER
Claudia PER
Claudia PER
Claudia PER
Claudia PER
Karen PER
Tickly PER
Claudia PER
Karen PER
Claudia PER
Karen PER
Claudia PER
Karen PER
Claudia PER
Karen PER
Claudia PER
Karen PER
Claudia PER
David Michael PER
016_09_qu.txt
Samedi PER
Karen PER
Charles PER
Karen PER
Matthieu PER
Claudia PER
Karen PER
Karen PER
Laissez PER
Claudia PER
La mère PER
Claudia PER
Karen PER
Karen PER
Portai PER
Destinée Morbide PER
Karen PER
Claudia PER
Karen PER
Karen PER
Karen PER
Claudia PER
Karen PER
Karen PER
Claudia PER
Claudia PER
madame Marchand PER
Guillaume PER
Ah PER
Karen PER
Karen PER
Karen PER
Claudia PER
Guillaume PER
Papadakis PER
Karen PER
Claudia PER
Claudia PER
David PER
Karen PER
Claudia PER
Lego PER
Claudia PER
David PER
Claudia PER
Karen PER
Claudia PER
Claudia PER
Karen PER
Claudia PER
Karen PER
David PER
Claudia PER
Claudia PER
Karen PER
Karen PER
Claudia PER
Jessie Raymond PER
Claudia PER
Claudia PER
Karen PER
Karen PER
Mimer PER
Karen PER
Claudia PER
Karen PER
Claudia PER
Claudia PER
Karen PER
Karen PER
Claudia PER
Claudia PER
Claudia PER
Claudia PER
Karen PER
Claudia PER
Karen PER
Karen PER
Karen PER
Claudia PER
Karen PER
Claudia PER
Claudia PER
Karen PER
Claudia PER
David PER
Claudia PER
016_10_bg.txt
Mathieu PER
Agnès PER
Agnès PER
Agnès PER
Mathieu PER
Aurélie Precisio PER
Mathieu PER
Apprendre PER
Vanessa Levêque PER
Vanessa PER
Agnès PER
Laurent PER
Mathieu PER
Antoine Godefroid PER
Agnès PER
Mathieu PER
Mathieu PER
Antoine Godefroid PER
Mathieu PER
Vanessa PER
Agnès PER
Laurent PER
Agnès PER
Agnès PER
Antoine PER
Laurent PER
Mathieu PER
Mathieu PER
Loïc PER
Agnès PER
Agnès PER
Mathieu PER
Agnès PER
Mathieu PER
Mathieu PER
Agnès PER
Mathieu PER
Agnès PER
Gringalet PER
Agnès PER
Mathieu PER
Agnès PER
Agnès PER
Mathieu PER
Mathieu PER
Mathieu PER
Laurent PER
Alain PER
Agnès PER
Justine PER
Agnès PER
Roseline PER
Roseline PER
Agnès PER
Regarde PER
Mathieu PER
Mathieu PER
Roseline PER
Roseline PER
Agnès PER
Agnès PER
Mathieu PER
Mathieu PER
Agnès PER
Mathieu PER
Agnès PER
Mathieu PER
Agnès PER
Agnès PER
Coppélia PER
Agnès PER
016_10_frmod.txt
Mme Braddock PER
Helen PER
Mme Braddock PER
Jenny Prezzioso PER
Apprendre PER
Vanessa Pike PER
Vanessa PER
Helen PER
Buddy Barrett PER
Helen PER
Buddy Barrett PER
Helen PER
Helen PER
Buddy PER
Hors-jeu PER
Helen PER
Helen PER
Helen PER
Helen PER
P'tit Bout PER
Helen PER
Byron PER
Adam PER
Helen PER
Helen PER
Helen PER
Stonebrook PER
Stamford PER
Regarde PER
Helen PER
H PER
P'tit Bout PER
Dada PER
Helen PER
Helen Keller PER
Matthew PER
Helen PER
Adèle PER
Mme Noelle PER
Coppélia PER
Mme Noelle PER
016_10_qu.txt
Madame Biron PER
Hélène PER
Madame Biron PER
Matthieu PER
Matthieu PER
Picard PER
Jeanne Prieur PER
Matthieu PER
Apprendre PER
Hélène PER
Bruno Barrette PER
Hélène PER
Matthieu PER
Matthieu PER
Bruno Barrette PER
Matthieu PER
Vanessa PER
Bruno PER
Matthieu PER
Matthieu PER
Joël PER
Hélène PER
Matthieu PER
Sinon PER
Matthieu PER
Matthieu PER
Matthieu PER
Hélène PER
Regarde Matthieu PER
Matthieu PER
Jaja PER
Jaja PER
Regarde PER
H PER
Matthieu PER
Matthieu PER
Jaja PER
Jaja PER
Matthieu PER
Helen Keller PER
Hélène PER
Hélène PER
Adèle PER
016_11_bg.txt
madame Brinbeuf PER
Mes PER
Valérie PER
Valérie PER
Mélanie PER
Carole PER
Marjorie PER
Julie PER
Julie PER
J' PER
Valérie PER
Carole PER
Valérie PER
Valérie PER
Carole PER
Valérie PER
Valérie PER
Mélanie PER
Valérie PER
Mélanie PER
Valérie PER
Justine PER
Mathieu PER
Mathieu PER
Agnès PER
Agnès PER
Mathieu PER
Exact PER
Valérie PER
Carole PER
Hé PER
Mélanie PER
Bruno PER
Bruno PER
Mélanie PER
Bruno PER
Justine PER
Julie PER
Carole PER
Valérie PER
Avez PER
Mélanie PER
Bruno PER
Mélanie PER
Valérie PER
Valérie PER
Julie PER
Valérie PER
Carole PER
Hum! PER
Hé PER
Valérie PER
Marjorie PER
Bonjour madame Brinbeuf PER
Marjorie PER
Mathieu PER
Vas PER
Valérie PER
Mélanie PER
Marjorie PER
Coppélia PER
Aller PER
016_11_frmod.txt
Mme Noelle PER
Mme Braddock PER
Claudia PER
Mes PER
Mme Noelle PER
Mary Anne PER
Carla PER
Claudia PER
Mallory PER
Claudia PER
Claudia PER
Claudia PER
Claudia PER
Samuel PER
Claudia PER
Heureusement PER
Mary Anne PER
Helen PER
Claudia PER
– Hé PER
Mary Anne PER
Mary PER
Anne PER
Mary Anne PER
Claudia PER
Là PER
Mary Anne PER
Carla PER
Mary PER
Anne PER
Mary PER
Anne PER
Savez PER
Claudia PER
Samuel PER
Claudia PER
Mallory PER
Bonjour PER
Mallory PER
Vas PER
Mary Anne PER
Coppélia PER
Aller PER
016_11_qu.txt
madame Biron PER
Claudia PER
Anne PER
Marie PER
Claudia PER
Diane PER
Claudia PER
Claudia PER
Claudia PER
Claudia PER
Claudia PER
Claudia PER
Diane PER
Charles PER
Charles PER
Trousses PER
Trousses PER
Diane PER
Claudia PER
Diane PER
Anne PER
Marie PER
Matthieu PER
Anne PER
Claudia PER
Marjorie PER
Diane PER
Claudia PER
Matthieu PER
Anne PER
Marie PER
Claudia PER
Louis! PER
Louis PER
Anne PER
Marie PER
Anne PER
Marie PER
Claudia PER
Anne PER
Marie PER
Diane PER
Diane PER
Jaja PER
Anne PER
Marie PER
Avez PER
Diane PER
Louis PER
Anne-Marie PER
Claudia PER
Charles PER
Diane PER
Marjorie PER
Claudia PER
madame Biron PER
Matthieu PER
Marjorie PER
Aller PER
016_12_bg.txt
madame Brinbeuf PER
Prête PER
Madame Brinbeuf PER
Mathieu PER
Mathieu PER
Mathieu PER
Mathieu PER
Madame Brinbeuf PER
Justine PER
professeur de Mathieu PER
Madame Brinbeuf PER
Bonjour madame Brinbeuf PER
Justine PER
madame Franck PER
Brinbeuf PER
madame Franck PER
classe de Mathieu PER
Madame Franck PER
Madame Brinbeuf PER
Franck PER
Assieds PER
Franck PER
Madame Franck PER
Franck PER
Justine Victoire PER
E V PER
Franck PER
Madame Franck PER
Justine PER
Mathieu Brinbeuf PER
Franck PER
Coppélia PER
Brinbeuf PER
Coppélia PER
Voulez PER
Mathieu PER
Madame Franck PER
Coppélia PER
Mathieu PER
madame Franck PER
Madame Franck PER
Franck PER
madame Brinbeuf PER
Mathieu PER
Mathieu PER
Mathieu PER
Mathieu PER
016_12_frmod.txt
Mme Braddock PER
Mme Braddock PER
Mme Braddock PER
Mme Braddock PER
Bonjour PER
Mme Franck PER
Mme Braddock PER
Mme Franck PER
Assieds PER
Jessica Ramsey PER
Mme Franck PER
Matthew Braddock PER
Mme Franck PER
Ah PER
Coppélia PER
Mme Braddock PER
Coppélia PER
Voulez PER
Mme Franck PER
Coppélia PER
Mme Franck PER
Mme Franck PER
Mme Braddock PER
016_12_qu.txt
Ma mère PER
madame Biron PER
madame Biron PER
Madame Biron PER
madame Biron PER
Matthieu PER
madame Biron PER
Matthieu PER
Matthieu PER
Matthieu PER
professeur de Matthieu PER
Madame Biron PER
madame Biron PER
Matthieu PER
madame Biron PER
Madame Biron PER
Madame Biron PER
Jessie Raymond PER
M PER
N PER
Matthieu Biron PER
Jessie PER
Coppélia PER
Coppélia PER
Voulez PER
Coppélia PER
Matthieu PER
Matthieu PER
016_13_bg.txt
Mercredi PER
Justine PER
Gringalet PER
Roseline PER
Justine PER
Roseline PER
Charlotte Cuvelier PER
Roseline PER
Charlotte PER
Valérie PER
Roseline PER
Sophie Lambert PER
Roseline PER
Roseline PER
Roseline PER
Valérie PER
Roseline PER
Roseline PER
Valérie PER
Gringalet PER
Roseline PER
Hé PER
Roseline PER
Valérie PER
Roseline PER
Gringalet PER
Roseline PER
Valérie PER
Valérie PER
Roseline PER
Valérie PER
Valérie PER
Roseline PER
Roseline PER
Valérie PER
Mathieu PER
Roseline PER
Ma mère PER
Mettre PER
Valérie PER
Roseline PER
Valérie PER
Charlotte PER
Roseline PER
Charlotte Cuvelier PER
Roseline PER
Charlotte PER
Roseline PER
Charlotte PER
Sophie PER
Roseline PER
maison de Sophie PER
Charlotte PER
Roseline PER
Charlotte PER
Roseline PER
Charlotte PER
Charlotte PER
Roseline PER
Charlotte PER
Charlotte PER
Charlotte PER
Bonjour Charlotte PER
Roseline PER
Charlotte PER
Justine PER
Roseline PER
Charlotte PER
Justine PER
Roseline PER
Justine PER
Valérie PER
Coppernicus PER
Roseline PER
Coppélia PER
Valérie PER
Justine PER
Roseline PER
Charlotte PER
Valérie PER
Valérie PER
Gringalet PER
Valérie PER
Roseline PER
Charlotte PER
Justine PER
Roseline PER
Valérie PER
Valérie PER
Charlotte PER
Roseline PER
Charlotte PER
Valérie PER
Valérie Demoulin PER
016_13_frmod.txt
Jessica PER
Jessica PER
Charlotte PER
P'tit Bout PER
Bout PER
P'tit Bout PER
Bout PER
Bout PER
Bout PER
Stonebrook PER
Stonebrook PER
Ma mère PER
Charlotte PER
Charlotte Johanssen PER
Ma sœur PER
Charlotte PER
Charlotte PER
Charlotte PER
Stonebrook PER
Charlotte PER
Charlotte PER
Charlotte PER
Charlotte PER
Coppernicus PER
Coppélia PER
Bout PER
Charlotte PER
P'tit Bout PER
Bout PER
Charlotte PER
Ma sœur PER
Charlotte PER
Charlotte PER
Mme Braddock PER
Kristy Parker PER
016_13_qu.txt
Jaja PER
Charlotte Jasmin PER
Sophie PER
Jaja PER
Jaja PER
Jaja PER
Montre PER
Jaja PER
Jaja PER
Ma mère PER
Charlotte Jasmin PER
Charlotte PER
Sophie PER
Sophie PER
Charlotte PER
Charlotte PER
Charlotte PER
Charlotte PER
Charlotte PER
Christinel PER
Charlotte PER
Polanski PER
Copemicus PER
Coppélia PER
Jaja PER
Jaja PER
Polanski PER
Charlotte PER
madame Biron PER
Christine Thomas PER
016_14_bg.txt
Coppélia PER
Brinbeuf PER
Franck PER
Mathieu PER
Roseline PER
Charlotte PER
Valérie PER
Mathieu PER
Mélanie PER
Valérie PER
Roseline PER
Gringalet PER
Bruno Lejeune PER
Mélanie PER
Mathieu PER
Brinbeuf PER
Agnès PER
Coppélia PER
Coppélius PER
Prête PER
Prête PER
Justine PER
Agnès PER
Allez PER
Agnès PER
Agnès PER
madame Brinbeuf PER
Madame Brinbeuf PER
Caroline Brinbeuf PER
Agnès PER
Agnès PER
Coppélia PER
Agnès PER
Agnès PER
Mathieu PER
Catherine PER
Catherine PER
Mathieu PER
Catherine PER
Mathieu PER
Maman PER
Adeline PER
Catherine PER
II PER
Agnès PER
Coppelius PER
Franz PER
Coppélia PER
Coppélius PER
Franz PER
Agnès PER
Agnès PER
madame Brinbeuf PER
Agnès PER
Christophe Gélin PER
Franz PER
Mathieu PER
Mathieu PER
Mathieu PER
Mathieu PER
Christophe PER
Mathieu PER
Adeline PER
Catherine PER
Catherine PER
Catherine PER
Adeline PER
Christophe PER
Catherine PER
Mathieu PER
Adeline PER
016_14_frmod.txt
Mme Braddock PER
Mme Noelle PER
Mme Franck PER
Charlotte PER
Mary PER
Anne PER
Mary Anne PER
Claudia PER
P'tit Bout PER
Logan Rinaldi PER
Mary PER
Anne PER
M. Braddock PER
Mme Braddock PER
Helen PER
Coppélia PER
Coppélius PER
Mme Noelle PER
Mme Braddock PER
Helen PER
Helen PER
Bonsoir PER
Mme Braddock PER
Mme Braddock PER
Carolyn Braddock PER
Helen PER
Helen PER
Coppélia PER
Helen PER
Helen PER
Mme Noelle PER
Maman PER
Adèle PER
Mme Braddock PER
Coppélius PER
Franz PER
Coppélia PER
Coppélius PER
Franz PER
Helen PER
Helen PER
Mme Braddock PER
Helen PER
Christopher Gerber PER
Franz PER
Christopher PER
Adèle PER
Adèle PER
Christopher PER
Adèle PER
016_14_qu.txt
Matthieu PER
Charlotte PER
Louis Brunet PER
Jaja PER
Matthieu PER
madame Biron PER
Coppélia PER
madame Biron PER
madame Biron PER
Madame Biron PER
Caroline Biron PER
Adèle PER
Élizabeth PER
Matthieu PER
Adèle PER
II PER
Franz PER
Coppélia PER
Christophe Baril PER
Franz PER
Matthieu PER
Matthieu PER
Matthieu PER
Adèle PER
Matthieu PER
Adèle PER
Christophe PER
Matthieu PER
Adèle PER
016_15_bg.txt
Catherine PER
Adeline PER
Catherine PER
Adeline PER
Catherine PER
Adeline PER
Justine PER
Catherine PER
Catherine PER
Dis PER
Justine PER
Justine PER
Roseline PER
Brinbeuf PER
Agnès PER
Valérie PER
Mathieu PER
Justine PER
Marjorie PER
Marjorie PER
Justine PER
Johanna PER
Johanna PER
Johanna PER
Johanna PER
Johanna PER
Johanna PER
Catherine PER
Adeline PER
Johanna PER
Mathieu PER
Agnès PER
Catherine PER
Johanna PER
Johanna PER
Marjorie PER
Justine PER
Marjorie PER
Marjorie PER
Johanna PER
Johanna PER
Marjorie PER
Johanna PER
Johanna PER
Justine PER
Johanna PER
Marjorie PER
Mathieu PER
Johanna PER
Catherine PER
Adeline PER
Johanna PER
Adeline PER
Mathieu PER
Catherine PER
Catherine PER
Johanna PER
Justine PER
Franz PER
Catherine PER
Catherine PER
Catherine PER
Catherine PER
Mathieu PER
Justine PER
Maman PER
Roseline PER
Johanna PER
Marjorie PER
Brinbeuf PER
Mathieu PER
Agnès PER
Agnès PER
Melba PER
Pêche Melba PER
Justine PER
Julie PER
Mélanie PER
Mathieu PER
Agnès PER
Mathieu PER
Mathieu PER
Agnès PER
Mathieu PER
Mathieu PER
016_15_frmod.txt
Adèle PER
Adèle PER
Adèle PER
Adèle PER
Ah PER
Carla PER
Mary Anne PER
Matthew PER
Mallory PER
J' PER
Keisha PER
Adèle PER
Mme Braddock PER
Mallory » PER
Maman PER
Mes PER
Adèle PER
Keisha PER
Adèle PER
Mme Noelle PER
Franz PER
Bêbête PER
Adèle PER
Adèle PER
Maman PER
Mary Anne PER
Claudia PER
M. PER
Mme Braddock PER
Helen PER
Claudia PER
Claudia PER
Claudia PER
Claudia PER
Mary PER
Anne PER
Helen PER
Helen PER
016_15_qu.txt
Adèle PER
Adèle PER
Adèle PER
Adèle PER
Élizabeth PER
Marjorie PER
Kara PER
Kara PER
Kara PER
Marjorie PER
Kara PER
Kara PER
Élizabeth PER
Kara PER
Kara PER
Kara PER
Marjorie PER
Marjorie PER
Marjorie PER
Kara PER
Kara PER
Marjorie PER
Marjorie PER
Kara PER
Kara PER
Marjorie PER
Matthieu PER
Adèle PER
Kara PER
Matthieu PER
Adèle PER
Élizabeth PER
Franz PER
Adèle PER
Matthieu PER
Viens PER
Gaston PER
Hélène PER
Claudia PER
Claudia PER
Anne-Marie PER
Claudia PER
Hélène PER
Matthieu PER
It’s not perfect: chapter 1 of the Quebec translation flags “mange Julie” (Julie eats) as a person, when it’s really a verb followed by a name. But it’s a lot better.
8. French NER for places#
I moved all the _ner_per.txt files into their own folder, so that spaCy wouldn’t try to run NER on text files of its own NER results, and would instead just use the files with the chapter texts as the objects of investigation.
I changed the code from step 7 (and 5) to replace PER with LOC and ran it again to get location entities.
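As an alternative to moving result files out of the way before each run, the loop could simply skip anything that looks like earlier NER output. This is only a minimal sketch, assuming every output file has `_ner_` in its name (as the ones in this notebook do); `chapter_files` is a hypothetical helper name, not something from the original code.

```python
import os

def chapter_files(directory, marker="_ner_"):
    """Return the sorted .txt filenames in `directory`, leaving out earlier NER output files.

    Assumption: output files are recognizable by `marker` in the filename.
    """
    return [
        name
        for name in sorted(os.listdir(directory))
        # Keep only text files that are not results from a previous NER run
        if name.endswith(".txt") and marker not in name
    ]
```

With something like this, `for filename in chapter_files(filedirectory):` could stand in for the `os.listdir` line and the `.txt` check, and the `_ner_per.txt` files could stay where they are.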
#Sort all the files in the directory you specified above, alphabetically.
#For each of those files...
for filename in sorted(os.listdir(filedirectory)):
    #If the filename ends with .txt (i.e. if it's actually a text file)
    if filename.endswith('.txt'):
        #Write out below the name of the file
        print(filename)
        #The file name of the output file adds _ner_loc to the end of the file name of the input file
        outfilename = filename.replace('.txt', '_ner_loc.txt')
        #Open the input filename
        with open(filename, 'r') as f:
            #Create and open the output filename
            with open(outfilename, 'w') as out:
                #Read the contents of the input file
                chaptertext = f.read()
                #Do French NLP on the contents of the input file
                chapterner = frnlp(chaptertext)
                #For each recognized entity
                for ent in chapterner.ents:
                    #If that entity is labeled as a location
                    if ent.label_ == 'LOC':
                        #Print the entity, and the label (which should be LOC)
                        print(ent.text, ent.label_)
                        #Write the entity to the output file
                        out.write(ent.text)
                        #Write a newline character to the output file
                        out.write('\n')
016_01_bg.txt
Espagne LOC
Neuville LOC
Burkina Faso LOC
barre de Neuville LOC
Gringalet LOC
Burkina Faso LOC
France LOC
Neuville LOC
Aubrives LOC
Neuville LOC
Neuville LOC
Noirs LOC
chambre de Gringalet LOC
Gringalet LOC
Gringalet LOC
Aubrives LOC
Franz LOC
Maman LOC
Roseline LOC
Pense LOC
016_01_frmod.txt
Mexique LOC
Connecticut LOC
Oakley LOC
New Jersey LOC
Oakley LOC
Becca LOC
Oakley LOC
New Jersey LOC
Noirs LOC
Stamford LOC
Connecticut LOC
Stonebrook LOC
Stamford LOC
Noirs LOC
Keisha LOC
Mallory LOC
Oakley LOC
Becca LOC
Stamford LOC
Oakley LOC
Franz LOC
Pense LOC
016_01_qu.txt
Mexique LOC
Nouville LOC
Nouville LOC
États-Unis LOC
Oakley LOC
New Jersey LOC
Oakley LOC
Becca LOC
Jean-Philippe LOC
Jessie LOC
Becca LOC
Oakley LOC
Noirs LOC
Blancs LOC
Nouville LOC
Noirs LOC
Noire LOC
Noirs LOC
Lentement LOC
Jaja LOC
Becca LOC
Becca LOC
Nouville LOC
Oakley LOC
Becca LOC
Becca LOC
Noëlle LOC
Becca LOC
Becca LOC
Becca LOC
Becca LOC
016_02_bg.txt
Marjorie LOC
Valérie LOC
Sébastien LOC
Neuville LOC
Marjorie LOC
Sébastien LOC
Coralie LOC
Valérie LOC
Neuville LOC
Mélanie LOC
Valérie LOC
Marjorie LOC
Paris LOC
Carole LOC
Provence LOC
Carole LOC
Carole LOC
Neuville LOC
Provence LOC
Marjorie LOC
Voilà LOC
Carole LOC
Julie LOC
Julie LOC
Brinbeuf LOC
Brinbeuf LOC
Brinbeuf LOC
016_02_frmod.txt
Kristy LOC
Mallory Pike LOC
Kristy LOC
Kristy LOC
Kristy LOC
Mallory LOC
Kristy LOC
Lucy LOC
Stonebrook LOC
Mallory LOC
Kristy LOC
Carla LOC
Mallory LOC
Kristy LOC
Kristy LOC
Kristy LOC
Kristy LOC
Kristy LOC
Kristy LOC
Kristy LOC
Kristy LOC
Kristy LOC
Mallory LOC
Kristy LOC
Kristy LOC
Mallory LOC
Mallory LOC
Mallory LOC
Kristy LOC
Kristy LOC
Kristy LOC
New York LOC
Lucy LOC
Carla LOC
de Californie LOC
Kristy LOC
Carla LOC
Mallory LOC
Californienne LOC
Californie LOC
Mallory LOC
Mallory LOC
Kristy LOC
Carla LOC
Kristy LOC
Bonjour LOC
Kristy LOC
Matthew LOC
Braddock LOC
Matthew LOC
Kristy LOC
Mallory LOC
016_02_qu.txt
Désolée LOC
Nouville LOC
Sébastien LOC
Marjo LOC
Marjo LOC
Croyez LOC
Toronto LOC
de Californie LOC
Californienne LOC
Californie LOC
Marjorie LOC
Marjorie LOC
Marjorie LOC
Donnez LOC
Biron LOC
Nouville LOC
Maijorie LOC
016_03_bg.txt
Mademoiselle Victor LOC
Poupée Chinoise LOC
Poupée Chinoise LOC
Noirs LOC
Franz LOC
Félicitations LOC
Félicitations LOC
Félicitations LOC
Poupée Chinoise LOC
Ouais LOC
Avaient LOC
016_03_frmod.txt
Romsey LOC
Noirs LOC
Franz LOC
Félicitations LOC
Félicitations LOC
Félicitations LOC
Avaient LOC
016_03_qu.txt
Mademoiselle Noëlle LOC
Becca LOC
Mademoiselle Raymond LOC
Mademoiselle LOC
Poupée chinoise LOC
Mademoiselle LOC
Poupée chinoise LOC
Europe LOC
Vais LOC
Franz LOC
Lise LOC
Lise LOC
Poupée chinoise LOC
Élizabeth LOC
Est LOC
016_04_bg.txt
Gringalet LOC
Aubrives LOC
Crois LOC
Brinbeuf LOC
Brinbeuf LOC
Brinbeuf LOC
Brinbeuf LOC
Voudrais LOC
Brinbeuf LOC
Brinbeuf LOC
Brinbeuf LOC
Brinbeuf LOC
Brinbeuf LOC
016_04_frmod.txt
Stamford LOC
Crois LOC
Matthew Braddock LOC
Matthew LOC
Braddock LOC
Braddock LOC
Bonjour LOC
Helen LOC
Matthew LOC
Matthew LOC
Helen LOC
Matthew LOC
Helen LOC
Voudrais LOC
Jessica LOC
Tandis LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
Helen LOC
Matthew LOC
Matthew LOC
Matthew LOC
Helen LOC
Matthew LOC
Braddock LOC
Matthew LOC
Helen LOC
Matthew LOC
Matthew LOC
Matthew LOC
Helen LOC
016_04_qu.txt
Noëlle LOC
Noëlle LOC
porte des Biron LOC
Biron LOC
Marjorie LOC
Gougeon LOC
Appelle LOC
Jessie LOC
Fantastique LOC
Jessie LOC
J'aimerais LOC
Becca LOC
Crois LOC
Jessie LOC
Merveilleux LOC
016_05_bg.txt
Aurélie LOC
Aurélie LOC
Precisio LOC
Aurélie LOC
Precisio LOC
Aurélie LOC
Aurélie LOC
Aurélie LOC
Aurélie LOC
Hippo LOC
Ecureuil LOC
Aurélie LOC
Mélanie LOC
Mélanie LOC
Aurélie LOC
Aurélie LOC
Brinbeuf LOC
Brinbeuf LOC
Aurélie LOC
Aurélie LOC
J'acquiesçai LOC
Brinbeuf LOC
Eh LOC
Neuville LOC
016_05_frmod.txt
Braddock LOC
Jen LOC
Désolée LOC
– Oui LOC
Matthew LOC
Helen LOC
Braddock LOC
Braddock LOC
Matthew LOC
Matthew LOC
– Matthew LOC
Matthew LOC
Jenny LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
Helen LOC
Matthew LOC
Braddock LOC
Helen LOC
Matthew LOC
Stonebrook LOC
016_05_qu.txt
les Prieur LOC
Becca LOC
les Prieur LOC
Serpents LOC
Échelles LOC
Caillou LOC
Biron LOC
Biron LOC
Biron LOC
Biron LOC
Biron LOC
Jessie LOC
Biron LOC
Noire LOC
Nouville LOC
016_06_bg.txt
Brinbeuf LOC
Brinbeuf LOC
Brinbeuf LOC
Brinbeuf LOC
Brinbeuf LOC
Brinbeuf LOC
Brinbeuf LOC
P LOC
P. LOC
P LOC
Roseline LOC
Levêque LOC
Levêque LOC
Marjorie LOC
Levêque LOC
Levêque LOC
Levêque LOC
Godefroid LOC
Marjorie LOC
Ouais LOC
Marseille LOC
Monaco LOC
Marjorie LOC
Marjorie LOC
Levêque LOC
016_06_frmod.txt
Helen LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
– Oui LOC
J LOC
Braddock LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
P LOC
P. LOC
Matthew LOC
Matthew LOC
Helen LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
N LOC
Pike LOC
Matthew LOC
Mallory LOC
Helen LOC
Matthew LOC
Mallory LOC
Matthew LOC
Nicky LOC
Matthew LOC
Matthew LOC
– Matthew LOC
Matthew LOC
Mallory LOC
Helen LOC
Matthew LOC
Matthew LOC
016_06_qu.txt
Biron LOC
J LOC
E LOC
S LOC
S LOC
I LOC
Regardez LOC
J'ai l'air plus LOC
Becca LOC
Nouville LOC
Becca LOC
N LOC
A LOC
Venez LOC
Maijorie LOC
Maijorie LOC
Patriotes LOC
Soulagées LOC
Marjorie LOC
Marjorie LOC
016_07_bg.txt
Levêque hier LOC
Levêque LOC
Carole LOC
Levêque LOC
Marjorie LOC
Carole LOC
Levêque LOC
Levêque LOC
Carole LOC
Levêque LOC
Levêque LOC
Vanessa LOC
Anaïs LOC
Juliette LOC
Levêque LOC
Loïc LOC
Marjorie LOC
Marjorie LOC
Marjorie LOC
Marjorie LOC
Carole LOC
Levêque LOC
Carole LOC
Marjorie LOC
Carole LOC
Marjorie LOC
Anaïs LOC
Carole LOC
Carole LOC
Tenez LOC
Levêque LOC
Marjorie LOC
Carole LOC
Carole LOC
Carole LOC
Levêque LOC
Loïc LOC
Carole LOC
Levêque LOC
016_07_frmod.txt
Carla LOC
Mallory LOC
Carla LOC
Mallory LOC
Carla LOC
Mallory LOC
Nicky LOC
Carla LOC
Mallory LOC
Vanessa LOC
Margot LOC
Claire LOC
Carla LOC
Mallory LOC
Nicky LOC
Mallory LOC
Carla LOC
Mallory LOC
Carla LOC
Nicky LOC
Mallory LOC
Nicky LOC
Carla LOC
Dites LOC
Margot LOC
Margot LOC
Margot LOC
Carla LOC
Nicky LOC
Mallory LOC
Tenez LOC
Carla LOC
Mallory LOC
Carla LOC
Carla LOC
Nicky LOC
Margot LOC
Carla LOC
Mallory LOC
Margot LOC
Mallory LOC
Carla LOC
Carla LOC
Mallory LOC
Helen LOC
Matthew LOC
Matthew LOC
Matthew LOC
016_07_qu.txt
Maijorie LOC
Margot LOC
Marjorie LOC
Mathurin LOC
Marjorie LOC
Margot LOC
Margot LOC
Arrêtez LOC
Margot LOC
Silence LOC
Stupide LOC
Margot LOC
Marjorie LOC
J'ai peine LOC
016_08_bg.txt
CHI LOC
Victor LOC
Ouais LOC
Explique LOC
A LOC
Adeline LOC
Roseline LOC
Adeline LOC
Adeline LOC
Adeline LOC
Brinbeuf LOC
Adeline LOC
Adeline LOC
Arnaud LOC
Sébastien LOC
Coralie LOC
Coralie LOC
016_08_frmod.txt
Romsey LOC
Assise LOC
J'ai regardé les groupes d'élèves sortir LOC
Helen LOC
Bonjour LOC
Adèle LOC
Essaie LOC
– Oui LOC
Massachusetts LOC
gorge de Matthew LOC
Matthew LOC
Braddock LOC
Matthew LOC
histoire de Matthew LOC
Keisha LOC
Mallory LOC
016_08_qu.txt
Répétition LOC
Noëlle LOC
Pellerin LOC
Élizabeth LOC
Becca LOC
Élizabeth LOC
Élizabeth LOC
Élizabeth LOC
Noëlle LOC
Élizabeth LOC
Surprise LOC
Becca LOC
Élizabeth LOC
Élizabeth LOC
Élizabeth LOC
Élizabeth LOC
Biron LOC
Élizabeth LOC
Élizabeth LOC
Pellerin LOC
Élizabeth LOC
Élizabeth LOC
016_09_bg.txt
Coralie Arnaud LOC
Sébastien LOC
Valérie LOC
Coralie LOC
Julie LOC
Coralie LOC
Coralie LOC
Valérie LOC
Coralie LOC
Coralie LOC
Coralie LOC
Coralie LOC
Coralie LOC
Arnaud LOC
Sébastien LOC
Bonjour Coralie LOC
Bonjour LOC
Coralie LOC
Coralie LOC
Coralie LOC
Coralie LOC
Sébastien LOC
Coralie LOC
Coralie LOC
Sébastien LOC
Dumont LOC
Les Arnaud LOC
Coralie LOC
Julie LOC
Sébastien LOC
Bonjour LOC
Coralie LOC
Coralie LOC
Sébastien LOC
Coralie LOC
Coralie LOC
Arnould LOC
Coralie LOC
Sébastien LOC
Coralie LOC
Sébastien LOC
Coralie LOC
Coralie LOC
Attends LOC
Coralie LOC
J'essayai de le LOC
Coralie LOC
Coralie LOC
Coralie LOC
Coralie LOC
Coralie LOC
Coralie LOC
Appelle LOC
Coralie LOC
Coralie LOC
Coralie LOC
Coralie LOC
Coralie LOC
Julie LOC
Coralie LOC
Coralie LOC
Coralie LOC
Sébastien LOC
016_09_frmod.txt
Kristy LOC
Matthew LOC
Kristy LOC
Karen LOC
Kristy LOC
Lelland LOC
Kristy LOC
Kristy LOC
Kristy LOC
Kristy LOC
Bonjour LOC
Kristy LOC
Bonjour LOC
Bonjour LOC
Kristy LOC
– Donc LOC
Kristy LOC
Papadakis LOC
Lelland LOC
Lego LOC
Lelland LOC
Lelland LOC
Mallory LOC
Appelle LOC
Moosie LOC
Héloïse LOC
Moosie LOC
Tickly LOC
Claudia LOC
016_09_qu.txt
J'ai gardé LOC
Sébastien LOC
les Marchand LOC
vieux Ben LOC
Bonsoir LOC
Lego LOC
Marjorie LOC
Venez LOC
Heureusement LOC
016_10_bg.txt
Brinbeuf LOC
Brinbeuf LOC
Brinbeuf LOC
Neuville LOC
Brinbeuf LOC
Levêque LOC
Levêque LOC
Brinbeuf LOC
Ouais LOC
Parfois LOC
Roseline LOC
Roseline LOC
Neuville LOC
Aubrives LOC
Agnès LOC
A LOC
Roseline LOC
Gringalet LOC
Adeline LOC
016_10_frmod.txt
Braddock LOC
Matthew LOC
Helen LOC
Helen LOC
bus de Matthew LOC
Matthew LOC
Matthew LOC
Helen LOC
Nicky LOC
Matthew LOC
Matthew LOC
Matthew LOC
Braddock LOC
Matthew LOC
– Où LOC
Vanessa LOC
Nicky LOC
Nicky LOC
Matthew LOC
Helen LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
Nicky LOC
Matthew LOC
Nicky LOC
J'aimerais LOC
Quelquefois LOC
– Matthew LOC
Helen LOC
Matthew LOC
Matthew LOC
Helen LOC
016_10_qu.txt
Biron LOC
Biron LOC
Centre LOC
Les Biron LOC
Barrette LOC
Non! LOC
Explique LOC
Jaja LOC
Quelquefois LOC
Becca LOC
Becca LOC
Becca LOC
Becca LOC
Becca LOC
Ga LOC
enfants Biron LOC
Noëlle LOC
Noëlle LOC
J'ai une idée LOC
Noëlle LOC
016_11_bg.txt
Marjorie LOC
Julie LOC
Julie LOC
Marjorie LOC
Valérie LOC
Marjorie LOC
Valérie LOC
Heureusement LOC
Brinbeuf LOC
Brinbeuf LOC
Provence LOC
Crois LOC
Carole LOC
Gringalet LOC
Provence LOC
Carole LOC
Savez LOC
Ouais LOC
Marjorie LOC
Marjorie LOC
Allô LOC
Aubrives LOC
016_11_frmod.txt
Bonjour LOC
Kristy LOC
Mallory LOC
Désolée LOC
Kristy LOC
Mallory LOC
Kristy LOC
Mallory LOC
Kristy LOC
Carla LOC
Kristy LOC
Kristy LOC
Carla LOC
Kristy LOC
Kristy LOC
Mallory LOC
Kristy LOC
Kristy LOC
Matthew LOC
Matthew LOC
Helen LOC
Matthew LOC
Kristy LOC
Braddock LOC
Carla LOC
Braddock LOC
de Californie LOC
Kristy LOC
Carla LOC
Californie LOC
– Tigrou LOC
Carla LOC
Kristy LOC
Kristy LOC
Kristy LOC
Carla LOC
Kristy LOC
Mallory LOC
Mallory LOC
Braddock LOC
Matthew LOC
Kristy LOC
– Oui LOC
Mallory LOC
J'ai invité LOC
Stamford LOC
016_11_qu.txt
Noëlle LOC
Marjorie LOC
Marjorie LOC
Maijorie LOC
Jessie LOC
de Californie LOC
Penses LOC
Californie LOC
Marjorie LOC
Becca LOC
J'imagine LOC
016_12_bg.txt
Principal LOC
A 13h25 LOC
Aubrives LOC
Brinbeuf LOC
Brinbeuf LOC
Brinbeuf LOC
J'acquiesçai LOC
Brinbeuf LOC
N LOC
Je LOC
016_12_frmod.txt
Stamford LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
professeur de Matthew LOC
classe de Matthew LOC
Braddock LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
016_12_qu.txt
France LOC
France LOC
France LOC
France LOC
France LOC
France LOC
S LOC
S LOC
I LOC
J'épelle B LOC
A LOC
L LOC
L LOC
E LOC
France LOC
France LOC
Matthieu LOC
de France LOC
France LOC
France LOC
016_13_bg.txt
Bonjour LOC
Gringalet LOC
Brinbeuf LOC
Gringalet LOC
Aubrives LOC
Gringalet LOC
Valérie LOC
Valérie LOC
J'ai le Livre du Chat LOC
Gringalet LOC
Ouais LOC
Valérie LOC
chambre de Gringalet LOC
Roseline LOC
Gringalet LOC
Roseline LOC
Gringalet LOC
Valérie LOC
Gringalet LOC
Aubrives LOC
Neuville LOC
Roseline LOC
Neuville cimente LOC
Gringalet LOC
Valérie LOC
Gringalet LOC
Gringalet LOC
Brinbeuf LOC
016_13_frmod.txt
Braddock LOC
Kristy LOC
Stamford LOC
Kristy LOC
J'ai le Livre du chat LOC
Kristy LOC
Kristy LOC
Kristy LOC
Kristy LOC
Kristy LOC
Kristy LOC
Kristy LOC
Kristy LOC
Matthew LOC
– Oui LOC
– Oui LOC
Kristy LOC
maison de LOC
Lucy LOC
Kristy LOC
Kristy LOC
Kristy LOC
Kristy LOC
Jessica LOC
Kristy LOC
Kristy LOC
Kristy LOC
– Oui LOC
Kristy LOC
016_13_qu.txt
Becca LOC
Biron LOC
J'ai eu beaucoup de plaisir LOC
Becca LOC
Maman LOC
Becca LOC
Oakley LOC
Becca LOC
Becca LOC
Becca LOC
Becca LOC
Becca LOC
Becca LOC
Nouville LOC
Becca LOC
Becca LOC
Becca LOC
Becca LOC
Becca LOC
Becca LOC
Becca LOC
Becca LOC
Becca LOC
Becca LOC
Becca LOC
Becca LOC
Jaja LOC
016_14_bg.txt
J'avais gardé LOC
Marjorie LOC
Aubrives LOC
Valérie LOC
Carole LOC
Mélanie LOC
Marjorie LOC
Devinez LOC
Brinbeuf LOC
Bonsoir LOC
Brinbeuf LOC
Aubrives LOC
Adeline LOC
Franz LOC
Brinbeuf LOC
Brinbeuf LOC
A LOC
016_14_frmod.txt
Matthew LOC
Mallory LOC
Kristy LOC
Stamford LOC
Matthew LOC
Kristy LOC
New Jersey LOC
Kristy LOC
Carla LOC
Mallory LOC
Devinez LOC
Matthew LOC
Excuse LOC
– Oui LOC
Helen LOC
Stamford LOC
Matthew LOC
– Adèle LOC
Matthew LOC
Matthew LOC
J'ai entendu Helen LOC
Franz LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
Matthew LOC
016_14_qu.txt
Biron LOC
France LOC
Noëlle LOC
Becca LOC
J'ai gardé LOC
Biron LOC
Noëlle LOC
Merci LOC
Noëlle LOC
Arrivées LOC
Nouville LOC
Noëlle LOC
Élizabeth LOC
J'entends des reniflements et je vois LOC
016_15_bg.txt
Adeline LOC
Bonjour LOC
Félicitations LOC
Carole LOC
Mélanie LOC
Avais LOC
Marjorie LOC
Marjorie LOC
Marjorie LOC
France LOC
Brinbeuf LOC
France LOC
Marjorie LOC
Maman LOC
Adeline LOC
Valérie LOC
Carole LOC
Mélanie LOC
Celles LOC
Brinbeuf LOC
Chantilly LOC
016_15_frmod.txt
Adèle LOC
Félicitations LOC
Braddock LOC
Helen LOC
Kristy LOC
Claudia LOC
Bonjour LOC
Keisha LOC
Keisha LOC
Mallory LOC
Mallory LOC
Keisha LOC
Keisha LOC
Matthew LOC
Helen LOC
Keisha LOC
Keisha LOC
Mallory LOC
cousine de Jessica LOC
Mallory LOC
Keisha LOC
– LOC
Keisha LOC
Mallory LOC
Keisha LOC
Mallory LOC
Mallory LOC
Keisha LOC
Keisha LOC
Mallory LOC
Matthew LOC
Mallory LOC
Keisha LOC
– Oui LOC
J'ai vu Matthew LOC
Adèle LOC
Keisha LOC
Matthew LOC
Keisha LOC
Mallory LOC
Kristy LOC
Carla LOC
Matthew LOC
Helen LOC
Celles LOC
Braddock LOC
Chantilly LOC
Kristy LOC
Matthew LOC
Matthew LOC
Matthew LOC
016_15_qu.txt
Surprise LOC
Élizabeth LOC
Bonjour LOC
Félicitations LOC
Félicitations LOC
Marjorie LOC
Jessie LOC
Kara LOC
Marjorie LOC
Marjorie LOC
Regardez LOC
Élizabeth LOC
Élizabeth LOC
Noëlle LOC
Élizabeth LOC
Skimming the results, it was interesting to see how much worse the location recognition performed than the person entity recognition. It feels like a minority of the results are legit places, and most of the results are people’s names.
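To put rough numbers on that impression, one option is to tally how often each string shows up across the result files. Here is a minimal sketch, assuming the `_ner_loc.txt` files hold one entity per line (which is how the loop above writes them) and can be read as UTF-8; `tally_entities` is a hypothetical helper name.

```python
import os
from collections import Counter

def tally_entities(directory, suffix="_ner_loc.txt"):
    """Count how often each entity string appears across all result files in `directory`."""
    counts = Counter()
    for name in sorted(os.listdir(directory)):
        # Only look at result files for the chosen entity type
        if name.endswith(suffix):
            # Assumption: the result files are UTF-8 encoded
            with open(os.path.join(directory, name), encoding="utf-8") as f:
                for line in f:
                    entity = line.strip()
                    # Skip blank lines
                    if entity:
                        counts[entity] += 1
    return counts
```

Calling `tally_entities(folder).most_common(20)` on whichever folder holds the result files would list the most frequent "places", which makes it easier to see how many of the top hits are really character names.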
9. French NER for orgs#
I was curious what I’d get by looking for entities flagged as organizations. I mean, this entire book series is about an organization: would that get flagged correctly? (Once again, I moved the _ner_loc.txt files into their own folder first.)
The verdict: “Club des Baby” (France-French translation) and “Club des baby” (Quebec translation) get marked as organizations; the “sitters” gets lost when “Baby-sitters” gets split at the hyphen. There are also various things that are most definitely not organizations that get tagged, like “Bonjour!” in Belgian ch. 10, or “PLIÉ” in Quebec ch. 3 (maybe spaCy thought it was an acronym and not a yelling dance teacher?).
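Since the hyphen trips up the entity boundaries, a plain pattern match can act as a sanity check on how often the club is actually named in a chapter. This is just a sketch using Python's standard `re` module rather than spaCy; the pattern and the name `count_club_mentions` are assumptions for illustration, and the pattern only covers the spellings “baby-sitters” and “babysitters”, in any capitalization.

```python
import re

# Matches "Club des Baby-Sitters", "club des baby-sitters", "club des babysitters", etc.
CLUB_PATTERN = re.compile(r"club des baby-?sitters", re.IGNORECASE)

def count_club_mentions(text):
    """Return how many times the full club name appears in `text`."""
    return len(CLUB_PATTERN.findall(text))
```

Comparing `count_club_mentions(chaptertext)` with the number of ORG hits for a chapter would show how many mentions the entity recognizer truncated or missed.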
#Sort all the files in the directory you specified above, alphabetically.
#For each of those files...
for filename in sorted(os.listdir(filedirectory)):
    #If the filename ends with .txt (i.e. if it's actually a text file)
    if filename.endswith('.txt'):
        #Write out below the name of the file
        print(filename)
        #The file name of the output file adds _ner_org to the end of the file name of the input file
        outfilename = filename.replace('.txt', '_ner_org.txt')
        #Open the input filename
        with open(filename, 'r') as f:
            #Create and open the output filename
            with open(outfilename, 'w') as out:
                #Read the contents of the input file
                chaptertext = f.read()
                #Do French NLP on the contents of the input file
                chapterner = frnlp(chaptertext)
                #For each recognized entity
                for ent in chapterner.ents:
                    #If that entity is labeled as an organization
                    if ent.label_ == 'ORG':
                        #Print the entity, and the label (which should be ORG)
                        print(ent.text, ent.label_)
                        #Write the entity to the output file
                        out.write(ent.text)
                        #Write a newline character to the output file
                        out.write('\n')
016_01_bg.txt
Neuville ORG
016_01_frmod.txt
Club des Baby-Sitters ORG
– Peut ORG
– Allez ORG
Club des Baby-Sitters ORG
016_01_qu.txt
Club des baby-sitters ORG
Club des baby-sitters ORG
016_02_bg.txt
Mélanie ORG
016_02_frmod.txt
– Salut ORG
Club des Baby-Sitters ORG
Lucy MacDouglas ORG
Club des Baby-Sitters ORG
Tigrou ORG
Club des Baby-Sitters ! ORG
– Oh ORG
– Eh ORG
016_02_qu.txt
Bonjour! ORG
Club des baby ORG
Club des baby ORG
Club des baby-sitters ORG
Marjorie ORG
Club des baby ORG
016_03_bg.txt
PLIE ORG
016_03_frmod.txt
– Ouais ORG
016_03_qu.txt
MOI! ORG
016_04_bg.txt
016_04_frmod.txt
– Formidable ! ORG
– Est ORG
– Existe ORG
016_04_qu.txt
016_05_bg.txt
Brinbeuf! ORG
Mélanie ORG
Mélanie ORG
Salut! ORG
INFECT ORG
016_05_frmod.txt
Prezzioso ORG
Prezzioso ORG
– Maintenant ORG
– Salut ! ORG
– Est ORG
016_05_qu.txt
ENTENDS ORG
TU ORG
016_06_bg.txt
Bonjour! ORG
LAURENT ORG
Anaïs ORG
016_06_frmod.txt
Bonjour ! ORG
– Est ORG
016_06_qu.txt
Observer ORG
016_07_bg.txt
Majorie ORG
Anaïs ORG
Anaïs ORG
Anaïs ORG
016_07_frmod.txt
Jordan ORG
Jordan ORG
– Eh ORG
– Eh ORG
Jordan ORG
– Mallory ORG
– Peut ORG
Jordan ORG
Jordan ORG
016_07_qu.txt
016_08_bg.txt
PER ORG
FEC ORG
016_08_frmod.txt
– Ouais ORG
– Oh ORG
– Eh ORG
– Est ORG
– Eh ORG
– Comment ORG
– Oh ORG
– Est ORG
016_08_qu.txt
016_09_bg.txt
Edith ORG
016_09_frmod.txt
Edith ORG
– Salut ORG
– Salut ! ORG
– Est ORG
Club des Baby-Sitters ORG
– Oh ORG
016_09_qu.txt
Assez de fantômes et de sorcières pour la soirée ORG
016_10_bg.txt
Bonjour! ORG
016_10_frmod.txt
Nicky ORG
Jordan ORG
– Eh ORG
– Ouais ! ORG
016_10_qu.txt
Hélène sourit ORG
016_11_bg.txt
Mélanie ORG
016_11_frmod.txt
– Nous ORG
– Ouais ORG
Club des Baby-Sitters ORG
016_11_qu.txt
Perplexe ORG
Club des baby ORG
016_12_bg.txt
IC ORG
016_12_frmod.txt
– Eh ORG
016_12_qu.txt
J-E ORG
016_13_bg.txt
016_13_frmod.txt
Lucy MacDouglas ORG
– Ouais ! ORG
Club des Baby ORG
– Dis ORG
– Oh ORG
Lucy MacDouglas ORG
– Salut ORG
– Salut ORG
– Est ORG
016_13_qu.txt
Mercredi
Salut ORG
Club des baby-sitters ORG
Miss Nouville ORG
016_14_bg.txt
016_14_frmod.txt
– Allez ORG
016_14_qu.txt
016_15_bg.txt
Oh! ORG
016_15_frmod.txt
– Oh ORG
Keisha ORG
– Chérie ORG
Charley ORG
– Est ORG
016_15_qu.txt
Fleur bleue ORG
10. French NER for misc#
The French entity model also has a “MISC” type, so for the sake of completeness, I couldn’t not try it. And the results are as advertised. Lots of names. Lots of “Ça”. There’s a “Tu es atroce” (You’re excruciating) from Ch. 5 of the France French version. “Le Langage Secret” in Ch. 6 of the Belgian translation gets flagged, and “P’tit” makes an appearance more than once. Only in the France French version do “Noirs” (Black people) and “Noire de mon école” (Black person in my school) get flagged.
(July 19, 2021 note: this is clearly another area where there’s been a lot of improvement between spaCy 2 and spaCy 3. There’s still some weird stuff, like “Oh!” and “Salut”, but overall a lot fewer results and a lot less garbage.)
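Each of these steps reruns the same model over every chapter just to filter on a different label, so the four entity types could also be collected in one pass. Below is a minimal sketch of the grouping part only, assuming it is handed something like spaCy's `doc.ents`, where each entity has `.text` and `.label_` attributes; `group_entities_by_label` is a hypothetical helper name.

```python
from collections import defaultdict

def group_entities_by_label(ents, labels=("PER", "LOC", "ORG", "MISC")):
    """Group entity texts by label, keeping document order within each label.

    `ents` can be any iterable of objects with `.text` and `.label_` attributes.
    """
    grouped = defaultdict(list)
    for ent in ents:
        # Ignore any label we did not ask for
        if ent.label_ in labels:
            grouped[ent.label_].append(ent.text)
    return dict(grouped)
```

For a chapter, `group_entities_by_label(frnlp(chaptertext).ents)` would give a dictionary with one list per label, which could then be written out to the separate `_ner_per.txt`, `_ner_loc.txt`, `_ner_org.txt`, and `_ner_misc.txt` files without running the model four times.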
#Sort all the files in the directory you specified above, alphabetically.
#For each of those files...
for filename in sorted(os.listdir(filedirectory)):
    #If the filename ends with .txt (i.e. if it's actually a text file)
    if filename.endswith('.txt'):
        #Write out below the name of the file
        print(filename)
        #The file name of the output file adds _ner_misc to the end of the file name of the input file
        outfilename = filename.replace('.txt', '_ner_misc.txt')
        #Open the input filename
        with open(filename, 'r') as f:
            #Create and open the output filename
            with open(outfilename, 'w') as out:
                #Read the contents of the input file
                chaptertext = f.read()
                #Do French NLP on the contents of the input file
                chapterner = frnlp(chaptertext)
                #For each recognized entity
                for ent in chapterner.ents:
                    #If that entity is labeled as miscellaneous
                    if ent.label_ == 'MISC':
                        #Print the entity, and the label (which should be MISC)
                        print(ent.text, ent.label_)
                        #Write the entity to the output file
                        out.write(ent.text)
                        #Write a newline character to the output file
                        out.write('\n')
016_01_bg.txt
J'ai une barre MISC
J'ai gagnées MISC
Je sais que MISC
Ouagadougou MISC
Mes grands MISC
J'ai rencontré MISC
Ouagadougou MISC
Bonjour MISC
J'aime manger MISC
C'était de ne pas faire de bêtises MISC
J'aimais tout de même donner le meilleur de moi lors des auditions MISC
C'est un ballet MISC
Swanilda MISC
Swanilda MISC
Swanilda MISC
Swanilda MISC
Merci MISC
Dillon MISC
J'ai une réunion au club des baby-sitters MISC
016_01_frmod.txt
J'ai une barre MISC
C'est une des choses MISC
Rebecca MISC
Je sais que cela paraît bizarre de l'annoncer comme ça MISC
Naturellement MISC
Rebecca MISC
Mes grands MISC
J'ai rencontré MISC
Rebecca MISC
J'ai donc déposé MISC
Aah MISC
Rebecca MISC
C'était de ne pas faire de bêtises MISC
Rebecca MISC
C'est un ballet célèbre MISC
Ça MISC
Rebecca MISC
J'ai continué MISC
Swanilda MISC
Swanilda MISC
Swanilda MISC
Swanilda MISC
Rebecca MISC
Merci MISC
Rebecca MISC
Rebecca MISC
J'ai une audition cet après-midi MISC
Rebecca MISC
J'ai une réunion MISC
Rebecca MISC
016_01_qu.txt
J'ai presque appris à parler l'espagnol MISC
Laissez MISC
Rebecca MISC
Je sais que ça semble drôle de mentionner MISC
Mes grands MISC
J'ai connu MISC
C'est le souper MISC
Jessie MISC
J'aie été acceptée dans la classe avancée MISC
Dr Coppélius MISC
Swanilda MISC
Swanilda MISC
Dr Coppélius MISC
Swanilda MISC
Papa et maman MISC
C'est la course pour ne pas être en retard MISC
J'ai une réunion MISC
016_02_bg.txt
Salut! MISC
Excusez MISC
Je sais pourquoi C' MISC
Nicolas MISC
Julie MISC
Nicolas MISC
Julie MISC
Julie MISC
Julie MISC
Julie MISC
Julie MISC
Julie MISC
Julie MISC
Mélanie MISC
Merci MISC
Julie MISC
Oh MISC
Merci MISC
016_02_frmod.txt
Excusez MISC
J'ai compris MISC
C'est Kristy MISC
Je sais pourquoi Kristy MISC
Je sais MISC
Charlie MISC
Andrew MISC
Jim est millionnaire ! MISC
C'est des appareils dentaires MISC
Toujours MISC
Logan MISC
Mallory et moi MISC
Merci MISC
– Hé MISC
Jessica MISC
– Merci ! MISC
016_02_qu.txt
Ça MISC
Christine MISC
J'ai un certain talent MISC
Je m'éloigne MISC
Laissez MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Une fois par semaine MISC
Je sais pourquoi MISC
Christine MISC
Je sais... MISC
Christine MISC
André MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Agenda MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Laissez MISC
Merci MISC
Christine MISC
Hélène MISC
Matthieu MISC
Christine MISC
Popote Roulante MISC
Christine MISC
Jessie! MISC
016_03_bg.txt
Dillon MISC
Dillon MISC
J'ai eue la première fois MISC
J'ai dansé MISC
Dillon MISC
Hélène MISC
Dillon MISC
Danse des Heures MISC
Hélène MISC
C'était humiliant! MISC
J'étais si occupée à m'apitoyer sur moi- MISC
Dillon MISC
Swanilda MISC
Swanilda MISC
Dillon MISC
MOI! MISC
Swanilda MISC
Dillon MISC
Swanilda MISC
costume de Swanilda MISC
J'ôtai mes chaussons et mes jambières MISC
Merci MISC
Oh, arrête! MISC
J'étais en train d'enfiler mon pull MISC
Dillon MISC
Dillon MISC
016_03_frmod.txt
J'ai eue la première fois MISC
J'ai dansé MISC
Hilary MISC
Katie MISC
Hilary MISC
Katie MISC
Hilary MISC
Katie MISC
Mon cœur battait la chamade MISC
Hilary MISC
Katie MISC
Valse des heures MISC
Katie MISC
C'était humiliant ! MISC
Hilary MISC
Katie MISC
XIXe MISC
J'étais si occupée à m'apitoyer sur mon MISC
J'ai failli MISC
Swanilda MISC
– Swanilda MISC
Mlle Jessica Romsey MISC
Moi ! MISC
Swanilda MISC
Jessica MISC
Swanilda MISC
costume de Swanilda MISC
J'ai cherché mon jean MISC
Jessica MISC
– Merci MISC
Jessica MISC
Le ton était ironique MISC
Lisa MISC
Oh MISC
C'est merveilleux MISC
Katie MISC
J'ai décidé d'adopter MISC
Katie MISC
Katie MISC
Hilary MISC
J'ai entendu Hilary MISC
Katie MISC
Katie MISC
Hilary MISC
Coppélia MISC
J'ai quitté le cours MISC
016_03_qu.txt
PLIÉ MISC
C'était un bâton de golf MISC
Coppélia MISC
J'ai eu mes premières pointes MISC
J'ai senti MISC
Et un et deux et trois MISC
Ça MISC
Danse des heures MISC
Elizabeth MISC
Je suis tellement absorbée par ma déception MISC
personnage de Swanilda MISC
C'est moi! MISC
Swanilda MISC
Jessica MISC
Jessica MISC
J'ai l'impression de marcher MISC
Jessie MISC
Jessie MISC
016_04_bg.txt
Dillon MISC
Dillon MISC
Merci MISC
Bonjour MISC
lettre C MISC
J'ai compris! MISC
Ça MISC
Ça MISC
J'aimais la langue des signes MISC
016_04_frmod.txt
C'est impossible MISC
– Merci MISC
Hilary MISC
Katie MISC
J'ai pensé MISC
La langue des signes MISC
Une fillette MISC
J'ai été accueillie MISC
– Parce MISC
Rebecca MISC
lettre C MISC
J'ai compris ! MISC
Ça MISC
– Ah MISC
Jessica MISC
– Ça MISC
C'est le M de Matthew MISC
J'ai feuilleté le dictionnaire dans mon lit MISC
016_04_qu.txt
Les Petits Chaussons est la meilleure de toute la région? me demande-t-elle MISC
Penses MISC
Matthieu MISC
J'ai une réunion du club MISC
Oh! MISC
Hélène MISC
Jessie MISC
Hélène MISC
Hélène MISC
Jessica MISC
Hélène MISC
Hélène MISC
Hélène MISC
Hélène MISC
Hélène MISC
Hélène MISC
Hélène MISC
C. Ça MISC
Clé MISC
Ça MISC
Matthieu MISC
lettre M MISC
Oh! MISC
Jessie MISC
Hélène MISC
lettre H MISC
comète de Halley MISC
Hélène MISC
Hélène MISC
Hélène MISC
J'ai encore MISC
016_05_bg.txt
Mélanie MISC
Mélanie MISC
Ça MISC
Mélanie MISC
Mélanie MISC
C'était nouveau pour moi et eux MISC
Partons MISC
Mélanie MISC
C'est faux MISC
Ah! MISC
016_05_frmod.txt
Mercredi
Bon MISC
Jenny MISC
Jessica et les petits MISC
Jenny MISC
Jenny MISC
Jenny MISC
Jenny MISC
– Bon MISC
– Bon MISC
Jenny MISC
– Bon MISC
– Je MISC
Jenny MISC
C'était une sage petite promenade MISC
C'était nouveau pour moi et eux MISC
– Partons MISC
Atroce ! MISC
C'est faux MISC
Ah ! MISC
016_05_qu.txt
Jessie et les Biron MISC
Ça MISC
Hélène MISC
Hélène MISC
Matthieu MISC
Hélène et moi MISC
Matthieu MISC
Hélène MISC
Hélène et Matthieu MISC
Hélène MISC
016_06_bg.txt
J'ai aussi pensé MISC
V MISC
Merci MISC
C'était regarder la télé sans le son MISC
Godefroid MISC
Elodie MISC
Le Langage Secret MISC
Tes frères MISC
016_06_frmod.txt
J'ai remarqué MISC
– Sois MISC
J'ai aussi pensé MISC
V MISC
C'est un J MISC
– Bon MISC
Jessica MISC
– Merci MISC
Regarder MISC
C'était comme regarder MISC
J'ai compris MISC
Rebecca MISC
Rebecca MISC
Pike MISC
– Y MISC
J'ai épelé son nom MISC
Liz MISC
Pike MISC
Signes MISC
Le Langage secret MISC
– Waouh MISC
Vanessa MISC
Pike MISC
J'ai seulement compris MISC
Patriots MISC
Super MISC
– Qu' MISC
Pike MISC
016_06_qu.txt
Hélène MISC
Et MISC
Hélène MISC
Oh! MISC
signe J MISC
Jessie MISC
V MISC
Un J qui danse! MISC
J'ai mémorisés MISC
Matthieu MISC
Hélène MISC
Jessie MISC
Hélène et moi MISC
J'aime les langues? MISC
Super! MISC
Hélène MISC
Je marche vers la maison MISC
Hélène MISC
Hélène MISC
Hélène MISC
Matthieu MISC
Hélène MISC
Hélène et Matthieu MISC
Hélène MISC
Nicolas MISC
Génial! MISC
Vanessa MISC
Hélène MISC
Hélène MISC
Hélène MISC
Hélène MISC
Hélène et Matthieu MISC
Hélène MISC
016_07_bg.txt
Pas moi! MISC
C'était une des babysitters MISC
Vanessa MISC
Juliette: 5 ans MISC
C'était la baby-sitter MISC
Levêque sur les chaises MISC
J'ai perdu ma mie de pain MISC
Trop MISC
Vanessa MISC
Allons MISC
Swanilda MISC
Je brûlai d'impatience MISC
016_07_frmod.txt
Oh non ! MISC
Pike MISC
Pike MISC
Pike MISC
– Bon MISC
Pike MISC
Pike MISC
Pike MISC
Pike MISC
Trop MISC
Mallory ! MISC
– Moui MISC
Ça MISC
Pike MISC
– Allons MISC
Carla MISC
J'ai eu un coup au cœur MISC
Swanilda MISC
Pike MISC
016_07_qu.txt
Vendredi
MISC
Jessie MISC
Jessie MISC
Journal de bord ici MISC
Bernard MISC
Antoine MISC
Joël MISC
Vanessa MISC
Nicolas MISC
Ça MISC
C'est avant la comptine du ver de terre MISC
J'ai perdu ma boulette de viande MISC
Joël MISC
Pouah MISC
Bernard MISC
Joël MISC
Nicolas MISC
Nicolas MISC
Nicolas MISC
Nicolas MISC
Bernard MISC
Oh non! MISC
Nicolas MISC
Ça MISC
Nicolas MISC
Nicolas MISC
J'ai dit MISC
Nicolas MISC
Ça MISC
Bernard MISC
Joël MISC
Jessie MISC
Hélène et Matthieu MISC
Swanilda MISC
Ça MISC
Matthieu MISC
Et ça veut surtout dire MISC
016_08_bg.txt
Swanilda MISC
Dillon MISC
Swanilda MISC
Dillon MISC
C'était Catherine accompagnée MISC
Hélène MISC
Etait MISC
Nos regards MISC
Bonjour Adeline MISC
Je regardai ma montre MISC
Dillon MISC
Je souris MISC
C'était un martien MISC
Catherine MISC
Oh MISC
Je méditai MISC
Je pensai MISC
Oh, allez MISC
C'était Adeline MISC
J'ai fé une garde MISC
Nicolas MISC
016_08_frmod.txt
Swanilda MISC
Katie MISC
Swanilda MISC
Katie MISC
Hilary MISC
– Bon travail MISC
Katie MISC
J'ai pris mon temps MISC
J'ai remarqué MISC
Katie MISC
J'ai levé la tête MISC
Katie MISC
Katie MISC
Rebecca MISC
Nos regards MISC
Katie MISC
Katie MISC
J'ai compris MISC
J'ai souri MISC
Katie MISC
À ma plus grande surprise MISC
Katie MISC
C'était un insecte nuisible MISC
Katie MISC
Rebecca MISC
Katie MISC
J'ai remarqué MISC
Katie MISC
Katie MISC
Katie MISC
Katie MISC
– Qu' MISC
Katie MISC
Katie MISC
– Qu' MISC
Katie MISC
Katie MISC
016_08_qu.txt
Mes os me MISC
Mes muscles me font mal MISC
Être Swanilda MISC
Élizabeth MISC
Merci MISC
J'ai travaillé MISC
Swanilda MISC
Élizabeth MISC
Élizabeth MISC
J'ai encore besoin de nouvelles pointes MISC
Je suis certaine MISC
Matthieu MISC
Allons MISC
Super MISC
Hélène MISC
Élizabeth MISC
Merci! MISC
Ça MISC
Mes seules meilleures amies sont ma cousine MISC
016_09_bg.txt
J'ai fé une garde MISC
Nicolas MISC
Coraline MISC
Coraline MISC
C'était sûr et je m'en réjouissais MISC
Julie MISC
Julie MISC
Revenons MISC
Julie MISC
Rensonnet MISC
Vieille Sorcière MISC
Julie MISC
Nicolas MISC
Julie MISC
Nicolas MISC
Valérie! MISC
Julie MISC
Julie MISC
Julie MISC
Bonjour Julie MISC
Merci MISC
Super MISC
Julie MISC
Julie MISC
Julie MISC
Bonjour MISC
Julie MISC
Julie MISC
Ça MISC
Julie MISC
Julie MISC
Julie MISC
Julie MISC
Julie MISC
Julie MISC
C'est possible MISC
Julie MISC
Julie MISC
Julie MISC
J'étais heureuse MISC
Julie MISC
C'était Coralie MISC
Je souris MISC
Julie MISC
Julie MISC
Merci Julie MISC
Julie MISC
Oh MISC
C'était juste pour rire MISC
Julie MISC
J'ai peur MISC
Julie MISC
HOU MISC
Oh MISC
Julie MISC
Bonne nuit MISC
Julie MISC
Doudou MISC
Flocon MISC
Julie MISC
Julie MISC
Doudou MISC
Flocon MISC
Bonne nuit MISC
Julie MISC
Julie MISC
Julie MISC
016_09_frmod.txt
J'ai fait une garde MISC
Charlie MISC
J'ai décidé MISC
Andrew MISC
Revenons MISC
Andrew MISC
Andrew MISC
Karen MISC
– Merci MISC
Andrew MISC
– Bon MISC
Super ! MISC
Merci ! MISC
Ça MISC
Andrew MISC
– Bon MISC
Andrew MISC
Andrew MISC
– Bof ! MISC
Andrew MISC
– Si MISC
C'est possible MISC
Rebecca MISC
J'ai dit :
– Attends MISC
J'étais heureuse MISC
– Qu' MISC
– Oui ! MISC
Andrew MISC
– Merci MISC
Claudia ! MISC
Andrew MISC
Andrew MISC
Andrew MISC
C'était juste pour rire MISC
J'ai peur MISC
Andrew MISC
Bonne nuit MISC
Andrew MISC
Tiens MISC
Jessica MISC
Andrew MISC
Andrew MISC
016_09_qu.txt
Jessie MISC
André MISC
David MISC
Christine MISC
J'ai décidé de leur enseigner MISC
Christine MISC
André MISC
Christine MISC
Christine MISC
André MISC
Christine MISC
André MISC
André MISC
David MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
André MISC
Christine MISC
André MISC
David MISC
Merci! MISC
Christine MISC
André MISC
Les Marchand MISC
André MISC
David MISC
J'ai mis le pâté fantôme MISC
André MISC
André MISC
Ça MISC
Ça MISC
André MISC
David MISC
André MISC
Ça MISC
C'est «chat» MISC
Christine MISC
André MISC
André MISC
André MISC
André MISC
Salut MISC
André MISC
Bou! MISC
Bonne nuit MISC
André MISC
La sorcière d'à côté MISC
Jessie MISC
André MISC
Je t'aime» MISC
016_10_bg.txt
C'était mon seul après-midi de libre MISC
Je souris MISC
J'allais MISC
C'est formidable MISC
Oh MISC
J'ai envie de danser MISC
Je souris MISC
J'aimerais... MISC
J'aimerais MISC
Je fus un peu surprise de ce MISC
C'était sensé MISC
Noël MISC
‘Dada Bo MISC
C'est dommage! MISC
Dillon MISC
Dillon MISC
C'était simplement très excitant d'être au théâtre MISC
Dillon MISC
016_10_frmod.txt
C'était mon seul après-midi de libre MISC
Pike MISC
Un jour MISC
Nicky Pike MISC
– Qu' MISC
– Parce MISC
J'ai envie de faire de la danse MISC
J'aimerais MISC
Jessica MISC
C'était normal MISC
– Tu sais MISC
Rebecca MISC
Rebecca MISC
P'tit Bout MISC
Rebecca MISC
Rebecca MISC
Noël MISC
Rebecca MISC
Rebecca MISC
Rebecca MISC
P'tit Bout MISC
C'est dommage ! me suis-je exclamée MISC
Et un… et deux… et trois… et quatre » MISC
J'ai commencé à y réfléchir MISC
016_10_qu.txt
Hélène et moi prenons MISC
Matthieu MISC
Hélène MISC
Vanessa MISC
Nicolas MISC
Vanessa MISC
Matthieu MISC
Nicolas MISC
Été! MISC
Nicolas MISC
Hélène MISC
Nicolas MISC
Hélène MISC
Hélène et moi MISC
Nicolas MISC
J'allais dire MISC
Nicolas MISC
Hélène et moi MISC
C'est heureux MISC
Hélène MISC
Hélène MISC
soeur de MISC
Hélène MISC
C'est ça MISC
Hélène MISC
Hélène MISC
fond Matthieu MISC
Matthieu MISC
Nicolas MISC
Antoine MISC
Nicolas MISC
Antoine MISC
Hélène MISC
Jessie MISC
Je suis surprise des paroles MISC
Hélène MISC
J'ai la peau noire MISC
Hélène MISC
C'est une famille MISC
Noël MISC
Becca MISC
Ça MISC
Hélène MISC
Hélène MISC
J'ai lu récemment MISC
Hélène MISC
Coppélia' MISC
Hélène MISC
016_11_bg.txt
Dillon MISC
Julie MISC
Dillon MISC
Bonjour MISC
Excusez MISC
Julie MISC
Julie MISC
Julie MISC
Je souris MISC
Julie MISC
Nicolas MISC
Julie MISC
Je m'étonnai moi- MISC
Julie MISC
lettre J MISC
Mélanie MISC
Nicolas MISC
Julie MISC
C'est formidable MISC
J'ai invité MISC
016_11_frmod.txt
C'était en bonne voie MISC
– Bon MISC
Jessica ? MISC
J'ai essayé de parler MISC
J'ai raconté ma conversation MISC
J'ai posé MISC
lettre J MISC
Logan ! MISC
Logan MISC
– Logan MISC
Jessica MISC
P'tit Bout MISC
Logan MISC
Hum ! MISC
Oh MISC
C'est formidable MISC
016_11_qu.txt
J'ai les cheveux coiffés en un chignon MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Matthieu MISC
Hélène MISC
J'ai eue MISC
Hélène MISC
grande soeur de MISC
Christine MISC
Christine MISC
lettre J MISC
Christine MISC
Christine MISC
Christine MISC
Dring MISC
Christine MISC
Je suis sans doute MISC
Bonjour MISC
C'est merveilleux! MISC
Christine MISC
J'ai promis MISC
J'ai quelque MISC
Cappella MISC
016_12_bg.txt
J'étais d'accord MISC
‘costume'? MISC
classe de Mathieu MISC
Merci MISC
Ah bon. MISC
Vendredi prochain MISC
C'était le signal pour les enfants MISC
Merci MISC
016_12_frmod.txt
J'ai dû MISC
– Prête MISC
Jessica ? MISC
Jessica MISC
J'ai jeté MISC
– Jessica MISC
J'ai vu une réaction sur certains visages MISC
C'est bien MISC
– Oui ! MISC
– Vendredi prochain MISC
J'ai remarqué MISC
Jessica MISC
Merci MISC
016_12_qu.txt
Oh! MISC
Je me rends bien vite MISC
Stationnement de l'École MISC
Ça MISC
Jessie MISC
classe de Matthieu MISC
Jessie MISC
Merci MISC
Ça MISC
J'ai un rôle dans cette danse MISC
Matthieu MISC
Vendredi prochain MISC
Matthieu MISC
Matthieu MISC
Soudain MISC
Merci MISC
016_13_bg.txt
J'ai passé un bon moment MISC
J'ai compris MISC
Aga! MISC
Oh! MISC
Oh! MISC
Merci! MISC
Ah! MISC
J'ai déménagé MISC
Japonaise MISC
Oh, super! MISC
Sophie MISC
Bonjour Valérie MISC
Oh! MISC
016_13_frmod.txt
Mercredi
Salut MISC
Rebecca MISC
J'ai passé un bon moment MISC
Rebecca MISC
J'ai compris MISC
Rebecca MISC
Quand maman MISC
Rebecca MISC
Rebecca MISC
Rebecca MISC
– Viens MISC
Rebecca MISC
Rebecca MISC
Rebecca MISC
Rebecca MISC
P'tit Bout MISC
Rebecca MISC
Rebecca MISC
Merci ! MISC
Rebecca MISC
Rebecca MISC
Ça MISC
Rebecca MISC
J'ai déménagé MISC
Japonaise MISC
Rebecca MISC
Rebecca MISC
Rebecca MISC
Rebecca MISC
Rebecca MISC
Rebecca MISC
Rebecca MISC
Jessica MISC
Rebecca MISC
– Allons MISC
Jessica MISC
Jessica MISC
Rebecca MISC
Rebecca MISC
– Rebecca MISC
Jessica MISC
Oh ! MISC
Rebecca MISC
016_13_qu.txt
Jessie MISC
Jessie MISC
J'ai senti MISC
Qu' MISC
Christine MISC
C'était la première fois qu'elle y allait MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Becca MISC
Christine MISC
Christine MISC
Christine MISC
Merci! MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Matthieu MISC
Becca MISC
J'ai emménagé dans un quartier MISC
Japonaise MISC
Christine MISC
C'était même difficile pour elle de venir jouer MISC
Jessie MISC
Jessie MISC
Jessie MISC
Christine MISC
Jessie MISC
Christine MISC
Ça MISC
Christine MISC
Christine MISC
Christine MISC
016_14_bg.txt
Dieu MISC
Dillon MISC
Je fus flattée MISC
Julie MISC
C'était monsieur Brinbeuf MISC
Swanilda MISC
Excusez MISC
Dillon MISC
Oh, mille fois merci MISC
Dillon MISC
Merci MISC
Je suis Swanilda MISC
Dillon MISC
Dillon MISC
Swanilda MISC
Je souris et la remerciai MISC
Swanilda MISC
Oh MISC
C'est dommage MISC
acte III MISC
Swanilda MISC
Swanilda MISC
Swanilda MISC
Swanilda MISC
Je souris MISC
Swanilda MISC
Swanilda MISC
histoire de Swanilda MISC
016_14_frmod.txt
Seigneur ! MISC
Rebecca MISC
Rebecca MISC
Swanilda MISC
– Prête MISC
Une main MISC
– Prête MISC
– Jessica MISC
Je suis Swanilda MISC
Katie MISC
Swanilda MISC
Swanilda MISC
Katie MISC
Katie MISC
J'ai commencé MISC
Katie MISC
C'est dommage MISC
acte III MISC
Swanilda MISC
Swanilda MISC
Swanilda MISC
Swanilda MISC
Swanilda MISC
Swanilda MISC
histoire de Swanilda MISC
J'ai vu une silhouette monter sur la scène MISC
J'ai répondu en signant :
– Je t'en prie MISC
J'ai même vu une femme au premier rang MISC
Katie MISC
Katie MISC
Oh, non ! MISC
Katie MISC
Katie MISC
J'ai traduit MISC
016_14_qu.txt
Hélène MISC
Dr Coppélius MISC
Swanilda MISC
J'ai mis mon costume MISC
Hélène MISC
Hélène MISC
Hélène MISC
Hélène MISC
Hélène MISC
Merci MISC
Hélène MISC
Hélène MISC
Je suis Swanilda MISC
Swanilda MISC
Swanilda MISC
Élizabeth MISC
Matthieu MISC
J'ai fait moi MISC
Hélène MISC
acte III MISC
Swanilda MISC
Dr Coppélius MISC
Swanilda MISC
Dr Coppélius MISC
Franz et Swanilda MISC
Je souris MISC
Hélène MISC
Swanilda MISC
Swanilda MISC
Hélène et madame Biron MISC
Matthieu MISC
Élizabeth MISC
016_15_bg.txt
Julie MISC
J'ai été impressionnée! MISC
Je souris MISC
Merci MISC
Bonjour MISC
Excusez MISC
Hé! MISC
Chérie MISC
Ça MISC
Super! MISC
Mes amies MISC
J'ai encore MISC
Dillon MISC
Swanilda MISC
Swanilda MISC
J'ai répété le rôle de Swanilda MISC
Oh MISC
C'est ma mère MISC
Julie MISC
Taverne des Pêcheurs MISC
Julie MISC
Julie MISC
Julie MISC
016_15_frmod.txt
Katie MISC
Katie MISC
Katie MISC
Jessica MISC
Katie MISC
J'ai dit à MISC
Katie MISC
– Annonce MISC
J'ai eu la paix MISC
C'était maman MISC
– Jessica ! MISC
Jessica ! MISC
Rebecca MISC
Jessi ! MISC
– Jessica MISC
J'ai souri MISC
– Merci MISC
J'ai entendu une voix familière MISC
Jessica MISC
– As MISC
Katie MISC
J'ai remarqué que ma famille MISC
Katie MISC
Oh ! MISC
J'ai acquiescé MISC
Jessica et moi MISC
Oh MISC
Super ! MISC
Katie MISC
Katie MISC
J'ai l'impression d'avoir MISC
Katie MISC
Jessica MISC
Swanilda MISC
J'ai bien dû admettre MISC
Swanilda MISC
Katie MISC
J'ai répété le rôle de Swanilda MISC
Katie MISC
Katie MISC
C'est ma mère MISC
Katie MISC
– Jessica MISC
Rebecca MISC
– Moi MISC
– Mmm MISC
J'ai commandé une salade de fruits MISC
Jessica MISC
– Merci MISC
016_15_qu.txt
Élizabeth MISC
Jessie MISC
Matthieu MISC
Jessie MISC
J'ai tant de choses MISC
J'ai moins de difficulté à m'exprimer MISC
Jessie MISC
Jessie MISC
Merci MISC
Mes amies MISC
Je suis en train de mettre mon manteau MISC
Jessie MISC
Swanilda MISC
Swanilda MISC
Swanilda MISC
Jessie MISC
Christine MISC
Jessie MISC
Let’s do it all again for English
The steps below in the notebook basically repeat the process described above, but using the spaCy model for English instead of French. At first, I just copied over the code cells from the French section (replacing filedirectory with enfiledirectory in the first line so it looked in the right place for the English files), but kept getting some bizarre output: namely, it wasn’t able to find any entities labeled PER or LOC.
To figure out what was going on, I removed the line if ent.label_ == 'PER': and un-indented the code nested inside it, to avoid Python indentation errors, then commented out the out.write lines that follow by putting a # in front of them. I didn’t want to write any results; I just wanted to see what entities it found.
Lo and behold, there were lots of entities, with more entity labels than are available for French. There’s GPE (geopolitical entity, AKA location, but not any of the locations that are like “so-and-so’s room”), CARDINAL (number type), LANGUAGE (seems relevant for Jessi’s Secret Language), TIME (e.g. “the morning of the day”), DATE (things like “a few weeks”, or, strangely, “eight-year-old”), and PERSON (not PER). That explains the empty results: the English model labels people PERSON and places GPE, so a filter on PER or LOC never matches anything.
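Here is a minimal sketch of that kind of label check, assuming the entities have already been pulled out as (text, label) pairs. The function name label_counts is made up for illustration, and the sample pairs are hand-copied from the English output in this chapter so the sketch runs without loading a model.

```python
from collections import Counter

def label_counts(entities):
    """Count how many entities carry each label."""
    return Counter(label for _text, label in entities)

# With spaCy, the pairs would come from something like:
#   entities = [(ent.text, ent.label_) for ent in ennlp(chaptertext).ents]
# The hand-typed sample below stands in for that (an assumption for the demo).
sample = [
    ("Becca", "PERSON"),
    ("Stoneybrook", "GPE"),
    ("Mama", "ORG"),
    ("Jessi", "PERSON"),
]

counts = label_counts(sample)
# Print each label with its count, most frequent first
for label, count in counts.most_common():
    print(label, count)
```

A tally like this makes a label mismatch easy to spot: PER simply never shows up among the keys, while PERSON does.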
First, we have to put in the path to the English files, then change to that directory:
#Put in the path here, using the same conventions as described above in step 4
enfiledirectory = '/Users/qad/Documents/dsc/dscm2/en'
os.chdir(enfiledirectory)
11. English NER for people
#Sort all the files in the directory you specified above, alphabetically.
#For each of those files...
for filename in sorted(os.listdir(enfiledirectory)):
    #If the filename ends with .txt (i.e. if it's actually a text file)
    if filename.endswith('.txt'):
        #Write out below the name of the file
        print(filename)
        #The file name of the output file adds _ner_per to the end of the file name of the input file
        outfilename = filename.replace('.txt', '_ner_per.txt')
        #Open the input filename
        with open(filename, 'r') as f:
            #Create and open the output filename
            with open(outfilename, 'w') as out:
                #Read the contents of the input file
                chaptertext = f.read()
                #Do English NLP on the contents of the input file
                chapterner = ennlp(chaptertext)
                #For each recognized entity
                for ent in chapterner.ents:
                    #If that entity is labeled as a person
                    if ent.label_ == 'PERSON':
                        #Print the entity, and the label (which should be PERSON)
                        print(ent.text, ent.label_)
                        #Write the entity to the output file
                        out.write(ent.text)
                        #Write a newline character to the output file
                        out.write('\n')
016_01_en.txt
Becca PERSON
Squirt PERSON
John Philip Ramsey PERSON
Jessi Ramsey PERSON
Jessica Davis Ramsey PERSON
Becca PERSON
Barre PERSON
Becca PERSON
Jessi PERSON
Jessi PERSON
Becca PERSON
Coppelius PERSON
Coppelius PERSON
Becca PERSON
Noelle PERSON
016_02_en.txt
David Michael PERSON
Thomas PERSON
Mary Anne Spier PERSON
Claudia Kishi PERSON
Stacey McGill PERSON
Stacey PERSON
Mal PERSON
Dawn Schafer PERSON
Mary Anne PERSON
Mal PERSON
Sam PERSON
Charlie PERSON
David Michael PERSON
Watson Brewer PERSON
Karen PERSON
Andrew PERSON
Mary Anne Spier PERSON
Watson PERSON
Brewer PERSON
Charlie PERSON
Mary Anne PERSON
Claudia Kishi PERSON
Mary Anne PERSON
Mal PERSON
Janine PERSON
Mimi PERSON
Mary Anne Spier PERSON
Dawn Schafer's PERSON
Mary Anne PERSON
Mary Anne PERSON
Mary Anne PERSON
Mary Anne PERSON
Mary Anne PERSON
Spier PERSON
Stacey McGill PERSON
Stacey PERSON
Dawn Schafer PERSON
Mary Anne PERSON
Jeff PERSON
Jeff PERSON
Schafer PERSON
Dawn PERSON
Dawn PERSON
Mal PERSON
Mal PERSON
Logan Bruno PERSON
Mary Anne's PERSON
Shannon Kilbourne PERSON
Dawn PERSON
Mal PERSON
Mary Anne PERSON
Braddock PERSON
Braddocks PERSON
Ameslan PERSON
Ameslan PERSON
Braddock PERSON
Mary Anne PERSON
Braddock PERSON
Mal PERSON
Jessi PERSON
016_03_en.txt
PLIÉ PERSON
Mademoiselle Jones PERSON
Mme Noelle PERSON
Becca PERSON
Mme Noelle PERSON
Mme Noelle PERSON
Katie Beth PERSON
Katie Beth PERSON
Mademoiselle Romsey PERSON
Mademoiselle Romsey PERSON
Ramsey PERSON
Mme Noelle's PERSON
Katie Beth PERSON
Katie Beth PERSON
Mme Noelle PERSON
Mme Noelle PERSON
Hilary PERSON
Katie Beth PERSON
Madame PERSON
Mary Bramstedt PERSON
Lisa Jones PERSON
Katie Beth PERSON
Coppélia PERSON
Hilary PERSON
Katie Beth PERSON
Coppélia PERSON
Mme Noelle PERSON
Mademoiselle Jessica Romsey PERSON
Jessica Romsey PERSON
Jessica Ramsey PERSON
Mme Noelle PERSON
Jessica PERSON
Closs PERSON
Jessi PERSON
Mary Bramstedt PERSON
Lisa Jones PERSON
Jessi PERSON
Mary PERSON
Lisa PERSON
Katie Beth PERSON
Coppélia PERSON
Katie Beth PERSON
Katie Beth PERSON
Katie Beth PERSON
Katie PERSON
Madame PERSON
Katie Beth PERSON
016_04_en.txt
Mama PERSON
Noelle PERSON
Katie Beth PERSON
Matthew Braddock PERSON
Ameslan PERSON
Ameslan PERSON
Ameslan PERSON
Matt PERSON
Braddocks PERSON
Braddocks PERSON
Jessica PERSON
Jessi PERSON
Haley PERSON
Geiger PERSON
Braddock PERSON
Reeboks PERSON
Haley PERSON
Jessica PERSON
Jessi PERSON
Mommy PERSON
Braddock PERSON
Braddock PERSON
Matt PERSON
Matt PERSON
Haley PERSON
Braddock PERSON
Mommy PERSON
Haley PERSON
Haley PERSON
Braddock PERSON
Matt PERSON
Jessi PERSON
Haley PERSON
Braddock PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Braddock PERSON
Matt PERSON
Matt PERSON
Braddocks PERSON
Matt PERSON
Braddock PERSON
Matt PERSON
Braddock PERSON
Haley PERSON
Braddock PERSON
Braddock PERSON
Matt PERSON
Braddocks PERSON
Braddock PERSON
Matt PERSON
Haley PERSON
Braddock PERSON
Braddock PERSON
Matt PERSON
Braddock PERSON
Matt PERSON
Braddock PERSON
Matt PERSON
Matt PERSON
Jessi PERSON
Matt PERSON
Braddock PERSON
Matt PERSON
Matt PERSON
Haley PERSON
Braddock PERSON
Matt PERSON
Braddock PERSON
Haley PERSON
Matt PERSON
Braddock PERSON
Braddpck PERSON
Matt PERSON
Braddocks PERSON
Matt PERSON
Matt PERSON
Haley PERSON
Haley PERSON
Matt PERSON
Haley PERSON
Matt PERSON
Haley PERSON
Braddock PERSON
Braddock PERSON
016_05_en.txt
Jenny PERSON
Braddocks PERSON
Mary Anne Spier PERSON
Jenny Prezzioso PERSON
Mary Anne's PERSON
Mary Anne's afternoon PERSON
Prezziosos PERSON
P. PERSON
Mary Anne PERSON
Mary Anne PERSON
Prezziosos PERSON
Mary Janes PERSON
Prezziosos PERSON
Prezzioso PERSON
Mary Anne PERSON
Mary Anne PERSON
Jen PERSON
Mary Anne PERSON
Jenny PERSON
Mary Anne PERSON
Mary Anne PERSON
Candy Land PERSON
Squirrel Nutkin PERSON
Mary Anne PERSON
Mary Anne PERSON
Mary Anne PERSON
Mary Anne PERSON
Mary Anne PERSON
Mary Anne PERSON
Mary Anne PERSON
Jenny PERSON
Windbreaker PERSON
Mary Anne PERSON
Mary Anne PERSON
Matt PERSON
Haley PERSON
Braddocks PERSON
Braddocks PERSON
Braddock PERSON
Matt PERSON
Haley PERSON
Mary Anne PERSON
Braddocks PERSON
Mary Anne PERSON
Braddocks PERSON
Braddocks PERSON
Jenny PERSON
Mary Anne PERSON
Haley PERSON
Matt PERSON
Jenny PERSON
Haley PERSON
Matt PERSON
Jenny PERSON
Matt PERSON
Matt PERSON
Haley PERSON
Matt PERSON
Jenny PERSON
Haley PERSON
Matt PERSON
Mary Anne PERSON
Haley PERSON
Mary Anne PERSON
Jessi PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Braddocks PERSON
Haley PERSON
Matt PERSON
Braddocks PERSON
Haley PERSON
Matt PERSON
016_06_en.txt
Haley PERSON
Matt PERSON
Matt PERSON
Haley PERSON
Matt PERSON
Matt PERSON
Braddock PERSON
Braddocks PERSON
Braddock PERSON
Matt PERSON
Braddock PERSON
Braddock PERSON
Braddock PERSON
Matt PERSON
Haley PERSON
Braddock PERSON
Matt PERSON
Matt PERSON
Jessi PERSON
Braddock PERSON
Haley PERSON
Matt PERSON
Braddock PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Haley PERSON
Braddock PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Haley PERSON
Braddocks PERSON
Mary Anne PERSON
Jenny Prezzioso PERSON
Matt PERSON
Haley PERSON
Becca PERSON
Charlotte Johanssen PERSON
Becca PERSON
Matt PERSON
Haley PERSON
Pikes PERSON
Matt PERSON
Matt PERSON
Vanessa PERSON
Haley PERSON
Matt PERSON
Matt PERSON
Haley PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Barretts PERSON
Haley PERSON
Matt PERSON
Pikes PERSON
Pike PERSON
Barretts PERSON
Haley PERSON
Matt PERSON
Margo PERSON
Nicky PERSON
Vanessa PERSON
Haley PERSON
Claire PERSON
Matt PERSON
Claire PERSON
Haley PERSON
Claire PERSON
Matt PERSON
Matt PERSON
Haley PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Pikes PERSON
Braddock PERSON
Haley PERSON
Matt PERSON
Matt PERSON
Haley PERSON
016_07_en.txt
Jessi PERSON
Jessi PERSON
Mal PERSON
Mal PERSON
Dawn PERSON
Pike PERSON
Claire PERSON
Pike PERSON
Pike PERSON
Pike PERSON
Pike PERSON
Vanessa PERSON
Claire PERSON
Pike PERSON
Claire PERSON
Byron PERSON
Vanessa Pike PERSON
Dawn PERSON
Dawn PERSON
Vanessa PERSON
Byron PERSON
Claire PERSON
Dawn PERSON
Mal PERSON
Vanessa PERSON
Jordan PERSON
Mal PERSON
Jessi PERSON
Haley PERSON
Matt PERSON
Dawn PERSON
Matt PERSON
Matt PERSON
Haley PERSON
Haley PERSON
016_08_en.txt
Mademoiselle Parsons PERSON
Katie Beth PERSON
Mademoiselle Bramstedt PERSON
Mary PERSON
Mademoiselle Romsey PERSON
Katie Beth PERSON
Katie Beth PERSON
Hilary PERSON
Katie Beth PERSON
Katie Beth PERSON
Katie Beth PERSON
Haley PERSON
Katie PERSON
Katie PERSON
Becca PERSON
Katie Beth PERSON
Adele PERSON
Adele PERSON
Adele PERSON
Katie Beth PERSON
Katie Beth PERSON
Adele PERSON
Katie Beth PERSON
Katie PERSON
Adele PERSON
Katie Beth PERSON
Adele PERSON
Katie Beth PERSON
Adele PERSON
Katie Beth PERSON
Katie PERSON
Adele PERSON
Adele PERSON
Katie Beth PERSON
Katie Beth PERSON
Adele PERSON
Katie Beth PERSON
Katie Beth PERSON
Matt PERSON
Boy PERSON
Matt PERSON
Adele PERSON
Braddocks PERSON
Matt PERSON
Adele PERSON
Katie Beth PERSON
Katie Beth PERSON
Matt PERSON
Adele PERSON
Katie Beth PERSON
Adele PERSON
Adele PERSON
Haley PERSON
Adele PERSON
Katie Beth PERSON
Adele PERSON
Adele PERSON
Katie Beth PERSON
Adele PERSON
Adele PERSON
Katie Beth's PERSON
Katie Beth PERSON
Adele PERSON
Katie Beth PERSON
016_09_en.txt
Jessie PERSON
Karen Andrew PERSON
David Micheal PERSON
Sam PERSON
Charlie PERSON
Karen PERSON
Karen PERSON
Matt PERSON
Karen PERSON
Karen Brewer PERSON
Karen PERSON
Karen PERSON
Porter PERSON
Morbidda Destiny PERSON
Ben Brewer PERSON
Andrew PERSON
Karen PERSON
David Michael PERSON
Sam PERSON
Charlie PERSON
Karen PERSON
Karen PERSON
Karen PERSON
Karen PERSON
Brewer PERSON
Brewer PERSON
Elizabeth PERSON
Karen PERSON
Andrew PERSON
Brewer PERSON
Andrew PERSON
Karen PERSON
David Michael's PERSON
Karen PERSON
Karen PERSON
David Michael's PERSON
Brewer PERSON
Andrew PERSON
Brewer PERSON
David Michael PERSON
Andrew PERSON
Karen PERSON
Andrew PERSON
David Michael PERSON
Karen PERSON
Karen PERSON
Karen PERSON
Andrew PERSON
Karen PERSON
Karen PERSON
Andrew PERSON
David Michael PERSON
Karen PERSON
David Michael PERSON
Karen PERSON
Karen PERSON
Jessi Ramsey PERSON
Mal PERSON
Karen PERSON
Karen PERSON
Karen PERSON
Karen PERSON
Karen PERSON
Yick PERSON
Andrew PERSON
Karen PERSON
Andrew PERSON
Karen PERSON
Andrew PERSON
Andrew PERSON
Creak PERSON
Karen PERSON
Moosie PERSON
Karen PERSON
Karen PERSON
Karen PERSON
Karen PERSON
Karen PERSON
Jessi PERSON
Karen PERSON
David Michael's PERSON
016_10_en.txt
Braddocks PERSON
Matt PERSON
Haley PERSON
Braddock PERSON
Braddocks PERSON
Haley PERSON
Braddock PERSON
Haley PERSON
Matt PERSON
Matt PERSON
Barretts PERSON
Jenny Prezzioso PERSON
Matt PERSON
Matt PERSON
Haley PERSON
Nicky Pike PERSON
Vanessa PERSON
Haley PERSON
Matt PERSON
Haley PERSON
Matt PERSON
Matt PERSON
Buddy Barrett PERSON
Braddocks PERSON
Matt PERSON
Haley PERSON
Haley PERSON
Matt PERSON
Matt PERSON
Haley PERSON
Haley PERSON
Haley PERSON
Matt PERSON
Haley PERSON
Haley PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Haley PERSON
Matt PERSON
Haley PERSON
Haley PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Jessi PERSON
Haley PERSON
Haley PERSON
Matt PERSON
Matt PERSON
Haley PERSON
Matt PERSON
Haley PERSON
Matt PERSON
Matt PERSON
Helen Keller PERSON
Haley PERSON
Matt PERSON
Haley PERSON
Haley PERSON
Adele PERSON
Mme Noelle's PERSON
Madame PERSON
Haley PERSON
Mme Noelle PERSON
016_11_en.txt
Mme Noelle PERSON
Braddock PERSON
Mal PERSON
Mary Anne PERSON
Mal PERSON
Mal PERSON
Ho-Ho's PERSON
Charlie PERSON
Charlie PERSON
Dawn PERSON
Mary Anne PERSON
Mary Anne PERSON
Jessi PERSON
Matt PERSON
Haley PERSON
Matt PERSON
Haley PERSON
Matt PERSON
Mary Anne PERSON
Kristy PERSON
Braddocks PERSON
Mary Anne PERSON
Mary Anne's PERSON
Mary Anne PERSON
Jessi PERSON
Braddocks PERSON
Mary Anne's PERSON
Dawn PERSON
Dawn PERSON
Mary Anne PERSON
Dawn PERSON
Mary Anne PERSON
Charlie PERSON
Dawn PERSON
Hmphh PERSON
Mal PERSON
Braddock PERSON
Matt PERSON
Mary Anne PERSON
Mal PERSON
016_12_en.txt
Mama PERSON
Braddock PERSON
Braddock PERSON
Braddock PERSON
Braddock PERSON
Matt PERSON
Braddock PERSON
Matt PERSON
Braddock PERSON
Matt PERSON
Matt PERSON
Braddock PERSON
Braddock PERSON
Jessi PERSON
Matt PERSON
Braddock PERSON
Matt PERSON
Braddock PERSON
Jessi PERSON
Frank PERSON
Matt PERSON
Braddock PERSON
Braddock PERSON
Matt PERSON
Frank PERSON
Matt PERSON
Frank PERSON
Matt PERSON
Braddock PERSON
Frank PERSON
Frank PERSON
Frank PERSON
Jessi Ramsey PERSON
J-E-S-S- PERSON
Matt PERSON
Frank PERSON
Jessi PERSON
Frank PERSON
Matt Braddock PERSON
Jessi PERSON
Frank PERSON
Frank PERSON
Braddock PERSON
Matt PERSON
Frank PERSON
Matt PERSON
Matt PERSON
Frank PERSON
Frank PERSON
Frank PERSON
Braddock PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Matt PERSON
016_13_en.txt
Jessi PERSON
Jessi PERSON
Braddocks PERSON
Charlotte Johanssen PERSON
Becca PERSON
Stacey McGill PERSON
Millikan PERSON
Millikan PERSON
Oakley PERSON
Becca PERSON
Pinky Pye PERSON
Becca PERSON
Becca PERSON
Kristy PERSON
Matt PERSON
Becca PERSON
Charlotte Johanssen PERSON
Becca PERSON
Stacey McGill PERSON
Stacey PERSON
Stacey PERSON
Charlotte PERSON
Miss Stoneybrook PERSON
Becca PERSON
Becca PERSON
Becca PERSON
Charlotte PERSON
Char PERSON
Becca PERSON
Jessi PERSON
Charlotte PERSON
Becca PERSON
Jessi PERSON
Becca PERSON
Jessi PERSON
Char PERSON
Jessi PERSON
Becca PERSON
Charlotte PERSON
Becca PERSON
Braddock PERSON
016_14_en.txt
Braddocks PERSON
Mme Noelle PERSON
Frank PERSON
Matt PERSON
Becca PERSON
Matt PERSON
Mary Anne PERSON
Mary Anne PERSON
Mal PERSON
Logan Bruno PERSON
Mary Anne's PERSON
Braddock PERSON
Matt PERSON
Braddock PERSON
Haley PERSON
Coppelius PERSON
Mme Noelle PERSON
Jessi PERSON
Braddock PERSON
Haley PERSON
Haley PERSON
Haley PERSON
Braddock PERSON
Braddock PERSON
Braddock PERSON
Carolyn Braddock PERSON
Haley PERSON
Haley PERSON
Haley PERSON
Matt PERSON
Mme Noelle PERSON
Katie Beth PERSON
Katie Beth PERSON
Adele PERSON
Matt PERSON
Katie Beth PERSON
Matt PERSON
Dad PERSON
Adele PERSON
Katie Beth PERSON
Braddock PERSON
Coppelius PERSON
Coppelius PERSON
Haley PERSON
Haley PERSON
Braddock PERSON
Haley PERSON
Braddock PERSON
Christopher Gerber PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Christopher PERSON
Matt PERSON
Adele PERSON
Katie Beth PERSON
Katie PERSON
Katie Beth PERSON
Adele PERSON
Christopher PERSON
Katie Beth PERSON
Matt PERSON
Adele PERSON
016_15_en.txt
Katie Beth PERSON
Adele PERSON
Adele PERSON
Katie Beth PERSON
Adele PERSON
Katie Beth PERSON
Adele PERSON
Jessi PERSON
Katie Beth PERSON
Katie PERSON
Jessi PERSON
Jessi PERSON
Braddock PERSON
Mary Anne PERSON
Matt PERSON
Mal PERSON
Mal PERSON
Jessi PERSON
Keisha PERSON
Mal PERSON
Mal PERSON
Katie Beth PERSON
Adele PERSON
Matt PERSON
Haley PERSON
Braddock PERSON
Katie Beth PERSON
Mal PERSON
Jessi PERSON
Keisha PERSON
Mal PERSON
Mal PERSON
Jessi PERSON
Mal PERSON
Matt PERSON
Mal PERSON
Adele PERSON
Katie Beth PERSON
Matt PERSON
Adele PERSON
Katie Beth PERSON
Katie Beth PERSON
Jessi PERSON
Jessi PERSON
Franzes PERSON
Katie Beth PERSON
Swanilda PERSON
Katie Beth PERSON
Katie Beth PERSON
Katie Beth PERSON
Adele PERSON
Katie Beth PERSON
Adele PERSON
Matt PERSON
Jessi PERSON
Mal PERSON
Mary Anne PERSON
Braddock PERSON
Matt PERSON
Haley PERSON
Braddocks PERSON
Haley PERSON
Jessi PERSON
Mary Anne PERSON
Matt PERSON
Matt PERSON
Haley PERSON
Matt PERSON
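The output above lists the same person under several strings (for example “Mary Anne” and “Mary Anne's”). As a rough sketch of how the per-chapter lists could be tallied, the snippet below counts names and folds a trailing possessive into the bare name. The function name tally_names and the sample lines are assumptions for illustration, and the possessive rule is a simplification: it would not catch something like “Mary Anne's afternoon”.

```python
from collections import Counter

def tally_names(lines):
    """Count each name, folding a trailing possessive 's into the bare name."""
    counts = Counter()
    for line in lines:
        name = line.strip()
        if not name:
            continue
        # Handle both the straight and the curly apostrophe
        for suffix in ("'s", "’s"):
            if name.endswith(suffix):
                name = name[: -len(suffix)]
                break
        counts[name] += 1
    return counts

# In practice the lines would come from one of the output files, e.g.:
#   with open('016_02_en_ner_per.txt', encoding='utf-8') as f:
#       name_counts = tally_names(f)
# The hand-typed sample below stands in for such a file.
sample_lines = ["Mary Anne\n", "Mary Anne's\n", "Dawn Schafer's\n", "Dawn Schafer\n", "Mal\n"]
name_counts = tally_names(sample_lines)
print(name_counts.most_common())
```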
12. English NER for places
#Sort all the files in the directory you specified above, alphabetically.
#For each of those files...
for filename in sorted(os.listdir(enfiledirectory)):
    #If the filename ends with .txt (i.e. if it's actually a text file)
    if filename.endswith('.txt'):
        #Write out below the name of the file
        print(filename)
        #The file name of the output file adds _ner_loc to the end of the file name of the input file
        outfilename = filename.replace('.txt', '_ner_loc.txt')
        #Open the input filename
        with open(filename, 'r') as f:
            #Create and open the output filename
            with open(outfilename, 'w') as out:
                #Read the contents of the input file
                chaptertext = f.read()
                #Do English NLP on the contents of the input file
                chapterner = ennlp(chaptertext)
                #For each recognized entity
                for ent in chapterner.ents:
                    #If that entity is labeled as a geopolitical entity (i.e. a place)
                    if ent.label_ == 'GPE':
                        #Print the entity, and the label (which should be GPE)
                        print(ent.text, ent.label_)
                        #Write the entity to the output file
                        out.write(ent.text)
                        #Write a newline character to the output file
                        out.write('\n')
016_01_en.txt
Mexico GPE
Stoneybrook GPE
Connecticut GPE
Oakley GPE
New Jersey GPE
Stoneybrook GPE
Rebecca GPE
Oakley GPE
Keisha GPE
Stamford GPE
Connecticut GPE
Stoneybrook GPE
Stamford GPE
Stoneybrook GPE
Oakley GPE
Becca GPE
Breakfast GPE
Coppélia GPE
Stamford GPE
Oakley GPE
Becca GPE
Coppélia GPE
Becca GPE
016_02_en.txt
Stoneybrook GPE
Stoneybrook GPE
Kristy GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
New York City GPE
California GPE
California GPE
Claudia GPE
California GPE
Claudia GPE
Claudia GPE
016_03_en.txt
Coppélia GPE
Hilary GPE
Hilary GPE
Coppélia GPE
Coppélia GPE
016_04_en.txt
Coppélia GPE
Stamford GPE
Haley GPE
Becca GPE
Haley GPE
016_05_en.txt
Brat GPE
Jessi GPE
Stamford GPE
U.S. GPE
Stoneybrook GPE
016_06_en.txt
Coppélia GPE
Haley GPE
Ursula GPE
016_07_en.txt
Jordan GPE
Jordan GPE
Jordan GPE
Byron GPE
Nicky GPE
Nicky GPE
Nicky GPE
Nicky GPE
016_08_en.txt
Coppélia GPE
Hilary GPE
Stamford GPE
Becca GPE
Massachusetts GPE
016_09_en.txt
Kristys GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Legos GPE
Claudia GPE
Legos GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
016_10_en.txt
Nicky GPE
Nicky GPE
Jordan GPE
Nicky GPE
Becca GPE
Becca GPE
Becca GPE
Becca GPE
Stamford GPE
Becca GPE
Becca GPE
Becca GPE
Coppélia GPE
016_11_en.txt
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Claudia GPE
Ballerinas GPE
Oreo GPE
Claudia GPE
Claudia GPE
California GPE
California GPE
Logan GPE
Claudia GPE
Claudia GPE
Becca GPE
Stamford GPE
016_12_en.txt
Stamford GPE
Coppélia GPE
Coppélia GPE
Coppélia GPE
016_13_en.txt
Becca GPE
Becca GPE
Charlotte GPE
Becca GPE
Stamford GPE
Becca GPE
Becca GPE
Becca GPE
Becca GPE
New Jersey GPE
Stoneybrook GPE
Becca GPE
Becca GPE
Becca GPE
Becca GPE
Becca GPE
Becca GPE
Becca GPE
Becca GPE
Stoneybrook GPE
Stoneybrook GPE
Becca GPE
Becca GPE
Charlotte GPE
Becca GPE
Charlotte GPE
Becca GPE
Charlotte GPE
Becca GPE
Becca GPE
Becca GPE
Charlotte GPE
Charlotte GPE
Charlotte GPE
Jessi GPE
Charlotte GPE
Charlotte GPE
016_14_en.txt
Coppélia GPE
Charlotte GPE
Stamford GPE
Becca GPE
New Jersey GPE
Claudia GPE
Coppélia GPE
Stamford GPE
Coppélia GPE
Coppélia GPE
016_15_en.txt
Becca GPE
Haley GPE
Kristy GPE
Keisha GPE
Keisha GPE
Keisha GPE
Keisha GPE
Keisha GPE
Keisha GPE
Swanildas GPE
Becca GPE
Claudia GPE
Claudia GPE
Ambrosia GPE
13. English NER for organizations
I think my favorite entity type is ORG, which gets you everything from Oakley Elementary to Swanilda to Mama. (Yes, friends, “Daddy” is a PERSON but “Mama” is an ORG.)
#Sort all the files in the directory you specified above, alphabetically.
#For each of those files...
for filename in sorted(os.listdir(enfiledirectory)):
    #If the filename ends with .txt (i.e. if it's actually a text file)
    if filename.endswith('.txt'):
        #Write out below the name of the file
        print(filename)
        #The file name of the output file adds _ner_org to the end of the file name of the input file
        outfilename = filename.replace('.txt', '_ner_org.txt')
        #Open the input filename
        with open(filename, 'r') as f:
            #Create and open the output filename
            with open(outfilename, 'w') as out:
                #Read the contents of the input file
                chaptertext = f.read()
                #Do English NLP on the contents of the input file
                chapterner = ennlp(chaptertext)
                #For each recognized entity
                for ent in chapterner.ents:
                    #If that entity is labeled as an organization
                    if ent.label_ == 'ORG':
                        #Print the entity, and the label (which should be ORG)
                        print(ent.text, ent.label_)
                        #Write the entity to the output file
                        out.write(ent.text)
                        #Write a newline character to the output file
                        out.write('\n')
016_01_en.txt
Stoneybrook ORG
General Hospital ORG
Stoneybrook Middle School ORG
Keisha ORG
Baby ORG
Club ORG
Squirt ORG
Squirt ORG
Squirt ORG
Franz ORG
Franz ORG
Franz ORG
Franz ORG
Baby ORG
016_02_en.txt
Baby ORG
Kristy Thomas ORG
Mallory Pike ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Baby ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Baby ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Thomases ORG
Kristy ORG
Kristy ORG
Stoneybrook Middle School ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Baby ORG
Stoneybrook ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Matthew ORG
Matthew ORG
Kristy ORG
Kristy ORG
016_03_en.txt
the Donce of the Hours ORG
Franz ORG
016_04_en.txt
Mama ORG
Baby ORG
the American Sign Language Dictionary ORG
Halley's Comet ORG
the American Sign Language Dictionary ORG
016_05_en.txt
House ORG
Jenny ORG
Jenny ORG
016_06_en.txt
Stoneybrook ORG
Suzi ORG
Patriots ORG
016_07_en.txt
Pike ORG
Pike ORG
016_08_en.txt
Noelle ORG
Parsonses ORG
Parsonses ORG
Keisha ORG
016_09_en.txt
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Stoneybrook ORG
Kristy ORG
Kristy ORG
Kristy ORG
Fair ORG
Karen and Claudia ORG
Brewers ORG
Baby ORG
Baby ORG
Club ORG
Kristy ORG
BOO ORG
Tickly ORG
The Witch Next Door ORG
016_10_en.txt
the Stoneybrook Community Center ORG
Pike ORG
Squirt ORG
Squirt ORG
Stoneybrook Elementary ORG
Braddock ORG
016_11_en.txt
Baby ORG
Kristy ORG
Kristy ORG
Yodels ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kid-Kits ORG
Kristy ORG
Kid-Kits ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy, Claudia ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Club ORG
Kristy ORG
016_12_en.txt
Stoneybrook Middle School ORG
016_13_en.txt
Squirt ORG
Kristy ORG
Squirt ORG
Squirt ORG
Kristy ORG
Millikan ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Squirt ORG
Baby-Wipes ORG
Kristy ORG
Squirt ORG
Squirt ORG
Kristy ORG
Squirt ORG
Squirt ORG
Squirt ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Baby ORG
Kristy ORG
Kristy ORG
Kristy ORG
Siamese ORG
Kristy ORG
Chadotte ORG
Kristy ORG
Baby ORG
Club ORG
Kristy ORG
Copernicus ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
Kristy ORG
016_14_en.txt
Baby ORG
Kristy ORG
Kristy ORG
Kristy ORG
016_15_en.txt
Keisha ORG
Keisha ORG
Keisha ORG
Keisha ORG
Keisha ORG
Keisha ORG
Honey ORG
Franz ORG
Kristy ORG
14. English NER for works of art#
That said, there’s also a rare WORK_OF_ART entity type, exemplified by “Morning, Squirts”, “Hey, Jessi”, “On Top of Old Smoky” (depends on how you feel about folk music, I guess), and, my very favorite work of art, “Nope”.
(July 19, 2021 note: Alas, spaCy 3 has learned that ‘nope’ isn’t a work of art. 😔 It was fun while it lasted.)
#Sort all the files in the directory you specified above, alphabetically.
#For each of those files...
for filename in sorted(os.listdir(enfiledirectory)):
    #If the filename ends with .txt (i.e. if it's actually a text file)
    if filename.endswith('.txt'):
        #Write out below the name of the file
        print(filename)
        #The file name of the output file adds _ner_art to the end of the file name of the input file
        outfilename = filename.replace('.txt', '_ner_art.txt')
        #Open the input filename
        with open(filename, 'r') as f:
            #Create and open the output filename
            with open(outfilename, 'w') as out:
                #Read the contents of the input file
                chaptertext = f.read()
                #Do English NLP on the contents of the input file
                chapterner = ennlp(chaptertext)
                #For each recognized entity
                for ent in chapterner.ents:
                    #If that entity is labeled as a work of art
                    if ent.label_ == 'WORK_OF_ART':
                        #Print the entity, and the label (which should be WORK_OF_ART)
                        print(ent.text, ent.label_)
                        #Write the entity to the output file
                        out.write(ent.text)
                        #Write a newline character to the output file
                        out.write('\n')
016_01_en.txt
Mallory Pike WORK_OF_ART
Morning, Squirts WORK_OF_ART
016_02_en.txt
016_03_en.txt
Swanilda WORK_OF_ART
Swanilda because Madame WORK_OF_ART
016_04_en.txt
Show Jessi WORK_OF_ART
016_05_en.txt
016_06_en.txt
The Secret Language WORK_OF_ART
016_07_en.txt
Mallory and Dawn WORK_OF_ART
On Top of Old Smoky" WORK_OF_ART
Mallory, Dawn WORK_OF_ART
American Sign Language WORK_OF_ART
016_08_en.txt
016_09_en.txt
Hi, Claudia WORK_OF_ART
Tinker Toys WORK_OF_ART
"Night, Andrew. WORK_OF_ART
016_10_en.txt
016_11_en.txt
Logan WORK_OF_ART
Opening night WORK_OF_ART
016_12_en.txt
016_13_en.txt
016_14_en.txt
016_15_en.txt
What now?#
I reran the notebook for English, reluctantly limiting myself to the entity types held in common between the English and French models. So now I had, chapter-by-chapter, translation-by-translation (plus original English), all the person, location, and organization type entities.
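For the curious, here is a minimal sketch of what "limiting myself to the entity types held in common" could look like. It assumes the English model labels people as PERSON and places as GPE or LOC, while the French model uses PER and LOC. The mapping `EN_TO_SHARED`, the function `shared_entities`, and the sample list are all made up for illustration; this is not the code from the notebook.

```python
# Hypothetical mapping from English-model labels onto the labels the
# French model also has. Anything not listed here (WORK_OF_ART, etc.)
# is dropped.
EN_TO_SHARED = {"PERSON": "PER", "GPE": "LOC", "LOC": "LOC", "ORG": "ORG"}


def shared_entities(entities, mapping=EN_TO_SHARED):
    """Keep only (text, label) pairs whose label maps onto a shared type."""
    return [(text, mapping[label]) for text, label in entities if label in mapping]


# Hand-written sample, not real model output.
sample = [("Kristy", "PERSON"), ("Stoneybrook", "GPE"), ("Nope", "WORK_OF_ART")]
harmonized = shared_entities(sample)
print(harmonized)
```

With a real spaCy document, the input list could be built as `[(ent.text, ent.label_) for ent in chapterner.ents]`.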
That’s nice.
But that wasn’t my question: what I wanted to know was how the translators adapted people and place names. I had to figure out what to do with all these text files to get me closer to an answer.
I had some thinking to do.
Lee#
So, before getting into the finer differences between the various translations and the approaches taken by the translators, can we just take a second to appreciate the narrative of this BSC story? Now, far from perfect as a disability narrative - for instance, the main deaf character never gets to “speak” for himself, with instead his older sister “speaking” both for him and for herself - this is a really nuanced portrayal of difference and empathy. All I remember from when I read the book (sigh) 30 years ago was the dancing J sign for Jessi’s name, but now I’m struck by how Jessi/Jessie/Justine/Jessica is wise beyond her years, a reflection of her own experiences with being different in her new hometown. I got a little choked up as I read (at least the first of the four different times I read it) her efforts to bring the deaf schoolkids to the show, and how her frenemy at ballet finally connects with her own deaf sister.
And when Jessi said that her performance could never be perfect because “There was no way Swanilda could have been black, so I wasn’t perfect, but I knew I was dancing very well”? Gutted. In every French version(s).
There’s another layer to this discussion of translation as there is another language in the text: sign language. As pointed out in the narrative(s), there are many different variations of sign language, which means that the translators had to accurately “translate” the signs that were being done, as it wasn’t the same in the various French dialects.
So, a fun little close-reading exercise on my part, once I stopped crying, I mean, YOU’RE CRYING NOT ME.
In order of levels of translation/cultural adaptation, it goes: Belgium, Quebec, France. The Belgium translation takes great pains to situate the narrative in an environment that would be familiar to a European reader: the names have all been francisized (even Claudia Kishi becomes Julie Kishi), and the places have been localized as well. Justine and her family are not from New Jersey but Burkina Faso. Interestingly, they move to Neuville, France, and not Belgium. Sophie Lambert (aka Stacey McGill) moves back to Paris and not New York City, while Carole (aka Dawn) is “une vraie Provençale” rather than a California girl. These localized adaptations extend to any cultural reference (such as books and board games) and particularly in this volume, the gross-out songs Mallory/Marjorie’s brothers sing at dinner about spaghetti.
At least I think it is. I’m not so up on Franco-Belgium gross-out songs about pasta dishes.
In the Quebec translation, the names are again mostly francisized (seriously, though, I knew kids with like ¾ of the last names chosen for the kids here), and the place names also stay mostly the same: Jessie Raymond is from New Jersey and Diane Dubreuil is from California, but Sophie Ménard moves back to Toronto. The cultural references are either rendered local (OH GOD LOOK HERE IS A REFERENCE TO FREAKING CAILLOU) or completely neutralized (like Claudia’s stash of now non-brand name snacks - a missed opportunity to get some local references to Vachon snack cakes). They sing the same spaghetti song in the Quebec translation as the Belgium one, but there is one really interesting choice that the Quebec translators make in multiple (ok, the two that I’ve read) volumes: if there is a question of how to translate something, they just ignore it and erase it.
For example, in this volume, in the original English, Jessi’s new ballet teacher keeps getting her name wrong. That is preserved in both the French from France and French from Belgium translations, but not the Quebec one, where Jessie just gets confused that she is being called Madame Raymond rather than by her first name. This might just be a result of the initial name-choice for Jessie in Quebec: it’s kinda hard to mess up Raymond. So rather than force the issue, it was just tweaked.
More interestingly, when I was reading one of the mysteries in Quebec translation, California figured prominently (it’s one of the ones featuring the child actor from their hometown), but other than one reference to the kid coming home, all other references to California or things that may or may not have happened in California in other volumes are simply removed.
(Not to mention that having a FRANCOPHONE child star starring in a FRANCOPHONE show for FRANCOPHONE kids be recorded in California when OH I DON’T KNOW THERE IS A LOCAL FRANCOPHONE TV/MOVIE INDUSTRY IN MONTREAL and then removing all other references to California seems to me like a missed opportunity, but I digress. Seriously. Have them live in a suburb outside of Quebec City or Sherbrooke and then send him to Montreal to be a star. It works. It makes sense. I mean, there is no movie industry that I know of in Provence, so having the other Baby-Sitters visit Provence and run into said child star (unless that’s where he summers now) makes zero sense, but it didn’t stop the Belgium translators…)
Ok, wait, what were we talking about…
So, Belgium goes all-in on translation and adaptation, Quebec splits the difference (it would make sense that a kid from New Jersey or California would end up at a French school because THANKS BILL 101!), while France makes zero effort to mask that these are American kids doing American things and that while the story is in French, the references and names and such remain firmly American.
Most of the names remain largely English (although bizarrely Claudia Kishi becomes Claudia Koshi), as do the place names, except for Stoneybrook (which would almost be like a sci-fi or fantasy place name that you see and have no idea how to pronounce for any French audience), which becomes Stamford, Connecticut, unlike Neuville, France or Quebec’s generic Nouville with no province/state/country associated with it. Even a cultural reference, in this case a book, gets changed to a more recognizable American title: Bambi. It is a different gross-out spaghetti song sung at dinner, which is not recognizable to me either, and may be the only notable instance of adaptation in the French-from-France translation.
The varying approaches to translating/adapting the text are perhaps most noticeable in how the three different translations deal with sign language. The Belgium translation goes into great detail about which version of sign language is being used and taught. The Quebec translation drops in a mention of Québécois sign language, while the French-from-France translation is like, meh, it’s sign language, you get it. I don’t know enough about the different sign languages to know if the adaptations are accurate or not, but the French-from-France translation explains sign language even less than the original English text (which does identify that they are learning and using American Sign Language, which is actually based off of French Sign Language, as opposed to Signed English or British Sign Language).
I didn’t notice (so thank-you Quinn for that helpful chart below!) that the architect of the neighborhood isn’t mentioned in the France or Belgium translations, which again, makes cultural sense. These “planned communities” and post-WWII suburbs are a uniquely North American phenomenon, one that wouldn’t have been foreign to a Quebec audience (my grandparents’ neighborhood was like that, for instance) but wouldn’t have made much sense to a European audience. So even with France going all-in on the Americanness of the books, they still took care to ensure that the references weren’t so foreign as to be unrecognizable to a reader.
So, to recap, Belgium goes all-in on a European adaptation of the book, Quebec splits the difference, and France doesn’t seem to care, and in fact seems to emphasize the Americanness of the text. This all makes sense from a marketing perspective. Belgium is a smaller market, and so making the books hyper-local would limit the sales appeal to other Francophone readers. Quebec had a basically captured market, and their efforts reflect real efforts to appeal to the local market, while also not trying to make things too complicated for the army of translators they were employing to churn out these translations. France, well, France is France and France is gonna France, and probably figured who cares, you know it’s the USA, we know it’s the USA, you’re going to buy the books in part for that reason, so…let’s just make sure you can pronounce all the names.
I realize that this is probably not the level of cultural analysis that you all are expecting from this. But when you ask someone from Quebec to reflect on cultural/linguistic choices that France makes…we in Quebec have an Office de la langue française, one of whose tasks is to make up new French words every time there is a new English word that comes along. France says things like “stopping” and “shopping” where in Quebec WE DO NOT SAY SUCH THINGS USE THE PROPER FRENCH VERBS NOT SOME BASTARDIZED ENGLISH GERUND SAID WITH A PARISIAN ACCENT.
So that France goes all-in on the Americanness of the series isn’t surprising. That Quebec didn’t go more all-in in the adaptation to a local audience was, but given the limitations due to pressures to produce, it is understandable.
Quinn#
Lee’s take on BSC names in translation had got me thinking. For starters, score one for close reading! If you’ve got four variations of a children’s novel, and want to say something about some aspect of that novel, you should not start by opening a Jupyter notebook. Put down your laptop, find a comfy chair, and just read the books with your own eyeballs. It definitely took Lee less time to do that than it took me to scan and OCR the books, reformat them into individual chapters, clean up the punctuation characters, and write some Python code. And by the end, she could sit down and write something. I, on the other hand, had 180 additional small text files on my laptop to show for it, and nothing new to say yet. Woot.
But then I thought about what Lee didn’t do there. Knowledgeable humans are great at providing a synthesis of the interesting things they’ve noticed in a text. What they’re less great at is being comprehensive, pulling out details that aren’t individually interesting, but may become interesting in the aggregate. It’s tedious work, and it’s only possible with a lot of manual labor – unless you use digital tools. (And even still, let’s face it, there’s a lot of manual work that goes into getting your text ready for digital tools.)
In my moment of doubt, I turned to the Bible – the BSC Bible, that is. Smith College’s Special Collections finding aid for the Ann M. Martin papers actually includes that as an alternate title for The Complete Guide to The Baby-Sitters Club, the complete (through 1996) compendium of all information about all people (my PER/PERSON entities), places (LOC/GPE), and things (including ORGs) in the Baby-Sitters Club universe. And by all, I mean all. Find any arbitrary character, however niche, and this book has all the information in the canon. Does anyone remember Nicole Lavista, one of the “Battle of the Bakers” daycare kids in BSC Mystery #21: Claudia and the Recipe for Danger? Me, neither. But she’s six years old, has “hair in black curls”, is “full of tricks”, and “loves to draw and paint”. There’s even page-level citations, though they do me no good since we chose to remove page numbers from the text file output of our OCR. All this information is already compiled for the English series through the BSC Bible, originally as a resource for the ghostwriters. But what might we learn from trying to recreate something similar from the ground up, for the universe of each of the French translations? In addition to the question of how much localization has been done, the question of consistency intrigues me. If you’re not localizing much of anything, it’s less of an issue, but was the pool of translators in Montreal comparing notes about how they adapted the names of peripheral but at least occasionally-recurring characters?
Rethinking NER#
I wanted to write up NER for this book because, hey, names! And I’ve done it before, and it seemed like a fun opportunity to compare the performance of spaCy’s French and English models. Ultimately, though, NER makes more sense as the primary method of finding names and places when you’re dealing with a much larger corpus, and/or a much more heterogenous one than Baby-Sitters Club books. It’s feasible to make a list of characters and places within a single fictional universe, like what you can find in the BSC Bible, even if you have all 200ish novels in translation. It’s much less feasible if you have 200 novels, each in its own universe – let alone 2,000 or more.
So where to go from here? It’d be fun to try to annotate some texts and see if I can train a better NER model for the Baby-Sitters Club (especially the French), but that’s a topic for another DSC Multilingual Mystery. Instead, what I have now for locations, I can use to cross-check with the English, for an easy and interesting source of likely localizations and differences. With names, I can hopefully use NER to identify characters that are newly-introduced in each book (or perhaps re-introduced under a different localized name?), and then add those names to the list of known characters in that translated universe. That curated list, rather than NER itself, will be the basis for checking translations for references to characters. And another thing: for best chances of identifying new characters, I think I’ll stop limiting the NER results to just the entities flagged PER. Too many character names are showing up as LOC or ORG to trust the classification. As for the ORG entities specifically, they’re almost all errors of one sort or another. I can’t think of any good research questions offhand that deal with organizations in this universe, so I think I won’t worry about them.
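Here is a rough sketch of how that plan might look in code, offered as an assumption about one possible shape rather than anything I've actually run on the corpus. The function names `new_name_candidates` and `count_known_names` are invented, the entity list is hand-written rather than real NER output, and the French sentence is made up. The first function ignores the entity label entirely (per the decision not to trust PER/LOC/ORG), and the second searches the text for names from the curated list.

```python
import re


def new_name_candidates(entities, known_names):
    """Return entity strings not yet on the curated list, whatever their label."""
    seen = set()
    candidates = []
    for text, _label in entities:
        name = text.strip()
        if name and name not in known_names and name not in seen:
            seen.add(name)
            candidates.append(name)
    return candidates


def count_known_names(chapter_text, known_names):
    """Count whole-name occurrences of each curated name in a chapter.

    Hyphens count as part of a name, so "Marie" is not counted
    inside "Anne-Marie".
    """
    counts = {}
    for name in known_names:
        pattern = r"(?<![\w-])" + re.escape(name) + r"(?![\w-])"
        counts[name] = len(re.findall(pattern, chapter_text))
    return counts


# Hand-written examples for illustration only.
known = {"Justine", "Valérie"}
entities = [("Justine", "PER"), ("Johanna", "LOC"),
            ("Neuville", "LOC"), ("Johanna", "PER")]
candidates = new_name_candidates(entities, known)
counts = count_known_names("Justine et Johanna dansent. Justine sourit.", known)
print(candidates, counts)
```

The candidates still need a human to decide which ones are actually characters (Neuville is not), which is the curation step described above.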
Close-reading some distant-reading outputs#
For lulz, I threw the location NER per-chapter output for the Belgian corpus into Voyant, and even the word cloud was mostly a testament to how badly the model performed at classifying entities as locations.
So in the end, I deleted all those little NER output files I’d generated with the Jupyter notebook, with aspirations of somehow comparing them programmatically. Instead, those Python print() statements in the notebook were what I consulted – using my own eyeballs. Because the chapters are so short, it’s not hard to trace place names back to the context, and then find that same context in the original. I’m about halfway there to imagining how I’d implement something more scalable in Python for checking at least the entities that refer to people (whether or not they’re tagged that way). But honestly, for my own process, I find that I need to spend time cleaning data manually before I feel like I understand it well enough to come up with a workable programmatic approach.
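If I did want to do the location cross-check programmatically, a first pass might be nothing fancier than set arithmetic per chapter. This is only a sketch: `compare_locations` is a hypothetical helper, and the two lists are typed in by hand from place names mentioned in this post, not taken from the NER output.

```python
def compare_locations(english_locs, translated_locs):
    """Split place names into those kept as-is and those unique to each version."""
    en, tr = set(english_locs), set(translated_locs)
    return {
        "kept": sorted(en & tr),
        "only_english": sorted(en - tr),
        "only_translation": sorted(tr - en),
    }


# Hand-typed place names for illustration (English vs. Quebec).
english = ["Stoneybrook", "New Jersey", "New York"]
quebec = ["Nouville", "New Jersey", "Toronto"]
report = compare_locations(english, quebec)
print(report)
```

Exact string matching means a place that is merely spelled differently in French would show up as a difference, so the output is a list of things to go look at, not a verdict.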
Here’s some things I found following up on some of the words tagged as locations in the NER:
In the Belgian translation, Justine Victoire is from Ouagadougou, Burkina Faso (as Lee mentioned). Her best friend / cousin in Ouagadougou is named Johanna. Neuville is a small town not far from Aubrives, where her father works. Justine’s linguistic aptitude was put to the test on a family trip to Spain. Her sister’s name is adapted as Roseline, and her brother is Victor, AKA Gringalet.
In the Quebec translation, like in the English original and France French translation, Jessie picked up Spanish in Mexico. Jessi’s best friend/cousin Keisha becomes Kara. Her sister is still Becca, but her brother becomes Jean-Philippe, AKA Jaja. To Lee’s point of the Quebec translation just deleting things when they complicate matters too much: there’s no mention of Jessi’s father working in another city. They moved to Nouville for his job, end of discussion.
In the France French translation… well, see Lee’s section for details. It’s just like the English original, just in French. The only exception of note is that her little brother, John Philip Ramsey Junior (yes, that whole thing is his name in French too, including the “Junior”), is nicknamed “P’tit Bout” instead of Squirt. (Sorry for being baffled by your inclusion of “P’tit” among the entities, spaCy. You were right.)
As of Belgian BSC #16, Sophie Lambert has moved back with her family to Paris. Meanwhile, Carole Leroy has moved to Neuville from Provence along with her brother (still named David). Both the Belgian and the France-French versions take a “just-the-facts” approach to Dawn’s situation, but the translators in Montreal were willing to take on Jessi’s compassionate editorializing: “Comme le dit si souvent Diane, sa famille est déchirée en deux, mais je suis certaine qu’elle va s’en sortir.” / “As Dawn pointed out, her family is now ripped in half. I think Dawn is a survivor, though.”
In the Belgian translation, Jessi’s ballet school is in Aubrives (consistent with her father also working there).
Belgian Mathieu thinks that Marseille will beat Monaco in soccer, rather than the Patriots winning the Super Bowl. In Quebec, Matthieu talks about the “Patriotes” winning the soccer eliminations – a sort of mixed-reference there. The French are all in for the Patriots and the Super Bowl.
And here’s some notes and a list of the name correspondences, working from the PER tags:
Chapter 5 starts with “Brat, brat, brat.” in English. All the French translations skip that part and go straight into the next sentence about how everyone agrees that Jenny Prezzioso is spoiled and a little bratty (Lee adds: She really, really is. Reading it four times really drives that point home. How many synonyms in French for “spoiled brat”? A lot and it is amazing).
Even without having to compare across books, I managed to find a naming inconsistency! One of Jessi’s dance colleagues, Mary Bramstedt, is given the role of a villager. In the Quebec version, her name is first translated as Marie Brazeau. Later, there’s a reference to Mademoiselle Croteau, which Jessi explains: “(c’est Marie, une citoyenne dans le ballet)”. So either the Quebec translator innovated a second Marie, dancing the same part as the first one, or… the translator forgot the surname they used earlier.
| English | Quebec | Belgium | France |
|---|---|---|---|
| Kristy Thomas | Christine Thomas | Valérie Demoulin | Kristy Parker |
| Mary Anne Spier | Anne-Marie Lapierre | Mélanie Moreau | Mary Anne Cook |
| Claudia Kishi | Claudia Kishi | Julie Kishi | Claudia Koshi |
| Stacey McGill | Sophie Ménard | Sophie Lambert | Lucy MacDouglas |
| Dawn Schafer | Diane Dubreuil | Carole Leroy | Carla Schafer |
| Mallory Pike | Marjorie Picard | Marjorie Levêque | Mallory Pike |
| Jessi Ramsey | Jessie Raymond | Justine Victoire | Jessica Ramsey |
| Becca Ramsey | Becca Raymond | Roseline | Rebecca Ramsey |
| John Philip Ramsey Jr. (“Squirt”) | Jean-Philippe (“Jaja”) Raymond | Victor “Gringalet” | John Philip Ramsey Junior (“P’tit Bout”) |
| Keisha (Jessi’s cousin) | Kara | Johanna | Keisha |
| Sam Thomas | Sébastien Thompson | Stéphane Demoulin | Samuel Parker |
| Charlie Thomas | Charles Thompson | Nicolas Demoulin | Charlie Parker |
| David Michael Thomas | David Thompson | Sébastien Demoulin | David Michael Parker |
| Watson Brewer | Guillaume Marchand | Yvan Arnould | Jim Lelland |
| Karen Brewer | Karen Marchand | Coralie Arnould | Karen Lelland |
| Andrew Brewer | André Marchand | Arnaud Arnould | Andrew Lelland |
| Janine Kishi | Josée Kishi | Laurence Kishi | Jane Koshi |
| Jeff Schafer | Julien Dubreuil | David Leroy | David Schafer |
| Tigger (Mary Anne’s cat) | Tigrou | N/A | Tigrou |
| Logan Bruno | Louis Brunet | Bruno Lejeune | Logan Rinaldi |
| Shannon Kilbourne | Chantal Chrétien | Cécile Gauthier | Louisa Kilbourne |
| Matthew Braddock | Matthieu Biron | Mathieu Brinbeuf | Matthew Braddock |
| Haley Braddock | Hélène Biron | Agnès Brinbeuf | Helen Braddock |
| Madame Noelle | Mademoiselle Noëlle | Madame Dillon | Mme Noelle |
| Hilary (dancer) | Élizabeth | Hélène | Hilary |
| Katie Beth Parsons (dancer/frenemy) | Élizabeth Pellerin | Catherine | Katie Parson |
| Mary Bramstedt (dancer) | Marie Brazeau (Croteau in ch. 8) | Marie Bernstein | Mary Bramstedt |
| Lisa Jones (dancer) | Lise Jordan | Lise Jacqué | Lisa Jones |
| Carrie Steinfeld (dancer) | Carole St-Onge | Carine Schmitt | Carrie Steinfeld |
| Mr. Geiger (architect in Stoneybrook) | monsieur Gougeon | N/A (omits paragraph about all the houses looking the same) | N/A (omits paragraph about all the houses looking the same) |
| Jenny Prezzioso | Jeanne Prieur | Aurélie Precisio | Jenny Prezzioso |
| Charlotte Johanssen | Charlotte Jasmin | Charlotte Cuvelier | Charlotte Johanssen |
| Nicky Pike | Nicolas Picard | Laurent Levêque | Nicky Pike |
| Vanessa Pike | Vanessa | Vanessa Levêque | Vanessa Pike |
| Buddy Barrett | Bruno Barrette | Antoine Godefroid | Buddy Barrett |
| Suzi Barrett | – | Elodie Godefroid | Liz Barrett |
| Margo Pike | Margot Picard | Anaïs Levêque | Margot Pike |
| Claire Pike | Claire Picard | Juliette Levêque | Claire Pike |
| Byron Pike | Bernard Picard | Alain Levêque | Byron Pike |
| Adam Pike | Antoine Picard | Loïc Levêque | Adam Pike |
| Jordan Pike | Joël Picard | Samuel Levêque | Jordan Pike |
| Adele Parson (Katie Beth’s sister) | Adèle Pellerin | Adeline | Adèle Parson |
| Ben Brewer (ghost) | vieux Ben | Benoît Arnould | Ben Lelland |
| Mrs. Porter | Madame Portai | madame Rensonnet | Mme Porter |
| Morbidda Destiny | Destinée Morbide | Vieille Sorcière | Morbidda Destiny |
| Moosie (Karen’s stuffed cat) | N/A | Flocon | Moosie |
| Mrs. Frank (Matt’s teacher) | France | madame Franck | Mme Franck |
| Carolyn Braddock (Matt’s mother) | Caroline Biron | Caroline Brinbeuf | Carolyn Braddock |
| Christopher Gerber (dancer) | Christophe Baril | Christophe Gélin | Christopher Gerber |
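Once the correspondences are in a table, one crude way to put a number on "how much localization" is to count how many names each translation leaves exactly as they are in English. The sketch below copies just four rows from the table above. `ROWS`, `EDITIONS`, and `unchanged_counts` are names I'm making up here, and the result says something about these four rows only, not about the book as a whole.

```python
# A few rows copied from the table above: (English, Quebec, Belgium, France).
ROWS = [
    ("Kristy Thomas", "Christine Thomas", "Valérie Demoulin", "Kristy Parker"),
    ("Claudia Kishi", "Claudia Kishi", "Julie Kishi", "Claudia Koshi"),
    ("Mallory Pike", "Marjorie Picard", "Marjorie Levêque", "Mallory Pike"),
    ("Charlotte Johanssen", "Charlotte Jasmin", "Charlotte Cuvelier",
     "Charlotte Johanssen"),
]
EDITIONS = ("Quebec", "Belgium", "France")


def unchanged_counts(rows):
    """Count, per edition, how many names are identical to the English name."""
    counts = {edition: 0 for edition in EDITIONS}
    for english, *translations in rows:
        for edition, name in zip(EDITIONS, translations):
            if name == english:
                counts[edition] += 1
    return counts


counts = unchanged_counts(ROWS)
print(counts)
```

Exact matching is deliberately strict: "Claudia Koshi" counts as changed even though only one letter differs, so a fuzzier comparison might be worth trying before reading much into the totals.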
Where to next?#
I find it tantalizing that my idea about the translators getting sloppy with peripheral character names already seems to be playing out even over the course of a single book. The risks are biggest for the Belgian and Quebec translations, which do more by way of localization, and we know that at least in Quebec they had multiple different translators. Lee made some progress in DSC Multilingual Mystery #1 on identifying the translators of all the French translations, using national library metadata. I think it’s time we clean up the metadata mess and make a clean spreadsheet we can use to support further inquiry.
To be continued…
Suggested Citation#
Skallerup Bessette, Lee and Quinn Dombrowski. “DSC Multilingual Mystery 2: Beware, Lee and Quinn!”. February 27, 2020. https://datasittersclub.github.io/site/dscm2.html.