
DSC Multilingual Mystery 2: Beware, Lee and Quinn!


By Lee Skallerup-Bessette and Quinn Dombrowski

February 27, 2020

Quinn

The differences leap out at you before you even open any of the books. Was it “Christine a une idée géniale”? Or was the Baby-Sitters Club “L’idée géniale de Valérie”? Does “Bruno aime Mélanie”, or is he instead “Un amoureux pour Anne-Marie”? There’s a case study on the Dutch translation of the Baby-Sitters Club books in Babysitting the reader: translating English narrative fiction for girls into Dutch (1946-1995) by Mieke K T Desmet that gets into strategies for localizing a story that takes place in a different cultural context. The article “Cultural Understanding in the Indonesian Translation of The Baby-sitters Club” by Halida Aisyah talks about how the Indonesian translation took a different approach: it kept the protagonists’ foreign names and locations, and only adopted Indonesian cultural references when the American equivalent would’ve been incomprehensible without some kind of extensive explanation. But I hadn’t come across any scholarly literature on translation strategies for The Baby-Sitters Club in French (in any of the translations: Québécois, Belgian, or French from France).

I had questions, and not just “what did they decide to call Mallory?” (Spoiler alert: in Québécois it’s Marjorie, and just like in English, it’s the most frequently screwed up name when OCRing the books.) The ghostwriters in the US were working with an extensive “BSC Bible” that had the description and background of every character in Stoneybrook, and further afield in the BSC universe. (This was adapted and published as The complete guide to The Baby-sitters Club.) But in DSC Mystery #1: Lee and the Missing Metadata, Lee discovered that at least in Quebec, they were throwing Baby-Sitters Club books at multiple translators, who turned them around in no time at all. How careful were the translators about consistency, in terms of what they called various peripheral characters and places? This was the making of another Data-Sitters Club Multilingual Mystery. (Who are the data-sitters? So glad you asked. Check out Chapter 2.)

Lee and I put our heads together about how we’d start looking into this mystery. We needed a book that we had on hand in all the translations: Québécois, Belgian, and French from France (the last of these being the source of the recent French re-releases). We settled on Jessi’s Secret Language, reasoning that by that point all the major characters had been established, as well as many peripheral ones. We’d need to look at some of the other translations too, but that would be our starting point.

Here’s the thing, though: Lee reads French. I don’t. I mean, I could probably pick my way through the text and come up with a list of characters and places, but I had other ideas. I wanted to see how French named-entity recognition performed compared to English, when applied to The Baby-Sitters Club.

What’s named-entity recognition?

Named-entity recognition (often abbreviated NER) is a kind of information extraction task – basically, trying to identify particular things (like names of people, places, and organizations) in unstructured text, like a novel. (Yeah, I know that novels have structure, but your average plain-text file of a novel’s text – even if it maintains chapter headers and such – doesn’t have the kind of structure that a computer can easily read. I mean, it’s not like it’s a spreadsheet or something.) There are two major technical approaches: one uses grammar-based rules to identify the things of interest, and the other uses statistical methods like machine learning, which require a ton of labeled data (e.g. texts where a human has already gone through and correctly identified all the things of interest) upfront. Particularly for statistical models, the more your texts resemble the example texts that the model was trained on, the better the NER will perform. These models are most commonly trained on news corpora or Wikipedia, not ’80s and ’90s girls’ literature. This sort of thing is a problem in DH more broadly, not just for us Data-Sitters. David Bamman’s LitBank project (a dataset of annotated excerpts from public domain literature) is one example of how DH scholars can significantly improve the effectiveness of natural-language processing (NLP) by training models on data that looks more like what we’re trying to apply it to. But I’ll save the question of how, exactly, one goes about training a model for a future Data-Sitters Club Multilingual Mystery. For the moment, let’s see how some commonly-used tools perform “out of the box”.
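To make the rule-based approach a little more concrete, here’s a toy example: a minimal sketch in plain Python (no spaCy involved) with a single hand-written rule, which treats a capitalized word that follows madame, monsieur, or mademoiselle as a person’s name. The title list, the function name find_titled_names, and the sample sentence are all made up for illustration; real rule-based systems rely on much larger sets of patterns and word lists.

```python
import re

# A toy rule, not a real NER system: a French title followed by one
# capitalized word counts as a person. The title list is illustrative.
TITLES = ("madame", "monsieur", "mademoiselle")

# (?i:...) makes only the title case-insensitive; the name after it must
# start with a capital letter (accented capitals like É included).
TITLE_RULE = re.compile(
    r"\b(?i:(" + "|".join(TITLES) + r"))\s+([A-ZÀ-Ý][a-zà-ÿ]+)"
)

def find_titled_names(text):
    """Return every 'title + Name' match the rule finds in `text`."""
    return [f"{m.group(1)} {m.group(2)}" for m in TITLE_RULE.finditer(text)]

# Made-up sample: "Madame Dillon applauded, then madame Brinbeuf arrived with Mathieu."
sample = "Madame Dillon applaudit, puis madame Brinbeuf arriva avec Mathieu."
print(find_titled_names(sample))  # ['Madame Dillon', 'madame Brinbeuf']
```

Notice that it misses Mathieu completely: a rule only catches what someone thought to write a rule for. A statistical model learns its own patterns from labeled examples instead, which is why the resemblance between the training data and your texts matters so much.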

The tools

The two major NLP tools with multilingual coverage are spaCy and Stanford NLP. To use spaCy, you load it into Python and run it that way. While there’s a Python version of Stanford NLP, as of February 2020 it doesn’t cover everything – and entities are one thing that’s currently left out. To get entities with Stanford NLP, you have to run a memory-hungry Java program from the command line, with all the joy that comes from setting that up. To make matters worse, Stanford NLP doesn’t have an NER model for French: just English, Spanish, German, and Chinese. It’s a better comparison to look at English vs. French with the same tool, rather than English with one and French with the other, so for this mystery, we’ll be using spaCy.

The texts

To make it easier to compare the entities from each text, I split up each translation plus the English original into 15 plain text files, one for each chapter. Other than that, I left everything as I got it from ABBYY FineReader (as discussed in DSC #2: Katia and the Phantom Corpus), apart from corrections to my (often bad) attempts to transcribe the “handwritten text” portions. I didn’t appreciate some implications of that – I’ll get back to it in a bit.
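If you need to do the same kind of chapter-splitting on your own OCR output, here’s one possible approach, sketched under a big assumption: that each chapter begins with a heading on its own line, like “Chapter 3” or “Chapitre 3”. That won’t hold for every book or every OCR job, so treat the CHAPTER_HEADING pattern as something to adapt. split_into_chapters and write_chapters are hypothetical helper names. The zero-padded numbers in the output filenames (01, 02, and so on) mirror the file names in the notebook output further down, and keep chapter 10 from sorting before chapter 2.

```python
import re
import tempfile
from pathlib import Path

# Assumption: every chapter starts with a heading on its own line, such as
# "Chapter 3" or "Chapitre 3". Adapt this pattern to your own OCR output.
CHAPTER_HEADING = re.compile(r"^(?:Chapter|Chapitre)[ \t]+\d+[ \t]*$", re.MULTILINE)

def split_into_chapters(text):
    """Split a book into chapter strings, each starting with its heading.

    Anything before the first heading (title page, etc.) is dropped.
    """
    starts = [match.start() for match in CHAPTER_HEADING.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[start:end].strip() for start, end in zip(starts, ends)]

def write_chapters(chapters, outdir, stem):
    """Write chapters to outdir as stem_01.txt, stem_02.txt, ... (UTF-8)."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, chapter in enumerate(chapters, start=1):
        # Zero-padding keeps chapter 10 from sorting before chapter 2
        path = outdir / f"{stem}_{number:02d}.txt"
        path.write_text(chapter, encoding="utf-8")
        paths.append(path)
    return paths

# Example with a made-up two-chapter "book"
book = "Front matter\nChapitre 1\nJustine danse.\nChapitre 2\nMarjorie arrive.\n"
chapters = split_into_chapters(book)
with tempfile.TemporaryDirectory() as tmp:
    demo_names = [path.name for path in write_chapters(chapters, tmp, "example")]

print(chapters)    # ['Chapitre 1\nJustine danse.', 'Chapitre 2\nMarjorie arrive.']
print(demo_names)  # ['example_01.txt', 'example_02.txt']
```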

Getting started with spaCy

SpaCy is run via Python, so it can seem a little intimidating if you’ve never worked with a programming language before. For this mystery, I set up a Jupyter notebook in the Data-Sitters Club GitHub repo that you can download and use for your own texts. (If you’re not familiar with Jupyter notebooks, here’s a Programming Historian tutorial.)

You can’t run the exact same experiment I did without access to the same texts I have (which I can’t share for copyright reasons), but the Jupyter notebook on GitHub has all the output I got running it on the Baby-Sitters Club corpus, so you can see the results of the process one step at a time.

1. Downloading spaCy models

The first step is to download the spaCy models for French and English, which have been pre-trained on annotated French and English corpora, respectively. You only have to run the code cells below the first time you run the notebook; after that, you can skip right to step 2 and carry on from there. (If you run them again later, nothing bad will happen; pip will just download the model again and notice that it’s already installed.) Once the models are installed, you can also use spaCy in other notebooks on your computer without repeating this step.

#Imports sys, so the install commands below use the same Python environment that's running this notebook
import sys
#Installs the French spaCy model
!{sys.executable} -m pip install https://github.com/explosion/spacy-models/releases/download/fr_core_news_sm-2.2.0/fr_core_news_sm-2.2.0.tar.gz
Collecting https://github.com/explosion/spacy-models/releases/download/fr_core_news_sm-2.2.0/fr_core_news_sm-2.2.0.tar.gz
  Downloading https://github.com/explosion/spacy-models/releases/download/fr_core_news_sm-2.2.0/fr_core_news_sm-2.2.0.tar.gz (14.7MB)
    100% |████████████████████████████████| 14.7MB 2.0MB/s
Requirement already satisfied (use --upgrade to upgrade): fr-core-news-sm==2.2.0 from https://github.com/explosion/spacy-models/releases/download/fr_core_news_sm-2.2.0/fr_core_news_sm-2.2.0.tar.gz in ./anaconda3/lib/python3.6/site-packages
Requirement already satisfied: spacy>=2.2.0 in ./anaconda3/lib/python3.6/site-packages (from fr-core-news-sm==2.2.0) (2.2.3)
Requirement already satisfied: catalogue<1.1.0,>=0.0.7 in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->fr-core-news-sm==2.2.0) (1.0.0)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->fr-core-news-sm==2.2.0) (3.0.2)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->fr-core-news-sm==2.2.0) (2.0.2)
Requirement already satisfied: thinc<7.4.0,>=7.3.0 in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->fr-core-news-sm==2.2.0) (7.3.1)
Requirement already satisfied: numpy>=1.15.0 in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->fr-core-news-sm==2.2.0) (1.16.2)
Requirement already satisfied: setuptools in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->fr-core-news-sm==2.2.0) (41.0.0)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->fr-core-news-sm==2.2.0) (2.21.0)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->fr-core-news-sm==2.2.0) (1.0.2)
Requirement already satisfied: wasabi<1.1.0,>=0.4.0 in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->fr-core-news-sm==2.2.0) (0.6.0)
Requirement already satisfied: plac<1.2.0,>=0.9.6 in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->fr-core-news-sm==2.2.0) (0.9.6)
Requirement already satisfied: srsly<1.1.0,>=0.1.0 in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->fr-core-news-sm==2.2.0) (1.0.1)
Requirement already satisfied: blis<0.5.0,>=0.4.0 in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->fr-core-news-sm==2.2.0) (0.4.1)
Requirement already satisfied: importlib-metadata>=0.20; python_version < "3.8" in ./anaconda3/lib/python3.6/site-packages (from catalogue<1.1.0,>=0.0.7->spacy>=2.2.0->fr-core-news-sm==2.2.0) (1.5.0)
Requirement already satisfied: tqdm<5.0.0,>=4.10.0 in ./anaconda3/lib/python3.6/site-packages (from thinc<7.4.0,>=7.3.0->spacy>=2.2.0->fr-core-news-sm==2.2.0) (4.31.1)
Requirement already satisfied: certifi>=2017.4.17 in ./anaconda3/lib/python3.6/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.0->fr-core-news-sm==2.2.0) (2019.9.11)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in ./anaconda3/lib/python3.6/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.0->fr-core-news-sm==2.2.0) (3.0.4)
Requirement already satisfied: idna<2.9,>=2.5 in ./anaconda3/lib/python3.6/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.0->fr-core-news-sm==2.2.0) (2.8)
Requirement already satisfied: urllib3<1.25,>=1.21.1 in ./anaconda3/lib/python3.6/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.0->fr-core-news-sm==2.2.0) (1.24.1)
Requirement already satisfied: zipp>=0.5 in ./anaconda3/lib/python3.6/site-packages (from importlib-metadata>=0.20; python_version < "3.8"->catalogue<1.1.0,>=0.0.7->spacy>=2.2.0->fr-core-news-sm==2.2.0) (3.0.0)
Building wheels for collected packages: fr-core-news-sm
  Building wheel for fr-core-news-sm (setup.py) ... done
  Stored in directory: /Users/qad/Library/Caches/pip/wheels/1e/5f/4c/b196e2768830b7636db9b6509af16e2bffc0da98b0725421dd
Successfully built fr-core-news-sm
#Installs the English spaCy model
!{sys.executable} -m pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.0/en_core_web_sm-2.2.0.tar.gz
Collecting https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.0/en_core_web_sm-2.2.0.tar.gz
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.0/en_core_web_sm-2.2.0.tar.gz (12.0MB)
    100% |████████████████████████████████| 12.0MB 2.5MB/s
Requirement already satisfied (use --upgrade to upgrade): en-core-web-sm==2.2.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.0/en_core_web_sm-2.2.0.tar.gz in ./anaconda3/lib/python3.6/site-packages
Requirement already satisfied: spacy>=2.2.0 in ./anaconda3/lib/python3.6/site-packages (from en-core-web-sm==2.2.0) (2.2.3)
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->en-core-web-sm==2.2.0) (3.0.2)
Requirement already satisfied: thinc<7.4.0,>=7.3.0 in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->en-core-web-sm==2.2.0) (7.3.1)
Requirement already satisfied: blis<0.5.0,>=0.4.0 in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->en-core-web-sm==2.2.0) (0.4.1)
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->en-core-web-sm==2.2.0) (1.0.2)
Requirement already satisfied: numpy>=1.15.0 in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->en-core-web-sm==2.2.0) (1.16.2)
Requirement already satisfied: catalogue<1.1.0,>=0.0.7 in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->en-core-web-sm==2.2.0) (1.0.0)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->en-core-web-sm==2.2.0) (2.21.0)
Requirement already satisfied: srsly<1.1.0,>=0.1.0 in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->en-core-web-sm==2.2.0) (1.0.1)
Requirement already satisfied: setuptools in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->en-core-web-sm==2.2.0) (41.0.0)
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->en-core-web-sm==2.2.0) (2.0.2)
Requirement already satisfied: wasabi<1.1.0,>=0.4.0 in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->en-core-web-sm==2.2.0) (0.6.0)
Requirement already satisfied: plac<1.2.0,>=0.9.6 in ./anaconda3/lib/python3.6/site-packages (from spacy>=2.2.0->en-core-web-sm==2.2.0) (0.9.6)
Requirement already satisfied: tqdm<5.0.0,>=4.10.0 in ./anaconda3/lib/python3.6/site-packages (from thinc<7.4.0,>=7.3.0->spacy>=2.2.0->en-core-web-sm==2.2.0) (4.31.1)
Requirement already satisfied: importlib-metadata>=0.20; python_version < "3.8" in ./anaconda3/lib/python3.6/site-packages (from catalogue<1.1.0,>=0.0.7->spacy>=2.2.0->en-core-web-sm==2.2.0) (1.5.0)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in ./anaconda3/lib/python3.6/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.0->en-core-web-sm==2.2.0) (3.0.4)
Requirement already satisfied: urllib3<1.25,>=1.21.1 in ./anaconda3/lib/python3.6/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.0->en-core-web-sm==2.2.0) (1.24.1)
Requirement already satisfied: certifi>=2017.4.17 in ./anaconda3/lib/python3.6/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.0->en-core-web-sm==2.2.0) (2019.9.11)
Requirement already satisfied: idna<2.9,>=2.5 in ./anaconda3/lib/python3.6/site-packages (from requests<3.0.0,>=2.13.0->spacy>=2.2.0->en-core-web-sm==2.2.0) (2.8)
Requirement already satisfied: zipp>=0.5 in ./anaconda3/lib/python3.6/site-packages (from importlib-metadata>=0.20; python_version < "3.8"->catalogue<1.1.0,>=0.0.7->spacy>=2.2.0->en-core-web-sm==2.2.0) (3.0.0)
Building wheels for collected packages: en-core-web-sm
  Building wheel for en-core-web-sm (setup.py) ... done
  Stored in directory: /Users/qad/Library/Caches/pip/wheels/48/5c/1c/15f9d02afc8221a668d2172446dd8467b20cdb9aef80a172a4
Successfully built en-core-web-sm
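Because spaCy models installed with pip become ordinary Python packages (that’s why step 2 can simply import fr_core_news_sm and en_core_web_sm), you can check whether they’re already installed before running the download cells. Here’s a minimal sketch using the standard library’s importlib; model_is_installed is a made-up helper name, not part of spaCy. Note that it only checks that some version of each package can be imported, not that the version matches your installed spaCy.

```python
import importlib.util

def model_is_installed(package_name):
    """True if `package_name` can be imported in the current Python environment."""
    return importlib.util.find_spec(package_name) is not None

for model in ("fr_core_news_sm", "en_core_web_sm"):
    if model_is_installed(model):
        print(model, "is already installed; you can skip its download cell")
    else:
        print(model, "isn't installed yet; run its download cell")
```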

2. Importing spaCy and setting up NLP

Run the code cell below to import the spaCy module and set up two functions: one that loads the French model and runs its NLP pipeline (which includes named-entity recognition), and one that does the same for English.

#Imports spaCy
import spacy
#Imports the French model
import fr_core_news_sm
#Sets up a function so you can run the French model on texts
frnlp = fr_core_news_sm.load()
#Imports the English model
import en_core_web_sm
#Sets up a function so you can run the English model on texts
ennlp = en_core_web_sm.load()

3. Importing other modules

There are a few other modules that will be useful in this notebook. The code comments explain what each one is for, and this code cell imports all of them.

#io is used for opening and writing files
import io
#glob is used to find all the pathnames matching a specified pattern (here, all text files)
import glob
#os is used to navigate your folder directories (e.g. change folders to where your files are stored)
import os

4. Directory setup

Assuming you’re running Jupyter Notebook from your computer’s home directory, this code cell gives you the opportunity to change into the directory where you’re keeping your French text files. (This notebook is designed to deal with one language at a time, and assumes your French text files are in one folder and your English ones are in another.)

Replace /Users/qad/Documents/dsc/dscm2 with the full path to the directory with your files.

For instance, the default path to the Documents directory is (substituting your user name on the computer for YOUR-USER-NAME):

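  • On Mac: '/Users/YOUR-USER-NAME/Documents'

  • On Windows: r'C:\Users\YOUR-USER-NAME\Documents' (keep the r in front: without it, Python reads \U as the start of an escape sequence and stops with an error)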

#Define the file directory here
filedirectory = '/Users/qad/Documents/dsc/dscm2'
#Change the working directory to the one you just defined
os.chdir(filedirectory)
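Changing the working directory with os.chdir works fine in a notebook, but one alternative is to keep full paths around with Python’s pathlib module instead. This is a sketch of that alternative, not what the notebook does: text_files is a hypothetical helper that lists the .txt files in a folder alphabetically, skipping hidden files (names starting with a dot) and anything that already ends in _ner_per.txt, so output from an earlier run doesn’t get fed back in as input on a second run.

```python
import tempfile
from pathlib import Path

def text_files(directory, skip_suffix="_ner_per.txt"):
    """Return the .txt files in `directory`, sorted by name.

    Hidden files (names starting with ".") and files ending in `skip_suffix`
    (output from an earlier run) are left out.
    """
    return sorted(
        path for path in Path(directory).iterdir()
        if path.is_file()
        and path.suffix == ".txt"
        and not path.name.startswith(".")
        and not path.name.endswith(skip_suffix)
    )

# Example with a throwaway folder full of decoys
with tempfile.TemporaryDirectory() as tmp:
    for name in ["016_02_bg.txt", "016_01_bg.txt", "016_01_bg_ner_per.txt",
                 "._016_01_bg.txt", ".DS_Store", "notes.docx"]:
        (Path(tmp) / name).write_text("x", encoding="utf-8")
    demo_names = [path.name for path in text_files(tmp)]

print(demo_names)  # ['016_01_bg.txt', '016_02_bg.txt']
```

With this approach, the loop in step 5 could iterate over text_files(filedirectory) and open each full path directly, with no os.chdir needed.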

Running spaCy

5. French NER, first try

The code cell in step 5 in the Jupyter notebook iterates through the files in the folder you specified up in step 4, after sorting them alphabetically. For every file that ends in .txt (an important limitation – you’ll get an error if you try to have Python open a file that isn’t a text file, including those pesky invisible .DS_Store files in just about every Mac folder), the code defines an output file name by replacing the ‘.txt’ at the end of the input filename with ‘_ner_per.txt’.

Opening the input file (i.e. each file in turn, one at a time) and the newly-created, empty output file, the code reads in the text of the input file and runs the spaCy French NLP on it. Then, for every entity spaCy recognizes (which can be a single word or a longer span of text), as long as it’s labeled ‘PER’ (a person), the entity is written to the screen (with a print command) and to the output file. I thought it’d be easiest to work through the entities one type at a time, starting just with the character names.
While writing this code, I pulled up a couple of previous notebooks I’d written that did similar things, and consulted the spaCy documentation and examples for how to display the entities.

#Sort all the files in the directory you specified above, alphabetically.
#For each of those files...
for filename in sorted(os.listdir(filedirectory)):
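    #If the filename ends with .txt (i.e. if it's actually a text file), and isn't an output file from an earlier run
    if filename.endswith('.txt') and not filename.endswith('_ner_per.txt'):
        #Write out below the name of the file
        print(filename)
        #The file name of the output file adds _ner_per to the end of the file name of the input file
        outfilename = filename.replace('.txt', '_ner_per.txt')
        #Open the input filename (as UTF-8, so accented characters are read correctly on any operating system)
        with open(filename, 'r', encoding='utf-8') as f:
            #Create and open the output filename (also as UTF-8)
            with open(outfilename, 'w', encoding='utf-8') as out: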
                #Read the contents of the input file
                chaptertext = f.read()
                #Do French NLP on the contents of the input file
                chapterner = frnlp(chaptertext)
                #For each recognized entity
                for ent in chapterner.ents:
                    #If that entity is labeled as a person
                    if ent.label_ == 'PER':
                        #Print the entity, and the label (which should be PER)
                        print(ent.text, ent.label_)
                        #Write the entity to the output file
                        out.write(ent.text)
                        #Write a newline character to the output file
                        out.write('\n')
            
016_01_bg.txt
J’ PER
J’ PER
j’ai PER
Roseline PER
Victor surnommé Gringalet PER
Justine Victoire PER
J’ PER
Gringalet PER
Johanna PER
j’ai PER
Marjorie Levêque PER
Ouagadougou PER
j’ PER
papa PER
j’entrai PER
Roseline PER
j’ PER
j’ PER
m’ PER
Justine PER
’’ PER
j’aimerais PER
j’espérais PER
j’ PER
Coppélius PER
Coppélia PER
Roseline PER
Franz PER
Swanilda PER
Franz PER
Swanilda PER
Coppélia PER
Coppé PER
Swanilda PER
Franz PER
”
“ PER
madame Dillon PER
Roseline PER
“Maman PER
”
“ PER
Roseline PER
016_02_bg.txt
Marjorie Levêque PER
Marjorie PER
Valérie PER
Marjorie PER
Valérie PER
Sébastien PER
Madame Demoulin PER
Valérie PER
Mélanie Moreau PER
Julie Kishi PER
Sophie Lambert PER
Carole Leroy PER
Valérie PER
Valérie PER
Valérie PER
Stéphane PER
Sébastien PER
Valérie PER
Yvan Arnould PER
Arnaud PER
Valérie PER
Julie Kishi PER
Julie PER
Yvan PER
Demoulin PER
d’Yvan PER
Yvan PER
Valérie PER
Valérie PER
Julie PER
Julie Kishi PER
Julie PER
Marjorie PER
Julie PER
Marjorie PER
Julie PER
Julie PER
Julie PER
Laurence PER
Julie PER
Mimi PER
Carole PER
Julie PER
Valérie PER
Moreau PER
Sophie Lambert PER
Sophie PER
Carole PER
Carole PER
Valérie PER
Carole PER
Marjorie PER
mange Julie PER
David PER
David PER
Marjorie PER
Marjorie PER
Marjorie PER
Bruno Lejeune PER
Cécile Gauthier PER
Valérie PER
Carole PER
Marjorie PER
Julie PER
Carole PER
Julie PER
Marjorie PER
D’ PER
madame Brinbeuf PER
Agnès PER
Mathieu PER
Brinbeuf PER
madame Brinbeuf PER
Mathieu PER
Carole PER
Mélanie PER
Julie PER
Valérie PER
Marjorie PER
Marjorie PER
Marjorie PER
“Hé PER
Justine PER
j’ PER
016_03_bg.txt
Jacqué PER
madame Dillon PER
madame Dillon PER
Coppélia PER
j’ai PER
j’ai dansé PER
Madame Dillon PER
Hélène PER
Catherine PER
Hélène PER
Catherine PER
madame Dillon PER
Hélène PER
Catherine PER
Madame Dillon PER
Catherine PER
Madame Dillon PER
j’espérais PER
Hélène PER
Catherine PER
madame Dillon PER
Madame Dillon PER
Marie Bernstein PER
Lise Jacqué PER
Carine Schmitt PER
Catherine PER
Coppélia PER
Madame Dillon PER
Hélène PER
Catherine PER
Coppélia PER
Coppélia PER
madame Dillon PER
Swanilda PER
Justine Victor” PER
madame Dillon PER
Justine Victor PER
Justine Victoire PER
Justine PER
Justine PER
J’ PER
C’ PER
madame Dillon PER
Swanilda PER
Franz PER
Justine PER
Marie Bernstein PER
Lise Jacqué PER
Justine PER
Marie PER
Lise PER
C’ PER
Coppélia PER
Catherine PER
Catherine PER
” “ PER
j’entendis PER
Hélène PER
madame Dillon PER
madame Dillon PER
j’ PER
016_04_bg.txt
madame Dillon PER
”
“ PER
madame Dillon PER
Hélène PER
Catherine PER
C’ PER
Mathieu Brinbeuf PER
Mathieu PER
Agnès PER
madame Brinbeuf PER
C’ PER
madame Brinbeuf PER
Mathieu PER
Agnès PER
Agnès PER
Mathieu PER
Brinbeuf PER
Agnès PER
Justine PER
madame Brinbeuf PER
Mathieu PER
Mathieu n’ PER
”
“ PER
Mathieu PER
Mathieu grandira PER
madame Brinbeuf PER
Mathieu PER
madame Brinbeuf PER
Mathieu PER
Mathieu PER
madame Brinbeuf PER
Mathieu PER
d’Agnès PER
Madame Brinbeuf PER
Mathieu PER
Mathieu PER
Madame Brinbeuf PER
Mathieu PER
“Mathieu PER
Justine PER
”
Mathieu PER
Madame Brinbeuf PER
Mathieu PER
Mathieu PER
Madame Brinbeuf PER
Agnès d’ PER
Mathieu PER
Agnès PER
Mathieu PER
Brinbeuf PER
Mathieu s’ PER
Agnès PER
Mathieu PER
Agnès PER
Agnès PER
Mathieu PER
Agnès PER
madame Brinbeuf PER
Madame Brinbeuf PER
j’avais PER
j’ PER
016_05_bg.txt
Justine PER
Mélanie Moreau PER
Aurélie Precisio PER
Madame Precisio PER
Roseline PER
Aurélie n’ PER
Madame Precisio PER
Aurélie s’ PER
ta maman PER
Mathieu PER
Agnès PER
J’ PER
Brinbeuf PER
Brinbeuf m’ PER
Mathieu PER
Agnès PER
Agnès PER
Mathieu PER
Mathieu PER
Agnès PER
Mathieu s’exécuta PER
Agnès PER
Mathieu PER
Agnès PER
“ PER
”
“ PER
Agnès PER
Justine PER
Agnès PER
Mathieu PER
Mathieu PER
Agnès PER
Agnès PER
Mathieu PER
Agnès PER
Mathieu PER
016_06_bg.txt
j’ai PER
Agnès PER
Mathieu PER
Coppélia PER
J’ PER
Mathieu n’ PER
Agnès n’ PER
Agnès PER
Mathieu n’ PER
madame
Brinbeuf PER
j’arrivai PER
madame Brinbeuf PER
Mathieu n’ PER
”
“ PER
Madame Brinbeuf m’ PER
Brinbeuf PER
Brinbeuf PER
Mathieu PER
Agnès PER
Madame Brinbeuf PER
Mathieu PER
Mathieu PER
Agnès PER
Justine PER
”
“ PER
Madame Brinbeuf PER
Agnès PER
Mathieu PER
madame Brinbeuf PER
Mathieu PER
Mathieu PER
Brinbeuf n’ PER
Mathieu PER
Mathieu PER
Mathieu PER
“P” PER
Agnès PER
Aurélie Precisio PER
Agnès PER
Mathieu PER
Roseline PER
Charlotte Cuvelier PER
Roseline PER
Mathieu PER
Agnès PER
Agnès PER
Agnès d’ PER
Mathieu PER
Agnès PER
Agnès PER
Mathieu PER
Agnès PER
Mathieu PER
Agnès PER
Mathieu PER
Mathieu s’ PER
Mathieu PER
Marjorie PER
Antoine PER
Agnès PER
Mathieu PER
C’ PER
Levêque PER
Agnès PER
Marjorie PER
Marjorie PER
Mathieu n’ PER
“Vraiment PER
Anaïs PER
Laurent PER
Agnès PER
Juliette PER
Mathieu PER
Juliette PER
Agnès PER
Juliette PER
Agnès PER
Antoine Godefroid PER
Agnès n’ PER
Mathieu PER
Antoine PER
Mathieu PER
madame Brinbeuf PER
Agnès PER
Mathieu PER
Mathieu PER
Agnès s’ PER
016_07_bg.txt
Justine PER
Marjorie PER
Justine PER
Monsieur PER
madame Levêque PER
Marjorie PER
Carole PER
Alain PER
Loïc PER
Samuel PER
Levêque PER
Levêque PER
Levêque PER
Marjorie PER
Vanessa PER
Anaïs PER
Juliette PER
Marjorie PER
Carole PER
Alain PER
Loïc PER
Alain PER
Marjorie PER
Loïc PER
Samuel PER
Carole PER
Marjorie PER
Marjorie PER
Alain PER
Carole PER
Marjorie PER
Carole n’ PER
Laurent PER
Marjorie PER
Carole PER
Anaïs PER
Carole PER
Marjorie PER
Anaïs PER
Vanessa PER
Anaïs PER
Anaïs PER
Carole PER
Laurent PER
Carole PER
Marjorie PER
Laurent PER
Carole PER
Marjorie PER
Carole PER
Carole PER
Vanessa PER
Laurent PER
”
“ PER
lapin’ PER
Samuel PER
Juliette PER
Carole PER
Marjorie PER
Anaïs PER
Vanessa PER
Loïc PER
Marjorie PER
Carole PER
Marjorie PER
Carole PER
Marjorie PER
Agnès PER
Mathieu PER
Marjorie PER
Mathieu PER
Carole PER
Marjorie PER
C’ PER
C’ PER
Swanilda PER
Mathieu PER
J’ PER
Agnès PER
Agnès PER
016_08_bg.txt
madame Dillon PER
OK PER
Mademoiselle Bersnstein PER
Victor PER
Swanilda PER
C’ PER
Prochaine PER
madame Dillon PER
Catherine PER
Catherine PER
Hélène PER
Catherine PER
J’ PER
Catherine PER
Catherine n’ PER
d’Agnès PER
Catherine PER
Roseline PER
Catherine PER
Adeline” PER
Catherine PER
Adeline PER
“J’ PER
madame Dillon PER
j’ PER
Catherine PER
Adeline PER
Catherine PER
Catherine PER
Roseline PER
Adeline PER
Adeline PER
Catherine PER
Adeline PER
Catherine PER
Catherine PER
Mathieu PER
Mathieu PER
Adeline PER
Mathieu PER
Adeline PER
Catherine PER
Catherine PER
“Comment PER
Adeline PER
Catherine PER
Adeline PER
Catherine PER
d’Adeline PER
Adeline PER
Catherine PER
Johanna PER
Marjorie PER
J’ PER
Justine PER
Coraline PER
Arnaud PER
Sébastien PER
Valérie PER
Valérie PER
Stéphane PER
Nicolas PER
016_09_bg.txt
Justine PER
Coralie Arnaud PER
Sébastien PER
Valérie PER
Valérie PER
Stéphane PER
Nicolas PER
Coraline PER
Mathieu PER
J’ PER
Julie PER
Julie PER
Valérie PER
Arnaud PER
Julie PER
Julie PER
Valérie PER
Julie l’ PER
Julie PER
d’Yvan PER
Arnaud PER
madame Rensonnet PER
Benoît PER
Julie PER
Arnaud PER
Sébastien PER
Valérie PER
Stéphane PER
Valérie PER
Julie PER
Valérie PER
Stéphane PER
Julie PER
Julie PER
Julie PER
C’ PER
Valérie PER
Arnaud PER
Monsieur Arnaud PER
”
“Oh PER
Arnaud PER
madame Arnaud PER
“ PER
”
“Donc PER
La mère de PER
Valérie soupira PER
Julie PER
Arnaud PER
Sébastien PER
Julie PER
Sébastien PER
Arnaud PER
Julie PER
Arnaud PER
Julie PER
Arnaud PER
Sébastien PER
Bonjour PER
Julie PER
Arnaud PER
Julie s’ PER
Julie PER
Julie PER
Julie n’ PER
Julie PER
Arnaud PER
N’ PER
Julie PER
Arnaud PER
Sébastien PER
j’avais PER
Julie PER
Julie PER
Justine Victoire PER
Julie PER
Roseline PER
Julie m’ PER
“Attends PER
Julie PER
Marjorie PER
“J’ PER
Raconter PER
” J’ PER
Julie PER
Julie PER
Julie PER
Arnaud PER
Arnaud PER
Arnaud PER
Julie PER
Julie PER
Arnaud PER
ton prénom PER
Julie PER
”
“ PER
Arnaud PER
”
Julie PER
Flocon PER
Julie PER
Mademoiselle Minet PER
Julie PER
Julie PER
Coralie PER
Doudou PER
Julie PER
Julie PER
Justine PER
Arnaud PER
Sébastien PER
016_10_bg.txt
Mathieu PER
Agnès PER
madame Brinbeuf PER
qu’Agnès PER
madame Brinbeuf PER
Agnès PER
Mathieu PER
Aurélie Precisio PER
Mathieu PER
Vanessa Levêque PER
C’ PER
Vanessa PER
Agnès PER
Laurent PER
Mathieu PER
Antoine Godefroid PER
Agnès PER
Mathieu PER
Antoine Godefroid PER
Mathieu PER
Vanessa PER
Laurent PER
Agnès PER
Agnès PER
Laurent PER
Mathieu PER
Mathieu PER
Loïc PER
Agnès PER
Mathieu PER
”
“ PER
” “ PER
Agnès PER
Mathieu” PER
”
“ PER
” “ PER
Mathieu PER
Mathieu PER
Agnès PER
Gringalet PER
“ PER
Mathieu n’ PER
Agnès PER
Mathieu PER
Mathieu PER
Laurent PER
Alain PER
Justine PER
”
“ PER
j’aimerais PER
Agnès PER
Roseline PER
Roseline PER
J’ PER
Roseline PER
”
“ PER
Mathieu PER
J’ PER
Roseline PER
Agnès PER
Mathieu PER
Mathieu PER
Agnès PER
Mathieu PER
j’avais PER
Agnès PER
Agnès PER
Adeline PER
madame Dillon PER
madame Dillon PER
Coppélia PER
C’ PER
Agnès PER
madame Dillon PER
016_11_bg.txt
madame Dillon PER
madame Brinbeuf PER
j’arrivai PER
Julie PER
Mes PER
madame Dillon PER
Valérie PER
Marjorie PER
Julie PER
Valérie PER
Julie PER
Carole PER
Julie PER
Marjorie PER
Marjorie m’ PER
Julie PER
Julie PER
Julie PER
Marjorie PER
Julie PER
Julie PER
Valérie PER
Carole PER
Carole PER
Julie PER
Valérie PER
Valérie PER
Justine PER
Mathieu PER
Mathieu PER
Agnès PER
Agnès PER
Mathieu PER
“ PER
“Vraiment PER
Julie PER
Brinbeuf PER
L’ PER
Carole PER
Bruno! PER
“Bruno PER
Justine PER
Julie PER
Carole PER
Carole PER
Gringalet PER
Bruno” PER
Carole PER
Carole PER
Marjorie PER
Marjorie PER
Bonjour PER
Brinbeuf PER
Marjorie PER
Mathieu PER
Coppélia PER
“L’ PER
016_12_bg.txt
madame Brinbeuf PER
Madame Brinbeuf PER
Brinbeuf PER
Mathieu PER
L’ PER
j’ PER
madame Brinbeuf PER
Mathieu PER
Mathieu PER
Madame Brinbeuf PER
Justine PER
”
“ PER
Mathieu PER
Madame Brinbeuf PER
Mathieu PER
Brinbeuf” PER
Justine PER
madame Franck PER
Mathieu PER
madame Brinbeuf PER
Brinbeuf m’ PER
madame Franck PER
Mathieu PER
Madame Franck PER
Madame Brinbeuf PER
madame Franck PER
madame Franck PER
Madame Franck s’ PER
madame Franck PER
Justine Victoire PER
E V PER
E.
Mathieu PER
madame Franck PER
Madame Franck PER
“Justine PER
Mathieu Brinbeuf PER
“J’ PER
madame Franck PER
Coppélia PER
madame Brinbeuf m’ PER
Coppélia PER
Voulez PER
Mathieu PER
Madame Franck PER
Coppélia PER
Mathieu PER
madame Franck PER
“ PER
Madame Franck PER
madame Franck PER
madame Brinbeuf PER
Mathieu PER
Mathieu PER
Mathieu m’ PER
Mathieu PER
016_13_bg.txt
Mercredi
Bonjour PER
Justine PER
Gringalet PER
Roseline PER
Gringalet PER
Justine PER
Roseline PER
Charlotte Cuvelier PER
Charlotte PER
Valérie PER
Sophie Lambert PER
Gringalet PER
Roseline PER
Roseline PER
Roseline PER
“J’ PER
Gringalet PER
“Viens PER
” “Ouais PER
’’ PER
Valérie PER
Gringalet PER
Roseline PER
Viens PER
Gringalet” PER
Roseline PER
Valérie PER
Roseline PER
D’ PER
Mathieu PER
Ma mère PER
jean PER
Valérie PER
Roseline PER
Charlotte PER
Charlotte PER
Roseline PER
Charlotte PER
Sophie Lambert PER
Sophie PER
Roseline PER
Sophie PER
Charlotte PER
Charlotte PER
Roseline PER
Charlotte PER
Charlotte PER
Charlotte PER
Charlotte PER
Charlotte PER
Charlotte PER
Charlotte PER
Justine PER
Roseline PER
Charlotte PER
Justine PER
Justine PER
” “Quel grand spectacle PER
Justine PER
Roseline PER
Charlotte PER
Gringalet PER
Valérie PER
Charlotte PER
Justine PER
Valérie PER
Charlotte PER
Roseline PER
Charlotte PER
j’ PER
madame Brinbeuf PER
Valérie Demoulin PER
016_14_bg.txt
Coppélia PER
madame Brinbeuf PER
madame Dillon PER
madame Franck PER
Mathieu PER
C’ PER
Roseline PER
Charlotte PER
j’avais PER
Valérie PER
Mathieu PER
j’avais PER
Julie PER
Marjorie PER
Bruno Lejeune PER
Brinbeuf PER
Mathieu PER
Brinbeuf PER
Agnès PER
J’ PER
Coppélia PER
Coppélius l’ PER
Swanilda PER
J’ PER
madame Dillon PER
madame Dillon PER
Justine PER
Brinbeuf PER
Agnès PER
d’Agnès PER
Agnès PER
madame Brinbeuf PER
madame Brinbeuf PER
Caroline Brinbeuf PER
Agnès PER
Agnès PER
Coppélia PER
Agnès PER
j’ PER
Mathieu PER
Swanilda PER
madame Dillon PER
madame Dillon PER
“C’ PER
Swanilda” PER
Catherine PER
J’ PER
Mathieu PER
Catherine PER
Mathieu PER
Adeline PER
Catherine PER
j’entendis Agnès PER
“L’ PER
Franz PER
Swanilda PER
madame Brinbeuf PER
Coppelius PER
Franz PER
Swanilda PER
Coppélia PER
Swanilda PER
Coppélius PER
Franz PER
Swanilda PER
Agnès PER
madame Brinbeuf PER
Swanilda PER
Swanilda PER
madame Brinbeuf
 PER
Agnès PER
J’ PER
Swanilda PER
Christophe Gélin PER
Franz PER
Mathieu PER
Mathieu PER
Mathieu PER
Adeline PER
Catherine PER
Catherine PER
Catherine PER
j’avais PER
Adeline PER
Christophe PER
Catherine PER
Mathieu PER
Adeline PER
016_15_bg.txt
Catherine PER
Adeline PER
Adeline PER
Catherine PER
Justine PER
Catherine PER
Catherine PER
“Dis PER
Justine PER
Roseline PER
Agnès PER
Valérie PER
Julie PER
Carole PER
Mélanie PER
Mathieu PER
Marjorie PER
Marjorie PER
Justine PER
”
Je devais rêver PER
Johanna PER
Johanna PER
Marjorie PER
Marjorie PER
Marjorie PER
Johanna PER
Johanna” PER
Catherine PER
Adeline l’ PER
Johanna PER
Mathieu PER
madame Brinbeuf PER
d’Agnès PER
Catherine PER
Johanna PER
Marjorie PER
Es PER
Justine PER
Marjorie PER
Marjorie PER
Johanna PER
Johanna PER
Johanna PER
Johanna PER
Justine PER
J’ PER
Johanna PER
Marjorie PER
Mathieu PER
Marjorie PER
Johanna PER
j’ PER
”
“ PER
Catherine PER
Adeline PER
Johanna PER
Adeline PER
Mathieu PER
Adeline PER
Catherine PER
Catherine m’ PER
Johanna PER
Justine PER
madame Dillon PER
Swanilda PER
J’ PER
Swanilda PER
Franz PER
Catherine PER
Swanilda PER
”
“ PER
Catherine PER
sainte nitouche PER
”
Catherine PER
Adeline PER
Catherine PER
Mathieu PER
“Justine PER
Roseline PER
Johanna PER
Marjorie PER
Valérie PER
Julie PER
madame Brinbeuf PER
Mathieu PER
Agnès PER
Brinbeuf PER
Justine PER
Julie PER
Mathieu PER
Agnès PER
Mathieu PER
Mathieu PER
Agnès PER
Mathieu PER
Mathieu PER

And it worked… mostly? It was super weird that J’ (as in J’ai) kept getting listed, but I wasn’t too worried. A quirk of the model, plus the source text! Probably the model wasn’t trained on first-person narratives like The Baby-Sitters Club. Yeah, there was also an example of C’ that was harder to explain, but it wasn’t until I saw a curly double-quote character (“) identified as an entity that I started getting suspicious. Could those curly quotes be messing things up somehow?
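One way to test that hunch before changing anything is to count the curly quote characters in each file. This is a minimal sketch, not part of the original workflow. It assumes the chapter files are UTF-8 .txt files in a single folder, and the `count_curly_quotes` function name is hypothetical.

```python
import os
from collections import Counter

# The curly ("smart") quote characters that Word substitutes for straight quotes
CURLY = {
    "\u201c": "left double quote",
    "\u201d": "right double quote",
    "\u2018": "left single quote",
    "\u2019": "right single quote / apostrophe",
}


def count_curly_quotes(folder):
    """Return {filename: Counter of curly quote characters} for each .txt file in folder."""
    results = {}
    for filename in sorted(os.listdir(folder)):
        # Only look at text files, like the rest of the workflow
        if not filename.endswith(".txt"):
            continue
        with open(os.path.join(folder, filename), encoding="utf-8") as f:
            text = f.read()
        # Count only the curly quote characters, ignoring everything else
        results[filename] = Counter(ch for ch in text if ch in CURLY)
    return results
```

Calling `count_curly_quotes` on the folder of chapter files returns one `Counter` per file; any file with a nonzero count is one that the data cleaning step would change.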

6. Data cleaning

Time for some data cleaning! When Lee brought the ABBYY FineReader output .txt files into Word to correct my bad transcriptions, Word “helpfully” replaced all the regular, straight single and double quotes with their curly equivalents.

I wrote some code that opened every text file in my folder, searched for opening and closing curly quotes and replaced them with their “straight quote” equivalents (quotation marks that don’t differentiate between opening and closing). While I was at it, I noticed that some of the texts were using the curly single quote (’) rather than the straight single quote (') for the apostrophe, so I replaced those too. This code overwrites the text files in the folder (rather than creating a new version), so if you want to keep your originals, make sure you have a copy elsewhere.

import os

# filedirectory is the folder of .txt files you specified earlier
# Look for files in the source directory that end in .txt
for filename in os.listdir(filedirectory):
    if filename.endswith(".txt"):
        filepath = os.path.join(filedirectory, filename)
        # Open each file that ends in .txt and read the text
        with open(filepath, 'r', encoding='utf-8') as f:
            text = f.read()
        # Replace curly double quotes (opening and closing) with straight double quotes
        text = text.replace('“', '"').replace('”', '"')
        # Replace curly single quotes and apostrophes with straight single quotes
        text = text.replace('‘', "'").replace('’', "'")

        # Write the cleaned text back to the same filename, overwriting the original file
        with open(filepath, 'w', encoding='utf-8') as out:
            out.write(text)
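If you’d rather not overwrite the originals, here’s a minimal sketch of an alternative (not the code used for this project) that does all the substitutions in one pass with Python’s `str.maketrans` and `str.translate`, and writes cleaned copies to a separate output folder. The `clean_quotes` function name and both folder paths are placeholders.

```python
import os

# One translation table mapping each curly quote to its straight equivalent
QUOTE_TABLE = str.maketrans({
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote / apostrophe
})


def clean_quotes(source_dir, output_dir):
    """Copy every .txt file from source_dir into output_dir with straight quotes.

    The originals in source_dir are left untouched. Returns the list of
    filenames that were written.
    """
    os.makedirs(output_dir, exist_ok=True)
    written = []
    for filename in sorted(os.listdir(source_dir)):
        if not filename.endswith(".txt"):
            continue
        with open(os.path.join(source_dir, filename), encoding="utf-8") as f:
            text = f.read()
        # translate() swaps every mapped character in a single pass over the text
        with open(os.path.join(output_dir, filename), "w", encoding="utf-8") as out:
            out.write(text.translate(QUOTE_TABLE))
        written.append(filename)
    return written
```

Pointing the NER step at the output folder then leaves the Word-corrected originals available for comparison.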

7. French NER, second try

I didn’t make any changes to the code from step 5, but check out the difference in the results. Gone are those quotation marks as so-called entities, along with all the examples of j’ai, c’, etc. Those had been showing up because the texts used curly quote characters, including the curly single quote for apostrophes, and that was tripping up spaCy’s model.

import os

# frnlp is the French spaCy pipeline loaded in step 5,
# and filedirectory is the folder you specified above.
# Sort all the files in that directory alphabetically.
# For each of those files...
for filename in sorted(os.listdir(filedirectory)):
    # If the filename ends with .txt (i.e. if it's actually a text file),
    # skipping any _ner_per.txt output files left over from a previous run
    if filename.endswith('.txt') and not filename.endswith('_ner_per.txt'):
        # Write out below the name of the file
        print(filename)
        # The output file name adds _ner_per to the end of the input file name
        outfilename = filename.replace('.txt', '_ner_per.txt')
        # Open the input file
        with open(os.path.join(filedirectory, filename), 'r', encoding='utf-8') as f:
            # Create and open the output file
            with open(os.path.join(filedirectory, outfilename), 'w', encoding='utf-8') as out:
                # Read the contents of the input file
                chaptertext = f.read()
                # Do French NLP on the contents of the input file
                chapterner = frnlp(chaptertext)
                # For each recognized entity
                for ent in chapterner.ents:
                    # If that entity is labeled as a person
                    if ent.label_ == 'PER':
                        # Print the entity, and the label (which should be PER)
                        print(ent.text, ent.label_)
                        # Write the entity to the output file
                        out.write(ent.text)
                        # Write a newline character to the output file
                        out.write('\n')
016_01_bg.txt
Roseline PER
Victor surnommé Gringalet PER
Justine Victoire PER
Gringalet PER
Johanna PER
Marjorie Levêque PER
Marjorie PER
Ouagadougou PER
Roseline PER
Justine PER
Coppélia PER
Coppélia PER
Roseline PER
Coppélius PER
Coppélia PER
Franz PER
Roseline PER
Franz PER
Swanilda PER
Franz PER
Swanilda PER
Coppélia PER
Coppé PER
Swanilda PER
Franz PER
Roseline PER
madame Dillon PER
Roseline PER
Roseline PER
016_01_frmod.txt
Stonebrook PER
Rebecca PER
Becca PER
John Philip Ramsey Junior PER
P'tit Bout PER
Jessica Ramsey PER
Rebecca PER
Keisha PER
Mallory Pike PER
Mallory PER
Oakley PER
Bonjour PER
Ooh bla PER
Stamford PER
Coppélia PER
Coppélius PER
Coppélia PER
Franz PER
Rebecca PER
Franz PER
Swanilda PER
Franz PER
Swanilda PER
Coppélia PER
Coppélius PER
Swanilda PER
Franz PER
Mme Noelle PER
Maman PER
Rebecca PER
016_01_qu.txt
Becca PER
Rebecca PER
Jean PER
Philippe PER
Jessica Raymond PER
Becca PER
Jaja PER
Kara PER
Oakley PER
Marjorie Picard PER
Marjorie PER
Becca PER
Becca PER
Jaja PER
Coppélia PER
Coppélia PER
Coppélia PER
Franz PER
Franz PER
Swanilda PER
Coppélia PER
Dr Coppélius PER
Swanilda PER
Franz PER
Becca PER
Becca PER
Becca PER
016_02_bg.txt
Marjorie Levêque PER
Marjorie PER
Valérie PER
Marjorie PER
Valérie PER
Sébastien PER
Madame Demoulin PER
Valérie PER
Mélanie Moreau PER
Julie Kishi PER
Sophie Lambert PER
Sophie PER
Carole Leroy PER
Valérie PER
Valérie PER
Valérie PER
Valérie PER
Valérie PER
Stéphane PER
Nicolas PER
Sébastien PER
Valérie PER
Yvan Arnould PER
Arnaud PER
Valérie PER
Valérie PER
Julie Kishi PER
Julie PER
Yvan PER
Demoulin PER
Yvan PER
Yvan PER
Valérie PER
Valérie PER
Julie PER
Julie Kishi PER
Julie PER
Marjorie PER
Julie PER
Marjorie PER
Julie PER
Julie PER
Julie PER
Laurence PER
Julie PER
Mimi PER
Carole PER
Valérie PER
Julie PER
Valérie PER
Moreau PER
Sophie Lambert PER
Sophie PER
Carole PER
Carole PER
Valérie PER
Carole PER
Marjorie PER
mange Julie PER
David PER
David PER
Marjorie PER
Marjorie PER
Marjorie PER
Marjorie PER
Bruno Lejeune PER
Cécile Gauthier PER
Valérie PER
Carole PER
Marjorie PER
Valérie PER
Valérie PER
Julie PER
Carole PER
Julie PER
Marjorie PER
madame Brinbeuf PER
Agnès PER
Mathieu PER
Brinbeuf PER
madame Brinbeuf PER
Mathieu PER
Carole PER
Mélanie PER
Julie PER
Valérie PER
Marjorie PER
Marjorie PER
Marjorie PER
Justine PER
016_02_frmod.txt
– Salut ! PER
Sitters PER
Mallory Pike PER
Mallory PER
Kristy PER
Kristy PER
Kristy PER
Mallory PER
Mallory PER
Kristy PER
David Michael PER
Mme Parker PER
Mary Anne Cook PER
Claudia Koshi PER
Lucy MacDouglas PER
Mallory PER
Carla Schafer PER
Mary Anne PER
Carla PER
Mallory PER
Kristy PER
Kristy PER
Kristy PER
Kristy PER
Samuel PER
Charlie PER
David Michael PER
Kristy PER
Jim Lelland PER
Karen PER
Andrew PER
Kristy PER
Claudia Koshi PER
Mary Anne PER
Jim PER
Parker PER
Jim PER
Jim PER
Kristy PER
Samuel PER
Mary Anne PER
Claudia Koshi PER
Mallory PER
Kristy PER
Kristy PER
Mallory PER
Mallory PER
Mary Anne PER
Mallory PER
Claudia PER
Mimi PER
Carla PER
Mary Anne PER
Kristy PER
Kristy PER
Mary Anne PER
Mary Anne PER
Kristy PER
Mary Anne PER
Tigrou PER
M. Cook PER
Lucy MacDouglas PER
Ramsey PER
Carla Schafer PER
Kristy PER
Mary Anne PER
Mallory PER
Californienne PER
David PER
David PER
Mallory PER
Mallory PER
Mallory PER
Mary Anne ! PER
Louisa Kilbourne PER
Kristy PER
Carla PER
Mallory PER
Kristy PER
Bonjour PER
Mme Braddock PER
Helen PER
Matthew PER
Mme Braddock PER
Matthew PER
Carla PER
Mary PER
Kristy PER
Mallory PER
Mallory PER
Mallory PER
Jessica PER
016_02_qu.txt
Bonjour PER
Christine Thomas PER
Marjorie Picard PER
Christine Thomas PER
Marjorie PER
David PER
Sophie Ménard PER
Anne PER
Marie Lapierre PER
Claudia Kishi PER
Sophie PER
Marjorie PER
Diane Dubreuil PER
Anne PER
Marie PER
Diane PER
Marjorie PER
Christine PER
Charles PER
Sébastien PER
David PER
Guillaume PER
Karen PER
Anne PER
Marie PER
Guillaume PER
Guillaume PER
Charles PER
Anne PER
Marie PER
Claudia Kishi PER
Marjo PER
Marjorie PER
Claudia PER
Anne PER
Marie PER
Josée PER
Mimi PER
Marie Lapierre PER
Diane PER
Christine PER
Anne PER
Marie PER
Anne PER
Marie PER
Anne PER
Marie PER
Tigrou PER
Monsieur Lapierre PER
Sophie Ménard PER
Diane PER
Diane PER
Anne PER
Marie PER
Marjorie PER
Diane PER
Julien PER
Julien PER
Diane PER
Marjorie PER
Marjorie PER
Marjorie PER
Louis Brunet PER
Anne PER
Marie PER
Chantal Chrétien PER
Diane PER
Marjorie PER
Anne PER
Marie PER
Diane PER
Marjorie PER
Biron PER
Hélène PER
madame Biron PER
Madame Biron PER
Madame Biron PER
Diane PER
Anne PER
Marie PER
Marjorie PER
Biron PER
Marjorie PER
016_03_bg.txt
Jacqué PER
madame Dillon PER
madame Dillon PER
Coppélia PER
Madame Dillon PER
Hélène PER
Catherine PER
Hélène PER
Catherine PER
Victor PER
Mademoiselle Victor PER
madame Dillon PER
Hélène PER
Catherine PER
Madame Dillon PER
Hélène PER
Catherine PER
Madame Dillon PER
Madame Dillon PER
Hélène PER
Catherine PER
madame Dillon PER
Madame Dillon PER
Marie Bernstein PER
Lise Jacqué PER
Carine Schmitt PER
Catherine PER
Coppélia PER
Madame Dillon PER
Hélène PER
Catherine PER
Coppélia PER
Coppélia PER
madame Dillon PER
Swanilda PER
Swanilda PER
Justine Victor PER
madame Dillon PER
Justine Victor PER
Justine Victoire PER
Justine PER
Justine PER
madame Dillon PER
Swanilda PER
Franz PER
Justine PER
Marie Bernstein PER
Lise Jacqué PER
Justine PER
Marie PER
Lise PER
Hélène PER
Catherine PER
Coppélia PER
Catherine PER
Catherine PER
Coppélia PER
Hélène PER
Catherine PER
madame Dillon PER
Catherine PER
Ouais PER
madame Dillon PER
016_03_frmod.txt
Mme Noelle PER
Mme Noelle PER
Coppélia PER
Mme Noelle PER
Hilary PER
mademoiselle Romsey PER
Ramsey PER
Mme Noelle PER
Hilary PER
Katie PER
Mme Noelle PER
Hilary PER
Mme Noelle PER
Mme Noelle PER
Hilary PER
Katie PER
Mme Noelle PER
Mary Bramstedt PER
Lisa Jones PER
Carrie Steinfeld PER
Hilary PER
Katie PER
Coppélia PER
Mme Noelle PER
Hilary PER
Katie PER
Coppélia PER
Coppélia PER
Mme Noelle PER
Swanilda PER
Swanilda PER
Mlle Jessica Romsey PER
Jessica Romsey PER
Jessica Ramsey PER
Swanilda PER
Jessica PER
Mme Noelle PER
Swanilda PER
Franz PER
Jessica PER
Mary Bramstedt PER
Lisa Jones PER
Jessica PER
Mary PER
Lisa PER
Imaginez PER
Hilary PER
Coppélia PER
Katie PER
Coppélia PER
Katie PER
Hilary PER
Hilary PER
Mme Noelle PER
Ouais PER
Hilary PER
Mme Noelle PER
Coppélia PER
016_03_qu.txt
mademoiselle Marcil PER
Mademoiselle Noëlle PER
Becca PER
Mademoiselle PER
Coppélia PER
Mademoiselle Noëlle PER
Catherine PER
Élizabeth PER
Raymond PER
Mademoiselle Raymond PER
Catherine PER
Élizabeth PER
Catherine PER
Élizabeth PER
Coppélia PER
Catherine PER
Élizabeth PER
Marie Brazeau PER
Lise Jordan PER
Carole PER
Catherine PER
Élizabeth PER
Coppélia PER
Catherine PER
Elizabeth PER
Swanilda PER
Swanilda PER
Jessica Raymond PER
Jessica Raymond PER
Swanilda PER
Jessica PER
Franz PER
Marie PER
Lise PER
Jessie PER
Marie PER
Lise PER
Imaginez PER
Coppélia PER
Élizabeth PER
Élizabeth PER
Coppélia PER
Catherine PER
Élizabeth PER
Catherine PER
016_04_bg.txt
madame Dillon PER
madame Dillon PER
Hélène PER
Catherine PER
Mathieu Brinbeuf PER
Mathieu PER
Agnès PER
Bonjour PER
Justine PER
Agnès PER
madame Brinbeuf PER
Agnès PER
madame Brinbeuf PER
Mathieu PER
Mathieu PER
Agnès PER
Agnès PER
Agnès PER
Mathieu PER
Brinbeuf PER
Agnès PER
Justine PER
Agnès PER
madame Brinbeuf PER
Mathieu PER
Mathieu PER
Mathieu PER
Mathieu grandira PER
madame Brinbeuf PER
Mathieu PER
Roseline PER
madame Brinbeuf PER
Mathieu PER
Mathieu PER
madame Brinbeuf PER
Mathieu PER
Agnès PER
Madame Brinbeuf PER
Mathieu PER
Mathieu PER
Madame Brinbeuf PER
Mathieu PER
Mathieu PER
Justine PER
Mathieu PER
Madame Brinbeuf PER
Mathieu PER
Mathieu PER
Madame Brinbeuf PER
Agnès PER
Mathieu PER
Agnès PER
Mathieu PER
Brinbeuf PER
Mathieu PER
Agnès PER
Mathieu PER
Agnès PER
Agnès PER
Mathieu PER
Agnès PER
madame Brinbeuf PER
Madame Brinbeuf PER
016_04_frmod.txt
Coppélia PER
Stamford PER
Mme Noelle PER
Crois PER
Ben PER
Hilary PER
Katie PER
Matthew Braddock PER
Matthew PER
Bonjour PER
Mme Braddock PER
Matthew PER
Matthew PER
Helen PER
Mme Braddock PER
Helen PER
Helen PER
Matthew PER
Mme Braddock PER
Jessica PER
Helen PER
Matthew PER
Pourquoi PER
Parce PER
Matthew PER
Matthew grandira PER
Matthew PER
Rebecca PER
Mme Braddock PER
Matthew PER
Matthew PER
Mme Braddock PER
Matthew PER
Helen PER
Mme Braddock PER
Matthew PER
Existe PER
Matthew PER
Mme Braddock PER
Matthew PER
Jessica PER
Matthew PER
Matthew PER
Helen PER
Demande PER
Matthew PER
Matthew PER
Helen PER
Matthew PER
Helen PER
Mme Braddock PER
Mme Braddock PER
016_04_qu.txt
Catherine PER
Élizabeth PER
madame Biron PER
Biron PER
Biron PER
Marjorie PER
Gougeon PER
madame Biron PER
Biron PER
Matthieu PER
Hélène PER
Hélène PER
Biron PER
madame Biron PER
Biron PER
Madame Biron PER
Becca PER
Biron PER
Biron PER
Biron PER
Madame Biron PER
Biron PER
Hélène PER
Madame Biron PER
Hélène PER
madame Biron PER
Madame Biron PER
madame Biron PER
Madame Biron PER
016_05_bg.txt
Justine PER
Mélanie Moreau PER
Aurélie Precisio PER
Madame Precisio PER
Roseline PER
Madame Precisio PER
Mathieu PER
Agnès PER
Brinbeuf PER
Brinbeuf PER
Brinbeuf PER
Mathieu PER
Agnès PER
Agnès PER
Mathieu PER
Mathieu PER
Mathieu PER
Agnès PER
Mathieu PER
Agnès PER
Mathieu PER
Agnès PER
Agnès PER
Justine PER
Agnès PER
Mathieu PER
Mathieu PER
Agnès PER
Agnès PER
Mathieu PER
Brinbeuf PER
Agnès PER
Mathieu PER
016_05_frmod.txt
Jenny PER
Jenny PER
Mary Anne Cook PER
Jenny Prezzioso PER
Mary Anne PER
Jenny PER
Mary Anne PER
Mme Prezzioso PER
Jenny PER
Jenny PER
Jenny PER
Mme Prezzioso PER
Mary Anne PER
Jenny PER
Mary Anne PER
Désolée PER
Mary Anne PER
Jenny PER
Jenny PER
Mary Anne PER
Jenny PER
Bambi PER
Jenny PER
Mary Anne PER
Mary Anne PER
Jenny PER
Jenny PER
Jenny PER
Mary Anne PER
Matthew PER
Helen PER
Mme Braddock PER
Matthew PER
Helen PER
Mary Anne PER
Helen PER
Matthew PER
Jenny PER
Matthew PER
Jenny PER
Matthew PER
Jenny PER
Jenny PER
Matthew PER
Jenny PER
Mary Anne PER
Helen PER
Helen PER
Mary Anne PER
Helen PER
Matthew PER
Matthew PER
Helen PER
Matthew PER
Matthew PER
016_05_qu.txt
Jeanne PER
Jessie PER
Biron PER
Jeanne PER
Anne PER
Marie PER
Jeanne Prieur PER
Anne PER
Marie PER
Jeanne PER
Anne PER
Marie PER
madame Prieur PER
Jeanne PER
Becca PER
Anne PER
Marie PER
Jeanne PER
Jeanne PER
madame Prieur PER
Anne PER
Marie PER
Jeanne PER
Jeanne PER
Jeanne PER
Anne PER
Marie PER
Anne PER
Marie PER
Jeanne PER
Anne PER
Marie PER
Jeanne PER
Caillou PER
Anne PER
Marie PER
Anne PER
Marie PER
Jeanne PER
Anne PER
Marie PER
Anne PER
Marie PER
Anne PER
Marie PER
Anne PER
Marie PER
ta maman PER
Marie PER
Jeanne PER
Anne PER
Marie PER
Marie PER
Jeanne PER
Anne PER
Marie PER
Hélène PER
Biron PER
madame Biron PER
Marie PER
Jeanne PER
Biron PER
Anne PER
Marie PER
Biron PER
Jeanne PER
Biron PER
Jeanne PER
Anne PER
Marie PER
Jeanne PER
Jeanne PER
Jeanne PER
Anne PER
Marie PER
Hélène PER
Anne PER
Marie PER
Jessie PER
Hélène PER
Biron PER
Hélène PER
016_06_bg.txt
Agnès PER
Mathieu PER
Coppélia PER
Mathieu PER
Agnès PER
Agnès PER
Mathieu PER
madame
Brinbeuf PER
madame Brinbeuf PER
Mathieu PER
Madame Brinbeuf PER
Brinbeuf PER
madame Brinbeuf PER
Mathieu PER
Agnès PER
Madame Brinbeuf PER
Mathieu PER
Mathieu PER
Agnès PER
Justine PER
Madame Brinbeuf PER
Agnès PER
Mathieu PER
madame Brinbeuf PER
Mathieu PER
Mathieu PER
madame Brinbeuf PER
Mathieu PER
Mathieu PER
Mathieu PER
le P. C' PER
Agnès PER
Aurélie Precisio PER
Agnès PER
Mathieu PER
Roseline PER
Charlotte Cuvelier PER
Roseline PER
Mathieu PER
Agnès PER
Agnès PER
Agnès PER
Mathieu PER
Agnès PER
Agnès PER
Vanessa PER
Mathieu PER
Agnès PER
Mathieu PER
Agnès PER
Mathieu PER
Dis PER
Mathieu PER
Mathieu PER
Marjorie PER
Antoine PER
Agnès PER
Mathieu PER
Marjorie PER
Levêque PER
Agnès PER
Marjorie PER
Marjorie PER
Mathieu PER
Anaïs PER
Marjorie PER
Laurent PER
Vanessa PER
Agnès PER
Dis PER
Juliette PER
Mathieu PER
Juliette PER
Agnès PER
Juliette PER
Agnès PER
Mathieu PER
Antoine Godefroid PER
Agnès PER
Mathieu PER
Antoine PER
Mathieu PER
Marjorie PER
madame Brinbeuf PER
Agnès PER
Mathieu PER
Mathieu PER
Agnès PER
016_06_frmod.txt
Helen PER
Matthew PER
Coppélia PER
Matthew PER
Helen PER
Helen PER
Matthew PER
Mme Braddock PER
Sois PER
Matthew PER
Mme Braddock PER
V PER
madame Braddock PER
Matthew PER
Helen PER
Mme Braddock PER
Matthew PER
Matthew PER
Helen PER
Jessica PER
Matthew PER
Mme Braddock PER
Matthew PER
Matthew PER
Matthew PER
le P. C' PER
Mary Anne PER
Jenny Prezzioso PER
Helen PER
Matthew PER
Charlotte Johanssen PER
Matthew PER
Helen PER
Matthew PER
Helen PER
Vanessa PER
Matthew PER
Helen PER
Matthew PER
Dis PER
Matthew PER
N PER
K PER
Y.

 PER
Matthew PER
Mallory PER
Barrett PER
Matthew PER
Helen PER
Mallory PER
Mallory PER
Vous savez PER
Matthew PER
Margot PER
Imaginez PER
Mallory PER
– Waouh PER
Matthew PER
Claire PER
Helen PER
Matthew PER
Buddy Barrett PER
Matthew PER
Buddy PER
Mme Braddock PER
Helen PER
Matthew PER
Matthew PER
Helen PER
016_06_qu.txt
Hélène PER
Coppélia PER
Matthieu PER
madame Biron PER
Madame Biron PER
madame Biron PER
Regardez PER
J PER
V PER
Biron PER
Hélène PER
Biron PER
Matthieu PER
Hélène PER
Biron PER
madame Biron PER
madame Biron PER
Jeanne Prieur PER
Becca PER
Charlotte PER
Becca PER
Hélène PER
Vanessa PER
Hélène PER
Matthieu PER
N PER
I PER
Marjorie PER
Margot PER
Vanessa PER
Claire PER
Hélène PER
Hélène PER
Bruno Barrette PER
Bruno PER
Matthieu PER
Marjorie PER
madame Biron PER
Hélène PER
016_07_bg.txt
Justine PER
Marjorie PER
Justine PER
Monsieur PER
madame Levêque PER
Marjorie PER
Carole PER
Alain PER
Loïc PER
Samuel PER
Levêque PER
Levêque PER
Levêque PER
Marjorie PER
Vanessa PER
Anaïs PER
Juliette PER
Marjorie PER
Carole PER
Alain PER
Alain PER
Loïc PER
Juliette PER
Alain PER
Alain PER
Marjorie PER
Loïc PER
Samuel PER
Carole PER
Marjorie PER
Marjorie PER
Alain PER
Carole PER
Marjorie PER
Carole PER
Laurent PER
Laurent PER
Marjorie PER
Marjorie PER
Carole PER
Anaïs PER
Carole PER
Marjorie PER
Anaïs PER
Vanessa PER
Anaïs PER
Anaïs PER
Anaïs PER
Carole PER
Laurent PER
Carole PER
Marjorie PER
Laurent PER
Carole PER
Marjorie PER
Carole PER
Carole PER
Vanessa PER
Laurent PER
Samuel PER
Juliette PER
Carole PER
Marjorie PER
Anaïs PER
Vanessa PER
Loïc PER
Marjorie PER
Carole PER
Marjorie PER
Carole PER
Marjorie PER
Agnès PER
Mathieu PER
Marjorie PER
Mathieu PER
Carole PER
Marjorie PER
Swanilda PER
Mathieu PER
Agnès PER
Agnès PER
016_07_frmod.txt
Mallory PER
Pike hier PER
Carla PER
Mallory PER
Mme Pike PER
Mallory PER
Carla PER
Mallory PER
Byron PER
Adam PER
Jordan PER
Vanessa PER
Claire PER
M. PER
Mme Pike PER
Mme Pike PER
Mallory PER
Vanessa PER
Claire PER
Mallory PER
Adam PER
Jordan PER
Claire PER
Adam PER
Mallory PER
Eh PER
Byron PER
Jordan PER
Nicky PER
Carla PER
Mallory PER
Adam PER
Carla PER
Mallory PER
Carla PER
Nicky PER
Mallory PER
Mallory PER
Margot PER
Carla PER
Byron PER
Margot PER
Vanessa PER
Margot PER
Margot PER
Moui PER
Mallory PER
Mallory PER
Carla PER
Vanessa PER
Nicky PER
Margot PER
Jordan PER
Claire PER
Vanessa PER
Jordan PER
Carla PER
Mallory PER
Mallory PER
Helen PER
Matthew PER
Matthew PER
Carla PER
Mallory PER
Swanilda PER
Helen PER
Helen PER
016_07_qu.txt
Marjorie PER
Jessie PER
Diane PER
madame Picard PER
Diane PER
Marjorie PER
Joël PER
Claire PER
madame Picard PER
madame Picard PER
Marjorie PER
Diane PER
Marjorie PER
Diane PER
Picard PER
Diane PER
Antoine PER
Joël PER
Claire PER
Antoine PER
Marjorie PER
Antoine PER
Diane PER
Antoine PER
Marjorie PER
Antoine PER
Diane PER
Marjorie PER
Diane PER
crie Margot PER
Diane PER
hurle Margot PER
Vanessa PER
Margot PER
Margot PER
Diane PER
Marjorie PER
Diane PER
Diane PER
Diane PER
Diane PER
Vanessa PER
Claire PER
Diane PER
Marjorie PER
Marjorie PER
Vanessa PER
Joël PER
Diane PER
Marjorie PER
Diane PER
Diane PER
Hélène PER
Marjorie PER
Diane PER
Matthieu PER
Swanilda PER
Us PER
Hélène PER
Hélène PER
016_08_bg.txt
Swanilda PER
madame Dillon PER
Mademoiselle Perron PER
OK PER
Mademoiselle Bersnstein PER
Victor PER
Swanilda PER
Prochaine PER
madame Dillon PER
Catherine PER
Hélène PER
Catherine PER
Ouais PER
Catherine PER
Catherine PER
Catherine PER
Agnès PER
Catherine PER
Roseline PER
Catherine PER
Adeline PER
Bonjour Adeline PER
Catherine PER
Adeline PER
Catherine PER
madame Dillon PER
Catherine PER
Adeline PER
Catherine PER
Catherine PER
Roseline PER
Catherine PER
Adeline PER
Adeline PER
Catherine PER
Adeline PER
Catherine PER
Catherine PER
Mathieu PER
Mathieu PER
Adeline PER
Brinbeuf PER
Mathieu PER
Adeline PER
Catherine PER
Catherine PER
Adeline PER
Catherine PER
Adeline PER
Catherine PER
Adeline PER
Catherine PER
Adeline PER
Adeline PER
Catherine PER
Johanna PER
Marjorie PER
Justine PER
Coraline PER
Arnaud PER
Sébastien PER
Valérie PER
Valérie PER
Stéphane PER
Nicolas PER
016_08_frmod.txt
Swanilda PER
Coppélia PER
Mme Noelle PER
Mademoiselle Parson PER
Katie PER
Mademoiselle Bramstedt PER
mademoiselle Romsey PER
Swanilda PER
Mme Noelle PER
Katie PER
Hilary PER
Ouais PER
Katie PER
Papa PER
Katie PER
Katie PER
Helen PER
Adèle PER
Bonjour PER
Adèle PER
Mme Noelle PER
Rebecca PER
Adèle PER
Matthew PER
Matthew PER
Adèle PER
Matthew PER
Adèle PER
Matthew PER
Adèle PER
Adèle PER
Adèle PER
Katie PER
Keisha PER
Mallory PER
016_08_qu.txt
Mademoiselle Croteau PER
Marie PER
Mademoiselle Raymond PER
Swanilda PER
Élizabeth PER
Catherine PER
Élizabeth PER
Élizabeth PER
Élizabeth PER
Élizabeth PER
Hélène PER
Élizabeth PER
Becca PER
Élizabeth PER
Adèle PER
Adèle PER
Adèle PER
Élizabeth PER
Becca PER
Adèle PER
Élizabeth PER
Adèle PER
Élizabeth PER
Adèle PER
Élizabeth PER
Biron PER
Élizabeth PER
Adèle PER
Élizabeth PER
Adèle PER
Hélène PER
Élizabeth PER
Adèle PER
Adèle PER
Élizabeth PER
Adèle PER
Adèle PER
Élizabeth PER
Élizabeth PER
Élizabeth PER
Kara PER
Marjorie PER
016_09_bg.txt
Justine PER
Coralie Arnaud PER
Sébastien PER
Valérie PER
Valérie PER
Stéphane PER
Nicolas PER
Coraline PER
Coraline PER
Mathieu PER
Julie PER
Julie PER
Valérie PER
Arnaud PER
Julie PER
Coralie PER
Julie PER
Valérie PER
Julie PER
Arnould PER
Julie PER
Coralie PER
Yvan PER
Valérie PER
Arnaud PER
Arnaud PER
madame Rensonnet PER
Vieille Sorcière PER
Benoît PER
Julie PER
Arnaud PER
Sébastien PER
Valérie PER
Stéphane PER
Valérie PER
Julie PER
Valérie PER
Stéphane PER
Julie PER
Valérie PER
Julie PER
Julie PER
Bonjour PER
Julie PER
Valérie PER
Arnaud PER
Monsieur Arnaud PER
Edith PER
Arnaud PER
madame Arnaud PER
Valérie soupira PER
Julie PER
Arnaud PER
Sébastien PER
Julie PER
Sébastien PER
Arnaud PER
Julie PER
Arnaud PER
Julie PER
Arnaud PER
Sébastien PER
Bonjour PER
Julie PER
Bonjour PER
Julie PER
Julie PER
Julie PER
Arnaud PER
Julie PER
Julie PER
Arnaud PER
Julie PER
Arnaud PER
Sébastien PER
Julie PER
Julie PER
Julie PER
Justine Victoire PER
Julie PER
Roseline PER
Julie PER
Julie PER
Marjorie PER
Raconter PER
Julie PER
Julie PER
Julie PER
Arnaud PER
Julie PER
Arnaud PER
Arnaud PER
Arnaud PER
Arnaud PER
Julie PER
Julie PER
Julie PER
Arnaud PER
Julie PER
Arnaud PER
Flocon PER
Julie PER
Mademoiselle Minet PER
Julie PER
Julie PER
Coralie PER
Doudou PER
Julie PER
Julie PER
Justine PER
Arnaud PER
Arnaud PER
Sébastien PER
016_09_frmod.txt
Karen PER
Andrew PER
David Michael PER
Kristy PER
Charlie PER
Karen PER
Matthew PER
Kristy PER
Karen PER
Andrew PER
Karen PER
Kristy PER
Karen PER
Jim PER
Kristy PER
Karen PER
Andrew PER
Mme Porter PER
Ben Lelland PER
Karen PER
Karen PER
Andrew PER
David Michael PER
Kristy PER
Charlie PER
Samuel PER
Kristy PER
Bonjour PER
Karen ! PER
Bonjour PER
Karen PER
Bonjour PER
Kristy PER
Mme Lelland PER
Jim PER
Edith PER
Karen PER
Mme Lelland PER
Kristy PER
Karen PER
David Michael PER
Karen PER
Karen PER
David Michael PER
Andrew PER
Mme Lelland PER
Andrew PER
David Michael PER
Andrew PER
Karen PER
Lego PER
Andrew PER
Karen PER
David Michael PER
Karen PER
Andrew PER
Karen PER
Karen PER
Andrew PER
David Michael PER
Karen PER
David Michael PER
Karen PER
Karen PER
Jessica Ramsey PER
Mallory PER
Sitters PER
Karen PER
Karen PER
Raconter PER
Karen PER
Karen PER
Beurk ! PER
Andrew PER
Karen PER
Andrew PER
Karen PER
Andrew PER
Écoute PER
Karen PER
Andrew PER
Andrew PER
Karen PER
Tickly PER
Karen PER
Karen PER
Karen PER
Karen PER
Andrew PER
Karen PER
Andrew PER
David Michael PER
016_09_qu.txt
Karen PER
Sébastien PER
Charles PER
Karen PER
Karen PER
Marchand PER
Karen PER
Karen PER
madame Portai PER
Karen PER
Karen PER
Karen PER
Karen PER
Karen PER
madame Marchand PER
Christine PER
Guillaume PER
Karen PER
Karen PER
David PER
Karen PER
Guillaume PER
Karen PER
Lego PER
Karen PER
David PER
Karen PER
David PER
Karen PER
Karen PER
Jessie Raymond PER
Marjorie PER
Karen PER
Karen PER
Mimer PER
Christine PER
Karen PER
Karen PER
Karen PER
Claudia PER
Karen PER
Karen PER
Karen PER
Karen PER
Karen PER
016_10_bg.txt
Mathieu PER
Agnès PER
madame Brinbeuf PER
Agnès PER
madame Brinbeuf PER
Agnès PER
Mathieu PER
Aurélie Precisio PER
Mathieu PER
Vanessa Levêque PER
Vanessa PER
Agnès PER
Laurent PER
Mathieu PER
Antoine Godefroid PER
Agnès PER
Mathieu PER
Levêque PER
Antoine Godefroid PER
Mathieu PER
Vanessa PER
Agnès PER
Laurent PER
Agnès PER
Agnès PER
Laurent PER
Mathieu PER
Mathieu PER
Loïc PER
Agnès PER
Mathieu PER
Agnès PER
Mathieu PER
Mathieu PER
Mathieu PER
Agnès PER
Gringalet PER
Mathieu PER
Agnès PER
Agnès PER
Mathieu PER
Mathieu PER
Mathieu PER
Laurent PER
Alain PER
Justine PER
Agnès PER
Roseline PER
Roseline PER
Roseline PER
Roseline PER
Mathieu PER
Mathieu PER
Roseline PER
Roseline PER
Roseline PER
Agnès PER
Mathieu PER
Mathieu PER
Agnès PER
Mathieu PER
Agnès PER
Mathieu PER
Agnès PER
Agnès PER
Adeline PER
madame Dillon PER
madame Dillon PER
Coppélia PER
Agnès PER
madame Dillon PER
016_10_frmod.txt
Matthew PER
Helen PER
Mme Braddock PER
Helen PER
Mme Braddock PER
Helen PER
Matthew PER
Jenny Prezzioso PER
Matthew PER
Matthew PER
Helen PER
Vanessa Pike PER
Vanessa PER
Helen PER
Nicky PER
Buddy Barrett PER
Helen PER
Matthew PER
Matthew PER
Nicky Pike PER
Buddy Barrett PER
Bonjour PER
Matthew PER
Vanessa PER
Jordan PER
Helen PER
Matthew PER
Helen PER
Matthew PER
Matthew PER
Matthew PER
Ouais PER
Helen PER
Matthew PER
Matthew PER
Nicky PER
Byron PER
Matthew PER
Nicky PER
Adam PER
Jessica PER
– Parfois PER
Helen PER
Rebecca PER
Stamford PER
Matthew PER
Helen PER
Matthew PER
Helen Keller PER
Matthew PER
Matthew PER
Helen PER
Adèle PER
Mme Noelle PER
Coppélia PER
Mme Noelle PER
016_10_qu.txt
Biron PER
Biron PER
Madame Biron PER
Hélène PER
Madame Biron PER
Jeanne Prieur PER
Vanessa PER
Vanessa PER
Bruno Barrette PER
Hélène PER
Bruno Barrette PER
Vanessa PER
Bruno PER
Nicolas mime PER
Joël PER
Hélène PER
Hélène PER
Jessie PER
Hélène PER
Becca PER
Jaja PER
Becca PER
Becca PER
Quelquefois PER
Becca PER
Becca PER
Jaja PER
Becca PER
Helen Keller PER
Hélène PER
Coppélia' PER
Hélène PER
016_11_bg.txt
madame Dillon PER
madame Brinbeuf PER
Julie PER
Mes PER
madame Dillon PER
Bonjour PER
Valérie PER
Marjorie PER
Julie PER
Valérie PER
Julie PER
Carole PER
Julie PER
Marjorie PER
Marjorie PER
Julie PER
Julie PER
Julie PER
Marjorie PER
Julie PER
Julie PER
Valérie PER
Carole PER
Valérie PER
Carole PER
Julie PER
Valérie PER
Valérie PER
Valérie PER
Justine PER
Mathieu PER
Mathieu PER
Agnès PER
Agnès PER
Mathieu PER
Julie PER
Valérie PER
Brinbeuf PER
J PER
Carole PER
Bruno PER
Bruno PER
Bruno PER
Justine PER
Brinbeuf PER
taquina Julie PER
Carole PER
Carole PER
Gringalet PER
Bruno PER
Carole PER
Savez PER
Valérie PER
Julie PER
Carole PER
Valérie PER
Julie PER
Marjorie PER
Marjorie PER
Bonjour PER
Brinbeuf PER
Marjorie PER
Mathieu PER
Valérie PER
Coppélia PER
016_11_frmod.txt
Mme Noelle PER
Mme Braddock PER
Mes PER
Mme Noelle PER
Bonjour PER
Kristy PER
Mallory PER
Désolée PER
Kristy PER
Mary Anne PER
Carla PER
Mallory PER
Kristy PER
Mallory PER
Kristy PER
Kristy PER
Samuel PER
Kristy PER
Kristy PER
Kristy PER
Kristy PER
Matthew PER
Matthew PER
Helen PER
Helen PER
Matthew PER
Kristy PER
J PER
Mary Anne PER
Mary Anne PER
Jessica PER
Mary Anne PER
Kristy PER
Tigrou PER
Mary Anne PER
Logan PER
Mary Anne PER
Savez PER
Kristy PER
Samuel PER
Ouais PER
Kristy PER
Mallory PER
Mallory PER
Mallory PER
Bonjour PER
madame Braddock PER
Matthew PER
Mary Anne PER
Mallory PER
016_11_qu.txt
Noëlle PER
madame Biron PER
Anne PER
Marie PER
Claudia PER
Diane PER
Marjorie PER
Diane PER
Christine
 PER
Charles PER
Charles PER
Diane PER
Christine PER
Diane PER
Marie PER
Hélène PER
Anne PER
Marie PER
Marjorie PER
Diane PER
Claudia PER
J PER
Anne PER
Marie PER
Louis!
 PER
Louis PER
Anne PER
Marie PER
Marie PER
Marie PER
Diane PER
Diane PER
Jaja PER
Anne PER
Marie PER
Avez PER
Diane PER
Louis PER
Anne PER
Marie PER
Savez PER
Charles PER
Diane PER
Marjorie PER
Bonjour PER
madame Biron PER
Marjorie PER
Becca PER
016_12_bg.txt
madame Brinbeuf PER
Prête PER
Madame Brinbeuf PER
Brinbeuf PER
Mathieu PER
Mathieu PER
madame Brinbeuf PER
Mathieu PER
Mathieu PER
Madame Brinbeuf PER
Justine PER
Mathieu PER
Madame Brinbeuf PER
Mathieu PER
Bonjour PER
Brinbeuf PER
Justine PER
madame Franck PER
Mathieu PER
madame Brinbeuf PER
madame Brinbeuf PER
madame Franck PER
Mathieu PER
Madame Franck PER
Madame Brinbeuf PER
madame Franck PER
madame Franck PER
Madame Franck PER
madame Franck PER
Justine Victoire PER
E V PER
E.
Mathieu PER
madame Franck PER
Madame Franck PER
Justine PER
Mathieu Brinbeuf PER
madame Franck PER
Coppélia PER
madame Brinbeuf PER
Coppélia PER
Voulez PER
Mathieu PER
Madame Franck PER
Coppélia PER
Mathieu PER
madame Franck PER
Madame Franck PER
madame Franck PER
madame Brinbeuf PER
Mathieu PER
Mathieu PER
Mathieu PER
Mathieu PER
016_12_frmod.txt
Mme Braddock PER
Prête PER
– Quel PER
Mme Braddock PER
Matthew PER
Mme Braddock PER
Matthew PER
Matthew PER
Matthew PER
Matthew PER
Mme Braddock PER
Matthew PER
Bonjour PER
madame Braddock PER
Mme Franck PER
Matthew PER
Mme Braddock PER
Mme Franck PER
Matthew PER
Matthew PER
Assieds PER
Jessica Ramsey PER
Matthew PER
Mme Franck PER
Matthew Braddock PER
Mme Franck PER
Coppélia PER
Mme Braddock PER
Coppélia PER
Voulez PER
Matthew PER
Mme Franck PER
Coppélia PER
Matthew PER
Mme Franck PER
Mme Franck PER
Mme Braddock PER
Matthew PER
Matthew PER
Jessica PER
Matthew PER
016_12_qu.txt
Ma mère PER
madame Biron PER
madame Biron PER
Madame Biron PER
Biron PER
Us PER
Biron PER
Madame Biron PER
madame Biron PER
Madame Biron PER
Madame Biron PER
Jessie Raymond PER
E R PER
Matthieu Biron PER
Jessie PER
Coppélia PER
Coppélia PER
Voulez PER
Coppélia PER
016_13_bg.txt
Mercredi
Bonjour PER
Justine PER
Gringalet PER
Roseline PER
Gringalet PER
Justine PER
Roseline PER
Charlotte Cuvelier PER
Charlotte PER
Valérie PER
Sophie Lambert PER
Gringalet PER
Roseline PER
Roseline PER
Roseline PER
Aga PER
Gringalet PER
Ouais PER
Roseline PER
Roseline PER
Gringalet PER
Roseline PER
Viens PER
Gringalet PER
Roseline PER
Roseline PER
Valérie PER
Roseline PER
Mathieu PER
Roseline PER
Roseline PER
Ma mère PER
Roseline PER
Charlotte PER
Roseline PER
Charlotte Cuvelier PER
Roseline PER
Charlotte PER
Roseline PER
Charlotte PER
Sophie Lambert PER
Sophie PER
Roseline PER
Sophie PER
Charlotte PER
Charlotte PER
Roseline PER
Charlotte PER
Charlotte PER
Charlotte PER
Charlotte PER
Bonjour PER
Charlotte PER
Bonjour Charlotte PER
Charlotte PER
Justine PER
Roseline PER
Charlotte PER
Justine PER
Justine PER
Coppernicus PER
Justine PER
Roseline PER
Charlotte PER
Gringalet PER
Valérie PER
Charlotte PER
Justine PER
Valérie PER
Charlotte PER
Roseline PER
Charlotte PER
madame Brinbeuf PER
Valérie Demoulin PER
016_13_frmod.txt
Jessica PER
Charlotte Johanssen PER
Rebecca PER
Charlotte PER
Kristy PER
Lucy MacDouglas PER
Stamford PER
Viens PER
Ouais PER
Kristy PER
Kristy PER
Kristy PER
Matthew PER
Ma mère PER
Dis PER
Charlotte PER
Charlotte Johanssen ! PER
Lucy MacDouglas PER
Charlotte PER
Charlotte PER
Charlotte PER
Charlotte PER
– Salut PER
Charlotte PER
Jessica PER
Charlotte PER
Jessica PER
Jessica PER
Coppernicus PER
Coppélia PER
Kristy PER
Charlotte PER
Kristy PER
Rebecca PER
Charlotte PER
Charlotte PER
Charlotte PER
Mme Braddock PER
Kristy Parker PER
016_13_qu.txt
Jaja! PER
Becca PER
Jessie PER
Biron PER
Becca PER
Charlotte Jasmin PER
Sophie PER
Jaja PER
Becca PER
Becca PER
Jaja PER
Viens PER
Becca PER
Becca PER
Becca PER
Becca PER
Becca PER
Ma mère PER
Becca PER
Charlotte Jasmin PER
Becca PER
Charlotte PER
Sophie PER
Becca PER
Becca PER
Sophie PER
Becca PER
Charlotte PER
Charlotte PER
Becca PER
Charlotte PER
Charlotte PER
Becca PER
Christinel PER
Bonjour PER
Charlotte PER
Jessie PER
Polanski PER
Jessie PER
Becca PER
Coppélia PER
Polanski PER
Charlotte PER
madame Biron PER
Christine Thomas PER
016_14_bg.txt
Coppélia PER
madame Brinbeuf PER
madame Dillon PER
madame Franck PER
Mathieu PER
Roseline PER
Charlotte PER
Marjorie PER
Valérie PER
Mathieu PER
Roseline PER
Julie PER
Marjorie PER
Bruno Lejeune PER
Brinbeuf PER
Mathieu PER
Brinbeuf PER
Agnès PER
Coppélia PER
Coppélius PER
Swanilda PER
Prête PER
madame Dillon PER
madame Dillon PER
Justine PER
Brinbeuf PER
Agnès PER
Agnès PER
Agnès PER
madame Brinbeuf PER
madame Brinbeuf PER
Madame Brinbeuf PER
Caroline Brinbeuf PER
Agnès PER
Agnès PER
Coppélia PER
Agnès PER
Agnès PER
Mathieu PER
Swanilda PER
madame Dillon PER
madame Dillon PER
Swanilda PER
Swanilda PER
Catherine PER
Adeline PER
Mathieu PER
Catherine PER
Mathieu PER
Maman PER
Adeline PER
Catherine PER
Agnès PER
Franz PER
Swanilda PER
madame Brinbeuf PER
Coppelius PER
Franz PER
Swanilda PER
Coppélia PER
Swanilda PER
Coppélius PER
Franz PER
Swanilda PER
Agnès PER
madame Brinbeuf PER
Swanilda PER
Swanilda PER
madame Brinbeuf
 PER
Agnès PER
Swanilda PER
Christophe Gélin PER
Franz PER
Mathieu PER
Mathieu PER
Christophe PER
Mathieu PER
Adeline PER
Catherine PER
Catherine PER
Catherine PER
Adeline PER
Christophe PER
Catherine PER
Mathieu PER
Adeline PER
016_14_frmod.txt
Mme Braddock PER
Mme Noelle PER
Mme Franck PER
Matthew PER
Charlotte PER
Mallory PER
Kristy PER
Matthew PER
Mary Anne PER
Kristy PER
Carla PER
Mary Anne PER
Claudia PER
Mallory PER
Mary Anne PER
M. Braddock PER
Matthew PER
Mme Braddock PER
Helen PER
Coppélia PER
Coppélius PER
Swanilda PER
Prête PER
Mme Noelle PER
Mme Braddock PER
Helen PER
Helen PER
Helen PER
– Bonsoir PER
Mme Braddock PER
Stamford PER
Mme Braddock PER
Voici Carolyn Braddock PER
Helen PER
Helen PER
Coppélia PER
Helen PER
Matthew PER
Swanilda PER
Mme Noelle PER
Swanilda PER
Swanilda PER
Adèle PER
Matthew PER
Matthew PER
Helen PER
Franz PER
Swanilda PER
Mme Braddock PER
Coppélius PER
Franz PER
Swanilda PER
Coppélia PER
Swanilda PER
Coppélius PER
Franz PER
Swanilda PER
Helen PER
Swanilda PER
Swanilda PER
Mme Braddock PER
Helen PER
Swanilda PER
Christopher Gerber PER
Franz PER
Matthew PER
Matthew PER
Matthew PER
Christopher PER
Matthew PER
Adèle PER
Christopher PER
Matthew PER
Adèle PER
016_14_qu.txt
Biron PER
Becca PER
Charlotte PER
Louis Brunet PER
Jaja PER
Biron PER
Matthieu PER
Biron PER
Hélène PER
Coppélia PER
Swanilda PER
madame Biron PER
Hélène PER
Hélène PER
Hélène PER
madame Biron PER
Madame Biron PER
Caroline Biron PER
Hélène PER
Hélène PER
Swanilda PER
Swanilda PER
Swanilda PER
Adèle PER
Élizabeth PER
Élizabeth PER
Franz PER
Swanilda PER
Coppélia PER
Swanilda PER
Franz PER
Swanilda PER
Hélène PER
Hélène PER
madame Biron PER
Christophe Baril PER
Franz PER
Matthieu PER
Élizabeth PER
Adèle PER
Christophe PER
016_15_bg.txt
Catherine PER
Adeline PER
Adeline PER
Adeline PER
Catherine PER
Adeline PER
Justine PER
Catherine PER
Catherine PER
Dis PER
Justine PER
Justine PER
Roseline PER
Brinbeuf PER
Agnès PER
Valérie PER
Julie PER
Carole PER
Mélanie PER
Mathieu PER
Justine PER
Marjorie PER
Marjorie PER
Bonjour PER
Justine PER
Avais PER
Johanna PER
Johanna PER
Marjorie PER
Marjorie PER
Marjorie PER
Johanna PER
Johanna PER
Johanna PER
Catherine PER
Adeline PER
Johanna PER
Mathieu PER
madame Brinbeuf PER
Agnès PER
Catherine PER
Johanna PER
Marjorie PER
Justine PER
Marjorie PER
Marjorie PER
Johanna PER
Johanna PER
Marjorie PER
Johanna PER
Johanna PER
Justine PER
Johanna PER
Marjorie PER
Mathieu PER
Marjorie PER
Johanna PER
Catherine PER
Adeline PER
Johanna PER
Adeline PER
Mathieu PER
Adeline PER
Catherine PER
Catherine PER
Johanna PER
Justine PER
madame Dillon PER
Swanilda PER
Swanilda PER
Franz PER
Catherine PER
Swanilda PER
Catherine PER
sainte nitouche PER
ma mère PER
Adeline PER
Catherine PER
Mathieu PER
Justine PER
Roseline PER
Johanna PER
Marjorie PER
Valérie PER
Julie PER
madame Brinbeuf PER
Mathieu PER
Agnès PER
Brinbeuf PER
Agnès PER
Justine PER
Julie PER
Julie PER
Mathieu PER
Agnès PER
Mathieu PER
Mathieu PER
Agnès PER
Mathieu PER
Mathieu PER
016_15_frmod.txt
Adèle PER
Adèle PER
Katie PER
Jessica ! PER
Mary Anne PER
Matthew PER
Jessica PER
Mallory PER
Bonjour PER
Jessica PER
Keisha PER
Keisha PER
Mallory PER
Keisha PER
Mallory PER
Keisha PER
As PER
Keisha PER
Keisha PER
Matthew PER
Mme Braddock PER
Helen PER
Keisha PER
Mallory PER
Jessica PER
Mallory PER
Pardon PER
Mallory PER
Keisha PER
Keisha PER
Mallory PER
Keisha PER
Mallory PER
Mallory PER
Keisha PER
Keisha PER
Mallory PER
Mallory PER
Matthew PER
Mallory PER
Keisha PER
Keisha PER
Adèle PER
Matthew PER
Keisha PER
Mme Noelle PER
Swanilda PER
Swanilda PER
Franz PER
Swanilda PER
Pourquoi PER
Katie PER
Adèle PER
Katie PER
Matthew PER
Keisha PER
Mallory PER
Kristy PER
Carla PER
Mary Anne PER
Claudia PER
M. PER
Mme Braddock PER
Matthew PER
Helen PER
Charley PER
Helen PER
Mmm PER
Kristy PER
Claudia PER
Mary Anne PER
Matthew PER
Helen PER
Matthew PER
Helen PER
Matthew PER
016_15_qu.txt
Élizabeth PER
Adèle PER
Jessie PER
Élizabeth PER
Bonjour PER
Marjorie PER
Kara PER
Marjorie PER
Kara PER
Élizabeth PER
Kara PER
Marjorie PER
Marjorie PER
Marjorie PER
Kara PER
Marjorie PER
Marjorie PER
Marjorie PER
Kara PER
Marjorie PER
Marjorie PER
Élizabeth PER
Adèle PER
Jessie PER
Élizabeth PER
Swanilda PER
Swanilda PER
Franz PER
Swanilda PER
Fleur PER
Adèle PER
Gaston PER
Anne PER
Marie PER
Claudia PER

It’s not perfect: in chapter 1 of the Quebec translation, “mange Julie” (Julie eats) is flagged as a person, even though it’s a verb followed by a name. But it’s a lot better.
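Since the point of all this is to see how consistently each translation names the same people, one possible next step is to tally the names in each _ner_per.txt file and line up the three versions of a chapter. Here is a minimal sketch, assuming the output files hold one entity per line (as the loop writes them) and are named like 016_14_qu_ner_per.txt. The function names `load_entity_counts` and `names_for_chapter` are hypothetical, not part of the notebook.

```python
import os
from collections import Counter

def load_entity_counts(directory, suffix='_ner_per.txt'):
    """Map each NER output file in `directory` to a Counter of its entities.

    Assumes one entity per line, as written by the NER loop.
    """
    counts = {}
    for name in sorted(os.listdir(directory)):
        if name.endswith(suffix):
            with open(os.path.join(directory, name), encoding='utf-8') as f:
                # Strip whitespace and skip blank lines
                entities = [line.strip() for line in f if line.strip()]
            counts[name] = Counter(entities)
    return counts

def names_for_chapter(counts, chapter, top=5):
    """Return {output file: its most frequent names} for one chapter, e.g. '016_14'."""
    return {
        name: [entity for entity, _ in counter.most_common(top)]
        for name, counter in counts.items()
        if name.startswith(chapter + '_')
    }
```

For example, `names_for_chapter(load_entity_counts(perdirectory), '016_14')` would put the most frequent names from the Belgian, France French, and Quebec versions of chapter 14 side by side (`perdirectory` is a hypothetical variable standing for wherever the _ner_per.txt files ended up).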

8. French NER for places

I moved all the _ner_per.txt files into their own folder, so that spaCy would run NER only on the files with the chapter texts, and not on the files containing its own earlier NER results.
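Moving the files can be done by hand, but a short script does the same thing. This is a sketch using Python's standard `os` and `shutil` modules; the helper name `move_ner_outputs` and the `ner_per` subfolder name are made up for illustration.

```python
import os
import shutil

def move_ner_outputs(directory, suffix, subfolder):
    """Move every file in `directory` ending in `suffix` into `directory/subfolder`.

    Returns the list of moved file names, so the next NER pass
    only sees the chapter texts.
    """
    target = os.path.join(directory, subfolder)
    os.makedirs(target, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Only move regular files whose names end with the suffix
        if name.endswith(suffix) and os.path.isfile(path):
            shutil.move(path, os.path.join(target, name))
            moved.append(name)
    return moved
```

Calling `move_ner_outputs(filedirectory, '_ner_per.txt', 'ner_per')` would set aside the person results before the location pass; the same call with '_ner_loc.txt' would do the same before the organization pass.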

I changed the code from step 7 (and 5) to replace PER with LOC and ran it again to get location entities.
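Because the location, organization, and miscellaneous passes differ only in the label being filtered and the suffix of the output file, the loop could be wrapped in a function that takes those as parameters. The sketch below is one way to do it, not the notebook's code: `write_entities` is a hypothetical name, and it assumes `nlp` behaves like a spaCy pipeline (calling it returns a doc whose `.ents` have `.text` and `.label_`). It also skips any file with _ner_ in its name, which is an alternative to moving earlier output into its own folder.

```python
import os

def write_entities(nlp, directory, label, suffix):
    """Run `nlp` on each chapter file in `directory`, saving entities tagged `label`.

    Output goes to <chapter><suffix>.txt in the same directory, one entity
    per line. Returns {chapter file: [entity texts]}.
    """
    written = {}
    for filename in sorted(os.listdir(directory)):
        # Skip non-text files and earlier NER output (e.g. *_ner_per.txt)
        if not filename.endswith('.txt') or '_ner_' in filename:
            continue
        inpath = os.path.join(directory, filename)
        outpath = os.path.join(directory, filename.replace('.txt', suffix + '.txt'))
        with open(inpath, encoding='utf-8') as f:
            doc = nlp(f.read())
        # Keep only the entities with the requested label
        found = [ent.text for ent in doc.ents if ent.label_ == label]
        with open(outpath, 'w', encoding='utf-8') as out:
            for text in found:
                out.write(text + '\n')
        written[filename] = found
    return written
```

With the French model from the earlier steps, `write_entities(frnlp, filedirectory, 'LOC', '_ner_loc')` should produce the same _ner_loc.txt files as the loop that follows, minus the printing.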

import os

#frnlp is the French spaCy model loaded in an earlier step
#Sort all the files in the directory you specified above, alphabetically.
#For each of those files...
for filename in sorted(os.listdir(filedirectory)):
    #If the filename ends with .txt (i.e. if it's actually a text file)
    if filename.endswith('.txt'):
        #Write out below the name of the file
        print(filename)
        #The file name of the output file adds _ner_loc to the end of the file name of the input file
        outfilename = filename.replace('.txt', '_ner_loc.txt')
        #Open the input file (joining the directory so this works from any working directory)
        with open(os.path.join(filedirectory, filename), 'r', encoding='utf-8') as f:
            #Create and open the output file in the same directory
            with open(os.path.join(filedirectory, outfilename), 'w', encoding='utf-8') as out:
                #Read the contents of the input file
                chaptertext = f.read()
                #Do French NLP on the contents of the input file
                chapterner = frnlp(chaptertext)
                #For each recognized entity
                for ent in chapterner.ents:
                    #If that entity is labeled as a location
                    if ent.label_ == 'LOC':
                        #Print the entity, and the label (which should be LOC)
                        print(ent.text, ent.label_)
                        #Write the entity to the output file
                        out.write(ent.text)
                        #Write a newline character to the output file
                        out.write('\n')
016_01_bg.txt
Espagne LOC
Neuville LOC
Neuville LOC
Burkina Faso LOC
Neuville LOC
Ouagadougou LOC
Burkina Faso LOC
France LOC
Neuville LOC
Aubrives LOC
Neuville LOC
Neuville LOC
Noirs LOC
Johanna LOC
chambre de Gringalet LOC
Bonjour LOC
Roseline LOC
Gringalet LOC
Gringalet LOC
Swanilda LOC
Roseline LOC
016_01_frmod.txt
Mexique LOC
Connecticut LOC
Stonebrook LOC
Oakley LOC
New Jersey LOC
hôpital d'Oakley LOC
Oakley LOC
New Jersey LOC
Noirs LOC
Stamford LOC
Connecticut LOC
Stonebrook LOC
Stamford LOC
Keisha LOC
chambre de P'tit Bout LOC
Aah LOC
Oakley LOC
Swanilda LOC
016_01_qu.txt
Mexique LOC
Nouville LOC
Nouville LOC
États LOC
Oakley LOC
New Jersey LOC
hôpital d'Oakley LOC
Jessie LOC
Noirs LOC
Blancs LOC
Noirs LOC
Nouville LOC
Noirs LOC
Lentement LOC
Kara LOC
Jessie LOC
Nouville LOC
Oakley LOC
Noëlle LOC
016_02_bg.txt
Valérie LOC
Valérie LOC
Neuville LOC
Marjorie LOC
Coralie LOC
Mélanie LOC
Neuville LOC
Mélanie LOC
Mélanie LOC
Valérie LOC
Mélanie LOC
Mélanie LOC
Paris LOC
Provence LOC
Mélanie LOC
Provençale LOC
Neuville LOC
Provence LOC
Mélanie LOC
Mélanie LOC
016_02_frmod.txt
Stonebrook LOC
Sitters LOC
Lucy LOC
Stonebrook LOC
Claudia LOC
Claudia LOC
Claudia LOC
Claudia LOC
Claudia LOC
New York LOC
Lucy LOC
Carla LOC
Californie LOC
Carla LOC
Claudia LOC
Californie LOC
Braddock LOC
Claudia LOC
016_02_qu.txt
Nouville LOC
Claudia LOC
Claudia LOC
Agenda LOC
Toronto LOC
Californie LOC
Californienne LOC
Californie LOC
Donnez LOC
Nouville LOC
Claudia LOC
Maijorie LOC
016_03_bg.txt
Poupée Chinoise LOC
Poupée Chinoise LOC
Noirs LOC
Swanilda LOC
Avais LOC
016_03_frmod.txt
Pliez LOC
Katie LOC
Katie LOC
Noirs LOC
Katie LOC
Katie LOC
Katie LOC
Avais LOC
016_03_qu.txt
Poupée chinoise LOC
Europe LOC
Jessie LOC
Élizabeth LOC
Poupée chinoise LOC
Attends LOC
016_04_bg.txt
Gringalet LOC
Aubrives LOC
Brinbeuf LOC
016_04_frmod.txt
Braddock LOC
Braddock LOC
Helen LOC
Helen LOC
Helen LOC
Helen LOC
Voudrais LOC
Helen LOC
Matthew LOC
Braddock LOC
Helen LOC
016_04_qu.txt
Noëlle LOC
Penses LOC
Noëlle LOC
Jessie LOC
Jessie LOC
Jessie LOC
Jessie LOC
Jessie LOC
016_05_bg.txt
Aurélie LOC
après-midi-là LOC
Mélanie LOC
Mélanie LOC
Mélanie LOC
Aurélie LOC
Roseline LOC
Aurélie LOC
Aurélie LOC
Aurélie LOC
Mélanie LOC
Aurélie LOC
Aurélie LOC
Mélanie LOC
Aurélie LOC
Aurélie LOC
Mélanie LOC
Aurélie LOC
Mélanie LOC
Ricky l'Ecureuil LOC
Aurélie LOC
Mélanie LOC
Mélanie LOC
Aurélie LOC
Mélanie LOC
Veux LOC
Aurélie LOC
Aurélie LOC
Aurélie LOC
Mélanie LOC
Aurélie LOC
Mélanie LOC
Aurélie LOC
Mélanie LOC
Aurélie LOC
Agnès LOC
Aurélie LOC
Aurélie LOC
Aurélie LOC
Mélanie LOC
Mélanie LOC
Neuville LOC
016_05_frmod.txt
après-midi-là LOC
Prezzioso LOC
Jen LOC
Braddock LOC
Braddock LOC
Helen LOC
Braddock LOC
Helen LOC
016_05_qu.txt
après-midi-là LOC
Échelles LOC
Noire LOC
Nouville LOC
016_06_bg.txt
Agnès LOC
P LOC
P LOC
Mélanie LOC
Levêque LOC
Agnès LOC
Levêque LOC
Godefroid LOC
Levêque LOC
Godefroid LOC
Ouais LOC
Marseille LOC
Monaco LOC
Marjorie LOC
Levêque LOC
016_06_frmod.txt
J LOC
J LOC
Helen LOC
P LOC
Helen LOC
Helen LOC
Buddy LOC
Liz LOC
Helen LOC
016_06_qu.txt
J LOC
E LOC
S LOC
S LOC
I LOC
Jessie LOC
Jessie LOC
Nouville LOC
Matthieu LOC
O LOC
L LOC
A LOC
Les Barrette LOC
Hélène LOC
Venez LOC
Marjorie LOC
Maijorie LOC
Claire LOC
Patriotes LOC
Marjorie LOC
016_07_bg.txt
Levêque LOC
Levêque LOC
Levêque LOC
Levêque LOC
Levêque LOC
Levêque LOC
Tenez LOC
Levêque LOC
Levêque LOC
016_07_frmod.txt
Carla LOC
Mallory LOC
Margot LOC
Carla LOC
Carla LOC
Nicky LOC
Tenez LOC
Carla LOC
Carla LOC
Carla LOC
Mallory LOC
Margot LOC
Carla LOC
016_07_qu.txt
Jessie LOC
Mathurin LOC
Margot LOC
Jessie LOC
016_08_bg.txt
Perron LOC
Coralie LOC
Coralie LOC
016_08_frmod.txt
Katie LOC
Essaie LOC
Adèle LOC
Katie LOC
Katie LOC
Katie LOC
Katie LOC
Adèle LOC
Adèle LOC
Katie LOC
Massachusetts LOC
Braddock LOC
Katie LOC
Katie LOC
016_08_qu.txt
Noëlle LOC
Pellerin LOC
Élizabeth LOC
Élizabeth LOC
Noëlle LOC
Essaye LOC
Matthieu LOC
Matthieu LOC
Pellerin LOC
016_09_bg.txt
Coralie LOC
Coralie LOC
Coralie LOC
Coralie LOC
Coralie LOC
Bonjour LOC
Coralie LOC
Coralie LOC
Coralie LOC
Coralie LOC
Coralie LOC
Coralie LOC
Dumont LOC
Coralie LOC
Arnaud LOC
Coralie LOC
Coralie LOC
Sébastien LOC
Coralie LOC
Coralie LOC
Arnould LOC
Coralie LOC
Aimeriez LOC
Coralie LOC
Sébastien LOC
Coralie LOC
Coralie LOC
Attends LOC
Coralie LOC
Coralie LOC
Coralie LOC
Coralie LOC
Coralie LOC
Merci Julie LOC
Coralie LOC
Coralie LOC
Appelle LOC
Coralie LOC
Doudou LOC
Coralie LOC
Coralie LOC
Flocon LOC
Julie LOC
Coralie LOC
Coralie LOC
Coralie LOC
016_09_frmod.txt
Claudia LOC
Claudia LOC
Lelland LOC
Claudia LOC
Claudia LOC
Claudia LOC
Claudia LOC
Claudia LOC
Papadakis LOC
Lelland LOC
Karen LOC
Claudia LOC
Lego LOC
Claudia LOC
Claudia LOC
Lelland LOC
Claudia LOC
Lelland LOC
Claudia LOC
Karen LOC
Andrew LOC
Moosie LOC
Claudia LOC
Moosie LOC
Tickly LOC
Claudia LOC
Claudia LOC
016_09_qu.txt
Karen LOC
André LOC
Claudia LOC
Papadakis LOC
Karen LOC
Claudia LOC
Lego LOC
Claudia LOC
Claudia LOC
Claudia LOC
Claudia LOC
Claudia LOC
Venez LOC
Claudia LOC
Claudia LOC
Karen LOC
chambre de Karen LOC
Claudia LOC
Claudia LOC
Jessie LOC
016_10_bg.txt
Brinbeuf LOC
Neuville LOC
Levêque LOC
Brinbeuf LOC
Ouais LOC
Neuville LOC
A LOC
Gringalet LOC
016_10_frmod.txt
Braddock LOC
Braddock LOC
Helen LOC
H LOC
Helen LOC
016_10_qu.txt
Centre LOC
Jaja LOC
H LOC
Noëlle LOC
Noëlle LOC
Noëlle LOC
016_11_bg.txt
Excusez LOC
Mélanie LOC
Valérie LOC
Valérie LOC
Mélanie LOC
Mélanie LOC
Mélanie LOC
Mélanie LOC
Mélanie LOC
Provence LOC
Crois LOC
Valérie LOC
Provence LOC
Avez LOC
Mélanie LOC
Mélanie LOC
Ouais LOC
Marjorie LOC
Mélanie LOC
Marjorie LOC
016_11_frmod.txt
Claudia LOC
Claudia LOC
Mallory LOC
Claudia LOC
Claudia LOC
Carla LOC
Carla LOC
Claudia LOC
Braddock LOC
Carla LOC
Claudia LOC
Californie LOC
Carla LOC
Carla LOC
Californie LOC
Carla LOC
Claudia LOC
Carla LOC
Claudia LOC
016_11_qu.txt
Claudia LOC
Claudia LOC
Claudia LOC
Trousses LOC
Trousses LOC
Californie LOC
Californie LOC
Aimeriez LOC
première de Cappella LOC
016_12_bg.txt
Principal LOC
Aubrives LOC
Assieds LOC
U LOC
S LOC
T LOC
I LOC
N LOC
T LOC
O LOC
I LOC
016_12_frmod.txt
Stamford LOC
016_12_qu.txt
Matthieu LOC
Jessie LOC
France LOC
France LOC
France LOC
France LOC
France LOC
France LOC
E LOC
S LOC
S LOC
I LOC
M LOC
O LOC
A LOC
L LOC
L LOC
E LOC
T LOC
France LOC
France LOC
Matthieu LOC
de France LOC
France LOC
France LOC
016_13_bg.txt
Roseline LOC
Gringalet LOC
Aubrives LOC
Valérie LOC
Roseline LOC
Valérie LOC
Viens LOC
chambre de Gringalet LOC
Gringalet LOC
Gringalet LOC
Merci LOC
Valérie LOC
Roseline LOC
Valérie LOC
Aubrives LOC
Neuville LOC
Roseline LOC
Valérie LOC
Valérie LOC
Neuville LOC
Roseline LOC
Roseline LOC
Valérie LOC
Roseline LOC
Valérie LOC
Valérie LOC
Valérie LOC
Roseline LOC
Valérie LOC
016_13_frmod.txt
Mercredi LOC
Salut LOC
Braddock LOC
Kristy LOC
chambre de P'tit Bout LOC
Kristy LOC
Viens LOC
Lucy LOC
Jessica LOC
016_13_qu.txt
Jessie LOC
Oakley LOC
Nouville LOC
Us LOC
Becca LOC
Jessie LOC
Jessie LOC
016_14_bg.txt
Aubrives LOC
Mélanie LOC
Valérie LOC
Valérie LOC
Carole LOC
Mélanie LOC
Gringalet LOC
Mélanie LOC
016_14_frmod.txt
Stamford LOC
New Jersey LOC
Kristy LOC
Katie LOC
Katie LOC
Katie LOC
Katie LOC
Katie LOC
016_14_qu.txt
France LOC
Noëlle LOC
Matthieu LOC
Noëlle LOC
Noëlle LOC
Nouville LOC
Noëlle LOC
Matthieu LOC
016_15_bg.txt
Bonjour LOC
Félicitations LOC
France LOC
France LOC
Carole LOC
Mélanie LOC
la Taverne LOC
Melba LOC
Chantilly LOC
Mélanie LOC
016_15_frmod.txt
Katie LOC
Adèle LOC
Adèle LOC
Katie LOC
Braddock LOC
Helen LOC
Kristy LOC
Claudia LOC
Carla LOC
Katie LOC
Katie LOC
Adèle LOC
Adèle LOC
Katie LOC
Katie LOC
Celles LOC
Braddock LOC
Chantilly LOC
016_15_qu.txt
Élizabeth LOC
Adèle LOC
Adèle LOC
Jessie LOC
Kara LOC
Jessie LOC
Kara LOC
Kara LOC
Élizabeth LOC
Noëlle LOC
Coupable LOC
Claudia LOC

Skimming the results, I was struck by how much worse location recognition performed than person recognition. Only a minority of the results look like real places; most of them are people’s names.
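One rough way to separate real places from misfiled names is to check each LOC hit against the same chapter's PER output: anything the person pass also found is probably a character. This is a minimal sketch; `split_loc_results` is a hypothetical name, and it assumes both lists come from the same chapter file.

```python
def split_loc_results(loc_entities, per_entities):
    """Split LOC hits into (probably_people, possibly_places).

    A LOC hit counts as "probably people" if the PER pass found the same
    string in the same chapter. Order and duplicates are preserved.
    """
    known_people = set(per_entities)
    probably_people = [e for e in loc_entities if e in known_people]
    possibly_places = [e for e in loc_entities if e not in known_people]
    return probably_people, possibly_places
```

On the Quebec chapter 15 lists above, this would set aside everything except “Noëlle” and “Coupable” (guilty), neither of which is a place, so it narrows the list rather than cleaning it.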

9. French NER for orgs

I was curious what I’d get by looking for entities flagged as organizations. I mean, this entire book series is about an organization: would that get flagged correctly? (Once again, I moved the _ner_loc.txt files into their own folder first.)

The verdict: Club des Baby (France-French translation) and Club des baby (Quebec translation) get marked as organizations; the “sitters” gets lost when “Baby-sitters” is split at the hyphen. There are also various things that are most definitely not organizations that get tagged, like “Bonjour!” in Belgian ch. 10, or “PLIÉ” in Quebec ch. 3 (maybe spaCy thought it was an acronym and not a yelling dance teacher?).
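If the organization name really is being cut off at the hyphen, one workaround is to post-process each ORG entity and extend it over any hyphenated pieces that immediately follow it in the text. This sketch assumes the entity's character offsets are available (spaCy spans expose them as `ent.start_char` and `ent.end_char`); `extend_over_hyphen` is a hypothetical helper, and it uses a plain regular expression rather than anything spaCy-specific.

```python
import re

# A hyphen immediately followed by a word, e.g. "-sitters"
HYPHEN_TAIL = re.compile(r'-\w+')

def extend_over_hyphen(text, start, end):
    """Return text[start:end], extended over any "-word" pieces right after it.

    `start` and `end` are character offsets into `text`.
    """
    match = HYPHEN_TAIL.match(text, end)
    while match:
        end = match.end()
        match = HYPHEN_TAIL.match(text, end)
    return text[start:end]
```

Inside the ORG loop, something like `extend_over_hyphen(chaptertext, ent.start_char, ent.end_char)` should turn “Club des Baby” back into “Club des Baby-sitters” wherever the original text reads that way.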

import os

#frnlp is the French spaCy model loaded in an earlier step
#Sort all the files in the directory you specified above, alphabetically.
#For each of those files...
for filename in sorted(os.listdir(filedirectory)):
    #If the filename ends with .txt (i.e. if it's actually a text file)
    if filename.endswith('.txt'):
        #Write out below the name of the file
        print(filename)
        #The file name of the output file adds _ner_org to the end of the file name of the input file
        outfilename = filename.replace('.txt', '_ner_org.txt')
        #Open the input file (joining the directory so this works from any working directory)
        with open(os.path.join(filedirectory, filename), 'r', encoding='utf-8') as f:
            #Create and open the output file in the same directory
            with open(os.path.join(filedirectory, outfilename), 'w', encoding='utf-8') as out:
                #Read the contents of the input file
                chaptertext = f.read()
                #Do French NLP on the contents of the input file
                chapterner = frnlp(chaptertext)
                #For each recognized entity
                for ent in chapterner.ents:
                    #If that entity is labeled as an organization
                    if ent.label_ == 'ORG':
                        #Print the entity, and the label (which should be ORG)
                        print(ent.text, ent.label_)
                        #Write the entity to the output file
                        out.write(ent.text)
                        #Write a newline character to the output file
                        out.write('\n')

10. French NER for misc

The French entity model also has a “MISC” type, so for the sake of completeness, I couldn’t not try it. And the results are as miscellaneous as advertised. Lots of names. Lots of “Ça”. There’s a “Tu es atroce” (You’re awful) from Ch. 5 of the France French version. “Le Langage Secret” in Ch. 6 of the Belgian translation gets flagged, and “P’tit” makes an appearance more than once. “Noirs” (Black people) gets flagged only in the France French and Quebec versions, and “Noire de mon école” (Black girl at my school) only in the Quebec version.
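Since so many MISC hits (and a fair number of PER and LOC ones) are interjections and greetings, a small stoplist can filter out the obvious noise before anyone reads the lists. The stoplist and the `drop_noise` function here are hypothetical, assembled from words that recur in the output above; other chapters would likely need more entries.

```python
# Hypothetical stoplist of interjections/greetings seen in the output above
NOISE = {'ça', 'merci', 'oh', 'ah', 'allez', 'allons', 'si', 'super', 'bonjour', 'salut'}

def drop_noise(entities, noise=NOISE):
    """Remove entities whose text, minus trailing punctuation, is in `noise`."""
    kept = []
    for ent in entities:
        # Strip surrounding spaces and trailing punctuation, then compare lowercased
        core = ent.strip().rstrip('!?.… ').lower()
        if core not in noise:
            kept.append(ent)
    return kept
```

French typography puts a space before “!” in the France French files (“Oh !”), so the function strips trailing punctuation and spaces together before checking.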

import os

#frnlp is the French spaCy model loaded in an earlier step
#Sort all the files in the directory you specified above, alphabetically.
#For each of those files...
for filename in sorted(os.listdir(filedirectory)):
    #If the filename ends with .txt (i.e. if it's actually a text file)
    if filename.endswith('.txt'):
        #Write out below the name of the file
        print(filename)
        #The file name of the output file adds _ner_misc to the end of the file name of the input file
        outfilename = filename.replace('.txt', '_ner_misc.txt')
        #Open the input file (joining the directory so this works from any working directory)
        with open(os.path.join(filedirectory, filename), 'r', encoding='utf-8') as f:
            #Create and open the output file in the same directory
            with open(os.path.join(filedirectory, outfilename), 'w', encoding='utf-8') as out:
                #Read the contents of the input file
                chaptertext = f.read()
                #Do French NLP on the contents of the input file
                chapterner = frnlp(chaptertext)
                #For each recognized entity
                for ent in chapterner.ents:
                    #If that entity is labeled as miscellaneous
                    if ent.label_ == 'MISC':
                        #Print the entity, and the label (which should be MISC)
                        print(ent.text, ent.label_)
                        #Write the entity to the output file
                        out.write(ent.text)
                        #Write a newline character to the output file
                        out.write('\n')
016_01_bg.txt
Maman MISC
Oh-ada MISC
Aubrives MISC
Merci MISC
Maman MISC
016_01_frmod.txt
P'tit Bout MISC
Stonebrook MISC
Noirs MISC
Rebecca MISC
P'tit Bout MISC
P'tit Bout MISC
Rebecca MISC
P'tit Bout MISC
Jessica MISC
Coppélia MISC
Rebecca MISC
Oh MISC
Ça MISC
Rebecca MISC
Merci MISC
Allez MISC
Rebecca MISC
Rebecca MISC
Rebecca MISC
016_01_qu.txt
Jaja MISC
Oakley MISC
Noire de mon école! MISC
Noirs MISC
Jaja MISC
Dr Coppélius MISC
016_02_bg.txt
Valérie MISC
Nicolas MISC
Si MISC
Merci MISC
Merci MISC
016_02_frmod.txt
Kristy MISC
Kristy MISC
Claudia MISC
Je sais MISC
Stonebrook MISC
Jane MISC
Claudia MISC
Claudia MISC
Logan Rinaldi – MISC
Merci MISC
Oh MISC
016_02_qu.txt
Ça MISC
Christine MISC
Christine MISC
Christine MISC
II MISC
Christine MISC
Christine MISC
Claudia MISC
Christine MISC
Une fois par semaine MISC
Christine MISC
Je sais MISC
Pourquoi ne pas...» MISC
André MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Marjo MISC
Christine MISC
Marjorie MISC
Claudia MISC
Marjo MISC
Claudia MISC
Christine MISC
Claudia MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Merci MISC
Matthieu MISC
Matthieu MISC
Matthieu MISC
Christine MISC
Jessie! MISC
016_03_bg.txt
Maman MISC
Victoire MISC
Allez MISC
Coppélia MISC
la Danse des Heures MISC
Hélène MISC
MOI! MISC
Swanilda noire MISC
Merci MISC
Hélène MISC
016_03_frmod.txt
– Et un et deux et trois et quatre et plié MISC
Maman MISC
Allez MISC
Coppélia MISC
Valse MISC
XIXe siècle MISC
Jessica MISC
Swanilda noire MISC
Oh MISC
016_03_qu.txt
Et un (bang! MISC
Ça MISC
la Danse des heures MISC
Poupée chinoise MISC
Coppélia?
 MISC
Jessica MISC
Si c'est vrai MISC
Une fois habillée MISC
016_04_bg.txt
Si MISC
Allons MISC
lettre C MISC
Ça MISC
Brinbeuf MISC
Ça MISC
016_04_frmod.txt
Si MISC
Si MISC
Jessica MISC
Allons MISC
lettre C MISC
Ça MISC
Ça MISC
016_04_qu.txt
Jaja MISC
Les Petits Chaussons MISC
Matthieu MISC
Matthieu MISC
Hélène MISC
Jessica MISC
Hélène MISC
Jessica MISC
Hélène MISC
Hélène MISC
Matthieu MISC
Hélène MISC
Matthieu MISC
Matthieu MISC
Matthieu MISC
Matthieu MISC
Matthieu MISC
Matthieu MISC
Ça MISC
Matthieu MISC
Matthieu MISC
Matthieu MISC
Matthieu MISC
lettre M MISC
Matthieu MISC
lettre H MISC
comète de Halley MISC
Qu'en pensez-vous? MISC
Matthieu MISC
Hélène MISC
Matthieu MISC
Hélène MISC
Matthieu MISC
Hélène MISC
Hélène MISC
Hélène MISC
Matthieu MISC
016_05_bg.txt
Mercredi
Nous sommes toutes d'accord MISC
Brinbeuf!
 MISC
Precisio MISC
Precisio MISC
Hippo gloutons MISC
Piqu' MISC
Ça MISC
Mathieu! MISC
016_05_frmod.txt
Mercredi



 MISC
Jessica MISC
Prezzioso MISC
Jen MISC
Jessica MISC
Tu es atroce MISC
Matthew ! MISC
Ah ! MISC
Stonebrook MISC
016_05_qu.txt
Mercredi
D'accord MISC
les Prieur MISC
les Prieur MISC
les Prieur MISC
Serpents et MISC
Matthieu MISC
Les Biron MISC
Matthieu MISC
Matthieu MISC
TU MISC
Hélène MISC
Jeanne MISC
Hélène MISC
Matthieu MISC
Hélène MISC
Matthieu MISC
Matthieu! MISC
Matthieu MISC
Hélène MISC
Matthieu MISC
Hélène MISC
Matthieu MISC
Matthieu MISC
016_06_bg.txt
Brinbeuf MISC
Je hochai la tête MISC
V MISC
Merci MISC
LAURENT MISC
Elodie MISC
Levêque MISC
Le Langage Secret MISC
016_06_frmod.txt
Rebecca MISC
Rebecca MISC
Pike MISC
Pike MISC
Pike MISC
Barrett MISC
Le Langage secret MISC
Nicky MISC
Vanessa MISC
Pike MISC
Patriots MISC
Super Bowl MISC
Pike MISC
016_06_qu.txt
Matthieu MISC
Hélène MISC
Matthieu MISC
Hélène MISC
Matthieu MISC
Biron MISC
Matthieu MISC
Un J MISC
Matthieu MISC
Hélène MISC
Matthieu MISC
Matthieu MISC
Un signe et un sourire MISC
Matthieu MISC
Matthieu MISC
Hélène MISC
Hélène MISC
Hélène MISC
Matthieu MISC
Matthieu MISC
Matthieu MISC
Hélène MISC
Matthieu MISC
Matthieu MISC
Matthieu MISC
Matthieu MISC
Nicolas MISC
Génial MISC
Matthieu MISC
Picard MISC
Hélène et Matthieu MISC
Matthieu MISC
016_07_bg.txt
Levêque hier MISC
Majorie MISC
Levêque sur les chaises MISC
Allons MISC
Marjorie MISC
016_07_frmod.txt
Jessica MISC
Pike MISC
Pike MISC
Nicky MISC
Margot MISC
Pike MISC
Pike MISC
Pike MISC
Pike MISC
Carla ! MISC
Ça MISC
Carla MISC
Allez MISC
Pike MISC
Matthew MISC
016_07_qu.txt
Vendredi
Oh MISC
Qu'as-tu commencé MISC
Picard MISC
Journal de bord ici MISC
Antoine MISC
Nicolas MISC
Bernard MISC
Joël MISC
Nicolas MISC
Nicolas MISC
Marjorie MISC
ver! MISC
Nicolas MISC
Nicolas MISC
Nicolas MISC
Allez MISC
Restez tranquilles MISC
Nicolas MISC
Margot MISC
Ça MISC
Bernard MISC
Matthieu MISC
Matthieu MISC
016_08_bg.txt
Mes os MISC
ONNE MISC
Catherine MISC
Perron MISC
016_08_frmod.txt
Mes os MISC
Si c'était le cas MISC
Rebecca MISC
Katie MISC
Oh MISC
Katie MISC
016_08_qu.txt
Répétition MISC
Mes muscles me font mal MISC
Merci MISC
Catherine MISC
Adèle MISC
Adèle MISC
Adèle MISC
Matthieu MISC
Adèle MISC
Matthieu MISC
Matthieu MISC
Merci MISC
Ça MISC
016_09_bg.txt
Nicolas MISC
Nicolas MISC
Valérie! MISC
Bonjour Coralie MISC
Super MISC
Ça MISC
Julie MISC
Arnould MISC
Oh MISC
Berk! MISC
HOU MISC
Oh MISC
Mon voisin MISC
016_09_frmod.txt
Jessica MISC
Samuel MISC
Claudia MISC
Claudia MISC
Claudia MISC
Morbidda Destiny MISC
Claudia MISC
Claudia MISC
Kristy MISC
Karen MISC
Claudia MISC
Andrew MISC
Claudia MISC
Andrew MISC
Super ! MISC
Merci MISC
Ça MISC
Claudia MISC
Claudia MISC
Rebecca MISC
Claudia MISC
Claudia MISC
Claudia MISC
Claudia MISC
Claudia MISC
Claudia MISC
Oh MISC
Claudia MISC
Oh MISC
Claudia MISC
Tiens MISC
Héloïse MISC
Claudia MISC
Claudia MISC
Jessica MISC
016_09_qu.txt
Samedi
Jessie MISC
André MISC
David MISC
Christine MISC
Matthieu MISC
Claudia MISC
Christine MISC
Claudia MISC
Christine MISC
Claudia MISC
Christine MISC
André MISC
Christine MISC
Destinée Morbide MISC
André MISC
David MISC
Christine MISC
Christine MISC
Karen!
 MISC
Christine MISC
Claudia MISC
Claudia MISC
André MISC
Christine MISC
André MISC
Merci MISC
Christine MISC
André MISC
Claudia MISC
André MISC
David MISC
David MISC
Où vas MISC
Claudia MISC
André MISC
André MISC
David MISC
Ça MISC
Claudia MISC
André MISC
Claudia MISC
Karen MISC
André MISC
Claudia MISC
Claudia MISC
Claudia MISC
Je suis bien contente d'aider MISC
Claudia MISC
Claudia MISC
André MISC
André MISC
André MISC
Claudia MISC
Salut MISC
André MISC
Bou! MISC
Claudia MISC
André MISC
La sorcière d'à côté MISC
Claudia MISC
André MISC
David MISC
Claudia MISC
016_10_bg.txt
Oh MISC
Aubrives MISC
Noël MISC
Dada Bo? MISC
016_10_frmod.txt
Pike MISC
Matthew MISC
Nicky MISC
Oh MISC
Nicky MISC
Nicky MISC
P'tit Bout MISC
Rebecca MISC
P'tit Bout MISC
Rebecca MISC
Rebecca MISC
Stonebrook MISC
Noël MISC
Rebecca MISC
P'tit Bout MISC
Rebecca MISC
Rebecca MISC
P'tit Bout MISC
016_10_qu.txt
Hélène MISC
Matthieu MISC
Matthieu MISC
Picard MISC
Barrette MISC
Matthieu MISC
Matthieu MISC
Hélène MISC
Nicolas MISC
Hélène MISC
Matthieu MISC
Nicolas MISC
Nicolas MISC
Matthieu MISC
Hélène MISC
Nicolas MISC
Nicolas MISC
Matthieu MISC
Matthieu MISC
Matthieu MISC
Hélène MISC
Matthieu MISC
Matthieu MISC
Hélène MISC
Jaja MISC
Matthieu MISC
Hélène MISC
Matthieu MISC
Regarde Matthieu MISC
Nicolas MISC
Antoine MISC
Matthieu MISC
Nicolas MISC
Antoine MISC
Hélène MISC
Matthieu MISC
Noël MISC
Matthieu MISC
Jaja MISC
Ga MISC
Ça MISC
Biron MISC
Matthieu MISC
Hélène MISC
Hélène MISC
Hélène MISC
Adèle MISC
016_11_bg.txt
Nicolas MISC
Je voulais... MISC
Nicolas MISC
Valérie MISC
Allô MISC
016_11_frmod.txt
Claudia MISC
Un biscuit ? MISC
Claudia MISC
Jessica MISC
Claudia MISC
Logan !

 MISC
P'tit Bout MISC
Kristy MISC
Oh MISC
Coppélia MISC
Stamford ! MISC
016_11_qu.txt
Christine MISC
Marjorie MISC
Christine MISC
Christine MISC
Claudia MISC
Claudia MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Claudia MISC
Christine MISC
Christine MISC
Christine MISC
Jessie MISC
Matthieu MISC
Hélène MISC
Matthieu MISC
Christine MISC
Claudia MISC
Christine MISC
Matthieu MISC
Claudia MISC
Claudia MISC
Christine MISC
Christine MISC
Christine MISC
Claudia MISC
Christine MISC
Marjorie MISC
Claudia MISC
Christine MISC
016_12_bg.txt
Merci MISC
Merci MISC
Vendredi prochain MISC
Merci MISC
016_12_frmod.txt
Jessica MISC
Jessica MISC
Merci MISC
Merci MISC
016_12_qu.txt
Oh! MISC
Matthieu MISC
Matthieu MISC
Matthieu MISC
Matthieu MISC
Jessie MISC
Matthieu MISC
Matthieu MISC
Merci MISC
Ça MISC
Matthieu MISC
Matthieu MISC
Matthieu MISC
Merci MISC
Matthieu MISC
016_13_bg.txt
Brinbeuf MISC
Livre du Chat MISC
Oh! MISC
Valérie MISC
Oh! MISC
Valérie MISC
Roseline MISC
Valérie MISC
Ah! MISC
Japonaise de l'école MISC
Valérie et MISC
Oh MISC
Oh! MISC
016_13_frmod.txt
Rebecca MISC
Jessica MISC
Rebecca MISC
P'tit Bout MISC
Rebecca MISC
Rebecca MISC
Rebecca MISC
Livre du chat MISC
P'tit Bout MISC
Kristy MISC
Rebecca MISC
Oh ! MISC
Rebecca MISC
Rebecca MISC
Rebecca MISC
P'tit Bout MISC
Rebecca MISC
P'tit Bout MISC
P'tit Bout MISC
Rebecca MISC
Kristy MISC
Rebecca MISC
Stonebrook MISC
Rebecca MISC
Stonebrook MISC
Ça MISC
Rebecca MISC
Japonaise de l'école MISC
Kristy MISC
Rebecca MISC
Oh MISC
Rebecca MISC
Rebecca MISC
Rebecca MISC
Stonebrook MISC
Rebecca MISC
Rebecca MISC
Rebecca MISC
Kristy MISC
Rebecca MISC
Bout MISC
P'tit Bout MISC
Jessica MISC
Oh ! MISC
016_13_qu.txt
Mercredi
Salut MISC
Christine MISC
Christine MISC
Christine MISC
Jaja MISC
Christine MISC
Allons MISC
Christine MISC
Jaja MISC
Christine MISC
Christine MISC
Christine MISC
Christine MISC
Matthieu MISC
Japonaise dans ton école MISC
Christine MISC
Miss Nouville MISC
Copemicus MISC
Christine MISC
Jaja MISC
Jaja MISC
Ça MISC
Jaja MISC
Christine MISC
Christine MISC
016_14_bg.txt
Dieu! MISC
Je hochai la tête MISC
Allez MISC
Aubrives MISC
Merci MISC
acte II MISC
acte III MISC
016_14_frmod.txt
Seigneur ! MISC
Rebecca MISC
Rebecca MISC
P'tit Bout ? MISC
Oh MISC
Allez MISC
Katie MISC
Maman MISC
Adèle MISC
acte II MISC
Adèle MISC
Oh MISC
Katie MISC
016_14_qu.txt
Dr Coppélius MISC
Si MISC
Allons MISC
ballet Merci MISC
Hélène MISC
Je serai contente de la voir MISC
Matthieu MISC
Adèle MISC
Si MISC
acte II MISC
Hélène MISC
Dr Coppélius MISC
Dr Coppélius MISC
Matthieu MISC
Matthieu MISC
Matthieu MISC
Adèle MISC
Matthieu MISC
Matthieu MISC
Adèle MISC
016_15_bg.txt
Merci MISC
Oh! MISC
Ça MISC
Super! MISC
Pêche Melba MISC
016_15_frmod.txt
Adèle MISC
Katie MISC
Jessica MISC
Rebecca MISC
Jessi ! MISC
Katie MISC
Adèle MISC
Oh ! MISC
Jessica MISC
Oh MISC
Super ! MISC
Jessica MISC
Si c'était le cas MISC
Rebecca MISC
Claudia MISC
Jessica MISC
016_15_qu.txt
Adèle MISC
Matthieu MISC
Félicitations MISC
Kara MISC
Kara MISC
Jessie MISC
Merci MISC
Matthieu MISC
Matthieu MISC
Adèle MISC
Matthieu MISC
Jessie MISC
Hélène MISC
Christine MISC
Jessie MISC
Hélène MISC

Let’s do it all again for English

The steps below in the notebook basically repeat the process described above, but using the spaCy model for English instead of French. At first, I just copied over the code cells from the French section (replacing filedirectory with enfiledirectory in the first line so it looked in the right place for the English files), but kept getting some bizarre output: namely, it wasn’t able to find any entities labeled PER or LOC.

To figure out what was going on, I removed the line if ent.label_ == 'PER': and un-indented the code nested inside it to avoid Python indentation errors. Then I commented out the following out.write lines by putting a # in front of them. I didn’t want to write any results; I just wanted to see what entities it found.

Lo and behold, there were lots of entities, with more entity labels than were available for French. There’s GPE (geopolitical entity, AKA location, but not any of the locations that are like “so-and-so’s room”), CARDINAL (number type), LANGUAGE (seems relevant for Jessi’s Secret Language), TIME (e.g. “the morning of the day”), DATE (things like “a few weeks”, or, strangely, “eight-year-old”), and PERSON (not PER).
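The same kind of check can be done without editing the filtering loop at all. Below is a minimal sketch, assuming the English model is loaded as `ennlp` as elsewhere in this notebook; `tally_labels` is a hypothetical helper, not part of spaCy. It counts how many entities carry each label, so a mismatch like PER versus PERSON shows up immediately. The sample pairs are toy data, not real results from the books.

```python
from collections import Counter
from typing import Iterable, Tuple


def tally_labels(entities: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many entities carry each label.

    `entities` is any iterable of (text, label) pairs, for example
    ((ent.text, ent.label_) for ent in doc.ents) for a spaCy Doc.
    """
    return Counter(label for _text, label in entities)


# Toy (text, label) pairs standing in for spaCy output; not real results.
sample = [("Becca", "PERSON"), ("Stoneybrook", "GPE"), ("Mama", "ORG"),
          ("Squirt", "PERSON"), ("a few weeks", "DATE")]
print(tally_labels(sample).most_common())

# With the notebook's English model (assumed to be loaded as `ennlp`):
#   doc = ennlp(chaptertext)
#   print(tally_labels((ent.text, ent.label_) for ent in doc.ents))
# spaCy can also list every label its NER component knows about:
#   print(ennlp.get_pipe("ner").labels)
```

Running something like this on a single English chapter before writing the filtering loop would likely have surfaced PERSON and GPE in place of PER and LOC right away.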

First, we have to put in the path to the English files, then change to that directory:

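import os

#Put in the path here, using the same conventions as described above in step 4
enfiledirectory = '/Users/qad/Documents/dsc/dscm2/en'
#Change to that directory, so the loops below can open the files by name
os.chdir(enfiledirectory)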

11. English NER for people

#Sort all the files in the directory you specified above, alphabetically.
#For each of those files...
for filename in sorted(os.listdir(enfiledirectory)):
    #If the filename ends with .txt (i.e. if it's actually a text file)
    if filename.endswith('.txt'):
        #Write out below the name of the file
        print(filename)
        #The file name of the output file adds _ner_per to the end of the file name of the input file
        outfilename = filename.replace('.txt', '_ner_per.txt')
        #Open the input filename
        with open(filename, 'r', encoding='utf-8') as f:
            #Create and open the output filename
            with open(outfilename, 'w', encoding='utf-8') as out:
                #Read the contents of the input file
                chaptertext = f.read()
                #Do English NLP on the contents of the input file
                #(ennlp should be the English spaCy model loaded earlier in the notebook)
                chapterner = ennlp(chaptertext)
                #For each recognized entity
                for ent in chapterner.ents:
                    #If that entity is labeled as a person (the English model uses PERSON, not PER)
                    if ent.label_ == 'PERSON':
                        #Print the entity, and the label (which should be PERSON)
                        print(ent.text, ent.label_)
                        #Write the entity to the output file
                        out.write(ent.text)
                        #Write a newline character to the output file
                        out.write('\n')
016_01_en.txt
chuck PERSON
Becca PERSON
Rebecca PERSON
Squirt PERSON
John Philip Ramsey PERSON
Jessi Ramsey PERSON
Jessica Davis Ramsey PERSON
Becca PERSON
Squirt PERSON
Keisha PERSON
Daddy PERSON
Mallory Pike PERSON
Keisha PERSON
Mallory PERSON
Barre PERSON
Daddy PERSON
Becca PERSON
Jessi PERSON
Squirt PERSON
Jessi PERSON
Becca PERSON
Coppelius PERSON
Becca PERSON
Franz PERSON
Coppelius PERSON
Becca PERSON
Daddy PERSON
Madame Noelle PERSON
Becca PERSON
Becca PERSON
016_02_en.txt
Kristy Thomas PERSON
Mallory Pike PERSON
Kristy PERSON
Kristy PERSON
Kristy PERSON
Mallory PERSON
Mal PERSON
Kristy PERSON
Kristy PERSON
David Michael PERSON
Thomas PERSON
Mary Anne Spier PERSON
Claudia Kishi PERSON
Stacey McGill PERSON
Stacey PERSON
Mal PERSON
Dawn Schafer PERSON
Kristy PERSON
Mary Anne PERSON
Dawn PERSON
Mal PERSON
Kristy PERSON
Kristy PERSON
Kristy PERSON
Kristy PERSON
Sam PERSON
Charlie PERSON
David Michael PERSON
Kristy PERSON
Watson Brewer PERSON
Karen PERSON
Andrew PERSON
Kristy PERSON
Kristy PERSON
Mary Anne Spier PERSON
Watson PERSON
Brewer PERSON
Kristy PERSON
Kristy PERSON
Charlie PERSON
Mary Anne PERSON
Claudia Kishi PERSON
Once Mal PERSON
Kristy PERSON
Kristy PERSON
Mary Anne PERSON
Mal PERSON
Janine PERSON
Mimi PERSON
Mary Anne Spier PERSON
Dawn Schafer's PERSON
Mary Anne PERSON
Kristy PERSON
Kristy PERSON
Mary Anne PERSON
Mary Anne PERSON
Mary Anne PERSON
Kristy PERSON
Mary Anne PERSON
Tigger PERSON
Spier PERSON
Stacey McGill PERSON
Stacey PERSON
Dawn Schafer PERSON
Dawn PERSON
Kristy PERSON
Mary Anne PERSON
Dawn PERSON
Jeff PERSON
Jeff PERSON
Schafer PERSON
Dawn PERSON
Dawn PERSON
Mal PERSON
Mal PERSON
Mal PERSON
Mal PERSON
Logan Bruno PERSON
Mary Anne's PERSON
Shannon Kilbourne PERSON
Kristy PERSON
Dawn PERSON
Mal PERSON
Kristy PERSON
Kristy PERSON
Mary Anne PERSON
Dawn PERSON
Mal PERSON
Braddock PERSON
Matthew PERSON
Braddocks PERSON
Matthew PERSON
Braddock PERSON
Dawn PERSON
Mary Anne PERSON
Kristy PERSON
Mal PERSON
Mal PERSON
Kristy PERSON
Braddock PERSON
Mal PERSON
016_03_en.txt
Mademoiselle Jones PERSON
Mme Noelle PERSON
Becca PERSON
Mme Noelle PERSON
Mme Noelle PERSON
Katie Beth PERSON
Hilary PERSON
Katie Beth PERSON
Mademoiselle Romsey PERSON
Mademoiselle Romsey PERSON
Ramsey PERSON
Mme Noelle's PERSON
Katie Beth PERSON
Tour jetés PERSON
Katie Beth PERSON
Gother PERSON
Mme Noelle PERSON
Hilary PERSON
Katie Beth PERSON
Mary Bramstedt PERSON
Lisa Jones PERSON
Carrie Steinfeld PERSON
Hilary PERSON
Katie Beth PERSON
Coppélia PERSON
Hilary PERSON
Katie Beth PERSON
Coppélia PERSON
Mme Noelle PERSON
Mademoiselle Jessica Romsey PERSON
Jessica Romsey PERSON
Jessica Ramsey PERSON
Mme Noelle PERSON
Jessica PERSON
Mary Bramstedt PERSON
Lisa Jones PERSON
Jessi PERSON
Mary PERSON
Lisa PERSON
Katie Beth PERSON
Katie Beth PERSON
Katie Beth PERSON
Katie Beth PERSON
Hilary PERSON
Katie PERSON
Katie Beth PERSON
Hilary PERSON
016_04_en.txt
Squirt PERSON
Madame Noelle PERSON
Hilary PERSON
Katie Beth PERSON
Matthew Braddock PERSON
Ameslan PERSON
Matthew PERSON
Matt PERSON
Braddocks PERSON
Braddocks PERSON
Haley PERSON
Jessica PERSON
Jessi PERSON
Haley PERSON
Mallory PERSON
Geiger PERSON
Braddock PERSON
Reeboks PERSON
Jessica PERSON
Jessi PERSON
Mommy PERSON
Haley PERSON
Jessi PERSON
Braddock PERSON
Braddock PERSON
Matt PERSON
Matt PERSON
Haley PERSON
Braddock PERSON
Mommy PERSON
Haley PERSON
Braddock PERSON
Matt PERSON
Haley PERSON
Braddock PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Braddock PERSON
Matt PERSON
Matt PERSON
Braddocks PERSON
Matt PERSON
Braddock PERSON
Matt PERSON
Braddock PERSON
Becca PERSON
K. PERSON
Braddock PERSON
Braddock PERSON
Matt PERSON
Braddocks PERSON
Braddock PERSON
Matt PERSON
Haley PERSON
Braddock PERSON
Braddock PERSON
Matt PERSON
Braddock PERSON
Matt PERSON
Braddock PERSON
Matt PERSON
Matt PERSON
Jessi PERSON
Matt PERSON
Braddock PERSON
Matt PERSON
Matt PERSON
Neat PERSON
Braddock PERSON
Matt PERSON
Halley PERSON
Comet PERSON
Braddock PERSON
Haley PERSON
Matt PERSON
Braddock PERSON
Braddpck PERSON
Matt PERSON
Haley PERSON
Matt PERSON
Matt PERSON
Haley PERSON
Haley PERSON
Matt PERSON
Haley PERSON
Tons PERSON
Braddock PERSON
Braddock PERSON
016_05_en.txt
Jenny PERSON
Jessi PERSON
Braddocks PERSON
Jenny PERSON
Mary Anne Spier PERSON
Jenny Prezzioso PERSON
Mary Anne's PERSON
Jenny PERSON
Mary Anne's PERSON
Mary Anne PERSON
Jenny PERSON
Becca PERSON
Jenny PERSON
Mary Anne PERSON
Mary Janes PERSON
Jenny PERSON
Jenny PERSON
Jenny PERSON
Prezzioso PERSON
Mary Anne PERSON
Jenny PERSON
Mary Anne PERSON
Jenny PERSON
Jenny PERSON
Mary Anne PERSON
Mary Anne PERSON
Jenny PERSON
Mary Anne PERSON
Candy Land PERSON
Squirrel Nutkin PERSON
Jenny PERSON
Mary Anne PERSON
Jenny PERSON
Mary Anne PERSON
Finger PERSON
Finger PERSON
Jenny PERSON
Mary Anne PERSON
Mary Anne PERSON
Mary Anne PERSON
Mary Anne PERSON
Mary Anne PERSON
Jenny PERSON
Jenny PERSON
Windbreaker PERSON
Jenny PERSON
Jenny PERSON
Mary Anne PERSON
Jenny PERSON
Mary Anne PERSON
Matt PERSON
Haley PERSON
Braddocks PERSON
Braddocks PERSON
Braddock PERSON
Matt PERSON
Haley PERSON
Mary Anne PERSON
Jenny PERSON
Braddocks PERSON
Mary Anne PERSON
Braddocks PERSON
Jenny PERSON
Braddocks PERSON
Jenny PERSON
Mary Anne PERSON
Matt PERSON
Jenny PERSON
Haley PERSON
Matt PERSON
Matt PERSON
Jenny PERSON
Matt PERSON
Jenny PERSON
Jenny PERSON
Haley PERSON
Matt PERSON
Mary Anne PERSON
Haley PERSON
Haley PERSON
Mary Anne PERSON
Jessi PERSON
Matt PERSON
Matt PERSON
STINK PERSON
Matt PERSON
Haley PERSON
Matt PERSON
Deaf PERSON
Braddocks PERSON
Matt PERSON
Braddocks PERSON
Matt PERSON
016_06_en.txt
Matt PERSON
Matt PERSON
Haley PERSON
Matt PERSON
Haley PERSON
Matt PERSON
Braddock PERSON
Braddocks PERSON
Braddock PERSON
Matt PERSON
Braddock PERSON
Braddock PERSON
Braddock PERSON
Matt PERSON
Haley PERSON
Haley PERSON
Braddock PERSON
Matt PERSON
Matt PERSON
Haley PERSON
Jessi PERSON
Braddock PERSON
Haley PERSON
Matt PERSON
Braddock PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Haley PERSON
Braddock PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Braddocks PERSON
Mary Anne PERSON
Jenny Prezzioso PERSON
Matt PERSON
Haley PERSON
Becca PERSON
Charlotte Johanssen PERSON
Becca PERSON
Matt PERSON
Haley PERSON
Pikes PERSON
Haley PERSON
Haley PERSON
Matt PERSON
Haley PERSON
Matt PERSON
Vanessa PERSON
Haley PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Mallory PERSON
Suzi PERSON
Haley PERSON
Matt PERSON
Mallory PERSON
Pike PERSON
Barretts PERSON
Haley PERSON
Ursula Nordstrom PERSON
Matt PERSON
Margo PERSON
Vanessa PERSON
Claire PERSON
Pike PERSON
Matt PERSON
Claire PERSON
Haley PERSON
Claire PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Buddy Barrett PERSON
Matt PERSON
Buddy PERSON
Matt PERSON
Pikes PERSON
Braddock PERSON
Haley PERSON
Matt PERSON
Matt PERSON
Haley PERSON
Pikes PERSON
016_07_en.txt
Jessi PERSON
Jessi PERSON
Mal PERSON
Dawn PERSON
Mal PERSON
Dawn PERSON
Pike PERSON
Dawn PERSON
Adam PERSON
Pike PERSON
Pike PERSON
Pike PERSON
Mallory PERSON
Pike PERSON
Vanessa PERSON
Claire PERSON
Dawn PERSON
Pike PERSON
Dawn PERSON
Pike PERSON
Adam PERSON
Claire PERSON
Adam PERSON
bush PERSON
Mallory PERSON
Adam PERSON
Byron PERSON
Dawn PERSON
Mallory PERSON
Adam PERSON
Mallory PERSON
Mallory PERSON
Dawn PERSON
Nicky PERSON
Dawn PERSON
Byron PERSON
Margo PERSON
Vanessa Pike PERSON
Dawn PERSON
Mallory PERSON
Dawn PERSON
Mal PERSON
Dawn PERSON
Dawn PERSON
Vanessa PERSON
Wiggle PERSON
Claire PERSON
Dawn PERSON
Mal PERSON
Vanessa PERSON
Jordan PERSON
Dawn PERSON
Dawn PERSON
Jessi PERSON
Dawn PERSON
Haley PERSON
Matt PERSON
Mallory PERSON
Dawn PERSON
Matt PERSON
Pikes PERSON
Matt PERSON
Haley PERSON
016_08_en.txt
Madame Noelle PERSON
Mademoiselle Parsons PERSON
Katie Beth PERSON
Mademoiselle Bramstedt PERSON
Mary PERSON
Mademoiselle Romsey PERSON
Katie Beth PERSON
Katie Beth PERSON
Hilary PERSON
Katie Beth PERSON
Katie Beth PERSON
Katie Beth PERSON
Katie PERSON
Katie PERSON
Becca PERSON
Katie Beth PERSON
Adele PERSON
Adele PERSON
Katie Beth PERSON
Katie Beth PERSON
Katie Beth PERSON
Madame Noelle PERSON
Katie Beth PERSON
Katie Beth PERSON
Katie Beth PERSON
Becca PERSON
Katie PERSON
Adele PERSON
Adele PERSON
Katie Beth PERSON
Katie Beth PERSON
Adele PERSON
Katie Beth PERSON
Matt PERSON
Matt PERSON
Adele PERSON
Braddocks PERSON
Matt PERSON
Katie Beth PERSON
Katie Beth PERSON
Matt PERSON
Katie Beth PERSON
Katie Beth PERSON
Adele PERSON
Parsonses PERSON
Adele PERSON
Katie Beth PERSON
Adele PERSON
Adele PERSON
Katie Beth's PERSON
Katie Beth PERSON
Adele PERSON
Katie Beth PERSON
Mallory PERSON
016_09_en.txt
Karen Andrew PERSON
David Micheal PERSON
Kristy PERSON
Sam PERSON
Charlie PERSON
Karen PERSON
Karen PERSON
Matt PERSON
Kristy PERSON
Karen PERSON
Brewer PERSON
Karen Brewer PERSON
Kristy PERSON
Kristy PERSON
Karen PERSON
Andrew PERSON
Porter PERSON
Morbidda Destiny PERSON
Ben Brewer PERSON
Karen PERSON
Karen PERSON
Andrew PERSON
David Michael PERSON
Kristy PERSON
Kristy PERSON
Sam PERSON
Charlie PERSON
Kristy PERSON
Karen PERSON
Kristy PERSON
Karen PERSON
Karen PERSON
Karen PERSON
Kristy PERSON
Brewer PERSON
Brewer PERSON
Elizabeth PERSON
Karen PERSON
Andrew PERSON
Brewer PERSON
Andrew PERSON
Karen PERSON
David Michael's PERSON
Karen PERSON
Karen PERSON
David Michael's PERSON
Brewer PERSON
Andrew PERSON
Brewer PERSON
Papadakises PERSON
Karen PERSON
Andrew PERSON
David Michael PERSON
Andrew PERSON
Karen PERSON
Andrew PERSON
David Michael PERSON
Karen PERSON
Karen PERSON
Karen PERSON
Andrew PERSON
Karen PERSON
Karen PERSON
Andrew PERSON
David Michael PERSON
Karen PERSON
David Michael PERSON
Karen PERSON
Karen PERSON
Jessi Ramsey PERSON
Becca PERSON
Sometimes Mal PERSON
Karen PERSON
Karen PERSON
Karen PERSON
Kristy PERSON
Karen PERSON
Karen PERSON
Karen PERSON
Andrew PERSON
Karen PERSON
Andrew PERSON
Andrew PERSON
Andrew PERSON
Karen PERSON
Andrew PERSON
Karen PERSON
Andrew PERSON
Andrew PERSON
Andrew PERSON
Karen PERSON
Tickly PERSON
Moosie PERSON
Karen PERSON
Karen PERSON
Karen PERSON
Karen PERSON
Moosie PERSON
Andrew PERSON
Karen PERSON
Karen PERSON
Karen PERSON
Andrew PERSON
David Michael's PERSON
016_10_en.txt
Braddocks PERSON
Matt PERSON
Haley PERSON
Braddock PERSON
Haley PERSON
Braddock PERSON
Haley PERSON
Matt PERSON
Matt PERSON
Pikes PERSON
Jenny Prezzioso PERSON
Matt PERSON
Matt PERSON
Haley PERSON
Vanessa PERSON
Nicky Pike PERSON
Vanessa PERSON
Haley PERSON
Matt PERSON
Buddy Barrett PERSON
Haley PERSON
Matt PERSON
Matt PERSON
Buddy Barrett PERSON
Braddocks PERSON
Matt PERSON
Vanessa PERSON
Haley PERSON
Haley PERSON
Matt PERSON
Matt PERSON
Jordan PERSON
Haley PERSON
Matt PERSON
Haley PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Haley PERSON
Matt PERSON
Haley PERSON
Mama PERSON
Squirt PERSON
Matt PERSON
Haley PERSON
Matt PERSON
Matt PERSON
Adam PERSON
Matt PERSON
Adam PERSON
Jessi PERSON
Squirt PERSON
Becca PERSON
Becca PERSON
Haley PERSON
Matt PERSON
Matt PERSON
Yup PERSON
Becca PERSON
Becca PERSON
Squirt PERSON
Squirt PERSON
Dur-bliss PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Helen Keller PERSON
Haley PERSON
Matt PERSON
Haley PERSON
Haley PERSON
Adele PERSON
Mme Noelle's PERSON
Haley PERSON
Mme Noelle PERSON
016_11_en.txt
Mme Noelle PERSON
Braddock PERSON
Kristy PERSON
Mal PERSON
Kristy PERSON
Mary Anne PERSON
Dawn PERSON
Mal PERSON
Mal PERSON
Yodels PERSON
Kristy PERSON
Mal PERSON
Ring-Ding PERSON
Kristy PERSON
Kristy PERSON
Charlie PERSON
Kristy PERSON
Charlie PERSON
Kristy PERSON
Dawn PERSON
Kristy PERSON
Dawn PERSON
Kristy PERSON
Kristy PERSON
Mary Anne PERSON
Kristy PERSON
Mary Anne PERSON
Kristy PERSON
Jessi PERSON
Matt PERSON
Haley PERSON
Matt PERSON
Haley PERSON
Matt PERSON
Kristy PERSON
Dawn PERSON
Mary Anne PERSON
Kristy PERSON
Braddocks PERSON
Dawn PERSON
Logan PERSON
Logan PERSON
Mary Anne's PERSON
Mary Anne PERSON
Logan PERSON
Jessi PERSON
Braddocks PERSON
Mary Anne's PERSON
Dawn PERSON
Kristy PERSON
Dawn PERSON
Squirt PERSON
Daddy PERSON
Tigger PERSON
Mary Anne PERSON
Logan PERSON
Dawn PERSON
Mary Anne PERSON
Kristy PERSON
Kristy PERSON
Charlie PERSON
Dawn PERSON
Kristy PERSON
Mal PERSON
Mal PERSON
Mal PERSON
Braddock PERSON
Bye PERSON
Matt PERSON
Kristy PERSON
Mary Anne PERSON
Mal PERSON
Coppélia PERSON
016_12_en.txt
Braddock PERSON
Braddock PERSON
DEAF PERSON
Braddock PERSON
Braddock PERSON
Matt PERSON
Braddock PERSON
Matt PERSON
Braddock PERSON
Matt PERSON
Matt PERSON
Braddock PERSON
Braddock PERSON
Jessi PERSON
Matt PERSON
Braddock PERSON
Matt PERSON
Braddock PERSON
Jessi PERSON
Frank PERSON
Matt PERSON
Braddock PERSON
Braddock PERSON
Matt PERSON
Frank PERSON
Matt PERSON
Frank PERSON
Matt PERSON
Braddock PERSON
Frank PERSON
Frank PERSON
Frank PERSON
Jessi Ramsey PERSON
Matt PERSON
Frank PERSON
Jessi PERSON
Frank PERSON
Matt Braddock PERSON
Jessi PERSON
Frank PERSON
Frank PERSON
Braddock PERSON
Coppélia PERSON
Matt PERSON
Frank PERSON
Matt PERSON
Matt PERSON
Frank PERSON
Frank PERSON
Frank PERSON
Braddock PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Matt PERSON
016_13_en.txt
Squirt PERSON
Becca PERSON
Squirt PERSON
Jessi PERSON
Braddocks PERSON
Becca PERSON
Becca PERSON
Kristy PERSON
Stacey McGill PERSON
Squirt PERSON
Mama PERSON
Squirt PERSON
Becca PERSON
Kristy PERSON
Becca PERSON
Millikan PERSON
Millikan PERSON
Becca PERSON
Kristy PERSON
Becca PERSON
Becca PERSON
Becca PERSON
Kristy PERSON
Pinky Pye PERSON
Cats PERSON
Squirt PERSON
Kristy PERSON
Becca PERSON
Becca PERSON
Squirt PERSON
Kristy PERSON
Becca PERSON
Becca PERSON
Kristy PERSON
Show Squirt PERSON
Ga-ga PERSON
Becca PERSON
Becca PERSON
Squirt PERSON
Becca PERSON
Kristy PERSON
Kristy PERSON
Kristy PERSON
Kristy PERSON
Becca PERSON
Kristy PERSON
Matt PERSON
Becca PERSON
Becca PERSON
Mom PERSON
Kristy PERSON
Becca PERSON
Kristy PERSON
Charlotte PERSON
Becca PERSON
Becca PERSON
Becca PERSON
Stacey McGill PERSON
Stacey PERSON
Becca PERSON
Stacey PERSON
Becca PERSON
Charlotte PERSON
Becca PERSON
Charlotte PERSON
Becca PERSON
Becca PERSON
Kristy PERSON
Charlotte PERSON
Char PERSON
Becca PERSON
Jessi PERSON
Charlotte PERSON
Becca PERSON
Jessi PERSON
Kristy PERSON
Becca PERSON
Kristy PERSON
Becca PERSON
Kristy PERSON
Kristy PERSON
Squirt PERSON
Squirt PERSON
Char PERSON
Kristy PERSON
Kristy PERSON
Becca PERSON
Braddock PERSON
Kristy Thomas PERSON
016_14_en.txt
Braddocks PERSON
Mme Noelle PERSON
Frank PERSON
Matt PERSON
Becca PERSON
Kristy PERSON
Matt PERSON
Mary Anne PERSON
Kristy PERSON
Kristy PERSON
Dawn PERSON
Mary Anne PERSON
Logan Bruno PERSON
Mary Anne's PERSON
Braddock PERSON
Matt PERSON
Braddock PERSON
Haley PERSON
Coppélia PERSON
Coppelius PERSON
Ready PERSON
Mme Noelle PERSON
Braddock PERSON
Braddock PERSON
Braddock PERSON
Braddock PERSON
Carolyn Braddock PERSON
Haley PERSON
Matt PERSON
Mme Noelle PERSON
Katie Beth PERSON
Katie Beth PERSON
Adele PERSON
Matt PERSON
Katie Beth PERSON
Matt PERSON
Adele PERSON
Katie Beth PERSON
Burgomaster PERSON
Braddock PERSON
Coppelius PERSON
Coppelius PERSON
Braddock PERSON
Braddock PERSON
Christopher Gerber PERSON
Franz PERSON
Matt PERSON
Matt PERSON
Matt PERSON
Christopher PERSON
Matt PERSON
Adele PERSON
Katie Beth PERSON
Katie PERSON
Katie Beth PERSON
Adele PERSON
Christopher PERSON
Katie Beth PERSON
Matt PERSON
Adele PERSON
016_15_en.txt
Katie Beth PERSON
Katie Beth PERSON
Katie Beth PERSON
Adele PERSON
Jessi PERSON
Katie Beth PERSON
Katie PERSON
babe PERSON
Jessi PERSON
Jessi PERSON
Becca PERSON
Grandma PERSON
Grandpa PERSON
Braddock PERSON
Haley PERSON
Dawn PERSON
Mary Anne PERSON
Matt PERSON
Mal PERSON
Mal PERSON
Jessi PERSON
Keisha PERSON
Mal PERSON
Mal PERSON
Grandma PERSON
Grandpa PERSON
Keisha PERSON
Keisha PERSON
Keisha PERSON
Katie Beth PERSON
Keisha PERSON
Matt PERSON
Haley PERSON
Braddock PERSON
Katie Beth PERSON
Mal PERSON
Jessi PERSON
Mal PERSON
Mal PERSON
Keisha PERSON
Keisha PERSON
Mallory PERSON
Keisha PERSON
Mal PERSON
Keisha PERSON
Jessi PERSON
Keisha PERSON
Mal PERSON
Mal PERSON
Mal PERSON
Adele PERSON
Katie Beth PERSON
Matt PERSON
Adele PERSON
Katie Beth PERSON
Katie Beth PERSON
Jessi PERSON
Jessi PERSON
Katie Beth PERSON
Katie Beth PERSON
Katie Beth PERSON
Adele PERSON
Katie Beth PERSON
Matt PERSON
Jessi PERSON
Daddy PERSON
Becca PERSON
Grandma PERSON
Grandpa PERSON
Mal PERSON
Kristy PERSON
Dawn PERSON
Mary Anne PERSON
Braddock PERSON
Matt PERSON
Braddocks PERSON
Kristy PERSON
Jessi PERSON
Mary Anne PERSON
Matt PERSON
Haley PERSON
Matt PERSON
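The PERSON lists above count possessives (“Mary Anne’s”, “Dawn Schafer’s”, “Mme Noelle’s”) separately from the bare names. Here is a minimal sketch, with hypothetical helper names, that strips a trailing possessive ’s and stray whitespace before counting how often each name appears in one chapter’s _ner_per.txt file. It deliberately leaves plurals like “Braddocks” and false positives like “Mary Janes” alone; those would need a hand-made list.

```python
import re
from collections import Counter
from typing import Iterable

# Matches a trailing possessive 's (straight or curly apostrophe)
POSSESSIVE = re.compile(r"['’]s$")


def normalize_name(name: str) -> str:
    """Collapse internal whitespace and strip a trailing possessive 's."""
    name = " ".join(name.split())
    return POSSESSIVE.sub("", name)


def count_names(lines: Iterable[str]) -> Counter:
    """Count normalized names, skipping blank lines."""
    return Counter(normalize_name(line) for line in lines if line.strip())


# Hypothetical usage on one of the files written above:
#   with open("016_02_en_ner_per.txt", encoding="utf-8") as f:
#       print(count_names(f).most_common(10))
print(count_names(["Mary Anne's", "Mary Anne", "Dawn Schafer's\n", "Dawn"]))
```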

12. English NER for places

#Sort all the files in the directory you specified above, alphabetically.
#For each of those files...
for filename in sorted(os.listdir(enfiledirectory)):
    #If the filename ends with .txt (i.e. if it's actually a text file)
    if filename.endswith('.txt'):
        #Write out below the name of the file
        print(filename)
        #The file name of the output file adds _ner_loc to the end of the file name of the input file
        outfilename = filename.replace('.txt', '_ner_loc.txt')
        #Open the input filename
        with open(filename, 'r', encoding='utf-8') as f:
            #Create and open the output filename
            with open(outfilename, 'w', encoding='utf-8') as out:
                #Read the contents of the input file
                chaptertext = f.read()
                #Do English NLP on the contents of the input file
                chapterner = ennlp(chaptertext)
                #For each recognized entity
                for ent in chapterner.ents:
                    #If that entity is labeled as a geopolitical entity (countries, states, cities)
                    if ent.label_ == 'GPE':
                        #Print the entity, and the label (which should be GPE)
                        print(ent.text, ent.label_)
                        #Write the entity to the output file
                        out.write(ent.text)
                        #Write a newline character to the output file
                        out.write('\n')
016_01_en.txt
Mexico GPE
Stoneybrook GPE
Connecticut GPE
Stoneybrook GPE
Oakley GPE
New Jersey GPE
Stoneybrook GPE
Oakley GPE
Stamford GPE
Connecticut GPE
Stoneybrook GPE
Stamford GPE
Stoneybrook GPE
Oakley GPE
Becca GPE
Coppélia GPE
Stamford GPE
Oakley GPE
Coppélia GPE
Daddy GPE
016_02_en.txt
Stoneybrook GPE
Stoneybrook GPE
Claudia GPE
Claudia GPE
Claudia GPE
New York City GPE
Stoneybrook GPE
California GPE
Mallory GPE
California GPE
California GPE
016_03_en.txt
Coppélia GPE
Hilary GPE
Hilary GPE
Hilary GPE
Coppélia GPE
the Donce of the Hours GPE
Coppélia GPE
Coppélia GPE
Hilary GPE
016_04_en.txt
Coppélia GPE
Stamford GPE
016_05_en.txt
brat GPE
brat GPE
Stamford GPE
U.S. GPE
Stoneybrook GPE
016_06_en.txt
Coppélia GPE
Stoneybrook GPE
Mallory GPE
016_07_en.txt
Jordan GPE
Jordan GPE
Jordan GPE
Nicky GPE
Nicky GPE
Nicky GPE
Nicky GPE
016_08_en.txt
Coppélia GPE
Hilary GPE
Stamford GPE
Adele GPE
Massachusetts GPE
Adele GPE
016_09_en.txt
Claudia GPE
Claudia GPE
Claudia GPE
Legos GPE
Legos GPE
Claudia GPE
Tickly GPE
016_10_en.txt
Nicky GPE
Nicky GPE
Nicky GPE
Becca GPE
Squirt GPE
Becca GPE
Stamford GPE
Becca GPE
Braddock GPE
Coppélia GPE
016_11_en.txt
Kishis GPE
Claudia GPE
Mallory GPE
California GPE
California GPE
Stamford GPE
016_12_en.txt
Stamford GPE
Coppélia GPE
Coppélia GPE
016_13_en.txt
Charlotte GPE
Becca GPE
Squirt GPE
Stamford GPE
Becca GPE
Becca GPE
New Jersey GPE
Oakley GPE
Stoneybrook GPE
Squirt GPE
Stoneybrook GPE
Stoneybrook GPE
Charlotte GPE
Charlotte GPE
Charlotte GPE
Copernicus GPE
Charlotte GPE
Squirt GPE
the Polanski Sisters GPE
Charlotte GPE
Charlotte GPE
016_14_en.txt
Coppélia GPE
Charlotte GPE
Stamford GPE
New Jersey GPE
Claudia GPE
Mal GPE
Stamford GPE
Coppélia GPE
016_15_en.txt
Adele GPE
Kristy GPE
Claudia GPE
Mal GPE
Adele GPE
Keisha GPE
Keisha GPE
Keisha GPE
Keisha GPE
Claudia GPE
us GPE
Ambrosia GPE

13. English NER for organizations

I think my favorite entity type is ORG, which gets you everything from Oakley Elementary to Swanilda to Mama. (Yes, friends, “Daddy” is a PERSON but “Mama” is an ORG.)

#Sort all the files in the directory you specified above, alphabetically.
#For each of those files...
for filename in sorted(os.listdir(enfiledirectory)):
    #If the filename ends with .txt (i.e. if it's actually a text file)
    if filename.endswith('.txt'):
        #Write out below the name of the file
        print(filename)
        #The file name of the output file adds _ner_org to the end of the file name of the input file
        outfilename = filename.replace('.txt', '_ner_org.txt')
        #Open the input filename
        with open(filename, 'r', encoding='utf-8') as f:
            #Create and open the output filename
            with open(outfilename, 'w', encoding='utf-8') as out:
                #Read the contents of the input file
                chaptertext = f.read()
                #Do English NLP on the contents of the input file
                chapterner = ennlp(chaptertext)
                #For each recognized entity
                for ent in chapterner.ents:
                    #If that entity is labeled as an organization
                    if ent.label_ == 'ORG':
                        #Print the entity, and the label (which should be ORG)
                        print(ent.text, ent.label_)
                        #Write the entity to the output file
                        out.write(ent.text)
                        #Write a newline character to the output file
                        out.write('\n')
016_01_en.txt
Mama ORG
Oakley General Hospital ORG
Oakley General ORG
Oakley ORG
Oakley Elementary ORG
Oakley ORG
Baby ORG
Club ORG
Squirt ORG
Squirt ORG
Squirt ORG
Squirt ORG
Mama ORG
Mama ORG
Franz ORG
Swanilda ORG
Swanilda ORG
Franz ORG
Swanilda ORG
Coppélia ORG
Swanilda ORG
Franz ORG
Mama ORG
Mama ORG
Baby ORG
Club ORG
016_02_en.txt
Baby ORG
Club ORG
Baby ORG
Club ORG
Baby ORG
Club ORG
Claudia Kishi ORG
Thomases ORG
Claudia ORG
Stoneybrook Middle School ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Baby ORG
Club ORG
Claudia ORG
Claudia ORG
Claudia ORG
Haley ORG
Claudia ORG
016_03_en.txt
Madame ORG
the Chinese Doll ORG
The Chinese Doll ORG
Swanilda ORG
Swanilda ORG
ME ORG
Swanilda ORG
Swanilda ORG
Swanilda ORG
Franz ORG
Jessi ORG
Swanilda ORG
016_04_en.txt
Mama ORG
Madame Noelle ORG
Mama ORG
Mama ORG
American Sign Language ORG
Baby ORG
Club ORG
Haley ORG
Haley ORG
the American Sign Language ORG
Jessi ORG
Haley ORG
Haley ORG
Haley ORG
the American Sign Language Dictionary ORG
016_05_en.txt
Prezziosos ORG
Prezziosos ORG
Prezziosos ORG
Chutes ORG
Ladders ORG
Haley ORG
Haley ORG
Haley ORG
016_06_en.txt
Haley ORG
Haley ORG
Haley ORG
Haley ORG
N-I-C-K-Y.

 ORG
Pikes ORG
Buddy ORG
Pikes ORG
Haley ORG
Patriots ORG
Haley ORG
016_07_en.txt
Pikes ORG
Pikes ORG
Margo ORG
Mallory ORG
Margo ORG
Margo ORG
Margo ORG
Margo ORG
Pikes ORG
Pikes ORG
Margo ORG
Byron ORG
Pikes ORG
Margo ORG
Swanilda ORG
American Sign Language ORG
Haley ORG
016_08_en.txt
Swanilda ORG
Haley ORG
Adele ORG
Adele ORG
American Sign Language ORG
Adele ORG
Adele ORG
Haley ORG
Parsonses ORG
Daddy ORG
Keisha ORG
016_09_en.txt
Kristys ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Kristy ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Fair ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Silence ORG
Claudia ORG
Brewers ORG
Claudia ORG
Brewers ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Baby ORG
Club ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Baby ORG
Club ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Claudia ORG
Jessi ORG
016_10_en.txt
the Stoneybrook Community Center ORG
Stoneybrook Elementary ORG
Barretts ORG
Haley ORG
Haley ORG
Stoneybrook Elementary ORG
Mama ORG
Squirt ORG
Haley ORG
Madame ORG
016_11_en.txt
Claudia ORG
Claudia ORG
Baby ORG
Club ORG
Claudia ORG
Claudia ORG
Claudia ORG
Ho-Ho's ORG
Claudia ORG
Claudia ORG
Claudia ORG
Oreo ORG
Claudia ORG
Haley ORG
Claudia ORG
Finger ORG
Claudia ORG
Claudia ORG
Claudia ORG
Club ORG
016_12_en.txt
Stoneybrook Middle School ORG
Mama ORG
016_13_en.txt
Jessi ORG
Charlotte Johanssen ORG
Mama ORG
Baby-Wipes ORG
Squirt ORG
Squirt ORG
Squirt ORG
Baby ORG
Club ORG
Chadotte ORG
Baby ORG
Club ORG
the Polanski Sisters ORG
Jessi ORG
Jessi ORG
Jessi ORG
016_14_en.txt
Baby ORG
Club ORG
Mama ORG
Squirt ORG
Swanilda ORG
Haley ORG
Haley ORG
Haley ORG
Haley ORG
Haley ORG
Swanilda ORG
Madame ORG
Swanilda ORG
Swanilda ORG
Haley ORG
Franz ORG
Swanilda ORG
Franz ORG
Swanilda ORG
Coppélia ORG
Swanilda ORG
Franz ORG
Swanilda ORG
Haley ORG
Swanilda ORG
Swanilda ORG
Haley ORG
Swanilda ORG
016_15_en.txt
Adele ORG
Adele ORG
Mama ORG
Keisha ORG
Oakley ORG
Swanilda ORG
Swanildas ORG
Franz ORG
Swanilda ORG
Swanilda ORG
Haley ORG
Mama ORG
Daddy ORG
Haley ORG
Claudia ORG
Claudia ORG
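These lists make it easy to spot strings that the model labels inconsistently within a single chapter: in 016_15_en.txt, for instance, “Keisha” shows up as PERSON, GPE and ORG, and “Daddy” as both PERSON and ORG. A minimal sketch (the helper name is hypothetical) that finds every string filed under more than one label, given the per-label lists for one chapter, such as the contents of the _ner_per, _ner_loc and _ner_org files written above:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def multi_labeled(entities_by_label: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Return {text: sorted labels} for strings tagged with more than one label."""
    labels_for = defaultdict(set)
    for label, texts in entities_by_label.items():
        for text in texts:
            text = text.strip()
            if text:
                labels_for[text].add(label)
    return {text: sorted(labels) for text, labels in labels_for.items() if len(labels) > 1}


# Hypothetical usage with the files written above for one chapter:
#   files = {"PERSON": "016_15_en_ner_per.txt",
#            "GPE": "016_15_en_ner_loc.txt",
#            "ORG": "016_15_en_ner_org.txt"}
#   by_label = {}
#   for label, path in files.items():
#       with open(path, encoding="utf-8") as f:
#           by_label[label] = f.read().splitlines()
#   print(multi_labeled(by_label))

# A small excerpt of the 016_15_en.txt results shown above
excerpt = {
    "PERSON": ["Keisha", "Daddy", "Mal"],
    "GPE": ["Keisha", "Claudia", "Mal"],
    "ORG": ["Keisha", "Daddy", "Claudia"],
}
print(multi_labeled(excerpt))
```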

14. English NER for works of art

That said, there’s also a rare WORK_OF_ART entity type, exemplified by “Morning, Squirts”, “Hey, Jessi”, “On Top of Old Smoky” (depends on how you feel about folk music, I guess), and, my very favorite work of art, “Nope”.

#Sort all the files in the directory you specified above, alphabetically.
#For each of those files...
for filename in sorted(os.listdir(enfiledirectory)):
    #If the filename ends with .txt (i.e. if it's actually a text file)
    if filename.endswith('.txt'):
        #Write out below the name of the file
        print(filename)
        #The file name of the output file adds _ner_art to the end of the file name of the input file
        outfilename = filename.replace('.txt', '_ner_art.txt')
        #Open the input filename
        with open(filename, 'r', encoding='utf-8') as f:
            #Create and open the output filename
            with open(outfilename, 'w', encoding='utf-8') as out:
                #Read the contents of the input file
                chaptertext = f.read()
                #Do English NLP on the contents of the input file
                chapterner = ennlp(chaptertext)
                #For each recognized entity
                for ent in chapterner.ents:
                    #If that entity is labeled as a work of art
                    if ent.label_ == 'WORK_OF_ART':
                        #Print the entity, and the label (which should be WORK_OF_ART)
                        print(ent.text, ent.label_)
                        #Write the entity to the output file
                        out.write(ent.text)
                        #Write a newline character to the output file
                        out.write('\n')
016_01_en.txt
Morning, Squirts WORK_OF_ART
016_02_en.txt
Hey, Jessi WORK_OF_ART
016_03_en.txt
The Chinese Doll WORK_OF_ART
016_04_en.txt
Nope WORK_OF_ART
016_05_en.txt
016_06_en.txt
Nope WORK_OF_ART
016_07_en.txt
Pikes WORK_OF_ART
Pikes WORK_OF_ART
On Top of Old Smoky WORK_OF_ART
Mallory WORK_OF_ART
016_08_en.txt
016_09_en.txt
The Witch Next Door WORK_OF_ART
016_10_en.txt
016_11_en.txt
Hey, Mary Anne WORK_OF_ART
Hmphh WORK_OF_ART
Opening night! WORK_OF_ART
016_12_en.txt
016_13_en.txt
Millikan WORK_OF_ART
016_14_en.txt
016_15_en.txt
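The four cells above are identical apart from the label and the output suffix, and each one runs the English model over every chapter again. Here is a minimal sketch of a single-pass alternative, assuming the same directory layout; write_entity_files is a hypothetical function, not something from spaCy. The suffix tables are also an assumption: they map the English PERSON/GPE and the French model’s PER/LOC onto the same per/loc suffixes so that files from the two models line up. The sketch additionally skips files whose names already contain _ner_ (so a rerun doesn’t feed its own output back in), and it puts entities that span line breaks, like the “N-I-C-K-Y.” hit in the ORG list, onto a single line.

```python
import os
from typing import Mapping

# Assumed label-to-suffix tables: English PERSON/GPE and French PER/LOC
# map onto the same suffixes, so the two languages' files line up.
EN_SUFFIXES = {"PERSON": "per", "GPE": "loc", "ORG": "org", "WORK_OF_ART": "art"}
FR_SUFFIXES = {"PER": "per", "LOC": "loc", "ORG": "org", "MISC": "misc"}


def write_entity_files(nlp, directory: str, suffixes: Mapping[str, str]) -> None:
    """Run NER once per chapter file and write one output file per label.

    `nlp` is a loaded spaCy pipeline such as `ennlp`; anything callable that
    returns an object with `.ents` (each with `.text` and `.label_`) works.
    """
    for filename in sorted(os.listdir(directory)):
        # Skip non-text files and output written by earlier runs
        if not filename.endswith(".txt") or "_ner_" in filename:
            continue
        with open(os.path.join(directory, filename), encoding="utf-8") as f:
            doc = nlp(f.read())
        found = {label: [] for label in suffixes}
        for ent in doc.ents:
            if ent.label_ in found:
                # Collapse line breaks inside an entity onto one line
                found[ent.label_].append(" ".join(ent.text.split()))
        stem = filename[: -len(".txt")]
        for label, texts in found.items():
            outpath = os.path.join(directory, f"{stem}_ner_{suffixes[label]}.txt")
            with open(outpath, "w", encoding="utf-8") as out:
                out.writelines(text + "\n" for text in texts)


# Hypothetical usage, mirroring the four cells above:
#   write_entity_files(ennlp, enfiledirectory, EN_SUFFIXES)
```

The main trade-off is that parsing each chapter only once should make this noticeably faster than four separate passes. In exchange, you lose the chapter-by-chapter printout the cells above give you.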

What now?

I reran the notebook for English, reluctantly limiting myself to the entity types held in common between the English and French models. So now I had, chapter-by-chapter, translation-by-translation (plus original English), all the person, location, and organization type entities.

That’s nice.

But that wasn’t my question: what I wanted to know was how the translators adapted people and place names. I had to figure out what to do with all these text files to get me closer to an answer.

I had some thinking to do.
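One possible first step (not necessarily the one taken in this project) is to load the person files into a table keyed by chapter and version, so the same chapter can be compared across en, bg, frmod and qu. This is a minimal sketch. It assumes the French person files follow the same book_chapter_version_ner_per.txt naming as the English ones, and load_person_counts is a hypothetical helper.

```python
import os
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable

# Assumes names like 016_14_qu_ner_per.txt: book, chapter, version code
# (e.g. en, bg, frmod, qu), then the suffix added by the NER loops.
NER_FILE = re.compile(r"^(\d+)_(\d+)_([a-z]+)_ner_per\.txt$")


def load_person_counts(directories: Iterable[str]) -> Dict[str, Dict[str, Counter]]:
    """Return {chapter: {version: Counter(names)}} from _ner_per files."""
    table = defaultdict(dict)
    for directory in directories:
        for filename in sorted(os.listdir(directory)):
            match = NER_FILE.match(filename)
            if not match:
                continue
            _book, chapter, version = match.groups()
            with open(os.path.join(directory, filename), encoding="utf-8") as f:
                names = [line.strip() for line in f if line.strip()]
            table[chapter][version] = Counter(names)
    return dict(table)


# Hypothetical usage with the notebook's directory variables:
#   counts = load_person_counts([filedirectory, enfiledirectory])
#   for version, names in sorted(counts["14"].items()):
#       print(version, names.most_common(5))
```

Putting the most frequent names for one chapter side by side (Matt next to Matthieu, Adele next to Adèle) is one way to start pairing up characters across translations. Note that it pairs by frequency only, not by meaning.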

Lee

So, before getting into the finer differences between the various translations and the approaches taken by the translators, can we just take a second to appreciate the narrative of this BSC story? While it’s far from perfect as a disability narrative (for instance, the main deaf character never gets to “speak” for himself; instead, his older sister “speaks” both for him and for herself), this is a really nuanced portrayal of difference and empathy. All I remember from when I read the book (sigh) 30 years ago was the dancing J sign for Jessi’s name, but now I’m struck by how Jessi/Jessie/Justine/Jessica is wise beyond her years, a reflection of her own experiences with being different in her new hometown. I got a little choked up as I read (at least the first of the four different times I read it) about her efforts to bring the deaf schoolkids to the show, and about how her frenemy at ballet finally connects with her own deaf sister.

And when Jessi said that her performance could never be perfect because “There was no way Swanilda could have been black, so I wasn’t perfect, but I knew I was dancing very well”? Gutted. In every French version.

There’s another layer to this discussion of translation, since there is another language in the text: sign language. As pointed out in the narrative(s), there are many different variations of sign language, which means that the translators had to accurately “translate” the signs being made, since sign language isn’t the same across the various French-speaking regions.

So, here’s a fun little close-reading exercise on my part, once I stopped crying. I mean, YOU’RE CRYING, NOT ME.

From most to least culturally adapted, the order goes: Belgium, Quebec, France. The Belgian translation takes great pains to situate the narrative in an environment that would be familiar to a European reader: the names have all been francized (even Claudia Kishi becomes Julie Kishi), and the places have been localized as well. Justine and her family are not from New Jersey but from Burkina Faso. Interestingly, they move to Neuville, France, and not to Belgium. Sophie Lambert (aka Stacey McGill) moves back to Paris and not New York City, while Carole (aka Dawn) is “une vraie Provençale” (“a true girl from Provence”) rather than a California girl. These localized adaptations extend to every cultural reference (such as books and board games) and, particularly in this volume, to the gross-out songs about spaghetti that Mallory/Marjorie’s brothers sing at dinner.

At least I think it is. I’m not so up on Franco-Belgian gross-out songs about pasta dishes.

In the Quebec translation, the names are again mostly francized (seriously, though, I knew kids with like ¾ of the last names chosen for the kids here), and the place names also mostly stay the same: Jessie Raymond is from New Jersey and Diane Dubreuil is from California, but Sophie Ménard moves back to Toronto. The cultural references are either localized (OH GOD LOOK HERE IS A REFERENCE TO FREAKING CAILLOU) or completely neutralized (like Claudia’s stash of now non-brand-name snacks, a missed opportunity to get in some local references to Vachon snack cakes). They sing the same spaghetti song in the Quebec translation as in the Belgian one, but there is one really interesting choice that the Quebec translators make in multiple (ok, the two that I’ve read) volumes: if something is tricky to translate, they just leave it out.

For example, in this volume, in the original English, Jessi’s new ballet teacher keeps getting her name wrong. That is preserved in both the French from France and French from Belgium translations, but not the Quebec one, where Jessie just gets confused that she is being called Madame Raymond rather than by her first name. This might just be a result of the initial name-choice for Jessie in Quebec: it’s kinda hard to mess up Raymond. So rather than force the issue, it was just tweaked.

More interestingly, when I read one of the mysteries in the Quebec translation, California figured prominently (it’s one of the ones featuring the child actor from their hometown), but other than one reference to the kid coming home, all other references to California, or to things that may or may not have happened in California in other volumes, are simply removed.

(Not to mention that having a FRANCOPHONE child star starring in a FRANCOPHONE show for FRANCOPHONE kids be recorded in California when OH I DON’T KNOW THERE IS A LOCAL FRANCOPHONE TV/MOVIE INDUSTRY IN MONTREAL, and then removing all other references to California, seems to me like a missed opportunity, but I digress. Seriously. Have them live in a suburb outside of Quebec City or Sherbrooke and then send him to Montreal to be a star. It works. It makes sense. I mean, there is no movie industry that I know of in Provence, so having the other Baby-Sitters visit Provence and run into said child star makes zero sense (unless that’s where he summers now), but it didn’t stop the Belgian translators…)

Ok, wait, what were we talking about…

So, Belgium goes all-in on translation and adaptation, Quebec splits the difference (it would make sense that a kid from New Jersey or California would end up at a French school because THANKS BILL 101!), while France makes zero effort to mask that these are American kids doing American things: the story is in French, but the references and names and such remain firmly American.

Most of the names remain largely English (although, bizarrely, Claudia Kishi becomes Claudia Koshi), as do the place names, except for Stoneybrook (which would be almost like a sci-fi or fantasy place name that a French reader would see and have no idea how to pronounce). It becomes Stamford, Connecticut, in contrast to Belgium’s Neuville, France, or Quebec’s generic Nouville, which has no province/state/country associated with it. Even a cultural reference, in this case a book, gets changed to a more recognizable American title: Bambi. A different gross-out spaghetti song is sung at dinner, one I don’t recognize either, and it may be the only notable instance of adaptation in the French-from-France translation.

The varying approaches to translating/adapting the text are perhaps most noticeable in how the three different translations deal with sign language. The Belgian translation goes into great detail about which sign language is being used and taught. The Quebec translation drops in a mention of Québécois sign language, while the French-from-France translation is like, meh, it’s sign language, you get it. I don’t know enough about the different sign languages to know whether the adaptations are accurate, but the French-from-France translation explains sign language even less than the original English text (which does specify that they are learning and using American Sign Language, as opposed to Signed English or British Sign Language; ASL, as it happens, is historically derived from French Sign Language).

I didn’t notice (so thank you, Quinn, for that helpful chart below!) that the architect of the neighborhood isn’t mentioned in the France or Belgium translations, which, again, makes cultural sense. These “planned communities” and post-WWII suburbs are a distinctly North American phenomenon, one that wouldn’t have been foreign to a Quebec audience (my grandparents’ neighborhood was like that, for instance) but wouldn’t have made much sense to a European audience. So even with France going all-in on the Americanness of the books, they still took care to make sure the references weren’t so foreign as to be unrecognizable to a reader.

So, to recap: Belgium goes all-in on a European adaptation of the book, Quebec splits the difference, and France doesn’t seem to care, and in fact seems to emphasize the Americanness of the text. This all makes sense from a marketing perspective. Belgium is a smaller market, and so making the books hyper-local to Belgium would limit their sales appeal to other Francophone readers. Quebec had a basically captive market, and the translations reflect real efforts to appeal to that local market, while also not making things too complicated for the army of translators they were employing to churn out these translations. France, well, France is France and France is gonna France, and probably figured who cares, you know it’s the USA, we know it’s the USA, you’re going to buy the books in part for that reason, so…let’s just make sure you can pronounce all the names.

I realize that this is probably not the level of cultural analysis that you all were expecting. But when you ask someone from Quebec to reflect on the cultural/linguistic choices that France makes… we in Quebec have an Office de la langue française, one of whose tasks is to make up new French words every time a new English word comes along. France says things like “stopping” and “shopping”, where in Quebec WE DO NOT SAY SUCH THINGS USE THE PROPER FRENCH VERBS NOT SOME BASTARDIZED ENGLISH GERUND SAID WITH A PARISIAN ACCENT.

So it isn’t surprising that France goes all-in on the Americanness of the series. That Quebec didn’t go further in adapting the books for a local audience was surprising, but given the pressure to produce, it’s understandable.

Quinn

Lee’s take on BSC names in translation had got me thinking. For starters, score one for close reading! If you’ve got four variations of a children’s novel, and want to say something about some aspect of that novel, you should not start by opening a Jupyter notebook. Put down your laptop, find a comfy chair, and just read the books with your own eyeballs. It definitely took Lee less time to do that than it took me to scan and OCR the books, reformat them into individual chapters, clean up the punctuation characters, and write some Python code. And by the end, she could sit down and write something. I, on the other hand, had 180 additional small text files on my laptop to show for it, and nothing new to say yet. Woot.

But then I thought about what Lee didn’t do there. Knowledgeable humans are great at providing a synthesis of the interesting things they’ve noticed in a text. What they’re less great at is being comprehensive, pulling out details that aren’t individually interesting, but may become interesting in the aggregate. It’s tedious work, and it’s only possible with a lot of manual labor – unless you use digital tools. (And even still, let’s face it, there’s a lot of manual work that goes into getting your text ready for digital tools.)

BSC Bible cover

In my moment of doubt, I turned to the Bible – the BSC Bible, that is. Smith College’s Special Collections finding aid for the Ann M. Martin papers actually includes that as an alternate title for The Complete Guide to The Baby-Sitters Club, the complete (through 1996) compendium of all information about all people (my PER/PERSON entities), places (LOC/GPE), and things (including ORGs) in the Baby-Sitters Club universe. And by all, I mean all. Find any arbitrary character, however niche, and this book has all the information in the canon. Does anyone remember Nicole Lavista, one of the “Battle of the Bakers” daycare kids in BSC Mystery #21: Claudia and the Recipe for Danger? Me, neither. But she’s six years old, has “hair in black curls”, is “full of tricks”, and “loves to draw and paint”. There are even page-level citations, though they do me no good since we chose to remove page numbers from the text file output of our OCR. All this information is already compiled for the English series through the BSC Bible, originally as a resource for the ghostwriters. But what might we learn from trying to recreate something similar from the ground up, for the universe of each of the French translations? In addition to the question of how much localization has been done, the question of consistency intrigues me. If you’re not localizing much of anything, it’s less of an issue, but was the pool of translators in Montreal comparing notes about how they adapted the names of peripheral but at least occasionally recurring characters?
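A per-translation bible could start as a hand-curated spreadsheet of mentions: one row per appearance, recording the edition, the English name, the localized name, and where it turned up. As a minimal sketch (the function name and row layout are hypothetical, not taken from our notebooks), grouping those rows by edition and English name turns the consistency question into a mechanical check: any character with more than one localized name in the same edition gets flagged. The example rows are based on the Mary Bramstedt slip in the Quebec translation discussed later in this post, normalized by hand to full names; the locators are placeholders.

```python
from collections import defaultdict


def inconsistent_renderings(mentions):
    """Find characters rendered with more than one name within one edition.

    `mentions` is an iterable of (edition, english_name, localized_name, locator)
    tuples, e.g. rows from a hand-curated spreadsheet. Returns a dict mapping
    (edition, english_name) to {localized_name: [locators]} for every character
    that has at least two distinct localized names in that edition.
    """
    variants = defaultdict(lambda: defaultdict(list))
    for edition, english, localized, locator in mentions:
        variants[(edition, english)][localized.strip()].append(locator)
    return {key: dict(names) for key, names in variants.items() if len(names) > 1}


# Illustrative rows; the locators are placeholders, not real page/chapter data.
mentions = [
    ("Quebec", "Mary Bramstedt", "Marie Brazeau", "first mention"),
    ("Quebec", "Mary Bramstedt", "Marie Croteau", "ch. 8"),
    ("Belgium", "Mary Bramstedt", "Marie Bernstein", "first mention"),
]

for (edition, english), names in inconsistent_renderings(mentions).items():
    print(f"{edition}: {english} -> {names}")
```

Because it compares exact strings, the curated rows need to be normalized first (as here, where “Mademoiselle Croteau” is recorded as “Marie Croteau”); otherwise every title-versus-first-name variation would show up as a false alarm.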

Rethinking NER

I wanted to write up NER for this book because, hey, names! And I’ve done it before, and it seemed like a fun opportunity to compare the performance of spaCy’s French and English models. Ultimately, though, NER makes more sense as the primary method of finding names and places when you’re dealing with a much larger corpus, and/or a much more heterogeneous one than Baby-Sitters Club books. It’s feasible to make a list of characters and places within a single fictional universe, like what you can find in the BSC Bible, even if you have all 200ish novels in translation. It’s much less feasible if you have 200 novels, each in its own universe – let alone 2,000 or more.

So where to go from here? It’d be fun to try to annotate some texts and see if I can train a better NER model for the Baby-Sitters Club (especially the French), but that’s a topic for another DSC Multilingual Mystery. Instead, I can cross-check the location results I have now against the English, as an easy and interesting source of likely localizations and differences. With names, I can hopefully use NER to identify characters that are newly introduced in each book (or perhaps re-introduced under a different localized name?), and then add those names to the list of known characters in that translated universe. That curated list, rather than NER itself, will be the basis for checking translations for references to characters. And another thing: for the best chance of identifying new characters, I think I’ll stop limiting the NER results to just the entities flagged PER. Too many character names are showing up as LOC or ORG to trust the classification. As for the ORG entities specifically, they’re almost all errors of one sort or another. I can’t think of any good research questions offhand that deal with organizations in this universe, so I think I won’t worry about them.
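For the names half of that plan, here’s a minimal sketch, assuming the NER output for a chapter has already been reduced to (text, label) pairs; with spaCy, `[(ent.text, ent.label_) for ent in nlp(chapter_text).ents]` produces exactly that. The function name and matching rule are hypothetical: it ignores the labels entirely (since character names keep landing in LOC and ORG) and keeps any entity that isn’t already covered, word by word, by the curated list of known characters. The entity tuples in the example are made up.

```python
from collections import Counter


def new_name_candidates(entities, known_names):
    """Return entity strings not already covered by the curated character list.

    `entities` is an iterable of (text, label) pairs. Labels are ignored on
    purpose, because character names often come back tagged LOC or ORG.
    """
    # Every word of every known name, so "Justine" or "Victoire" alone
    # counts as already known.
    known_words = {word.lower() for name in known_names for word in name.split()}
    counts = Counter()
    for text, _label in entities:
        words = text.lower().split()
        if words and not all(word in known_words for word in words):
            counts[text.strip()] += 1
    # Most frequent candidates first, so recurring characters float to the top.
    return counts.most_common()


known = ["Justine Victoire", "Roseline", "Carole Leroy"]
# Made-up NER output for one chapter; labels included only to show they're ignored.
entities = [
    ("Justine", "PER"),
    ("Johanna", "LOC"),
    ("Ouagadougou", "LOC"),
    ("Roseline", "ORG"),
    ("Johanna", "PER"),
]
print(new_name_candidates(entities, known))  # [('Johanna', 2), ('Ouagadougou', 1)]
```

“Ouagadougou” surviving the filter is the point: the output is a list of candidates for a human to sort into new characters, places, and junk before anything gets added to the curated list.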

Close-reading some distant-reading outputs

For lulz, I threw the location NER per-chapter output for the Belgian corpus into Voyant, and even the word cloud was mostly a testament to how badly the model performed at classifying entities as locations.

Voyant visualization of allegedly location named entities, but it's mostly names

So in the end, I deleted all those little NER output files I’d generated with the Jupyter notebook in the hope of somehow comparing them programmatically. Instead, those Python print() statements in the notebook were what I consulted – using my own eyeballs. Because the chapters are so short, it’s not hard to trace place names back to their context, and then find that same context in the original. I’m about halfway to imagining how I’d implement something more scalable in Python for checking at least the entities that refer to people (whether or not they’re tagged that way). But honestly, for my own process, I find that I need to spend time cleaning data manually before I feel like I understand it well enough to come up with a workable programmatic approach.
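That context-tracing step is essentially a keyword-in-context (KWIC) lookup. A small sketch in plain Python, with a hypothetical `kwic` helper and a made-up French sentence standing in for a chapter file, might look like this:

```python
import re


def kwic(text, term, width=40):
    """Return each occurrence of `term` in `text` with `width` characters of context.

    Lookarounds are used instead of word boundaries so that whole-word matching
    still works when the term ends in punctuation (e.g. "Mme.").
    """
    pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)")
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left}[{match.group()}]{right}")
    return lines


# A made-up sentence standing in for a chapter file.
chapter = "Justine est née à Ouagadougou, au Burkina Faso."
for line in kwic(chapter, "Ouagadougou", width=10):
    print(line)  # est née à [Ouagadougou], au Burki
```

Running the same term list over the English chapter files gives the matching passages in the original, which is the side-by-side view the manual process relies on.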

Here are some things I found when following up on some of the words tagged as locations in the NER:

–In the Belgian translation, Justine Victoire is from Ouagadougou, Burkina Faso (as Lee mentioned). Her best friend / cousin in Ouagadougou is named Johanna. Neuville is a small town not far from Aubrives, where her father works. Justine’s linguistic aptitude was put to the test on a family trip to Spain. Her sister’s name is adapted as Roseline, and her brother is Victor, AKA Gringalet.

–In the Quebec translation, like in the English original and the France-French translation, Jessie picked up Spanish in Mexico. Jessi’s best friend/cousin Keisha becomes Kara. Her sister is still Becca, but her brother becomes Jean-Philippe, AKA Jaja. To Lee’s point about the Quebec translation just deleting things when they complicate matters too much: there’s no mention of Jessi’s father working in another city. They moved to Nouville for his job, end of discussion.

–In the France French translation… well, see Lee’s section for details. It’s just like the English original, just in French. The only exception of note is that her little brother, John Philip Ramsey Junior (yes, that whole thing is his name in French too, including the “Junior”), is nicknamed “P’tit Bout” instead of Squirt. (Sorry for being baffled by your inclusion of “P’tit” among the entities, spaCy. You were right.)

–As of Belgian BSC #16, Sophie Lambert has moved back with her family to Paris. Meanwhile, Carole Leroy has moved to Neuville from Provence along with her brother (still named David). Both the Belgian and the France-French versions take a “just-the-facts” approach to Dawn’s situation, but the translators in Montreal were willing to take on Jessi’s compassionate editorializing: “Comme le dit si souvent Diane, sa famille est déchirée en deux, mais je suis certaine qu’elle va s’en sortir.” / “As Dawn pointed out, her family is now ripped in half. I think Dawn is a survivor, though.”

–In the Belgian translation, Jessi’s ballet school is in Aubrives (consistent with her father also working there).

–Belgian Mathieu thinks that Marseille will beat Monaco in soccer, rather than that the Patriots will win the Super Bowl. In Quebec, Matthieu talks about the “Patriotes” winning the soccer eliminations, a sort of mixed reference. The France-French translation is all in on the Patriots and the Super Bowl.
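Several of these leads could also come from a mechanical first pass: take the location entities from matching chapters of the English and a translation, and diff them. This is a sketch under stated assumptions: the chapter-level lists are made up (loosely echoing the Quebec details above), the chapters are assumed to be already aligned, and `location_differences` is a hypothetical name.

```python
def location_differences(english_locations, translated_locations):
    """Compare the location strings NER found in matching chapters of two editions.

    Exact string comparison can't tell a translated place name ("Mexico" vs.
    "Mexique") from a localized one, so the output is a list of leads to check
    by eye, not a list of localizations.
    """
    english = {loc.strip() for loc in english_locations}
    translated = {loc.strip() for loc in translated_locations}
    return {
        "only_in_english": sorted(english - translated),
        "only_in_translation": sorted(translated - english),
    }


# Made-up chapter-level lists, loosely based on the Quebec findings above.
english_chapter = ["New Jersey", "Stoneybrook", "Mexico"]
quebec_chapter = ["New Jersey", "Nouville", "Mexique"]
print(location_differences(english_chapter, quebec_chapter))
```

Here “Stoneybrook”/“Nouville” is a genuine localization, while “Mexico”/“Mexique” is just translation; only a reader can tell the two apart, but the diff narrows down where to look.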

And here are some notes and a list of the name correspondences, working from the PER tags:

–Chapter 5 starts with “Brat, brat, brat.” in English. All the French translations skip that part and go straight into the next sentence about how everyone agrees that Jenny Prezzioso is spoiled and a little bratty (Lee adds: She really, really is. Reading it four times really drives that point home. How many synonyms in French for “spoiled brat”? A lot and it is amazing).

–Even without having to compare across books, I managed to find a naming inconsistency! One of Jessi’s dance colleagues, Mary Bramstedt, is given the role of a villager. In the Quebec version, her name is first translated as Marie Brazeau. Later, there’s a reference to Mademoiselle Croteau, whom Jessi identifies: “(c’est Marie, une citoyenne dans le ballet)” (“that’s Marie, one of the townspeople in the ballet”). So either the Quebec translator invented a second Marie, dancing the same part as the first one, or… the translator forgot the surname they used earlier.
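A slip like Brazeau/Croteau is the kind of thing a cheap check could catch within a single book: pull out every surname that follows a French honorific and compare it against the curated list of surnames for that translation. The regular expression, the honorific list, and the function name below are all assumptions for illustration, and the snippet is made up, not quoted from the book.

```python
import re

# French honorifics followed by a capitalized surname. The honorific list is an
# assumption and not exhaustive; lowercase "madame" is allowed mid-sentence.
TITLED_SURNAME = re.compile(
    r"\b(?:[Mm]adame|[Mm]ademoiselle|[Mm]onsieur|Mme|Mlle)\s+([A-ZÀ-Ý][\w'-]*)"
)


def unknown_titled_surnames(text, known_surnames):
    """List surnames used after an honorific that aren't in the curated list."""
    known = {name.lower() for name in known_surnames}
    flagged = []
    for match in TITLED_SURNAME.finditer(text):
        surname = match.group(1)
        if surname.lower() not in known and surname not in flagged:
            flagged.append(surname)
    return flagged


# Made-up snippet modeled on the Quebec chapter; not a quote from the book.
snippet = (
    "Mademoiselle Brazeau danse le rôle d'une citoyenne. "
    "Plus tard, Mademoiselle Croteau arrive. Madame Raymond sourit."
)
print(unknown_titled_surnames(snippet, ["Brazeau", "Raymond", "Pellerin"]))  # ['Croteau']
```

A flagged surname isn’t necessarily an error (it could be a genuinely new character), so each hit still goes back to a human with its context.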

| English | Quebec | Belgium | France |
|---|---|---|---|
| Kristy Thomas | Christine Thomas | Valérie Demoulin | Kristy Parker |
| Mary Anne Spier | Anne-Marie Lapierre | Mélanie Moreau | Mary Anne Cook |
| Claudia Kishi | Claudia Kishi | Julie Kishi | Claudia Koshi |
| Stacey McGill | Sophie Ménard | Sophie Lambert | Lucy MacDouglas |
| Dawn Schafer | Diane Dubreuil | Carole Leroy | Carla Schafer |
| Mallory Pike | Marjorie Picard | Marjorie Levêque | Mallory Pike |
| Jessi Ramsey | Jessie Raymond | Justine Victoire | Jessica Ramsey |
| Becca Ramsey | Becca Raymond | Roseline | Rebecca Ramsey |
| John Philip Ramsey Jr. (“Squirt”) | Jean-Philippe (“Jaja”) Raymond | Victor “Gringalet” | John Philip Ramsey Junior (“P’tit Bout”) |
| Keisha (Jessi’s cousin) | Kara | Johanna | Keisha |
| Sam Thompson | Sébastien Thompson | Stéphane Demoulin | Samuel Parker |
| Charlie Thompson | Charles Thompson | Nicolas Demoulin | Charlie Parker |
| David Michael Thompson | David Thompson | Sébastien Demoulin | David Michael Parker |
| Watson Brewer | Guillaume Marchand | Yvan Arnould | Jim Lelland |
| Karen Brewer | Karen Marchand | Coralie Arnould | Karen Lelland |
| Andrew Brewer | André Marchand | Arnaud Arnould | Andrew Lelland |
| Janine Kishi | Josée Kishi | Laurence Kishi | Jane Koshi |
| Jeff Schafer | Julien Dubreuil | David Leroy | David Schafer |
| Tigger (Mary Anne’s cat) | Tigrou | N/A | Tigrou |
| Logan Bruno | Louis Brunet | Bruno Lejeune | Logan Rinaldi |
| Shannon Kilbourne | Chantal Chrétien | Cécile Gauthier | Louisa Kilbourne |
| Matthew Braddock | Matthieu Biron | Mathieu Brinbeuf | Matthew Braddock |
| Haley Braddock | Hélène Biron | Agnès Brinbeuf | Helen Braddock |
| Madame Noelle | Mademoiselle Noëlle | Madame Dillon | Mme Noelle |
| Hilary (dancer) | Élizabeth | Hélène | Hilary |
| Katie Beth Parsons (dancer/frenemy) | Élizabeth Pellerin | Catherine | Katie Parson |
| Mary Bramstedt (dancer) | Marie Brazeau (Croteau in ch. 8) | Marie Bernstein | Mary Bramstedt |
| Lisa Jones (dancer) | Lise Jordan | Lise Jacqué | Lisa Jones |
| Carrie Steinfeld (dancer) | Carole St-Onge | Carine Schmitt | Carrie Steinfeld |
| Mr. Geiger (architect in Stoneybrook) | monsieur Gougeon | N/A (omits paragraph about all the houses looking the same) | N/A (omits paragraph about all the houses looking the same) |
| Jenny Prezzioso | Jeanne Prieur | Aurélie Precisio | Jenny Prezzioso |
| Charlotte Johanssen | Charlotte Jasmin | Charlotte Cuvelier | Charlotte Johanssen |
| Nicky Pike | Nicolas Picard | Laurent Levêque | Nicky Pike |
| Vanessa Pike | Vanessa | Vanessa Levêque | Vanessa Pike |
| Buddy Barrett | Bruno Barrette | Antoine Godefroid | Buddy Barrett |
| Suzi Barrett | | Elodie Godefroid | Liz Barrett |
| Margo Pike | Margot Picard | Anaïs Levêque | Margot Pike |
| Claire Pike | Claire Picard | Juliette Levêque | Claire Pike |
| Byron Pike | Bernard Picard | Alain Levêque | Byron Pike |
| Adam Pike | Antoine Picard | Loïc Levêque | Adam Pike |
| Jordan Pike | Joël Picard | Samuel Levêque | Jordan Pike |
| Adele Parson (Katie Beth’s sister) | Adèle Pellerin | Adeline | Adèle Parson |
| Ben Brewer (ghost) | vieux Ben | Benoît Arnould | Ben Lelland |
| Mrs. Porter | Madame Portai | madame Rensonnet | Mme Porter |
| Morbidda Destiny | Destinée Morbide | Vieille Sorcière | Morbidda Destiny |
| Moosie (Karen’s stuffed cat) | N/A | Flocon | Moosie |
| Mrs. Frank (Matt’s teacher) | France | madame Franck | Mme Franck |
| Carolyn Braddock (Matt’s mother) | Caroline Biron | Caroline Brinbeuf | Carolyn Braddock |
| Christopher Gerber (dancer) | Christophe Baril | Christophe Gélin | Christopher Gerber |
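To put a rough number on the Belgium, Quebec, France ordering of how much each translation localizes, the name table can be saved as a CSV and compared column by column. This sketch uses only six rows copied from the table, so its counts describe that sample, not the whole table; the function name is hypothetical, and “unchanged” means an exact string match with the English, so a partial change like Kishi to Koshi counts as changed.

```python
import csv
import io

# Six rows copied from the name table (English, Quebec, Belgium, France).
SAMPLE_CSV = """English,Quebec,Belgium,France
Kristy Thomas,Christine Thomas,Valérie Demoulin,Kristy Parker
Claudia Kishi,Claudia Kishi,Julie Kishi,Claudia Koshi
Mallory Pike,Marjorie Picard,Marjorie Levêque,Mallory Pike
Matthew Braddock,Matthieu Biron,Mathieu Brinbeuf,Matthew Braddock
Jenny Prezzioso,Jeanne Prieur,Aurélie Precisio,Jenny Prezzioso
Buddy Barrett,Bruno Barrette,Antoine Godefroid,Buddy Barrett
"""


def unchanged_name_counts(csv_text):
    """Count, per translation, how many names exactly match the English original."""
    reader = csv.DictReader(io.StringIO(csv_text))
    translations = [col for col in reader.fieldnames if col != "English"]
    counts = {translation: 0 for translation in translations}
    total = 0
    for row in reader:
        total += 1
        for translation in translations:
            if row[translation].strip() == row["English"].strip():
                counts[translation] += 1
    return counts, total


counts, total = unchanged_name_counts(SAMPLE_CSV)
for translation, unchanged in counts.items():
    print(f"{translation}: {unchanged}/{total} names unchanged")
```

On this sample, France keeps 4 of 6 names, Quebec 1, and Belgium none, which matches the pattern described above; running it over the full table (with the parenthetical notes stripped out first) would be the real test.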

Where to next?

I find it tantalizing that my idea about the translators getting sloppy with peripheral character names already seems to be playing out even over the course of a single book. The risks are biggest for the Belgian and Quebec translations, which do more by way of localization, and we know that at least in Quebec they had multiple different translators. Lee made some progress in DSC Multilingual Mystery #1 on identifying the translators of all the French translations, using national library metadata. I think it’s time we clean up the metadata mess and make a clean spreadsheet we can use to support further inquiry.

To be continued…

Suggested Citation

Skallerup Bessette, Lee and Quinn Dombrowski. “DSC Multilingual Mystery 2: Beware, Lee and Quinn!”. February 27, 2020. https://datasittersclub.github.io/site/dscm2.html.