Powered by OpenAIRE graph

EXO-POPP

Optical Extraction of Handwritten Named Entities for the Marriage Certificates of the Population of Paris (1880-1940)
Funder: French National Research Agency (ANR)
Project code: ANR-21-CE38-0004
Funder Contribution: 385,532 EUR

Description

In Europe, the history of urban and suburban populations between the end of the 19th century and the Second World War is poorly known, even though it was a time of profound transformation, largely linked to industrialisation and urbanisation. In France, historical demography has until now focused largely on the 1750–1830 period, and especially on villages. Cities are less well understood, and their suburbs even less so, because the size of their populations makes data collection time-consuming. With their very wide variety of populations, Paris and its suburbs offer an ideal setting to better understand the rise of love marriages, the increase in divorce, and the major changes in gender relations between 1880 and 1940, changes that seem to appear first in large cities. Thanks to a collaboration with specialists in machine learning, the EXO-POPP project will develop a database of 300,000 marriage certificates from Paris and its suburbs between 1880 and 1940. These certificates provide a wealth of information about the bride and groom, their parents and their marriage witnesses, which will be analysed from a host of new angles made possible by the new dataset. These studies of marriage, divorce, kinship and social networks, spanning 60 years, will also intersect with transversal issues such as gender, class and origin. Geolocating the data will provide a rare opportunity to study places and relocations within the city, and linkage with two other databases will make it possible to follow people from birth to death. Building such a database by hand would take at least 50,000 hours of work. Thanks to recent developments in machine learning, and deep learning in particular, it is now possible to construct very large databases with automated reading systems that combine handwriting recognition and natural language understanding.
Indeed, because of these recent advances, optical printed named entity recognition (OP-NER) now performs very well on regular texts such as financial yearbooks and old newspapers, and similar performance is expected on the printed marriage certificates from 1923 to 1940. By contrast, while machine handwriting recognition has become a reality, also thanks to deep learning, optical handwritten named entity recognition (OH-NER) has received little attention. OH-NER is expected to achieve promising results on the handwritten marriage certificates dating from 1880 to 1922. The project's research questions will focus on the best strategies for word disambiguation in handwritten named entity recognition. We will explore end-to-end deep learning architectures for OH-NER, writer adaptation of the recognition system, and named entity disambiguation that exploits the French mortality database (INSEE) and the French POPP database. An additional benefit of this study is that a unique, very large dataset of handwritten material for named entity recognition will be built; the EXO-POPP dataset will be a rich new asset for the field. Beyond its major contribution to research questions about marriage, migration, family and friend networks, divorce and separation, among many others, between 1880 and 1940, the EXO-POPP project will foster new collaborations between computer scientists and researchers in the humanities and social sciences to improve optical character and handwriting recognition, which are now essential tools for processing data sources, especially historical ones.
