Data Collection and Cleaning

Beyond the routine cleaning needed to make any dataset usable for analysis, the EveryPolitician dataset required some additional operations before it was fit for purpose.

2. Creating a dataframe for all countries

gender id identifiers image images name birth_date links other_names given_name ... contact_details family_name death_date email sort_name honorific_prefix honorific_suffix national_identity summary patronymic_name
0 female 0a93b26d-ebc5-44f5-b4fa-935ae209620f [{'identifier': '127', 'scheme': 'everypolitic... http://www.parlamentra.org/upload/iblock/d1a/g... [{'url': 'http://www.parlamentra.org/upload/ib... Гамисония Эмма Алексеевна NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 male 0ac7e64e-b723-4bdb-85f6-81ff217a70fa [{'identifier': '211', 'scheme': 'everypolitic... http://www.parlamentra.org/upload/iblock/4dd/I... [{'url': 'http://www.parlamentra.org/upload/ib... Барганджия Гурам Юрьевич NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 male 0b515281-445d-49dd-9044-d886d85f0970 [{'identifier': '157', 'scheme': 'everypolitic... http://www.parlamentra.org/upload/iblock/244/c... [{'url': 'http://www.parlamentra.org/upload/ib... Чамагуа Леонид Михайлович NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 male 12d1a33c-daa4-496c-965a-4cd4749eda78 [{'identifier': '130', 'scheme': 'everypolitic... http://www.parlamentra.org/upload/iblock/c3b/t... [{'url': 'http://www.parlamentra.org/upload/ib... Цвижба Отари Шотович NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 male 15a55ab3-c81c-4f5f-ae75-ea5446dd5ef6 [{'identifier': '138', 'scheme': 'everypolitic... http://www.parlamentra.org/upload/iblock/30b/y... [{'url': 'http://www.parlamentra.org/upload/ib... Язычба Заур Гайдарович NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
55 male ed4838aa-19f8-493c-b8c4-58780b5b0d84 [{'identifier': '112', 'scheme': 'everypolitic... http://www.lagtinget.ax/files/sjolund_folke.jpg [{'url': 'http://www.lagtinget.ax/files/sjolun... Sjölund Folke 1943-12-16 [{'note': 'Wikipedia (fi)', 'url': 'https://fi... [{'lang': 'en', 'name': 'Folke Sjölund', 'note... Folke ... NaN Sjölund 2013-12-13 NaN NaN NaN NaN NaN NaN NaN
56 male f105a4a0-5fef-4be7-ab73-9376923491a7 NaN https://www.lagtinget.ax/sites/www.lagtinget.a... [{'url': 'https://www.lagtinget.ax/sites/www.l... John Holmberg 1967 NaN NaN NaN ... [{'type': 'email', 'value': 'john.holmberg@lag... NaN NaN john.holmberg@lagtinget.ax NaN NaN NaN NaN NaN NaN
57 male f4f995ae-126c-468f-8f26-deec1e26adc2 [{'identifier': '372', 'scheme': 'everypolitic... http://www.lagtinget.ax/files/asumaa_tony.jpg [{'url': 'http://www.lagtinget.ax/files/asumaa... Asumaa Tony 1968-09-15 [{'note': 'Wikipedia (en)', 'url': 'https://en... [{'lang': 'en', 'name': 'Tony Asumaa', 'note':... Tony ... NaN Asumaa NaN NaN NaN NaN NaN NaN NaN NaN
58 male fe40bb08-5025-473d-af52-bf1e30ecc5e6 [{'identifier': '88', 'scheme': 'everypolitici... http://www.lagtinget.ax/files/sundblom_torsten... [{'url': 'http://www.lagtinget.ax/files/sundbl... Sundblom Torsten 1951-09-15 [{'note': 'Wikipedia (fi)', 'url': 'https://fi... [{'lang': 'en', 'name': 'Torsten Sundblom', 'n... Torsten ... [{'type': 'email', 'value': 'torsten.sundblom@... NaN NaN torsten.sundblom@lagtinget.ax NaN NaN NaN NaN NaN NaN
59 male ff7b67a7-d3bf-4179-801d-7005059524e6 NaN https://www.lagtinget.ax/sites/www.lagtinget.a... [{'url': 'https://www.lagtinget.ax/sites/www.l... Fredrik Fredlund 1978 NaN NaN NaN ... [{'type': 'email', 'value': 'fredrik.fredlund@... NaN NaN fredrik.fredlund@lagtinget.ax NaN NaN NaN NaN NaN NaN

78382 rows × 21 columns
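A combined table like the one above could be assembled roughly as follows. This is only a sketch: the per-country records, the country labels and the extra "country" column are hypothetical, and the original loading code is not shown in this section. Concatenating without resetting the index keeps each country's own row labels, which would be consistent with the repeating labels visible in the output.

```python
import pandas as pd

# Hypothetical stand-in for per-country person records (country -> list of dicts).
persons_by_country = {
    "Country A": [{"id": "a1", "name": "Person A", "gender": "female"}],
    "Country B": [
        {"id": "b1", "name": "Person B", "gender": "male", "birth_date": "1943-12-16"},
        {"id": "b2", "name": "Person C", "gender": "male", "birth_date": "1967"},
    ],
}

frames = []
for country, persons in persons_by_country.items():
    frame = pd.DataFrame(persons)
    frame["country"] = country  # assumed helper column, used later for filtering
    frames.append(frame)

# Fields missing for a country (here birth_date for Country A) become NaN.
all_countries = pd.concat(frames, sort=False)
```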

3. Filtering the data for the January Skew

Initially, the proportion of politicians with a January birthday was more than 4 times that of any other month. On closer inspection, this was because politicians whose birth month was not available had been assigned the standardised value of 1 January (pd.to_datetime defaults to this date when no day/month information is provided). We refer to this phenomenon as the January skew.
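The default described above can be reproduced in a minimal sketch. The sample strings are hypothetical, and the check for a full YYYY-MM-DD string is just one possible way to tell year-only values apart from genuine 1 January birthdays; it is not necessarily the approach taken in this analysis.

```python
import pandas as pd

# Hypothetical raw birth_date strings: two full dates and two year-only values.
raw = pd.Series(["1943-12-16", "1967", "1968-09-15", "1978"])

# Parse each value on its own; a bare year is read as 1 January of that year.
parsed = pd.Series([pd.to_datetime(s) for s in raw], index=raw.index)

# Records that land on 1 January after parsing.
is_jan_first = (parsed.dt.month == 1) & (parsed.dt.day == 1)

# Records whose source string actually carried a month and a day.
has_full_date = raw.str.fullmatch(r"\d{4}-\d{2}-\d{2}")

# 1 January dates that come from the parser default rather than the data.
imputed_jan_first = is_jan_first & ~has_full_date
```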

Finding all records of 1st January

After exploring a few methods to address this, we chose to omit outlier countries with an unreasonable number of 1 January values. A country was excluded when its ratio of records dated 1 January was more than 10 times the expected ratio for that date. Seven countries exceeded this threshold and were excluded from the analysis, including Syria and Cameroon, where 98.2% and 26.9% of people respectively were recorded as born on 1 January.
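The threshold rule can be sketched as below. The counts for Cameroon, Syria and Bangladesh are taken from the table in this section; "Exampleland" is a hypothetical country added to show a case that is not flagged. The expected ratio of 1/365 (births spread evenly over the year) and the reading of ">10 records" as more than ten 1 January records are assumptions, since the text does not define either.

```python
import pandas as pd

# Per-country counts: records dated 1 January and total records.
counts = pd.DataFrame(
    {"jan1": [286, 269, 18, 2], "total": [1064, 274, 549, 500]},
    index=pd.Index(["Cameroon", "Syria", "Bangladesh", "Exampleland"], name="country"),
)

EXPECTED_RATIO = 1 / 365  # assumption: uniform births across the year
MULTIPLIER = 10           # threshold stated in the text
MIN_RECORDS = 10          # assumption: minimum number of 1 January records

counts["ratio"] = counts["jan1"] / counts["total"]

# Flag countries whose 1 January ratio is more than 10 times the expected ratio.
flagged = counts[
    (counts["ratio"] > MULTIPLIER * EXPECTED_RATIO) & (counts["jan1"] > MIN_RECORDS)
]
excluded = sorted(flagged.index)
```

Under these assumptions the cut-off is roughly 2.7% of a country's records, so Exampleland (0.4%) is kept while the other three are excluded.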

Finding countries with high ratio of 1st Jan births

Excluding countries with >10 times expected ratio of 1st Jan and >10 records

country      1 Jan  Total  Percentage
Bangladesh      18    549    3.278689
Cameroon       286   1064   26.879699
Pakistan        32    352    9.090909
Syria          269    274   98.175182
Turkey         397   6899    5.754457
Yemen           15    302    4.966887

These proportions differ greatly from the global proportion of 8.97% of births falling in January. We therefore removed the records for these countries from our birth month analysis to safeguard data quality, after which we were able to perform our analyses.
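Removing the flagged countries before looking at birth months could look like the following sketch. The records, the "country" column and the "Country B" label are hypothetical; only the idea of dropping excluded countries before computing monthly shares comes from the text.

```python
import pandas as pd

# Hypothetical records; dates are full ISO strings so they parse uniformly.
people = pd.DataFrame({
    "country": ["Syria", "Syria", "Country B", "Country B", "Country B"],
    "birth_date": ["1960-01-01", "1971-01-01", "1943-12-16", "1968-09-15", "1951-01-20"],
})
excluded_countries = ["Syria"]  # in practice, the countries flagged by the threshold

# Keep only records from countries that were not excluded.
kept = people[~people["country"].isin(excluded_countries)].copy()

# Share of births per calendar month among the remaining records.
kept["birth_month"] = pd.to_datetime(kept["birth_date"]).dt.month
month_share = kept["birth_month"].value_counts(normalize=True).sort_index()
```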

4. Saving the included data

gender id identifiers image images name birth_date links other_names given_name ... contact_details family_name death_date email sort_name honorific_prefix honorific_suffix national_identity summary patronymic_name
8 male 34352d83-6fa1-463d-a02f-6157b3adf36b [{'identifier': '141', 'scheme': 'everypolitic... http://www.parlamentra.org/upload/iblock/bfe/u... [{'url': 'http://www.parlamentra.org/upload/ib... Убирия Бежан Михайлович 1967-03-07 [{'note': 'Wikipedia (ru)', 'url': 'https://ru... [{'name': 'Бежан Убириа', 'note': 'alternate'}... NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
19 male 9fd33b27-fd4c-4eba-9a8f-d4d23f603c63 [{'identifier': '/m/03fqqs', 'scheme': 'freeba... http://www.parlamentra.org/upload/iblock/e1f/s... [{'url': 'http://www.parlamentra.org/upload/ib... Шамба Сергей Миронович 1951-03-15 [{'note': 'Wikipedia (ab)', 'url': 'https://ab... [{'lang': 'ab', 'name': 'Сергеи Шамба', 'note'... Sergey ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
28 male da988bab-32d4-46c0-bb7b-5c6a6eb129e7 [{'identifier': '46', 'scheme': 'everypolitici... http://www.parlamentra.org/upload/iblock/b85/%... [{'url': 'http://www.parlamentra.org/upload/ib... Бганба Валерий Рамшухович 1953-08-26 [{'note': 'Wikimedia Commons', 'url': 'https:/... [{'lang': 'cs', 'name': 'Valerij Bganba', 'not... Valeri ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
78 female 522dff9d-d21d-41b9-a7d5-c2321c819b11 [{'identifier': '1854', 'scheme': 'everypoliti... http://www.wolesi.website/Media/Images/mine/fa... [{'url': 'http://www.wolesi.website/Media/Imag... Farkhunda Zahra Naderi-Kabul 1981-04-19 [{'note': 'Wikimedia Commons', 'url': 'https:/... [{'lang': 'en', 'name': 'Farkhunda Zahra Nader... NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
172 female aac6f415-446d-4070-80d9-195d4f7b77ac [{'identifier': '1823', 'scheme': 'everypoliti... http://www.wolesi.website/Media/Images/mine/ra... [{'url': 'http://www.wolesi.website/Media/Imag... Rangina Kargar-Faryab 1985-03-22 [{'note': 'Wikimedia Commons', 'url': 'https:/... [{'lang': 'en', 'name': 'Rangina Kargar', 'not... NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
51 female e57828f1-e51a-41ea-a61b-7fc1b92e2b37 [{'identifier': '109', 'scheme': 'everypolitic... http://www.lagtinget.ax/files/dahl_ulla_britt.jpg [{'url': 'http://www.lagtinget.ax/files/dahl_u... Dahl Ulla-Britt 1946-03-14 [{'note': 'Wikipedia (fi)', 'url': 'https://fi... [{'lang': 'en', 'name': 'Ulla-Britt Dahl', 'no... Ulla-Britt ... NaN Dahl NaN NaN NaN NaN NaN NaN NaN NaN
52 male e6f6e01b-bcb0-4374-8912-7e1eaf492f10 [{'identifier': '97', 'scheme': 'everypolitici... http://www.lagtinget.ax/files/lindfors_henry.jpg [{'url': 'http://www.lagtinget.ax/files/lindfo... Lindström Henry 1956-06-20 [{'note': 'Wikipedia (fi)', 'url': 'https://fi... [{'lang': 'en', 'name': 'Henry Lindström', 'no... Henry ... NaN Lindström NaN NaN NaN NaN NaN NaN NaN NaN
55 male ed4838aa-19f8-493c-b8c4-58780b5b0d84 [{'identifier': '112', 'scheme': 'everypolitic... http://www.lagtinget.ax/files/sjolund_folke.jpg [{'url': 'http://www.lagtinget.ax/files/sjolun... Sjölund Folke 1943-12-16 [{'note': 'Wikipedia (fi)', 'url': 'https://fi... [{'lang': 'en', 'name': 'Folke Sjölund', 'note... Folke ... NaN Sjölund 2013-12-13 NaN NaN NaN NaN NaN NaN NaN
57 male f4f995ae-126c-468f-8f26-deec1e26adc2 [{'identifier': '372', 'scheme': 'everypolitic... http://www.lagtinget.ax/files/asumaa_tony.jpg [{'url': 'http://www.lagtinget.ax/files/asumaa... Asumaa Tony 1968-09-15 [{'note': 'Wikipedia (en)', 'url': 'https://en... [{'lang': 'en', 'name': 'Tony Asumaa', 'note':... Tony ... NaN Asumaa NaN NaN NaN NaN NaN NaN NaN NaN
58 male fe40bb08-5025-473d-af52-bf1e30ecc5e6 [{'identifier': '88', 'scheme': 'everypolitici... http://www.lagtinget.ax/files/sundblom_torsten... [{'url': 'http://www.lagtinget.ax/files/sundbl... Sundblom Torsten 1951-09-15 [{'note': 'Wikipedia (fi)', 'url': 'https://fi... [{'lang': 'en', 'name': 'Torsten Sundblom', 'n... Torsten ... [{'type': 'email', 'value': 'torsten.sundblom@... NaN NaN torsten.sundblom@lagtinget.ax NaN NaN NaN NaN NaN NaN

41381 rows × 21 columns