Data Collection and Cleaning
Beyond the routine cleaning needed to make any dataset usable for analysis, the EveryPolitician dataset required several additional steps before it was fit for purpose.
1. Get index file with links to all country datasets
This function retrieves the country-level and legislature-level data.
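The original code cell is not reproduced here, so the following is only a minimal sketch of what such a function might look like. The index URL and the field names (`name`, `legislatures`, `popolo_url`) are assumptions about the layout of the EveryPolitician index file, and `fetch_index` and `list_legislatures` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

# Assumed location of the index file; the text does not state it.
INDEX_URL = (
    "https://raw.githubusercontent.com/everypolitician/"
    "everypolitician-data/master/countries.json"
)


def fetch_index(url=INDEX_URL):
    """Download and parse the index file (performs a network request)."""
    with urlopen(url) as resp:
        return json.load(resp)


def list_legislatures(index):
    """Flatten the index into one record per country/legislature pair.

    Assumes each country entry has a "name" and a "legislatures" list,
    and that each legislature carries a "popolo_url" to its dataset.
    """
    rows = []
    for country in index:
        for leg in country.get("legislatures", []):
            rows.append(
                {
                    "country": country.get("name"),
                    "legislature": leg.get("name"),
                    "popolo_url": leg.get("popolo_url"),
                }
            )
    return rows
```

Keeping the download separate from the flattening step means the flattening logic can be checked against a small hand-written index without touching the network.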
2. Creating a dataframe for all countries
gender | id | identifiers | image | images | name | birth_date | links | other_names | given_name | ... | contact_details | family_name | death_date | sort_name | honorific_prefix | honorific_suffix | national_identity | summary | patronymic_name | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | female | 0a93b26d-ebc5-44f5-b4fa-935ae209620f | [{'identifier': '127', 'scheme': 'everypolitic... | http://www.parlamentra.org/upload/iblock/d1a/g... | [{'url': 'http://www.parlamentra.org/upload/ib... | Гамисония Эмма Алексеевна | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | male | 0ac7e64e-b723-4bdb-85f6-81ff217a70fa | [{'identifier': '211', 'scheme': 'everypolitic... | http://www.parlamentra.org/upload/iblock/4dd/I... | [{'url': 'http://www.parlamentra.org/upload/ib... | Барганджия Гурам Юрьевич | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | male | 0b515281-445d-49dd-9044-d886d85f0970 | [{'identifier': '157', 'scheme': 'everypolitic... | http://www.parlamentra.org/upload/iblock/244/c... | [{'url': 'http://www.parlamentra.org/upload/ib... | Чамагуа Леонид Михайлович | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | male | 12d1a33c-daa4-496c-965a-4cd4749eda78 | [{'identifier': '130', 'scheme': 'everypolitic... | http://www.parlamentra.org/upload/iblock/c3b/t... | [{'url': 'http://www.parlamentra.org/upload/ib... | Цвижба Отари Шотович | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | male | 15a55ab3-c81c-4f5f-ae75-ea5446dd5ef6 | [{'identifier': '138', 'scheme': 'everypolitic... | http://www.parlamentra.org/upload/iblock/30b/y... | [{'url': 'http://www.parlamentra.org/upload/ib... | Язычба Заур Гайдарович | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
55 | male | ed4838aa-19f8-493c-b8c4-58780b5b0d84 | [{'identifier': '112', 'scheme': 'everypolitic... | http://www.lagtinget.ax/files/sjolund_folke.jpg | [{'url': 'http://www.lagtinget.ax/files/sjolun... | Sjölund Folke | 1943-12-16 | [{'note': 'Wikipedia (fi)', 'url': 'https://fi... | [{'lang': 'en', 'name': 'Folke Sjölund', 'note... | Folke | ... | NaN | Sjölund | 2013-12-13 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
56 | male | f105a4a0-5fef-4be7-ab73-9376923491a7 | NaN | https://www.lagtinget.ax/sites/www.lagtinget.a... | [{'url': 'https://www.lagtinget.ax/sites/www.l... | John Holmberg | 1967 | NaN | NaN | NaN | ... | [{'type': 'email', 'value': 'john.holmberg@lag... | NaN | NaN | john.holmberg@lagtinget.ax | NaN | NaN | NaN | NaN | NaN | NaN |
57 | male | f4f995ae-126c-468f-8f26-deec1e26adc2 | [{'identifier': '372', 'scheme': 'everypolitic... | http://www.lagtinget.ax/files/asumaa_tony.jpg | [{'url': 'http://www.lagtinget.ax/files/asumaa... | Asumaa Tony | 1968-09-15 | [{'note': 'Wikipedia (en)', 'url': 'https://en... | [{'lang': 'en', 'name': 'Tony Asumaa', 'note':... | Tony | ... | NaN | Asumaa | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
58 | male | fe40bb08-5025-473d-af52-bf1e30ecc5e6 | [{'identifier': '88', 'scheme': 'everypolitici... | http://www.lagtinget.ax/files/sundblom_torsten... | [{'url': 'http://www.lagtinget.ax/files/sundbl... | Sundblom Torsten | 1951-09-15 | [{'note': 'Wikipedia (fi)', 'url': 'https://fi... | [{'lang': 'en', 'name': 'Torsten Sundblom', 'n... | Torsten | ... | [{'type': 'email', 'value': 'torsten.sundblom@... | NaN | NaN | torsten.sundblom@lagtinget.ax | NaN | NaN | NaN | NaN | NaN | NaN |
59 | male | ff7b67a7-d3bf-4179-801d-7005059524e6 | NaN | https://www.lagtinget.ax/sites/www.lagtinget.a... | [{'url': 'https://www.lagtinget.ax/sites/www.l... | Fredrik Fredlund | 1978 | NaN | NaN | NaN | ... | [{'type': 'email', 'value': 'fredrik.fredlund@... | NaN | NaN | fredrik.fredlund@lagtinget.ax | NaN | NaN | NaN | NaN | NaN | NaN |
78382 rows × 21 columns
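A combined dataframe like the one above could be assembled roughly as follows. This is a sketch under assumptions, not the original cell: it presumes each country's dataset has already been downloaded and parsed into a dict with a `persons` list, and `persons_frame` and the added `country` column are hypothetical names.

```python
import pandas as pd


def persons_frame(popolo_by_country):
    """Stack the "persons" records of every country into one DataFrame.

    popolo_by_country maps a country name to its parsed dataset (a dict
    assumed to contain a "persons" list of per-politician dicts).
    """
    frames = []
    for country, popolo in popolo_by_country.items():
        people = pd.DataFrame(popolo.get("persons", []))
        people["country"] = country  # remember where each row came from
        frames.append(people)
    if not frames:
        return pd.DataFrame()
    # Fields missing for a given politician become NaN after concatenation.
    return pd.concat(frames, sort=False)
```

Because `ignore_index` is not set, row labels restart at 0 for each country, and any field that a country does not record is filled with NaN.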
3. Filtering the data for the January Skew
Initially, the proportion of politicians with a January birthday was more than four times that of any other month. On closer inspection, this was because politicians with no recorded birth month had been assigned 1 January: pd.to_datetime defaults to the first day of the year when a date string contains no day or month information. We call this phenomenon the January skew.
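The defaulting behaviour is easy to reproduce. The sketch below parses one raw birth_date string at a time and also records whether the string carried month information; the helper name `parse_birth_date` and the year-only pattern are illustrative assumptions rather than part of the original notebook.

```python
import re

import pandas as pd

# A string consisting of exactly four digits is treated as year-only.
YEAR_ONLY = re.compile(r"^\d{4}$")


def parse_birth_date(raw):
    """Return (timestamp, has_month) for a raw birth_date value.

    Year-only strings such as "1967" are parsed by pd.to_datetime as
    1 January of that year, so has_month flags whether the month is real.
    """
    if not isinstance(raw, str):
        return pd.NaT, False  # missing values (NaN) carry no date at all
    ts = pd.to_datetime(raw, errors="coerce")
    return ts, not YEAR_ONLY.match(raw)
```

Calling `parse_birth_date("1967")` yields a timestamp of 1 January 1967 with `has_month` set to `False`, whereas `parse_birth_date("1968-09-15")` keeps its real month and day.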
Finding all records of 1st January
After exploring several ways to address this, we chose to omit outlier countries with an implausible number of 1 January values. A country was excluded when its share of records dated 1 January was more than 10 times the expected share for that date. Seven countries exceeded this threshold and were excluded from the analysis, including Syria and Cameroon, where 98.2% and 26.9% of politicians respectively were recorded as born on 1 January.
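The threshold rule can be sketched as below. Several details are assumptions on our part: the expected share is taken as 1/365.25 (roughly 0.27%, so 10 times that is about 2.7%), the minimum-record condition is applied to the count of 1 January records, the dataframe is assumed to have a `country` column and a `birth_date` column already parsed to datetimes, and the function names are hypothetical.

```python
import pandas as pd

# Assumed expected share of births on any single day, in percent.
EXPECTED_PCT = 100 / 365.25


def flag_jan1_outliers(df, factor=10, min_records=10):
    """Return per-country 1 January counts for countries over the threshold."""
    valid = df.dropna(subset=["birth_date"])
    dates = valid["birth_date"]
    is_jan1 = (dates.dt.month == 1) & (dates.dt.day == 1)
    summary = (
        valid.assign(is_jan1=is_jan1)
        .groupby("country")["is_jan1"]
        .agg(["sum", "count"])
        .rename(columns={"sum": "1 Jan", "count": "Total"})
    )
    summary["Percentage"] = 100 * summary["1 Jan"] / summary["Total"]
    over_ratio = summary["Percentage"] > factor * EXPECTED_PCT
    enough_records = summary["1 Jan"] > min_records
    return summary[over_ratio & enough_records]


def exclude_countries(df, countries):
    """Drop every row belonging to one of the flagged countries."""
    return df[~df["country"].isin(list(countries))]
```

A typical use would be `flagged = flag_jan1_outliers(df)` followed by `exclude_countries(df, flagged.index)`, which keeps only the countries below the threshold.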
Finding countries with high ratio of 1st Jan births
Excluding countries with >10 times expected ratio of 1st Jan and >10 records
country | 1 Jan | Total | Percentage |
---|---|---|---|
Bangladesh | 18 | 549 | 3.278689 |
Cameroon | 286 | 1064 | 26.879699 |
Pakistan | 32 | 352 | 9.090909 |
Syria | 269 | 274 | 98.175182 |
Turkey | 397 | 6899 | 5.754457 |
Yemen | 15 | 302 | 4.966887 |
These shares differ greatly from the global proportion of births in January (8.97%). We therefore removed these countries from our birth month analysis to safeguard data quality before carrying out our analyses.
4. Saving the included data
gender | id | identifiers | image | images | name | birth_date | links | other_names | given_name | ... | contact_details | family_name | death_date | sort_name | honorific_prefix | honorific_suffix | national_identity | summary | patronymic_name | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
8 | male | 34352d83-6fa1-463d-a02f-6157b3adf36b | [{'identifier': '141', 'scheme': 'everypolitic... | http://www.parlamentra.org/upload/iblock/bfe/u... | [{'url': 'http://www.parlamentra.org/upload/ib... | Убирия Бежан Михайлович | 1967-03-07 | [{'note': 'Wikipedia (ru)', 'url': 'https://ru... | [{'name': 'Бежан Убириа', 'note': 'alternate'}... | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
19 | male | 9fd33b27-fd4c-4eba-9a8f-d4d23f603c63 | [{'identifier': '/m/03fqqs', 'scheme': 'freeba... | http://www.parlamentra.org/upload/iblock/e1f/s... | [{'url': 'http://www.parlamentra.org/upload/ib... | Шамба Сергей Миронович | 1951-03-15 | [{'note': 'Wikipedia (ab)', 'url': 'https://ab... | [{'lang': 'ab', 'name': 'Сергеи Шамба', 'note'... | Sergey | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
28 | male | da988bab-32d4-46c0-bb7b-5c6a6eb129e7 | [{'identifier': '46', 'scheme': 'everypolitici... | http://www.parlamentra.org/upload/iblock/b85/%... | [{'url': 'http://www.parlamentra.org/upload/ib... | Бганба Валерий Рамшухович | 1953-08-26 | [{'note': 'Wikimedia Commons', 'url': 'https:/... | [{'lang': 'cs', 'name': 'Valerij Bganba', 'not... | Valeri | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
78 | female | 522dff9d-d21d-41b9-a7d5-c2321c819b11 | [{'identifier': '1854', 'scheme': 'everypoliti... | http://www.wolesi.website/Media/Images/mine/fa... | [{'url': 'http://www.wolesi.website/Media/Imag... | Farkhunda Zahra Naderi-Kabul | 1981-04-19 | [{'note': 'Wikimedia Commons', 'url': 'https:/... | [{'lang': 'en', 'name': 'Farkhunda Zahra Nader... | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
172 | female | aac6f415-446d-4070-80d9-195d4f7b77ac | [{'identifier': '1823', 'scheme': 'everypoliti... | http://www.wolesi.website/Media/Images/mine/ra... | [{'url': 'http://www.wolesi.website/Media/Imag... | Rangina Kargar-Faryab | 1985-03-22 | [{'note': 'Wikimedia Commons', 'url': 'https:/... | [{'lang': 'en', 'name': 'Rangina Kargar', 'not... | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
51 | female | e57828f1-e51a-41ea-a61b-7fc1b92e2b37 | [{'identifier': '109', 'scheme': 'everypolitic... | http://www.lagtinget.ax/files/dahl_ulla_britt.jpg | [{'url': 'http://www.lagtinget.ax/files/dahl_u... | Dahl Ulla-Britt | 1946-03-14 | [{'note': 'Wikipedia (fi)', 'url': 'https://fi... | [{'lang': 'en', 'name': 'Ulla-Britt Dahl', 'no... | Ulla-Britt | ... | NaN | Dahl | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
52 | male | e6f6e01b-bcb0-4374-8912-7e1eaf492f10 | [{'identifier': '97', 'scheme': 'everypolitici... | http://www.lagtinget.ax/files/lindfors_henry.jpg | [{'url': 'http://www.lagtinget.ax/files/lindfo... | Lindström Henry | 1956-06-20 | [{'note': 'Wikipedia (fi)', 'url': 'https://fi... | [{'lang': 'en', 'name': 'Henry Lindström', 'no... | Henry | ... | NaN | Lindström | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
55 | male | ed4838aa-19f8-493c-b8c4-58780b5b0d84 | [{'identifier': '112', 'scheme': 'everypolitic... | http://www.lagtinget.ax/files/sjolund_folke.jpg | [{'url': 'http://www.lagtinget.ax/files/sjolun... | Sjölund Folke | 1943-12-16 | [{'note': 'Wikipedia (fi)', 'url': 'https://fi... | [{'lang': 'en', 'name': 'Folke Sjölund', 'note... | Folke | ... | NaN | Sjölund | 2013-12-13 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
57 | male | f4f995ae-126c-468f-8f26-deec1e26adc2 | [{'identifier': '372', 'scheme': 'everypolitic... | http://www.lagtinget.ax/files/asumaa_tony.jpg | [{'url': 'http://www.lagtinget.ax/files/asumaa... | Asumaa Tony | 1968-09-15 | [{'note': 'Wikipedia (en)', 'url': 'https://en... | [{'lang': 'en', 'name': 'Tony Asumaa', 'note':... | Tony | ... | NaN | Asumaa | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
58 | male | fe40bb08-5025-473d-af52-bf1e30ecc5e6 | [{'identifier': '88', 'scheme': 'everypolitici... | http://www.lagtinget.ax/files/sundblom_torsten... | [{'url': 'http://www.lagtinget.ax/files/sundbl... | Sundblom Torsten | 1951-09-15 | [{'note': 'Wikipedia (fi)', 'url': 'https://fi... | [{'lang': 'en', 'name': 'Torsten Sundblom', 'n... | Torsten | ... | [{'type': 'email', 'value': 'torsten.sundblom@... | NaN | NaN | torsten.sundblom@lagtinget.ax | NaN | NaN | NaN | NaN | NaN | NaN |
41381 rows × 21 columns