Chichewa Speakers#
This notebook provides statistics about the Chichewa language. More specifically, the goal is to produce an accurate estimate of the total number of people who speak it. Chichewa is also known by the alternate names Chewa, Nyanja, and Chinyanja. To provide context, let us first list the countries where the language is spoken.
Mozambique. Less than 1% of the population speaks the language. [3]
Zimbabwe. Although Chichewa seems to be one of the official languages of Zimbabwe, I have not yet found any data showing how many people speak it.
Tanzania. It borders Malawi's northern region, where people speak Tumbuka, so it makes sense that there may be no Chichewa speakers there. In any case, I did not find any data on the proportion of the population that speaks the language.
Based on the analysis in this notebook, as of 2023 an estimated 21,482,292 people speak Chichewa across three countries: Malawi (70%), Zambia (18%), and Mozambique (12%).
The Humanitarian Data Exchange (humdata) website contains language data for some of these countries. The humdata links are provided below:
Since I was not very sure of some of the numbers from humdata, I decided to check them against the original sources, as follows:
Malawi and Zambia. I could not find any current surveys with data on languages spoken, but the DHS offers a usable proxy: it records each survey respondent's native language. Although this variable is not included in DHS reports (it appears to be collected as interview metadata), it is still a useful source of data on languages spoken. For Zimbabwe, Tanzania and Mozambique, the DHS does not provide this information, because apart from the major languages the codes are labelled only as language-1, language-2, etc.
1. ^ 2015-16 MDHS and humdata
2. ^ 2018 Zambia DHS and humdata
3. ^ Humdata
import pandas as pd
pd.set_option('display.float_format', lambda x: '%.3f' % x)
Chichewa speakers in Malawi#
Malawi is the main country where Chichewa is spoken. Because exact data on languages spoken are lacking, I first estimated the number of speakers from census population by tribe and by district.
def mw_estimate_chichewa_speaking_based_on_tribe():
    """
    Get an estimate of people who speak Chichewa and Chinyanja based
    on tribes
    data source: 2018 census main report:
    http://www.nsomalawi.mw/images/stories/data_on_line/demography/census_2018/
    """
    # from MW 2018 census report
    total_pop_malawi = 17506022
    # =======================
    # POPULATION BY TRIBE
    # =======================
    mw_chewa_trb = 6020945
    mw_tumbuka_trb = 1614955
    mw_lomwe_trb = 3302634
    mw_tonga_trb = 310031
    mw_yao_trb = 2321763
    mw_sena_trb = 670908
    mw_nkhonde_trb = 174430
    mw_lambya_trb = 106769
    mw_sukwa_trb = 93762
    mw_manganja_trb = 559887
    mw_nyanja_trb = 324272
    mw_ngoni_tribe = 1819347
    mw_other_trb = 186319
    # check that totals from tribes match with given total pop
    tot = (mw_chewa_trb + mw_lambya_trb + mw_sukwa_trb + mw_other_trb +
           mw_manganja_trb + mw_sena_trb + mw_lomwe_trb + mw_ngoni_tribe +
           mw_tumbuka_trb + mw_tonga_trb + mw_nyanja_trb + mw_yao_trb +
           mw_nkhonde_trb)
    assert tot == total_pop_malawi
    # =====================================
    # SUM POP FOR CHICHEWA SPEAKING TRIBES
    # =====================================
    # the following tribes speak chichewa:
    # chewa, ngoni, mang'anja, nyanja and other
    chichewa_speaking_pop = (mw_chewa_trb + mw_other_trb + mw_ngoni_tribe +
                             mw_manganja_trb + mw_nyanja_trb)
    print('Chichewa speaking population: {:,}'.format(chichewa_speaking_pop))
mw_estimate_chichewa_speaking_based_on_tribe()
Chichewa speaking population: 8,910,770
def mw_estimate_chichewa_speakers_by_district(dist_pop_csv):
    """
    Get an estimate of people who speak Chichewa and Chinyanja based
    on district, using population by district from the 2018 census main report:
    http://www.nsomalawi.mw/images/stories/data_on_line/demography/census_2018/
    Parameters:
    dist_pop_csv (str): CSV file with population by district for MW
    Returns:
    float: Population of Chichewa speakers in MW (sum of TOTAL_POP)
    """
    # =======================
    # POPULATION BY DISTRICT
    # =======================
    df_dist = pd.read_csv(dist_pop_csv)
    # ===========================
    # CHICHEWA SPEAKING DISTRICTS
    # ===========================
    # List of chichewa speaking districts
    chichewa_speaking_dists = ['Mzuzu City',
                               'Kasungu', 'Nkhotakota', 'Ntchisi', 'Dowa', 'Salima', 'Lilongwe',
                               'Mchinji', 'Dedza', 'Ntcheu', 'Lilongwe City',
                               'Zomba', 'Chiradzulu', 'Blantyre', 'Mwanza', 'Thyolo',
                               'Mulanje', 'Phalombe', 'Balaka', 'Neno',
                               'Zomba City', 'Blantyre City']
    # `@name` inside query() refers to the local Python variable
    df_chichewa_speaking = df_dist.query('DIST_NAME in @chichewa_speaking_dists')
    chich_pop = df_chichewa_speaking['TOTAL_POP'].sum()
    print('Chichewa speaking population: {:,}'.format(chich_pop))
    return chich_pop
mw_chich_pop_dist = mw_estimate_chichewa_speakers_by_district(dist_pop_csv="../data/mw_2018_pop_by_dist.csv")
Chichewa speaking population: 12,747,340.0
def chich_speaking_pop_DHS_based(stata_file, total_pop, chichewa_lan_codes,
                                 dhs_tot_hhs):
    """
    Estimate Chichewa speaking people from DHS survey question
    on native language of respondent.
    Parameters:
    stata_file (str): Path to STATA (.dta) file for household members. Data
        can be accessed here: https://dhsprogram.com/data/dataset/Malawi_Standard-DHS_2015.cfm?flag=0
    total_pop (int): Total population for the country
    chichewa_lan_codes (list): Language codes that represent Chichewa. For example: [2], [2, 3]
    dhs_tot_hhs (int): Total number of households in DHS, for verification purposes.
    Returns:
    tuple: (proportion of Chichewa speakers (float),
            estimated population of Chichewa speakers (int))
    """
    # Load the stata file
    df = pd.read_stata(stata_file, convert_categoricals=False)
    # Rename the columns
    # Grab these from STATA-Do file available in same folder as the data
    cols = {'hv045b': 'intv_lan', 'hv045c': 'resp_nativ_lan', 'hv046': 'translator',
            'hv002': 'hh_num', 'hv005': 'weight',
            'hv045a': 'qn_lan', 'hv001': 'cluster_number', 'hv004': 'area_unit'}
    keep_cols = ['hhid'] + list(cols.keys())
    df = df[keep_cols].rename(columns=cols)
    # Build a household ID from zero-padded cluster and household numbers
    df['hh_id'] = df.apply(lambda x:
                           str(x['cluster_number']).zfill(3) +
                           str(x['hh_num']).zfill(3), axis=1)
    # Check that we have all households as expected (for example, 26,361
    # as indicated in the 2015-16 Malawi DHS report); report any mismatch
    n_hhs = df.hh_id.nunique()
    if n_hhs != dhs_tot_hhs:
        print('{:,} households from this file compared to {:,} reported number'.format(
            n_hhs, dhs_tot_hhs))
        print()
    # ========================================
    # TABULATE RESPONDENT NATIVE LANGUAGE
    # ========================================
    # Since we are getting national stats, we
    # will not weight
    chich_prop = df.resp_nativ_lan.value_counts(normalize=True)[chichewa_lan_codes]
    chich_prop_total = chich_prop.sum()
    # Get population from the proportion
    chich_pop = int(chich_prop_total * total_pop)
    return chich_prop_total, chich_pop
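The function above tabulates the native-language variable without sample weights. If a weighted proportion were wanted instead, one possible approach is sketched below in plain Python. It assumes the usual DHS convention that the household weight (hv005, renamed to weight above) is stored scaled by 1,000,000. The helper name weighted_language_shares and the sample records are hypothetical and only illustrate the arithmetic; this is not the method used for the estimates in this notebook.

```python
from collections import defaultdict


def weighted_language_shares(records):
    """Return weighted shares per language code.

    records: iterable of (language_code, raw_weight) pairs, where raw_weight
    is assumed to be a DHS-style weight scaled by 1,000,000.
    """
    totals = defaultdict(float)
    for code, raw_weight in records:
        # Undo the assumed 1,000,000 scaling before accumulating
        totals[code] += raw_weight / 1_000_000
    grand_total = sum(totals.values())
    return {code: weight / grand_total for code, weight in totals.items()}


# Made-up records: (language code, raw weight). Not real survey data.
sample = [(2, 2_000_000), (2, 1_000_000), (3, 500_000), (1, 500_000)]
shares = weighted_language_shares(sample)

# Unweighted share of code 2 for comparison (simple row count)
unweighted_share = sum(1 for code, _ in sample if code == 2) / len(sample)

print(f"weighted share of code 2:   {shares[2]:.3f}")
print(f"unweighted share of code 2: {unweighted_share:.3f}")
```

In this toy sample the two shares differ because the rows with code 2 carry larger weights; with real data the size of the difference would depend on the survey design.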
# ===================
# Malawi
# ===================
language_names = {1: "English", 2:"Chichewa", 3:"Tumbuka", 6: "Other"}
# MW DHS-2015-16 HH members stata data file
data_file = "../data/other-inputs/MWPR7ADT/MWPR7AFL.DTA"
# MW DHS-2015-16 total sample households
mw_dhs_hhs = 26361
# MW 2018 census population projection
# From here: http://www.nsomalawi.mw/images/stories/data_on_line/demography/census_2018/\
# Thematic_Reports/Population%20Projections%202018-2050.pdf
mw_proj_pop_2023 = 19809511
mw_chich_prop, mw_chich_pop = chich_speaking_pop_DHS_based(stata_file=data_file,
                                                           chichewa_lan_codes=[2],
                                                           total_pop=mw_proj_pop_2023,
                                                           dhs_tot_hhs=mw_dhs_hhs)
print('================================================')
print('Based on the 2015-16 Malawi DHS and 2018 census')
print('================================================')
print('Estimated number of Chichewa speaking people in Malawi is : {:,}'.format(mw_chich_pop ))
================================================
Based on the 2015-16 Malawi DHS and 2018 census
================================================
Estimated number of Chichewa speaking people in Malawi is : 15,050,638
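To put the three Malawi estimates side by side, the sketch below expresses each one as a share of the population it was derived from. All figures are copied from the cells above; the tribe-based total is recomputed from the census tribe counts, and the variable names are illustrative only.

```python
# 2018 census counts for the tribes treated above as Chichewa speaking
chichewa_speaking_tribes = {
    "Chewa": 6_020_945,
    "Ngoni": 1_819_347,
    "Mang'anja": 559_887,
    "Nyanja": 324_272,
    "Other": 186_319,
}
census_pop_2018 = 17_506_022     # total population, 2018 census
district_estimate = 12_747_340   # district-based estimate printed above
dhs_estimate_2023 = 15_050_638   # DHS-based estimate printed above
projected_pop_2023 = 19_809_511  # 2023 projection used for the DHS estimate

tribe_estimate = sum(chichewa_speaking_tribes.values())

# Each estimate divided by the population base it was computed against
shares = {
    "tribe (2018)": tribe_estimate / census_pop_2018,
    "district (2018)": district_estimate / census_pop_2018,
    "DHS (2023 projection)": dhs_estimate_2023 / projected_pop_2023,
}
for method, share in shares.items():
    print(f"{method:<24}{share:>8.1%}")
```

The shares come out at about 51% (tribe), 73% (district) and 76% (DHS), so the DHS-based figure carried forward to the total is the highest of the three.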
Chichewa speakers in Zambia#
The next biggest population of Chichewa speakers is in Zambia.
# ===================
# ZAMBIA
# ===================
za_language_codes = {1: "English", 2:"Bemba", 3:"Kaonde",4: "Lozi",
5 :"Lunda", 6:"Luvale", 7: "Nyanja",
8: "Tonga", 96 : "Other"}
# ZA DHS 2018 HH members stata data file
data_file = "../data/other-inputs/ZMPR71DT/ZMPR71FL.DTA"
# ZA DHS 2018 total sample households
za_dhs_hhs = 13625
# ZA 2022 census population
# From here: https://www.zamstats.gov.zm/download/6815/?tmstv=1677767005&v=9623
za_pop_2022 = 19610769
za_chich_prop, za_chich_pop = chich_speaking_pop_DHS_based(stata_file=data_file,
                                                           chichewa_lan_codes=[7],
                                                           total_pop=za_pop_2022,
                                                           dhs_tot_hhs=za_dhs_hhs)
print('================================================')
print('Based on the 2018 Zambia DHS and 2022 census')
print('================================================')
print('Estimated number of Chichewa speaking people in Zambia is : {:,}'.format(za_chich_pop))
12,831 households from this file compared to 13,625 reported number
================================================
Based on the 2018 Zambia DHS and 2022 census
================================================
Estimated number of Chichewa speaking people in Zambia is : 3,787,688
Chichewa speakers in Mozambique#
There is also a sizable number of Chichewa speakers in Mozambique.
# 2023 projected population for Mozambique
# source: https://www.imf.org/en/Countries/MOZ
mz_proj_pop_2023 = 33897000
# Proportion of chichewa speakers
# source: https://data.humdata.org/dataset/mozambique-languages
prop_chichewa_pop = 0.078
# Estimate Chichewa speaking population
mz_chich_pop = mz_proj_pop_2023 * prop_chichewa_pop
print('==================================================')
print('Based on the HUM-DATA and 2023 projected population')
print('===================================================')
print('Estimated number of Chichewa speaking people in Mozambique is : {:,}'.format(int(mz_chich_pop)))
==================================================
Based on the HUM-DATA and 2023 projected population
===================================================
Estimated number of Chichewa speaking people in Mozambique is : 2,643,966
Total Chichewa speakers#
chich = {'mw': mw_chich_pop, 'za': za_chich_pop, 'mz': mz_chich_pop}
total_chich_pop = sum(chich.values())
mw_prop = round(mw_chich_pop/total_chich_pop*100, 2)
za_prop = round(za_chich_pop/total_chich_pop*100, 2)
mz_prop = round(mz_chich_pop/total_chich_pop*100, 2)
print('===========================')
print('Based on analysis above')
print('===========================')
print('Estimated number of Chichewa speaking people is : {:,}'.format(int(total_chich_pop)))
print()
print('Distributed across 3 countries as below:')
print('-'*40)
print(
f"{'Malawi:':<15}{mw_prop:>10}%",
f"\n{'Zambia:':<15}{za_prop:>10}%",
f"\n{'Mozambique:':<15}{mz_prop:>10}%",
)
===========================
Based on analysis above
===========================
Estimated number of Chichewa speaking people is : 21,482,292
Distributed across 3 countries as below:
----------------------------------------
Malawi: 70.06%
Zambia: 17.63%
Mozambique: 12.31%
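The Mozambique share is the least certain input: the calculation uses 0.078 from humdata, while the introduction mentions less than 1%. As a rough sensitivity check, the sketch below recomputes the total under both values. The Malawi and Zambia figures are copied from the outputs above; the helper total_speakers is hypothetical, and treating 0.01 as the alternative share is an assumption made only for illustration.

```python
mw_chich_pop = 15_050_638       # Malawi estimate printed above
za_chich_pop = 3_787_688        # Zambia estimate printed above
mz_proj_pop_2023 = 33_897_000   # Mozambique 2023 projected population


def total_speakers(mz_share):
    """Total across the three countries for a given Mozambique share."""
    return mw_chich_pop + za_chich_pop + round(mz_proj_pop_2023 * mz_share)


# 0.078 is the share used above; 0.01 is an assumed upper bound taken from
# the "less than 1%" statement in the introduction
for mz_share in (0.078, 0.01):
    print(f"Mozambique share {mz_share:.1%}: total {total_speakers(mz_share):,}")
```

With 0.078 the sketch reproduces the 21,482,292 total above; with 0.01 the total falls to 19,177,296, so the headline figure moves by about 2.3 million depending on which Mozambique share is used.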