Chichewa Speakers

This notebook provides statistics about the Chichewa language. More specifically, the goal is to produce an accurate estimate of the total number of people who speak it. Note that Chichewa is also known by the alternate names Chewa, Nyanja and Chinyanja. To provide context, we first list the countries where the language is spoken.

  1. Malawi. About 70% of the population speak Chichewa. [1]

  2. Zambia. About 20% of the population speak the language. [2]

  3. Mozambique. About 7.8% of the population speak the language, based on humdata. [3]

  4. Zimbabwe. Although Chichewa seems to be one of Zimbabwe's official languages, I have not yet found any data showing how many people speak it.

  5. Tanzania. Tanzania borders Malawi's northern region, where people speak Tumbuka, so it makes sense that there may be no Chichewa-speaking people there. Apart from this reasoning, I did not find any data on the proportion of the population who speak the language.

Based on the analysis in this notebook, as of 2023 an estimated 21,482,292 people speak Chichewa, distributed across three countries: Malawi (70%), Zambia (18%) and Mozambique (12%).
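Each country figure in this notebook follows the same recipe. An estimated proportion of Chichewa speakers, p̂_c, is multiplied by a population figure, P_c. The proportion comes from the DHS for Malawi and Zambia and from humdata for Mozambique. The population figure is the 2023 projection for Malawi, the 2022 census for Zambia and the 2023 IMF projection for Mozambique. Written out, with the Mozambique figures as a worked example:

```latex
\begin{aligned}
\hat{N}_c &= \hat{p}_c \, P_c, \qquad c \in \{\text{Malawi}, \text{Zambia}, \text{Mozambique}\} \\
\hat{N} &= \sum_c \hat{N}_c, \qquad s_c = 100 \, \frac{\hat{N}_c}{\hat{N}} \\
\hat{N}_{\text{Mozambique}} &= 0.078 \times 33{,}897{,}000 = 2{,}643{,}966 \\
\hat{N} &= 15{,}050{,}638 + 3{,}787{,}688 + 2{,}643{,}966 = 21{,}482{,}292
\end{aligned}
```

The shares s_c are the percentages reported above.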

The Humanitarian Data Exchange website (humdata.org) contains data on languages for some of these countries. The humdata links are provided below:

Since I was not sure about some of the numbers from humdata, I decided to check them against the original sources, as follows:

Malawi and Zambia. I could not find any recent surveys with data on languages spoken, but I did find something in the Demographic and Health Surveys (DHS), which record the survey respondent's native language. This variable is not included in DHS reports because it seems to be collected as interview metadata. Even so, it is a useful source of data on languages spoken. For Zimbabwe, Tanzania and Mozambique, the DHS does not have this information: apart from the major languages, languages are coded only as language-1, language-2, etc.

[1] 2015-16 Malawi DHS (MDHS) and humdata

[2] 2018 Zambia DHS and humdata

[3] Humdata

import pandas as pd
pd.set_option('display.float_format', lambda x: '%.3f' % x)

Chichewa speakers in Malawi

Malawi is the main country where Chichewa is spoken. Because exact data on languages spoken are lacking, I first attempted to estimate the number of speakers from tribe and from district of origin.

def mw_estimate_chichewa_speaking_based_on_tribe():
    """
    Get an estimate of people who speak Chichewa and Chinyanja based
    on tribes
    data source: 2018 census main report: 
    http://www.nsomalawi.mw/images/stories/data_on_line/demography/census_2018/
    """
    # from MW 2018 census report 
    total_pop_malawi = 17506022
    
    # =======================
    # POPULATION BY TRIBE
    # =======================
    mw_chewa_trb = 6020945
    mw_tumbuka_trb = 1614955
    mw_lomwe_trb = 3302634
    mw_tonga_trb = 310031
    mw_yao_trb = 2321763
    mw_sena_trb = 670908
    mw_nkhonde_trb = 174430
    mw_lambya_trb = 106769
    mw_sukwa_trb = 93762
    mw_manganja_trb = 559887
    mw_nyanja_trb = 324272
    mw_ngoni_tribe = 1819347
    mw_other_trb = 186319
    
    # check that totals from tribes match with given total pop
    tot = mw_chewa_trb + mw_lambya_trb + mw_sukwa_trb + mw_other_trb+ mw_manganja_trb+ mw_sena_trb+\
    mw_lomwe_trb + mw_ngoni_tribe + mw_tumbuka_trb + mw_tonga_trb + mw_nyanja_trb + mw_yao_trb + mw_nkhonde_trb
    assert tot == total_pop_malawi
    
    # =====================================
    # SUM POP FOR CHICHEWA SPEAKING TRIBES
    # ====================================
    # The following tribes speak Chichewa:
    # Chewa, Ngoni, Mang'anja, Nyanja and Other
    chichewa_speaking_pop = mw_chewa_trb + mw_other_trb + mw_ngoni_tribe + \
        mw_manganja_trb + mw_nyanja_trb
    print('Chichewa speaking population: {:,}'.format(chichewa_speaking_pop))
    return chichewa_speaking_pop
mw_tribe_pop = mw_estimate_chichewa_speaking_based_on_tribe()
Chichewa speaking population: 8,910,770
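The tribe-based estimate can also be written with the tribe counts in a dictionary. The total check and the list of Chichewa-speaking tribes then become data rather than a long chain of additions. The sketch below reuses the 2018 census figures hard-coded in the function above. The names MW_POP_BY_TRIBE and chichewa_speakers_by_tribe are illustrative and are not part of the original notebook.

```python
# Population by tribe from the 2018 Malawi census main report, i.e. the same
# figures hard-coded in mw_estimate_chichewa_speaking_based_on_tribe above.
MW_POP_BY_TRIBE = {
    "Chewa": 6_020_945,
    "Tumbuka": 1_614_955,
    "Lomwe": 3_302_634,
    "Tonga": 310_031,
    "Yao": 2_321_763,
    "Sena": 670_908,
    "Nkhonde": 174_430,
    "Lambya": 106_769,
    "Sukwa": 93_762,
    "Manganja": 559_887,
    "Nyanja": 324_272,
    "Ngoni": 1_819_347,
    "Other": 186_319,
}
MW_TOTAL_POP_2018 = 17_506_022

# Tribes counted as Chichewa speaking in the tribe-based estimate
CHICHEWA_SPEAKING_TRIBES = ["Chewa", "Ngoni", "Manganja", "Nyanja", "Other"]


def chichewa_speakers_by_tribe(pop_by_tribe, speaking_tribes, total_pop):
    """Return (speakers, share of total_pop) for the given tribes."""
    # Same consistency check as the original: tribe counts must add up to the total
    if sum(pop_by_tribe.values()) != total_pop:
        raise ValueError("tribe populations do not sum to the total population")
    # A misspelled tribe name raises KeyError instead of being silently skipped
    speakers = sum(pop_by_tribe[tribe] for tribe in speaking_tribes)
    return speakers, speakers / total_pop


speakers, share = chichewa_speakers_by_tribe(
    MW_POP_BY_TRIBE, CHICHEWA_SPEAKING_TRIBES, MW_TOTAL_POP_2018)
print(f"Chichewa speaking population: {speakers:,} ({share:.1%} of the 2018 population)")
```

With the census figures, this gives 8,910,770 people, about 51% of Malawi's 2018 population.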
def mw_estimate_chichewa_speakers_by_district(dist_pop_csv):
    """
    Get an estimate of people who speak Chichewa and Chinyanja based
    on district of origin, using population by district from the 2018 census main report:
    http://www.nsomalawi.mw/images/stories/data_on_line/demography/census_2018/

    Parameters:
    dist_pop_csv (str): CSV file with population by district for MW

    Returns:
    float: Population of Chichewa speakers in MW (sum of TOTAL_POP over
    the Chichewa speaking districts)
    """

    # =======================
    # POPULATION BY DISTRICT
    # =======================
    df_dist = pd.read_csv(dist_pop_csv)
    
    # ===========================
    # CHICHEWA SPEAKING DISTRICTS
    # ===========================
    # List of chichewa speaking districts
    chichewa_speaking_dists = ['Mzuzu City',
       'Kasungu', 'Nkhotakota', 'Ntchisi', 'Dowa', 'Salima', 'Lilongwe',
       'Mchinji', 'Dedza', 'Ntcheu', 'Lilongwe City', 
       'Zomba', 'Chiradzulu', 'Blantyre', 'Mwanza', 'Thyolo',
       'Mulanje', 'Phalombe', 'Balaka', 'Neno',
       'Zomba City', 'Blantyre City']
    
    df_chichewa_speaking = df_dist.query('DIST_NAME in @chichewa_speaking_dists')
    chich_pop = df_chichewa_speaking['TOTAL_POP'].sum()
    print('Chichewa speaking population: {:,}'.format(chich_pop))
    return chich_pop
mw_dist_chich_pop = mw_estimate_chichewa_speakers_by_district(dist_pop_csv="../data/mw_2018_pop_by_dist.csv")
Chichewa speaking population: 12,747,340.0
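The district-based function has a quiet failure mode. If a district name in chichewa_speaking_dists does not match the spelling in the CSV, query simply drops it, which lowers the total without any warning. The sketch below is a stricter variant. sum_pop_for_districts is a hypothetical helper, and it assumes the DIST_NAME and TOTAL_POP columns used above.

```python
import pandas as pd


def sum_pop_for_districts(df_dist, districts, name_col="DIST_NAME", pop_col="TOTAL_POP"):
    """Sum pop_col over the rows whose name_col is one of `districts`.

    Raises KeyError if a requested district is missing from the table, so a
    misspelled name cannot silently drop a district from the total.
    """
    missing = sorted(set(districts) - set(df_dist[name_col]))
    if missing:
        raise KeyError(f"districts not found in table: {missing}")
    # Boolean mask, equivalent to df_dist.query('DIST_NAME in @districts')
    mask = df_dist[name_col].isin(districts)
    return df_dist.loc[mask, pop_col].sum()


# Toy table with made-up numbers, only to show the call; the real input is
# ../data/mw_2018_pop_by_dist.csv read with pd.read_csv.
toy = pd.DataFrame({"DIST_NAME": ["Lilongwe", "Dedza", "Rumphi"],
                    "TOTAL_POP": [100, 50, 30]})
print(sum_pop_for_districts(toy, ["Lilongwe", "Dedza"]))
```

To use it, pass the DataFrame read from the CSV and the chichewa_speaking_dists list. Any spelling mismatch then raises an error instead of changing the estimate.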
def chich_speaking_pop_DHS_based(stata_file, total_pop, chichewa_lan_codes,
                                 dhs_tot_hhs):
    """
    Estimate the number of Chichewa speakers from the DHS survey question
    on the native language of the respondent.

    Parameters:
    stata_file (str): Path to the STATA (.dta) file for household members. Data
    can be accessed here: https://dhsprogram.com/data/dataset/Malawi_Standard-DHS_2015.cfm?flag=0
    total_pop (int): Total population for the country
    chichewa_lan_codes (list): Language codes that represent Chichewa, for example [2] or [2, 3]
    dhs_tot_hhs (int): Total number of households in the DHS, used for verification

    Returns:
    tuple: (proportion of Chichewa speakers, estimated number of Chichewa speakers)
    """
    # Load the stata file
    df = pd.read_stata(stata_file, convert_categoricals=False)
    
    # Rename the columns
    # Grab these from STATA-Do file available in same folder as the data
    cols = {'hv045b': 'intv_lan', 'hv045c':'resp_nativ_lan', 'hv046': 'translator', 
        'hv002':'hh_num','hv005': 'weight',
 'hv045a': 'qn_lan', 'hv001':  "cluster_number", 'hv004':   "area_unit"}
    keep_cols = ['hhid'] + list(cols.keys())
    df = df[keep_cols]
    df.rename(columns=cols, inplace=True)
    df['hh_id'] = df.apply(lambda x: 
                           str(x['cluster_number']).zfill(3) + 
                           str(x['hh_num']).zfill(3), axis=1)
    
    # Check that we have all the households reported for the survey
    # (e.g. 26,361 for the 2015-16 Malawi DHS)
    n_hhs = df.hh_id.nunique()
    if n_hhs != dhs_tot_hhs:
        print('{:,} households from this file compared to {:,} reported number'.format(
            n_hhs, dhs_tot_hhs))
        print()
    
    # ========================================
    # TABULATE RESPONDENT NATIVE LANGUAGE
    # ========================================
    # Unweighted proportion of household-member records by the
    # respondent's native language (DHS sample weights are not applied)
    chich_prop = df.resp_nativ_lan.value_counts(normalize=True)[chichewa_lan_codes]
    chich_prop_total = chich_prop.sum()
    
    # Get population from the proportion
    chich_pop = int(chich_prop_total*total_pop)
    
    
    return chich_prop_total, chich_pop
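The function above uses unweighted proportions. DHS files include a household sample weight (hv005, renamed weight in the function), and the DHS program's own national tabulations are weighted. The sketch below computes a weighted share. It assumes the renamed columns, and weighted_language_shares is a hypothetical helper. It was not used to produce the estimates in this notebook, and weighting could change them.

```python
import pandas as pd


def weighted_language_shares(df, lang_col="resp_nativ_lan", weight_col="weight"):
    """Share of records per language code, weighted by the DHS sample weight.

    DHS stores hv005 as an integer with six implied decimals; dividing by
    1,000,000 recovers the weight (the scale cancels in a ratio anyway).
    """
    weights = df[weight_col] / 1_000_000
    totals = weights.groupby(df[lang_col]).sum()
    return totals / totals.sum()


# Toy records (hypothetical values) using the renamed columns from
# chich_speaking_pop_DHS_based; code 2 is Chichewa in the Malawi file
toy = pd.DataFrame({
    "resp_nativ_lan": [2, 2, 3, 1],
    "weight": [500_000, 500_000, 2_000_000, 1_000_000],
})
weighted = weighted_language_shares(toy)
unweighted = toy["resp_nativ_lan"].value_counts(normalize=True)
print(f"Chichewa, weighted:   {weighted.loc[[2]].sum():.2f}")
print(f"Chichewa, unweighted: {unweighted.loc[[2]].sum():.2f}")
```

Inside the function, chich_prop_total would then become weighted_language_shares(df).loc[chichewa_lan_codes].sum().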
# ===================
# Malawi
# ===================
language_names = {1: "English", 2:"Chichewa", 3:"Tumbuka", 6: "Other"}

# MW DHS-2015-16 HH members stata data file
data_file = "../data/other-inputs/MWPR7ADT/MWPR7AFL.DTA"

# MW DHS-2015-16 total sample households
mw_dhs_hhs = 26361

# MW 2018 census population projection
# From here: http://www.nsomalawi.mw/images/stories/data_on_line/demography/census_2018/\
# Thematic_Reports/Population%20Projections%202018-2050.pdf
mw_proj_pop_2023 = 19809511
mw_chich_prop, mw_chich_pop = chich_speaking_pop_DHS_based(stata_file=data_file,
                                                           chichewa_lan_codes=[2],
                                                           total_pop=mw_proj_pop_2023,
                                                           dhs_tot_hhs=mw_dhs_hhs)
print('================================================')
print('Based on the 2015-16 Malawi DHS and 2018 census')
print('================================================')
print('Estimated number of Chichewa speaking people in Malawi is : {:,}'.format(mw_chich_pop ))
================================================
Based on the 2015-16 Malawi DHS and 2018 census
================================================
Estimated number of Chichewa speaking people in Malawi is : 15,050,638

Chichewa speakers in Zambia

The next biggest population of Chichewa speakers is in Zambia.

# ===================
# ZAMBIA
# ===================
za_language_codes = {1: "English", 2:"Bemba", 3:"Kaonde",4: "Lozi",
                  5 :"Lunda", 6:"Luvale", 7: "Nyanja", 
                  8: "Tonga", 96 : "Other"}
 
# 2018 Zambia DHS HH members stata data file
data_file = "../data/other-inputs/ZMPR71DT/ZMPR71FL.DTA"

# 2018 Zambia DHS total sample households
za_dhs_hhs = 13625

# ZA 2022 census population
# From here: https://www.zamstats.gov.zm/download/6815/?tmstv=1677767005&v=9623
za_pop_2022 = 19610769
za_chich_prop, za_chich_pop = chich_speaking_pop_DHS_based(stata_file=data_file, 
                                    chichewa_lan_codes=[7], 
                                    total_pop=za_pop_2022,  
                                    dhs_tot_hhs=za_dhs_hhs)

print('================================================')
print('Based on the 2018 Zambia DHS and 2022 census')
print('================================================')
print('Estimated number of Chichewa speaking people in Zambia is : {:,}'.format(za_chich_pop))
12,831 households from this file compared to 13,625 reported number

================================================
Based on the 2018 Zambia DHS and 2022 census
================================================
Estimated number of Chichewa speaking people in Zambia is : 3,787,688

Chichewa speakers in Mozambique

There is also a sizable number of Chichewa speakers in Mozambique.

# 2023 projected population for Mozambique
# source: https://www.imf.org/en/Countries/MOZ
mz_proj_pop_2023 =  33897000

# Proportion of chichewa speakers
# source: https://data.humdata.org/dataset/mozambique-languages
prop_chichewa_pop = 0.078

# Estimate Chichewa speaking population 
mz_chich_pop = mz_proj_pop_2023 * prop_chichewa_pop

print('==================================================')
print('Based on the HUM-DATA and 2023 projected population')
print('===================================================')
print('Estimated number of Chichewa speaking people in Mozambique is : {:,}'.format(int(mz_chich_pop)))
==================================================
Based on the HUM-DATA and 2023 projected population
===================================================
Estimated number of Chichewa speaking people in Mozambique is : 2,643,966

Total Chichewa speakers

chich = {'mw': mw_chich_pop, 'za': za_chich_pop, 'mz': mz_chich_pop}
total_chich_pop = sum(chich.values())

mw_prop = round(mw_chich_pop/total_chich_pop*100, 2)
za_prop = round(za_chich_pop/total_chich_pop*100, 2)
mz_prop = round(mz_chich_pop/total_chich_pop*100, 2)
print('===========================')
print('Based on analysis above')
print('===========================')
print('Estimated number of Chichewa speaking people is : {:,}'.format(int(total_chich_pop)))

print()
print('Distributed across 3 countries as below:')
print('-'*40)
print(
    f"{'Malawi:':<15}{mw_prop:>10}%",
    f"\n{'Zambia:':<15}{za_prop:>10}%",
    f"\n{'Mozambique:':<15}{mz_prop:>10}%",
)
===========================
Based on analysis above
===========================
Estimated number of Chichewa speaking people is : 21,482,292

Distributed across 3 countries as below:
----------------------------------------
Malawi:             70.06% 
Zambia:             17.63% 
Mozambique:         12.31%
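The cell above prints the shares with aligned f-strings. The sketch below builds the same summary as a small pandas table, which is easier to extend if more countries are added. The numbers are the country estimates printed earlier, hard-coded here so the block runs on its own.

```python
import pandas as pd

# Country estimates printed earlier in this notebook
estimates = pd.Series(
    {"Malawi": 15_050_638, "Zambia": 3_787_688, "Mozambique": 2_643_966},
    name="speakers",
)
summary = estimates.to_frame()
# Percentage of all estimated speakers living in each country
summary["share_pct"] = (100 * summary["speakers"] / summary["speakers"].sum()).round(2)
print(summary)
print(f"Total: {int(estimates.sum()):,}")
```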