Programming Assignment-1#

Estimating Chichewa Speakers#

Background#

This notebook provides statistics about the Chichewa language. More specifically, the goal is to generate an accurate estimate of the total number of people who speak this language. Note that Chichewa is also known by the alternate names Chewa, Nyanja and Chinyanja. To provide context, we first list the countries where the language is spoken.

  1. Malawi. About 70% of the population speak Chichewa. [1]

  2. Zambia. About 20% of the population speak the language. [2]

  3. Mozambique. Less than 1% of the population speak the language. [3]

  4. Zimbabwe. Although Chichewa seems to be one of the official languages of Zimbabwe, I haven't yet found any data showing how many people speak the language.

  5. Tanzania. It borders Malawi's northern region, where people speak Tumbuka, so it is plausible that there are no Chichewa speakers there. In any case, I did not find any data on the proportion of the population who speak the language.

Based on the analysis in this notebook, as of 2023 there are 21,482,292 people who speak Chichewa, distributed across three countries: Malawi (70%), Zambia (18%) and Mozambique (12%). These percentages are shares of all Chichewa speakers, not shares of each country's population.

The Humanitarian data website contains data about languages for some of these countries. The HUMDATA links are provided below:

Since I was not sure about some of the numbers from humdata, I decided to check them against the original sources as follows:

Malawi and Zambia. I could not find any current surveys with data on languages spoken, but the DHS offers a usable substitute: it records each survey respondent's native language. This variable does not appear in DHS reports (it seems to be collected as interview metadata), but it is still a useful source of data on languages spoken. For Zimbabwe, Tanzania and Mozambique, the DHS does not have this information: apart from the major languages, languages are coded only as language-1, language-2, and so on.

🚨 Assignment Tasks

  1. Read the notebook carefully and answer the questions as asked.

  2. Complete any missing code where it says “ADD YOUR CODE”.

  3. In some cases, you may be asked to write a whole function.

Datasets#

For this assignment, you will use the following datasets:

Malawi 2015-16 DHS#

Malawi Population#

References#

1. ^ 2015-16 MDHS and humdata

2. ^ 2018 Zambia DHS and humdata

3. ^ Humdata

Setup Input Folders#

In this section, make sure to define the folders where your data is stored on your machine.
I find it helpful to set up the working directory and input data folders right at the start of the notebook.
To keep things organized, I use the naming convention: FILE_{NAME} for files and DIR_{NAME} for folders.

We’ll be using the pathlib library from the Python standard library. It represents file paths as objects rather than plain strings, which makes them easier to join and check.

# Uncomment the following lines and add your code to define the directories and files
# DIR_DATA = ADD YOUR CODE
# DIR_DHS = ADD YOUR CODE

# Population by enumeration area (EA) for Malawi
# FILE_POP_MW = ADD YOUR CODE

# MW DHS-2015-16 HH members stata data file
# FILE_MW_DHS = ADD YOUR CODE
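As a starting point, the setup could look like the minimal sketch below. The folder layout and all file names are hypothetical placeholders, not the actual names of the course data, so replace them with the paths on your machine.

```python
from pathlib import Path

# Hypothetical layout: a "data" folder next to the notebook, with the
# DHS files in a "dhs" subfolder. Adjust to your own machine.
DIR_DATA = Path.cwd() / "data"
DIR_DHS = DIR_DATA / "dhs"

# Hypothetical file names; use the names of the files you downloaded.
FILE_POP_MW = DIR_DATA / "mw_population_by_ea.csv"
FILE_MW_DHS = DIR_DHS / "mw_dhs_2015_16_hh_members.dta"

# Report which paths exist so that typos show up early.
for path in (DIR_DATA, DIR_DHS, FILE_POP_MW, FILE_MW_DHS):
    status = "found" if path.exists() else "missing"
    print(f"{status:>7}: {path}")
```

Building every path from DIR_DATA means that only one line has to change when the data moves to another folder.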

Estimating Chichewa Speakers in Malawi Based on Tribe#

Malawi is the primary country where Chichewa is spoken. In the absence of precise language data, we begin by estimating the number of Chichewa speakers based on tribal affiliation.

Estimated Population by Tribe in Malawi#

  • Chewa: 6,020,945

  • Tumbuka: 1,614,955

  • Lomwe: 3,302,634

  • Tonga: 310,031

  • Yao: 2,321,763

  • Sena: 670,908

  • Nkhonde: 174,430

  • Lambya: 106,769

  • Sukwa: 93,762

  • Manganja: 559,887

  • Nyanja: 324,272

  • Ngoni: 1,819,347

  • Other: 186,319

In Malawi, these groups speak Chichewa:

  • Chewa: 6,020,945

  • Nyanja: 324,272

  • Ngoni: 1,819,347

  • Other: 186,319

  • Manganja: 559,887

Task-1:#

  • Using the information provided in the cell above, write a function that estimates the number of Chichewa speakers in Malawi based on tribal groups.

  • Ensure your function returns a value representing the Chichewa-speaking population.

  • Name your function: mw_estimate_chichewa_speaking_based_on_tribe

  • Call your function and print out the number of Chichewa speakers
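One possible shape for this function is sketched below. It assumes that every member of the five listed groups speaks Chichewa and that nobody in the other groups does, which is the simplification this section makes. The dictionary and tuple names are illustrative choices, not required by the assignment.

```python
# Estimated population by tribe, copied from the list above.
TRIBE_POPULATION = {
    "Chewa": 6_020_945,
    "Tumbuka": 1_614_955,
    "Lomwe": 3_302_634,
    "Tonga": 310_031,
    "Yao": 2_321_763,
    "Sena": 670_908,
    "Nkhonde": 174_430,
    "Lambya": 106_769,
    "Sukwa": 93_762,
    "Manganja": 559_887,
    "Nyanja": 324_272,
    "Ngoni": 1_819_347,
    "Other": 186_319,
}

# Groups assumed to speak Chichewa, as listed above.
CHICHEWA_SPEAKING_TRIBES = ("Chewa", "Nyanja", "Ngoni", "Other", "Manganja")


def mw_estimate_chichewa_speaking_based_on_tribe(
    tribe_population=TRIBE_POPULATION,
    speaking_tribes=CHICHEWA_SPEAKING_TRIBES,
):
    """Sum the population of the tribes assumed to speak Chichewa."""
    return sum(tribe_population[tribe] for tribe in speaking_tribes)


speakers = mw_estimate_chichewa_speaking_based_on_tribe()
total = sum(TRIBE_POPULATION.values())
print(f"Estimated Chichewa speakers: {speakers:,}")  # 8,910,770
print(f"Share of the listed tribal population: {speakers / total:.1%}")
```

Keeping the figures in a dictionary rather than hard-coding the sum makes it easy to test a different assumption about which groups speak Chichewa.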

Chichewa speakers in Malawi Based on District of Residence#

In this section, we estimate Chichewa speakers based on the district of residence. For example, we can assume that people from the following districts speak Chichewa:

Mzuzu City, Kasungu, Nkhotakota, Ntchisi, Dowa, Salima, Lilongwe,
Mchinji, Dedza, Ntcheu, Lilongwe City,
Zomba, Chiradzulu, Blantyre, Mwanza, Thyolo,
Mulanje, Phalombe, Balaka, Neno,
Zomba City, Blantyre City

Task 2 – Part 1: Generate Population by District#

  • Use the Malawi population data to write a function that reads the relevant data files and returns a DataFrame containing population totals by district.

  • Name your function: generate_dist_pop
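A minimal sketch of the aggregation step follows. It assumes the population file is a CSV with one row per enumeration area, and the column names `district` and `population` are hypothetical, so check the real file and pass the actual names. A small in-memory CSV stands in for the real data.

```python
import io

import pandas as pd


def generate_dist_pop(file_pop, dist_col="district", pop_col="population"):
    """Read EA-level population data and return totals by district.

    The column names are assumptions; pass the real ones for your file.
    """
    df_ea = pd.read_csv(file_pop)
    df_dist = (
        df_ea.groupby(dist_col, as_index=False)[pop_col]
        .sum()
        .sort_values(dist_col)
        .reset_index(drop=True)
    )
    return df_dist


# Toy data standing in for the real EA file (the values are made up).
toy_csv = io.StringIO(
    "ea_code,district,population\n"
    "1,Dowa,1200\n"
    "2,Dowa,800\n"
    "3,Rumphi,500\n"
    "4,Zomba,700\n"
)
df_dist = generate_dist_pop(toy_csv)
print(df_dist)
```

With the real data, the call would take the population file path defined in the setup section, for example `generate_dist_pop(FILE_POP_MW)`.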

Task 2 – Part 2: Estimate Chichewa-Speaking Population#

  • Write a function that takes the district-level population DataFrame and estimates the number of Chichewa speakers.

  • Name your function: mw_estimate_chichewa_speakers_by_district

  • Call this function and the previous function as needed and print the total estimated number of Chichewa speakers.
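The estimation step can be sketched as a filter followed by a sum. The column names are again hypothetical, and matching names after trimming whitespace and lowercasing is an assumption about how the real file spells districts. The toy DataFrame is made up for illustration.

```python
import pandas as pd

# Districts assumed to be Chichewa speaking, copied from the list above.
CHICHEWA_DISTRICTS = [
    "Mzuzu City", "Kasungu", "Nkhotakota", "Ntchisi", "Dowa", "Salima",
    "Lilongwe", "Mchinji", "Dedza", "Ntcheu", "Lilongwe City",
    "Zomba", "Chiradzulu", "Blantyre", "Mwanza", "Thyolo",
    "Mulanje", "Phalombe", "Balaka", "Neno",
    "Zomba City", "Blantyre City",
]


def mw_estimate_chichewa_speakers_by_district(
    df_dist,
    districts=CHICHEWA_DISTRICTS,
    dist_col="district",
    pop_col="population",
):
    """Sum the population of the districts assumed to speak Chichewa."""
    # Normalise names so that "lilongwe " still matches "Lilongwe".
    names = df_dist[dist_col].str.strip().str.lower()
    wanted = {district.lower() for district in districts}
    return int(df_dist.loc[names.isin(wanted), pop_col].sum())


# Toy district totals (made-up values); Rumphi is not in the list above.
toy_dist = pd.DataFrame(
    {
        "district": ["Dowa", "Rumphi", "Zomba City", " lilongwe "],
        "population": [2000, 500, 700, 300],
    }
)
print(mw_estimate_chichewa_speakers_by_district(toy_dist))  # 3000
```

It is worth printing any district names in the file that match neither list, since spelling differences would otherwise silently lower the estimate.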

Chichewa speakers in Malawi Based on DHS Data#

Using the Demographic and Health Survey (DHS), we estimate the number of Chichewa speakers from responses to the question on the respondent's native language.
While the DHS does not provide exhaustive linguistic data, the self-reported language question offers a useful proxy for estimating language distribution across the population.
This approach allows us to approximate the number of Chichewa speakers based on individual-level survey data that is nationally representative.

Relevant Variables Column Name Mapping in STATA file#

  • hv045b: intv_lan — Interviewer’s language

  • hv045c: resp_nativ_lan — Respondent’s native language

  • hv046: translator — Translator used

  • hv002: hh_num — Household number

  • hv005: weight — Sampling weight

  • hv045a: qn_lan — Language of the questionnaire

  • hv001: cluster_number — Cluster number

  • hv004: area_unit — Area unit

Language Code Mapping#

  • 1: English

  • 2: Chichewa

  • 3: Tumbuka

  • 6: Other
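The two mappings above can be applied right after loading the data, as in the sketch below. It assumes the Stata file was read with numeric codes preserved, for example `pd.read_stata(FILE_MW_DHS, columns=list(DHS_COLUMN_NAMES), convert_categoricals=False)`. The helper name `prepare_dhs_language_columns` and the toy rows are illustrative only.

```python
import pandas as pd

# DHS variable name -> readable column name, from the list above.
DHS_COLUMN_NAMES = {
    "hv045b": "intv_lan",
    "hv045c": "resp_nativ_lan",
    "hv046": "translator",
    "hv002": "hh_num",
    "hv005": "weight",
    "hv045a": "qn_lan",
    "hv001": "cluster_number",
    "hv004": "area_unit",
}

# Language code -> language name, from the list above.
LANGUAGE_CODES = {1: "English", 2: "Chichewa", 3: "Tumbuka", 6: "Other"}


def prepare_dhs_language_columns(df_raw):
    """Keep the relevant DHS columns, rename them and label the language."""
    df = df_raw[list(DHS_COLUMN_NAMES)].rename(columns=DHS_COLUMN_NAMES)
    df["resp_nativ_lan_name"] = df["resp_nativ_lan"].map(LANGUAGE_CODES)
    return df


# Toy rows with made-up values, standing in for the real DHS file.
toy_raw = pd.DataFrame(
    {
        "hv001": [1, 1, 2],
        "hv002": [1, 2, 1],
        "hv004": [1, 1, 2],
        "hv005": [1000000, 1000000, 2000000],
        "hv045a": [2, 2, 1],
        "hv045b": [2, 3, 2],
        "hv045c": [2, 3, 2],
        "hv046": [0, 0, 0],
    }
)
df_dhs = prepare_dhs_language_columns(toy_raw)
print(df_dhs[["cluster_number", "hh_num", "resp_nativ_lan_name"]])
```

Codes that are missing from the mapping become NaN in the labelled column, which makes unexpected values easy to spot.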

Other Useful Information#

  • Total number of sampled households in the 2015-16 DHS: 26,361

  • Malawi projected population for 2023: 19,809,511

Task 3: Estimate Chichewa Speakers from DHS Using 2023 Population Data#

  • Write a function that estimates the number of Chichewa speakers using the respondent language variable from the DHS dataset.

  • Call the function and print the total estimated number of Chichewa speakers.

Hints

  • Report the unweighted estimate

  • Use the variable descriptions provided above (e.g., respondent language, household weights).

  • After loading the data, validate your calculations by checking against the total number of households.

  • Ensure your function is flexible and can accept multiple arguments as needed (e.g., the dataset, relevant column names, filters).
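The hints above can be combined into a sketch like the following. It rests on several assumptions that are not stated in the assignment: the columns have already been renamed as in the mapping above, the respondent language is a household-level answer (so one row per household is kept), and the unweighted Chichewa share is simply multiplied by the 2023 projected population. The toy rows are made up.

```python
import pandas as pd

MW_POP_2023 = 19_809_511  # projected 2023 population, from above
CHICHEWA_CODE = 2  # language code for Chichewa, from above


def mw_estimate_chichewa_speakers_dhs(
    df,
    lang_col="resp_nativ_lan",
    chichewa_code=CHICHEWA_CODE,
    total_pop=MW_POP_2023,
    hh_cols=("cluster_number", "hh_num"),
    expected_hh=None,
):
    """Unweighted estimate: household share of Chichewa times population."""
    # Assumption: a household is identified by cluster and household number.
    households = df.drop_duplicates(subset=list(hh_cols))

    # Optional validation against the published number of households.
    if expected_hh is not None and len(households) != expected_hh:
        print(f"Warning: {len(households):,} households, expected {expected_hh:,}")

    answered = households[lang_col].notna()
    share = (households.loc[answered, lang_col] == chichewa_code).mean()
    return int(round(float(share) * total_pop))


# Toy member-level rows (made up): four households, three report Chichewa.
toy_dhs = pd.DataFrame(
    {
        "cluster_number": [1, 1, 1, 2, 2],
        "hh_num": [1, 1, 2, 1, 2],
        "resp_nativ_lan": [2, 2, 3, 2, 2],
    }
)
print(mw_estimate_chichewa_speakers_dhs(toy_dhs, expected_hh=4))  # 14857133
```

With the real file, `expected_hh=26361` would check the loaded data against the household total given above. A weighted version would use the `weight` column instead of a plain mean, but the hint asks for the unweighted figure.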