If Arabic from Syria to Mauritania and from Morocco to the Comoros can be treated as one language, then the present classification of African languages needs serious revision.
The current framework that separates Nguni, Sotho, Shona and other clusters does not capture how these languages relate to one another. It ignores the broad continuities that stretch across regions and creates false linguistic boundaries that never existed in lived experience. This discrepancy exposes a systemic bias in how linguistic boundaries are drawn.
Arabic demonstrates how a language can function as a single linguistic system despite its internal diversity. When scholars first documented Arabic, they classified it as one language with many varieties. Later, the International Organisation for Standardisation (ISO) defined it as a macro language comprising 28 distinct but related languages.
Speakers of neighbouring Arabic varieties understand each other reasonably well, but comprehension decreases with distance.
This pattern mirrors the situation in southern Africa. IsiZulu, siSwati, isiXhosa and isiNdebele speakers communicate with relative ease, and the same applies to Sotho, Tswana and Pedi speakers. These languages share grammar, vocabulary and sound patterns, forming a wide zone of mutual intelligibility. Yet they are classified as separate languages, even though their overlaps resemble those found across the Arabic continuum.
If Arabic can exist as a macro language, then a similar approach could be applied to African languages. The lines that divide Zulu, Tswana and Shona are historical inventions rather than linguistic facts. These varieties are part of broader continua in which speech forms shift gradually from one region to another. Recognising them as part of macro systems would align classification with the reality of how people actually communicate.
Linguists such as Charles Ferguson and Joseph Greenberg have shown that social and historical connections often define languages more effectively than formal boundaries. Ferguson’s study of Arabic diglossia revealed how everyday speech and formal registers coexist within one system.
Greenberg observed that African languages develop through regional interaction rather than isolation. Their findings support a view of African languages as overlapping systems rather than separate entities.
The colonial project disrupted these relationships by imposing fixed labels on fluid patterns of speech. Missionaries and administrators needed categories for teaching, translation, and governance. They began naming and standardising the overlapping dialects.
This process produced what I call the “ethnicalisation” of languages. It turned flexible speech communities into discrete “tribes”, each with its own language, even when mutual understanding remained high.
Among the BaTlhaping, Bakgatla, Barolong and Batlokwa, different forms of everyday speech were ethnicalised as one language, Setswana. That written standard was then tied to an identity, and Setswana came to represent both language and ethnicity (a mega tribe). The same process created the isiZulu, isiXhosa and Shona identities.
Before this ethnicalisation, people moved and communicated freely across linguistic boundaries, maintaining shared vocabularies and expressions.
The late Kenyan scholar Ngũgĩ wa Thiong’o argued that the colonial invention of language categories was a political act rather than a scientific one. By linking language to tribe, colonial authorities transformed speech into a marker of identity and loyalty. Once a group was assigned its own language, it was treated as a separate social unit.
This practice simplified governance but deepened divisions that have persisted into the postcolonial era.
The term “Bantu” likewise illustrates how linguistic categories were infused with racial meaning. Wilhelm Bleek introduced it as a descriptive label for related African languages, but colonial administrations soon used it to designate a racial group. Under apartheid, “Bantu” meant “Black person,” erasing internal linguistic diversity while maintaining divisions within that category.
The word still carries the weight of that history, reducing complex relationships to a single racial term.
The consequences of ethnicalisation reached beyond southern Africa. In Rwanda and Burundi, Belgian authorities racialised differences between Hutu, Tutsi and Twa, enforcing them through identity documents.
These artificial boundaries reinforced a hierarchy that later contributed to violence. In Kenya, linguistic distinctions among groups such as the Luo, Luhya, and Kalenjin became political markers of opposition to the Kikuyu. Language classification thus became a foundation for rivalry and exclusion.
The ethnicalisation process changed how Africans understood language itself. It created the impression that each ethnic group had its own language, when in reality many shared overlapping varieties. Before colonial borders, languages flowed into one another, reflecting the movement of people, trade and intermarriage.
The colonial insistence on naming and dividing these continua replaced shared communication with separation.
If the same criteria used for Arabic were applied to African languages, large regional macro systems would emerge. The Nguni and Sotho-Tswana clusters could be understood as zones within a broader continuum, much like the varieties of Arabic across North Africa and the Middle East. This approach would reflect both linguistic and historical realities, showing how speech communities remain connected despite political and cultural fragmentation.
Joseph Greenberg’s attempts to classify African languages recognised some of these relationships, but his framework still inherited colonial assumptions about boundaries. Malcolm Guthrie’s zonal classification of the Bantu languages likewise relied on divisions shaped by earlier ethnicalised thinking.
A new model would need to acknowledge the continuum that extends from southern to central Africa, where languages merge gradually across space.
Such a reclassification has practical and intellectual importance. Education systems that teach children that their “tribal language” is entirely distinct from their neighbour’s perpetuate colonial separations. Recognising continua would allow for broader regional literacy and easier cross-border communication.
It would also foster a sense of connectedness that reflects Africa’s linguistic and historical interdependence.
Reconsidering classification does not erase diversity. It instead recognises that diversity exists within connection, not outside it. Arabic demonstrates this principle clearly. Its varieties differ, but they belong to a single linguistic world because of shared structures, traditions and mutual awareness.
The same can be said of African languages if freed from the legacy of ethnicalisation.
Language was never the private possession of one tribe or nation. It has always been shared, shifting, and adaptive. The categories that define Zulu, Tswana, Shona, or Venda as entirely separate entities were built for control, not for understanding. Restoring their continuity would bring linguistic scholarship closer to the truth of how Africans have always spoken and interacted.
If Arabic qualifies as one macro language, African languages deserve to be understood on similar terms. Their continuities, mutual intelligibility and shared development challenge the idea of strict divisions. Recognising this reality would undo the distortions left by ethnicalisation and allow Africa’s languages to be seen not as fragments but as interlinked expressions of one vast linguistic landscape.
Siyayibanga le economy!
* Siyabonga Hadebe is an independent commentator based in Geneva on socio-economic, political and global matters.
** The views expressed here do not reflect those of the Sunday Independent, Independent Media, or IOL.