Did Urbanization or Ethnicity Matter More in Malaysia's 14th General Election?

Author: Weijian Ng, Jason

This article identifies the variable with the greatest power in predicting electoral behaviour. To do this, we apply a tree-based machine learning technique to data from Malaysia's 14th General Election. We find that a constituency's level of urbanization has the greatest predictive power for vote share. Ethnicity, long touted as the variable of significance, plays a secondary role. Moreover, these predictors' marginal effects on the vote share are highly complex and non-linear, and are difficult to pick up with conventional regression methods. Other explanatory factors do not exhibit significant predictive power for electoral behaviour, although the extant literature has shown them to have important causal relationships. As our analysis demonstrates the considerable power of urbanization in predicting voting behaviour, we caution against hastily dismissing its relevance in the Malaysian context.

Keywords: urbanization, ethnicity, Malaysia, 14th General Election, Southeast Asia.

The 2018 14th General Election (GE14) in Malaysia resulted in the unseating of the Barisan Nasional (BN), the incumbent coalition that had governed the country since independence from the United Kingdom in 1957. Against seemingly impossible odds, the Pakatan Harapan (PH), led by nonagenarian former Prime Minister Mahathir Mohamad, secured with the help of its ally, the Sabah Heritage Party, a simple majority of 121 federal parliamentary seats out of the 222 contested.

Several key explanations have been offered for the BN's electoral defeat. They include: the electorate's anger and resentment against the then-Prime Minister Najib Razak due to economic mismanagement and corruption scandals; (1) the strengthening of the PH coalition as a result of Mahathir's involvement, allaying fears of an erosion of Malay rights; (2) the influence of urbanization associated with modernization theory; (3) party elite defections from the United Malays National Organisation (UMNO), the leading component party of the BN; (4) and the subsequent split in the Malay vote which the BN had traditionally relied on. (5) The role of regionalism in the BN's defeat was also amplified in GE14, manifested in the heterogeneity of the Malay vote across the different regions in the peninsula and East Malaysia. (6)

Against this backdrop, this article seeks to identify the variable that matters the most in predicting Malaysia's electoral outcomes, and in particular the variable with the greatest predictive power for the BN's vote share and its ensuing electoral defeat. This predictive framework will be of practical interest to politicians and political parties seeking to craft appropriate political strategies to win elections.

We work within the constraints of publicly available constituency-level data (which is also usually the information available to political parties and strategists) and apply the random forest machine learning algorithm (7) to compute variable importance measures for the predictors considered in this study. As its name suggests, the random forest is a forest of regression trees whereby each regression tree is fitted to a random sub-sample of the initial data set to predict the response variable using the predictors constructing the tree. The random forest then pools all the trees' predictions to generate a pooled prediction for the response variable. Because of its predictive focus, the random forest algorithm allows the predictive power of each predictor to be assessed through the computation of the variable importance index. The variable importance index of a predictor essentially measures the increase in prediction error of the response variable (i.e., the decrease in prediction accuracy) when the values of the predictor are randomly shuffled. Predictors with a lower index are therefore deemed less important in predicting the response variable, as randomly shuffling their values does not severely affect the model's predictive accuracy for the response variable. This approach adds to a growing list of recent studies that have also applied machine learning techniques to address a diverse range of other issues, e.g. understanding the impact of voter turnout, (8) the impact of the media on public policy (9) and forecasting supreme court decisions. (10)
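
The shuffling logic behind the variable importance index can be illustrated with a minimal Python sketch. It is not the authors' implementation. The data are synthetic, the predictor names (urban, malay_share, noise) and all function names are hypothetical, and the forest is a deliberately small pure-Python one. The prediction error is also measured on a held-out sample, an assumption made for simplicity, whereas standard random forest software typically uses out-of-bag observations.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(X, y, depth, n_try, rng):
    """Grow one regression tree; each split tries n_try randomly chosen predictors."""
    if depth == 0 or len(y) < 10:
        return sum(y) / len(y)                     # leaf: mean response
    best = None
    for j in rng.sample(range(len(X[0])), n_try):
        vals = sorted(row[j] for row in X)
        step = max(1, len(vals) // 8)              # a few candidate thresholds
        for t in vals[step::step]:
            left = [i for i, row in enumerate(X) if row[j] < t]
            right = [i for i, row in enumerate(X) if row[j] >= t]
            if not left or not right:
                continue
            score = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:
        return sum(y) / len(y)
    _, j, t, left, right = best
    def grow(idx):
        return fit_tree([X[i] for i in idx], [y[i] for i in idx], depth - 1, n_try, rng)
    return (j, t, grow(left), grow(right))

def predict_tree(node, row):
    while isinstance(node, tuple):                 # descend until a leaf is reached
        j, t, left, right = node
        node = left if row[j] < t else right
    return node

def fit_forest(X, y, n_trees=30, depth=5, n_try=2, seed=0):
    """Fit each tree to a bootstrap sub-sample of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
        forest.append(fit_tree([X[i] for i in idx], [y[i] for i in idx], depth, n_try, rng))
    return forest

def predict_forest(forest, row):
    """Pool the trees' predictions by averaging them."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)

def mse(forest, X, y):
    return sum((predict_forest(forest, r) - yi) ** 2 for r, yi in zip(X, y)) / len(y)

def permutation_importance(forest, X, y, rng):
    """Increase in prediction error when one predictor's values are shuffled."""
    base = mse(forest, X, y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        scores.append(mse(forest, shuffled, y) - base)
    return scores

# Synthetic, purely illustrative data: vote share falls sharply once the
# hypothetical "urban" predictor exceeds 0.5, rises mildly with "malay_share",
# and is unrelated to "noise".
rng = random.Random(42)
names = ["urban", "malay_share", "noise"]
X = [[rng.random() for _ in names] for _ in range(400)]
y = [0.5 - 0.3 * (r[0] > 0.5) + 0.2 * r[1] + rng.gauss(0, 0.02) for r in X]

forest = fit_forest(X[:300], y[:300])              # fit on 300 observations
scores = permutation_importance(forest, X[300:], y[300:], rng)  # score on the other 100
importance = dict(zip(names, scores))
print(importance)
```

In this sketch the step change in the urban predictor is built into the synthetic data, so shuffling that predictor should inflate the prediction error the most. Real constituency data would first need to be assembled into the same row-per-constituency layout.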

We confine our analysis to the 165 constituencies in Peninsular Malaysia. East Malaysia, comprising Sabah and Sarawak, is excluded because its politics are distinct from those of the peninsula. (11) For example, East Malaysia's political parties are mostly autonomous from those in the peninsula. (12) Thus, they are relatively insulated from peninsular politics. (13) They are, however, prone to aligning themselves with the ruling coalition of the day with a view to maintaining cordial relations with the federal government, as evidenced by their shifting allegiances following GE14 and the subsequent downfall of the PH government in February 2020.

Our study yields two key findings. First, the variable importance measures reveal that urbanization had the greatest power in predicting the BN's vote share in GE14, followed by ethnicity. This challenges the common notion that ethnicity is the salient predictor of electoral support. (14) It also warrants caution regarding recent work that disputes the relevance of urbanization in Malaysian politics. (15) Second, the predictors' marginal effects on the BN's vote share are highly non-linear and would have been difficult to pick up with conventional regression methods. This demonstrates the value of the random forest as a tool in the political scientist's toolbox. It is a non-parametric, data-driven approach that allows the data to "tell the story" without imposing assumptions about the underlying data generating process. (16)

The rest of this article proceeds as follows. First, we discuss the predictor variables--ethnicity, malapportionment, regionalism, urbanization and three-corner contests--that we have included in our analysis based on the extant political science literature. Second, we describe how the predictors' variable importance measures are computed using random forests. Third, we report the findings from the analysis and conclude with some salient insights from the results.

Predictors of the BN's Vote Share

This section discusses the variables that are potential predictors of the BN's vote share in GE14, in the sense that they are likely to have an association with the BN's vote share, though not necessarily a causal one.

Ethnicity and Redelineation

Ethnicity is still primarily recognized as a salient indicator in predicting voting behaviour. However, it is inadequate by itself since other variables may be of greater predictive value in forecasting voting behaviour. (17) The role of ethnicity has been amplified by the redelineation of seats. In the run-up to GE14, constituencies were redrawn in such a way that the (largely non-Malay) opposition voters were consolidated into fewer seats, while the number of putatively pro-BN districts was increased by moving Malay localities into previously marginal districts. (18) The number of Malay-majority seats increased from 119 in GE13 to 122 in GE14, while seats with only slim 50 to 60 per cent Malay majorities declined. (19) The number of ethnically mixed seats (defined as no single ethnic group being more than 70 per cent of the voting population) was reduced from 29 to 24 seats. (20) (It is interesting to note that throughout the 1990s, the BN's best performances were in these mixed seats, prompting the creation of more such seats in the 1993 constituency delineation exercises.) (21)

In all, almost two-thirds of the peninsula's parliamentary seats in GE14 were at least 60 per cent Malay. (22) This represented a deliberate move by the BN, since the strategy of reducing the proportion of non-Malay voters in Malay-majority seats had helped the BN maintain its parliamentary two-thirds majority from the country's independence until 2008.


Malapportionment

Malapportionment is another variable that could have a predictive effect on voting behaviour. (23) Due to the BN's longevity in power, malapportionment has gone relatively unchecked in Malaysia. By manipulating the size of the electorate in different constituencies, i.e. by creating "under-represented" urban seats with a large number of electors and "over-represented" rural seats with a small number of voters, the BN has ensured that a rural vote has more value and weight than an urban vote. (24) This means that even if a coalition were to win the popular vote through its strong electoral performance in urban areas, it may not necessarily secure the majority of parliamentary seats--precisely the outcome that confronted the opposition Pakatan Rakyat coalition in GE13. (25) The historical malapportionment that has taken place is well documented, (26) making it difficult to uphold a one-person-one-vote doctrine under such circumstances. (27) With a pliant Electoral Commission, it is not difficult for an incumbent government to continue the malapportionment process to inflate its vote share. (28)


Regionalism

Kai Ostwald and Stephen Oliver have suggested that the results of GE14 can be grouped into four broad geographical spheres, identifying commonalities of voting patterns in each of the following four regions: (i) the northeast of Peninsular Malaysia, (ii) Peninsular Malay (parliamentary constituencies in the peninsula with a predominantly Malay electorate), (iii) Peninsular Diverse (parliamentary constituencies in the peninsula with an ethnically diverse electorate), and (iv) East Malaysia (Sabah and Sarawak). (29) According to this framework, Parti Islam Se-Malaysia (PAS) is entrenched in the northeast, while the PH has the upper hand in the Peninsular Diverse constituencies. The seats in Peninsular Malay and East Malaysia are, however, considered to be more electorally open. Given...
