Variable Selection for the Stunting Model

Data Preparation for Chronic Malnutrition Models in Children.

Correlation Among Predictor Variables

In this section, we evaluate the correlation between predictor variables to identify potential multicollinearity.
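As a rough illustration of this kind of screening, the sketch below computes pairwise Pearson correlations and lists the pairs whose absolute correlation exceeds a cutoff. The 0.8 cutoff, the helper names, and the toy values are assumptions made for the example; they are not the data or the threshold used in this analysis.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def high_correlation_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    `threshold` is an illustrative cutoff, not one taken from the report.
    """
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Toy values invented for the example; only the column names echo the report.
toy = {
    "prot_total": [10.0, 20.0, 30.0, 40.0, 50.0],
    "lys_total": [1.1, 2.0, 3.2, 3.9, 5.1],
    "zn_total": [3.0, 1.0, 4.0, 1.0, 5.0],
}

flagged = high_correlation_pairs(toy)
for name_a, name_b, r in flagged:
    print(f"{name_a} ~ {name_b}: r = {r}")
```

Pairs flagged this way are candidates for dropping one member or combining both into a single variable.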

The correlation graph shows that some of the nutritional variables are very highly correlated, so they should be analyzed with caution. The immunological variables also show patterns of multicollinearity and should be studied individually.

Among the nutritional variables, to maximize the information provided to the model, we retain the two that are least correlated with each other and with the remaining variables: those associated with zinc and iron.

For the immunological variables, we group those that are very highly correlated with one another to reduce multicollinearity. The variables Refuerzo 1 and Refuerzo 2 (booster 1 and booster 2) are created to combine polio and DPT vaccines, respectively. In addition, a numerical variable is created that counts the routine vaccines received by the individual.
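The text does not state the exact rule used to combine the vaccine indicators, so the sketch below only shows one plausible way to build such derived variables. It assumes 0/1 indicators, treats a combined variable as 1 only when every component dose was received, and uses hypothetical input column names (`polio_refuerzo_1`, `dpt_refuerzo_1`, and so on) and a hypothetical list of routine vaccines.

```python
def all_received(record, columns):
    """Return 1 if every listed indicator equals 1, otherwise 0.

    Assumption: a combined variable means "all component doses received".
    """
    return int(all(record.get(col, 0) == 1 for col in columns))


def count_received(record, columns):
    """Count how many of the listed indicators equal 1."""
    return sum(1 for col in columns if record.get(col, 0) == 1)


# Hypothetical raw record for one child; names and values are illustrative.
child = {
    "polio_refuerzo_1": 1, "dpt_refuerzo_1": 1,
    "polio_refuerzo_2": 1, "dpt_refuerzo_2": 0,
    "vacuna_gripe": 0, "vacuna_neumococo": 1, "vacuna_tuberculosis": 1,
}

# Hypothetical definition of which vaccines count as "routine".
ROUTINE = ["vacuna_gripe", "vacuna_neumococo", "vacuna_tuberculosis"]

derived = {
    "vacuna_refuerzo_1": all_received(child, ["polio_refuerzo_1", "dpt_refuerzo_1"]),
    "vacuna_refuerzo_2": all_received(child, ["polio_refuerzo_2", "dpt_refuerzo_2"]),
    "n_vacunas_rutina": count_received(child, ROUTINE),
}
print(derived)
```

Collapsing strongly correlated indicators into one flag or one count keeps their information in the model while removing the near-duplicate columns.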

Based on the final graph, these variables are retained in the analysis model.

Near-Zero Variance Analysis

In this section, we identify variables with near-zero variance, since they contribute little information to the model.

Variable              Frequency ratio  Percent unique (%)  NZV
parentesco            2.493274         0.9950249           FALSE
edad                  1.176471         0.6218905           FALSE
sexo                  1.025189         0.2487562           FALSE
etnia                 1.496894         0.2487562           FALSE
departamento          1.776471         2.7363184           FALSE
area                  1.955882         0.2487562           FALSE
miembros_hogar        1.116438         1.6169154           FALSE
grado_estudios_hogar  1.696429         1.3681592           FALSE
propiedad             6.698925         0.4975124           FALSE
tipo_vivienda         30.640000        0.7462687           TRUE
material_paredes      2.710692         0.9950249           FALSE
material_techo        5.601770         0.7462687           FALSE
material_piso         1.234043         0.7462687           FALSE
n_cuartos             1.164751         0.9950249           FALSE
tipo_sanitario        2.654545         0.2487562           FALSE
fuente_agua           2.555556         0.8706468           FALSE
recoleccion_basura    3.953125         0.4975124           FALSE
electricidad          5.330709         0.2487562           FALSE
telefonia_fija        31.160000        0.2487562           TRUE
computadora           8.458824         0.2487562           FALSE
fe_total              1.000000         99.7512438          FALSE
zn_total              1.500000         97.1393035          FALSE
prot_total            1.000000         100.0000000         FALSE
lys_total             1.000000         100.0000000         FALSE
trp_total             1.000000         100.0000000         FALSE
per_vegetales         1.000000         99.1293532          FALSE
vacuna_gripe          1.913043         0.2487562           FALSE
vacuna_neumococo      1.045802         0.2487562           FALSE
vacuna_hepatitis_b1   1.132626         0.2487562           FALSE
vacuna_tuberculosis   16.478261        0.2487562           FALSE
vacuna_refuerzo_1     1.671096         0.2487562           FALSE
vacuna_refuerzo_2     6.730769         0.2487562           FALSE
n_vacunas_rutina      5.954545         1.2437811           FALSE

Two variables, tipo_vivienda and telefonia_fija, have near-zero variance and will therefore provide little relevant information to the model.
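The two metrics in the table can be computed as sketched below. The frequency ratio is the count of the most common value divided by the count of the second most common, and the unique percentage is the number of distinct values as a share of all observations. The cutoffs (frequency ratio above 95/5 = 19 and unique percentage below 10) are an assumption: they are the defaults of `nearZeroVar` in R's caret package and are consistent with the flags in the table, but the text does not state which thresholds were used. The toy values are invented.

```python
from collections import Counter


def nzv_metrics(values, freq_cut=95 / 5, unique_cut=10.0):
    """Compute frequency ratio, percent unique, and a near-zero-variance flag.

    freq_cut and unique_cut are assumed cutoffs, not values stated in the report.
    """
    counts = [count for _, count in Counter(values).most_common()]
    percent_unique = 100.0 * len(counts) / len(values)
    if len(counts) < 2:
        # A constant column has zero variance; flag it directly.
        return {"freq_ratio": None, "percent_unique": percent_unique, "nzv": True}
    freq_ratio = counts[0] / counts[1]
    nzv = freq_ratio > freq_cut and percent_unique < unique_cut
    return {"freq_ratio": freq_ratio, "percent_unique": percent_unique, "nzv": nzv}


# Toy columns invented for the example.
telefonia = ["no"] * 97 + ["si"] * 3   # one value dominates
sexo = ["f"] * 51 + ["m"] * 49         # balanced

dominated = nzv_metrics(telefonia)
balanced = nzv_metrics(sexo)
print(dominated)
print(balanced)
```

Both conditions must hold for a variable to be flagged, which is why vacuna_tuberculosis, with a frequency ratio of about 16.5, stays below the assumed cutoff of 19 and is not marked.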

Yeo-Johnson Evaluation of Continuous Predictor Variables

In this section, the normality of continuous variables will be assessed to determine if they need to be transformed.

Given these results, all variables are approximately normal. Note that the variables for the number of rooms (n_cuartos) and household members (miembros_hogar) had already been adjusted in the models of the nutritional variable profiles.
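For reference, the Yeo-Johnson transform and a simple way to choose its parameter can be sketched as follows. The transform itself is the standard definition. The grid search over lambda by profile log-likelihood is one common estimation approach and is an assumption here, since the text does not describe how the parameter was estimated. The sample values are invented.

```python
import math


def yeo_johnson(y, lam):
    """Yeo-Johnson transform of a single value y with parameter lam."""
    if y >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(y)
        return ((y + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-y)
    return -((1.0 - y) ** (2.0 - lam) - 1.0) / (2.0 - lam)


def log_likelihood(data, lam):
    """Profile log-likelihood of lam under a normal model (up to a constant)."""
    n = len(data)
    transformed = [yeo_johnson(v, lam) for v in data]
    mean = sum(transformed) / n
    variance = sum((t - mean) ** 2 for t in transformed) / n
    jacobian = (lam - 1.0) * sum(
        math.copysign(1.0, v) * math.log1p(abs(v)) for v in data
    )
    return -0.5 * n * math.log(variance) + jacobian


def estimate_lambda(data, grid=None):
    """Pick the lam on a grid (default -2 to 2, step 0.05) with the highest likelihood."""
    if grid is None:
        grid = [i / 20 for i in range(-40, 41)]
    return max(grid, key=lambda lam: log_likelihood(data, lam))


# Invented right-skewed sample; not data from the report.
skewed = [1, 1, 2, 2, 3, 4, 6, 9, 15, 40]
best_lam = estimate_lambda(skewed)
print(best_lam, [round(yeo_johnson(v, best_lam), 3) for v in skewed])
```

An estimated lambda close to 1 leaves a variable essentially unchanged, which is one way to read a finding that the variables are already approximately normal.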
