I am performing a large-scale data analysis combining several IR files from different countries, from surveys conducted between 2003 and 2018. I therefore need to interpret the cluster values (variable v001) correctly. In several publications I have noticed that the values of v001 for one country are combined, so that the total number of clusters is significantly lower than the total number of women interviewed.

However, my understanding was that the cluster classification (e.g. v001 = "1") cannot be compared across different waves, and that in each survey the clusters are classified uniquely and thus cannot be aggregated per country. Following my logic, the total number of clusters must equal the total number of women interviewed. Is that right?

Thank you very much for your help in advance!

Warm greetings from Germany,

Bianca]]>

Your logic is correct: survey clusters cannot be combined across surveys, because v001 is a survey-specific variable. Therefore, when several surveys are combined, v001 should be reconstructed so that each cluster is unique throughout the combined dataset. For example, when two surveys are combined, with 100 clusters in survey 1 and 150 clusters in survey 2, the combined v001 variable should have 250 clusters. That said, your statement that the total number of clusters must equal the total number of women interviewed is not correct. The total number of clusters has to be less than the number of women with completed interviews, as we interview 20-30 households per cluster.
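
As an illustration only, here is a minimal Python sketch of one way to rebuild the cluster variable when pooling surveys. The function name, the survey labels, the `cluster_uid` field, and the toy data (2 women per cluster, far fewer than in a real survey) are all assumptions made for this example; only `v001` comes from the IR files. The idea is simply to treat the pair (survey, v001) as the real cluster identifier and number those pairs consecutively.

```python
def combine_with_unique_clusters(surveys):
    """Pool several surveys and give every cluster a dataset-wide unique number.

    surveys: dict mapping a survey label (hypothetical, e.g. "survey1")
             to a list of women's records, each a dict with a "v001" key.
    """
    cluster_ids = {}  # (survey label, original v001) -> new combined cluster number
    combined = []
    for label, records in surveys.items():
        for rec in records:
            key = (label, rec["v001"])
            if key not in cluster_ids:
                # First time this survey/cluster pair is seen: assign the next number
                cluster_ids[key] = len(cluster_ids) + 1
            combined.append({**rec, "survey": label, "cluster_uid": cluster_ids[key]})
    return combined


# Toy data (assumption): 100 clusters in survey 1, 150 in survey 2, 2 women each.
survey1 = [{"v001": c} for c in range(1, 101) for _ in range(2)]
survey2 = [{"v001": c} for c in range(1, 151) for _ in range(2)]

combined = combine_with_unique_clusters({"survey1": survey1, "survey2": survey2})
n_clusters = len({rec["cluster_uid"] for rec in combined})
n_women = len(combined)
print(n_clusters, n_women)  # 250 500
```

In this toy case v001 = 1 from survey 1 and v001 = 1 from survey 2 end up as two different clusters, so the pooled data has 250 clusters for 500 women, which is still fewer clusters than women.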

]]>