The problem with this data is that a lot of external factors are usually not captured in the dataset. For example, a poorer family may not have time to breastfeed because the mom has to go back to work ASAP.
You can't always accurately control for socioeconomic and other factors, because there are no test subjects per se, just pooled data.
Interesting video on the topic (SciShow)