Sunday, August 29, 2010

Classification and probability

Classification is the process of putting people, animals or things into groups or classes according to certain characteristics. We classify people into children and adults according to age, or into men and women according to sex. We may subdivide adults into young, middle-aged and old according to age limits that we allocate to each group. We may classify diseases according to the organ they primarily involve, for example into heart diseases, lung diseases, kidney diseases and so on.
Sharing a certain characteristic may indicate sharing other characteristics; for example, old people may be more prone to certain diseases than younger adults or children. These classifications, which deal with groups, have important effects on how we deal with individuals. In medicine we usually translate the occurrence (incidence or prevalence) of a disease in the group into a probability in the individual. For example, if 50% of old people have osteoporosis, then when we see an elderly patient we say he has a 50% chance (probability) of having osteoporosis. If 80% of heart failure in elderly people is caused by atherosclerosis and we have an elderly patient with heart failure, we think that the cause of his heart failure is atherosclerosis with a probability of 80%, and so on. This translation of occurrence into probability is logical. However, it may lead to an erroneous conclusion if done blindly. The following example explains this.
Students are taught that in children nephrotic syndrome is caused by minimal change glomerulonephritis in 80% of cases, while only 20% of cases in adults are caused by minimal change disease. The most common cause of nephrotic syndrome in adults, they are taught, is membranous glomerulonephritis. Consequently, when they see an 18- or 20-year-old man with nephrotic syndrome and you ask them what the most likely cause is, many will answer membranous glomerulonephritis because he is an adult. This answer is the effect of treating adults, who form a very large and heterogeneous collection, as one group or class. When adults are taken together, membranous glomerulonephritis is the most common cause because the group includes many middle-aged and elderly people in whom this disease is the commonest cause. That is not the case in young adults.
The 80% likelihood of minimal change disease being the cause of nephrotic syndrome in a child does not drop abruptly to 20% when the child reaches an age that puts him in the adult category. Nature does not change its behavior according to the limits we use in our classification. The 80% chance in, say, a 5-year-old child may become 75% in a 10-year-old, 65% in a 15-year-old and 60% in a 20-year-old. The probability may drop to 20% in a 40- or 50-year-old and to less than that in a 60- or 70-year-old man (these imaginary figures are only to explain the idea and are not claimed to be real). In other words, minimal change disease is still the most likely cause of nephrotic syndrome in the very young adult, and its probability decreases gradually as age advances.
Classification helps us to understand and remember various scientific facts, but we have to keep in mind that the sharp boundaries between classes or subclasses are frequently artificial. They are created by us and are not necessarily present in nature.
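The contrast between the two ways of reading the figures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the anchor points are the imaginary figures used above (not real epidemiological data), the adult cutoff of 18 years is an assumed boundary, the straight-line interpolation between anchor points is a simplification chosen for clarity, and the function names are hypothetical.

```python
from bisect import bisect_right

# Illustrative anchor points: (age in years, probability that nephrotic
# syndrome is due to minimal change disease). These are the imaginary
# figures from the text, not real data.
ANCHORS = [(5, 0.80), (10, 0.75), (15, 0.65), (20, 0.60), (40, 0.20)]

ADULT_CUTOFF = 18  # assumed boundary between "child" and "adult"


def step_probability(age):
    """Probability read blindly off the two-class rule (child vs adult)."""
    return 0.80 if age < ADULT_CUTOFF else 0.20


def gradual_probability(age):
    """Probability from linear interpolation between the anchor points."""
    ages = [a for a, _ in ANCHORS]
    if age <= ages[0]:
        return ANCHORS[0][1]
    if age >= ages[-1]:
        return ANCHORS[-1][1]
    i = bisect_right(ages, age)  # index of the first anchor older than `age`
    (a0, p0), (a1, p1) = ANCHORS[i - 1], ANCHORS[i]
    return p0 + (p1 - p0) * (age - a0) / (a1 - a0)


if __name__ == "__main__":
    for age in (5, 17, 18, 20, 30, 40):
        print(age, step_probability(age), round(gradual_probability(age), 2))
```

Under these assumptions the two-class rule jumps from 0.80 to 0.20 between a 17-year-old and an 18-year-old, while the gradual curve barely moves between the same two ages and still favours minimal change disease in the young adult.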

