I recently started exploring biplot approximation for Casebook data. Some aspects of Casebook data make it inherently high-dimensional: for example, when a "removal episode" for a child starts, we collect data through quizzes on the substance abuse history of the child and parent, whether the caretaker was unable to cope, whether the child has a disability, and so on. Comparing entry cohorts, and even individual children, in terms of these attributes is therefore interesting. I have tried two types of biplots so far: 1) a biplot based on PCA and 2) correspondence analysis. Both techniques are based on SVD, but they apply to different situations.
The PCA-based biplot is used mostly when the attributes we observe are continuous in nature. For example, for each entry cohort from 2008 to 2012, I took the percentage of removal episodes with male children, and the percentage of removal episodes where the children were reported to have problems like substance abuse, disability etc. So my "observations" were the entry cohorts, and my "attributes" were these percentages. The nice thing about the PCA-based biplot is that it plots the "observations" and the "attributes" on the same 2D plot, which reveals how close or far the observations are from each other, how close or far the attributes are from each other, and how the attributes are related to the observations. This paper by Gabriel laid the foundation of the PCA-based biplot. In summary, the PCA-based biplot approximation takes the projection of each observation and each attribute along the first two principal components, and uses these projections as the coordinates on a 2D plot.
Some of the interesting findings from our data using the PCA-based biplot were: child's drug abuse, child's alcohol abuse and physical abuse were closely related; parent's alcohol abuse, caretaker's inability to cope, inadequate housing for the child and relinquishment were likewise very close to each other; and so were child's behavioral problem and incarceration of parent.
The first principal component explained 49% of the variation in the data, while the second explained 32%; so the first two together explained 81% of the total variance. Each principal component was a linear combination of 16 variables, and 5 principal components together accounted for the total variance.
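To make the projection step concrete, here is a minimal sketch of how the biplot coordinates and the explained-variance percentages can be computed with an SVD. It assumes NumPy, standardizes each attribute before the decomposition (my assumption for the sketch, since the percentages may have different spreads), and uses a small synthetic matrix rather than the actual Casebook data. The function name `pca_biplot_coords` is made up for illustration.

```python
import numpy as np

def pca_biplot_coords(X):
    """Return 2D biplot coordinates for observations and attributes,
    plus the fraction of variance explained by each principal component."""
    X = np.asarray(X, dtype=float)
    # Center and scale each attribute (column); scaling is an assumption here.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized matrix: Z = U * diag(s) * Vt
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Observations: scores on the first two principal components.
    row_coords = U[:, :2] * s[:2]
    # Attributes: loadings on the first two principal components.
    col_coords = Vt[:2].T
    return row_coords, col_coords, explained

# Synthetic stand-in: 5 "entry cohorts" (rows) by 4 "percentage" attributes.
rng = np.random.default_rng(0)
X_demo = rng.random((5, 4)) * 100
rows, cols, explained = pca_biplot_coords(X_demo)
print("share of variance, first two components:", explained[:2].sum())
```

The product `rows @ cols.T` is the rank-2 approximation of the standardized data, which is what the biplot displays: observations that plot close together have similar attribute profiles, and attributes pointing in similar directions are positively correlated.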
Correspondence analysis (CA), on the other hand, is used to see how two (or more) categorical covariates are related to each other. In that sense, it is closely related to the chi-square test of independence. I found this document to be a fairly easy-to-understand and very intuitive explanation of CA, while this one is more mathematical. I also liked the way this article compares the inertia of a row in a contingency table with the physical concept of angular inertia, and especially the fact that it makes the following point:
"Correspondence analysis provides a means of representing a table of distances in a graphical form, with rows represented by points, so that the distances between points approximate the distances between the rows they represent."
In our case, the permanency outcome of a child can be adoption, guardianship or reunification (the more desirable ones); or transfer to another placement, transfer to collaborative care, or the child running away, resulting in a dismissal of wardship (the less desirable ones). The type of provider for a removal episode, in turn, can be a placement provider, a residential resource, a foster family, or even a person. We created the contingency table with the permanency outcomes as the rows and the provider types as the columns, and applied correspondence analysis to it. The important structure that CA revealed was that placement in a foster family is more often associated with adoption, guardianship and, to some extent, reunification (in summary, with the more desirable outcomes), whereas placement with a placement provider or a residential resource is more often associated with emancipation (aging out), transfer to another agency, or the child being placed in collaborative care.
The first two eigenvalues were 0.065 and 0.0145, and the total inertia (which can be shown to be the chi-square statistic for the contingency table divided by the sum of all cell values) was 0.084, so the first two dimensions explained roughly 0.065/0.084 = 78% and 0.0145/0.084 = 17% of the inertia, respectively. Among the rows, the biggest contributors to the inertia of 0.084 were adoption (26%), transfer to another agency (25%) and guardianship (22%); among the columns, the biggest contributors were residential resource (40.6%), placement provider (26%) and foster family (24%).
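The quantities quoted above (eigenvalues, total inertia, and row and column contributions) all come out of one SVD of the standardized residuals of the contingency table. Below is a minimal sketch of that computation, assuming NumPy; the function name `correspondence_analysis` and the 3x3 table are made up for illustration and are not our outcome-by-provider counts.

```python
import numpy as np

def correspondence_analysis(N):
    """Basic correspondence analysis of a two-way contingency table N."""
    N = np.asarray(N, dtype=float)
    n = N.sum()
    P = N / n                      # correspondence matrix
    r = P.sum(axis=1)              # row masses
    c = P.sum(axis=0)              # column masses
    expected = np.outer(r, c)      # proportions expected under independence
    # Standardized residuals; their squared sum equals chi-square / n.
    S = (P - expected) / np.sqrt(expected)
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    eigenvalues = s**2             # principal inertias, one per dimension
    total_inertia = eigenvalues.sum()
    return {
        "eigenvalues": eigenvalues,
        "total_inertia": total_inertia,
        "explained": eigenvalues / total_inertia,
        # Principal coordinates used to plot rows and columns.
        "row_coords": (U * s) / np.sqrt(r)[:, None],
        "col_coords": (Vt.T * s) / np.sqrt(c)[:, None],
        # Share of the total inertia contributed by each row / column.
        "row_contrib": (S**2).sum(axis=1) / total_inertia,
        "col_contrib": (S**2).sum(axis=0) / total_inertia,
    }

# Hypothetical counts: 3 outcomes (rows) by 3 provider types (columns).
table = np.array([[20, 5, 10],
                  [8, 15, 6],
                  [4, 9, 25]])
result = correspondence_analysis(table)
print("total inertia:", result["total_inertia"])
print("share explained by first two dimensions:", result["explained"][:2])
```

Plotting the first two columns of `row_coords` and `col_coords` on the same axes gives the CA map; rows and columns that deviate from independence in the same direction end up on the same side of the origin.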