# 가상의 dataset (exclude confounder `ability` which we assume to be unobserved)data = np.stack([education, income, voucher]).Tdf = pd.DataFrame(data, columns = ['education', 'income', 'voucher'])df.head()
education
income
voucher
0
5.638493
29.349797
1.974566
1
8.522389
46.239001
1.949956
2
12.840748
64.198096
2.253389
3
2.954071
15.591053
0.939508
4
10.117519
51.873490
1.779209
Using DoWhy to estimate the causal effect of education on future income
4가지 절차를 따릅니다.
문제를 인과 그래프로 모델링합니다. (model)
관측된 변수로부터 인과추론이 될 수 있는지 확인합니다. (identify)
인과효과를 추정합니다. (estimate)
인과효과 추정치의 Robustness 를 확인합니다. (refute)
step 1 : model
model =CausalModel( data = df, treatment ='education', outcome ='income', common_causes = ['U'], # unobserved confounding 을 임의의 U 컬럼에 넣는 것 instruments = ['voucher'] )model.view_model()from IPython.display import Image, displaydisplay(Image(filename="causal_model.png"))
Estimand type: nonparametric-ate
### Estimand : 1
Estimand name: backdoor1 (Default)
Estimand expression:
d
────────────(Expectation(income))
d[education]
Estimand assumption 1, Unconfoundedness: If U→{education} and U→income then P(income|education,,U) = P(income|education,)
### Estimand : 2
Estimand name: iv
Estimand expression:
Expectation(Derivative(income, [voucher])*Derivative([education], [voucher])**
(-1))
Estimand assumption 1, As-if-random: If U→→income then ¬(U →→{voucher})
Estimand assumption 2, Exclusion: If we remove {voucher}→{education}, then ¬({voucher}→income)
### Estimand : 3
Estimand name: frontdoor
No such variable found!
identify_effect()
Identified Estimand 를 리턴해주는 메소드로, Estimand 가 3가지 타입이 나옵니다.
Estimand 의 종류가 non-parametric ATE 일 때 Backdoor / IV / Frontdoor 와 같은 Identification methods 를 리턴합니다.
# step 3 : estimate# step 2 에서 Estimand 가 3가지가 있는데, 그중에서 IV 를 선택함. estimate = model.estimate_effect(identified_estimand, method_name ='iv.instrumental_variable', test_significance =True)print(estimate)
Estimand type: nonparametric-ate
### Estimand : 1
Estimand name: backdoor
Estimand expression:
d
────────────(Expectation(income))
d[education]
Estimand assumption 1, Unconfoundedness: If U→{education} and U→income then P(income|education,,U) = P(income|education,)
### Estimand : 2
Estimand name: iv
Estimand expression:
Expectation(Derivative(income, [voucher])*Derivative([education], [voucher])**
(-1))
Estimand assumption 1, As-if-random: If U→→income then ¬(U →→{voucher})
Estimand assumption 2, Exclusion: If we remove {voucher}→{education}, then ¬({voucher}→income)
### Estimand : 3
Estimand name: frontdoor
No such variable found!
Step 3: Estimate
#Choose the second estimand: using IV
estimate = model.estimate_effect(identified_estimand,
method_name="iv.instrumental_variable", test_significance=True)
print(estimate)
*** Causal Estimate ***
## Identified estimand
Estimand type: nonparametric-ate
### Estimand : 1
Estimand name: iv
Estimand expression:
Expectation(Derivative(income, [voucher])*Derivative([education], [voucher])**
(-1))
Estimand assumption 1, As-if-random: If U→→income then ¬(U →→{voucher})
Estimand assumption 2, Exclusion: If we remove {voucher}→{education}, then ¬({voucher}→income)
## Realized estimand
Realized estimand: Wald Estimator
Realized estimand type: nonparametric-ate
Estimand expression:
Expectation(Derivative(income, voucher))⋅Expectation(Derivative(education, vou
-1
cher))
Estimand assumption 1, As-if-random: If U→→income then ¬(U →→{voucher})
Estimand assumption 2, Exclusion: If we remove {voucher}→{education}, then ¬({voucher}→income)
Estimand assumption 3, treatment_effect_homogeneity: Each unit's treatment ['education'] is affected in the same way by common causes of ['education'] and income
Estimand assumption 4, outcome_effect_homogeneity: Each unit's outcome income is affected in the same way by common causes of ['education'] and income
Target units: ate
## Estimate
Mean value: 3.984270555650861
p-value: [0, 0.001]
해석 방법 교육(education)을 한 단위 높일 때, 소득(income)이 4.01 포인트 가량 높아진다는 의미를 가지는 Estimate (추정치)를 보입니다.
추정된 인과효과는, p-value 가 0 ~ 0.001 사이로 통계적으로 유의합니다.
그 다음으로는, Placebo refutation test 를 통해서 이 추정치의 강건성을 검사합니다. 이 테스트에서는, 독립확률변수(X)가 도구변수와의 상관관계를 유지하면서, Treatment를 대체하도록 합니다. 따라서 진실된 인과효과는 0이 됩니다.