I am currently working on predicting takeover targets.
For this I have built a sample of 30,000 observations covering the period 2001 to 2010.
Unfortunately, the logit regression on the dependent variable runs into several problems, and I don't know where to start looking for the error:
1. When I run the regression on the normal sample, up to 9 variables are significant. That is a very large number, since many papers report at most 5 significant variables. The pseudo-R2, however, is extremely low, about 0.015. The model as a whole is significant. The goodness-of-fit test is clearly insignificant. Why?
```
Iteration 0:   log likelihood = -3026.2946
Iteration 1:   log likelihood = -2990.7089
Iteration 2:   log likelihood = -2989.0102
Iteration 3:   log likelihood =  -2989.008
Iteration 4:   log likelihood =  -2989.008

Logistic regression                               Number of obs   =      14000
                                                  LR chi2(11)     =      74.57
                                                  Prob > chi2     =     0.0000
Log likelihood =  -2989.008                       Pseudo R2       =     0.0123

-------------------------------------------------------------------------------
       target |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
            1 |  -.0025268   .0016342    -1.55   0.122    -.0057298    .0006761
            2 |  -.0022575   .0011562    -1.95   0.051    -.0045235    8.60e-06
            3 |   .3179513   .1468876     2.16   0.030     .0300569    .6058457
            4 |  -.0029705   .0015846    -1.87   0.061    -.0060762    .0001352
            5 |  -.0010478    .001368    -0.77   0.444     -.003729    .0016335
            6 |  -.0022786   .0012236    -1.86   0.063    -.0046769    .0001196
            7 |   .0898029   .0233317     3.85   0.000     .0440737    .1355322
            8 |   .0129258   .0039623     3.26   0.001     .0051599    .0206917
            9 |   -.000975     .00104    -0.94   0.349    -.0030134    .0010634
            a |  -.0527474   .0209837    -2.51   0.012    -.0938748   -.0116201
            b |   6.63e-06   .0000115     0.58   0.563    -.0000158    .0000291
        _cons |  -2.312054   .1854292   -12.47   0.000    -2.675488   -1.948619
-------------------------------------------------------------------------------

. estat gof

Logistic model for b_target, goodness-of-fit test

       number of observations =     14491
 number of covariate patterns =     14491
          Pearson chi2(14479) =  14393.40
                  Prob > chi2 =    0.6915
```
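The header statistics and the `estat gof` degrees of freedom can be reproduced from numbers already in the output. Below is a minimal Python sketch (Python only for illustration; the analysis itself is in Stata). It assumes that Iteration 0 is the log likelihood of the constant-only model, which is how Stata's `logit` reports it, and that the Pearson test in `estat gof` is computed over covariate patterns with df equal to the number of patterns minus the number of estimated coefficients including `_cons`. The function names are hypothetical.

```python
from collections import defaultdict


def mcfadden_pseudo_r2(ll_model: float, ll_null: float) -> float:
    """McFadden's pseudo-R2 = 1 - LL(model) / LL(constant-only model)."""
    return 1.0 - ll_model / ll_null


def lr_chi2(ll_model: float, ll_null: float) -> float:
    """Likelihood-ratio statistic 2 * (LL(model) - LL(null)) for the overall model test."""
    return 2.0 * (ll_model - ll_null)


def pearson_gof(rows, n_params):
    """Pearson chi2 over covariate patterns for a fitted logit.

    rows: iterable of (covariates, y, p_hat) with y in {0, 1} and p_hat the
    fitted probability. Observations with identical covariates form one
    pattern j with m_j observations, y_j targets and a common fitted p_j.
    n_params: number of estimated coefficients including the constant.
    Returns (chi2, df) with df = number of patterns - n_params.
    """
    groups = defaultdict(lambda: [0, 0, 0.0])  # pattern -> [m_j, y_j, p_j]
    for x, y, p in rows:
        g = groups[tuple(x)]
        g[0] += 1
        g[1] += y
        g[2] = p  # identical covariates imply an identical fitted probability
    chi2 = 0.0
    for m, y_sum, p in groups.values():
        chi2 += (y_sum - m * p) ** 2 / (m * p * (1.0 - p))
    return chi2, len(groups) - n_params


# Log likelihoods from iteration 0 (constant only) and the final iteration above.
print(round(mcfadden_pseudo_r2(-2989.008, -3026.2946), 4))  # 0.0123
print(round(lr_chi2(-2989.008, -3026.2946), 2))             # 74.57
# 11 regressors + _cons = 12 parameters; 14491 patterns give df = 14479.
print(14491 - 12)                                            # 14479
```

With 14491 covariate patterns for 14491 observations, every pattern contains a single observation (m_j = 1), so each Pearson term reduces to (y - p)^2 / (p(1 - p)). The same arithmetic reproduces the second run: 19 regressors plus `_cons` give 12109 - 20 = 12089 degrees of freedom.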
2. In the normal sample only about 5% of the observations are targets, i.e. the dependent variable is 1 for them and 0 for the rest. If I raise this ratio slightly by deleting non-targets, the whole model blows up: signs flip, significance levels vanish ...
```
Iteration 0:   log likelihood = -2882.6865
Iteration 1:   log likelihood = -2851.5667
Iteration 2:   log likelihood = -2850.7642
Iteration 3:   log likelihood = -2850.7635
Iteration 4:   log likelihood = -2850.7635

Logistic regression                               Number of obs   =      12000
                                                  LR chi2(19)     =      63.85
                                                  Prob > chi2     =     0.0000
Log likelihood = -2850.7635                       Pseudo R2       =     0.0111

-------------------------------------------------------------------------------
       target |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
            1 |  -3.08e-06   3.58e-06    -0.86   0.390    -.0000101    3.94e-06
            2 |  -.0008045   .0017994    -0.45   0.655    -.0043313    .0027223
            3 |  -.0060498   .0032872    -1.84   0.066    -.0124926     .000393
            4 |    .004613   .0043432     1.06   0.288    -.0038996    .0131256
            5 |   .0017529   .0013366     1.31   0.190    -.0008667    .0043725
            6 |   .0009118   .0028433     0.32   0.748    -.0046609    .0064845
            7 |  -.0028163   .0014481    -1.94   0.052    -.0056545    .0000219
            8 |  -.0048172   .0017381    -2.77   0.006    -.0082238   -.0014105
            9 |  -.0019306   .0017917    -1.08   0.281    -.0054424    .0015811
            a |  -.0043457   .0017733    -2.45   0.014    -.0078212   -.0008702
            b |  -2.59e-07   9.07e-07    -0.29   0.775    -2.04e-06    1.52e-06
            c |  -.0002311   .0016232    -0.14   0.887    -.0034126    .0029504
            d |  -.0017947   .0015706    -1.14   0.253    -.0048731    .0012837
            e |   .0010832   .0015498     0.70   0.485    -.0019544    .0041207
            f |   -.000714   .0016686    -0.43   0.669    -.0039844    .0025565
            g |     .00269   .0047278     0.57   0.569    -.0065763    .0119563
            h |  -.0007821    .003495    -0.22   0.823    -.0076322     .006068
            i |   .0114988   .0040048     2.87   0.004     .0036496     .019348
            j |   .8714785   .2874443     3.03   0.002     .3080979    1.434859
        _cons |  -2.746042   .2695021   -10.19   0.000    -3.274256   -2.217828
-------------------------------------------------------------------------------

. estat gof

Logistic model for ub_tarrget, goodness-of-fit test

       number of observations =     12109
 number of covariate patterns =     12109
          Pearson chi2(12089) =  12090.49
                  Prob > chi2 =    0.4945
```
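Raising the target ratio by deleting non-targets is outcome-based (case-control style) sampling. A minimal Python sketch of what that does to a logit, under stated assumptions: the logit model is correctly specified, all targets are kept, and each non-target is kept at random with probability s, independently of the regressors (if deletion depends on firm characteristics, none of this holds). Under these assumptions the standard result is that the slopes estimated on the reduced sample are consistent for the same population slopes, and only the intercept shifts by -log(s). The function names and the 50% keep share below are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def prob_in_undersample(p: float, keep_share: float) -> float:
    """Target probability after keeping only a share `keep_share` of non-targets.

    Assumes every target is kept and each non-target survives independently
    with probability keep_share, regardless of its covariates.
    """
    return p / (p + keep_share * (1.0 - p))


def correct_intercept(b0_undersampled: float, keep_share: float) -> float:
    """Map an intercept estimated on the undersampled data back to the full-sample scale."""
    return b0_undersampled + math.log(keep_share)


# Hypothetical example: 5% targets, keep half of the non-targets.
p_full = 0.05
keep = 0.5
p_sub = prob_in_undersample(p_full, keep)
print(round(p_sub, 4))                          # 0.0952
# The log-odds rise by -log(keep) = log(2), whatever the covariates are:
print(round(logit(p_sub) - logit(p_full), 6))   # 0.693147
```

Because the shift is the same constant for every observation, it is absorbed entirely by `_cons`; `correct_intercept` simply undoes it.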
I'd be grateful for any pointers.