The American Statistician
ISSN: 0003-1305 (Print) 1537-2731 (Online)

Assumption Lean Regression

Richard Berk¹,², Andreas Buja², Lawrence Brown², Edward George², Arun Kumar Kuchibhotla², Weijie Su², and Linda Zhao²
¹ Department of Criminology, University of Pennsylvania
² Department of Statistics, [...]

To cite this article: Richard Berk, Andreas Buja, Lawrence Brown, Edward George, Arun Kumar Kuchibhotla, Weijie Su & Linda Zhao (2019): Assumption Lean Regression, The American Statistician, DOI: [...].1592781
Accepted author version posted online: [...]

It is well known that with observational data, [...] however, [...] inference should be based on sandwich estimators or the pairs (x-y) bootstrap. [...]

[...] many of which are effectively untestable (Box, 1976; Leamer, 1978; Rubin, 1986; Cox, 1995; Berk, 2003; Freedman, 2004; 2009). We discuss here some implications of an “assumption lean” [...] one requires only that the observations are iid, [...] the parameters of fitted models need to be interpreted as statistical functionals, here called “regression functionals.” For ease and clarity of exposition, [...] et al. (2018a;b), a portion of which draws on early insights of Halbert White (1980).

2 The Parent Joint Probability Distribution

For observational data, suppose there is a set of real-valued random variables that have a joint distribution $P$, also called the “population,” that characterizes regressor variables $X_1, \dots, X_p$ and [...]. [...] the regressor variables are not interpreted as fixed; [...] $(p+1) \times 1$ column random vector $X = (1, X_1, \dots, X_p)'$ [...] $P = P_{Y,X}$ [...], $P_{Y|X}$ for the conditional distribution of $Y$ given $X$, [...]. Hence, [...] the regressors being random variables, [...]

As a feature of $P$ or, more precisely, of $P_{Y|X}$, there is a “true response surface” denoted by $\mu(X)$. [...] $\mu(X)$ is the conditional expectation of $Y$ given $X$, $\mu(X) = E[Y \mid X]$, but there are other possibilities, [...]

[...] $\mu(X)$, we will make use, for example, of standard ordinary least squares (OLS), but in later sections, [...]; deviations from linearity in $\mu(X)$ may be difficult to detect with diagnostics, or the linear fit is known to be a deficient approximation of $\mu(X)$ and yet, OLS is employed because of substantive theories, measurement scales, [...]

Fitting a linear function $\beta'X$ to $Y$ with OLS can be represented mathematically at the population $P$ without assuming that the response surface $\mu(X)$ is linear in $X$:

$$\beta(P) = \operatorname*{argmin}_{\beta \in \mathbb{R}^{p+1}} E\big[(Y - \beta'X)^2\big]. \tag{1}$$

The vector $\beta = \beta(P)$ is the “population OLS solution” and contains the “population coefficients.” Notationally, when we write $\beta$, it is understood to be $\beta(P)$. Similar to finite datasets, [...] obtained by solving a population version of the normal equations, resulting in

$$\beta(P) = E[XX']^{-1} E[XY]. \tag{2}$$

Thus, one obtains the best linear approximation to $Y$ as well as to $\mu(X)$. [...], it can be useful without (unrealistically) assuming that $\mu(X)$ is identical to $\beta'X$.

We have worked so far with a distribution/population $P$, [...], therefore, defined a target of estimation: $\beta(P)$ obtained from (1) and (2) [...] well-defined as long as the joint distribution $P$ has second moments and the regressor distribution $P_X$ is not perfectly collinear; that is, the second moment matrix $E[XX']$ [...]

[...] “assumption lean” or “model robust”? Indeed, those who insist that models must always be “correctly specified” [...]. “Improving” models by searching regressors, trying out transformations of all variables, inventing new regressors from existing ones, using model selection algorithms, performing interactive experiments, [...], but fit them too well ([...]). Research is underway to provide valid post-selection inference ([...]), [...], however, indicate that extensions of Berk et al. (2013) have asymptotic justifications under misspecification ([...]).

Beyond the costs of data dredging, there can be substantive reasons for discouraging “model improvement.” Some variables may express phenomena in “natural” or “conventional” [...] in Buja et al. (2018b) [...]’s maxim that models are always “wrong” [...], therefore, is a discussion of some of these consequences and an argument in favor of assumption lean inference employing model robust standard errors, such as those obtained from sandwich estimators or the x-y bootstrap.

[...] $\mu(X)$ and $\beta'X$. [...] shows the true response surface $\mu(x)$ [...] $\beta_0 + \beta_1 x$. [...] $y \mid x^*$ [...] linear approximation is denoted as $\delta = y - (\beta_0 + \beta_1 x^*)$ and will be called the “population residual.” The value of $\delta$ at $x^*$ can be decomposed into two components:

• The first component results from the disparity between the true response surface, $\mu(x^*)$, and the approximation $\beta_0 + \beta_1 x^*$. We denote this disparity by $\eta = \eta(x^*)$ and call it “the nonlinearity.” Because $\beta_0 + \beta_1 x^*$ is an approximation, [...], the nonlinearity $\eta(X)$ is a random variable as well.

• The second component of $\delta$ at $x^*$, denoted by $\varepsilon$, is random variation around the true conditional mean $\mu(x^*)$. We prefer for such variation the term “noise” over “error.” Sometimes it is called “irreducible variation” [...]

[...], in which case we write $\delta = Y - \beta'X$, $\eta = \mu(X) - \beta'X$ and $\varepsilon = Y - \mu(X)$, but these are not assumptions, rather, they are consequences of the definitions that constitute the above OLS-[...]. [...] the population residual, the nonlinearity and the noise are all “population-orthogonal” to the regressors:

$$E[X_j \delta] = E[X_j \eta(X)] = E[X_j \varepsilon] = 0, \qquad j = 0, 1, \dots, p. \tag{3}$$

As was already noted, these properties (3) [...] $\beta'X$ is the population OLS approximation of $Y$ and also of $\mu(X)$. [...] ($X_0 = 1$), the facts (3) imply that all three terms are marginally population centered:

$$E[\delta] = E[\eta(X)] = E[\varepsilon] = 0. \tag{4}$$

However, $E[\delta \mid X] = \eta(X)$, which, though marginally centered, is a function of $X$ and hence, not independent of the regressors (unless it vanishes). By comparison, the noise $\varepsilon$ is marginally and conditionally centered, $E[\varepsilon \mid X] = 0$, but not assumed homoskedastic, and hence, [...] “error terms” [...]

[...] common statistical practice that such regressors are treated as fixed (Searle, 1970: Chapter 3). In probabilistic terms, [...] frequentist paradigm, alternative datasets generated from the same model leave regressor values unchanged; [...], regression models have nothing to say about the regressor distribution; they only model [...]
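The definitions in (1) through (4) and the recommendation to use sandwich or pairs (x-y) bootstrap standard errors can be checked numerically. The following is only a minimal sketch under assumptions that are not taken from the paper: a hypothetical population with a single regressor $X_1 \sim N(0,1)$, true response surface $\mu(x) = x + x^2$, and unit-variance Gaussian noise. Under these assumptions, (2) gives $\beta(P) = (1, 1)'$ and the nonlinearity is $\eta(x) = x^2 - 1$. The function names `ols_with_sandwich` and `pairs_bootstrap_se` are illustrative, and the sandwich estimator shown is the basic HC0 form, which may differ from the variant the authors use.

```python
import numpy as np


def ols_with_sandwich(X, y):
    """Return OLS coefficients, residuals, classical SEs and sandwich (HC0) SEs.

    X is expected to contain a leading column of ones for the intercept.
    """
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # Classical SEs: valid if the linear model is correct and noise is homoskedastic.
    sigma2 = resid @ resid / (n - k)
    se_classical = np.sqrt(np.diag(sigma2 * xtx_inv))
    # Sandwich: bread (X'X)^{-1} around the meat sum_i r_i^2 x_i x_i'.
    meat = (X * resid[:, None] ** 2).T @ X
    se_sandwich = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))
    return beta, resid, se_classical, se_sandwich


def pairs_bootstrap_se(X, y, n_boot=300, seed=1):
    """Pairs (x-y) bootstrap: resample whole rows (x_i, y_i) and refit OLS."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    draws = np.empty((n_boot, k))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    return draws.std(axis=0, ddof=1)


# Hypothetical population (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 4000
x = rng.standard_normal(n)
mu = x + x**2                      # true response surface, nonlinear in x
y = mu + rng.standard_normal(n)    # noise with E[eps | X] = 0
X = np.column_stack([np.ones(n), x])

beta_pop = np.array([1.0, 1.0])    # beta(P) from (2) for this population
eta = mu - X @ beta_pop            # nonlinearity eta(X) = mu(X) - beta'X

beta_hat, resid, se_classical, se_sandwich = ols_with_sandwich(X, y)
se_boot = pairs_bootstrap_se(X, y)

print("beta_hat:            ", beta_hat)
print("sample E[X_j eta]:   ", X.T @ eta / n)   # sample analogue of (3)
print("classical SE:        ", se_classical)
print("sandwich SE:         ", se_sandwich)
print("pairs bootstrap SE:  ", se_boot)
```

In this sketch the classical slope standard error should come out clearly smaller than the sandwich and bootstrap versions, because the nonlinearity $\eta(X)$ enters the population residual $\delta$ and makes its conditional spread depend on $X$ even though the noise itself is homoskedastic.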