Help to understand categorical variable results in regression

Hello again. What I understand is that ideally you would like to model the probability of someone getting nausea directly, using the cancer types as your model's covariates. That is called relative risk regression, and it is an alternative to logistic regression.

Logistic regression, by contrast, models the odds of someone feeling nausea, defined as P(nausea=yes|covariates)/(1-P(nausea=yes|covariates)). One of the main reasons we use odds is that P(nausea=yes|covariates)=p is a probability and thus ranges in [0,1]; hence the odds p/(1-p) span the entire positive half-line, and the logarithm of the odds spans the whole real line.

If instead you tried to model the probability itself, you would have log(p) = b0 + b1x1 + … + bnxn. However, since p is in [0,1], log(p) must be nonpositive, and thus you would need to impose the linear constraint b0 + b1x1 + … + bnxn <= 0. This creates problems in finding the MLE, as the standard Fisher scoring algorithm may fail to converge to a solution, given that it is an unconstrained optimization algorithm.

You may find these links useful: https://www.theanalysisfactor.com/the-difference-between-relative-risk-and-odds-ratios/ and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3305608/.

Does that make any sense, or am I way off and this wasn't what you wanted to ask?

/r/statistics Thread