Adding total number of employed people for linear regression measuring job tenure before and after event

I think you've overcomplicated the problem. I would say that you're actually working with time series data, and you're using regression inappropriately.

Here's how I would approach the problem:

  1. Define what "low tenure" and "high tenure" mean. Clearly in your mind there is a distinction. What is it? The cutoff will be arbitrary, but that's okay.

  2. Graph the headcount of employees by the newly created categorical variable "tenure category". Visually see what the change in headcount looks like. You might even be able to stop the analysis there.

  3. Test the mean tenure of the two groups with a t-test. Is the difference between the two group means significant? If it is, that gives you an answer. It might not be, though.
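The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are made up, the 2-year cutoff is arbitrary, and I'm assuming the "two groups" are employees observed before and after the event. It uses only the standard library and reports Welch's t statistic with its approximate degrees of freedom; for a p-value you could use `scipy.stats.ttest_ind(before, after, equal_var=False)` instead.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance

# Hypothetical records (made up for illustration): (tenure in years, period).
employees = [
    (0.5, "before"), (1.2, "before"), (3.0, "before"), (4.5, "before"), (6.0, "before"),
    (0.3, "after"), (0.8, "after"), (1.0, "after"), (2.5, "after"), (5.0, "after"),
]

CUTOFF_YEARS = 2.0  # Step 1: arbitrary threshold between "low" and "high" tenure


def tenure_category(years):
    """Label a tenure value as 'low' or 'high' using the assumed cutoff."""
    return "low" if years < CUTOFF_YEARS else "high"


# Step 2: headcount per (period, tenure category); these counts are what you would plot.
headcount = Counter((period, tenure_category(years)) for years, period in employees)


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Step 3: compare mean tenure before vs. after the event.
before = [years for years, period in employees if period == "before"]
after = [years for years, period in employees if period == "after"]
t_stat, df = welch_t(before, after)

print(dict(headcount))
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

If the bar chart of those counts already shows an obvious shift, you may not need the test at all.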

Regression could still be used, but for what? Are you trying to predict job tenure from time and headcount? I'm struggling to see why headcount would logically predict tenure. Also be careful with "tenure" as an outcome, because it increases with time by construction. You might find that your time variable predicts tenure well simply because, as time goes on, tenure goes up. Tenure could still end up lower or higher than expected, but how are you going to safely conclude anything about the event's role? It's too convoluted to interpret.
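To make the "tenure increases with time" point concrete, here is a toy sketch. It assumes a made-up fixed workforce with no hires, no exits, and no event at all, and fits an ordinary least-squares slope of mean tenure on time by hand.

```python
from statistics import mean

# Hypothetical fixed workforce: starting tenures in years (made up).
# Assumption: nobody is hired, nobody leaves, and no event occurs.
start_tenure = [0.5, 1.0, 2.0, 4.0]
months = list(range(24))

# Mean tenure observed each month; everyone simply accrues one more month.
mean_tenure = [mean([s + m / 12 for s in start_tenure]) for m in months]


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den


slope = ols_slope(months, mean_tenure)
print(f"slope = {slope:.4f} years of tenure per month")
```

Time "predicts" tenure perfectly here even though nothing happened, which is why a significant time coefficient on its own tells you nothing about the event.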

/r/AskStatistics Thread