No screening option and dummy coding

ASAL2433 · Post by **ASAL2433** » Tue Mar 05, 2024 12:05 pm

Hello!

I am in the process of analysing the pilot data from a DCE, to give me information on priors for the experimental design.

The DCE contains 3 alternatives: program A, program B and no screening. I have a categorical variable 'conditions covered', which I have dummy coded: (1) lots of conditions and (0) fewer conditions (see below). So if lots of conditions is shown, I indicate this with a 1 in the 'Cond' column. If fewer conditions is shown, I indicate this with a 0. In the no screening option (alt 3), I have put a 0 in all columns.

ID SET CSET Alt CHOICE Cond
4 1 3 1 1 1
4 1 3 2 0 0
4 1 3 3 0 0
4 2 3 1 1 1
4 2 3 2 0 0
4 2 3 3 0 0
4 3 3 1 1 0
4 3 3 2 0 1

Is this the correct approach? Or will there be confusion because the 0s indicate different things, i.e. fewer conditions in a program versus nothing at all in the no screening option? Alternatively, should I be coding conditions covered using 3 levels: (2) lots of conditions, (1) fewer conditions and (0) no conditions (only present in the no screening option)?

ASAL2433 · Post by **ASAL2433** » Wed Mar 06, 2024 1:02 pm

Or should I be coding the no screening attribute values as -999?

johnr · Post by **johnr** » Wed Mar 06, 2024 3:45 pm

Hi

No, the original data format is correct. Treating it as -999 in Nlogit will get the same result (if you use it correctly, which I will explain below). If you think about the utility functions, you have

u(A) = ... + B1*Con + ...
u(B) = ... + B1*Con + ...
u(C) = 0

Your concern is, if Con = 0, this collapses to

u(A) = ... + B1*0 + ... = ... + 0 + ...
u(B) = ... + B1*0 + ... = ... + 0 + ...
u(C) = 0

such that the effect of the base level of the dummy code is the same as the utility of the no screening alternative. This has been a key argument for years, with many stating that the base level of a dummy-coded variable is perfectly confounded with the constants in the model. This led to arguments about effects coding versus dummy coding. We now know better: the two really are the same, just with different scaling, hence it doesn't really matter which coding structure you choose to use. That aside, B1 represents the effect of moving from condition = 0 to condition = 1 in utility functions A and B. The base condition is confounded with something else, which is okay: B1 tells us that, relative to condition = 0, the impact of moving to condition = 1 is B1. This is what you want.
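To make the "same thing, different scaling" point concrete, here is a minimal sketch in Python (not Nlogit syntax). It assumes a plain multinomial logit, a single constant shared by programs A and B, and made-up values for `ASC` and `B1`; none of these come from the pilot data. Recoding Con from 0/1 to -1/+1 halves the coefficient and shifts the constant, and the choice probabilities stay the same.

```python
import math

def mnl_probs(utilities):
    """Multinomial logit choice probabilities for a list of utilities."""
    m = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical values, chosen only for illustration.
ASC = 0.8  # constant shared by programs A and B (assumption)
B1 = 0.5   # dummy-coded effect of Con = 1 relative to Con = 0

def probs_dummy(con_a, con_b):
    # Dummy coding: Con is 0 or 1; no screening is fixed at u(C) = 0.
    return mnl_probs([ASC + B1 * con_a, ASC + B1 * con_b, 0.0])

def probs_effects(con_a, con_b):
    # Effects coding: Con = 0 becomes -1 and Con = 1 becomes +1.
    # The coefficient halves and the constant absorbs the shift.
    b_eff = B1 / 2
    asc_eff = ASC + B1 / 2
    e_a, e_b = 2 * con_a - 1, 2 * con_b - 1
    return mnl_probs([asc_eff + b_eff * e_a, asc_eff + b_eff * e_b, 0.0])

for con_a, con_b in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    d = probs_dummy(con_a, con_b)
    e = probs_effects(con_a, con_b)
    same = all(math.isclose(x, y) for x, y in zip(d, e))
    print(con_a, con_b, [round(p, 4) for p in d], "match:", same)
```

When Con = 0 in both programs, the only thing separating them from no screening is the constant, which is the confounding described above. It does not change what B1 measures.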

So why is -999 in Nlogit the same thing? Because -999 is treated as a missing value, which will drop the observation if it is present in a utility function. Hence, if you had

u(A) = ... + B1*Con + ...
u(B) = ... + B1*Con + ...
u(C) = B1 * -999

It would drop the entire observation from your analysis. Since alternative C is in every choice task in your data, it would drop all the data and not estimate a model at all. Hence, you would drop the variable from utility function C and end up with

u(A) = ... + B1*Con + ...
u(B) = ... + B1*Con + ...
u(C) = 0

which funnily enough is what we started with. Hence, my opening statement, Nlogit will get the same result (if you use it correctly - that is, not include that variable in utility function C).
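The argument can be sketched in a few lines of Python. This only mimics the behaviour described above (a missing value in a used variable drops the whole choice set); it is not Nlogit's actual implementation. The rows are the first two choice sets from the posted data with Cond for alternative 3 recoded to -999; the helper names and the value of `B1` are hypothetical.

```python
MISSING = -999  # the missing-value code discussed above

# First two choice sets from the posted data, with Cond for alternative 3
# (no screening) recoded from 0 to -999.
rows = [
    {"set": 1, "alt": 1, "choice": 1, "cond": 1},
    {"set": 1, "alt": 2, "choice": 0, "cond": 0},
    {"set": 1, "alt": 3, "choice": 0, "cond": MISSING},
    {"set": 2, "alt": 1, "choice": 1, "cond": 1},
    {"set": 2, "alt": 2, "choice": 0, "cond": 0},
    {"set": 2, "alt": 3, "choice": 0, "cond": MISSING},
]

def usable_sets(rows, alts_using_cond):
    """Choice sets that survive when a missing value in a used variable
    drops the entire set (assumed behaviour, as described in the post)."""
    dropped = {r["set"] for r in rows
               if r["alt"] in alts_using_cond and r["cond"] == MISSING}
    return sorted({r["set"] for r in rows} - dropped)

def utility(row, b1, alts_using_cond):
    # Cond enters the utility only for the listed alternatives; otherwise u = 0.
    return b1 * row["cond"] if row["alt"] in alts_using_cond else 0.0

B1 = 0.5  # hypothetical coefficient, for illustration only

print(usable_sets(rows, {1, 2, 3}))  # Cond in u(C): prints []
print(usable_sets(rows, {1, 2}))     # Cond only in u(A), u(B): prints [1, 2]
print([utility(r, B1, {1, 2}) for r in rows if r["set"] == 1])  # [0.5, 0.0, 0.0]
```

With Cond left out of u(C), the -999 is never read, so the utilities are exactly those obtained with the original 0 coding.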

John

Michiel Bliemer · Post by **Michiel Bliemer** » Wed Mar 06, 2024 4:02 pm

For those of you estimating choice models using Nlogit, Apollo, or Biogeme, you may want to consider posting questions on user groups dedicated to these software tools:

Nlogit: https://www.limdep.com/listserver/
Apollo: http://www.apollochoicemodelling.com/forum/
Biogeme: https://groups.google.com/g/biogeme?pli=1

Note that different software packages require different data formats.

