

Title

    tostranksum -- Two-sample rank sum test for stochastic equivalence


Syntax

    Two-sample stochastic equivalence rank sum test

        tostranksum varname [if] [in], by(groupvar) [eqvtype(type) eqvlevel(#) uppereqvlevel(#) ccontinuity alpha(#) relevance]


    tostranksum options    Description
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Main
    * by(groupvar)         grouping variable
      eqvtype(string)      specify equivalence threshold with Delta or epsilon
      eqvlevel(#)          the level of tolerance defining the equivalence interval
      uppereqvlevel(#)     the upper value of an asymmetric equivalence interval
      ccontinuity          include a continuity correction
      alpha(#)             set nominal type I error rate; default is alpha(0.05)
      relevance            perform & report combined tests for difference and equivalence
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    * by(groupvar) is required.
    by is allowed with tostranksum; see [D] by.


Description

    tostranksum tests for stochastic equivalence of two independent samples (that is, unpaired or unmatched data) using the z approximation to the Wilcoxon rank-sum test, which was generalized to different
    sample sizes by the Mann-Whitney two-sample statistic (Wilcoxon 1945; Mann and Whitney 1947), within a two one-sided tests approach (Schuirmann 1987).  In rank sum tests for 0th-order stochastic difference,
    the null hypothesis is that there is stochastic equality between two populations, so Ho: P(X > Y) = 0.5, with Ha: P(X > Y) ≠ 0.5.  When performing tests for stochastic equivalence, the null hypothesis is
    that one population stochastically dominates the other by at least as much as the equivalence interval defined by some chosen level of tolerance (as specified by eqvtype and eqvlevel).
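    The z approximation referred to above can be sketched as follows.  This is a minimal Python illustration of the textbook Wilcoxon rank-sum statistic W, its expectation E(W) = n_1(N+1)/2 under
    stochastic equality, and the usual tie-adjusted variance (the "adjusted variance" that Stata's ranksum also reports).  The function names midranks and rank_sum_z are hypothetical; this is not
    tostranksum's own code, and the first group passed in plays the role of the first group in r(sum_obs).

```python
from collections import Counter
from math import sqrt


def midranks(values):
    """Rank values 1..N, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    return ranks


def rank_sum_z(group1, group2):
    """Return (W, E(W), tie-adjusted variance, z) for the first group."""
    n1, n2 = len(group1), len(group2)
    n = n1 + n2
    pooled = list(group1) + list(group2)
    ranks = midranks(pooled)
    w = sum(ranks[:n1])              # observed rank sum of the first group
    ew = n1 * (n + 1) / 2            # expected rank sum under stochastic equality
    # Tie adjustment: sum of (t^3 - t) over each run of t tied values.
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    var = n1 * n2 * (n + 1) / 12 - n1 * n2 * tie_term / (12 * n * (n - 1))
    return w, ew, var, (w - ew) / sqrt(var)
```

    For example, rank_sum_z([1, 2, 3], [4, 5, 6]) gives W = 6, E(W) = 10.5 and a variance of 5.25 (no ties).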


    With respect to the rank sum test, a negativist null hypothesis takes one of the following two forms depending on whether tolerance is defined in terms of Delta (equivalence expressed in the same units as
    the summed ranks) or in terms of epsilon (equivalence expressed in the units of the Z distribution):

        Ho: |W - E(W)| >= Delta,
        where the equivalence interval ranges from -Delta to Delta, W is the rank-sum statistic, and E(W) is its mean if there is no stochastic dominance.  This translates directly
        into two one-sided null hypotheses:

            Ho1: Delta - [W - E(W)] <= 0; and

            Ho2: [W - E(W)] + Delta <= 0

        -OR-

        Ho: |Z| >= epsilon,
        where the equivalence interval ranges from -epsilon to epsilon.  This also translates directly into two one-sided null hypotheses:

            Ho1: epsilon - Z <= 0; and

            Ho2: Z + epsilon <= 0
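    The two symmetric one-sided tests above can be sketched in Python as follows.  This is a minimal illustration, assuming the z statistic has already been computed from W, E(W) and the adjusted
    variance; the function names tost_epsilon and tost_delta are hypothetical.  Each p-value is the upper-tail probability P(Z >= z1) or P(Z >= z2), matching r(p1) and r(p2) under Saved results, and the
    Delta version simply converts to epsilon = Delta/standard error.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def tost_epsilon(z, epsilon):
    """Two one-sided tests of Ho: |Z| >= epsilon, given the rank-sum z statistic."""
    z1 = epsilon - z              # Ho1: epsilon - Z <= 0
    z2 = z + epsilon              # Ho2: Z + epsilon <= 0
    p1 = 1 - STD_NORMAL.cdf(z1)   # P(Z >= z1)
    p2 = 1 - STD_NORMAL.cdf(z2)   # P(Z >= z2)
    return z1, z2, p1, p2


def tost_delta(w, ew, var, delta):
    """Same tests with tolerance Delta in rank-sum units, via epsilon = Delta/se."""
    se = var ** 0.5
    return tost_epsilon((w - ew) / se, delta / se)
```

    With z = 0 and epsilon = 2, both one-sided p-values equal P(Z >= 2), roughly 0.0228.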

        When an asymmetric equivalence interval is defined using the uppereqvlevel option, the general negativist null hypothesis becomes:

        Ho: [W - E(W)] <= Delta_lower, or [W - E(W)] >= Delta_upper,
        where the equivalence interval ranges from Delta_lower to Delta_upper.  This also translates directly into two one-sided null hypotheses:

            Ho1: Delta_upper - [W - E(W)] <= 0; and

            Ho2: [W - E(W)] - Delta_lower <= 0

        -OR-

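        Ho: Z <= epsilon_lower, or Z >= epsilon_upper,
        where the equivalence interval ranges from epsilon_lower to epsilon_upper.  This also translates directly into two one-sided null hypotheses: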

            Ho1: epsilon_upper - Z <= 0; and

            Ho2: Z - epsilon_lower <= 0
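    The asymmetric pair of one-sided tests can be sketched the same way.  This minimal Python version assumes epsilon_lower is passed as a negative number, as in Ho2 above; the function name
    tost_asymmetric is hypothetical and the sketch is not taken from tostranksum's source.

```python
from statistics import NormalDist


def tost_asymmetric(z, eps_lower, eps_upper):
    """Two one-sided tests of Ho: Z <= eps_lower or Z >= eps_upper.

    eps_lower is expected to be negative (the lower side of the interval).
    """
    std_normal = NormalDist()
    z1 = eps_upper - z        # Ho1: epsilon_upper - Z <= 0
    z2 = z - eps_lower        # Ho2: Z - epsilon_lower <= 0
    p1 = 1 - std_normal.cdf(z1)   # P(Z >= z1)
    p2 = 1 - std_normal.cdf(z2)   # P(Z >= z2)
    return z1, z2, p1, p2
```

    For instance, tost_asymmetric(0.5, -2.2815516, 1.7815516) gives z1 = 1.2815516 and z2 = 2.7815516, so the upper-side test is the harder one to reject.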
 
    NOTE: the appropriate level of alpha is precisely the same as in the corresponding two-sided test for stochastic dominance, so that, for example, if one wishes to make a type I error 5% of the time, one
    simply conducts both of the one-sided tests of Ho1 and Ho2 and compares each resulting p-value to 0.05 (Wellek 2010).
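    The decision rule in this note amounts to rejecting the negativist Ho only when both one-sided tests reject at alpha itself (not alpha/2), which is the same as comparing max(p1, p2) with alpha.
    A minimal Python sketch; the name tost_rejects is hypothetical, and treating p = alpha as a rejection is an assumption of the sketch.

```python
def tost_rejects(p1, p2, alpha=0.05):
    """Reject the negativist Ho only when BOTH one-sided tests reject.

    Each p-value is compared with alpha itself (not alpha/2); equivalently,
    the overall two one-sided tests p-value is max(p1, p2).
    Assumption: p == alpha counts as a rejection.
    """
    return max(p1, p2) <= alpha
```

    For example, p-values of 0.03 and 0.04 lead to rejection at alpha = 0.05 even though both exceed 0.025.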

    tostranksum is for use with unpaired/unmatched data.  For equivalence tests on paired/matched data, see tostsignrank.


Options for tostranksum

        +------+
    ----+ Main +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    by(groupvar) is required.  It specifies the name of the grouping variable.

    eqvtype(string) defines whether the equivalence interval will be defined in terms of Delta or epsilon (delta or epsilon).  These options change the way that eqvlevel is interpreted: when delta is
        specified, eqvlevel is measured in the units of the rank sums, and when epsilon is specified, eqvlevel is measured in multiples of the standard deviation of the Z distribution; put another way,
        epsilon = Delta/standard error.  The default is epsilon.

        Defining tolerance in terms of epsilon means that it is not possible to reject any test for stochastic equivalence Ho if epsilon <= the critical value of z for a given alpha.  Because
        epsilon = Delta/standard error, it is likewise not possible to reject any Ho if Delta <= the product of the standard error and the critical value of z for a given alpha.  tostranksum reports when
        either of these conditions obtains.  Given that the variance of rank sum distributions can be very large, tolerance should be specified using delta only with great care.
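    The condition described above can be checked before running the test.  A minimal Python sketch, assuming the one-sided critical value z_(1-alpha) from the standard normal; the function name
    cannot_reject is hypothetical and is not the check tostranksum itself performs.

```python
from statistics import NormalDist


def cannot_reject(alpha, epsilon=None, delta=None, se=None):
    """True if the tolerance is too small for the equivalence Ho to be rejected."""
    z_crit = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value of z
    if epsilon is None:
        if delta is None or se is None:
            raise ValueError("supply epsilon, or both delta and se")
        epsilon = delta / se                  # epsilon = Delta / standard error
    # Both one-sided tests reject only when epsilon - |Z| >= z_crit, so epsilon
    # must exceed z_crit for the rejection region to have positive width.
    return epsilon <= z_crit
```

    At alpha = 0.05 the critical value is about 1.645, so epsilon = 1.5 (or Delta = 8 with a standard error of 5) can never lead to rejection.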

    eqvlevel(#) defines the equivalence threshold for the tests depending on whether eqvtype is delta or epsilon (see above).  Researchers are responsible for choosing meaningful values of Delta or epsilon.
        The default value is 1 (certain to be meaningless) when delta is the eqvtype and 2 when epsilon is the eqvtype.

    uppereqvlevel(#) defines the upper equivalence threshold for the test, and transforms the meaning of eqvlevel to mean the lower equivalence threshold for the test.  Also, eqvlevel is assumed to be a
        negative value (i.e., the lower threshold is -|eqvlevel|).  Taken together, these correspond to Schuirmann's (1987) asymmetric equivalence intervals.  If uppereqvlevel==|eqvlevel|, then
        uppereqvlevel is ignored and a symmetric interval is used.

    ccontinuity specifies that the test statistics incorporate a continuity correction using |W-E(W)|-0.5, but retaining the sign of the z-statistic after the correction has been applied (see eqvtype above).
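    A minimal Python sketch of the correction described above: the magnitude of W - E(W) is reduced by 0.5 and the original sign is kept before dividing by the standard error.  The function name
    cc_z is hypothetical, and flooring the corrected magnitude at 0 when |W - E(W)| < 0.5 is an assumption; this help text does not say how that case is handled.

```python
from math import copysign


def cc_z(w, ew, se):
    """Continuity-corrected z: shrink |W - E(W)| by 0.5, keep the sign of W - E(W)."""
    d = w - ew
    # Assumption: the corrected magnitude is floored at 0 when |d| < 0.5.
    magnitude = max(abs(d) - 0.5, 0.0)
    return copysign(magnitude, d) / se
```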

    alpha(#) specifies the nominal type I error rate.  The default is alpha(0.05).

    relevance reports results and inference for combined tests for stochastic difference and stochastic equivalence for a specific alpha, eqvtype, and eqvlevel.  See the end of the Discussion section in tost
        for more details on inference from combined tests.


Remarks

    Following Tryon and Lewis (2008), when rejection decisions from tests for stochastic dominance and tests for stochastic equivalence are combined, there are four possible
    interpretations for a given alpha and epsilon or Delta:

    1.  One may reject the positivist Ho, but fail to reject the negativist Ho, and conclude that there is relevant 0th-order stochastic dominance between the first and second groups which is at least as
        large as epsilon or Delta.

    2.  One may fail to reject the positivist Ho, but reject the negativist Ho, and conclude that there is 0th-order stochastic equivalence between the first and second groups within the equivalence range
        (i.e. defined by epsilon or Delta).

    3.  One may reject both the positivist Ho and the negativist Ho, and conclude that there is trivial 0th-order stochastic dominance between the first and second groups which lies within the equivalence
        range (i.e. defined by epsilon or Delta).

    4.  One may fail to reject both the positivist Ho and the negativist Ho, and draw an indeterminate conclusion, because the data are underpowered to detect either 0th-order stochastic dominance or
        equivalence.
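    The four interpretations can be written as a lookup on the two rejection decisions.  A minimal Python sketch, assuming the positivist test is the usual two-sided z test of Ho: P(X > Y) = 0.5; the
    function names and the returned labels are hypothetical and are not the strings stored in r(relevance).

```python
from statistics import NormalDist


def positivist_rejects(z, alpha):
    """Two-sided test of Ho: P(X > Y) = 0.5 using the rank-sum z statistic."""
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return p <= alpha


def relevance_conclusion(reject_positivist, reject_negativist):
    """Map the two rejection decisions to the four interpretations."""
    if reject_positivist and not reject_negativist:
        return "relevant stochastic dominance"      # case 1
    if not reject_positivist and reject_negativist:
        return "stochastic equivalence"             # case 2
    if reject_positivist and reject_negativist:
        return "trivial stochastic dominance"       # case 3
    return "indeterminate"                          # case 4
```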


Examples

    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Example 1 setup
    . webuse fuel2

    1a Perform two-sample rank-sum relevance test on mpg by using the two
    groups defined by treat; equivalence interval is +/- 1 sd beyond the
    critical value of Z for alpha = 0.1.
    epsilon = invnormal(.9)+1 = 2.2815516
    . tostranksum mpg, by(treat) eqvt(epsilon) eqvl(2.2815516) alpha(.1) rel


    1b Perform asymmetric rank-sum relevance test on mpg by using the
    two groups defined by treat, and add a continuity correction.
    The lower end of the equivalence interval is -(invnormal(.9)+1) =
    -2.2815516, meaning equivalence must lie no more than 1 sd beyond the
    critical value of Z for alpha = 0.1.  The upper end of the equivalence
    interval is invnormal(.9)+.5 = 1.7815516, meaning equivalence must lie
    no more than 0.5 sd beyond the critical value of Z for alpha = 0.1.
    . tostranksum mpg, by(treat) eqvt(epsilon) eqvl(2.2815516) upper(1.7815516) cc alpha(.1) rel
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
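    The epsilon values used in Examples 1a and 1b can be checked outside Stata.  A minimal Python sketch, assuming statistics.NormalDist().inv_cdf plays the role of Stata's invnormal():

```python
from statistics import NormalDist

# One-sided critical value of Z for alpha = 0.1, i.e. invnormal(.9) in Stata.
z_crit = NormalDist().inv_cdf(0.9)

eps_1a = z_crit + 1          # 1 sd beyond the critical value (1a, and 1b lower side)
eps_1b_upper = z_crit + 0.5  # 0.5 sd beyond the critical value (1b upper side)

print(round(z_crit, 7), round(eps_1a, 7), round(eps_1b_upper, 7))
# prints 1.2815516 2.2815516 1.7815516
```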

Saved results

    tostranksum saves the following in r():

    Scalars   
      r(N_1)         sample size n_1
      r(N_2)         sample size n_2
      r(z1)          z statistic for Ho1 (upper)
      r(z2)          z statistic for Ho2 (lower)
      r(p1)          P(Z >= z1)
      r(p2)          P(Z >= z2)
      r(Var_a)       adjusted variance
      r(group1)      value of variable for first group
      r(sum_obs)     actual sum of ranks for first group
      r(sum_exp)     expected sum of ranks for first group
      r(Delta)       Delta, tolerance level defining the equivalence interval; OR
      r(Du)          Delta_upper, tolerance level defining the equivalence interval's upper side; AND
      r(Dl)          Delta_lower, tolerance level defining the equivalence interval's lower side; OR
      r(epsilon)     epsilon, tolerance level defining the equivalence interval
      r(eu)          epsilon_upper, tolerance level defining the equivalence interval's upper side; AND
      r(el)          epsilon_lower, tolerance level defining the equivalence interval's lower side
      r(relevance)   Relevance test conclusion for given alpha and Delta/epsilon


Author

    Alexis Dinno
    Portland State University
    alexis.dinno@pdx.edu

    Development of tost is ongoing; please contact me with any questions, bug reports, or suggestions for improvement.  Fixing bugs will be facilitated by sending along:

        (1) a copy of the data (de-labeled or anonymized is fine),
        (2) a copy of the command used, and
        (3) a copy of the exact output of the command.


Suggested citation

    Dinno, A.  2025. tostranksum: Two-sample rank sum test for stochastic equivalence.  In: tost Stata software package.  URL: https://www.alexisdinno.com/stata/tost.html


References

    Mann, H. B., and Whitney, D. R.  1947.  On a test of whether one of two random variables is stochastically larger than the other.  Annals of Mathematical Statistics 18: 50-60.

    Schuirmann, D. A.  1987.  A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability.  Journal of Pharmacokinetics and
        Biopharmaceutics 15: 657-680.

    Tryon, W. W., and Lewis, C.  2008.  An inferential confidence interval method of establishing statistical equivalence that corrects Tryon's (2001) reduction factor.  Psychological Methods 13: 272-277.

    Wellek, S.  2010.  Testing Statistical Hypotheses of Equivalence and Noninferiority, second edition.  Chapman and Hall/CRC Press.  p. 31.

    Wilcoxon, F.  1945.  Individual comparisons by ranking methods.  Biometrics Bulletin 1: 80-83.


Also See

      Help: tost, pkequiv, ranksum
