<TeXmacs|1.99.7>

<style|<tuple|article|std-latex>>

<\body>
  <\hide-preamble>
    <assign|sectemul|<macro|<number|<section-nr>|arabic>>>

    <assign|the-footnote|<macro|<number|<footnote-nr>|fnsymbol>>>

    <assign|be|<macro|>>

    <assign|ee|<macro|>>

    <assign|bea|<macro|>>

    <assign|eea|<macro|>>

    <assign|beaa|<macro|>>

    <assign|eeaa|<macro|>>

    <assign|disp|<macro|>>

    <assign|mbeta|<macro|<with|math-font-series|bold|mode|math|\<beta\>>>>

    <assign|malpha|<macro|<with|math-font-series|bold|mode|math|\<alpha\>>>>

    <assign|mb|<macro|<with|math-font-series|bold|mode|math|b>>>

    <assign|mx|<macro|<with|math-font-series|bold|mode|math|x>>>

    <assign|bmu|<macro|<with|math-font-series|bold|mode|math|\<mu\>>>>

    <assign|bomega|<macro|<with|math-font-series|bold|mode|math|\<omega\>>>>

    <assign|ms|<macro|<with|math-font-series|bold|mode|math|s>>>

    <assign|mt|<macro|<with|math-font-series|bold|mode|math|t>>>

    <assign|mX|<macro|<math|X>>>

    <assign|mxi|<macro|<with|math-font-series|bold|mode|math|\<xi\>>>>

    <assign|mw|<macro|<with|math-font-series|bold|mode|math|w>>>

    <assign|msx|<macro|<with|font-size|0.71|math-font-series|bold|mode|math|X>>>

    <assign|mmu|<macro|<with|math-font-series|bold|mode|math|u>>>

    <assign|my|<macro|<with|math-font-series|bold|mode|math|y>>>

    <assign|me|<macro|<with|math-font-series|bold|mode|math|\<varepsilon\>>>>

    <assign|md|<macro|<with|math-font-series|bold|mode|math|d>>>

    <assign|mdelta|<macro|<with|math-font-series|bold|mode|math|\<delta\>>>>

    <assign|mtheta|<macro|<with|math-font-series|bold|mode|math|\<theta\>>>>

    <assign|ba|<macro|<with|math-font-series|bold|mode|math|a>>>

    <assign|mz|<macro|<with|math-font-series|bold|mode|math|z>>>

    <assign|mee|<macro|<with|math-font-series|bold|mode|math|\<epsilon\>>>>

    <assign|ep|<macro|<math|\<varepsilon\>>>>

    <assign|mzero|<macro|<with|math-font-series|bold|mode|math|0>>>

    <assign|mbz|<macro|<with|math-font-series|bold|mode|math|Z>>>

    <assign|ma|<macro|\<cal-A\>>>

    <assign|mT|<macro|\<cal-T\>>>

    <assign|mI|<macro|\<cal-I\>>>

    <assign|mmm|<macro|\<cal-M\>>>

    <assign|mXa|<macro|<math|X<rsub|<ma>>>>>

    <assign|mXac|<macro|<math|X<rsub|<ma><rsup|c>>>>>

    <assign|mB|<macro|\<cal-B\>>>

    <assign|mXB|<macro|<math|X<rsub|<mB>>>>>

    <assign|mXBc|<macro|<math|X<rsub|<mB><rsup|c>>>>>

    <assign|cW|<macro|\<cal-W\>>>

    <assign|cC|<macro|\<cal-C\>>>

    <assign|cJ|<macro|\<cal-J\>>>

    <assign|cM|<macro|\<cal-M\>>>

    <assign|cN|<macro|\<cal-N\>>>

    <assign|cO|<macro|\<cal-O\>>>

    <assign|cF|<macro|\<cal-F\>>>

    <assign|cS|<macro|\<cal-S\>>>

    <assign|cD|<macro|\<cal-D\>>>

    <assign|cG|<macro|\<cal-G\>>>

    <assign|cU|<macro|\<cal-U\>>>

    <assign|SCAD|<macro|scad>>

    <assign|SSCAD|<macro|scad>>

    <assign|MCP|<macro|mcp>>

    <assign|mstackit|<macro|1|2| <move|<math|<above|<arg|2>|<with|font-size|0.84|<arg|1>>>>|0pt|-.2ex>
    >>

    <assign|sgn|<macro|sgn>>

    <assign|symbolfootnote|<macro|1|2|<assign|the-footnote|<macro|<number|<footnote-nr>|fnsymbol>>><footnote|[><arg|1>]<arg|2>>>

    <assign|no|<macro|<no-indent>>>
  </hide-preamble>


  <assign|the-theorem|<macro|<number|<section-nr>|arabic>.<number|<theorem-nr>|arabic>>><assign|the-remark|<macro|<number|<remark-nr>|arabic>>>

  <\center>
    <no-indent><with|font-size|1.41|A Novel Approach for Fast Detection of
    Multiple Change Points in Linear Models>

    <no-indent>Xiaoping Shi<rsup|<math|<text|<with|font-size|0.84|a>>>>,
    Yuehua Wu<rsup|<math|<text|<with|font-size|0.84|a>>>> and Baisuo
    Jin<rsup|<math|<text|<with|font-size|0.84|b>>>>

    <vspace|2fn><no-indent><rsup|<math|a>>Department of Mathematics and
    Statistics, York University, Toronto, Ontario, Canada;
    <rsup|<math|b>>Department of Statistics and Finance, University of
    Science and Technology of China, Hefei, Anhui, China
  </center>

  <vspace|2fn>

  <no-indent><with|font-series|bold|Abstract> Change point problems arise
  in many statistical applications. If a model contains change points,
  an analysis that ignores them can be harmful, and the results derived from
  such an analysis may be misleading. There is a rich literature on change
  point detection. Although many methods have been proposed for detecting
  multiple change points, applying them to a large sample is often
  computationally infeasible. In this article, a connection between
  multiple change point detection and variable selection is established
  through a proper segmentation of the data sequence, and a novel approach is
  proposed to tackle the multiple change point detection problem via
  two key steps: (1) apply recent advances in consistent
  variable selection methods such as SCAD, adaptive LASSO and MCP to detect
  change points; (2) employ a refinement procedure to improve the accuracy of
  change point estimation. Five algorithms are hence proposed, which
  detect change points in much less time and with greater accuracy than
  those in the literature. In addition, an optimal segmentation algorithm
  based on the residual sum of squares is given. Our simulation study shows
  that the proposed algorithms are computationally efficient with improved
  change point estimation accuracy. The new approach is readily generalized to
  detect multiple change points in other models such as generalized linear
  models and nonparametric models.

  <no-indent>KEY WORDS: Adaptive LASSO; Asymptotic normality; Least squares;
  Linear model; MCP; Multiple change point detection algorithm; SCAD;
  Variable selection.

  <reset-counter|equation><vspace|.3in><no-indent><with|font-series|bold|1.
  Introduction>

  <no-indent>The linear model is among the most widely used statistical models
  in practice and has been extensively studied in the literature. It is
  simple and can be used to approximate a nonlinear function locally. However,
  a linear model may contain change points at which the regression
  parameters change. If such change points exist, the linear model is
  actually a segmented linear model.

  Change point problems arise in many areas of application,
  including medical and health sciences, life science, meteorology,
  engineering, financial econometrics and risk management. Detecting all
  change points is of great importance in statistical applications. If a
  change point exists, an analysis that ignores it can be harmful, and the
  results derived from such an analysis may be misleading. There is a rich
  literature on change point detection; see, e.g., Csörgő and Horváth (1997)
  and Chen and Gupta (2000).

  Compared with the detection of one change point, locating all change
  points is a very challenging problem. Although it has been studied in the
  literature (see Davis, Lee, and Rodriguez-Yam (2006), Pan and Chen (2006),
  Kim, Yu and Feuer (2009), and Loschi, Pontel and Cruz (2010), among
  others), a powerful and efficient method still needs to be explored. This
  paper is therefore mainly concerned with the multiple change point detection
  problem in linear regression.

  Consider a linear model with <math|K<rsub|0>\<leqslant\>K<rsub|U>\<less\>\<infty\>>
  multiple change points located at <math|a<rsup|<around|(|0|)>><rsub|1,n>,\<ldots\>,a<rsup|<around|(|0|)>><rsub|K<rsub|0>,n>>:

  <eqnarray|<tformat|<table|<row|<cell|y<rsub|i,n>>|<cell|=>|<cell|<big|sum><rsub|j=1><rsup|q>x<rsub|i,j,n>*\<beta\><rsub|j,0>+<big|sum><rsub|\<ell\>=1><rsup|K<rsub|0>><big|sum><rsub|j=1><rsup|q>x<rsub|i,j,n>*\<delta\><rsub|j,0><rsup|<around|(|\<ell\>|)>>*I*<around|(|a<rsup|<around|(|0|)>><rsub|\<ell\>,n>\<less\>i\<leqslant\>n|)>+\<varepsilon\><rsub|i,n>>>|<row|<cell|>|<cell|=>|<cell|<mx><rsub|i,n><rsup|T><around*|[|<mbeta><rsub|0>+<big|sum><rsub|\<ell\>=1><rsup|K<rsub|0>><mdelta><rsub|\<ell\>,0>I*<around|(|a<rsup|<around|(|0|)>><rsub|\<ell\>,n>\<less\>i\<leqslant\>n|)>|]>+\<varepsilon\><rsub|i,n>,<space|1em>i=1,\<ldots\>,n,<eq-number><label|cp>>>>>>

  where <math|<around|{|<mx><rsub|i,n>=<around|(|x<rsub|i,1,n>,\<ldots\>,x<rsub|i,q,n>|)><rsup|T>|}>>
  is a sequence of <math|q>-dimensional predictors,
  <math|<mbeta><rsub|0>=<around|(|\<beta\><rsub|1,0>,\<ldots\>,\<beta\><rsub|q,0>|)><rsup|T>>
  <math|\<neq\>> <math|<with|font-series|bold|0>> is an unknown
  <math|q>-dimensional vector of regression coefficients, <math|K<rsub|0>> is
  the unknown number of change points, <math|a<rsup|<around|(|0|)>><rsub|1,n>>,
  <math|\<ldots\>>, and <math|a<rsup|<around|(|0|)>><rsub|K<rsub|0>,n>> are
  the unknown change point locations (or change points),
  <math|<mdelta><rsub|\<ell\>,0>>, <math|1\<leqslant\>\<ell\>\<leqslant\>K<rsub|0>>,
  denote the unknown changes in the regression coefficient vector at the
  change points, and <math|\<varepsilon\><rsub|1,n>,\<ldots\>,\<varepsilon\><rsub|n,n>>
  are random errors. In this paper, we assume that <math|K<rsub|U>> is an
  upper bound of <math|K<rsub|0>>. Set <math|a<rsup|<around|(|0|)>><rsub|K<rsub|0>+1,n>=n>.
  If there is no change point, <math|K<rsub|0>=0> and the model
  (<reference|cp>) becomes

  <\equation*>
    y<rsub|i,n>=<big|sum><rsub|j=1><rsup|q>x<rsub|i,j,n>*\<beta\><rsub|j,0>+\<varepsilon\><rsub|i,n>,<space|1em>i=1,\<ldots\>,n.
  </equation*>

  Otherwise, <math|K<rsub|0>\<geqslant\>1>, and we assume that

  <\equation>
    0\<less\>a<rsup|<around|(|0|)>><rsub|\<ell\>,n>/n\<rightarrow\>\<tau\><rsub|\<ell\>>\<less\>1,<space|1em><text|for
    >1\<leqslant\>\<ell\>\<leqslant\>K<rsub|0>.<label|cdcp1>
  </equation>

  If <math|K<rsub|0>\<geqslant\>2>, we assume that

  <\equation>
    min<rsub|1\<leqslant\>\<ell\>\<leqslant\>K<rsub|0>-1><around|(|\<tau\><rsub|\<ell\>+1>-\<tau\><rsub|\<ell\>>|)>\<gtr\>0<label|cdcp2>
  </equation>

  where the value of this minimum is unknown. The problem studied in this paper is to estimate
  <math|K<rsub|0>>, <math|a<rsup|<around|(|0|)>><rsub|1,n>>,
  <math|\<ldots\>>, and <math|a<rsup|<around|(|0|)>><rsub|K<rsub|0>,n>> or in
  other words to detect multiple change points. If there is no confusion, the
  superscript \P(0)\Q, subscript \P0\Q, and subscript <math|n> will be
  suppressed.

  For detecting multiple change points, it may be convenient to consider the
  following linear model with probable multiple change points located at
  <math|1\<less\>a<rsub|1,n>\<less\>\<cdots\>\<less\>a<rsub|K,n>\<less\>n>

  <eqnarray|<tformat|<table|<row|<cell|y<rsub|i>>|<cell|=>|<cell|<mx><rsub|i><rsup|T><around*|[|<mbeta>+<big|sum><rsub|\<ell\>=1><rsup|K><mdelta><rsub|\<ell\>>I*<around|(|a<rsub|\<ell\>,n>\<less\>i\<leqslant\>n|)>|]>+\<varepsilon\><rsub|i>,<space|1em>i=1,\<ldots\>,n,<eq-number><label|cp1>>>>>>

  where <math|<mbeta>>, <math|<mdelta><rsub|1>>, <math|\<ldots\>>,
  <math|<mdelta><rsub|K>> are unknown <math|q>-dimensional parameter vectors.
  We can instead test the following null hypothesis:

  <eqnarray*|<tformat|<table|<row|<cell|H<rsub|0>:>|<cell|>|<cell|<text|There
  is no change point, i.e., for any >1\<less\>a<rsub|1,n>\<less\>\<cdots\>\<less\>a<rsub|K,n>\<less\>n,>>|<row|<cell|>|<cell|>|<cell|<mdelta><rsub|\<ell\>>=<around|(|\<delta\><rsub|1><rsup|<around|(|\<ell\>|)>>*<text|,
  <math|\<ldots\>>, >\<delta\><rsub|q><rsup|<around|(|\<ell\>|)>>|)><rsup|T>=<text|
  <math|<with|font-series|bold|0>> for any
  >\<ell\>\<in\><around|{|1,\<ldots\>,K|}>,<text| where
  >1\<leqslant\>K\<leqslant\>K<rsub|U>>>>>>

  versus the alternative hypothesis:

  <\eqnarray*>
    <\tformat>
      <\table|<row|<cell|H<rsub|1>:>|<cell|>|<cell|<text|There exist
      <math|1\<leqslant\>K\<leqslant\>K<rsub|U>> change points, i.e., there
      exist >1\<less\>a<rsub|1,n>\<less\>\<cdots\>\<less\>a<rsub|K,n>\<less\>n>>>
        <\row|<cell|>|<cell|>>
          <\cell>
            <text|such that ><mdelta><rsub|\<ell\>>=<around|(|\<delta\><rsub|1><rsup|<around|(|\<ell\>|)>>,\<ldots\>,\<delta\><rsub|q><rsup|<around|(|\<ell\>|)>>|)><rsup|T>\<neq\><mzero>

            <text|for any <math|\<ell\>\<in\><around|{|1,\<ldots\>,K|}>>.>
          </cell>
        </row>
      </table>
    </tformat>
  </eqnarray*>

  Many classical methods have been given in the literature for detecting
  change points, including the popular model selection based change point
  detection method and the well known cumulative sum (CUSUM) method. However,
  the amounts of computing time required by these two typical change point
  detection methods are respectively <math|O<around|(|2<rsup|n>|)>> and
  <math|O<around|(|n<rsup|2>|)>>. When <math|n> is very large, using these
  methods to find multiple change points is not feasible.

  If the set of all true change points in the model (<reference|cp1>) is a
  subset of <math|<around|{|a<rsub|\<ell\>,n>,1\<leqslant\>\<ell\>\<leqslant\>K|}>>,
  it is easy to see that <math|a<rsub|j,n>> is a change point if and only if
  <math|<mdelta><rsub|j>\<neq\><with|font-series|bold|0>>. We rewrite
  (<reference|cp1>) as follows:

  <eqnarray|<tformat|<table|<row|<cell|<my><rsub|n>=<mX><rsub|n><wide|<mbeta>|~>+<me><rsub|n>,<eq-number><label|reg0>>>>>>

  where <math|<my>=<around|(|y<rsub|1>,y<rsub|2>,\<cdots\>,y<rsub|n>|)><rsup|T>>,
  <math|<wide|<mbeta>|~>=<around|(|<mbeta><rsup|T>,<mdelta><rsub|1><rsup|T>,\<ldots\>,<mdelta><rsub|K><rsup|T>|)><rsup|T>>,
  <math|<me><rsub|n>=<around|(|\<varepsilon\><rsub|1>,\<varepsilon\><rsub|2>,\<ldots\>,\<varepsilon\><rsub|n>|)><rsup|T>>,
  and

  <eqnarray*|<tformat|<table|<row|<cell|<mX><rsub|n>>|<cell|=>|<cell|<around*|(|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|2|2|cell-halign|c>|<cwith|1|-1|3|3|cell-halign|c>|<cwith|1|-1|4|4|cell-halign|c>|<cwith|1|-1|5|5|cell-halign|c>|<cwith|1|-1|5|5|cell-rborder|0ln>|<table|<row|<cell|X<rsub|<around|(|0,1|)>>>|<cell|0<rsub|<around|(|0,1|)>>>|<cell|0<rsub|<around|(|0,1|)>>>|<cell|\<cdots\>>|<cell|0<rsub|<around|(|0,1|)>>>>|<row|<cell|X<rsub|<around|(|1,2|)>>>|<cell|X<rsub|<around|(|1,2|)>>>|<cell|0<rsub|<around|(|1,2|)>>>|<cell|\<cdots\>>|<cell|0<rsub|<around|(|1,2|)>>>>|<row|<cell|\<vdots\>>|<cell|\<vdots\>>|<cell|\<ldots\>>|<cell|\<ldots\>>|<cell|\<vdots\>>>|<row|<cell|X<rsub|<around|(|K,K+1|)>>>|<cell|X<rsub|<around|(|K,K+1|)>>>|<cell|X<rsub|<around|(|K,K+1|)>>>|<cell|\<cdots\>>|<cell|X<rsub|<around|(|K,K+1|)>>>>>>>|)><rsub|n\<times\><around|(|K+1|)>*q>>>>>>

  where <math|0<rsub|<around|(|j-1,j|)>>> is a zero matrix of dimension
  <math|<around|(|a<rsub|j,n>-a<rsub|j-1,n>|)>\<times\>q>,
  <math|a<rsub|0,n>=0>, and

  <eqnarray|<tformat|<table|<row|<cell|X<rsub|<around|(|j-1,j|)>>=<around*|(|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|2|2|cell-halign|c>|<cwith|1|-1|3|3|cell-halign|c>|<cwith|1|-1|3|3|cell-rborder|0ln>|<table|<row|<cell|x<rsub|a<rsub|j-1,n>+1,1>>|<cell|\<cdots\>>|<cell|x<rsub|a<rsub|j-1,n>+1,q>>>|<row|<cell|\<vdots\>>|<cell|\<cdots\>>|<cell|\<vdots\>>>|<row|<cell|x<rsub|a<rsub|j,n>,1>>|<cell|\<cdots\>>|<cell|x<rsub|a<rsub|j,n>,q>>>>>>|)><rsub|<around|(|a<rsub|j,n>-a<rsub|j-1,n>|)>\<times\>q><space|1em><text|for
  <math|j=1,\<ldots\>,K+1>.>>>>>>

  Thus detecting all the true change points and removing the pseudo change
  points in (<reference|cp1>) can be considered as a variable selection
  problem for the linear regression model (<reference|reg0>), and we may
  tackle the problem by employing variable selection methods. This leads us
  to explore the possibility of first properly segmenting the data sequence
  and then applying variable selection methods and/or other methods for
  detecting probable multiple change points.
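
  The block structure of the design matrix in (<reference|reg0>) can be
  sketched in a few lines of Python with NumPy. This is only an illustration
  under stated assumptions: the function name changepoint_design and its
  arguments are hypothetical and do not come from the paper, and candidate
  locations are taken to be the 1-based indices of model (<reference|cp1>).

```python
import numpy as np

def changepoint_design(X, candidates):
    """Build the n x (K+1)q design matrix of the rewritten model.

    X          : n x q matrix of predictors (row i-1 holds x_i).
    candidates : increasing 1-based candidate locations a_1, ..., a_K.

    Block column 0 is X itself (coefficient beta). Block column l equals X
    on rows i with i greater than a_l and zero elsewhere (coefficient delta_l).
    """
    X = np.asarray(X, dtype=float)
    blocks = [X]
    for a in candidates:
        block = X.copy()
        block[:a, :] = 0.0   # observations 1..a_l are unaffected by delta_l
        blocks.append(block)
    return np.hstack(blocks)
```

  With this layout, a candidate location is a true change point exactly when
  its block of q coefficients is nonzero, which is what turns change point
  detection into a variable selection problem.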

  The paper is arranged as follows. The segmentation of data sequence and
  multiple change point estimation are discussed in Section 2. Five
  algorithms for detecting probable multiple change points are proposed in
  Section 3. Simulation studies and practical recommendations are given in
  Section 4. Two real data examples are provided in Section 5.

  Throughout the rest of the paper, <math|<with|font-series|bold|1><rsub|q>=<around|(|1,\<ldots\>,1|)><rsup|T>>
  is the <math|q>-dimensional vector of ones, <math|I<rsub|q>> is the
  <math|q\<times\>q> identity matrix, an indicator function is written as
  <math|I<around|(|\<cdummy\>|)>>, the transpose of a matrix <math|A> is
  denoted by <math|A<rsup|T>>, and <math|<around|\<lfloor\>|c|\<rfloor\>>> is
  the integer part of a real number <math|c>. For a vector <math|<ba>>,
  <math|<ba><rsup|T>> is its transpose, <math|<ba><around|(|j|)>> is its
  <math|j>th component, <math|<around|\||<ba>|\|>>,
  <math|<around|\<\|\|\>|<ba>|\<\|\|\>>> and
  <math|<around|\<\|\|\>|<ba>|\<\|\|\>><rsub|\<infty\>>> are respectively its
  <math|L<rsub|1>>-norm, <math|L<rsub|2>>-norm (Euclidean norm) and
  <math|L<rsub|\<infty\>>>-norm. If <math|<ma>> is a set, its complement and
  its size are denoted by <math|<wide|<ma>|\<bar\>>> and
  <math|<around|\||<ma>|\|>>, respectively. In addition, the notations
  \P<math|\<rightarrow\><rsub|p>>\Q and \P<math|\<rightarrow\><rsub|d>>\Q
  denote convergence in probability and convergence in distribution,
  respectively. Furthermore, the <math|<around|(|1-\<alpha\>|)>>th quantile
  of the chi-square distribution with <math|\<ell\>> degrees of freedom is
  denoted by <math|\<chi\><rsup|2><rsub|\<alpha\>,\<ell\>>>.

  <vspace|.2in><no-indent><with|font-series|bold|2. Segmentation and Change
  Point Estimation>

  <no-indent>In a multiple change point detection problem, the
  change point locations are unknown, and in practice the main concern is
  often their approximate locations within a permissible range. This motivates
  partitioning the data sequence to search for change points. We thus divide
  the data sequence into <math|p<rsub|n>+1> segments. Let
  <math|m=m<rsub|n>=<around|\<lfloor\>|n/<around|(|p<rsub|n>+1|)>|\<rfloor\>>>.
  The segmentation is such that the first segment has length
  <math|n-p<rsub|n>*m>, which satisfies
  <math|0\<less\>m\<leqslant\>n-p<rsub|n>*m\<leqslant\>c<rsub|0>*m> for some
  <math|c<rsub|0>\<geqslant\>1>, and each of the remaining <math|p<rsub|n>>
  segments has length <math|m>. Without loss of generality, we assume that
  <math|p<rsub|n>\<to\>\<infty\>> as <math|n\<to\>\<infty\>>. The partition
  of the data sequence yields the following segmented regression model:

  <eqnarray|<tformat|<table|<row|<cell|y<rsub|i>>|<cell|=>|<cell|<mx><rsub|i><rsup|T><around*|[|<mbeta>+<big|sum><rsub|\<ell\>=1><rsup|p<rsub|n>><md><rsub|\<ell\>>I*<around*|(|n-<around|(|p<rsub|n>-\<ell\>+1|)>*m\<less\>i\<leqslant\>n|)>|\<nobracket\>>>>|<row|<cell|>|<cell|>|<cell|<around*|\<nobracket\>|+<big|sum><rsub|\<ell\>=1><rsup|p<rsub|n>><bomega><rsub|\<ell\>><around|(|i|)>*I*<around*|(|n-<around|(|p<rsub|n>-\<ell\>+1|)>*m\<less\>i\<leqslant\>n-<around|(|p<rsub|n>-\<ell\>|)>*m|)>|]>+\<varepsilon\><rsub|i>,<space|1em>i=1,\<ldots\>,n,<eq-number><label|newcp>>>>>>

  where the two sets <math|<around|{|<md><rsub|1>,\<ldots\>,<md><rsub|p<rsub|n>>|}>>
  and <math|<around|{|<mzero>,<mdelta><rsub|1>,\<ldots\>,<mdelta><rsub|K<rsub|0>>|}>>
  are equal, and <math|<around|{|<bomega><rsub|\<ell\>>|}>> are defined as
  follows: if there is a change point located in
  <math|<around|{|n-<around|(|p<rsub|n>-\<ell\>+1|)>*m+1,\<ldots\>,n-<around|(|p<rsub|n>-\<ell\>|)>*m-1|}>>,
  say <math|a<rsub|k,n>>, then

  <\equation*>
    <bomega><rsub|\<ell\>><around|(|i|)>=<around*|{|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|l>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|2|2|cell-halign|l>|<cwith|1|-1|2|2|cell-rborder|0ln>|<table|<row|<cell|-<mdelta><rsub|k>,>|<cell|n-<around|(|p<rsub|n>-\<ell\>+1|)>*m\<less\>i\<leqslant\>a<rsub|k,n>\<less\>n-<around|(|p<rsub|n>-\<ell\>|)>*m,>>|<row|<cell|<mzero>,>|<cell|<text|elsewhere>;>>>>>|\<nobracket\>>
  </equation*>

  otherwise,

  <\equation*>
    <bomega><rsub|\<ell\>><around|(|i|)>=<mzero>,<space|1em>i=1,\<ldots\>,n.
  </equation*>

  The model (<reference|newcp>) can be written as

  <eqnarray|<tformat|<table|<row|<cell|<my><rsub|n>=<wide|X|~><rsub|n><mtheta><rsub|n>+X<rsub|\<omega\>>*<big|sum><rsub|\<ell\>=1><rsup|p<rsub|n>><wide|<bomega>|\<vect\>><rsub|\<ell\>>+<me><rsub|n>,<eq-number><label|reg>>>>>>

  where <math|<my><rsub|n>> and <math|<me><rsub|n>> are defined in Section 1,
  <math|<mtheta><rsub|n>=<around|(|\<theta\><rsub|1>,\<ldots\>,\<theta\><rsub|q*<around|(|p<rsub|n>+1|)>>|)><rsup|T>=<around|(|<mbeta><rsup|T>,<md><rsub|1><rsup|T>,\<ldots\>,<md><rsub|p<rsub|n>><rsup|T>|)><rsup|T>>,
  <math|<md><rsub|r>=<around|(|d<rsub|r*1>,\<ldots\>,d<rsub|r*q>|)><rsup|T>>,
  <math|r=1,\<ldots\>,p<rsub|n>>,

  <eqnarray|<tformat|<table|<row|<cell|<wide|X|~><rsub|n>>|<cell|=>|<cell|<around*|(|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|2|2|cell-halign|c>|<cwith|1|-1|3|3|cell-halign|c>|<cwith|1|-1|4|4|cell-halign|c>|<cwith|1|-1|5|5|cell-halign|c>|<cwith|1|-1|5|5|cell-rborder|0ln>|<table|<row|<cell|X<rsub|<around|(|1|)>>>|<cell|0<rsub|m\<times\>q>>|<cell|0<rsub|m\<times\>q>>|<cell|\<cdots\>>|<cell|0<rsub|m\<times\>q>>>|<row|<cell|X<rsub|<around|(|2|)>>>|<cell|X<rsub|<around|(|2|)>>>|<cell|0<rsub|m\<times\>q>>|<cell|\<cdots\>>|<cell|0<rsub|m\<times\>q>>>|<row|<cell|\<vdots\>>|<cell|\<vdots\>>|<cell|\<ldots\>>|<cell|\<ldots\>>|<cell|\<vdots\>>>|<row|<cell|X<rsub|<around|(|p<rsub|n>+1|)>>>|<cell|X<rsub|<around|(|p<rsub|n>+1|)>>>|<cell|X<rsub|<around|(|p<rsub|n>+1|)>>>|<cell|\<cdots\>>|<cell|X<rsub|<around|(|p<rsub|n>+1|)>>>>>>>|)><rsub|n\<times\><around|(|p<rsub|n>+1|)>*q>=<around|(|X<rsub|n><rsup|<around|(|1|)>>,\<ldots\>,X<rsub|n><rsup|<around|(|p<rsub|n>+1|)>>|)><eq-number><label|xx>>>>>>

  with <math|X<rsub|n><rsup|<around|(|j|)>>=<around|(|<mzero><rsub|q\<times\>m>,\<ldots\>,<mzero><rsub|q\<times\>m>,X<rsub|<around|(|j|)>><rsup|T>,\<ldots\>,X<rsub|<around|(|p<rsub|n>+1|)>><rsup|T>|)><rsup|T>>,

  <eqnarray*|<tformat|<table|<row|<cell|<mX><rsub|<around|(|1|)>>>|<cell|=>|<cell|<around*|(|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|2|2|cell-halign|c>|<cwith|1|-1|3|3|cell-halign|c>|<cwith|1|-1|3|3|cell-rborder|0ln>|<table|<row|<cell|x<rsub|1,1>>|<cell|\<cdots\>>|<cell|x<rsub|1,q>>>|<row|<cell|\<vdots\>>|<cell|\<cdots\>>|<cell|\<vdots\>>>|<row|<cell|x<rsub|n-p<rsub|n>*m,1>>|<cell|\<cdots\>>|<cell|x<rsub|n-p<rsub|n>*m,q>>>>>>|)><rsub|<around|(|n-p<rsub|n>*m|)>\<times\>q>,>>>>>

  <eqnarray|<tformat|<table|<row|<cell|<mX><rsub|<around|(|j|)>>>|<cell|=>|<cell|<around*|(|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|2|2|cell-halign|c>|<cwith|1|-1|3|3|cell-halign|c>|<cwith|1|-1|3|3|cell-rborder|0ln>|<table|<row|<cell|x<rsub|n-<around|(|p<rsub|n>-j+2|)>*m+1,1>>|<cell|\<cdots\>>|<cell|x<rsub|n-<around|(|p<rsub|n>-j+2|)>*m+1,q>>>|<row|<cell|\<vdots\>>|<cell|\<cdots\>>|<cell|\<vdots\>>>|<row|<cell|x<rsub|n-<around|(|p<rsub|n>-j+1|)>*m,1>>|<cell|\<cdots\>>|<cell|x<rsub|n-<around|(|p<rsub|n>-j+1|)>*m,q>>>>>>|)><rsub|m\<times\>q>,<space|1em><text|for
  <math|j=2,\<ldots\>,p<rsub|n>+1>,>>>>>>

  <math|X<rsub|\<omega\>>=<text|diag><around|(|<mx><rsub|1><rsup|T>,\<ldots\>,<mx><rsub|n><rsup|T>|)>>,
  and <math|<wide|<bomega>|\<vect\>><rsub|\<ell\>>=<around|(|<bomega><rsub|\<ell\>><rsup|T><around|(|1|)>,\<ldots\>,<bomega><rsub|\<ell\>><rsup|T><around|(|n|)>|)><rsup|T>>.
  It is easy to see that <math|<mx><rsub|\<omega\>>\<equiv\>X<rsub|\<omega\>>*<big|sum><rsub|\<ell\>=1><rsup|p<rsub|n>><wide|<bomega>|\<vect\>><rsub|\<ell\>>>
  is an <math|n>-dimensional vector, and all but at most
  <math|K<rsub|0>*<around|(|m-1|)>> of its elements are zero. It is noted that
  in Harchaoui and Levy-Leduc (2008), the mean-shift model is considered and
  the length of each of their segments is only 1.
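
  The segmentation and the block layout of the design matrix in
  (<reference|xx>) can be sketched as follows. The helper names segment_bounds
  and segmented_design are hypothetical (they are not from the paper), and rows
  are indexed from 0 with half-open ranges, so that observation i of the text
  corresponds to row i-1.

```python
import numpy as np

def segment_bounds(n, p_n):
    """Return m and the half-open 0-based row ranges of the p_n + 1 segments."""
    m = n // (p_n + 1)
    if m == 0:
        raise ValueError("need at least one observation per segment")
    first_len = n - p_n * m               # the first segment absorbs the remainder
    bounds = [(0, first_len)]
    for j in range(p_n):
        start = first_len + j * m
        bounds.append((start, start + m))  # each later segment has length m
    return m, bounds

def segmented_design(X, p_n):
    """Build the n x (p_n + 1)q matrix whose block column j is zero above segment j."""
    X = np.asarray(X, dtype=float)
    _, bounds = segment_bounds(X.shape[0], p_n)
    blocks = [X]                           # block column 1 multiplies beta
    for start, _ in bounds[1:]:
        block = X.copy()
        block[:start, :] = 0.0             # d_r acts only from segment r + 1 onward
        blocks.append(block)
    return np.hstack(blocks)
```

  For example, with n = 23 and p_n = 4 this gives m = 4, a first segment of
  length 7, and four further segments of length 4.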

  Consider the special case in which each true change point is at the end of a
  segment. Then the end of a segment is a true change point if and only if the
  corresponding <math|<md><rsub|r>\<neq\><mzero>>.
  Thus locating all the true change points in (<reference|cp>) is equivalent
  to carrying out variable selection. Since <math|p<rsub|n>\<to\>\<infty\>>, we
  may take advantage of recent advances in consistent variable selection
  methods for a linear regression model such as (<reference|reg>) with a large
  number of regression coefficients, which include the SCAD (Fan and Li
  (2001)), the adaptive LASSO (Zou (2006)), and the MCP (Zhang (2010)), among
  others.

  Let us examine the relationship between the models (<reference|cp>) and
  (<reference|reg>). It can be seen that under the null hypothesis
  <math|H<rsub|0>>, <math|<mbeta>=<mbeta><rsub|0>>, and
  <math|<md><rsub|r>=<mzero>>,
  <math|r\<in\><around|{|1,\<ldots\>,p<rsub|n>|}>>. We now assume that
  <math|H<rsub|1>> holds. Thus, there exist <math|<around|{|r<rsub|k>,k=1,\<ldots\>,K<rsub|0>|}>>
  such that <math|a<rsub|k,n>\<in\><around|{|n-p<rsub|n>*m+<around|(|r<rsub|k>-1|)>*m,\<ldots\>,n-p<rsub|n>*m+r<rsub|k>*m-1|}>>.
  Since <math|K<rsub|0>> is finite with an upper bound <math|K<rsub|U>>, in
  view of (<reference|cdcp1>) and (<reference|cdcp2>), it follows that

  <\equation>
    <mbeta>=<mbeta><rsub|0>,<space|1em><md><rsub|r<rsub|k>-1>=<mzero>,<space|1em><md><rsub|r<rsub|k>>=<mdelta><rsub|k>\<neq\><mzero>,<space|1em><text|and
    ><md><rsub|r<rsub|k>+1>=<mzero><label|hh1>
  </equation>

  for large <math|n>. Thus, in order to detect all the change points
  <math|<around|{|a<rsub|1,n>,\<ldots\>,a<rsub|K<rsub|0>,n>|}>>, we may
  first estimate <math|<around|{|<md><rsub|i>|}>>.

  The following assumptions are made for investigating the asymptotic
  properties of the estimates of <math|<around|{|<md><rsub|i>|}>>:

  <no-indent><with|font-series|bold|Assumption
  C1.><space|1em><math|<big|sum><rsub|i=s><rsup|t><mx><rsub|i><mx><rsub|i><rsup|T>/<around|(|t-s|)>\<rightarrow\>W\<gtr\>0>
  as <math|t-s\<rightarrow\>\<infty\>>.

  It is noted that Assumption C1 is a common assumption made in change point
  analysis for a mean shift model. Under Assumption C1, it can be shown that
  <math|<mX><rsub|<around|(|1|)>><rsup|T><mX><rsub|<around|(|1|)>>/<around|(|n-p<rsub|n>*m|)>\<rightarrow\>W\<gtr\>0>,
  and <math|<mX><rsub|<around|(|i|)>><rsup|T><mX><rsub|<around|(|i|)>>/m\<rightarrow\>W\<gtr\>0>
  for <math|i\<in\><around|{|2,\<ldots\>,p<rsub|n>+1|}>>.

  <em|Remark 1.> <space|1em>Assumption C1 is similar to Condition (b) in Zou
  (2006). If we only consider the consistency of change point estimators,
  Assumption C1 can be relaxed to the following weaker one: For
  <math|b<rsub|1>,b<rsub|2>\<gtr\>0>, <math|b<rsub|1>*I<rsub|q>\<leqslant\><big|sum><rsub|i=s><rsup|t><mx><rsub|i><mx><rsub|i><rsup|T>/<around|(|t-s|)>\<leqslant\>b<rsub|2>*I<rsub|q>>
  when <math|t-s> is large enough.

  <no-indent><with|font-series|bold|Assumption
  C2.><space|1em><math|<around|{|\<varepsilon\><rsub|i>,<space|1em>i=1,2,\<ldots\>|}>>
  is a sequence of independently and identically distributed (i.i.d.) random
  variables with mean 0 and variance <math|\<sigma\><rsup|2>>.

  <em|Remark 2.> <space|1em>This assumption can be replaced by a weaker
  assumption of the strong mixing condition in (2.1) in Kuelbs and Philipp
  (1980), which adapts to the autoregressive models in Davis, Huang and Yao
  (1995) and Wang, Li and Tsai (2007). Let
  <math|<around|{|\<varepsilon\><rsub|i>,i=1,2,\<ldots\>|}>> be a weak sense
  stationary sequence of random variables with mean 0 and
  <math|<around|(|2+\<delta\>|)>>th moments for
  <math|0\<less\>\<delta\>\<leqslant\>1> that are uniformly bounded by some
  positive constant. Suppose that <math|<around|{|\<varepsilon\><rsub|i>,i=1,2,\<ldots\>|}>>
  satisfies the strong mixing condition <math|<around|\||P*<around|(|A*B|)>-P<around|(|A|)>*P<around|(|B|)>|\|>\<leqslant\>\<rho\><around|(|n|)>*\<downarrow\>*0>
  for all <math|n>, <math|s\<geqslant\>1>, all
  <math|A\<in\><mmm><rsub|1><rsup|s>> and
  <math|B\<in\><mmm><rsub|s+n><rsup|\<infty\>>>, where
  <math|<mmm><rsub|a><rsup|b>> is the <math|\<sigma\>>-field generated by the
  random vectors <math|\<varepsilon\><rsub|a>,\<varepsilon\><rsub|a+1>,\<cdots\>,\<varepsilon\><rsub|b>>,
  and <math|\<rho\><around|(|n|)>\<less\>\<less\>n<rsup|-<around|(|1+t|)>*<around|(|1+2/\<delta\>|)>>>
  for some <math|t\<gtr\>0>. Then Theorem 4 and Lemma 3.4 in Kuelbs and
  Philipp (1980) warrant the same results as given in Theorems 1-3 below.

  For simplicity of presentation, we assume throughout this paper that each of
  <math|<around|{|<mX><rsub|<around|(|r|)>>|}>> is of full rank. If some
  <math|<mX><rsub|<around|(|r|)>>> is not of full rank, the
  Moore-Penrose inverse can be used instead of the matrix inverse.

  <vspace|.2in><no-indent><with|font-family|ss|2.1. Estimate
  <math|<around|{|<md><rsub|i>|}>> by least squares>

  <no-indent>By the least squares method, we estimate
  <math|<md><rsub|r>>, <math|r=1,\<ldots\>,p<rsub|n>>,
  as follows:

  <\equation>
    <wide|<md>|^><rsub|r>=<around*|(|<mX><rsup|T><rsub|<around|(|r+1|)>><mX><rsub|<around|(|r+1|)>>|)><rsup|-1><mX><rsup|T><rsub|<around|(|r+1|)>><my><rsup|<around|(|r+1|)>>-<around*|(|<mX><rsup|T><rsub|<around|(|r|)>><mX><rsub|<around|(|r|)>>|)><rsup|-1><mX><rsup|T><rsub|<around|(|r|)>><my><rsup|<around|(|r|)>>,<space|1em>r=1,\<ldots\>,p<rsub|n>,<label|ddd>
  </equation>

  where <math|<my><rsup|<around|(|1|)>>=<around|(|y<rsub|1>,\<ldots\>,y<rsub|n-p<rsub|n>*m>|)><rsup|T>>,
  and <math|<my><rsup|<around|(|r|)>>=<around|(|y<rsub|n-<around|(|p<rsub|n>-r+2|)>*m+1>,\<ldots\>,y<rsub|n-<around|(|p<rsub|n>-r+1|)>*m>|)><rsup|T>>,
  <math|r=2>, <math|\<ldots\>>, <math|p<rsub|n>+1>. It is easy to see that

  <\equation*>
    <wide|<md>|^><rsub|r>+<wide|<md>|^><rsub|r+1>=<around*|(|<mX><rsup|T><rsub|<around|(|r+2|)>><mX><rsub|<around|(|r+2|)>>|)><rsup|-1><mX><rsup|T><rsub|<around|(|r+2|)>><my><rsup|<around|(|r+2|)>>-<around*|(|<mX><rsup|T><rsub|<around|(|r|)>><mX><rsub|<around|(|r|)>>|)><rsup|-1><mX><rsup|T><rsub|<around|(|r|)>><my><rsup|<around|(|r|)>>.
  </equation*>
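
  The estimator above is the difference between the ordinary least squares
  fits on two consecutive segments, which the following sketch makes concrete.
  The function name segment_ols_differences is hypothetical, and
  numpy.linalg.lstsq is used as a stand-in for the explicit inverse; for a
  rank-deficient segment it returns the minimum-norm solution, which
  corresponds to using the Moore-Penrose inverse.

```python
import numpy as np

def segment_ols_differences(X, y, p_n):
    """Return a (p_n, q) array whose row r-1 is the estimate of d_r."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    m = n // (p_n + 1)
    first_len = n - p_n * m
    # Segment edges as 0-based offsets: the first segment, then p_n segments of length m.
    edges = [0, first_len] + [first_len + j * m for j in range(1, p_n + 1)]
    fits = []
    for start, stop in zip(edges[:-1], edges[1:]):
        coef, *_ = np.linalg.lstsq(X[start:stop], y[start:stop], rcond=None)
        fits.append(coef)                  # OLS fit on this segment alone
    fits = np.array(fits)                  # shape (p_n + 1, q)
    return fits[1:] - fits[:-1]            # fit on segment r + 1 minus fit on segment r
```

  Because consecutive differences telescope, adding rows r-1 and r of the
  returned array reproduces the displayed identity for the sum of two
  successive estimates.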

  It is obvious that under <math|H<rsub|0>>, for any
  <math|\<ell\>\<in\><around|{|1,\<ldots\>,p<rsub|n>|}>> and any
  <math|i\<in\><around|{|n-p<rsub|n>*m+1,\<ldots\>,n|}>>,

  <\equation*>
    <bomega><rsub|\<ell\>><around|(|i|)>=<mzero><nbsp><nbsp><text|and><nbsp><nbsp><md><rsub|\<ell\>>=<mzero>.
  </equation*>

  We have the following theorem.

  <em|Theorem 1.> Assume that <math|m\<to\>\<infty\>> as
  <math|n\<to\>\<infty\>>. If <math|H<rsub|0>> holds, then under Assumptions
  C1-C2, it follows that

  <\equation*>
    <sqrt|m>*<wide|<md>|^><rsub|i>\<rightarrow\><rsub|d>N*<around*|(|<mzero>,2*\<sigma\><rsup|2>*W<rsup|-1>|)>,<space|1em>i=1,\<ldots\>,p<rsub|n><text|.>
  </equation*>

  We now assume that <math|H<rsub|1>> holds. In view of (<reference|hh1>), it
  follows that <math|<md><rsub|r<rsub|k>>+<md><rsub|r<rsub|k>+1>=<mdelta><rsub|k>>.
  By the definition of <math|<around|{|<bomega><rsub|\<ell\>><around|(|i|)>|}>>,
  we have

  <eqnarray|<tformat|<table|<row|<cell|>|<cell|>|<cell|<big|sum><rsub|\<ell\>=1><rsup|p<rsub|n>><bomega><rsub|\<ell\>><around|(|i|)>*I*<around|(|n-<around|(|p<rsub|n>-\<ell\>+1|)>*m\<less\>i\<leqslant\>n-<around|(|p<rsub|n>-\<ell\>|)>*m|)>>>|<row|<cell|>|<cell|=>|<cell|<around*|{|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|l>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|2|2|cell-halign|l>|<cwith|1|-1|2|2|cell-rborder|0ln>|<table|<row|<cell|-<mdelta><rsub|k>,>|<cell|<space|.2in><text|if
  <math|\<exists\><space|0.22em>r<rsub|k>> such that
  >n-<around|(|p<rsub|n>-r<rsub|k>+1|)>*m\<less\>a<rsub|k,n>\<less\>n-<around|(|p<rsub|n>-r<rsub|k>|)>*m,>>|<row|<cell|>|<cell|>>|<row|<cell|<mzero>,>|<cell|<space|.2in><text|otherwise.>>>>>>|\<nobracket\>><eq-number><label|omega>>>>>>

  It can also be verified that

  <eqnarray|<tformat|<table|<row|<cell|>|<cell|>|<cell|<big|sum><rsub|\<ell\>=1><rsup|p<rsub|n>><md><rsub|\<ell\>>*I*<around|(|n-<around|(|p<rsub|n>-\<ell\>+1|)>*m\<less\>i\<leqslant\>n|)>>>|<row|<cell|>|<cell|=>|<cell|<around*|{|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|l>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|2|2|cell-halign|l>|<cwith|1|-1|2|2|cell-rborder|0ln>|<table|<row|<cell|<big|sum><rsub|\<ell\>=1><rsup|r<rsub|k>-1><md><rsub|\<ell\>>,>|<cell|<space|.2in><text|if >n-<around|(|p<rsub|n>-r<rsub|k>+2|)>*m\<less\>i\<leqslant\>n-<around|(|p<rsub|n>-r<rsub|k>+1|)>*m,>>|<row|<cell|<big|sum><rsub|\<ell\>=1><rsup|r<rsub|k>+1><md><rsub|\<ell\>>,>|<cell|<space|.2in><text|if >n-<around|(|p<rsub|n>-r<rsub|k>|)>*m\<less\>i\<leqslant\>n-<around|(|p<rsub|n>-r<rsub|k>-1|)>*m.>>>>>|\<nobracket\>><eq-number><label|sumd>>>>>>

  Thus, we have the following theorem:

  <em|Theorem 2.> If Assumptions C1-C2 hold, then under <math|H<rsub|1>>,

  <\equation*>
    <sqrt|m>*<around*|(|<wide|<md>|^><rsub|r<rsub|k>>+<wide|<md>|^><rsub|r<rsub|k>+1>-<mdelta><rsub|k>|)>\<rightarrow\><rsub|d>N*<around*|(|<mzero>,2*\<sigma\><rsup|2>*W<rsup|-1>|)>,<space|1em>k=1,\<ldots\>,K<rsub|0>.
  </equation*>

  The proofs of Theorems 1-2 follow from the least squares theory. The
  details are omitted.
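
The quantity studied in Theorems 1-2 is a difference of two segment-wise least squares fits. A minimal Python sketch of that computation follows. It assumes the segments are stored in 0-based lists while `r` is 1-based as in the text; the function names `segment_ols` and `adjacent_jump_sum` are hypothetical and not part of the paper.

```python
import numpy as np

def segment_ols(X, y):
    """Least squares estimate (X^T X)^{-1} X^T y for a single segment."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def adjacent_jump_sum(X_segs, y_segs, r):
    """Return the OLS fit on segment r+2 minus the OLS fit on segment r.

    X_segs[k] and y_segs[k] hold segment k+1 (0-based storage, 1-based r),
    so segment r+2 is at index r+1 and segment r is at index r-1.
    """
    later = segment_ols(X_segs[r + 1], y_segs[r + 1])
    earlier = segment_ols(X_segs[r - 1], y_segs[r - 1])
    return later - earlier
```

With noise-free data whose coefficient vector jumps by a fixed amount between segment `r` and segment `r+2`, the function returns that jump up to rounding error.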

  <vspace|.2in><no-indent><with|font-family|ss|2.2. Estimate
  <math|<around|{|<md><rsub|i>|}>> by recent advances in consistent
  variable selection methods>

  <vspace|.2in><no-indent><em|2.2.1. Estimate
  <math|<around|{|<md><rsub|i>|}>> by the adaptive LASSO>

  <no-indent>The adaptive LASSO, extending the LASSO in Tibshirani (1996),
  was proposed in Zou (2006) and possesses oracle properties for a fixed
  number of regression coefficients.

  In light of Zou (2006), the adaptive LASSO type estimator of
  <math|<mtheta><rsub|n>> for the model (<reference|reg>) is defined by

  <eqnarray|<tformat|<table|<row|<cell|<wide|<mtheta>|\<breve\>><rsub|n>=arg
  min<rsub|<mtheta><rsub|n>><around*|{|<around*|\||<around*|\||<my>-<mX><rsub|n><mtheta><rsub|n>|\|>|\|><rsup|2>+\<lambda\><rsub|n><big|sum><rsub|r=1><rsup|p<rsub|n>><frac|1|<around|\||<wide|<md>|~><rsub|r>|\|><rsup|\<nu\>>><around*|\||<md><rsub|r>|\|>|}>,<eq-number><label|gl>>>>>>

  where <math|\<nu\>\<gtr\>0>, <math|\<lambda\><rsub|n>> is a thresholding
  parameter and <math|<wide|<md>|~><rsub|r>> <math|<around|{|r=1,\<cdots\>,p<rsub|n>|}>>
  are initial estimators satisfying certain conditions.
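
To make the weighted penalty concrete, here is a minimal coordinate-descent sketch in Python for a criterion of the form ||y - X theta||^2 + lambda * sum_j w_j |theta_j|. It is an illustration under simplifying assumptions, not the procedure used in the paper: each coefficient is penalized separately (the scalar case q = 1), the weights `w_j` stand in for the inverse powers of the initial estimates, and a weight of zero leaves a coefficient unpenalized. The function names are hypothetical.

```python
import numpy as np

def soft(a, t):
    """Soft-thresholding operator S(a, t) = sign(a) * max(|a| - t, 0)."""
    return np.sign(a) * max(abs(a) - t, 0.0)

def adaptive_lasso_cd(X, y, lam, weights, n_sweeps=200, tol=1e-10):
    """Minimize ||y - X theta||^2 + lam * sum_j weights[j] * |theta_j|
    by cyclic coordinate descent (componentwise weights, assumed setup)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    p = X.shape[1]
    theta = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)      # ||x_j||^2 for each column
    resid = y.copy()                     # current residual y - X theta
    for _ in range(n_sweeps):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # x_j^T (partial residual that excludes coordinate j)
            rho = X[:, j] @ resid + col_sq[j] * theta[j]
            new = soft(rho, 0.5 * lam * w[j]) / col_sq[j]
            step = new - theta[j]
            if step != 0.0:
                resid -= X[:, j] * step
                theta[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return theta
```

For an orthonormal design the update reduces to soft-thresholding each coordinate at `lam * w_j / 2`, which gives a quick sanity check of the implementation.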

  <em|Remark 3.> The adaptive LASSO estimate of <math|<mtheta><rsub|n>> may
  also be defined by

  <eqnarray|<tformat|<table|<row|<cell|<wide|<mtheta>|\<check\>><rsub|n>=arg
  min<rsub|<mtheta><rsub|n>><around*|\||<around*|\||<my>-<mX><rsub|n><mtheta><rsub|n>|\|>|\|><rsup|2>+\<lambda\><rsub|n>*<big|sum><rsub|r=1><rsup|p<rsub|n>><big|sum><rsub|i=1><rsup|q><frac|1|<around|\||<wide|d|~><rsub|r*i>|\|><rsup|\<nu\>>><around*|\||d<rsub|r*i>|\|>+\<gamma\><rsub|n>*<big|sum><rsub|i=1><rsup|q><frac|1|<around|\||<wide|\<beta\>|~><rsub|0*i>|\|><rsup|\<nu\>>><around*|\||\<beta\><rsub|0*i>|\|>,<eq-number><label|al>>>>>>

  where <math|\<nu\>\<gtr\>0>, and <math|\<lambda\><rsub|n>> and
  <math|\<gamma\><rsub|n>> are thresholding parameters satisfying certain
  conditions. The difference between (<reference|gl>) and (<reference|al>) is
  that (<reference|al>) performs variable selection in addition to multiple
  change point detection. Due to the similarity in
  the techniques for finding the asymptotic behavior of both
  <math|<wide|<mtheta>|\<breve\>><rsub|n>> and
  <math|<wide|<mtheta>|\<check\>><rsub|n>>, we only consider
  <math|<wide|<mtheta>|\<breve\>><rsub|n>> in this paper for simplicity of
  presentation.

  Since the dimension of <math|<mtheta><rsub|n>> increases with <math|n> in
  (<reference|reg>), the asymptotic results in Zou (2006) are not applicable
  here. In the following we investigate the limiting behavior of those
  <math|<md><rsub|i>>'s associated with change points
  under the condition that <math|K<rsub|0>\<geqslant\>1>, i.e., there exists
  at least one change point in the model (<reference|cp>). As stated before,
  the subscript <math|n> may be suppressed for convenience when there is no
  risk of confusion.

  Before we proceed, we introduce some notation. Let
  <math|<mB>=<around|{|\<kappa\><rsub|1>,\<kappa\><rsub|2>,\<ldots\>,\<kappa\><rsub|\<iota\>>|}>\<subset\><around|{|2,\<ldots\>,p<rsub|n>+1|}>>
  such that <math|\<kappa\><rsub|1>\<less\>\<ldots\>\<less\>\<kappa\><rsub|\<iota\>>>.
  Denote <math|<mtheta><rsub|<mB>>=<around|(|<md><rsup|T><rsub|\<kappa\><rsub|1>>,\<cdots\>,<md><rsup|T><rsub|\<kappa\><rsub|\<iota\>>>|)><rsup|T>>,
  <math|<mXB>=<around|(|X<rsub|n><rsup|<around|(|\<kappa\><rsub|1>|)>>,\<ldots\>,X<rsub|n><rsup|<around|(|\<kappa\><rsub|\<iota\>>|)>>|)>>,
  where <math|<around|{|X<rsub|n><rsup|<around|(|i|)>>|}>> are given in
  (<reference|xx>).

  Recall that for each <math|<mdelta><rsub|k>> in (<reference|cp>), there
  exists <math|r<rsub|k>> such that <math|<md><rsub|r<rsub|k>>=<mdelta><rsub|k>>,
  or equivalently there exists a change point within
  <math|<around|{|n-<around|(|p<rsub|n>-r<rsub|k>+1|)>*m,\<ldots\>,n-<around|(|p<rsub|n>-r<rsub|k>|)>*m-1|}>>
  for <math|k=1,\<ldots\>,K<rsub|0>>. Define

  <eqnarray|<tformat|<table|<row|<cell|<ma><rsub|c>>|<cell|=>|<cell|<around|{|i:<md><rsub|i-1>=<mzero>,<space|0.27em><md><rsub|i>\<neq\><mzero>,<space|0.27em><md><rsub|i+1>=<mzero>|}>,>>|<row|<cell|<ma><rsub|1>>|<cell|=>|<cell|<around|{|i:<md><rsub|i-1>\<neq\><mzero>,<space|0.27em><md><rsub|i>=<mzero>,<space|0.27em><md><rsub|i+1>=<mzero>|}>,>>|<row|<cell|<ma><rsub|2>>|<cell|=>|<cell|<around|{|i:<md><rsub|i-1>=<mzero>,<space|0.27em><md><rsub|i>=<mzero>,<space|0.27em><md><rsub|i+1>\<neq\><mzero>|}>,>>|<row|<cell|<ma><rsub|3>>|<cell|=>|<cell|<around|{|i:<md><rsub|i-1>=<mzero>,<space|0.27em><md><rsub|i>=<mzero>,<space|0.27em><md><rsub|i+1>=<mzero>|}>.>>>>>

  It is easy to see that for large <math|n>,
  <math|<wide|<ma>|\<bar\>><rsub|c>=<ma><rsub|1>\<cup\><ma><rsub|2>\<cup\><ma><rsub|3>>.

  In view of Zou (2006) and Huang, Ma and Zhang (2008), we need to make some
  assumptions on the initial estimators <math|<around|{|<wide|<md>|~><rsub|i>|}>>
  used in (<reference|gl>) in order to investigate the asymptotic properties of
  <math|<wide|<mtheta>|\<breve\>><rsub|n>>. By Remark 1 of Zou (2006),
  one might assume that for any <math|i>, there is a sequence
  <math|<around|{|a<rsub|n>|}>> such that
  <math|a<rsub|n>\<rightarrow\>\<infty\>> and
  <math|a<rsub|n>*<around|(|<wide|<md>|~><rsub|i>-<md><rsub|i>|)>=O<rsub|p><around|(|1|)>>.
  However, <math|p<rsub|n>> is fixed in Zou (2006), whereas Huang, Ma and Zhang (2008)
  allow <math|p<rsub|n>\<to\>\<infty\>> as <math|n\<to\>\<infty\>>. Thus a
  stronger assumption, such as <math|r<rsub|n>*max<rsub|i><around|\||<wide|<md>|~><rsub|i>-<md><rsub|i>|\|>=O<rsub|p><around|(|1|)>>
  as <math|r<rsub|n>\<rightarrow\>\<infty\>> (see (A2) of Huang, Ma and Zhang
  2008), might be made. However, such assumptions may not be enough for the
  multiple change point detection problem. A careful study shows that we need
  to impose a lower bound on <math|<around|\||<wide|<md>|~><rsub|i>|\|>> for
  <math|i\<in\><ma><rsub|c>> so that they are not close to <math|0>. Hence
  we make the following assumption on <math|<around|{|<wide|<md>|~><rsub|r>|}>>:

  <no-indent><with|font-series|bold|Assumption C3.><space|1em>There exists a
  constant <math|a\<gtr\>0> such that for large <math|n>,

  <eqnarray|<tformat|<table|<row|<cell|<around|\||<wide|<md>|~><rsub|i>|\|><choice|<tformat|<table|<row|<cell|\<geqslant\>a\<gtr\>0,>|<cell|<text|for
  <math|i\<in\><ma><rsub|c>>,>>>|<row|<cell|=O<rsub|p>*<around*|(|1/<sqrt|m><space|0.27em>|)>,>|<cell|<text|for
  <math|i\<nin\><ma><rsub|c>>.>>>>>>>>>>>

  To obtain <math|<around|{|<wide|<md>|~><rsub|r>|}>> in practice, we can
  estimate the set <math|<ma><rsub|c>> first, which, for example, may be
  estimated by the least squares based multiple change point detection
  algorithm given in Subsection 3.1. After we obtain the estimate
  <math|<wide|<ma>|^><rsub|c>> of <math|<ma><rsub|c>>, we can set
  <math|<wide|<md>|~><rsub|i>=c*<with|font-series|bold|1><rsub|q>> for <math|i\<in\><wide|<ma>|^><rsub|c>>, and
  <math|<wide|<md>|~><rsub|i>=<with|font-series|bold|1><rsub|q>/<sqrt|m>> otherwise.

  To study the asymptotic behavior of <math|<wide|<mtheta>|\<breve\>>>, the
  following three lemmas are needed.

  <em|Lemma 1.> Under Assumption C1, there exists a positive definite matrix
  <math|\<cal-W\><rsub|<ma><rsub|c>>> (defined in (<reference|wc>) in the
  appendix) such that <math|X<rsub|<ma><rsub|c>><rsup|T>*X<rsub|<ma><rsub|c>>/n\<rightarrow\>\<cal-W\><rsub|<ma><rsub|c>>>.

  <em|Remark 4.> One cannot replace <math|X<rsub|<ma><rsub|c>><rsup|T>*X<rsub|<ma><rsub|c>>>
  by <math|<wide|X|~><rsub|n><rsup|T>*<wide|X|~><rsub|n>> above, since the
  minimum eigenvalue may converge to 0 when
  <math|p<rsub|n>\<to\>\<infty\>> (see Condition (b) in Zou (2006) and (2.13)
  in Zhang and Huang (2008)). Thus, if
  <math|p<rsub|n>\<to\>\<infty\>> is allowed, their conditions no longer hold and may be
  strengthened as in Assumption C1.

  <em|Lemma 2.> Under Assumption C1, for large <math|n>, the elements of
  <math|<wide|X|~><rsup|T><rsub|n><mx><rsub|\<omega\>>/m> are uniformly
  bounded.

  <em|Lemma 3.> Under Assumptions C1-C2, for large <math|n>, the elements of
  <math|<wide|X|~><rsub|n><rsup|T><me><rsub|n>/<sqrt|n>> are uniformly bounded
  in probability.

  If there exists at least one change point, i.e.,
  <math|K<rsub|0>\<geqslant\>1>, the limiting behavior of the adaptive LASSO
  estimator <math|<wide|<mtheta>|\<breve\>><rsub|n>> is given in the
  following theorem.

  <em|Theorem 3.> Assume that <math|\<lambda\><rsub|n>/<sqrt|n>\<rightarrow\>0>,
  <math|m/<sqrt|n>\<rightarrow\>0> and <math|\<lambda\><rsub|n>*<around|(|n/p<rsub|n>|)><rsup|\<nu\>/2>/<sqrt|n>\<rightarrow\>\<infty\>>
  for <math|\<nu\>\<gtr\>0> as <math|n\<to\>\<infty\>>. If Assumptions C1-C3
  hold, then

  <\equation*>
    <sqrt|n>*<around|(|<wide|<mtheta>|\<breve\>><rsub|<ma><rsub|c>>-<around|(|<mtheta><rsub|n>|)><rsub|<ma><rsub|c>>|)>\<rightarrow\><rsub|d>N*<around|(|<mzero>,\<sigma\><rsup|2>*\<cal-W\><rsub|<ma><rsub|c>><rsup|-1>|)>.
  </equation*>

  <em|Remark 5.> <space|1em>If we replace the weight
  <math|1/<around|\||x|\|><rsup|\<nu\>>> by <math|exp (-1/<around|\||x|\|>)>
  in (<reference|gl>), the condition <math|\<lambda\><rsub|n>*<around|(|n/p<rsub|n>|)><rsup|\<nu\>/2>/<sqrt|n>\<rightarrow\>\<infty\>>
  can be relaxed to the weaker condition: <math|\<lambda\><rsub|n>*exp
  <around*|(|<sqrt|n/p<rsub|n>><space|0.27em>|)>/<sqrt|n>\<to\>\<infty\>>.
  Although this weight may result in an absorbing state at <math|x=0> (see Fan and Lv
  (2008)), this has not occurred in simulations.

  <em|Remark 6.> <space|1em>By (<reference|gl>),
  <math|<wide|<mtheta>|\<breve\>>> is a unique solution of a convex
  optimization problem and hence the Karush-Kuhn-Tucker condition holds. For
  any vector <math|<mb>=<around|(|b<rsub|1>,\<ldots\>,b<rsub|p>|)><rsup|T>>,
  denote its sign vector by sgn<math|<around|(|<mb>|)>=<around|(|<text|sgn><around|(|b<rsub|1>|)>,\<ldots\>,<text|sgn><around|(|b<rsub|p>|)>|)><rsup|T>>,
  with the convention sgn<math|<around|(|0|)>=0>. As in Zhao and Yu (2006),
  we say that <math|<wide|<mtheta>|\<breve\>><rsub|n>=<rsub|s><mtheta>> if
  and only if sgn<math|<around|(|<wide|<mtheta>|\<breve\>><rsub|n>|)>=<text|sgn><around|(|<mtheta>|)>>.
  If the condition <math|p<rsub|n>/n<rsup|\<nu\>/<around|(|2+\<nu\>|)>>=o<around|(|1|)>>
  is further assumed to hold, by Lemmas 1-3 and Theorem 3, it can be shown
  that

  <\equation*>
    P*<around|(|<wide|<mtheta>|\<breve\>><rsub|n>=<rsub|s><mtheta>|)>\<to\>1,<space|1em><text|as
    <math|n\<to\>\<infty\>>.>
  </equation*>

  The proof is similar to the proof of Theorem 1 in Huang, Ma and Zhang
  (2008) and hence omitted.

  <vspace|.2in><no-indent><em|2.2.2. Estimate
  <math|<around|{|<md><rsub|i>|}>> by the SCAD or MCP>

  <no-indent>SCAD (Fan and Li (2001)) and MCP (Zhang (2010)) are two popular
  recent consistent variable selection methods. They can also be employed to
  solve the multiple change point detection problem.

  Consider the following estimator of <math|<mtheta><rsub|n>>:

  <eqnarray|<tformat|<table|<row|<cell|<wide|<mtheta>|^><rsup|P>=arg
  min<rsub|<mtheta>><around*|{|<around*|\||<around*|\||<my>-<mX><rsub|n><mtheta>|\|>|\|><rsup|2>+n<big|sum><rsub|r=1><rsup|p<rsub|n>>p<rsub|\<lambda\>,\<gamma\>><around|(|<around*|\||<md><rsub|r>|\|>|)>|}>,>>>>>

  where <math|p<rsub|\<lambda\>,\<gamma\>>> is the penalty function with
  tuning parameters <math|\<lambda\>\<gtr\>0> and <math|\<gamma\>\<gtr\>0>.
  If

  <eqnarray|<tformat|<table|<row|<cell|p<rsub|\<lambda\>,\<gamma\>><around|(|x|)>=<around*|{|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|l>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|2|2|cell-halign|l>|<cwith|1|-1|2|2|cell-rborder|0ln>|<table|<row|<cell|\<lambda\>*x,>|<cell|<text|if><nbsp>x\<leqslant\>\<lambda\>,>>|<row|<cell|<frac|\<gamma\>*\<lambda\>*x-0.5*<around|(|x<rsup|2>+\<lambda\><rsup|2>|)>|\<gamma\>-1>,>|<cell|<text|if><nbsp>\<lambda\>\<less\>x\<leqslant\>\<gamma\>*\<lambda\>,>>|<row|<cell|<frac|\<lambda\><rsup|2>*<around|(|\<gamma\>+1|)>|2>,>|<cell|<text|if><nbsp>x\<gtr\>\<gamma\>*\<lambda\>,>>>>>|\<nobracket\>><eq-number><label|scad1>>>>>>

  the SCAD penalty function proposed by Fan and Li (2001),
  <math|<wide|<mtheta>|^><rsup|P>> is the SCAD type estimator of
  <math|<mtheta><rsub|n>>. Denote it by <math|<wide|<mtheta>|^><rsup|<SCAD>>>.
  Instead, let

  <eqnarray|<tformat|<table|<row|<cell|p<rsub|\<lambda\>,\<gamma\>><around|(|x|)>=<around*|{|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|l>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|2|2|cell-halign|l>|<cwith|1|-1|2|2|cell-rborder|0ln>|<table|<row|<cell|\<lambda\>*x-<frac|x<rsup|2>|2*\<gamma\>>,>|<cell|<text|if><nbsp>x\<leqslant\>\<gamma\>*\<lambda\>,>>|<row|<cell|<frac|1|2>*\<gamma\>*\<lambda\><rsup|2>,>|<cell|<text|if><nbsp>x\<gtr\>\<gamma\>*\<lambda\>,>>>>>|\<nobracket\>><eq-number><label|mcp1>>>>>>

  the MCP penalty function proposed by Zhang (2010),
  <math|<wide|<mtheta>|^><rsup|P>> becomes the MCP type estimator of
  <math|<mtheta><rsub|n>>. Denote it by <math|<wide|<mtheta>|^><rsup|<MCP>>>.
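
The two penalty functions in (<reference|scad1>) and (<reference|mcp1>) can be written down directly. The Python sketch below evaluates them for nonnegative arguments (the absolute value is taken first); the function names are hypothetical, and the SCAD formula needs `gamma > 1` so that the middle branch has a nonzero denominator.

```python
import numpy as np

def scad_penalty(x, lam, gamma):
    """SCAD penalty p_{lam,gamma}(|x|); requires gamma > 1."""
    x = np.abs(np.asarray(x, dtype=float))
    linear = lam * x                                             # |x| <= lam
    middle = (gamma * lam * x - 0.5 * (x ** 2 + lam ** 2)) / (gamma - 1.0)
    flat = 0.5 * lam ** 2 * (gamma + 1.0)                        # |x| > gamma*lam
    return np.where(x <= lam, linear,
                    np.where(x <= gamma * lam, middle, flat))

def mcp_penalty(x, lam, gamma):
    """MCP penalty p_{lam,gamma}(|x|); requires gamma > 0."""
    x = np.abs(np.asarray(x, dtype=float))
    curved = lam * x - x ** 2 / (2.0 * gamma)                    # |x| <= gamma*lam
    flat = 0.5 * gamma * lam ** 2                                # |x| > gamma*lam
    return np.where(x <= gamma * lam, curved, flat)
```

Both penalties are continuous at their break points and constant beyond `gamma * lam`, which is why large coefficients are not shrunk.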

  Under certain conditions, the asymptotic properties of both
  <math|<wide|<mtheta>|^><rsup|<SCAD>>> and
  <math|<wide|<mtheta>|^><rsup|<MCP>>> are similar to the asymptotic
  properties of <math|<wide|<mtheta>|\<breve\>>>. Since the emphasis of this
  paper is on the algorithms for detecting multiple change points, their
  asymptotic properties will not be discussed here.

  <vspace|.2in><no-indent><with|font-series|bold|3. Multiple change points
  detection algorithms>

  <no-indent>For a given <math|p<rsub|n>> or <math|m>, we divide the data
  sequence into <math|p<rsub|n>+1> segments such that the first segment has
  length between <math|m> and <math|c<rsub|0>*m> with
  <math|c<rsub|0>\<geqslant\>1> and the remaining <math|p<rsub|n>> segments
  all have length <math|m>, which gives the model (<reference|newcp>). Define

  <\equation>
    <wide|\<sigma\>|^><rsub|n><rsup|2>=<big|sum><rsub|\<ell\>=1><rsup|n-p<rsub|n>*m><around|(|y<rsub|\<ell\>>-<mx><rsub|\<ell\>><rsup|T><wide|<mbeta>|^>|)><rsup|2>/<around|(|n-p<rsub|n>*m-q|)><label|sigma>
  </equation>

  with <math|<wide|<mbeta>|^>=<around|(|<mX><rsup|T><rsub|<around|(|1|)>><mX><rsub|<around|(|1|)>>|)><rsup|-1><mX><rsup|T><rsub|<around|(|1|)>><my><rsup|<around|(|1|)>>>.
  Given a significance level <math|\<alpha\>>, five multiple change point
  detection algorithms are proposed in this section.
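
As an illustration of (<reference|sigma>), the following Python sketch fits least squares on the first segment, which consists of the first n - p_n*m observations, and divides the residual sum of squares by n - p_n*m - q. The function name `first_segment_variance` is hypothetical.

```python
import numpy as np

def first_segment_variance(X, y, p_n, m):
    """Return (beta_hat, sigma2_hat) computed from the first segment only.

    X is the n-by-q design matrix, y the response vector; the first segment
    consists of rows 0 .. n - p_n*m - 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, q = X.shape
    n1 = n - p_n * m                      # length of the first segment
    if n1 <= q:
        raise ValueError("first segment must contain more than q observations")
    X1, y1 = X[:n1], y[:n1]
    beta_hat, *_ = np.linalg.lstsq(X1, y1, rcond=None)
    resid = y1 - X1 @ beta_hat
    sigma2_hat = float(resid @ resid) / (n1 - q)
    return beta_hat, sigma2_hat
```

For an intercept-only design the estimate reduces to the usual sample variance of the first segment.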

  <vspace|.2in><no-indent><with|font-family|ss|3.1 Least squares based
  multiple change point detection algorithm>

  <no-indent>In light of Theorems 1-2, the least squares based multiple
  change point detection algorithm is given as follows:

  <no-indent><em|L>east <em|s>quares based <em|m>ultiple <em|c>hange <em|p>oints
  <em|d>etection <em|a>lgorithm (LSMCPDA):

  <no-indent><em|Step 1.> Set <math|i=1>, <math|j=1> and <math|<wide|K|^>=0>.

  <no-indent><em|Step 2.> If <math|i\<geqslant\>p<rsub|n>-3>, go to Step 3.
  Otherwise, we test the hypothesis <math|H<rsub|0,i>:<space|0.27em><md><rsub|i>=<mzero>>
  by checking if

  <\equation*>
    <wide|<md>|^><rsup|T><rsub|i><mX><rsub|<around|(|i+1|)>><rsup|T><mX><rsub|<around|(|i+1|)>><wide|<md>|^><rsub|i>/<around|(|2*q*<wide|\<sigma\>|^><rsub|n><rsup|2>|)>\<geqslant\>\<chi\><rsup|2><rsub|\<alpha\>,q>,
  </equation*>

  where <math|<wide|<md>|^><rsub|i>> is given in (<reference|ddd>). If the test
  is significant, set <math|i=i+1> and repeat Step 2; otherwise, we test the
  hypothesis <math|H<rsub|0,<around|(|i+1,i+2|)>>:<space|0.27em><md><rsub|i+1>+<md><rsub|i+2>=<mzero>>
  by checking if

  <\equation*>
    <around*|(|<wide|<md>|^><rsub|i+1>+<wide|<md>|^><rsub|i+2>|)><rsup|T><mX><rsub|<around|(|i+1|)>><rsup|T><mX><rsub|<around|(|i+1|)>><around*|(|<wide|<md>|^><rsub|i+1>+<wide|<md>|^><rsub|i+2>|)>/<around|(|2*q*<wide|\<sigma\>|^><rsub|n><rsup|2>|)>\<geqslant\>\<chi\><rsup|2><rsub|\<alpha\>,2*q>.
  </equation*>

  If the test is not significant, set <math|i=i+1> and repeat Step 2;
  otherwise, a change point estimate is <math|n-p<rsub|n>*m+i*m>. Set
  <math|<wide|r|^><rsub|j>=n-p<rsub|n>*m+i*m>, <math|j=j+1>, <math|i=i+2>,
  and <math|<wide|K|^>=<wide|K|^>+1>. Then repeat Step 2.

  <no-indent><em|Step 3.> If <math|<wide|K|^>=0>, then go to the next step.
  Otherwise, we use the CUSUM to improve the accuracy of the multiple change
  point detection as follows: We search for the change points within the
  <math|<wide|K|^>> sets: <math|<around*|{|<around|{|n-p<rsub|n>*m+<around|(|<wide|r|^><rsub|j>-1|)>*m,\<ldots\>,n-p<rsub|n>*m+<around|(|<wide|r|^><rsub|j>+1|)>*m|}>,j=1,\<ldots\>,<wide|K|^>|}>>
  by the CUSUM. An estimate of the change point within the <math|j>th set is
  given by

  <\equation*>
    <wide|a|^><rsub|j,n>=arg max<rsub|\<ell\>><around*|[|min<rsub|<mbeta>>
    <big|sum><rsub|j=n-p<rsub|n>*m+<around|(|<wide|r|^><rsub|j>-1|)>*m><rsup|\<ell\>><around|(|y<rsub|j>-<mx><rsub|j><rsup|T><mbeta>|)><rsup|2>+min<rsub|<mbeta>>
    <big|sum><rsub|j=\<ell\>+1><rsup|n-p<rsub|n>*m+<around|(|<wide|r|^><rsub|j>+1|)>*m><around|(|y<rsub|j>-<mx><rsub|j><rsup|T><mbeta>|)><rsup|2>|]>.
  </equation*>

  <no-indent><em|Step 4.> If <math|<wide|K|^>=0>, there are no change points.
  Otherwise, there are <math|<wide|K|^>> change points and they are
  <math|<wide|a|^><rsub|1,n>,\<ldots\>,<wide|a|^><rsub|<wide|K|^>,n>>.
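
The refinement in Step 3 searches a window around a detected segment for the best split point. A minimal Python sketch of such a search is given below. It rests on an assumption about the intended criterion: the split is chosen to give the smallest pooled residual sum of squares of the two separate least squares fits, which is the usual least-squares split rule. The names `rss` and `refine_split`, and the 0-based half-open window `[lo, hi)`, are hypothetical conventions of this sketch.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return float(r @ r)

def refine_split(X, y, lo, hi):
    """Search rows lo..hi-1 for the split ell (first row of the right piece)
    with the smallest pooled RSS; each piece keeps at least q rows."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    q = X.shape[1]
    best_ell, best_val = None, np.inf
    for ell in range(lo + q, hi - q + 1):
        val = rss(X[lo:ell], y[lo:ell]) + rss(X[ell:hi], y[ell:hi])
        if val < best_val:
            best_ell, best_val = ell, val
    return best_ell
```

Each candidate split requires two least squares fits, so the cost of this search grows with the window length (about 2m rows in Step 3) rather than with n.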

  If, in the algorithm above, the chi-square tests in Step 2 are replaced by
  the CUSUM tests (see Appendix A.1) and Step 3 is replaced by Steps 3-5 of
  the SMCPDA with <math|<around|{|<wide|r|^><rsup|<SCAD>><rsub|j>|}>>,
  <math|<around|{|<wide|<md>|^><rsub|j><rsup|<SCAD>>|}>>,
  <math|<wide|K|^><rsup|<SCAD>>> and <math|<around|{|<wide|a|^><rsup|<SCAD>><rsub|j,n>|}>>
  replaced by <math|<around|{|<wide|r|^><rsub|j>|}>>,
  <math|<around|{|<wide|<md>|^><rsub|j>|}>>, <math|<wide|K|^>>, and
  <math|<around|{|<wide|a|^><rsub|j,n>|}>> respectively, the resulting algorithm is
  called CLSMCPDA, where \P<math|C>\Q stands for
  \P<em|C>USUM\Q.

  <vspace|.2in><no-indent><with|font-family|ss|3.2 Adaptive LASSO based
  multiple change points detection algorithm>

  <no-indent>In light of Theorem 3, the adaptive LASSO based multiple change
  point detection algorithm is given as follows:

  <no-indent><em|A>daptive <em|L>ASSO based <em|m>ultiple <em|c>hange <em|p>oints
  <em|d>etection <em|a>lgorithm (ALMCPDA):

  <no-indent><em|Step 1.> Set <math|i=1>, <math|j=1> and
  <math|<wide|K|\<breve\>>=0>. Execute the algorithm LSMCPDA and obtain
  <math|<wide|K|^>>. If <math|<wide|K|^>\<gtr\>0>, we also obtain
  <math|<wide|a|^><rsub|1,n>,\<ldots\>,<wide|a|^><rsub|<wide|K|^>,n>>.

  <vspace|.2in><no-indent><em|Step 2.> If <math|<wide|K|^>=0>, set
  <math|<wide|<md>|~><rsub|1>=\<cdots\>=<wide|<md>|~><rsub|p<rsub|n>>=<with|font-series|bold|1><rsub|q>/<sqrt|m>>,
  otherwise, set

  <eqnarray|<tformat|<table|<row|<cell|<wide|<md>|~><rsub|\<ell\>>=<choice|<tformat|<table|<row|<cell|c*<with|font-series|bold|1><rsub|q>,>|<cell|\<ell\>\<in\><around|{|r<rsub|k>,r<rsub|k>*m\<less\><wide|a|^><rsub|k,n>-n+p<rsub|n>*m\<leqslant\><around|(|r<rsub|k>+1|)>*m|}>,>>|<row|<cell|<with|font-series|bold|1><rsub|q>/<sqrt|m>,>|<cell|<text|elsewhere>;>>>>>>>>>>

  where <math|r<rsub|k>> is an integer such that
  <math|r<rsub|k>*m\<less\><wide|a|^><rsub|k,n>-n+p<rsub|n>*m\<leqslant\><around|(|r<rsub|k>+1|)>*m>
  and <math|c> is a prechosen constant. Select <math|\<lambda\>\<gtr\>0> and
  <math|\<nu\>\<gtr\>0>. Find the adaptive LASSO estimate
  <math|<wide|<mtheta>|\<breve\>>> of <math|<mtheta>> via

  <eqnarray|<tformat|<table|<row|<cell|<wide|<mtheta>|\<breve\>>=arg
  min<rsub|<mtheta>><around*|{|<around*|\||<around*|\||<my>-<mX><rsub|n><mtheta>|\|>|\|><rsup|2>+\<lambda\><big|sum><rsub|r=1><rsup|p<rsub|n>><frac|1|<around|\||<wide|<md>|~><rsub|r>|\|><rsup|\<nu\>>><around*|\||<md><rsub|r>|\|>|}>,>>>>>

  and we obtain <math|<wide|<md>|\<breve\>><rsub|\<ell\>>> for
  <math|1\<leqslant\>\<ell\>\<leqslant\>p<rsub|n>>.

  <vspace|.2in><no-indent><em|Step 3.> We compute
  <math|z<rsub|\<ell\>>=<around|\|||\|><wide|<md>|\<breve\>><rsub|\<ell\>><around|\|||\|><rsub|\<infty\>>>
  for <math|1\<leqslant\>\<ell\>\<leqslant\>p<rsub|n>>. If
  <math|z<rsub|1>=z<rsub|2>=\<cdots\>=z<rsub|p<rsub|n>>=0>, go to Step 5.
  Otherwise, we treat <math|<around|{|z<rsub|\<ell\>>|}>> as random variables
  from the model <math|<mz>=<bmu>+<mee>> with
  <math|<bmu>=<around|(|\<mu\><rsub|1>,\<ldots\>,\<mu\><rsub|p<rsub|n>>|)><rsup|T>>
  and <math|<mee>\<sim\>N<around|(|<with|font-series|bold|0>,I<rsub|p<rsub|n>>|)>>.
  Use LASSO, SCAD or MCP among other recent advances in variable selection to
  perform variable selection based on <math|<around|{|z<rsub|\<ell\>>|}>>. We
  obtain the estimates <math|<around|{|<wide|\<mu\>|~><rsub|\<ell\>>|}>>. If
  <math|<wide|\<mu\>|~><rsub|\<ell\>>>, <math|1\<leqslant\>\<ell\>\<leqslant\>p<rsub|n>>,
  are all zeros, set <math|<wide|K|\<breve\>>=0> and go to Step 6. Otherwise,
  let <math|<mI>> be the subset of <math|<around|{|1,\<ldots\>,p<rsub|n>|}>>
  such that <math|\<ell\>\<in\><mI>> if and only if
  <math|<wide|\<mu\>|~><rsub|\<ell\>>\<neq\>0>. Write
  <math|<mI>=<around|{|s<rsub|1>,\<ldots\>,s<rsub|<around|\||<mI>|\|>>|}>>
  such that <math|s<rsub|1>\<less\>\<ldots\>\<less\>s<rsub|<around|\||<mI>|\|>>>.

  <no-indent><em|Step 4.> If <math|i\<gtr\><around|\||<mI>|\|>>, go to Step
  5. Otherwise, we test the hypothesis <math|H<rsub|0,s<rsub|i>>:<space|0.27em><md><rsub|s<rsub|i>>=<mzero>>
  by checking if

  <\equation*>
    <around|(|p<rsub|n>-s<rsub|i>|)>*<wide|<md>|\<breve\>><rsup|T><rsub|s<rsub|i>><mX><rsub|<around|(|s<rsub|i>+1|)>><rsup|T><mX><rsub|<around|(|s<rsub|i>+1|)>><wide|<md>|\<breve\>><rsub|s<rsub|i>>/<around|(|q*<wide|\<sigma\>|^><rsub|n><rsup|2>|)>\<geqslant\>\<chi\><rsup|2><rsub|\<alpha\>,q>,
  </equation*>

  where <math|<wide|\<sigma\>|^><rsub|n><rsup|2>> is given in
  (<reference|sigma>). If the test is not significant, set <math|i=i+1> and
  repeat Step 4. Otherwise, a change point estimate is
  <math|n-p<rsub|n>*m+<around|(|s<rsub|i>-1|)>*m>. Set
  <math|<wide|r|\<breve\>><rsub|j>=n-p<rsub|n>*m+<around|(|s<rsub|i>-1|)>*m>,
  <math|j=j+1>, <math|i=i+2>, and <math|<wide|K|\<breve\>>=<wide|K|\<breve\>>+1>.
  Then repeat Step 4.

  <no-indent><em|Step 5.> If <math|<wide|K|\<breve\>>=0>, then go to the next
  step. Otherwise, we use the CUSUM to improve the accuracy of the multiple
  change point detection as follows: We search for the change points within
  the <math|<wide|K|\<breve\>>> sets: <math|<around*|{|<around|{|n-p<rsub|n>*m+<around|(|<wide|r|\<breve\>><rsub|j>-1|)>*m,\<ldots\>,n-p<rsub|n>*m+<around|(|<wide|r|\<breve\>><rsub|j>+1|)>*m|}>,j=1,\<ldots\>,<wide|K|\<breve\>>|}>>
  by the CUSUM. An estimate of the change point for the <math|j>th set is
  given by

  <\equation*>
    <wide|a|\<breve\>><rsub|j,n>=arg max<rsub|\<ell\>><around*|[|min<rsub|<mbeta>>
    <big|sum><rsub|j=n-p<rsub|n>*m+<around|(|<wide|r|\<breve\>><rsub|j>-1|)>*m><rsup|\<ell\>><around|(|y<rsub|j>-<mx><rsub|j><rsup|T><mbeta>|)><rsup|2>+min<rsub|<mbeta>>
    <big|sum><rsub|j=\<ell\>+1><rsup|n-p<rsub|n>*m+<around|(|<wide|r|\<breve\>><rsub|j>+1|)>*m><around|(|y<rsub|j>-<mx><rsub|j><rsup|T><mbeta>|)><rsup|2>|]>.
  </equation*>

  <no-indent><em|Step 6.> If <math|<wide|K|\<breve\>>=0>, there are no change
  points. Otherwise, there are <math|<wide|K|\<breve\>>> change points and
  they are <math|<wide|a|\<breve\>><rsub|1,n>,\<ldots\>,<wide|a|\<breve\>><rsub|<wide|K|\<breve\>>,n>>.

  If, in the algorithm above, the chi-square test in Step 4 is replaced by the
  CUSUM test, the resulting algorithm is called CALMCPDA, where \P<math|C>\Q
  again stands for \P<em|C>USUM\Q. All estimates based on
  CALMCPDA are denoted by adding a superscript \PC\Q to the corresponding estimates based
  on ALMCPDA. For example, the estimate of <math|K<rsub|0>> based on CALMCPDA
  is denoted by <math|<wide|K|\<breve\>><rsup|C>>.
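
Step 3 reduces each estimated block to its sup-norm and then selects the nonzero means in the model z = mu + eps. The Python sketch below illustrates the LASSO variant of that selection under an assumed scaling of the criterion, 0.5*||z - mu||^2 + lam*||mu||_1, for which the minimizer is coordinate-wise soft-thresholding. The function names and the choice of `lam` are hypothetical; SCAD or MCP would use a different thresholding rule.

```python
import numpy as np

def block_sup_norms(d_blocks):
    """z_l = max_i |d_{l,i}| for a (p_n, q) array of estimated blocks."""
    d = np.asarray(d_blocks, dtype=float)
    return np.max(np.abs(d), axis=1)

def soft_threshold_select(z, lam):
    """Soft-threshold z at lam and return (mu_tilde, selected 1-based indices)."""
    z = np.asarray(z, dtype=float)
    mu = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
    selected = [int(l) + 1 for l in np.flatnonzero(mu)]
    return mu, selected
```

The returned index list plays the role of the ordered set s_1 < ... < s_|I| that Step 4 then works through.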

  <vspace|.2in><no-indent><with|font-family|ss|3.3 SCAD based multiple change
  points detection algorithm>

  <no-indent>Similar to the ALMCPDA, the SCAD based multiple change point
  detection algorithm is given as follows:

  <no-indent><em|S>CAD based <em|m>ultiple <em|c>hange <em|p>oints <em|d>etection
  <em|a>lgorithm (SMCPDA):

  <no-indent><em|Step 1.> Set <math|i=1>, <math|j=1> and
  <math|<wide|K|^><rsup|<SCAD>>=0>.

  <vspace|.2in><no-indent><em|Step 2.> Select <math|\<lambda\>\<gtr\>0> and
  <math|\<gamma\>\<gtr\>0>. Find the SCAD estimate
  <math|<wide|<mtheta>|^><rsup|<SCAD>>=<around*|(|<around*|(|<wide|<mbeta>|^><rsup|<SCAD>>|)><rsup|T>,<around*|(|<wide|<md>|^><rsub|1><rsup|<SCAD>>|)><rsup|T>,\<ldots\>,<around*|(|<wide|<md>|^><rsub|p<rsub|n>><rsup|<SCAD>>|)><rsup|T>|)><rsup|T>>
  of <math|<mtheta>> via

  <eqnarray|<tformat|<table|<row|<cell|<wide|<mtheta>|^><rsup|<SCAD>>=arg
  min<rsub|<mtheta>><around*|{|<around*|\||<around*|\||<my>-<mX><rsub|n><mtheta>|\|>|\|><rsup|2>+n<big|sum><rsub|r=1><rsup|p<rsub|n>>p<rsub|\<lambda\>,\<gamma\>><around|(|<around*|\||<md><rsub|r>|\|>|)>|}>,>>>>>

  where <math|p<rsub|\<lambda\>,\<gamma\>>> is given in (<reference|scad1>)
  and we obtain <math|<wide|<md>|^><rsub|\<ell\>><rsup|<SCAD>>> for
  <math|1\<leqslant\>\<ell\>\<leqslant\>p<rsub|n>>.

  <vspace|.2in><no-indent><em|Step 3.> This step is the same as Step 3 of ALMCPDA, with
  <math|z<rsub|\<ell\>>=<around|\|||\|><wide|<md>|\<breve\>><rsub|\<ell\>><around|\|||\|><rsub|\<infty\>>>
  replaced by <math|z<rsub|\<ell\>>=<around|\|||\|><wide|<md>|^><rsub|\<ell\>><rsup|<SCAD>><around|\|||\|><rsub|\<infty\>>>
  for <math|1\<leqslant\>\<ell\>\<leqslant\>p<rsub|n>> and
  <math|<wide|K|\<breve\>>=0> replaced by <math|<wide|K|^><rsup|<SCAD>>=0>.

  <no-indent><em|Step 4.> If <math|i\<gtr\><around|\||<mI>|\|>>, go to Step
  5. Otherwise, we test the hypothesis <math|H<rsub|0,s<rsub|i>>:<space|0.27em><md><rsub|s<rsub|i>>=<mzero>>
  by CUSUM. If the test is not significant, set <math|i=i+1> and repeat Step
  4. Otherwise, a change point estimate is
  <math|n-p<rsub|n>*m+<around|(|s<rsub|i>-1|)>*m>. Set
  <math|<wide|r|^><rsup|<SCAD>><rsub|j>=n-p<rsub|n>*m+<around|(|s<rsub|i>-1|)>*m>,
  <math|j=j+1>, <math|i=i+2>, and <math|<wide|K|^><rsup|<SCAD>>=<wide|K|^><rsup|<SCAD>>+1>.
  Then repeat Step 4.

  <no-indent><em|Step 5.> If <math|<wide|K|^><rsup|<SCAD>>=0>, then go to the
  next step. Otherwise, we use the CUSUM to improve the accuracy of the
  multiple change point detection as follows: We search for the change points
  within the <math|<wide|K|^><rsup|<SCAD>>> sets:
  <math|<around*|{|<around|{|n-p<rsub|n>*m+<around|(|<wide|r|^><rsup|<SCAD>><rsub|j>-1|)>*m,\<ldots\>,n-p<rsub|n>*m+<around|(|<wide|r|^><rsup|<SCAD>><rsub|j>+1|)>*m|}>,j=1,\<ldots\>,<wide|K|^><rsup|<SCAD>>|}>>
  by the CUSUM. An estimate of the change point for the <math|j>th set is
  given by

  <\equation*>
    <wide|a|^><rsup|<SCAD>><rsub|j,n>=arg
    max<rsub|\<ell\>><around*|[|min<rsub|<mbeta>>
    <big|sum><rsub|j=n-p<rsub|n>*m+<around|(|<wide|r|^><rsup|<SCAD>><rsub|j>-1|)>*m><rsup|\<ell\>><around|(|y<rsub|j>-<mx><rsub|j><rsup|T><mbeta>|)><rsup|2>+min<rsub|<mbeta>>
    <big|sum><rsub|j=\<ell\>+1><rsup|n-p<rsub|n>*m+<around|(|<wide|r|^><rsup|<SCAD>><rsub|j>+1|)>*m><around|(|y<rsub|j>-<mx><rsub|j><rsup|T><mbeta>|)><rsup|2>|]>.
  </equation*>

  <no-indent><em|Step 6.> If <math|<wide|K|^><rsup|<SCAD>>=0>, there are no
  change points. Otherwise, there are <math|<wide|K|^><rsup|<SCAD>>> change
  points and they are <math|<wide|a|^><rsup|<SCAD>><rsub|1,n>,\<ldots\>,<wide|a|^><rsup|<SCAD>><rsub|<wide|K|^><rsup|<SCAD>>,n>>.

  <vspace|.2in><no-indent><with|font-family|ss|3.4 MCP based multiple change
  points detection algorithm>

  <no-indent>The differences between the SMCPDA and the MCP based multiple
  change point detection algorithm (MMCPDA) are as follows:

  <\enumerate>
    <item>The superscript \P<math|s*c*a*d>\Q in the SMCPDA is replaced by the
    superscript \P<math|m*c*p>\Q in the MMCPDA.

    <item>The step 2 in the SMCPDA is modified to the following step 2 in the
    MMCPDA:

    <no-indent><em|Step 2.> Select <math|\<lambda\>\<gtr\>0> and
    <math|\<gamma\>\<gtr\>0>. Find the MCP estimate
    <math|<wide|<mtheta>|^><rsup|<MCP>>=<around*|(|<around*|(|<wide|<mbeta>|^><rsup|<MCP>>|)><rsup|T>,<around*|(|<wide|<md>|^><rsub|1><rsup|<MCP>>|)><rsup|T>,\<ldots\>,<around*|(|<wide|<md>|^><rsub|p<rsub|n>><rsup|<MCP>>|)><rsup|T>|)><rsup|T>>
    of <math|<mtheta>> via

    <eqnarray|<tformat|<table|<row|<cell|<wide|<mtheta>|^><rsup|<MCP>>=arg
    min<rsub|<mtheta>><around*|{|<around*|\||<around*|\||<my>-<mX><rsub|n><mtheta>|\|>|\|><rsup|2>+n<big|sum><rsub|r=1><rsup|p<rsub|n>>p<rsub|\<lambda\>,\<gamma\>><around|(|<around*|\||<md><rsub|r>|\|>|)>|}>,>>>>>

    where <math|p<rsub|\<lambda\>,\<gamma\>>> is given in (<reference|mcp1>).
  </enumerate>

  <em|Remark 7.> <space|1em>CUSUM is used in these algorithms to improve the
  accuracy of the change point estimates. The computing time required by each
  of these algorithms is
  <math|O<around|(|n|)>+O<around|(|m|)>>, where <math|O<around|(|m|)>>
  corresponds to the time required by the CUSUM method. If the segmentation
  satisfies <math|m=o<around|(|n|)>>, then
  <math|O<around|(|n|)>+O<around|(|m|)>=O<around|(|n|)>>, which is
  computationally more efficient than the existing multiple change point
  detection methods in the literature.

  <vspace|.2in><no-indent><with|font-series|bold|4. Simulation study>

  <no-indent>In this section, we present simulation studies of multiple
  change point analysis. Since the time for finding the multiple change
  points in a large sample by the algorithms proposed in Section 3 is
  significantly reduced compared to the existing multiple change point
  detection methods in the literature, such comparison studies are omitted in
  this section. We only compare how often the true number of change points is
  selected and how accurately the change points are estimated by the
  algorithms proposed in Section 3, based on 1000 simulations. A Dell
  server (two E5520 Xeon Processors, two 2.26GHz 8M Caches, 16GB Memory) is
  used in the simulation.

  It is noted that the LARS algorithm (Efron, Hastie, Johnstone, and
  Tibshirani 2004) is used to compute <math|<wide|<mtheta>|\<breve\>><rsub|n>>
  defined in (<reference|gl>) with <math|\<nu\>=1> and an optimal
  <math|\<lambda\><rsub|n>> selected by the BIC. For applying LARS, the added
  penalty on <math|<mbeta>> is set as <math|1/<around|\||<with|font-series|bold|1><rsub|q>|\|>>,
  which will not affect the multiple change-point detection results as
  <math|<mbeta>\<neq\><mzero>>. The PLUS algorithm (Zhang, 2010) with the
  added penalty <math|n*p<rsub|\<lambda\>,\<gamma\>><around|(|<around*|\||<mbeta>|\|>|)>>
  on <math|<mbeta>> is used to compute <math|<wide|<mtheta>|^><rsub|n><rsup|<SCAD>>>
  defined in (<reference|scad1>) or <math|<wide|<mtheta>|^><rsub|n><rsup|<MCP>>>
  defined in (<reference|mcp1>), which also does not affect the multiple change
  point detection results as <math|<mbeta>\<neq\><mzero>>. Let
  <math|<wide|\<sigma\>|^><rsub|n><rsup|2>> be given in (<reference|sigma>).
  We use <math|\<lambda\>=<wide|\<sigma\>|^><rsub|n>*<sqrt|2*log
  p<rsub|n>/n>> in the PLUS algorithm as suggested in Zhang (2010). In all of
  our numerical examples, we set <math|\<gamma\>=3.7> for SCAD by following
  the recommendation of Fan and Li (2001), but set <math|\<gamma\>=2.4> for
  MCP based on some preliminary simulation studies. It is noted that in the
  step 3 of the algorithms ALMCPDA, CALMCPDA, SMCPDA, and MMCPDA, we use SCAD
  to perform variable selection for model <math|<mz>=<bmu>+<mee>> by applying
  the PLUS algorithm with <math|\<lambda\>=0.02>. Such a small
  <math|\<lambda\>> is used to avoid overestimating the number of change
  points.
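As a rough illustration of the tuning choices above, the following Python sketch evaluates the level <math|\<lambda\>=<wide|\<sigma\>|^><rsub|n>*<sqrt|2*log p<rsub|n>/n>> together with the SCAD and MCP penalties in their commonly used forms (Fan and Li, 2001; Zhang, 2010). The function names are hypothetical, and the penalty scaling is an assumption: it follows those standard forms and may differ by a constant factor from the definitions referenced in (<reference|scad1>) and (<reference|mcp1>).

```python
import math

def universal_lambda(sigma_hat, p_n, n):
    """lambda = sigma_hat * sqrt(2 * log(p_n) / n), as used with PLUS above."""
    return sigma_hat * math.sqrt(2.0 * math.log(p_n) / n)

def mcp_penalty(t, lam, gamma=2.4):
    """Standard MCP penalty (assumed form): quadratic up to gamma*lam, then flat."""
    t = abs(t)
    if t <= gamma * lam:
        return lam * t - t * t / (2.0 * gamma)
    return 0.5 * gamma * lam * lam

def scad_penalty(t, lam, gamma=3.7):
    """Standard SCAD penalty (assumed form): linear, quadratic, then flat."""
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= gamma * lam:
        return (2.0 * gamma * lam * t - t * t - lam * lam) / (2.0 * (gamma - 1.0))
    return 0.5 * (gamma + 1.0) * lam * lam
```

Both penalties are constant beyond <math|\<gamma\>*\<lambda\>>, so large coefficients are not shrunk further; this is the property that distinguishes them from the LASSO penalty.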

  Throughout this section, <math|\<alpha\>=0.05>.

  <vspace|.2in><no-indent><with|font-family|ss|4.1. The case that there is no
  change point in the data sequence of size 5000>

  <no-indent>In this subsection, we consider the case that there is no change
  point in the data sequence. We will examine the performance of the proposed
  algorithms to see if they do claim that there is no change point.

  Consider the following linear model

  <\equation*>
    y<rsub|i>=<mx><rsub|i><rsup|T><mbeta><rsub|0>+\<varepsilon\><rsub|i>,<space|1em>i=1,\<ldots\>,n,
  </equation*>

  where <math|<mbeta><rsub|0>> is a <math|q\<times\>1> parameter vector. Set
  <math|n=5000>, <math|q=3>, <math|<mbeta><rsub|0>=<around|(|1,1.4,0.7|)><rsup|T>>,
  and <math|x<rsub|i,1,n>=1> for <math|i=1,\<ldots\>,5000>. Generate
  <math|\<varepsilon\><rsub|i>>, <math|i=1,\<ldots\>,5000>, such that they
  are i.i.d. <math|N<around|(|0,1|)>> distributed, and generate two sequences
  <math|x<rsub|i,2,n>>, <math|i=1,\<ldots\>,5000>, and <math|x<rsub|i,3,n>>,
  <math|i=1,\<ldots\>,5000>, such that they are i.i.d. <math|N<around|(|1,2|)>>
  distributed. For demonstration, a sample scatter plot of simulated data is
  given in Figure 1.

  <big-figure|<with|par-mode|center|<image|nocp.eps|1tex-text-width|||><label|fig1>>|There
  is no change point in the data sequence.>

  We compare the following five algorithms: LSMCPDA, both ALMCPDA and
  CALMCPDA with <math|c=1>, SMCPDA and MMCPDA. Recall that all the tests used
  in the algorithms CALMCPDA, SMCPDA, and MMCPDA are based on CUSUM. The
  numbers of correct detections and the average computation times in seconds,
  based on 1000 simulations, are given in Table 1.

  <vspace|.2in><assign|tabcolsep|<macro|1mm>><assign|arraystretch|<macro|1.2>>

  <\big-table>
    \;

    <tabular*|<tformat|<cwith|1|-1|1|1|cell-lborder|1ln>|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-rborder|1ln>|<cwith|1|-1|2|2|cell-halign|c>|<cwith|1|-1|2|2|cell-rborder|1ln>|<cwith|1|-1|3|3|cell-halign|c>|<cwith|1|-1|3|3|cell-rborder|1ln>|<cwith|1|-1|4|4|cell-halign|c>|<cwith|1|-1|4|4|cell-rborder|1ln>|<cwith|1|-1|5|5|cell-halign|c>|<cwith|1|-1|5|5|cell-rborder|1ln>|<cwith|1|-1|6|6|cell-halign|c>|<cwith|1|-1|6|6|cell-rborder|1ln>|<cwith|1|-1|7|7|cell-halign|c>|<cwith|1|-1|7|7|cell-rborder|1ln>|<cwith|1|-1|1|-1|cell-valign|c>|<cwith|1|1|1|-1|cell-tborder|1ln>|<cwith|1|1|1|-1|cell-bborder|1ln>|<cwith|2|2|1|-1|cell-bborder|1ln>|<cwith|3|3|1|-1|cell-bborder|1ln>|<table|<row|<cell|>|<cell|LSMCPDA>|<cell|ALMCPDA>|<cell|CALMCPDA>|<cell|SMCPDA>|<cell|MMCPDA>>|<row|<cell|No.
    of Correct Detection>|<cell|999>|<cell|996>|<cell|1000>|<cell|1000>|<cell|1000>>|<row|<cell|Average
    Computation Time>|<cell|1.42>|<cell|3.84>|<cell|6.78>|<cell|1.93>|<cell|1.98>>>>>
  </big-table|The entries are the numbers of correct detections (no change
  point claimed) by the five algorithms LSMCPDA, ALMCPDA, CALMCPDA, SMCPDA and
  MMCPDA and the corresponding average computation times in seconds based on
  1000 simulations.>

  <no-indent>From Table 1, it can be seen that all algorithms perform very
  well. CALMCPDA requires the longest average detection time for a sample of
  size 5000, but this is still only 6.78 seconds.

  <vspace|.2in><no-indent><with|font-family|ss|4.2. The case that there are
  nine change points in the data sequence of size 5000>

  <no-indent>In this subsection, we consider a case that there are nine
  change points in the data sequence of size 5000. We will examine the
  performance of the proposed algorithms via the rate for correctly
  estimating the number of change points and the accuracy of change point
  estimation. The average computation time for multiple change point
  detection is also given for each algorithm.

  Consider the model (<reference|cp>), i.e.,

  <eqnarray|<tformat|<table|<row|<cell|y<rsub|i,n>>|<cell|=>|<cell|<big|sum><rsub|j=1><rsup|q>x<rsub|i,j,n>*\<beta\><rsub|j,0>+<big|sum><rsub|\<ell\>=1><rsup|K<rsub|0>><big|sum><rsub|j=1><rsup|q>x<rsub|i,j,n>*\<delta\><rsub|j,0><rsup|<around|(|\<ell\>|)>>*I*<around|(|a<rsup|<around|(|0|)>><rsub|\<ell\>,n>\<less\>i\<leqslant\>n|)>+\<varepsilon\><rsub|i,n>>>|<row|<cell|>|<cell|=>|<cell|<mx><rsub|i,n><rsup|T><around*|[|<mbeta><rsub|0>+<big|sum><rsub|\<ell\>=1><rsup|K<rsub|0>><mdelta><rsub|\<ell\>,0>I*<around|(|a<rsup|<around|(|0|)>><rsub|\<ell\>,n>\<less\>i\<leqslant\>n|)>|]>+\<varepsilon\><rsub|i,n>,<space|1em>i=1,\<ldots\>,n.<eq-number>>>>>>

  As in Subsection 4.1, set <math|n=5000>, <math|q=3>,
  <math|<mbeta><rsub|0>=<around|(|1,1.4,0.7|)><rsup|T>>, choose
  <math|p<rsub|n>=<around|\<lfloor\>|n/50|\<rfloor\>>> and
  <math|m=<around|\<lfloor\>|n/<around|(|p<rsub|n>+1|)>|\<rfloor\>>>, and
  generate <math|<around|{|x<rsub|i,j,n>|}>> and
  <math|<around|{|\<varepsilon\><rsub|i>|}>> in the same way as in Subsection
  4.1. Set <math|K<rsub|0>=9>, <math|<mdelta><rsub|1>=<mdelta><rsub|3>=<mdelta><rsub|5>=<mdelta><rsub|7>=<mdelta><rsub|9>=<around|(|0.5,-0.7,0.4|)><rsup|T>>,
  and <math|<mdelta><rsub|2>=<mdelta><rsub|4>=<mdelta><rsub|6>=<mdelta><rsub|8>=-<mdelta><rsub|1>>.
  Consider the following two change point location settings:

  <\itemize>
    <item*|>CPL1. <math|a<rsub|i>=500\<times\>i>, for <math|i=1,\<ldots\>,9>;

    <item*|>CPL2. <math|a<rsub|1>=503>, <math|a<rsub|2>=923>,
    <math|a<rsub|3>=1471>, <math|a<rsub|4>=2077>, <math|a<rsub|5>=2334>,
    <math|a<rsub|6>=2890>, <math|a<rsub|7>=3410>, <math|a<rsub|8>=3909>, and
    <math|a<rsub|9>=4546>.
  </itemize>
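The data-generating mechanism described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and the seed are hypothetical, and <math|N<around|(|1,2|)>> is read as a normal distribution with mean 1 and variance 2.

```python
import numpy as np

CPL1 = [500 * i for i in range(1, 10)]
CPL2 = [503, 923, 1471, 2077, 2334, 2890, 3410, 3909, 4546]

def simulate_cp_data(change_points, n=5000, seed=0):
    """Simulate y_i = x_i^T [beta_0 + sum_l delta_l I(a_l < i <= n)] + eps_i."""
    rng = np.random.default_rng(seed)
    beta0 = np.array([1.0, 1.4, 0.7])
    delta1 = np.array([0.5, -0.7, 0.4])
    # Intercept plus two regressors; variance 2 is an assumption about N(1,2).
    X = np.column_stack([np.ones(n),
                         rng.normal(1.0, np.sqrt(2.0), size=(n, 2))])
    eps = rng.normal(0.0, 1.0, size=n)
    coef = np.tile(beta0, (n, 1))
    for ell, a in enumerate(change_points, start=1):
        # Odd-numbered jumps equal delta_1, even-numbered jumps equal -delta_1.
        delta = delta1 if ell % 2 == 1 else -delta1
        coef[a:] += delta  # 1-based i > a corresponds to 0-based index >= a
    y = np.sum(X * coef, axis=1) + eps
    return X, y, coef
```

Because the jumps alternate in sign, the coefficient vector switches back and forth between <math|<mbeta><rsub|0>> and <math|<mbeta><rsub|0>+<mdelta><rsub|1>>.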

  For demonstration, two scatter plots of simulated data for the settings
  CPL1 and CPL2 are given in Figures 2 and 3, respectively. The change points
  are hardly visible in these two figures.

  <big-figure|<with|par-mode|center|<image|9cps-f.eps|1tex-text-width|||><label|fig2>>|The
  scatter plot of simulated data for Setting CPL1.>

  <big-figure|<with|par-mode|center|<image|9cps-r.eps|1tex-text-width|||><label|fig3>>|The
  scatter plot of simulated data for Setting CPL2.>

  We compare the following five algorithms: LSMCPDA, ALMCPDA, CALMCPDA,
  SMCPDA and MMCPDA. Let <math|<wide|a|~><rsub|i>> stand for
  <math|<wide|a|^><rsub|i>>, <math|<wide|a|\<breve\>><rsub|i>>,
  <math|<wide|a|\<breve\>><rsup|C><rsub|i>>,
  <math|<wide|a|^><rsup|<SCAD>><rsub|i>> or
  <math|<wide|a|^><rsup|<MCP>><rsub|i>> for <math|i=1,\<ldots\>,9>. We check
  the accuracy of multiple change point estimation based on each algorithm by
  examining the distance between <math|<wide|a|~><rsub|i>> and
  <math|a<rsub|i>> for <math|i=1,\<ldots\>,9>. We report only how often this
  distance equals 0, is at most 5, or is at most 10. The simulation results
  for the two change point location settings CPL1 and CPL2 are presented in
  Tables 2-3.
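One way to tabulate such distances over repeated simulations is sketched below in Python. The matching rule is an assumption made for this illustration, not something stated in the text: a true change point counts as recovered within a tolerance if some estimated change point of that run lies within the tolerance. The function names are hypothetical.

```python
import numpy as np

def tally_hits(estimates_per_run, true_cps, tolerances=(0, 5, 10)):
    """counts[i, j] = number of runs in which some estimated change point
    lies within tolerances[j] of true_cps[i] (assumed matching rule)."""
    counts = np.zeros((len(true_cps), len(tolerances)), dtype=int)
    for est in estimates_per_run:
        est = np.asarray(est, dtype=float)
        if est.size == 0:
            continue  # a run with no detected change point contributes nothing
        for i, a in enumerate(true_cps):
            dist = np.min(np.abs(est - a))
            for j, tol in enumerate(tolerances):
                if dist <= tol:
                    counts[i, j] += 1
    return counts

def count_correct_number(estimates_per_run, k0):
    """Number of runs in which exactly k0 change points are detected."""
    return sum(1 for est in estimates_per_run if len(est) == k0)
```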

  <assign|tabcolsep|<macro|1mm>><assign|arraystretch|<macro|1.2>>

  <\big-table>
    <tabular*|<tformat|<cwith|1|-1|1|1|cell-lborder|1ln>|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-rborder|1ln>|<cwith|1|-1|2|2|cell-halign|c>|<cwith|1|-1|2|2|cell-rborder|1ln>|<cwith|1|-1|3|3|cell-halign|c>|<cwith|1|-1|3|3|cell-rborder|1ln>|<cwith|1|-1|4|4|cell-halign|c>|<cwith|1|-1|4|4|cell-rborder|1ln>|<cwith|1|-1|5|5|cell-halign|c>|<cwith|1|-1|5|5|cell-rborder|1ln>|<cwith|1|-1|6|6|cell-halign|c>|<cwith|1|-1|6|6|cell-rborder|1ln>|<cwith|1|-1|1|-1|cell-valign|c>|<cwith|1|1|1|-1|cell-tborder|1ln>|<cwith|1|1|1|-1|cell-bborder|1ln>|<cwith|4|4|1|-1|cell-bborder|1ln>|<cwith|7|7|1|-1|cell-bborder|1ln>|<cwith|10|10|1|-1|cell-bborder|1ln>|<cwith|13|13|1|-1|cell-bborder|1ln>|<cwith|16|16|1|-1|cell-bborder|1ln>|<cwith|19|19|1|-1|cell-bborder|1ln>|<cwith|22|22|1|-1|cell-bborder|1ln>|<cwith|25|25|1|-1|cell-bborder|1ln>|<cwith|28|28|1|-1|cell-bborder|1ln>|<cwith|29|29|1|-1|cell-bborder|1ln>|<cwith|30|30|1|-1|cell-bborder|1ln>|<table|<row|<cell|>|<cell|LSMCPDA>|<cell|ALMCPDA>|<cell|CALMCPDA>|<cell|SMCPDA>|<cell|MMCPDA>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|1,n>-a<rsub|1>|\|>=0>>|<cell|208>|<cell|215>|<cell|215>|<cell|212>|<cell|212>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|1>-a<rsub|1>|\|>\<leqslant\>5>>|<cell|958>|<cell|973>|<cell|973>|<cell|973>|<cell|974>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|1>-a<rsub|1>|\|>\<leqslant\>10>>|<cell|990>|<cell|993>|<cell|993>|<cell|992>|<cell|992>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|2>-a<rsub|2>|\|>=0>>|<cell|489>|<cell|532>|<cell|532>|<cell|520>|<cell|525>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|2>-a<rsub|2>|\|>\<leqslant\>5>>|<cell|924>|<cell|939>|<cell|939>|<cell|918>|<cell|922>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|2>-a<rsub|2>|\|>\<leqslant\>10>>|<cell|979>|<cell|982>|<cell|982>|<cell|960>|<cell|964>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|3>-a<rsub|3>|\|>=0>>|<cell|263>|<cell|262>|<cell|262>|<cell|253>|<cell|253>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|3>-a<rsub|3>|\|>\<leqslant\>5>>|<cell|806>|<cell|807>|<cell|807>|<cell|773>|<cell|792>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|3>-a<rsub|3>|\|>\<leqslant\>10>>|<cell|972>|<cell|977>|<cell|977>|<cell|932>|<cell|952>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|4>-a<rsub|4>|\|>=0>>|<cell|162>|<cell|174>|<cell|174>|<cell|157>|<cell|174>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|4>-a<rsub|4>|\|>\<leqslant\>5>>|<cell|810>|<cell|806>|<cell|806>|<cell|773>|<cell|786>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|4>-a<rsub|4>|\|>\<leqslant\>10>>|<cell|961>|<cell|959>|<cell|959>|<cell|921>|<cell|939>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|5>-a<rsub|5>|\|>=0>>|<cell|716>|<cell|726>|<cell|726>|<cell|694>|<cell|703>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|5>-a<rsub|5>|\|>\<leqslant\>5>>|<cell|961>|<cell|975>|<cell|975>|<cell|931>|<cell|947>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|5>-a<rsub|5>|\|>\<leqslant\>10>>|<cell|986>|<cell|998>|<cell|998>|<cell|953>|<cell|970>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|6>-a<rsub|6>|\|>=0>>|<cell|210>|<cell|223>|<cell|223>|<cell|215>|<cell|218>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|6>-a<rsub|6>|\|>\<leqslant\>5>>|<cell|980>|<cell|985>|<cell|985>|<cell|941>|<cell|956>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|6>-a<rsub|6>|\|>\<leqslant\>10>>|<cell|993>|<cell|1000>|<cell|1000>|<cell|955>|<cell|971>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|7>-a<rsub|7>|\|>=0>>|<cell|201>|<cell|219>|<cell|219>|<cell|195>|<cell|204>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|7>-a<rsub|7>|\|>\<leqslant\>5>>|<cell|824>|<cell|876>|<cell|876>|<cell|814>|<cell|844>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|7>-a<rsub|7>|\|>\<leqslant\>10>>|<cell|928>|<cell|973>|<cell|973>|<cell|904>|<cell|937>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|8>-a<rsub|8>|\|>=0>>|<cell|455>|<cell|511>|<cell|511>|<cell|460>|<cell|474>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|8>-a<rsub|8>|\|>\<leqslant\>5>>|<cell|893>|<cell|978>|<cell|978>|<cell|897>|<cell|927>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|8>-a<rsub|8>|\|>\<leqslant\>10>>|<cell|907>|<cell|991>|<cell|991>|<cell|911>|<cell|942>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|9>-a<rsub|9>|\|>=0>>|<cell|240>|<cell|277>|<cell|277>|<cell|276>|<cell|279>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|9>-a<rsub|9>|\|>\<leqslant\>5>>|<cell|786>|<cell|935>|<cell|936>|<cell|922>|<cell|918>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|9>-a<rsub|9>|\|>\<leqslant\>10>>|<cell|822>|<cell|980>|<cell|981>|<cell|966>|<cell|961>>|<row|<cell|No.
    of Correct Detection>|<cell|818>|<cell|950>|<cell|987>|<cell|898>|<cell|920>>|<row|<cell|Average
    Computation Time>|<cell|2.23>|<cell|5.61>|<cell|8.20>|<cell|2.88>|<cell|2.98>>>>>
  </big-table|The entries are the numbers of <math|<wide|a|~><rsub|i>> such
  that <math|<around|\||<wide|a|~><rsub|i>-a<rsub|i,n>|\|>\<leqslant\>0,5,10>
  for <math|i=1,\<ldots\>,9>, the number of times the number of change points
  is correctly estimated, and the corresponding average computation time in
  seconds for each of the five algorithms LSMCPDA, ALMCPDA, CALMCPDA, SMCPDA
  and MMCPDA based on
  1000 simulations for the change point location setting CPL1.>

  <assign|tabcolsep|<macro|1mm>><assign|arraystretch|<macro|1.2>>

  <\big-table>
    \;

    <tabular*|<tformat|<cwith|1|-1|1|1|cell-lborder|1ln>|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-rborder|1ln>|<cwith|1|-1|2|2|cell-halign|c>|<cwith|1|-1|2|2|cell-rborder|1ln>|<cwith|1|-1|3|3|cell-halign|c>|<cwith|1|-1|3|3|cell-rborder|1ln>|<cwith|1|-1|4|4|cell-halign|c>|<cwith|1|-1|4|4|cell-rborder|1ln>|<cwith|1|-1|5|5|cell-halign|c>|<cwith|1|-1|5|5|cell-rborder|1ln>|<cwith|1|-1|6|6|cell-halign|c>|<cwith|1|-1|6|6|cell-rborder|1ln>|<cwith|1|-1|1|-1|cell-valign|c>|<cwith|1|1|1|-1|cell-tborder|1ln>|<cwith|1|1|1|-1|cell-bborder|1ln>|<cwith|4|4|1|-1|cell-bborder|1ln>|<cwith|7|7|1|-1|cell-bborder|1ln>|<cwith|10|10|1|-1|cell-bborder|1ln>|<cwith|13|13|1|-1|cell-bborder|1ln>|<cwith|16|16|1|-1|cell-bborder|1ln>|<cwith|19|19|1|-1|cell-bborder|1ln>|<cwith|22|22|1|-1|cell-bborder|1ln>|<cwith|25|25|1|-1|cell-bborder|1ln>|<cwith|28|28|1|-1|cell-bborder|1ln>|<cwith|29|29|1|-1|cell-bborder|1ln>|<cwith|30|30|1|-1|cell-bborder|1ln>|<table|<row|<cell|>|<cell|LSMCPDA>|<cell|ALMCPDA>|<cell|CALMCPDA>|<cell|SMCPDA>|<cell|MMCPDA>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|1>-a<rsub|1>|\|>=0>>|<cell|362>|<cell|378>|<cell|378>|<cell|377>|<cell|381>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|1>-a<rsub|1>|\|>\<leqslant\>5>>|<cell|955>|<cell|961>|<cell|961>|<cell|955>|<cell|960>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|1>-a<rsub|1>|\|>\<leqslant\>10>>|<cell|986>|<cell|991>|<cell|991>|<cell|985>|<cell|991>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|2>-a<rsub|2>|\|>=0>>|<cell|270>|<cell|276>|<cell|275>|<cell|271>|<cell|274>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|2>-a<rsub|2>|\|>\<leqslant\>5>>|<cell|858>|<cell|872>|<cell|869>|<cell|861>|<cell|865>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|2>-a<rsub|2>|\|>\<leqslant\>10>>|<cell|975>|<cell|991>|<cell|988>|<cell|976>|<cell|981>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|3>-a<rsub|3>|\|>=0>>|<cell|426>|<cell|522>|<cell|522>|<cell|522>|<cell|523>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|3>-a<rsub|3>|\|>\<leqslant\>5>>|<cell|767>|<cell|952>|<cell|952>|<cell|957>|<cell|958>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|3>-a<rsub|3>|\|>\<leqslant\>10>>|<cell|811>|<cell|982>|<cell|982>|<cell|987>|<cell|988>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|4>-a<rsub|4>|\|>=0>>|<cell|195>|<cell|194>|<cell|194>|<cell|115>|<cell|150>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|4>-a<rsub|4>|\|>\<leqslant\>5>>|<cell|892>|<cell|911>|<cell|911>|<cell|525>|<cell|714>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|4>-a<rsub|4>|\|>\<leqslant\>10>>|<cell|955>|<cell|970>|<cell|970>|<cell|562>|<cell|766>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|5>-a<rsub|5>|\|>=0>>|<cell|272>|<cell|295>|<cell|294>|<cell|169>|<cell|249>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|5>-a<rsub|5>|\|>\<leqslant\>5>>|<cell|910>|<cell|980>|<cell|978>|<cell|578>|<cell|834>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|5>-a<rsub|5>|\|>\<leqslant\>10>>|<cell|921>|<cell|997>|<cell|995>|<cell|582>|<cell|845>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|6>-a<rsub|6>|\|>=0>>|<cell|793>|<cell|795>|<cell|795>|<cell|783>|<cell|779>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|6>-a<rsub|6>|\|>\<leqslant\>5>>|<cell|967>|<cell|971>|<cell|968>|<cell|954>|<cell|946>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|6>-a<rsub|6>|\|>\<leqslant\>10>>|<cell|987>|<cell|993>|<cell|988>|<cell|972>|<cell|964>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|7>-a<rsub|7>|\|>=0>>|<cell|293>|<cell|317>|<cell|315>|<cell|309>|<cell|309>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|7>-a<rsub|7>|\|>\<leqslant\>5>>|<cell|922>|<cell|941>|<cell|939>|<cell|932>|<cell|931>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|7>-a<rsub|7>|\|>\<leqslant\>10>>|<cell|973>|<cell|991>|<cell|989>|<cell|984>|<cell|986>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|8>-a<rsub|8>|\|>=0>>|<cell|197>|<cell|210>|<cell|196>|<cell|211>|<cell|206>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|8>-a<rsub|8>|\|>\<leqslant\>5>>|<cell|836>|<cell|899>|<cell|899>|<cell|904>|<cell|910>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|8>-a<rsub|8>|\|>\<leqslant\>10>>|<cell|891>|<cell|968>|<cell|968>|<cell|969>|<cell|975>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|9>-a<rsub|9>|\|>=0>>|<cell|305>|<cell|298>|<cell|298>|<cell|304>|<cell|304>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|9>-a<rsub|9>|\|>\<leqslant\>5>>|<cell|927>|<cell|924>|<cell|924>|<cell|934>|<cell|932>>|<row|<cell|<math|<around|\||<wide|a|~><rsub|9>-a<rsub|9>|\|>\<leqslant\>10>>|<cell|974>|<cell|977>|<cell|977>|<cell|982>|<cell|982>>|<row|<cell|No.
    of Correct Detection>|<cell|895>|<cell|947>|<cell|964>|<cell|572>|<cell|759>>|<row|<cell|Average
    Computation Time>|<cell|2.29>|<cell|5.97>|<cell|8.65>|<cell|3.00>|<cell|2.98>>>>>
  </big-table|The entries are the numbers of <math|<wide|a|~><rsub|i>> such
  that <math|<around|\||<wide|a|~><rsub|i>-a<rsub|i,n>|\|>\<leqslant\>0,5,10>
  for <math|i=1,\<ldots\>,9>, the number of times the number of change points
  is correctly estimated, and the corresponding average computation time in
  seconds for each of the five algorithms LSMCPDA, ALMCPDA, CALMCPDA, SMCPDA
  and MMCPDA based on
  1000 simulations for the change point location setting CPL2.>

  From both tables, it can be seen that the algorithms generally perform well
  in terms of the accuracy of multiple change point estimation and the rate
  of correctly estimating the number of change points, although SMCPDA and
  MMCPDA are noticeably less accurate under CPL2, where they select the
  correct number of change points in only 572 and 759 of the 1000
  simulations, respectively. The ALMCPDA and CALMCPDA are comparable and
  generally outperform the others. CALMCPDA requires the longest average
  detection time for a sample of size 5000: 8.20 seconds for CPL1 and 8.65
  seconds for CPL2. In contrast, the average detection time required by
  ALMCPDA is only 5.61 seconds for CPL1 and 5.97 seconds for CPL2.

  <no-indent><with|font-family|ss|4.3. Practical recommendation of
  <math|p<rsub|n>>>

  <no-indent>It is clear that the choice of <math|p<rsub|n>> will affect the
  performance of the proposed algorithms. A <math|p<rsub|n>> that is too
  large may lead to underestimating the true number of change points and to
  larger biases in the change point estimates, although it may cut down the
  computation time. Hence care must be taken in choosing a proper
  <math|p<rsub|n>>, and we propose the following algorithm:

  <no-indent><em|Step 1.> We choose an initial set <math|<mB>> containing
  probable values of <math|p<rsub|n>>.

  <no-indent><em|Step 2.> For each <math|p<rsub|n>> in the set <math|<mB>>,
  we obtain an estimate of <math|<mtheta><rsub|n>> in (<reference|al>) by
  using an algorithm, say ALMCPDA. We can then calculate the residual sum of
  squares, denoted by <math|R*S*S<around|(|p<rsub|n>|)>>.

  <no-indent><em|Step 3.> The optimal <math|p<rsub|n>> is chosen as <math|arg
  min<rsub|p<rsub|n>\<in\><mB>> R*S*S<around|(|p<rsub|n>|)>>.
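The three steps above amount to a grid search over the candidate set. A minimal Python sketch is given below; the function <verbatim|choose_pn> and its argument <verbatim|rss_for_pn> are hypothetical names, and the callable is assumed to run one of the proposed algorithms (say ALMCPDA) for a given <math|p<rsub|n>> and return the resulting residual sum of squares.

```python
def choose_pn(candidates, rss_for_pn):
    """Return the candidate p_n with the smallest RSS, plus all RSS values.

    rss_for_pn is a user-supplied callable (hypothetical here) that fits the
    model for a given p_n and returns RSS(p_n).
    """
    rss = {p: float(rss_for_pn(p)) for p in candidates}
    best = min(rss, key=rss.get)  # Step 3: arg min over the candidate set
    return best, rss
```

Keeping the whole dictionary of RSS values makes it easy to plot <math|R*S*S<around|(|p<rsub|n>|)>> against <math|p<rsub|n>> and to check how sharply the minimum is determined.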

  <vspace|.2in><no-indent><with|font-series|bold|5. Empirical applications>

  <no-indent>In this section, we consider empirical applications of the
  multiple change point detection methods proposed in this paper by analyzing
  the U.S. ex-post real interest rate (Garcia and Perron, 1996) and the gross
  domestic product of the U.S.A. (Maddala, 1977).

  <vspace|.2in><no-indent><with|font-family|ss|5.1. The U.S. Ex-Post Real
  Interest Rate>

  <no-indent>Garcia and Perron (1996) considered the time series behavior of
  the U.S. Ex-Post real interest rate (constructed from the three-month
  treasury bill rate deflated by the CPI inflation rate taken from the
  Citibase data base). The data are a quarterly series from the first quarter
  of 1961 to the third quarter of 1986, which is plotted in Figure 4. We are
  interested in finding out
  if there are change points in the mean of the series. Thus we apply the
  proposed algorithms to the mean shift model. It is noted that by Remark 2,
  the algorithms are applicable even if there exists potential serial
  correlation.

  <big-figure|<with|par-mode|center|<image|finance.eps|1tex-text-width|0.38tex-text-height||><label|fig4>>|U.S.
  Ex-Post Real Interest Rate, the first quarter of 1961 -- the third quarter
  of 1986>

  First, we need to select a <math|p<rsub|n>>. Following the recommendations
  in Subsection 4.3, we will choose an optimal <math|p<rsub|n>> from the
  range 3 to 13. For each <math|p<rsub|n>\<in\><around|{|3,4,\<ldots\>,13|}>>,
  we obtain <math|<wide|<mtheta>|\<breve\>><rsub|n>> by the ALMCPDA, and
  calculate the corresponding <math|R*S*S<around|(|p<rsub|n>|)>>. Choose
  <math|arg min<rsub|3\<leqslant\>p<rsub|n>\<leqslant\>13>
  R*S*S<around|(|p<rsub|n>|)>> as the optimal <math|p<rsub|n>>, which is 5.
  See Figure 5.

  Based on the first step, we set <math|p<rsub|n>=5> and apply the five
  algorithms given in Section 3 to the data. The ALMCPDA and the CALMCPDA
  both find two change points, located at 47 and 79 (see Figure 4), which
  correspond to the third quarter of 1972 and the third quarter of 1980, with
  RSS=455.95. These results are consistent with those of Garcia and Perron
  (1996). However, the other three algorithms LSMCPDA, SMCPDA and MMCPDA
  detect only one change point, located at 47, with RSS=1214.89. Comparing
  the RSSs, both ALMCPDA and CALMCPDA perform better than the other three
  algorithms on this data set.

  <big-figure|<with|par-mode|center|<image|serror.eps|1tex-text-width|0.38tex-text-height||><label|fig5>>|<math|R*S*S<around|(|p<rsub|n>|)>>
  against <math|p<rsub|n>> for the U.S. ex-post real interest rate data>

  <vspace|.2in>

  <no-indent><with|font-family|ss|5.2. Gross domestic product in U.S.A>

  <no-indent>The data presented in Maddala (1977, Table 10.3) give the gross
  domestic product (<math|G>), the labor input index (<math|L>) and the
  capital input index (<math|C>) in the United States for the years
  1929-1967. <math|log G> is modeled as a linear function of <math|log L> and
  <math|log C>. The series <math|log G>, <math|log L> and <math|log C> are
  plotted over time in Figure 6. Worsley (1983) used the likelihood ratio
  method to search for change points in this data set and pointed out that
  the data contained two change points located at 1942 and 1946
  (RSS<math|=0.011>). Caussinus and Lyazrhi (1997) used Bayes invariant
  optimal multi-decision procedure to detect change points in the data series
  and claimed three change points located at 1938, 1944 and 1948
  (RSS<math|=0.01>).

  <big-figure|<with|par-mode|center|<image|gross.eps|1tex-text-width|0.38tex-text-height||><label|fig6>>|Logarithms
  of Gross domestic product (<math|log>G), labor-input index (<math|log>L)
  and capital-input index (<math|log>C) in U.S.A. for the years 1929-1967.>

  Since the sample size is only 39, the proposed algorithms employing least
  squares or the CUSUM test may not work. Thus we only apply the first two
  steps of the SMCPDA or the MMCPDA to carry out multiple change point
  analysis. As in the previous example, we need to select a <math|p<rsub|n>>.
  Following the recommendations in Subsection 4.3, we will choose an optimal
  <math|p<rsub|n>> from 13 to 17. For each
  <math|p<rsub|n>\<in\><around|{|13,\<ldots\>,17|}>>, we obtain
  <math|<wide|<mtheta>|^><rsup|<SCAD>><rsub|n>> by the SMCPDA, and calculate
  the corresponding <math|R*S*S<around|(|p<rsub|n>|)>>. Choose <math|arg
  min<rsub|13\<leqslant\>p<rsub|n>\<leqslant\>17>
  R*S*S<around|(|p<rsub|n>|)>> as the optimal <math|p<rsub|n>>, which is 17.
  With <math|p<rsub|n>=17>, four change points detected by applying the
  SMCPDA are located at 1936, 1942, 1946 and 1950 with RSS=0.0054. With the
  same <math|p<rsub|n>>, two change points detected by applying the MMCPDA
  are located at 1942 and 1958 with RSS=0.015. Thus, in terms of the RSSs,
  the SMCPDA has a better performance.

  <vspace|.2in><no-indent><with|font-series|bold|6. Conclusion>

  <no-indent>By properly segmenting the data sequence, we proposed five
  multiple change point detection algorithms. The proposed approach is based
  on the following reasons. On the one hand, a proper segmentation can
  isolate the finite change points such that each change point is only
  located in one segment, and a connection between multiple change point
  detection and variable selection can be established. Thus the recent
  advances in consistent variable selection methods such as SCAD, adaptive
  LASSO and MCP can be used to detect these change points simultaneously. On
  the other hand, a refining procedure using a method such as CUSUM can
  improve the accuracy of change point estimates. Compared with other change
  point detection methods, which are very time consuming, the newly proposed
  algorithms are much faster and more effective, and have strong theoretical
  support. The proposed approach can be extended to detect multiple change
  points in other models such as generalized linear models and nonparametric
  models without any extra difficulties.

  <no-indent><with|font-series|bold|Appendix>

  <reset-counter|equation><assign|the-equation|<macro|A.<number|<equation-nr>|arabic>>><no-indent><with|font-family|ss|A.1.
  CUSUM test for a single change point>

  <no-indent>Consider the following model

  <\equation>
    y<rsub|i>=<mx><rsub|i><rsup|T><mbeta><rsub|1>I*<around|(|n<rsub|\<ell\>>\<leqslant\>i\<leqslant\>k|)>+<mx><rsub|i><rsup|T><mbeta><rsub|2>I*<around|(|k\<less\>i\<leqslant\>n<rsub|\<ell\>+1>|)>+\<varepsilon\><rsub|i>,<space|1em>n<rsub|\<ell\>>\<leqslant\>i\<leqslant\>n<rsub|\<ell\>+1>,<label|ap1>
  </equation>

  where <math|<my><rsub|\<ell\>>=<around|(|y<rsub|n<rsub|\<ell\>>>,\<ldots\>,y<rsub|n<rsub|\<ell\>+1>>|)><rsup|T>>,
  <math|<mx><rsub|n<rsub|\<ell\>>>,<mx><rsub|n<rsub|\<ell\>>+1>,\<cdots\>,<mx><rsub|n<rsub|\<ell\>+1>>>
  are <math|q>-dimensional predictors, <math|<mbeta><rsub|1>> and
  <math|<mbeta><rsub|2>> are unknown <math|q>-dimensional vectors of
  regression coefficients, and <math|<me><rsub|n>=<around|(|\<varepsilon\><rsub|n<rsub|\<ell\>>>,\<ldots\>,\<varepsilon\><rsub|n<rsub|\<ell\>+1>>|)><rsup|T>>.
  If <math|n<rsub|\<ell\>>\<leqslant\>k\<less\>n<rsub|\<ell\>+1>> and
  <math|<mbeta><rsub|1>\<neq\><mbeta><rsub|2>>, there is a change point at
  <math|k>.

  Let <math|N<rsub|\<ell\>>=n<rsub|\<ell\>+1>-n<rsub|\<ell\>>+1>. Define

  <\equation*>
    <wide|\<sigma\>|^><rsup|2><rsub|\<ell\>,k>=<frac|1|N<rsub|\<ell\>>>*<around*|[|min<rsub|<mbeta>>
    <big|sum><rsub|i=n<rsub|\<ell\>>><rsup|k><around|(|y<rsub|i>-<mx><rsub|i><rsup|T><mbeta>|)><rsup|2>+min<rsub|<mbeta>>
    <big|sum><rsub|i=k+1><rsup|n<rsub|\<ell\>+1>><around|(|y<rsub|i>-<mx><rsub|i><rsup|T><mbeta>|)><rsup|2>|]>,
  </equation*>

  and <math|<wide|\<sigma\>|^><rsup|2><rsub|\<ell\>>=min<rsub|<mbeta>>
  <big|sum><rsub|i=n<rsub|\<ell\>>><rsup|n<rsub|\<ell\>+1>><around|(|y<rsub|i>-<mx><rsub|i><rsup|T><mbeta>|)><rsup|2>/N<rsub|\<ell\>>>.
  By Theorem 3.1.1 of Csörgő and Horváth (1997), it follows that

  <eqnarray|<tformat|<table|<row|<cell|lim<rsub|N<rsub|\<ell\>>\<rightarrow\>\<infty\>>
  P*<around*|[|a<rsub|\<ell\>>*\<Lambda\><rsub|\<ell\>><rsup|1/2>\<leqslant\>x/2+b<rsub|\<ell\>,q>|]>=exp
  <around*|(|-2*e<rsup|-x/2>|)>,<eq-number><label|cso>>>>>>

  for all <math|x>, where <math|a<rsub|\<ell\>>=<around|(|2*log log
  N<rsub|\<ell\>>|)><rsup|1/2>>, <math|b<rsub|\<ell\>,q>=2*log log
  N<rsub|\<ell\>>+q<around|(|log log log N<rsub|\<ell\>>|)>/2-log
  \<Gamma\>*<around|(|q/2|)>>, <math|\<Gamma\><around|(|x|)>> is the Gamma
  function, <math|\<Lambda\><rsub|\<ell\>>=max<rsub|n<rsub|\<ell\>>+q\<leqslant\>k\<leqslant\>n<rsub|\<ell\>+1>-q><around*|[|-2*log
  <around*|(|<wide|\<sigma\>|^><rsup|2><rsub|\<ell\>,k>/<wide|\<sigma\>|^><rsup|2><rsub|\<ell\>>|)><rsup|N<rsub|\<ell\>>/2>|]>>.

  In light of the proof of Corollary 2.1 of Hušková, Prášková and Steinebach
  (2007), it can be shown that

  <\equation*>
    lim<rsub|N<rsub|\<ell\>>\<rightarrow\>\<infty\>>
    P*<around*|[|a<rsub|\<ell\>>*\<Lambda\><rsub|\<ell\>><rsup|1/2>\<leqslant\>x/2+b<rsub|\<ell\>,q>|]>=lim<rsub|N<rsub|\<ell\>>\<rightarrow\>\<infty\>>
    P*<around*|[|<around|(|\<Lambda\><rsub|\<ell\>>-<wide|b|~><rsub|\<ell\>,q>|)>/<wide|a|~><rsub|\<ell\>,q>\<leqslant\>x|]>,
  </equation*>

  where <math|<wide|b|~><rsub|\<ell\>,q>=<around|(|b<rsub|\<ell\>,q>/a<rsub|\<ell\>>|)><rsup|2>>
  and <math|<wide|a|~><rsub|\<ell\>,q>=b<rsub|\<ell\>,q>/a<rsup|2><rsub|\<ell\>>>,
  which jointly with (<reference|cso>) implies that

  <eqnarray|<tformat|<table|<row|<cell|lim<rsub|N<rsub|\<ell\>>\<rightarrow\>\<infty\>>
  P*<around*|[|<around|(|\<Lambda\><rsub|\<ell\>>-<wide|b|~><rsub|\<ell\>,q>|)>/<wide|a|~><rsub|\<ell\>,q>\<leqslant\>x|]>=exp
  <around*|(|-2*e<rsup|-x/2>|)>.>>>>>

  By Lemma 3.1.9 of Csörgő and Horváth (1997), it can be shown that

  <eqnarray|<tformat|<table|<row|<cell|lim<rsub|N<rsub|\<ell\>>\<rightarrow\>\<infty\>>
  P*<around*|[|<around*|(|<frac|1|<wide|\<sigma\>|^><rsup|2><rsub|\<ell\>>>*max<rsub|n<rsub|\<ell\>>+q\<leqslant\>k\<leqslant\>n<rsub|\<ell\>+1>-q>
  N<rsub|\<ell\>>*<around|(|<wide|\<sigma\>|^><rsup|2><rsub|\<ell\>>-<wide|\<sigma\>|^><rsup|2><rsub|\<ell\>,k>|)>-<wide|b|~><rsub|\<ell\>,q>|)>/<wide|a|~><rsub|\<ell\>,q>\<leqslant\>x|]>=exp
  <around*|(|-2*e<rsup|-x/2>|)>.>>>>>

  Let <math|T<rsub|\<ell\>,k>=N<rsub|\<ell\>>*<around|(|<wide|\<sigma\>|^><rsup|2><rsub|\<ell\>>-<wide|\<sigma\>|^><rsup|2><rsub|\<ell\>,k>|)>>
  and <math|T<rsub|\<ell\>>=max<rsub|n<rsub|\<ell\>>+q\<leqslant\>k\<leqslant\>n<rsub|\<ell\>+1>-q>
  T<rsub|\<ell\>,k>>. Given a significance level <math|\<alpha\>>, the CUSUM
  test for a change point in the model (<reference|ap1>) is as follows: if

  <eqnarray|<tformat|<table|<row|<cell|T<rsub|\<ell\>>\<gtr\><around*|[|<wide|b|~><rsub|\<ell\>,q>+2*<wide|a|~><rsub|\<ell\>,q>*log
  <around*|(|-2/log <around|(|1-\<alpha\>|)>|)>|]>*<wide|\<sigma\>|^><rsup|2><rsub|\<ell\>>,>>>>>

  there exists a <math|k\<in\><around|{|n<rsub|\<ell\>>+q,\<ldots\>,n<rsub|\<ell\>+1>-q|}>>
  such that <math|<mbeta><rsub|1>\<neq\><mbeta><rsub|2>> in the model
  (<reference|ap1>).
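
  As a numerical illustration of the rejection rule, the following Python
  sketch solves exp(-2 exp(-x/2)) = 1 - alpha for x and forms the threshold.
  The function names and the arguments a_tilde, b_tilde and sigma2_hat are
  hypothetical placeholders for the normalizing constants and the variance
  estimate, which the caller must supply; it is a sketch, not the authors'
  implementation.

```python
import math


def limit_quantile(alpha):
    """Return x with exp(-2 * exp(-x / 2)) = 1 - alpha (limit law of the test)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Solving the limit law for x gives x = 2 * log(-2 / log(1 - alpha)).
    return 2.0 * math.log(-2.0 / math.log(1.0 - alpha))


def cusum_rejects(T, sigma2_hat, a_tilde, b_tilde, alpha):
    """Hypothetical helper: True when T exceeds the level-alpha threshold."""
    threshold = (b_tilde + a_tilde * limit_quantile(alpha)) * sigma2_hat
    return T > threshold
```

  Plugging the returned quantile back into the limit distribution recovers
  1 - alpha, which is a quick sanity check on the constant inside the
  logarithm.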

  Denote <math|C<rsub|\<ell\>>=<big|sum><rsub|i=n<rsub|\<ell\>>><rsup|n<rsub|\<ell\>+1>><mx><rsub|i><mx><rsub|i><rsup|T>>,
  <math|<wide|<mbeta>|^><rsub|\<ell\>>=C<rsub|\<ell\>><rsup|-1>*<big|sum><rsub|i=n<rsub|\<ell\>>><rsup|n<rsub|\<ell\>+1>><mx><rsub|i>y<rsub|i>>,
  <math|C<rsub|\<ell\>,k>=<big|sum><rsub|i=n<rsub|\<ell\>>><rsup|k><mx><rsub|i><mx><rsub|i><rsup|T>>,
  <math|C<rsub|\<ell\>,k><rsup|0>=C<rsub|\<ell\>>-C<rsub|\<ell\>,k>>,
  <math|S<rsub|\<ell\>,k>=<big|sum><rsub|i=n<rsub|\<ell\>>><rsup|k><mx><rsub|i><around|(|y<rsub|i>-<mx><rsub|i><rsup|T><wide|<mbeta>|^><rsub|\<ell\>>|)>>
  for <math|k=n<rsub|\<ell\>>+q,\<ldots\>,n<rsub|\<ell\>+1>-q>. By Hušková,
  Prášková and Steinebach (2007),

  <eqnarray|<tformat|<table|<row|<cell|T<rsub|\<ell\>>=max<rsub|n<rsub|\<ell\>>+q\<leqslant\>k\<leqslant\>n<rsub|\<ell\>+1>-q>
  S<rsub|\<ell\>,k><rsup|T>*C<rsub|\<ell\>,k><rsup|-1>*C<rsub|\<ell\>><around|(|C<rsub|\<ell\>,k><rsup|0>|)><rsup|-1>*S<rsub|\<ell\>,k>.<eq-number><label|hus2>>>>>>

  Since <math|S<rsub|\<ell\>,k>> and <math|C<rsub|\<ell\>,k>> can be computed
  recursively, the computing time of <math|T<rsub|\<ell\>>> is reduced to
  <math|O*<around|(|n<rsub|\<ell\>+1>-n<rsub|\<ell\>>|)>> from
  <math|O<around|(|<around|(|n<rsub|\<ell\>+1>-n<rsub|\<ell\>>|)><rsup|2>|)>>
  by using (<reference|hus2>).
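
  The recursion can be sketched in Python as follows. This is an
  illustration under stated assumptions, not the authors' code: the function
  name cusum_statistic is hypothetical, the rows of X are the observations of
  a single segment indexed locally from 1 to N, and the argument q plays the
  role of the trimming constant. Each step updates the partial sums by one
  rank-one term, so the work is linear in the segment length for a fixed
  number of regressors.

```python
import numpy as np


def cusum_statistic(X, y, q=None):
    """Return (T, k_hat), maximizing S_k' C_k^{-1} C (C - C_k)^{-1} S_k over k."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    N, dim = X.shape
    if q is None:
        q = dim                      # assumption: trim by the number of regressors
    C = X.T @ X                      # full-segment Gram matrix
    beta_hat = np.linalg.solve(C, X.T @ y)
    resid = y - X @ beta_hat         # residuals under the no-change fit
    C_k = np.zeros((dim, dim))       # running sum of x_i x_i'
    S_k = np.zeros(dim)              # running sum of x_i * residual_i
    best_T, best_k = -np.inf, None
    for k in range(1, N - q + 1):    # k = number of observations before the split
        x = X[k - 1]
        C_k += np.outer(x, x)        # one-step recursive update
        S_k += x * resid[k - 1]
        if k < q:
            continue                 # not enough observations on the left yet
        left = np.linalg.solve(C_k, S_k)        # C_k^{-1} S_k
        right = np.linalg.solve(C - C_k, S_k)   # (C - C_k)^{-1} S_k
        T_k = float(left @ C @ right)
        if T_k > best_T:
            best_T, best_k = T_k, k
    return best_T, best_k
```

  For each k the quadratic form equals the reduction in the residual sum of
  squares obtained by fitting the two parts separately, which gives a direct
  way to check the routine on simulated data.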

  <vspace|.2in><no-indent><with|font-family|ss|A.2. Proof of Lemma 1>

  <no-indent>Denote the elements of <math|<ma><rsub|c>> by
  <math|<ma><rsub|c>=<around|{|r<rsub|1>,r<rsub|2>,\<ldots\>,r<rsub|K<rsub|0>>|}>>.
  In view of <math|<around|\||n-p<rsub|n>*m+r<rsub|k>*m-a<rsub|k,n>|\|>\<leqslant\>m>,
  <math|a<rsub|k,n>/n\<rightarrow\>\<tau\><rsub|k>>, for
  <math|k=1,\<ldots\>,K<rsub|0>> and <math|m=o<around|(|n|)>>, by Assumption
  C1, it follows that

  <\equation*>
    <frac|1|n>*<big|sum><rsub|i=r<rsub|1>><rsup|r<rsub|2>-1><mX><rsub|<around|(|i|)>><rsup|T><mX><rsub|<around|(|i|)>>\<rightarrow\><around|(|\<tau\><rsub|2>-\<tau\><rsub|1>|)>*W,<space|1em>\<ldots\>,<space|1em><frac|1|n>*<big|sum><rsub|i=r<rsub|K<rsub|0>>><rsup|p<rsub|n>+1><mX><rsub|<around|(|i|)>><rsup|T><mX><rsub|<around|(|i|)>>\<rightarrow\><around|(|1-\<tau\><rsub|K<rsub|0>>|)>*W.
  </equation*>

  Hence,

  <eqnarray|<tformat|<table|<row|<cell|>|<cell|>|<cell|<frac|1|n><mX><rsub|<ma><rsub|c>><rsup|T><mX><rsub|<ma><rsub|c>>=U<rsup|T><matrix|<tformat|<table|<row|<cell|<frac|1|n>*<big|sum><rsub|i=r<rsub|1>><rsup|r<rsub|2>-1><mX><rsub|<around|(|i|)>><rsup|T><mX><rsub|<around|(|i|)>>>|<cell|0>|<cell|\<cdots\>>|<cell|0>>|<row|<cell|0>|<cell|<frac|1|n>*<big|sum><rsub|i=r<rsub|2>><rsup|r<rsub|3>-1><mX><rsub|<around|(|i|)>><rsup|T><mX><rsub|<around|(|i|)>>>|<cell|\<cdots\>>|<cell|0>>|<row|<cell|\<vdots\>>|<cell|\<vdots\>>|<cell|\<ddots\>>|<cell|\<vdots\>>>|<row|<cell|0>|<cell|0>|<cell|\<cdots\>>|<cell|<frac|1|n>*<big|sum><rsub|i=r<rsub|K<rsub|0>>><rsup|p<rsub|n>+1><mX><rsub|<around|(|i|)>><rsup|T><mX><rsub|<around|(|i|)>>>>>>>U>>|<row|<cell|>|<cell|>|<cell|>>|<row|<cell|>|<cell|>|<cell|\<rightarrow\>U<rsup|T><matrix|<tformat|<table|<row|<cell|<around|(|\<tau\><rsub|2>-\<tau\><rsub|1>|)>*W>|<cell|0>|<cell|\<cdots\>>|<cell|0>>|<row|<cell|0>|<cell|<around|(|\<tau\><rsub|3>-\<tau\><rsub|2>|)>*W>|<cell|\<cdots\>>|<cell|0>>|<row|<cell|\<vdots\>>|<cell|\<vdots\>>|<cell|\<ddots\>>|<cell|\<vdots\>>>|<row|<cell|0>|<cell|0>|<cell|\<cdots\>>|<cell|<around|(|1-\<tau\><rsub|K<rsub|0>>|)>*W>>>>>U<wide|=|^>*\<cal-W\><rsub|<ma><rsub|c>>\<gtr\>0,<eq-number><label|wc>>>>>>

  where

  <\equation*>
    U=<matrix|<tformat|<table|<row|<cell|I<rsub|q>>|<cell|0>|<cell|\<cdots\>>|<cell|0>>|<row|<cell|I<rsub|q>>|<cell|I<rsub|q>>|<cell|\<cdots\>>|<cell|0>>|<row|<cell|\<vdots\>>|<cell|\<vdots\>>|<cell|\<ddots\>>|<cell|\<vdots\>>>|<row|<cell|I<rsub|q>>|<cell|I<rsub|q>>|<cell|\<cdots\>>|<cell|I<rsub|q>>>>>>.
  </equation*>
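
  The block structure of the limit can be checked numerically. The Python
  sketch below is illustrative only: the function name limit_gram and the
  example values of the break fractions and of W are assumptions, not taken
  from the paper. It builds U as a lower block-triangular matrix of identity
  blocks, forms the block-diagonal middle factor, and returns their product.

```python
import numpy as np


def limit_gram(taus, W):
    """Return (G, U) with G = U' diag(w_1 W, ..., w_K W) U, w_k = tau_{k+1} - tau_k."""
    taus = np.asarray(taus, dtype=float)
    W = np.asarray(W, dtype=float)
    K = taus.size
    q = W.shape[0]
    # Spacings between consecutive break fractions, with 1 appended as the end point.
    weights = np.diff(np.append(taus, 1.0))
    # Lower block-triangular matrix whose nonzero blocks are all I_q.
    U = np.kron(np.tril(np.ones((K, K))), np.eye(q))
    # Block-diagonal matrix with k-th diagonal block weights[k] * W.
    D = np.kron(np.diag(weights), W)
    return U.T @ D @ U, U


# Hypothetical example: three break fractions and a 2 x 2 positive definite W.
G, U = limit_gram([0.2, 0.5, 0.7], [[2.0, 0.3], [0.3, 1.0]])
```

  Since U is invertible and every spacing is positive, the product is
  positive definite whenever W is, which is the property the display asserts.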

  <vspace|.2in><no-indent><with|font-family|ss|A.3. Proof of Lemma 2>

  <no-indent>As in the proof of Lemma 1, denote
  <math|<ma><rsub|c>=<around|{|r<rsub|1>,\<ldots\>,r<rsub|K<rsub|0>>|}>>. It
  is easy to see that

  <\equation*>
    <wide|X|~><rsub|n><rsup|T><mx><rsub|\<omega\>>/m=<frac|1|m>*<big|sum><rsub|\<ell\>=1><rsup|p<rsub|n>><around*|(|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|l>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|2|2|cell-halign|l>|<cwith|1|-1|2|2|cell-rborder|0ln>|<table|<row|<cell|<big|sum><rsub|i=1><rsup|n>>|<cell|<mx><rsub|i><mx><rsub|i><rsup|T><bomega><rsub|\<ell\>><around|(|i|)>>>|<row|<cell|<big|sum><rsub|i=n-p<rsub|n>*m+1><rsup|n>>|<cell|<mx><rsub|i><mx><rsub|i><rsup|T><bomega><rsub|\<ell\>><around|(|i|)>>>|<row|<cell|<big|sum><rsub|i=n-<around|(|p<rsub|n>-1|)>*m+1><rsup|n>>|<cell|<mx><rsub|i><mx><rsub|i><rsup|T><bomega><rsub|\<ell\>><around|(|i|)>>>|<row|<cell|>|<cell|\<vdots\>>>|<row|<cell|<big|sum><rsub|i=n-m+1><rsup|n>>|<cell|<mx><rsub|i><mx><rsub|i><rsup|T><bomega><rsub|\<ell\>><around|(|i|)>>>>>>|)>.
  </equation*>

  Consider the first row of <math|<wide|X|~><rsub|n<ma><rsub|c>><rsup|T><mx><rsub|\<omega\>>/m>.
  By Assumption C1, <math|<big|sum><rsub|i=n-<around|(|p<rsub|n>-r<rsub|j>+1|)>*m+1><rsup|n-<around|(|p<rsub|n>-r<rsub|j>|)>*m><mx><rsub|i><mx><rsub|i><rsup|T>/m\<rightarrow\>W>.
  Hence, for large <math|n>,

  <eqnarray|<tformat|<table|<row|<cell|>|<cell|>|<cell|<around*|\||<frac|1|m>*<big|sum><rsub|\<ell\>=1><rsup|p<rsub|n>><big|sum><rsub|i=1><rsup|n><mx><rsub|i><mx><rsub|i><rsup|T><bomega><rsub|\<ell\>><around|(|i|)>|\|>\<leqslant\><frac|1|m>*<big|sum><rsub|j=1><rsup|K<rsub|0>><around*|\||<big|sum><rsub|i=n-<around|(|p<rsub|n>-r<rsub|j>+1|)>*m+1><rsup|a<rsub|j,n>><mx><rsub|i><mx><rsub|i><rsup|T><mdelta><rsub|j>|\|>>>|<row|<cell|>|<cell|\<leqslant\>>|<cell|<frac|1|m>*<big|sum><rsub|j=1><rsup|K<rsub|0>><around|\<\|\|\>|<mdelta><rsub|j>|\<\|\|\>><around*|\||<big|sum><rsub|i=n-<around|(|p<rsub|n>-r<rsub|j>+1|)>*m+1><rsup|n-<around|(|p<rsub|n>-r<rsub|j>|)>*m><mx><rsub|i><mx><rsub|i><rsup|T>|\|>\<leqslant\>2*K<rsub|0><around|\<\|\|\>|W|\<\|\|\>>*max<rsub|1\<leqslant\>j\<leqslant\>K<rsub|0>><around*|\||<mdelta><rsub|j>|\|>.<eq-number><label|l21>>>>>>

  Similarly, it can be shown that for large <math|n> and
  <math|1\<leqslant\>s\<leqslant\>n>,

  <eqnarray|<tformat|<table|<row|<cell|<around*|\||<frac|1|m>*<big|sum><rsub|\<ell\>=1><rsup|p<rsub|n>><big|sum><rsub|i=s><rsup|n><mx><rsub|i><mx><rsub|i><rsup|T><bomega><rsub|\<ell\>><around|(|i|)>|\|>>|<cell|\<leqslant\>>|<cell|2<around|\<\|\|\>|W|\<\|\|\>><around*|{|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|l>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|2|2|cell-halign|l>|<cwith|1|-1|2|2|cell-rborder|0ln>|<table|<row|<cell|<big|sum><rsub|j=1><rsup|K<rsub|0>><around*|\||<mdelta><rsub|j>|\|>,>|<cell|1\<leqslant\>s\<leqslant\>a<rsub|1,n>,>>|<row|<cell|<big|sum><rsub|j=2><rsup|K<rsub|0>><around*|\||<mdelta><rsub|j>|\|>,>|<cell|a<rsub|1,n>\<less\>s\<leqslant\>a<rsub|2,n>,>>|<row|<cell|\<vdots\>>|<cell|\<vdots\>>>|<row|<cell|<around*|\||<mdelta><rsub|K<rsub|0>>|\|>,>|<cell|a<rsub|K<rsub|0>-1,n>\<less\>s\<leqslant\>a<rsub|K<rsub|0>,n>,>>|<row|<cell|0,>|<cell|<text|elsewhere>;>>>>>|\<nobracket\>>>>|<row|<cell|>|<cell|>|<cell|>>|<row|<cell|>|<cell|\<leqslant\>>|<cell|2*K<rsub|0><around|\<\|\|\>|W|\<\|\|\>>*max<rsub|1\<leqslant\>j\<leqslant\>K<rsub|0>><around*|\||<mdelta><rsub|j>|\|>.<eq-number><label|l22>>>>>>

  In view of (<reference|l21>)-(<reference|l22>), each element of
  <math|<wide|X|~><rsup|T><rsub|n><mx><rsub|\<omega\>>/m> is bounded by
  <math|2*K<rsub|0><around|\<\|\|\>|W|\<\|\|\>>*max<rsub|1\<leqslant\>j\<leqslant\>K<rsub|0>><around*|\||<mdelta><rsub|j>|\|>>.
  The proof is complete.

  <vspace|.2in><no-indent><with|font-family|ss|A.4. Proof of Lemma 3>

  <no-indent>By the definition of <math|<wide|X|~><rsub|n>>, it follows that

  <eqnarray|<tformat|<table|<row|<cell|<wide|X|~><rsub|n><rsup|T><me><rsub|n>/<sqrt|n>=<frac|1|<sqrt|n>><around*|(|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|l>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|2|2|cell-halign|l>|<cwith|1|-1|2|2|cell-rborder|0ln>|<table|<row|<cell|<big|sum><rsub|i=1><rsup|n>>|<cell|<mx><rsub|i>\<varepsilon\><rsub|i>>>|<row|<cell|<big|sum><rsub|i=n-p<rsub|n>*m+1><rsup|n>>|<cell|<mx><rsub|i>\<varepsilon\><rsub|i>>>|<row|<cell|<big|sum><rsub|i=n-<around|(|p<rsub|n>-1|)>*m+1><rsup|n>>|<cell|<mx><rsub|i>\<varepsilon\><rsub|i>>>|<row|<cell|>|<cell|\<vdots\>>>|<row|<cell|<big|sum><rsub|i=n-m+1><rsup|n>>|<cell|<mx><rsub|i>\<varepsilon\><rsub|i>>>>>>|)>.>>>>>

  Consider the first element of <math|<wide|X|~><rsub|n><rsup|T><me><rsub|n>/<sqrt|n>>.
  By Assumption C1, for <math|j=1,\<ldots\>,q>,
  <math|<big|sum><rsub|i=1><rsup|n>x<rsup|2><rsub|i,j>/n\<rightarrow\>W<rsub|j*j>>.
  By applying Markov's inequality, we have
  <math|<big|sum><rsub|i=1><rsup|n>x<rsub|i,j>*\<varepsilon\><rsub|i>/<sqrt|n>=O<rsub|p><around|(|1|)>>.

  In the following, we show that for any <math|\<epsilon\>\<gtr\>0>, there
  exists an <math|M<rsub|\<epsilon\>>> such that

  <\equation*>
    p<rsub|n,j>*<wide|=|^>P*<around*|(|<frac|1|<sqrt|n>>*max<rsub|1\<leqslant\>k\<leqslant\>p<rsub|n>><around*|\||<big|sum><rsub|i=n-<around|(|p<rsub|n>-k+1|)>*m+1><rsup|n>x<rsub|i,j>*\<varepsilon\><rsub|i>|\|>\<gtr\>M<rsub|\<epsilon\>>|)>\<less\>\<epsilon\>.
  </equation*>

  Denote <math|\<eta\><rsub|\<ell\>,j>=<big|sum><rsub|i=n-<around|(|p<rsub|n>-\<ell\>+1|)>*m+1><rsup|n-<around|(|p<rsub|n>-\<ell\>|)>*m>x<rsub|i,j>*\<varepsilon\><rsub|i>>.
  Then we have

  <\equation*>
    p<rsub|n,j>=P*<around*|(|<frac|1|<sqrt|n>>*max<rsub|1\<leqslant\>t\<leqslant\>p<rsub|n>><around*|\||<big|sum><rsub|\<ell\>=1><rsup|t>\<eta\><rsub|p<rsub|n>-\<ell\>+1,j>|\|>\<gtr\>M<rsub|\<epsilon\>>|)>.
  </equation*>

  Note that for any <math|v\<gtr\>u\<gtr\>0>, by Assumption C1, we have

  <\equation*>
    <text|Var><around*|(|<big|sum><rsub|\<ell\>=u><rsup|v>\<eta\><rsub|\<ell\>,j>/<sqrt|m>|)>\<leqslant\>2*<around|(|v-u|)>*W<rsub|j*j>*\<sigma\><rsup|2>\<leqslant\>2*<around|(|v-u|)>*\<sigma\><rsup|2>*max<rsub|1\<leqslant\>j\<leqslant\>q>
    W<rsub|j*j>,
  </equation*>

  when <math|n> is large enough. By Lemma 2.1 of Lavielle (1999), it follows
  that

  <eqnarray|<tformat|<table|<row|<cell|p<rsub|n,j>>|<cell|=>|<cell|P*<around*|(|max<rsub|1\<leqslant\>t\<leqslant\>p<rsub|n>><around*|\||<big|sum><rsub|\<ell\>=1><rsup|t>\<eta\><rsub|p<rsub|n>-\<ell\>+1,j>/<sqrt|m>|\|>\<gtr\>M<rsub|\<epsilon\>>*<sqrt|n/m>|)>>>|<row|<cell|>|<cell|\<leqslant\>>|<cell|<frac|c*p<rsub|n>|M<rsup|2><rsub|\<epsilon\>>*n/m>\<leqslant\>c/M<rsup|2><rsub|\<epsilon\>>\<less\>\<epsilon\>,>>>>>

  which means that each element of vector
  <math|<wide|X|~><rsub|n><rsup|T><me><rsub|n>/<sqrt|n>> is bounded uniformly
  in probability. The proof of Lemma 3 is complete.

  <vspace|.2in><no-indent><with|font-family|ss|A.5. Proof of Theorem 3>

  <no-indent>Let <math|<mmu>=<around|(|<mmu><rsub|0><rsup|T>,<mmu><rsub|1><rsup|T>,\<ldots\>,<mmu><rsub|p<rsub|n>><rsup|T>|)><rsup|T>>
  be bounded. Put <math|<mtheta>=<mtheta><rsub|n>+<frac|<mmu>|<sqrt|n>>> and

  <\equation*>
    \<psi\><rsub|n><around|(|<mmu>|)>=<around*|\||<around*|\||<my>-<mX><rsup|<around|(|1|)>><rsub|n><around*|(|<mbeta><rsub|0>+<frac|<mmu><rsub|0>|<sqrt|n>>|)>-<big|sum><rsub|j=1><rsup|p<rsub|n>><mX><rsup|<around|(|j+1|)>><rsub|n><around*|(|<with|font-series|right|<around*|\<nobracket\>|<around*|\<nobracket\>|<around*|\<nobracket\>|<rsub|j>+<frac|<mmu><rsub|j>|<sqrt|n>>|)>|\|>|\|><rsup|2>+\<lambda\><rsub|n>*<big|sum><rsub|r=1><rsup|p<rsub|n>><frac|1|<around|\||<wide||~><rsub|r>|\|><rsup|\<nu\>>>*<around*|\||<rsub|r>+<frac|<mmu><rsub|r>|<sqrt|n>>|\|>.>|\<nobracket\>>|\<nobracket\>>|\<nobracket\>>
  </equation*>

  Let <math|<wide|<mmu>|\<breve\>><rsub|n>=arg min
  \<psi\><rsub|n><around|(|<mmu>|)>=arg min
  <around*|(|\<psi\><rsub|n><around|(|<mmu>|)>-\<psi\><rsub|n><around|(|<mzero>|)>|)>>.
  Thus <math|<wide|<mtheta>|\<breve\>>=<mtheta><rsub|n>+<wide|<mmu>|\<breve\>><rsub|n>/<sqrt|n>>,
  and we only need to investigate the limiting behavior of
  <math|<wide|<mmu>|\<breve\>><rsub|n>>. Write
  <math|\<psi\><rsub|n><around|(|<mmu>|)>-\<psi\><rsub|n><around|(|<mzero>|)><wide|=|^>*V<rsub|n><around|(|<mmu>|)>>,
  which can be expressed as

  <eqnarray|<tformat|<table|<row|<cell|V<rsub|n><around|(|<mmu>|)>>|<cell|=>|<cell|<around*|(|<mmu><rsup|T><around*|(|<frac|1|n>*<wide|X|~><rsub|n><rsup|T>*<wide|X|~><rsub|n>|)><mmu>-2<mmu><rsup|T><frac|<wide|X|~><rsub|n><rsup|T><me><rsub|n>|<sqrt|n>>-2<mmu><rsup|T><frac|<wide|X|~><rsub|n><rsup|T><mx><rsub|\<omega\>>|<sqrt|n>>|)>>>|<row|<cell|>|<cell|+>|<cell|<frac|\<lambda\><rsub|n>|<sqrt|n>>*<big|sum><rsub|r=1><rsup|p<rsub|n>><frac|1|<around|\||<wide||~><rsub|r>|\|><rsup|\<nu\>>>*<sqrt|n><around*|(|<around*|\||<with|font-series|right|<around*|\<nobracket\>|<around*|\<nobracket\>|<rsub|r>+<frac|<mmu><rsub|r>|<sqrt|n>>|\|>-<around|\||<rsub|r>|\|>|)>.>|\<nobracket\>>|\<nobracket\>>>>>>>

  Consider the following two cases:

  <\itemize>
    <item*|>Case I: For any <math|r\<nin\><ma><rsub|c>>,
    <math|<mmu><rsub|r>=<with|font-series|bold|0>>;

    <item*|>Case II: There are some <math|r\<nin\><ma><rsub|c>> such that
    <math|<mmu><rsub|r>\<neq\><with|font-series|bold|0>>. Denote the number
    of such <math|r> by <math|n<rsub|c>>.
  </itemize>

  We first consider Case I. By Lemmas 1-2 and the assumption that
  <math|m/<sqrt|n>\<rightarrow\>0>, it can be shown that as
  <math|n\<to\>\<infty\>>,

  <\enumerate>
    <item*|(A1)><math|<mmu><rsup|T><around*|(|<frac|1|n>*<wide|X|~><rsub|n><rsup|T>*<wide|X|~><rsub|n>|)><mmu>=<mmu><rsub|<ma><rsub|c>><rsup|T><around*|(|<frac|1|n>*<around|(|<wide|X|~><rsub|n<ma><rsub|c>><rsup|T>*<wide|X|~><rsub|n<ma><rsub|c>>|)>|)><mmu><rsub|<ma><rsub|c>>\<rightarrow\><mmu><rsub|<ma><rsub|c>><rsup|T>\<cal-W\><rsub|<ma><rsub|c>><mmu><rsub|<ma><rsub|c>>>;

    <item*|(A2)><math|<mmu><rsup|T><wide|X|~><rsub|n><rsup|T><me><rsub|n>/<sqrt|n>=<mmu><rsub|<ma><rsub|c>><rsup|T><around|(|<wide|X|~><rsub|n<ma><rsub|c>><rsup|T><me><rsub|n>|)>/<sqrt|n>\<rightarrow\><rsub|d><mmu><rsub|<ma><rsub|c>><rsup|T><mw><rsub|<ma><rsub|c>>>,
    where <math|<mw><rsub|<ma><rsub|c>>\<sim\>N*<around|(|<mzero>,\<sigma\><rsup|2>*\<cal-W\><rsub|<ma><rsub|c>>|)>>;

    <item*|(A3)><math|<mmu><rsup|T><wide|X|~><rsub|n><rsup|T><mx><rsub|\<omega\>>/<sqrt|n>\<rightarrow\>0>.
  </enumerate>

  Note that for any <math|r\<nin\><ma><rsub|c>>, the corresponding summand in
  the second term of <math|V<rsub|n><around|(|<mmu>|)>> equals 0. Let
  <math|r\<in\><ma><rsub|c>>. By Assumption C3, it follows that
  <math|1/<around|\||<wide||~><rsub|r>|\|><rsup|\<nu\>>\<leqslant\>c<rsup|-\<nu\>>>
  in probability. Since <math|<sqrt|n><around*|\||\|<with|font-series|right|<around*|\<nobracket\>|<rsub|r>+<frac|<mmu><rsub|r>|<sqrt|n>>\|-<around|\||<rsub|r>|\|>|\|>\<leqslant\><around|\||<mmu><rsub|r>|\|>>|\<nobracket\>>>,
  and <math|<around|\||<ma><rsub|c>|\|>=K<rsub|0>>, by the assumption that
  <math|\<lambda\><rsub|n>/<sqrt|n>\<to\>0>, we have

  <\equation*>
    <frac|\<lambda\><rsub|n>|<sqrt|n>>*<big|sum><rsub|r=1><rsup|p<rsub|n>><frac|1|<around|\||<wide||~><rsub|r>|\|><rsup|\<nu\>>>*<sqrt|n><around*|(|<around*|\||<with|font-series|right|<around*|\<nobracket\>|<around*|\<nobracket\>|<rsub|r>+<frac|<mmu><rsub|r>|<sqrt|n>>|\|>-<around|\||<rsub|r>|\|>|)>\<rightarrow\><rsub|p>0,>|\<nobracket\>>|\<nobracket\>>
  </equation*>

  which, jointly with (A1)-(A3) above, implies that
  <math|V<rsub|n><around|(|<mmu>|)>\<rightarrow\><rsub|p><mmu><rsup|T><rsub|<ma><rsub|c>>\<cal-W\><rsub|<ma><rsub|c>><mmu><rsub|<ma><rsub|c>>-2<mmu><rsup|T><rsub|<ma><rsub|c>><mw><rsub|<ma><rsub|c>>>,
  as <math|n\<to\>\<infty\>>.

  We now consider Case II. By Lemmas 2-3 and the assumption that
  <math|m/<sqrt|n>\<rightarrow\>0>, it can be shown that

  <\enumerate>
    <item*|(B1)><math|<mmu><rsup|T><around*|(|<frac|1|n>*<wide|X|~><rsub|n><rsup|T>*<wide|X|~><rsub|n>|)><mmu>\<geqslant\>0>;

    <item*|(B2)><math|<mmu><rsup|T><wide|X|~><rsub|n><rsup|T><me><rsub|n>/<sqrt|n>=O<rsub|p><around|(|n<rsub|c>|)>>;

    <item*|(B3)><math|<with|math-display|true|<frac|1|n<rsub|c>><mmu><rsup|T><wide|X|~><rsub|n><rsup|T><mx><rsub|\<omega\>>/<sqrt|n>\<rightarrow\>0>>.
  </enumerate>

  As argued previously, it can also be shown that

  <\enumerate>
    <item*|(B4)><math|<frac|\<lambda\><rsub|n>|<sqrt|n>>*<big|sum><rsub|r\<in\><ma><rsub|c>><frac|1|<around|\||<wide||~><rsub|r>|\|><rsup|\<nu\>>>*<sqrt|n><around*|(|<around*|\||<with|font-series|right|<around*|\<nobracket\>|<around*|\<nobracket\>|<rsub|r>+<frac|<mmu><rsub|r>|<sqrt|n>>|\|>-<around|\||<rsub|r>|\|>|)>\<to\>0.>|\<nobracket\>>|\<nobracket\>>>
  </enumerate>

  Now let <math|r\<nin\><ma><rsub|c>>. Since
  <math|<around*|\||{r,<with|font-series|right|<around*|\<nobracket\>|<rsub|r>=<with|font-series|bold|0>,<mmu><rsub|r>\<neq\><mzero>}|\|>>|\<nobracket\>>>
  <math|=n<rsub|c>>, by Assumption C3 and the assumption that
  <math|\<lambda\><rsub|n>*<around|(|n/p<rsub|n>|)><rsup|\<nu\>/2>/<sqrt|n>\<rightarrow\>\<infty\>>,
  it follows that

  <eqnarray*|<tformat|<table|<row|<cell|>|<cell|>|<cell|<frac|1|n<rsub|c>>*<big|sum><rsub|r\<nin\><ma><rsub|c>,<with|font-series|right|<rsub|r>>=<with|font-series|bold|0>,<mmu><rsub|r>\<neq\><with|font-series|bold|0>><frac|\<lambda\><rsub|n>|<sqrt|n>>*<frac|1|<around|\||<wide||~><rsub|r>|\|><rsup|\<nu\>>>*<sqrt|n><around*|(|<around*|\||<with|font-series|right|<around*|\<nobracket\>|<rsub|r>+<frac|<mmu><rsub|r>|<sqrt|n>>|\|>-\|<tformat|<table|<row|<cell|<rsub|r><around*|\|||)>>|<cell|>|<cell|>>|<row|<cell|>|<cell|=>|<cell|<frac|1|n<rsub|c>>*<big|sum><rsub|r\<nin\><ma><rsub|c>,<rsub|r>=<with|font-series|bold|0>,<mmu><rsub|r>\<neq\><with|font-series|bold|0>><frac|\<lambda\><rsub|n>|<sqrt|n>><around*|(|<frac|n|p<rsub|n>>|)><rsup|\<nu\>/2><around|\||<mmu><rsub|r>|\|>\<times\><around*|\||<sqrt|<frac|n|p<rsub|n>>>*<wide||~><rsub|r>|\|><rsup|-\<nu\>>\<rightarrow\><rsub|p>\<infty\>,>>>>>|\<nobracket\>>|\<nobracket\>>>>>>>

  which, jointly with (B1)-(B4), implies that
  <math|V<rsub|n><around|(|<mmu>|)>\<rightarrow\><rsub|p>\<infty\>>.

  So far we have shown that

  <\equation>
    V<rsub|n><around|(|<mmu>|)>\<rightarrow\><rsub|p>V<around|(|<mmu>|)>=<choice|<tformat|<table|<row|<cell|<mmu><rsup|T><rsub|<ma><rsub|c>>\<cal-W\><rsub|<ma><rsub|c>><mmu><rsub|<ma><rsub|c>>-2<mmu><rsup|T><rsub|<ma><rsub|c>><mw><rsub|<ma><rsub|c>>,>|<cell|<text|Case
    I>,>>|<row|<cell|\<infty\>,>|<cell|<text|Case II>.>>>>><label|convex>
  </equation>

  It can be seen that <math|V> is a convex function and has a unique minimum
  at <math|<wide|<mmu>|\<breve\>>> such that
  <math|<wide|<mmu>|\<breve\>><rsub|<wide|<ma>|\<bar\>><rsub|c>>=<mzero>> and
  <math|<wide|<mmu>|\<breve\>><rsub|<ma><rsub|c>>=\<cal-W\><rsub|<ma><rsub|c>><rsup|-1><mw><rsub|<ma><rsub|c>>>.
  Since <math|V<rsub|n><around|(|\<cdummy\>|)>> is also a convex function and
  has a unique minimum denoted by <math|<wide|<mmu>|\<breve\>><rsub|n>>, by
  (<reference|convex>),

  <\equation*>
    <wide|<mmu>|\<breve\>><rsub|n>=arg min
    V<rsub|n><around|(|<mmu>|)>\<rightarrow\><rsub|p>arg min
    V<around|(|<mmu>|)>=<wide|<mmu>|\<breve\>>,
  </equation*>

  and hence,

  <\equation*>
    <around|(|<wide|<mmu>|\<breve\>><rsub|n>|)><rsub|<ma><rsub|c>>\<rightarrow\><rsub|p><wide|<mmu>|\<breve\>><rsub|<ma><rsub|c>>=\<cal-W\><rsub|<ma><rsub|c>><rsup|-1><mw><rsub|<ma><rsub|c>><space|1em><text|and><space|1em><around|(|<wide|<mmu>|\<breve\>><rsub|n>|)><rsub|<wide|<ma>|\<bar\>><rsub|c>>\<rightarrow\><rsub|p><wide|<mmu>|\<breve\>><rsub|<wide|<ma>|\<bar\>><rsub|c>>=<mzero>.
  </equation*>

  In view of the fact that <math|<mw><rsub|<ma><rsub|c>>\<sim\>N*<around|(|<mzero>,\<sigma\><rsup|2>*\<cal-W\><rsub|<ma><rsub|c>>|)>>,
  the proof is complete.
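
  The Case I minimizer can be verified numerically. The following Python
  sketch is illustrative only: the function names and the example matrix and
  vector are hypothetical. It solves the first-order condition of the
  quadratic limit criterion and evaluates the criterion at a candidate point.

```python
import numpy as np


def limit_minimizer(W_c, w_c):
    """Solve W_c mu = w_c, the first-order condition of mu' W_c mu - 2 mu' w_c."""
    W_c = np.asarray(W_c, dtype=float)
    w_c = np.asarray(w_c, dtype=float)
    return np.linalg.solve(W_c, w_c)


def V_case1(mu, W_c, w_c):
    """Case I limit criterion mu' W_c mu - 2 mu' w_c."""
    mu = np.asarray(mu, dtype=float)
    W_c = np.asarray(W_c, dtype=float)
    w_c = np.asarray(w_c, dtype=float)
    return float(mu @ W_c @ mu - 2.0 * (mu @ w_c))


# Hypothetical example with a positive definite matrix.
W_example = np.array([[2.0, 0.5], [0.5, 1.0]])
w_example = np.array([1.0, -1.0])
mu_star = limit_minimizer(W_example, w_example)
```

  For any nonzero perturbation d, the criterion at mu_star + d exceeds its
  value at mu_star by d' W_c d, which is positive when W_c is positive
  definite; this is why the minimum is unique.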

  <vspace|8pt><no-indent><with|font-series|bold|Acknowledgements>

  This work was supported by the Natural Sciences and Engineering Research
  Council of Canada. The authors thank Professor Pierre Perron for kindly
  sharing the U.S. ex-post real interest rate data with them.

  <vspace|.2in><no-indent><with|font-series|bold|References> <vspace|5mm>

  <\description>
    <item>Chen, J., and Gupta, A.K. (2000). Parametric Statistical Change
    Point Analysis, <em|Birkhäuser>.

    <item>Csörgő, M., and Horváth, L. (1997). Limit Theorems in Change-Point
    Analysis, <em|Chichester: Wiley>.

    <item>Davis, R.A., Huang, D., and Yao, Y.C. (1995). Testing for a Change
    in the Parameter Values and Order of an Autoregressive Model, <em|The
    Annals of Statistics>, 23, 282-304.

    <item>Davis, R.A., Lee, T.C.M., and Rodriguez-Yam, G.A. (2006).
    Structural Break Estimation for Nonstationary Time Series Models,
    <em|Journal of the American Statistical Association>, 101, 223-239.

    <item>Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004).
    Least Angle Regression, <em|Annals of Statistics>, 32, 407-499.

    <item>Fan, J., and Li, R. (2001). Variable Selection via Nonconcave
    Penalized Likelihood and Its Oracle Properties, <em|Journal of the
    American Statistical Association>, 96, 1348-1360.

    <item>Fan, J., and Lv, J. (2008). Sure Independence Screening for
    Ultrahigh Dimensional Feature Space, <em|Journal of the Royal Statistical
    Society>, Ser. B, 70, 849-911.

    <item>Garcia, R., and Perron, P. (1996). An Analysis of the Real Interest
    Rate under Regime Shifts, <em|The Review of Economics and Statistics>,
    78, 111-125.

    <item>Harchaoui, Z., and Lévy-Leduc, C. (2008). Catching Change-Points
    with Lasso, <em|Advances in Neural Information Processing Systems>.

    <item>Huang, J., Ma, S., and Zhang, C. (2008). Adaptive Lasso for Sparse
    High-dimensional Regression Models, <em|Statistica Sinica>, 18,
    1603-1618.

    <item>Hušková, M., Prášková, Z., and Steinebach, J. (2007). On the
    detection of changes in autoregressive time series I. Asymptotics,
    <em|Journal of Statistical Planning and Inference>, 137, 1243-1259.

    <item>Kim, H.-J., Yu, B., and Feuer, E.J. (2009). Selecting the Number of
    Change-Points in Segmented Line Regression, <em|Statistica Sinica>, 19,
    597-609.

    <item>Kuelbs, J., and Philipp, W. (1980). Almost Sure Invariance
    Principles for Partial Sums of Mixing <math|B>-Valued Random Variables,
    <em|The Annals of Probability>, 8, 1003-1036.

    <item>Lavielle, M. (1999). Detection of Multiple Changes in a Sequence of
    Dependent Variables, <em|Stochastic Processes and their Applications>,
    83, 79-102.

    <item>Loschi, R.H., Pontel, J.G., and Cruz, F.R.B. (2010). Multiple
    Change-Point Analysis for Linear Regression Models, <em|Chilean Journal
    of Statistics>, 1, 93-112.

    <item>Pan, J., and Chen, J. (2006). Application of Modified Information
    Criterion to Multiple Change Point Problems, <em|Journal of Multivariate
    Analysis>, 97, 2221-2241.

    <item>Tibshirani, R. (1996). Regression Shrinkage and Selection via the
    Lasso, <em|Journal of the Royal Statistical Society>, Ser. B, 58,
    267-288.

    <item>Wang, H., Li, G., and Tsai, C. (2007). Regression Coefficient and
    Autoregressive Order Shrinkage and Selection via the Lasso, <em|Journal
    of the Royal Statistical Society>, Ser. B, 69, 63-78.

    <item>Zhang, C., and Huang, J. (2008). The Sparsity and Bias of the Lasso
    Selection in High-dimensional Linear Regression, <em|The Annals of
    Statistics>, 36, 1567-1594.

    <item>Zhang, C. (2010). Nearly Unbiased Variable Selection Under Minimax
    Concave Penalty, <em|The Annals of Statistics>, 38, 894-942.

    <item>Zhao, P., and Yu, B. (2006). On Model Selection Consistency of
    Lasso, <em|Journal of Machine Learning Research>, 7, 2541-2567.

    <item>Zou, H. (2006). The Adaptive Lasso and Its Oracle Properties,
    <em|Journal of the American Statistical Association>, 101, 1418-1429.
  </description>
</body>