The gather command is used to collect dtmf or speech input.

  "verb": "gather",
  "actionHook": "",
  "input": ["digits", "speech"],
  "bargein": true,
  "dtmfBargein": true,
  "finishOnKey": "#",
  "numDigits": 5,
  "timeout": 8,
  "recognizer": {
    "vendor": "Google",
    "language": "en-US",
    "hints": ["sales", "support"],
    "hintsBoost": 10
  "say": {
    "text": "To speak to Sales press 1 or say Sales.  To speak to customer support press 2 or say Support",
    "synthesizer": {
      "vendor": "Google",
      "language": "en-US"

You can use the following options in the gather command:

option description required
actionHook Webhook POST to invoke with the collected digits or speech. The payload will include a 'speech' or 'dtmf' property along with the standard attributes. See below for more detail. yes
bargein allow speech bargein, i.e. kill audio playback if caller begins speaking no
dtmfBargein allow dtmf bargein, i.e. kill audio playback if caller enters dtmf no
finishOnKey Dmtf key that signals the end of input no
input Array, specifying allowed types of input: ['digits'], ['speech'], or ['digits', 'speech']. Default: ['digits'] no
interDigitTimeout Amount of time to wait between digits after minDigits have been entered. no
listenDuringPrompt if false, do not listen for user speech until say or play has completed. Defaults to true no
minBargeinWordCount if bargein is true, only kill speech when this many words are spoken. Defaults to 1 no
minDigits Minimum number of dtmf digits expected to gather. Defaults to 1. no
maxDigits Maximum number of dtmf digits expected to gather no
numDigits Exact number of dtmf digits expected to gather no
partialResultHook Webhook to send interim transcription results to. Partial transcriptions are only generated if this property is set. no
play nested play Command that can be used to prompt the user no
recognizer.vendor Speech vendor to use (google, aws, or microsoft) no
recognizer.language Language code to use for speech detection. Defaults to the application level setting no
recognizer.vad.enable If true, delay connecting to cloud recognizer until speech is detected no
recognizer.vad.voiceMs If vad is enabled, the number of milliseconds of speech required before connecting to cloud recognizer no
recognizer.vad.mode If vad is enabled, this setting governs the sensitivity of the voice activity detector; value must be between 0 to 3 inclusive (lower numbers mean more sensitivity, i.e. more likely to return a false positive). Default: 2 no
recognizer.hints (google and microsoft only) Array of words or phrases to assist speech detection no
recognizer.hintsBoost (google only) A value between 0 to 20 inclusive; higher number means assign more weight to the hints no
recognizer.altLanguages (google only) An array of alternative languages that the speaker may be using no
recognizer.punctuation (google only) Enable automatic punctuation no
recognizer.profanityFilter (google only) If true, filter profanity from speech transcription . Default: no no
recognizer.vocabularyName (aws only) The name of a vocabulary to use when processing the speech. no
recognizer.vocabularyFilterName (aws only) The name of a vocabulary filter to use when processing the speech. no
recognizer.filterMethod (aws only) The method to use when filtering the speech: remove, mask, or tag. no
recognizer.profanityOption (microsoft only) masked, removed, or raw. Default: raw no
recognizer.outputFormat (microsoft only) simple or detailed. Default: simple no
recognizer.requestSnr (microsoft only) Request signal to noise information no
recognizer.initialSpeechTimeoutMs (microsoft only) Initial speech timeout in milliseconds no
recognizer.azureServiceEndpoint (microsoft only) URI of a custom speech endpoint to connect to no
recognizer.asrTimeout timeout value for continuous ASR feature no
recognizer.asrDtmfTerminationDigit DMTF key that terminates continuous ASR feature no
say nested say Command that can be used to prompt the user no
timeout The number of seconds of silence or inaction that denote the end of caller input. The timeout timer will begin after any nested play or say command completes. Defaults to 5 no

In the case of speech input, the actionHook payload will include a speech object with the response from Google speech:

"speech": {
			"stability": 0,
			"is_final": true,
			"alternatives": [{
				"confidence": 0.858155,
				"transcript": "sales please"

In the case of digits input, the payload will simple include a digits property indicating the dtmf keys pressed:

"digits": "0276"

Note: an HTTP POST will be used for both the action and the partialResultCallback since the body may need to contain nested JSON objects for speech details.

Note: the partialResultCallback web callback should not return content; any returned content will be discarded.

Prev: enqueue Next: hangup