Aishell2 Quartznet15x5

Description: QuartzNet15x5 is a model from the Jasper family, trained on the AISHELL-2 Mandarin Chinese dataset.
Publisher: NVIDIA
Latest Version: 4
Modified: April 4, 2023
Size: 87.5 MB

Overview

The QuartzNet15x5 encoder and decoder checkpoints available here were trained with the Neural Modules (NeMo) toolkit on the AISHELL-2 Mandarin Chinese dataset.

  • QuartzNet15x5-Zh-Base.nemo - a compressed tarball that contains the encoder and decoder checkpoints, as well as the associated config file.
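
Since the .nemo file is an ordinary compressed tarball, its contents can be listed with standard tools. A minimal sketch (the exact member names inside the archive will vary):

import tarfile

# A .nemo file is a compressed tar archive holding the encoder and
# decoder checkpoints plus the associated config file.
with tarfile.open("QuartzNet15x5-Zh-Base.nemo", "r:*") as archive:
    for name in archive.getnames():
        print(name)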

NVIDIA’s Apex/Amp O1 optimization level was used for training on 8xV100 GPUs.
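For readers unfamiliar with Amp optimization levels, O1 is automatic mixed precision: Torch functions are patched so that numerically safe ops run in FP16 while precision-sensitive ops stay in FP32, with dynamic loss scaling. A minimal sketch of how a training step is wrapped (MyAcousticModel and batch are hypothetical stand-ins; NeMo applied this internally during training):

import torch
from apex import amp  # NVIDIA Apex

model = MyAcousticModel().cuda()  # hypothetical model class
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# O1 patches Torch functions so whitelisted ops run in FP16.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

loss = model(batch)  # hypothetical forward pass returning a scalar loss
with amp.scale_loss(loss, optimizer) as scaled_loss:  # dynamic loss scaling
    scaled_loss.backward()
optimizer.step()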

More details

Most state-of-the-art (SOTA) ASR models are extremely large; they tend to have on the order of a few hundred million parameters, which makes them hard to deploy at scale given the current limitations of edge devices. The QuartzNet15x5 model consists of 79 layers and has a total of 18.9 million parameters, with fifteen blocks of five convolutional modules each, plus four additional convolutional layers.

The model is composed of multiple blocks with residual connections between them, trained with CTC loss. Each block consists of one or more modules with 1D time-channel separable convolutional layers, batch normalization, and ReLU layers. The model achieves near state-of-the-art accuracy on LibriSpeech and Wall Street Journal while having fewer parameters than all competing models. The Neural Modules (NeMo) toolkit makes it easy to use this model for transfer learning or fine-tuning: encoder and decoder checkpoints trained with NeMo can be used for fine-tuning on new datasets.
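
A 1D time-channel separable convolution factors a standard convolution into a depthwise convolution along time and a pointwise 1x1 convolution across channels, which is where the parameter savings come from: roughly in_channels x kernel_size plus in_channels x out_channels weights, instead of in_channels x out_channels x kernel_size. A minimal PyTorch sketch of one such module, with illustrative sizes rather than the exact QuartzNet configuration:

import torch
import torch.nn as nn

class TimeChannelSeparableConv(nn.Module):
    """One QuartzNet-style module: depthwise conv over time, pointwise conv over channels."""

    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        # Depthwise: one filter per input channel, mixing only along time.
        self.depthwise = nn.Conv1d(in_channels, in_channels, kernel_size,
                                   padding=kernel_size // 2, groups=in_channels)
        # Pointwise: 1x1 convolution mixing information across channels.
        self.pointwise = nn.Conv1d(in_channels, out_channels, kernel_size=1)
        self.bn = nn.BatchNorm1d(out_channels)
        self.relu = nn.ReLU()

    def forward(self, x):  # x: (batch, channels, time)
        return self.relu(self.bn(self.pointwise(self.depthwise(x))))

block = TimeChannelSeparableConv(64, 128, kernel_size=33)
out = block(torch.randn(8, 64, 200))  # -> (8, 128, 200)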

Two data augmentation techniques were used during training: speed perturbation and Cutout. Speed perturbation means additional training samples were created by slowing down or speeding up the original audio by 10%. Cutout refers to randomly masking out small rectangles of the spectrogram input as a regularization technique.
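
As a rough illustration of both techniques (a NumPy sketch with made-up shapes, not the exact training pipeline):

import numpy as np

def speed_perturb(waveform, factor):
    """Resample so the audio plays `factor` times as fast (0.9 = 10% slower)."""
    old_idx = np.arange(len(waveform))
    new_idx = np.linspace(0, len(waveform) - 1, int(len(waveform) / factor))
    return np.interp(new_idx, old_idx, waveform)

def cutout(spectrogram, rng, max_freq=20, max_time=50):
    """Zero out one random rectangle of a (freq, time) spectrogram."""
    f = rng.integers(1, max_freq)
    t = rng.integers(1, max_time)
    f0 = rng.integers(0, spectrogram.shape[0] - f)
    t0 = rng.integers(0, spectrogram.shape[1] - t)
    masked = spectrogram.copy()
    masked[f0:f0 + f, t0:t0 + t] = 0.0
    return masked

rng = np.random.default_rng(0)
wave = np.random.randn(16000)            # one second of fake 16 kHz audio
slower = speed_perturb(wave, 0.9)        # 10% slower -> longer signal
spec = np.abs(np.random.randn(64, 400))  # stand-in spectrogram
augmented = cutout(spec, rng)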

Documentation

Please refer to https://github.com/NVIDIA/NeMo for further documentation. Release notes are available at https://docs.nvidia.com/deeplearning/nemo/neural-modules-release-notes/index.html

Usage example:

python examples/asr/speech2text_infer.py --asr_model=QuartzNet15x5-Zh --dataset=test.json
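
Here test.json is assumed to be a NeMo-style manifest with one JSON object per line describing each utterance. A minimal sketch that writes a single, purely illustrative entry (path, duration, and transcript are made up):

import json

# One manifest line per utterance.
entry = {"audio_filepath": "/data/aishell2/wav/example.wav",
         "duration": 3.2,
         "text": "今天天气很好"}
with open("test.json", "w", encoding="utf-8") as f:
    f.write(json.dumps(entry, ensure_ascii=False) + "\n")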

You can also load this model directly from your code by including these lines:

import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.ASRConvCTCModel.from_pretrained(model_info='QuartzNet15x5-Zh')