Showing 10 open source projects for "spectrogram"

  • 1
    Bert-VITS2

    VITS2 backbone with multilingual-bert

    ...The core idea is to use BERT-style contextual embeddings for text encoding while relying on a refined VITS2 architecture for acoustic generation and vocoding. The repository includes everything needed to train, fine-tune, and run the model, from configuration files to preprocessing scripts, spectrogram utilities, and training entrypoints for multi-GPU and multi-node setups. It provides emotional modeling through “emo embeddings,” allowing voices to be conditioned on different affective states during synthesis. Releases include optimizations for Japanese and English alignment, expanded training data, spec caching and pre-generation tools, as well as ONNX export for more lightweight inference deployments.
    Downloads: 6 This Week
  • 2
    vits_chinese

    Best practice TTS based on BERT and VITS

    ...By customizing or porting VITS for Chinese, this project aims to produce high-quality TTS outputs in a language that can be challenging due to tones, pronunciation variability, and prosody. The repository offers full training and inference pipelines: preprocessing, mel-spectrogram generation, training scripts, and audio synthesis. For users who don’t train their own models, the project provides pre-trained checkpoints (or instructions) and expects integration with a vocoder during speech synthesis.
    Downloads: 2 This Week
  • 3
    Demucs

    Code for the paper Hybrid Spectrogram and Waveform Source Separation

    Demucs (Deep Extractor for Music Sources) is a deep-learning framework for music source separation, that is, extracting individual instrument or vocal tracks from a mixed audio file. The system is based on a U-Net-like convolutional architecture combined with recurrent and transformer elements to capture both short-term and long-term temporal structure. The original design processed raw waveforms directly rather than spectrograms, with the aim of reducing artifacts in separated tracks; the hybrid version named in the paper title combines waveform and spectrogram branches. The...
    Downloads: 113 This Week
  • 4
    Riffusion

    Real-time music generation using Stable Diffusion techniques

    Riffusion (hobby) is a Python-based open source library for real-time music and audio generation using Stable Diffusion techniques. It generates and manipulates spectrogram images, which are then converted into playable audio clips, bridging image-based diffusion models with sound synthesis. Its diffusion pipeline supports prompt interpolation, allowing smooth transitions between musical styles or prompts over time. The library serves as the core implementation for audio and image processing, providing building blocks for generating music from text prompts. ...
    Downloads: 2 This Week
  • 5
    DiffSinger

    Singing Voice Synthesis via Shallow Diffusion Mechanism

    ...The method introduces a “shallow diffusion” mechanism: instead of diffusing over many steps, generation begins at a shallow step determined adaptively, which leverages prior knowledge learned by a simple mel-spectrogram decoder and speeds up inference.
    Downloads: 44 This Week
  • 6
    WaveRNN

    WaveRNN Vocoder + TTS

    ...A quick_start.py script lets users immediately synthesize example sentences from a pretrained model and inspect both the generated audio and the attention plots. For custom TTS, the project guides you through training Tacotron, optionally exporting ground-truth-aligned (GTA) spectrograms, training WaveRNN with or without GTA features, and then running joint generation.
    Downloads: 0 This Week
  • 7
    TensorFlowTTS

    Real-Time State-of-the-art Speech Synthesis for TensorFlow 2

    ...The library supports multiple languages (English, French, Korean, Chinese, German, etc.) and is relatively easy to adapt to new languages. With integrated vocoder and mel-spectrogram generation pipelines, pre-trained models, and a flexible architecture, TensorFlowTTS serves as an off-the-shelf, extensible TTS engine for applications ranging from voice assistants to content generation and accessibility tools.
    Downloads: 0 This Week
  • 8
    U-Net Fusion RFI

    U-Net for RFI Detection based on @jakeret's implementation

    The original code is at https://github.com/jakeret/tf_unet and this project currently builds on the TensorFlow 1.13 code base; there are no plans to move to TensorFlow 2. The primary improvements include a training and evaluation framework, along with a fusion-based approach to detection that combines a number of models (currently hard-coded to two trained models) with Sum Threshold as an additional "expert." Additional work is being done to add custom layers to...
    Downloads: 0 This Week
  • 9
    Transformer TTS

    Implementation of a Transformer-based neural network

    TransformerTTS is an implementation of a non-autoregressive Transformer-based neural network for text-to-speech, built with TensorFlow 2. It takes inspiration from architectures like FastSpeech, FastSpeech 2, FastPitch, and Transformer TTS, and extends them with its own aligner and forward models. The system separates alignment learning and acoustic modeling: an autoregressive Transformer is used as an aligner to extract phoneme-to-frame durations, while a non-autoregressive...
    Downloads: 0 This Week
  • 10
    DC-TTS

    TensorFlow Implementation of DC-TTS: yet another text-to-speech model

    ...It follows the “Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention” paper, but the author adapts and extends the design to make it practical for real experiments. The model is split into two networks: Text2Mel, which maps text to mel-spectrograms, and SSRN (spectrogram super-resolution network), which converts low-resolution mel-spectrograms into high-resolution magnitude spectrograms suitable for waveform synthesis. Training scripts, data loaders, and hyperparameter configurations are provided to reproduce results on several datasets, including LJ Speech for English, a Korean single-speaker dataset, and audiobook recordings read by Nick Offerman and Kate Winslet.
    Downloads: 0 This Week
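
Several of the projects listed above (vits_chinese, DiffSinger, TensorFlowTTS, DC-TTS) describe mel-spectrogram generation as a pipeline stage. As a rough illustration of what that stage computes, here is a minimal NumPy-only sketch. It is not taken from any of the listed repositories: the function names are hypothetical, the parameter values (22050 Hz sample rate, 1024-point FFT, hop of 256, 80 mel bands) are assumed for illustration, and the HTK-style mel formula is only one of several conventions in use.

```python
import numpy as np


def hz_to_mel(hz):
    """Convert frequency in Hz to mels (HTK-style formula, an assumed convention)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Build triangular mel filters of shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 edge points, evenly spaced on the mel scale.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        # Triangle: the smaller of the two slopes, floored at zero.
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def mel_spectrogram(wave, sr=22050, n_fft=1024, hop=256, n_mels=80):
    """Return a log-mel spectrogram of shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ magnitude.T
    # Floor before the log so silent bands do not produce -inf.
    return np.log(np.maximum(mel, 1e-5))


# One second of a 440 Hz sine as stand-in audio.
sr = 22050
t = np.arange(sr) / sr
spec = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(spec.shape)  # (80, 83)
```

Real projects typically rely on a library routine for this step and may differ in windowing, padding, normalization, and mel-scale convention, so a model trained with one front end generally cannot be fed features from another without matching those settings.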