GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
An AI-powered security review GitHub Action using Claude
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Open-source multi-speaker long-form text-to-speech model
Capable of understanding text, audio, images, and video
A SOTA open-source image editing model
Tool for exploring and debugging transformer model behaviors
Qwen-Image is a powerful image generation foundation model
Qwen3-TTS is an open-source series of TTS models
Generate Any 3D Scene in Seconds
OpenTinker is an RL-as-a-Service infrastructure for foundation models
OCR expert VLM powered by Hunyuan's native multimodal architecture
GLM-4 series: Open Multilingual Multimodal Chat LMs
Tongyi Deep Research, the Leading Open-source Deep Research Agent
FAIR Sequence Modeling Toolkit 2
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Open-weight, large-scale hybrid-attention reasoning model
ChatGPT interface with better UI
Renderer for the harmony response format to be used with gpt-oss
State-of-the-art pre-trained text-to-video model
Open-source framework for intelligent speech interaction
Block Diffusion for Ultra-Fast Speculative Decoding
Chat & pretrained large audio language model proposed by Alibaba Cloud