Kimi is a series of large language models (LLMs) developed by the Beijing-based startup Moonshot AI. The models are noted for their large context windows and, in later versions, for their open-weight architecture and agentic intelligence capabilities, which enable them to perform complex, multi-step tasks.
The original Kimi chatbot was launched by Moonshot AI in October 2023. Its key distinguishing feature at release was a large context window capable of processing up to 200,000 Chinese characters in a single prompt, which positioned it as a strong contender in China's competitive AI market for handling long documents and complex conversations. The model's long-context capabilities were a core part of Moonshot AI's strategy and contributed to the company reaching a valuation of $2.5 billion by early 2024. [6] [7]
Kimi K2 was launched on July 11, 2025, by Moonshot AI, a company founded in March 2023. It is an open-weight Mixture-of-Experts (MoE) model designed with a focus on agentic intelligence. The release garnered significant attention in the AI research community, with some comparing its impact to that of DeepSeek's model release earlier in the year. [1] The model is noted for its performance in coding and reasoning benchmarks, where it matches or surpasses many contemporary open-source and proprietary models, including Western rivals like Anthropic's Claude, in non-thinking evaluations. [1] Machine-learning researcher Nathan Lambert described it as "the new best open model in the world" following its release. [1]
The emergence of Kimi K2 is viewed within the broader context of the U.S.-China AI competition, positioning Moonshot AI as a significant Chinese competitor to Western AI labs like OpenAI and Anthropic. The startup is reportedly backed by investors including Chinese technology giant Alibaba. [1]
The core design philosophy behind Kimi K2 is "agentic intelligence," which prioritizes the model's ability to act as an autonomous agent. Rather than simply responding to prompts, it is engineered to understand a user's objective, select appropriate tools (such as web browsers, code interpreters, or APIs), and execute a sequence of actions to accomplish the goal. This approach is intended to move beyond simple chat-based interactions toward more complex problem-solving. [2]
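The control flow this implies can be sketched as a simple tool-dispatch loop. This is a minimal illustration of the agentic pattern described above, not Moonshot AI's implementation; the tool names and the hard-coded plan are invented for the example, and in practice the model itself would produce the plan from the user's objective.

```python
# Minimal sketch of an agentic tool-use loop (hypothetical tools;
# not Moonshot AI's actual implementation).

def search_web(query: str) -> str:
    """Stand-in for a web-browsing tool."""
    return f"results for {query!r}"

def run_code(source: str) -> str:
    """Stand-in for a code-interpreter tool (a real one is sandboxed)."""
    return str(eval(source))

TOOLS = {"search_web": search_web, "run_code": run_code}

def agent_step(plan):
    """Execute a planned sequence of (tool, argument) actions."""
    transcript = []
    for tool_name, arg in plan:
        result = TOOLS[tool_name](arg)
        transcript.append((tool_name, result))
    return transcript

# The model's job is to derive `plan` from a high-level objective;
# here it is hard-coded to show the control flow.
log = agent_step([("search_web", "median data-science salary"),
                  ("run_code", "120_000 * 1.05")])
```

A production agent would additionally feed each tool result back to the model so it can revise the remaining plan, which is what distinguishes agentic execution from a fixed script.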
Moonshot AI has released the model in two main variants to cater to different use cases. Kimi-K2-Base is the foundational model, intended for researchers and developers who require full control for custom fine-tuning. Kimi-K2-Instruct is a post-trained version optimized for general-purpose chat and ready-to-use agentic applications. Both the model weights and the associated code are released under a Modified MIT License, promoting open research and development. [3]
Kimi K2 is built on a Mixture-of-Experts (MoE) architecture, which allows for a very large number of total parameters while keeping the number of activated parameters computationally manageable for each inference. This design enhances efficiency and scalability. The model has 1 trillion total parameters, with 32 billion activated per token. [4]
Key architectural specifications include:
- 1 trillion total parameters, with 32 billion activated per token
- 384 experts, with 8 experts selected per token, plus one shared expert
- 61 layers, including one dense layer
- 64 attention heads using Multi-head Latent Attention (MLA)
- A 128K-token context length
- A 160K-entry vocabulary
- The SwiGLU activation function
These specifications are detailed in the project's official technical documentation. [3]
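The routing idea behind "activated parameters" can be shown in a few lines. The sketch below uses toy dimensions (16 experts, top-2 routing) rather than Kimi K2's real sizes, and a bare softmax-over-top-k gate; it only illustrates why an MoE layer touches a small fraction of its total parameters per token.

```python
import numpy as np

# Toy top-k expert routing in a Mixture-of-Experts layer.
# Sizes are illustrative; Kimi K2's real dimensions are far larger.

rng = np.random.default_rng(0)
n_experts, top_k, d_model = 16, 2, 8

router = rng.normal(size=(d_model, n_experts))            # routing weights
experts = rng.normal(size=(n_experts, d_model, d_model))  # toy expert FFNs

def moe_forward(x):
    logits = x @ router                       # score every expert
    chosen = np.argsort(logits)[-top_k:]      # keep only the top-k experts
    gates = np.exp(logits[chosen])
    gates /= gates.sum()                      # softmax over the chosen experts
    # Only the chosen experts' parameters are "activated" for this token.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))

y = moe_forward(rng.normal(size=d_model))
active_fraction = top_k / n_experts  # fraction of expert params used per token
```

With 2 of 16 experts active, only 12.5% of the expert parameters run per token, which is the same mechanism that lets Kimi K2 hold 1T total parameters while activating 32B.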
Kimi K2 was pre-trained on a dataset of 15.5 trillion tokens. A significant technical innovation during its development was the creation of the MuonClip optimizer. This optimizer was developed to address training instability, a common challenge when scaling large models, particularly the issue of "exploding attention logits." [2]
The MuonClip optimizer builds upon the Muon optimizer by introducing a technique called "qk-clip." This method stabilizes training by directly rescaling the weight matrices of the query (q) and key (k) projections after each update. By controlling the scale of attention logits at their source, MuonClip effectively prevented loss spikes, enabling a stable pre-training process across the entire 15.5T token dataset. [4]
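A toy numpy sketch of the qk-clip idea follows. The threshold, shapes, and the exact rescaling rule are simplified for illustration (the real MuonClip rule operates per attention head during optimizer updates); the point is that shrinking the q and k projection weights controls the logits at their source, because the logit is bilinear in the two matrices.

```python
import numpy as np

# Toy qk-clip: if the largest attention logit exceeds a threshold t,
# rescale the query/key projection weights so logits shrink at the source.

rng = np.random.default_rng(1)
d = 16
W_q = rng.normal(scale=3.0, size=(d, d))  # deliberately large weights
W_k = rng.normal(scale=3.0, size=(d, d))
X = rng.normal(size=(4, d))               # a small batch of activations

def max_logit(Wq, Wk):
    q, k = X @ Wq, X @ Wk
    return np.abs(q @ k.T).max()

t = 10.0
m = max_logit(W_q, W_k)
if m > t:
    # Split the needed shrink evenly across W_q and W_k, since the
    # logit is bilinear in the two weight matrices.
    s = np.sqrt(t / m)
    W_q, W_k = W_q * s, W_k * s

clipped = max_logit(W_q, W_k)  # now bounded by t (up to float error)
```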
The model's advanced agentic functions were developed through a multi-stage post-training process that focused on tool use and reinforcement learning.
To teach the model how to use tools effectively, the development team created a large-scale data synthesis pipeline. This system, inspired by the ACEBench framework, simulates complex, real-world scenarios involving hundreds of domains and thousands of tools. In these simulations, AI agents interact with simulated environments and user agents to generate realistic, multi-turn, tool-use data. An LLM-based judge then evaluates these interactions against predefined rubrics to filter for high-quality examples, which are used for training. [2]
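The filtering stage of such a pipeline can be sketched as follows. The `judge` function here is a trivial heuristic stand-in for the LLM-based judge, and the rubric items are invented; only the control flow (score each synthetic dialogue against a rubric, keep those above a threshold) reflects the process described above.

```python
# Sketch of rubric-based filtering of synthetic tool-use dialogues.
# `judge` stands in for an LLM-based judge; here it is a toy scorer
# so the control flow is runnable.

RUBRIC = {"called_a_tool": 1, "answered_user": 1, "no_errors": 1}

def judge(dialogue):
    """Score a dialogue against the rubric (toy heuristic scorer)."""
    score = 0
    if any(turn["role"] == "tool" for turn in dialogue):
        score += RUBRIC["called_a_tool"]
    if dialogue[-1]["role"] == "assistant":
        score += RUBRIC["answered_user"]
    if not any("error" in turn["text"].lower() for turn in dialogue):
        score += RUBRIC["no_errors"]
    return score

def filter_dialogues(dialogues, min_score=3):
    """Keep only dialogues meeting every rubric criterion."""
    return [d for d in dialogues if judge(d) >= min_score]

good = [{"role": "user", "text": "plot salaries"},
        {"role": "tool", "text": "chart saved"},
        {"role": "assistant", "text": "done, see the chart"}]
bad = [{"role": "user", "text": "plot salaries"},
       {"role": "assistant", "text": "Error: no tool available"}]

kept = filter_dialogues([good, bad])
```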
Kimi K2's training incorporates a general reinforcement learning (RL) system designed to handle tasks with both verifiable rewards (e.g., solving a math problem) and non-verifiable rewards (e.g., writing a quality report). For non-verifiable tasks, the system employs a self-judging mechanism where the model acts as its own critic, providing scalable, rubric-based feedback. This critic is continuously improved and calibrated using on-policy rollouts from tasks with verifiable rewards, ensuring its evaluation accuracy remains high. This process allows the model to learn from a broader range of interactions, freeing it from the limitations of human-annotated data. [4]
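The calibration idea can be illustrated with a deliberately simple numeric model: a critic with one bias parameter is nudged until its scores agree, on average, with verifiable rewards from on-policy rollouts. Everything here (the critic form, the update rule, the data) is invented for illustration; it only shows the shape of the feedback loop, not Moonshot AI's actual RL system.

```python
# Toy self-judging reward with calibration: for non-verifiable tasks the
# critic assigns a score; the critic is calibrated against tasks whose
# rewards ARE verifiable. All numbers are illustrative.

def verifiable_reward(answer, truth):
    return 1.0 if answer == truth else 0.0

def critic(answer, bias):
    """Toy critic: scores in [0, 1], shifted by a learnable bias term."""
    return max(0.0, min(1.0, len(answer) / 20.0 - bias))

def calibrate(critic_bias, rollouts):
    """Nudge the critic so its scores track verifiable rewards on average."""
    gap = sum(critic(a, critic_bias) - verifiable_reward(a, t)
              for a, t in rollouts) / len(rollouts)
    return critic_bias + 0.5 * gap  # shrink systematic over/under-scoring

rollouts = [("42", "42"), ("forty-two", "42"), ("42", "42")]
bias = 0.0
for _ in range(50):
    bias = calibrate(bias, rollouts)
```

After calibration the critic's average score matches the average verifiable reward on these rollouts, which is the property the text describes: the verifiable tasks keep the self-judge honest.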
According to its developers and independent analysis, Kimi K2 demonstrates state-of-the-art performance among open-source, non-thinking models and is highly competitive with leading proprietary models from Western labs. [1] [8] The model particularly excels in agentic tasks, coding, and mathematics.
In agentic coding evaluations, Kimi K2 achieved a score of 65.8% on SWE-bench Verified (single attempt), surpassing GPT-4.1 (54.6%) and performing comparably to Claude 4 Opus (72.5%). It also scored 47.3% on SWE-bench Multilingual. For general tool use, its score of 76.5% on AceBench (English) was competitive with Claude 4 Opus (75.6%) and GPT-4.1 (80.1%).
On coding benchmarks, the model achieved a Pass@1 rate of 53.7% on LiveCodeBench v6, higher than both Claude 4 Opus (47.4%) and GPT-4.1 (44.7%). On OJBench, its score of 27.1% also exceeded its proprietary counterparts.
In mathematics and STEM, Kimi K2 scored 49.5% on the AIME 2025 benchmark, significantly outperforming Claude 4 Opus (33.9%) and GPT-4.1 (37.0%). On the GPQA-Diamond benchmark, its score of 75.1% was on par with Claude 4 Opus (74.9%) and ahead of GPT-4.1 (66.3%). These results, detailed in the model's technical report, position Kimi K2 as a highly capable model, especially for tasks related to software engineering and autonomous problem-solving. [2] [4]
Kimi K2 is designed to autonomously use tools to complete complex user requests. It can interpret a high-level task description, determine the necessary steps, and execute them using integrated tools like code interpreters or web browsers without requiring a pre-scripted workflow.
One prominent example demonstrated by Moonshot AI is a salary data analysis task. Given a dataset and a high-level prompt, Kimi K2 performed a 16-step process using an IPython tool. This included loading and filtering data, categorizing remote work ratios, performing statistical analyses like two-way ANOVA and t-tests, generating multiple visualizations (e.g., violin plots, box plots, bar charts), and summarizing the findings. The final output was a complete, interactive HTML webpage presenting the analysis and an integrated simulator for personalized recommendations. [2]
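A few of the steps from such an analysis can be sketched in plain numpy. The data, column choices, and thresholds below are invented for illustration, and only two of the demo's sixteen steps are shown (bucketing by remote-work ratio and a two-sample test); the demo itself ran inside an IPython tool with real data.

```python
import numpy as np

# Sketch of two steps from a salary-analysis workflow:
# bucket records by remote-work ratio, then compare the groups.
# All data here is synthetic.

rng = np.random.default_rng(2)
salaries = rng.normal(120_000, 20_000, size=200)
remote_ratio = rng.choice([0, 50, 100], size=200)

# Step: categorize remote-work ratio into on-site vs fully remote.
onsite = salaries[remote_ratio == 0]
remote = salaries[remote_ratio == 100]

# Step: Welch's two-sample t statistic, written out by hand.
def welch_t(a, b):
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    return (a.mean() - b.mean()) / np.sqrt(va + vb)

t_stat = welch_t(onsite, remote)
```

The point of the demo is that Kimi K2 generates and executes steps like these itself, then assembles the results into a report, rather than a human writing this code.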
Other demonstrated use cases follow the same pattern: the model decomposes a high-level request into a sequence of tool calls and carries them out end to end. These examples highlight the model's ability to orchestrate multiple tools to achieve a complex, multi-faceted goal. [2]
Kimi K2 is accessible through several channels. The Kimi-K2-Base and Kimi-K2-Instruct weights are available on Hugging Face, and the models can be deployed on-premises or in the cloud using inference engines such as vLLM, SGLang, KTransformers, and TensorRT-LLM. The model and its source code are released under the Modified MIT License. [3]
The developers have identified several limitations in the initial release of Kimi K2. The model may generate an excessive number of tokens when faced with difficult reasoning tasks or ambiguously defined tools, which can sometimes result in incomplete outputs. In certain scenarios, enabling tool use can degrade performance on other tasks. Furthermore, for complex software development projects, the model performs better within an agentic framework than with simple one-shot prompting. Vision capabilities were not supported at launch, with support planned for future updates. [2]