Tencent’s tech team has optimized DeepSeek’s open-source DeepEP communication framework, boosting its performance across different network environments, according to the Chinese AI startup. Testing showed a 100% improvement on RoCE networks and a 30% gain on InfiniBand (IB), offering more efficient solutions for AI model training. On GitHub, DeepSeek acknowledged that the Chinese tech giant’s contribution had led to a “huge speedup.” DeepEP is a communication library tailored for mixture-of-experts (MoE) models and expert parallelism (EP), supporting high-throughput, low-latency GPU kernels and low-precision computing, including FP8. Tencent’s Starlink Networking team identified two main bottlenecks: underutilized dual-port NIC bandwidth and CPU control latency. Targeted optimizations of both produced the gains. The enhanced framework is now fully open-source and has been deployed in training Tencent’s Hunyuan large model, demonstrating strong versatility in environments built on Tencent’s Starlink and H20 servers, Chinese tech media outlet iThome reported. [iThome, in Chinese]