{"id":82472,"date":"2025-12-18T17:03:31","date_gmt":"2025-12-18T09:03:31","guid":{"rendered":"https:\/\/aicats.wiki\/?p=82472"},"modified":"2025-12-18T17:03:38","modified_gmt":"2025-12-18T09:03:38","slug":"fp8%e6%b7%b1%e5%ba%a6%e8%a7%a3%e6%9e%90%ef%bc%9aai%e7%ae%97%e5%8a%9b%e6%99%82%e4%bb%a3%e7%9a%84%e9%ab%98%e6%95%88%e4%bd%8e%e8%80%97%e6%96%b0%e9%81%b8%e6%93%87%ef%bc%8c%e9%96%8b%e7%99%bc%e8%80%85","status":"publish","type":"post","link":"https:\/\/aicats.wiki\/tw\/2025\/12\/18\/82472-html","title":{"rendered":"FP8 in Depth: An Efficient, Low-Power Option for the AI Compute Era, and How Developers Can Avoid Kernel Performance Pitfalls"},"content":{"rendered":"<ul class=\"wp-block-list\">\n<li><strong>FP8 (8-bit floating point) is becoming a leading low-precision format for <a class=\"external\" href=\"https:\/\/aicats.wiki\/tw\/tag\/ai\" title=\"View articles related to AI\" target=\"_blank\">AI<\/a> workloads that need high compute throughput at low power<\/strong>, and chips from NVIDIA, AMD, and others increasingly support it natively.<\/li>\n\n\n\n<li>This article analyzes <span class=\"strong\">the principles, advantages, and risks of FP8<\/span> and compares it with mainstream formats such as BF16, FP16, FP32, and INT4.<\/li>\n\n\n\n<li>It provides <span 
class=\"strong\">a practical engineering approach to mixed-precision training and a checklist of pitfalls<\/span> to help developers avoid performance and convergence problems.<\/li>\n\n\n\n<li>It surveys recent FP8 applications and tools across mainstream large models and the wider industry, in China and abroad.<\/li>\n\n\n\n<li><strong>Developers can use this article to learn practical methods for deploying FP8 efficiently and tuning away its risks<\/strong>, so that large models ship with high quality and low power draw.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"624\" height=\"351\" src=\"https:\/\/aicats.wiki\/wp-content\/uploads\/2025\/12\/image-226.png\" alt=\"FP8 in Depth: An Efficient, Low-Power Option for the AI Compute Era, and How Developers Can Avoid Kernel Performance Pitfalls\" class=\"wp-image-84178\" style=\"width:1139px;height:auto\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">The Compute Bottleneck of Fast-Moving AI and the Rise of FP8<\/h2>\n\n\n\n<p>As large AI models and deep learning advance ever faster, the whole industry faces a double \u201canxiety\u201d over compute and energy consumption. <strong>How can hardware efficiency be maximized and training and inference costs reduced without sacrificing model capability? 
FP8 (8-bit floating point)<\/strong> is becoming the \u201cnew favorite\u201d of AI companies and developers. Its benefits and potential risks are both widely discussed, and leading-edge hardware such as the <strong>NVIDIA Hopper architecture<\/strong> and the AMD MI300 already supports FP8 natively, pushing the AI industry toward a more efficient and economical era.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img decoding=\"async\" width=\"1920\" height=\"1000\" src=\"https:\/\/aicats.wiki\/wp-content\/uploads\/2025\/12\/image-227.jpg\" alt=\"NVIDIA blog introduction to FP8\" class=\"wp-image-84181\" style=\"aspect-ratio:1.9200145676370837;width:1319px;height:auto\"\/><figcaption class=\"wp-element-caption\">Figure: <a href=\"https:\/\/developer.nvidia.cn\/blog\/fp8-precision-performance\/\" title=\"\" target=\"_blank\"  rel=\"nofollow noopener\"  class=\"external\" >NVIDIA blog introduction to FP8<\/a><\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">FP8 Versus Mainstream Numeric Formats<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Format Overview<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table 
class=\"has-fixed-layout\"><thead><tr><th>Format<\/th><th>Bit width<\/th><th>Precision<\/th><th>Dynamic range<\/th><th>Performance<\/th><th>Main use cases<\/th><\/tr><\/thead><tbody><tr><td><strong>FP8<\/strong><\/td><td>8<\/td><td>Low to medium<\/td><td>Medium to high<\/td><td>Very high<\/td><td>Inference, mixed-precision training<\/td><\/tr><tr><td><strong>BF16<\/strong><\/td><td>16<\/td><td>Medium<\/td><td>High<\/td><td>High<\/td><td>Large-model training<\/td><\/tr><tr><td><strong>FP32<\/strong><\/td><td>32<\/td><td>Highest<\/td><td>Very high<\/td><td>Low<\/td><td>Scientific computing, precision-sensitive training<\/td><\/tr><tr><td><strong>INT4<\/strong><\/td><td>4<\/td><td>Very low<\/td><td>Very low<\/td><td>Very high<\/td><td>Extreme quantization, edge AI<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><strong>FP8 offers strong value for money where high compute throughput and very low storage requirements matter most<\/strong>, but the <strong>precision sensitivity, hardware compatibility requirements, and performance pitfalls<\/strong> it brings also test the engineering skill of development teams.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How FP8 Works and What Adoption Involves<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What Is FP8, and Why Does It Matter?<\/h3>\n\n\n\n<p><strong>FP8 (8-bit Floating 
Point)<\/strong> is often described as representative of \u201cthird-generation low-precision AI training\u201d. Its two standard encodings are <strong>E4M3<\/strong> (4 exponent bits, 3 mantissa bits) and <strong>E5M2<\/strong> (5 exponent bits, 2 mantissa bits), each with one sign bit. Compared with 16-bit formats such as FP16 and BF16, FP8 stores each parameter in only 8 bits while still getting Tensor Core acceleration for common deep-network operations such as matrix multiplication and convolution.<\/p>\n\n\n\n<p><small>Reference: <a href=\"https:\/\/developer.nvidia.com\/zh-cn\/blog\/fp8-challenges-best-practices\/\" target=\"_blank\"  rel=\"nofollow noopener\"  class=\"external\" >https:\/\/developer.nvidia.com\/zh-cn\/blog\/fp8-challenges-best-practices\/<\/a><\/small><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Main Advantages of FP8<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Very low memory footprint<\/strong>: parameter storage and communication bandwidth are half those of FP16 and a quarter of those of FP32, which substantially raises server throughput.<\/li>\n\n\n\n<li><strong>Tensor Core acceleration<\/strong>: on hardware such as NVIDIA 
Hopper, peak FP8 matrix throughput is twice that of FP16, which shortens training and inference time.<\/li>\n\n\n\n<li><strong>Better training-inference consistency<\/strong>: a model trained in FP8 can pass its weights straight to inference, reducing the complexity of post-training quantization.<\/li>\n\n\n\n<li><strong>Lower energy use and cost<\/strong>: the same hardware can train larger models, or train them faster, which is especially useful for Transformers, LLMs, and other large models.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1920\" height=\"1008\" src=\"https:\/\/aicats.wiki\/wp-content\/uploads\/2025\/12\/image-227-1.jpg\" alt=\"NVIDIA technical overview\" class=\"wp-image-84186\"\/><figcaption class=\"wp-element-caption\">Figure: <a href=\"https:\/\/developer.nvidia.cn\/blog\/fp8-challenges-best-practices\/\" title=\"\" target=\"_blank\"  rel=\"nofollow noopener\"  class=\"external\" >NVIDIA technical overview<\/a><\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Key Limitations and Risks of FP8<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Numerical stability<\/strong>: with far fewer mantissa and exponent bits, the risk of out-of-range values and abnormal convergence rises sharply, and training instabilities such as loss 
spikes appear easily.<\/li>\n\n\n\n<li><strong>Operator and model sensitivity<\/strong>: operations such as attention and normalization (LayerNorm, RMSNorm) are highly sensitive to precision, and compressing them too aggressively can lose enough accuracy to hinder convergence.<\/li>\n\n\n\n<li><strong>Demanding hardware requirements<\/strong>: FP8 needs recent GPUs (NVIDIA H100 or newer; Ampere parts such as the A100 have no FP8 Tensor Cores) and a current AI training framework that supports FP8 mixed-precision computation end to end.<\/li>\n\n\n\n<li><strong>Higher engineering and operations complexity<\/strong>: keeping values within a usable dynamic range depends on an elaborate mixed-precision policy (such as Per-Tensor Scaling or Delayed Scaling), which raises tuning costs for developers.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Engineering FP8 Mixed-Precision Training: Implementation and Best Practices<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Mixed-Precision Training: The O1+O2 Pattern<\/h3>\n\n\n\n<p><strong>Mixed-precision training<\/strong> is the key mechanism that makes FP8 practical. Mainstream frameworks (PyTorch, TensorFlow, and others) generally support AMP (Automatic Mixed Precision), but FP8 calls for a finer-grained <strong>O1+O2 strategy<\/strong>:<\/p>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li><strong>Whitelisted operators run in FP8<\/strong>: for example large matrix multiplications (MatMul) and large convolutions.<\/li>\n\n\n\n<li><strong>Blacklisted operators fall back to higher precision (BF16\/FP32)<\/strong>: for example LayerNorm, Softmax, Embedding, and other steps with strict precision requirements.<\/li>\n\n\n\n<li><strong>FP32 master weights are kept<\/strong>: parameter updates are applied to a full-precision copy so that small gradients are not lost.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Dynamic Scaling and the Delayed Scaling Recipe<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Per-tensor Dynamic Scaling<\/strong>: choose a scaling factor for each tensor that maps its actual values into the FP8 dynamic range, preventing overflow and underflow.<\/li>\n\n\n\n<li><strong>Delayed Scaling (history-based amax estimation)<\/strong>: derive the current scaling factor from the maximum absolute value (amax) recorded over previous iterations, balancing throughput against accuracy.<\/li>\n\n\n\n<li><strong>Just In Time Scaling<\/strong>: in some extreme cases, compute the scale from the current tensor to further reduce underflow.<\/li>\n<\/ul>\n\n\n\n<p><small>For technical details, see NVIDIA's \u201cFP8 Training: Challenges and Best Practices\u201d <a href=\"https:\/\/developer.nvidia.com\/zh-cn\/blog\/fp8-challenges-best-practices\/\" target=\"_blank\"  rel=\"nofollow noopener\"  class=\"external\" >https:\/\/developer.nvidia.com\/zh-cn\/blog\/fp8-challenges-best-practices\/<\/a><\/small><\/p>\n\n\n\n<h3 
class=\"wp-block-heading\">Kernel Optimization and Avoiding Performance Pitfalls<\/h3>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1951\" height=\"1016\" src=\"https:\/\/aicats.wiki\/wp-content\/uploads\/2025\/12\/image-227.png\" alt=\"NVIDIA Transformer Engine\" class=\"wp-image-84191\"\/><figcaption class=\"wp-element-caption\">Figure: <a href=\"https:\/\/github.com\/NVIDIA\/TransformerEngine\" target=\"_blank\" rel=\"noopener\" class=\"external\" >NVIDIA Transformer Engine<\/a><\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Risk<\/th><th>Description \/ typical symptoms<\/th><th>How to avoid it<\/th><\/tr><\/thead><tbody><tr><td><strong>Launch Bound<\/strong><\/td><td>Too many bubbles between kernels; host-side launch overhead is not hidden<\/td><td>Fuse operators; capture kernels into CUDA Graphs<\/td><\/tr><tr><td><strong>Synchronization stalls<\/strong><\/td><td>Frequent host-device synchronization; performance jitter<\/td><td>Avoid synchronizing ops; batch the processing logic<\/td><\/tr><tr><td><strong>Not every operator supports FP8<\/strong><\/td><td>Special custom operations have not been adapted to FP8<\/td><td>Fall back to higher precision for important operators<\/td><\/tr><tr><td><strong>Training fails to converge or drifts<\/strong><\/td><td>Loss rises suddenly; gradients explode or vanish<\/td><td>Mixed-precision policy plus hyperparameter tuning; compare regularly against a BF16 reference run<\/td><\/tr><tr><td><strong>Inference mismatch or slowdown<\/strong><\/td><td>Running FP8-trained weights directly under BF16\/FP16 inference loses accuracy<\/td><td>Keep the inference format consistent with training (BF16 or FP8), choosing conservatively<\/td><\/tr><\/tbod
y><\/table><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Check hardware support thoroughly<\/strong>: prefer platforms with native FP8 support, such as the Hopper architecture (for example H100) and AMD MI300, and avoid older GPUs.<\/li>\n\n\n\n<li><strong>Use Transformer Engine with PyTorch<\/strong>: take advantage of its quick FP8 integration and performance tuning; see <a href=\"https:\/\/github.com\/NVIDIA\/TransformerEngine\" target=\"_blank\" rel=\"noopener\" class=\"external\" >NVIDIA Transformer Engine<\/a>.<\/li>\n\n\n\n<li><strong>Align the convergence path with a BF16 baseline at regular intervals<\/strong>: comparing against a BF16 run every fixed number of epochs, a practice attributed to teams such as OpenAI and Meta, confirms that FP8 training has not drifted.<\/li>\n\n\n\n<li><strong>Operator registration and custom compatibility work<\/strong>: custom operators in critical models must each be adapted to FP8, or hard-to-diagnose \u201cblack-box\u201d failures tend to appear.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">FP8 in Real AI Products and the Community<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Industry Adoption<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>NVIDIA NeMo LLM framework<\/strong>: supports end-to-end FP8 mixed-precision training (see the <a href=\"https:\/\/docs.nvidia.com\/nemo-framework\/\" target=\"_blank\"  rel=\"nofollow noopener\"  class=\"external\" 
>official NeMo documentation<\/a>) and has been used with mainstream large models such as Llama and Mixtral.<\/li>\n\n\n\n<li><strong>DeepSeek-V3, ChatGLM3, and other Chinese-developed LLMs<\/strong>: large-scale FP8 training is reported to cut compute cost substantially, with double-digit percentage drops in training energy cited for 7B\/70B models, and the approach has spread widely in the open-source community.<\/li>\n\n\n\n<li><strong>Unified slimming and inference deployment for large models<\/strong>: FP8 shortens the path from training to inference, reducing the accuracy loss and tuning time that INT4 quantization would involve.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Recommended Tools and Resources<\/h3>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1920\" height=\"1010\" src=\"https:\/\/aicats.wiki\/wp-content\/uploads\/2025\/12\/image-228.jpg\" alt=\"Official NeMo documentation\" class=\"wp-image-84194\"\/><figcaption class=\"wp-element-caption\">Figure: <a href=\"https:\/\/docs.nvidia.com\/nemo-framework\/\" target=\"_blank\"  rel=\"nofollow noopener\"  class=\"external\" >Official NeMo documentation<\/a><\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Name<\/th><th>Summary<\/th><th>Link<\/th><\/tr><\/thead><tbody><tr><td><strong>NVIDIA Transformer Engine<\/strong><\/td><td>Component library for FP8\/BF16\/FP16 mixed precision<\/td><td><a href=\"https:\/\/github.com\/NVIDIA\/TransformerEngine\" target=\"_blank\" rel=\"noopener\" class=\"external\" >GitHub<\/a><\/td><\/tr><tr><td><strong>NVIDIA NeMo Framework<\/strong><\/td><td>End-to-end solution for large-model training and inference<\/td><td><a href=\"https:\/\/developer.nvidia.com\/nvidia-nemo\" 
target=\"_blank\" rel=\"noopener\" class=\"external\" >Website<\/a><\/td><\/tr><tr><td><strong>HuggingFace Transformers<\/strong><\/td><td>The community's main LLM Transformer implementation<\/td><td><a href=\"https:\/\/huggingface.co\/transformers\/\" target=\"_blank\" rel=\"noopener\" class=\"external\" >Website<\/a><\/td><\/tr><tr><td><strong>PyTorch AMP<\/strong><\/td><td>Native support for automatic mixed-precision training<\/td><td><a href=\"https:\/\/pytorch.org\/docs\/stable\/amp.html\" target=\"_blank\" rel=\"noopener\" class=\"external\" >PyTorch AMP documentation<\/a><\/td><\/tr><tr><td><strong>DeepSpeed<\/strong><\/td><td>Open-source library for distributed training and mixed-precision optimization of very large models<\/td><td><a href=\"https:\/\/www.deepspeed.ai\/\" target=\"_blank\" rel=\"noopener\" class=\"external\" >DeepSpeed<\/a><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">A Developer Pitfall Checklist: How to Use FP8 Safely<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Common Developer Problems and Recommended Fixes<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Scenario<\/th><th>Potential problem<\/th><th>Recommended approach<\/th><\/tr><\/thead><tbody><tr><td><strong>First large-model training run in FP8<\/strong><\/td><td>Unstable loss, reduced accuracy<\/td><td>Follow the official AMP-style mixed-precision recipe, keep master weights, tune hyperparameters, and enable Delayed 
Scaling<\/td><\/tr><tr><td><strong>Adapting custom modules to FP8<\/strong><\/td><td>Errors in LayerNorm, Softmax, and similar layers<\/td><td>Fall back to BF16\/FP32 for precision-critical modules<\/td><\/tr><tr><td><strong>Communication in distributed training or inference<\/strong><\/td><td>FP8 communication errors, or no performance gain<\/td><td>Confirm that the hardware generation and network bandwidth are suited to FP8 communication<\/td><\/tr><tr><td><strong>Quantization consistency at inference deployment<\/strong><\/td><td>Accuracy loss, or inference slower than expected<\/td><td>Make sure inference also runs FP8 with Per-tensor Scaling<\/td><\/tr><tr><td><strong>Failures that are hard to localize when debugging<\/strong><\/td><td>Crashes, exploding or vanishing gradients, performance bubbles<\/td><td>Run a BF16\/FP32 reference for comparison, analyze with CUDA Graphs and a profiler, and tune following NVIDIA's <a href=\"https:\/\/developer.nvidia.com\/zh-cn\/blog\/fp8-challenges-best-practices\/\" target=\"_blank\"  rel=\"nofollow noopener\"  class=\"external\" >performance tuning recommendations<\/a><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 
class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>FP8 marks a new balance point between AI compute and engineering effort, and it is especially significant for deploying large-model workloads such as LLMs, AIGC, and RAG. <strong>It is a \u201cgolden key\u201d to wider AI adoption and lower cost, yet it also hides pitfalls in engineering implementation, performance tuning, and inference consistency.<\/strong> As developers push for maximum compute throughput, they should pay equal attention to performance monitoring and to keeping accuracy and convergence aligned with a reference, and keep up with industry best practices and the evolving tool ecosystem. Deploying FP8 well is an important watershed for the AI industry and worth exploring by every AI practitioner.<\/p>\n\n\n\n<p><small>For more on FP8 training practice, recommended tools, and NVIDIA's official documentation, visit the <a href=\"https:\/\/developer.nvidia.com\/zh-cn\/blog\/fp8-challenges-best-practices\/\" target=\"_blank\" rel=\"noopener\" class=\"external\" >NVIDIA Developer Blog<\/a><\/small><\/p>","protected":false},"excerpt":{"rendered":"<p>The Compute Bottleneck of Fast-Moving AI and the Rise of FP8 As large AI models and deep learning advance ever faster, the whole industry faces compute and energy demands, a double \u201canxiety 
[&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_crsspst_to_aicatswiki":true,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[305],"tags":[247,1011],"content_visibility":[262],"class_list":["post-82472","post","type-post","status-publish","format-standard","hentry","category-ai-tools-platforms","tag-ai"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/aicats.wiki\/tw\/wp-json\/wp\/v2\/posts\/82472","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aicats.wiki\/tw\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aicats.wiki\/tw\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aicats.wiki\/tw\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/aicats.wiki\/tw\/wp-json\/wp\/v2\/comments?post=82472"}],"version-history":[{"count":1,"href":"https:\/\/aicats.wiki\/tw\/wp-json\/wp\/v2\/posts\/82472\/revisions"}],"predecessor-version":[{"id":84199,"href":"https:\/\/aicats.wiki\/tw\/wp-json\/wp\/v2\/posts\/82472\/revisions\/84199"}],"wp:attachment":[{"href":"https:\/\/aicats.wiki\/tw\/wp-json\/wp\/v2\/media?parent=82472"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aicats.wiki\/tw\/wp-json\/wp\/v2\/categories?post=82472"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aicats.wiki\/tw\/wp-json\/wp\/v2\/tags?post=82472"},{"taxonomy":"content_visibility","embeddable":true,"href":"https:\/\/aicats.wiki\/tw\/wp-json\/wp\/v2\/content_visibility?post=82472"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}