On the Deployment of Multimodal Large Models in the Robotics Industry
Published: 2024-10-31  Source: http:///
Recently, several domestic companies have achieved technical breakthroughs in combining large models with robots.
Industry observers believe that as the technology advances and application scenarios expand, demand for multimodal large models and robots will keep growing, opening up a broad market for companies. In addition, cooperation with other industries, such as healthcare and manufacturing, will bring new opportunities for the development of multimodal large models and robots, enabling a wider range of applications and greater commercial value.
Multimodal Robots Achieve Technical Breakthroughs
As of the close on December 13, several robotics-related stocks, including Kinco, Efort, and Leaderdrive, had risen more than 4%. On the news front, Tesla released a video of its Optimus-Gen 2 humanoid robot, which is equipped with Tesla-designed actuators and sensors, walks 30% faster, and shows improved balance and whole-body control.
"Multimodal" AI refers to large models that can process multiple forms of content, such as text, audio, images, video, and code. As multimodal large models iterate rapidly, major international players have been paying close attention to their application in robotics and have explored core tasks such as robot planning, control, and navigation.
He Li, general manager of Zhiyushan Investment, told Securities Daily: "Multimodal large models fuse vision, speech, and sensor data processing, greatly enriching robots' cognition and decision-making. Applying this technology to robots promises major progress in areas such as complex interaction, natural-language understanding, and environmental adaptation, unlocking their potential as highly autonomous assistants or workers."
Domestic companies have already moved early in this field. On the evening of December 12, Orbbec released its Large-Model Robotic Arm 1.0, which takes voice prompts as input and uses the understanding and visual perception capabilities of multiple large models to generate spatial semantic information, allowing the arm to interpret and execute actions. In the video released alongside the product, the arm successfully carried out a series of voice commands, including "put the green square in the yellow box" and "please return to the starting state".
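The pipeline described above can be sketched in miniature. The following is a toy illustration, not Orbbec's actual implementation: a stand-in parser plays the role of the language model, a scene dictionary plays the role of the vision model's object detections, and the "arm" simply records pick-and-place moves. All names and the command grammar are assumptions for illustration.

```python
import re
from dataclasses import dataclass

@dataclass
class Action:
    verb: str
    obj: str
    target: str

def parse_prompt(prompt: str) -> Action:
    """Stand-in for the language model: extract (verb, object, target)
    from a command like 'put the green square in the yellow box'."""
    m = re.match(r"put the (\w+ \w+) in the (\w+ \w+)", prompt.lower())
    if not m:
        raise ValueError(f"unrecognized prompt: {prompt!r}")
    return Action("place", m.group(1), m.group(2))

def ground_objects(scene: dict, action: Action) -> tuple:
    """Stand-in for visual perception: map object names to the
    coordinates a detector would report (spatial semantics)."""
    return scene[action.obj], scene[action.target]

def execute(arm_moves: list, src, dst) -> None:
    """Stand-in for the arm controller: pick at src, place at dst."""
    arm_moves.append(("pick", src))
    arm_moves.append(("place", dst))

# Example scene: object name -> (x, y) position from a detector.
scene = {"green square": (0.12, 0.30), "yellow box": (0.45, 0.10)}
moves = []
action = parse_prompt("Put the green square in the yellow box")
src, dst = ground_objects(scene, action)
execute(moves, src, dst)
```

The real system replaces the regex with a large model and the scene dictionary with live camera perception, but the division of labor — language understanding, visual grounding, motion execution — is the same.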
Xiao Zhenzhong, co-founder and CTO of Orbbec, told Securities Daily: "The company hopes that through engineering research the large-model robotic arm can be deployed in real scenarios, which includes improving the arm's ability to automatically navigate around complex obstacles while carrying out human commands and solving the generalization problem of combining large models with robotic arms, ultimately achieving deployment in general-purpose scenarios."
According to incomplete statistics, listed companies including Thundersoft and Yijiahe have also recently disclosed progress on robot R&D based on multimodal large models.
Large-Scale Commercial Application Still Needs Time
China's robotics industry already has a solid industrial foundation. Multimodal robots with smart brains and agile limbs are becoming a new arena in which many players compete for industries of the future.
He Li believes that in the domestic market, companies have invested actively in the R&D and production of key technology links, especially sensors, precision mechanical components, actuators, and innovative materials and lightweight structural parts, showing strong momentum.
The harmonic reducer is a core component of industrial robots. Leaderdrive disclosed that it completed R&D on harmonic reducer technology for industrial robots early on and achieved large-scale production, becoming the first in the field to substitute for imported products and greatly reducing domestic robot makers' procurement costs and lead times. Its new-generation Y-series harmonic reducer, through innovations in mathematical modeling and optimizations in bearing design and machining processes, doubles the stiffness of its existing counterparts.
Xiao Zhenzhong agrees. He told Securities Daily: "Large language models (LLMs) combined with visual sensing will bring all kinds of robots and robotic arms into more scenarios, such as industrial manufacturing, flexible logistics, and commercial services. At present there is still a gap between large models and real-world data, and running large models consumes considerable computing power. Applications will take three to five years to gradually materialize, and business maturity may take even longer."
"But the company firmly believes this is the right direction, and the prospects are broad," Xiao Zhenzhong said. Orbbec is building a robotics and AI vision middle platform: by developing multimodal vision models and intelligent algorithms and combining them with robot vision sensors, it has formed a complete product solution for autonomous positioning, navigation, and obstacle avoidance, actively preparing for the era of intelligent robots.
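At the lowest level, the obstacle-avoidance part of such a stack reduces to checking depth-sensor readings against a safety distance before passing a velocity command to the drive. The sketch below is a hypothetical simplification, not Orbbec's product code; the function name and the 0.5 m safety threshold are assumptions for illustration.

```python
def safe_velocity(depth_readings_m, commanded_mps, stop_dist_m=0.5):
    """Gate a commanded forward velocity on depth-camera readings:
    if any reading in the forward field of view is closer than the
    safety distance, command a full stop; otherwise pass through."""
    if min(depth_readings_m) < stop_dist_m:
        return 0.0
    return commanded_mps
```

A production navigation stack would instead steer around the obstacle using an occupancy map and a local planner, but the same gating principle serves as the final safety layer.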