Yin Song
Senior Applied Scientist at Amazon Web Services
18th November, 2024, 10:00AM - 11:00AM (GST)
Title: | Porting the Falcon LLM to AWS ML Accelerator Chips using the Neuron SDK |
Abstract: | This work demonstrates a successful effort to port the 11B-parameter Falcon 2 large language model onto Trainium, AI chips developed by AWS. In particular, we leverage the NeuronX Distributed software library (NxD) to achieve a 30 percent boost in end-to-end inference performance relative to comparable Amazon EC2 instances. We tested the model on two versions of the NxD library, 2.19 and 2.20, and found that the latter both reduced implementation complexity and increased performance. To achieve this, we altered the arrangement of the query, key, and value weights in Falcon 2 to match the NxD format, which enables proper sharding. We also followed NxD guidance so that the context encoding and decoding graphs share the same KV caches and weights. Sharing lets both graphs use the same buffer at runtime, which reduces data movement and speeds up computation. |
Bio: | Yin Song is a seasoned professional with a profound background in data science, machine learning, and artificial intelligence (AI). He is currently a senior applied scientist on the AWS Prototyping team based in Sydney, Australia, and has been part of the AWS family for more than five and a half years. On the AWS Prototyping team, Yin plays a pivotal role in helping customers envision and realize complex use cases of AWS services by building tailored working prototypes. His current focus is research on fine-tuning and serving AI models, enabling customers to develop impactful end-to-end AI solutions. Yin's passion for open-source initiatives is evident in his leadership of a generative AI project. He and his team have published a series of large language and vision models on HuggingFace, including FalconLite, MistralLite, MegaBeam-Mistral-7B-512k, and long-llava-qwen2-7b. These models have gained widespread adoption in the open-source community, accumulating over 500,000 downloads to date. Before joining AWS, Yin served as a senior data scientist at Telstra, Australia's largest telecommunications operator. In this role, he collaborated with diverse stakeholders to identify and address significant business challenges through the application of big data, machine learning, and AI techniques. Prior to Telstra, Yin worked as a senior big data and optimization engineer at ROKT and as a data scientist at Brandscreen, both leading startups in the online advertising industry. Yin's academic achievements include a Ph.D. from the Faculty of Engineering and IT at the University of Technology, Sydney (UTS), obtained in 2014. His research interests spanned machine learning, pattern recognition, and data mining, with applications across domains such as financial data, biomedical and health data, and social media data.
During his doctoral studies, Yin published four journal articles and seven conference papers, three of which appeared in venues rated A/A* by ERA. These publications appeared in reputable journals and conferences, including IEEE Transactions on Cybernetics, the Journal of Tsinghua University (Science and Technology), Tsinghua Science and Technology, the International Joint Conference on Neural Networks (IJCNN), The European Conference on ... Yin's educational journey began with a master's degree in engineering (integrated circuit engineering) from the Institute of Microelectronics at Tsinghua University, China, in 2010. He previously obtained a bachelor's degree in science (science and technology of electronic information) from the College of Information Science and Technology at Beijing Normal University, China, in 2007. |