After teasing the world with a glimpse on Microsoft Azure, Meta has finally dropped Llama 3, the latest generation of its LLM, offering state-of-the-art performance.

Check out the model on GitHub.

The model is available in 8B and 70B parameter versions and is trained on 15 trillion tokens, a dataset seven times larger than Llama 2's. Llama 3 provides improved reasoning and coding capabilities, and its training process is three times more efficient than its predecessor's.

The models are also available on Hugging Face.
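For readers who want to try the weights directly, here is a minimal sketch of loading the 8B Instruct variant with the Hugging Face transformers library. The repo id `meta-llama/Meta-Llama-3-8B-Instruct` and the generation settings are assumptions, and access to the gated repository must be requested on Hugging Face first.

```python
# Minimal sketch (assumed repo id and settings): loading Llama 3 8B Instruct
# via Hugging Face transformers. Requires access to the gated repo and,
# for device_map="auto", the accelerate package.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize what is new in Llama 3."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Print only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```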

Meta is also training a model with over 400 billion parameters, which Mark Zuckerberg said in a Reel on Instagram will be the top-performing model out there.


The 8B model outperforms Gemma and Mistral on all benchmarks, and the 70B model outperforms Gemini Pro 1.5 and Claude 3 Sonnet.

Llama 3 models will soon be accessible on AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, NVIDIA NIM, and Snowflake. Additionally, the models will be compatible with hardware platforms provided by AMD, AWS, Dell, Intel, NVIDIA, and Qualcomm.

Alongside the models, Meta has integrated Llama 3 into Meta AI and expanded its availability to more countries. Meta AI is accessible via Facebook, Instagram, WhatsApp, Messenger, and the web, enabling users to accomplish tasks, learn, create, and engage with their interests.

Additionally, consumers will soon be able to experience the multimodal Meta AI on Ray-Ban Meta smart glasses.

Meta AI, now powered by Llama 3, is available in 13 new countries and brings improved search capabilities and a refreshed web experience. The latest image-generation updates in Meta AI let users create, animate, and share images from a simple text prompt.

The model uses a tokenizer with a 128K-token vocabulary for more efficient language encoding, leading to significant performance improvements. To improve inference efficiency, grouped-query attention (GQA) is used in both the 8B and 70B parameter models. The models were trained on sequences of 8,192 tokens, with masking to keep attention from crossing document boundaries.
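To make the GQA idea concrete, here is an illustrative PyTorch sketch in which several query heads share each key/value head, shrinking the key/value cache at inference time. The head counts and dimensions are toy values, not Llama 3's actual configuration.

```python
# Illustrative sketch of grouped-query attention (GQA): query heads are split
# into groups that share a single key/value head. Toy dimensions only.
import torch
import torch.nn.functional as F

batch, seq_len, d_model = 2, 16, 256
n_q_heads, n_kv_heads = 8, 2              # 4 query heads share each KV head
head_dim = d_model // n_q_heads
group = n_q_heads // n_kv_heads

x = torch.randn(batch, seq_len, d_model)
wq = torch.nn.Linear(d_model, n_q_heads * head_dim, bias=False)
wk = torch.nn.Linear(d_model, n_kv_heads * head_dim, bias=False)
wv = torch.nn.Linear(d_model, n_kv_heads * head_dim, bias=False)

q = wq(x).view(batch, seq_len, n_q_heads, head_dim).transpose(1, 2)
k = wk(x).view(batch, seq_len, n_kv_heads, head_dim).transpose(1, 2)
v = wv(x).view(batch, seq_len, n_kv_heads, head_dim).transpose(1, 2)

# Repeat each KV head so it is shared by its group of query heads.
k = k.repeat_interleave(group, dim=1)
v = v.repeat_interleave(group, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
out = out.transpose(1, 2).reshape(batch, seq_len, d_model)
print(out.shape)  # torch.Size([2, 16, 256])
```

The key saving is that only `n_kv_heads` key/value projections are cached per token instead of `n_q_heads`, which matters most for long sequences and large batch inference.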

Llama 3's training data consists of 15 trillion tokens from publicly available sources, seven times more than Llama 2's dataset. The model was trained on two custom-built 24K-GPU clusters.

It contains four times as much code and more than 5% high-quality non-English data spanning 30+ languages, although the model remains most proficient in English. Advanced data-filtering methods, including heuristic filters and semantic deduplication, ensure high-quality training data.
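Semantic deduplication is typically done by embedding documents and discarding near-duplicates whose embeddings exceed a similarity threshold. The sketch below illustrates that general idea with assumed components (the sentence-transformers library, the `all-MiniLM-L6-v2` model, and a 0.9 cutoff); it is not Meta's actual pipeline, which has not been published in detail.

```python
# Hedged sketch of semantic deduplication: embed documents, then drop any
# document too similar to one already kept. The embedding model and the 0.9
# threshold are illustrative assumptions, not Meta's actual choices.
from sentence_transformers import SentenceTransformer
import numpy as np

docs = [
    "Llama 3 was trained on 15 trillion tokens.",
    "Llama 3's training data contains 15T tokens.",   # near-duplicate
    "The 70B model outperforms several competitors.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(docs, normalize_embeddings=True)   # unit-norm vectors

kept, kept_emb = [], []
for doc, vec in zip(docs, emb):
    # Cosine similarity reduces to a dot product on normalized vectors.
    if kept_emb and max(float(np.dot(vec, e)) for e in kept_emb) > 0.9:
        continue  # too close to an already-kept document
    kept.append(doc)
    kept_emb.append(vec)

print(kept)
```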

Here's a sneak preview of the upcoming 400-billion-parameter Llama 3 model.

