Red Pajama 2: The Public Dataset With a Whopping 30 Trillion Tokens

Together, the dataset's developer, claims it is the largest public dataset built specifically for language model pre-training.

Integrated AI: The sky is comforting (2023 AI retrospective) – Dr Alan D. Thompson – Life Architect

Redpajama-Data-v2 is Incredible : r/LocalLLaMA

Mandala #122 - TrendyMandalas

RedPajama training progress at 440 billion tokens

[2311.17035] Scalable Extraction of Training Data from (Production) Language Models

RedPajama Project: An Open-Source Initiative to Democratizing LLMs - KDnuggets

cerebras/SlimPajama-627B · Datasets at Hugging Face

ChatGPT / Generative AI recent news, page 3 of 19

Leaderboard: OpenAI's GPT-4 Has Lowest Hallucination Rate

Shamane Siri, PhD on LinkedIn: RedPajama-Data-v2: an Open Dataset with 30 Trillion Tokens for Training…

Top 10 List of Large Language Models in Open-Source