Update readme.md

2025-03-20 17:32:56 -07:00 · e631404eef
commit e631404eef
parent c69a4a50d2
1 changed file with 20 additions and 4 deletions
--- a/readme.md
+++ b/readme.md
@@ -1,12 +1,28 @@
 # Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI-2)
-This repository contains the ARC-AGI-2 task data (ARC-AGI-1 can be found [here]()).
+This repository contains the ARC-AGI-2 task data (ARC-AGI-1 can be found [here](https://github.com/fchollet/arc-agi)).
 *"ARC can be seen as a general artificial intelligence benchmark, as a program synthesis benchmark, or as a psychometric intelligence test. It is targeted at both humans and artificially intelligent systems that aim at emulating a human-like form of general fluid intelligence."*
 A foundational description of the dataset, its goals, and its underlying logic can be found in [On the Measure of Intelligence](https://arxiv.org/abs/1911.01547) and the [ARC-AGI-2 Presentation](https://docs.google.com/presentation/d/1hQrGh5YI6MK3PalQYSQs4CQERrYBQZue8PBLjjHIMgI/edit?usp=sharing).
-As a reminder, a test-taker is said to solve a task when, upon seeing the task for the first time, they are able to produce the correct output grid for *all* test inputs in the task (this includes picking the dimensions of the output grid). For each test input, the test-taker is allowed 2 trials (this holds for all test-takers, either humans or AI).
+## Dataset composition
 ARC-AGI-2 contains 1,000 training tasks and 120 public evaluation tasks.
 The training tasks are intended to demonstrate the task format and the Core Knowledge priors used by ARC-AGI. They can be used for training AI models.
 The public evaluation tasks are intended for testing AI models that have never seen these tasks before. Average human performance on these tasks in our test sample was 60%.
 ARC-AGI-2 also features two private test sets not included in the repo:
 - A semi-private set intended for testing remotely-hosted commercial models with low leakage probability. It is calibrated to be the same human-facing difficulty as the public evaluation set.
 - A fully-private set intended for testing self-contained models during the ARC Prize competition, with near-zero leakage probability. It is also calibrated to be the same difficulty.
 This multi-tiered structure allows for both open research and a secure, high-stakes competition.
 ## Task success criterion
 A test-taker is said to solve a task when, upon seeing the task for the first time, they are able to produce the correct output grid for *all* test inputs in the task (this includes picking the dimensions of the output grid). For each test input, the test-taker is allowed 2 trials (this holds for all test-takers, either humans or AI).
 ## Task file format
@@ -32,7 +48,7 @@ When looking at a task, a test-taker has access to inputs & outputs of the demon
 ## Usage of the testing interface
-You can view tasks on [ARCPrize.org/play](https://arcprize.org/play) or clone the [ARC-AGI-1 testing interface](https://github.com/fchollet/ARC-AGI/tree/master/apps) located at `apps/testing_interface.html`. Open it in a web browser (Chrome recommended). It will prompt you to select a task JSON file.
+You can view tasks on [ARCPrize.org/play](https://arcprize.org/play) or clone the [ARC-AGI testing interface](https://github.com/fchollet/ARC-AGI/tree/master/apps) located at `apps/testing_interface.html`. Open it in a web browser (Chrome recommended). It will prompt you to select a task JSON file.
 After loading a task, you will enter the test space, which looks like this:
@@ -62,4 +78,4 @@ When your output grid is ready, click the green "Submit!" button to check your a
 After you've obtained the correct answer for the current test input grid, you can switch to the next test input grid for the task using the "Next test input" button (if there is any available; most tasks only have one test input).
-When you're done with a task, use the "load task" button to open a new task.
+When you're done with a task, use the "load task" button to open a new task.