Update readme.md
parent
c69a4a50d2
commit
e631404eef
22
readme.md
@@ -1,12 +1,28 @@
 # Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI-2)
 
-This repository contains the ARC-AGI-2 task data (ARC-AGI-1 can be found [here]()).
+This repository contains the ARC-AGI-2 task data (ARC-AGI-1 can be found [here](https://github.com/fchollet/arc-agi)).
 
 *"ARC can be seen as a general artificial intelligence benchmark, as a program synthesis benchmark, or as a psychometric intelligence test. It is targeted at both humans and artificially intelligent systems that aim at emulating a human-like form of general fluid intelligence."*
 
 A foundational description of the dataset, its goals, and its underlying logic, can be found in: [On the Measure of Intelligence](https://arxiv.org/abs/1911.01547) and the [ARC-AGI-2 Presentation](https://docs.google.com/presentation/d/1hQrGh5YI6MK3PalQYSQs4CQERrYBQZue8PBLjjHIMgI/edit?usp=sharing)
 
-As a reminder, a test-taker is said to solve a task when, upon seeing the task for the first time, they are able to produce the correct output grid for *all* test inputs in the task (this includes picking the dimensions of the output grid). For each test input, the test-taker is allowed 2 trials (this holds for all test-takers, either humans or AI).
+## Dataset composition
+
+ARC-AGI-2 contains 1,000 training tasks and 120 public evaluation tasks.
+
+The training tasks are intended to demonstrate the task format and the Core Knowledge priors used by ARC-AGI. They can be used for training AI models.
+The public evaluation tasks are intended for testing AI models that have never seen these tasks before. Average human performance on these tasks in our test sample was 60%.
+
+ARC-AGI-2 also features two private test sets not included in the repo:
+
+- A semi-private set intended for testing remotely-hosted commercial models with low leakage probability. It is calibrated to be the same human-facing difficulty as the public evaluation set.
+- A fully-private set intended for testing self-contained models during the ARC Prize competition, with near-zero leakage probability. It is also calibrated to be the same difficulty.
+
+This multi-tiered structure allows for both open research and a secure, high-stakes competition.
+
+## Task success criterion
+
+A test-taker is said to solve a task when, upon seeing the task for the first time, they are able to produce the correct output grid for *all* test inputs in the task (this includes picking the dimensions of the output grid). For each test input, the test-taker is allowed 2 trials (this holds for all test-takers, either humans or AI).
 
 ## Task file format
 
@@ -32,7 +48,7 @@ When looking at a task, a test-taker has access to inputs & outputs of the demon
 
 ## Usage of the testing interface
 
-You can view tasks on [ARCPrize.org/play](https://arcprize.org/play) or clone the [ARC-AGI-1 testing interface](https://github.com/fchollet/ARC-AGI/tree/master/apps) located at `apps/testing_interface.html`. Open it in a web browser (Chrome recommended). It will prompt you to select a task JSON file.
+You can view tasks on [ARCPrize.org/play](https://arcprize.org/play) or clone the [ARC-AGI testing interface](https://github.com/fchollet/ARC-AGI/tree/master/apps) located at `apps/testing_interface.html`. Open it in a web browser (Chrome recommended). It will prompt you to select a task JSON file.
 
 After loading a task, you will enter the test space, which looks like this:
 
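The task success criterion in the README text of this commit (every test input must be answered correctly, with at most 2 trials per test input) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the official scoring code: the names `Grid`, `MAX_TRIALS`, `input_solved`, and `task_solved` are hypothetical, and grids are assumed to be lists of rows of integers.

```python
from typing import List, Sequence

# Assumption: a grid is a list of rows, each row a list of integer cell values.
Grid = List[List[int]]

# The README text allows 2 trials per test input.
MAX_TRIALS = 2


def input_solved(attempts: Sequence[Grid], expected: Grid) -> bool:
    """Return True if one of the first MAX_TRIALS attempts matches exactly.

    List equality also compares grid dimensions, so an attempt with the
    wrong height or width never counts as correct.
    """
    return any(attempt == expected for attempt in attempts[:MAX_TRIALS])


def task_solved(
    attempts_per_input: Sequence[Sequence[Grid]],
    expected_outputs: Sequence[Grid],
) -> bool:
    """Return True only if every test input of the task is solved."""
    if len(attempts_per_input) != len(expected_outputs):
        return False
    return all(
        input_solved(attempts, expected)
        for attempts, expected in zip(attempts_per_input, expected_outputs)
    )


# Hypothetical task with two test inputs.
expected = [[[1, 0], [0, 1]], [[2, 2, 2]]]
attempts = [
    [[[0, 0], [0, 0]], [[1, 0], [0, 1]]],  # second trial is correct
    [[[2, 2]]],                            # wrong width, so this input fails
]
print(task_solved(attempts, expected))  # prints False
```

Because the criterion is all-or-nothing per task, one unsolved test input makes the whole task count as unsolved, even when the other inputs were answered correctly.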