No commits in common. "f3283f727488ad98fe575ea6a5ac981e4a188e49" and "d56ea3d19d892dbb42ab9f104236e71da15d7fdb" have entirely different histories.

8 changed files with 505 additions and 8084 deletions

@@ -4,15 +4,9 @@ This document tracks changes and updates to the ARC-AGI-2 dataset tasks.
 ## Updates
-### 2025-04-17
-* Public eval task `d8e07eb2` - [Single train pair update](https://github.com/arcprize/ARC-AGI-2/commit/14fba87526c727b80b3a9b85d5933fd7825b991f)
 ### 2025-04-14
 * Public Eval Tasks were updated with minor adjustments (off-by-one-pixel errors and slight ambiguities) to train and test pairs. No major task refactors. Updated tasks:
-* `38007db0` - [Single test pair update](https://github.com/arcprize/ARC-AGI-2/commit/385b761253cf7157ad503909f4d8224b8d85ca97#diff-41216bd1be9cb219575a44e2a21a7dcf18667c75dfa292d52ea7878a3148bcd1)
-* `36a08778` - [Single test pair update](https://github.com/arcprize/ARC-AGI-2/commit/385b761253cf7157ad503909f4d8224b8d85ca97)
 * `247ef758` - [Single test pair update](https://github.com/arcprize/ARC-AGI-2/commit/8b454b595552981fc9aa8e9540f3e68c92b0f03a)
 * `f560132c` - [Single test pair update](https://github.com/arcprize/ARC-AGI-2/commit/30c145f7c524c932d95d4a512abdd5318ef21bf9)
 * `f931b4a8` - [Train pair update](https://github.com/arcprize/ARC-AGI-2/commit/86a8149f53ce915c069cf586f061eb0af0204713)

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

@@ -152,6 +152,36 @@
 }
 ],
 "test": [
+{
+"input": [
+[3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+[0, 0, 4, 3, 3, 0, 0, 0, 0, 0, 0, 0],
+[0, 0, 4, 0, 3, 0, 8, 7, 7, 7, 0, 0],
+[0, 0, 4, 0, 3, 0, 8, 0, 0, 7, 0, 0],
+[0, 0, 4, 0, 3, 0, 8, 0, 0, 7, 0, 0],
+[0, 0, 6, 5, 5, 5, 5, 5, 0, 7, 0, 0],
+[0, 0, 6, 0, 0, 0, 0, 5, 0, 7, 0, 0],
+[0, 0, 6, 0, 0, 0, 0, 5, 0, 7, 0, 0],
+[0, 3, 1, 1, 1, 0, 2, 2, 2, 2, 9, 0],
+[0, 3, 0, 0, 1, 0, 2, 0, 0, 0, 9, 0],
+[0, 3, 0, 0, 1, 0, 2, 0, 0, 0, 9, 0]
+],
+"output": [
+[3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+[0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0],
+[0, 0, 4, 0, 0, 0, 0, 0, 0, 7, 0, 0],
+[0, 0, 4, 0, 0, 0, 8, 7, 7, 7, 0, 0],
+[0, 0, 4, 0, 0, 0, 8, 5, 0, 7, 0, 0],
+[0, 0, 4, 0, 0, 0, 8, 5, 0, 7, 0, 0],
+[0, 0, 6, 5, 5, 5, 5, 5, 0, 7, 0, 0],
+[0, 0, 6, 0, 1, 0, 2, 2, 2, 2, 9, 0],
+[0, 0, 6, 0, 1, 0, 2, 0, 0, 0, 9, 0],
+[0, 0, 1, 1, 1, 0, 2, 0, 0, 0, 9, 0]
+]
+},
 {
 "input": [
 [7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],

@@ -35,9 +35,9 @@
 ],
 "output": [
 [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
-[3, 3, 0, 0, 0, 3, 3, 1, 1, 1, 3, 3, 7, 3, 3, 3, 3, 3, 6, 6, 3, 3],
-[3, 3, 0, 0, 0, 3, 3, 3, 1, 3, 3, 3, 7, 7, 7, 3, 3, 3, 3, 6, 3, 3],
-[3, 3, 0, 0, 0, 3, 3, 1, 1, 1, 3, 3, 7, 3, 3, 3, 3, 3, 6, 6, 3, 3],
+[3, 3, 0, 0, 0, 3, 3, 2, 2, 3, 3, 3, 7, 3, 3, 3, 3, 3, 6, 6, 3, 3],
+[3, 3, 0, 0, 0, 3, 3, 2, 2, 2, 3, 3, 7, 7, 7, 3, 3, 3, 3, 6, 3, 3],
+[3, 3, 0, 0, 0, 3, 3, 3, 2, 3, 3, 3, 7, 3, 3, 3, 3, 3, 6, 6, 3, 3],
 [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
 [6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6],
 [8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8],
@@ -465,4 +465,4 @@
 ]
 }
 ]
 }
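
Reviewer note: to check that a grid edit like the one in the -35,9 +35,9 hunk is as small as the changelog describes, the old and new rows can be compared cell by cell. The sketch below is only an illustration and not part of the repository; the helper name `changed_cells` is hypothetical, and the row indices it reports are relative to the three-row excerpt, not to the full output grid.

```python
from typing import List, Tuple

Grid = List[List[int]]


def changed_cells(old: Grid, new: Grid) -> List[Tuple[int, int, int, int]]:
    """Return (row, col, old_value, new_value) for every cell that differs."""
    if len(old) != len(new) or any(len(a) != len(b) for a, b in zip(old, new)):
        raise ValueError("grids must have the same shape")
    return [
        (r, c, a, b)
        for r, (old_row, new_row) in enumerate(zip(old, new))
        for c, (a, b) in enumerate(zip(old_row, new_row))
        if a != b
    ]


# Old and new versions of the three output rows that differ in the hunk.
old_rows = [
    [3, 3, 0, 0, 0, 3, 3, 1, 1, 1, 3, 3, 7, 3, 3, 3, 3, 3, 6, 6, 3, 3],
    [3, 3, 0, 0, 0, 3, 3, 3, 1, 3, 3, 3, 7, 7, 7, 3, 3, 3, 3, 6, 3, 3],
    [3, 3, 0, 0, 0, 3, 3, 1, 1, 1, 3, 3, 7, 3, 3, 3, 3, 3, 6, 6, 3, 3],
]
new_rows = [
    [3, 3, 0, 0, 0, 3, 3, 2, 2, 3, 3, 3, 7, 3, 3, 3, 3, 3, 6, 6, 3, 3],
    [3, 3, 0, 0, 0, 3, 3, 2, 2, 2, 3, 3, 7, 7, 7, 3, 3, 3, 3, 6, 3, 3],
    [3, 3, 0, 0, 0, 3, 3, 3, 2, 3, 3, 3, 7, 3, 3, 3, 3, 3, 6, 6, 3, 3],
]

cells = changed_cells(old_rows, new_rows)
for row, col, before, after in cells:
    print(f"row {row}, col {col}: {before} -> {after}")
```

For this excerpt, every reported difference falls in columns 7 to 9, which is the 3x3 shape that was redrawn and recolored.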

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

@@ -4,14 +4,14 @@ This repository contains the ARC-AGI-2 task data (ARC-AGI-1 can be found [here](
 *"ARC can be seen as a general artificial intelligence benchmark, as a program synthesis benchmark, or as a psychometric intelligence test. It is targeted at both humans and artificially intelligent systems that aim at emulating a human-like form of general fluid intelligence."*
-A foundational description of the dataset, its goals, and its underlying logic, can be found in: [On the Measure of Intelligence](https://arxiv.org/abs/1911.01547), the [ARC-AGI-2 Presentation](https://docs.google.com/presentation/d/1hQrGh5YI6MK3PalQYSQs4CQERrYBQZue8PBLjjHIMgI/edit?usp=sharing) and [ARC-AGI-2 Technical Report](http://arcprize.org/blog/arc-agi-2-technical-report)
+A foundational description of the dataset, its goals, and its underlying logic, can be found in: [On the Measure of Intelligence](https://arxiv.org/abs/1911.01547) and the [ARC-AGI-2 Presentation](https://docs.google.com/presentation/d/1hQrGh5YI6MK3PalQYSQs4CQERrYBQZue8PBLjjHIMgI/edit?usp=sharing)
 ## Dataset composition
-ARC-AGI-2 contains 1,000 public training tasks and 120 public evaluation tasks.
+ARC-AGI-2 contains 1,000 training tasks and 120 public evaluation tasks.
 The training tasks are intended to demonstrate the task format and the Core Knowledge priors used by ARC-AGI. They can be used for training AI models.
-The public evaluation tasks are intended for testing AI models that have never seen these tasks before. Average human performance on these tasks in our test sample was 66%.
+The public evaluation tasks are intended for testing AI models that have never seen these tasks before. Average human performance on these tasks in our test sample was 60%.
 ARC-AGI-2 also features two private test sets not included in the repo:
@@ -48,7 +48,7 @@ When looking at a task, a test-taker has access to inputs & outputs of the demon
 ## Usage of the testing interface
-You can view tasks on [ARCPrize.org/play](https://arcprize.org/play) or clone the [ARC-AGI-1 testing interface](https://github.com/fchollet/ARC-AGI/tree/master/apps). Open it in a web browser (Chrome recommended). It will prompt you to select a task JSON file.
+You can view tasks on [ARCPrize.org/play](https://arcprize.org/play) or clone the [ARC-AGI testing interface](https://github.com/fchollet/ARC-AGI/tree/master/apps) located at `apps/testing_interface.html`. Open it in a web browser (Chrome recommended). It will prompt you to select a task JSON file.
 After loading a task, you will enter the test space, which looks like this: