Compare commits

10 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Greg Kamradt | f3283f7274 | readme updates | 2025-05-15 16:15:22 -07:00 |
| Greg Kamradt | 2c42f4d6f2 | Update faa9f03d with edits | 2025-05-15 16:12:29 -07:00 |
| Greg Kamradt | f4852d1766 | faa9f03d first text index removed | 2025-05-15 16:07:33 -07:00 |
| Greg Kamradt | fa11dfc31c | removing first test index f560132c | 2025-05-15 16:06:45 -07:00 |
| Greg Kamradt | f85d970504 | removing b6f77b65 first test index | 2025-05-15 16:04:52 -07:00 |
| Greg Kamradt | fb0a4bfce8 | removing abc82100 first test index | 2025-05-15 16:04:03 -07:00 |
| Greg Kamradt | 124910ab8e | removing 4a21e3da first test index | 2025-05-15 16:02:37 -07:00 |
| Greg Kamradt | 1ef37bc909 | changelog update | 2025-04-17 10:37:44 -07:00 |
| Greg Kamradt | 14fba87526 | task update | 2025-04-17 10:36:10 -07:00 |
| Greg Kamradt | fd80c5ad77 | adding 2 tasks | 2025-04-15 14:33:56 -07:00 |
8 changed files with 8084 additions and 505 deletions

@@ -4,9 +4,15 @@ This document tracks changes and updates to the ARC-AGI-2 dataset tasks.
 ## Updates
+### 2025-04-17
+* Public eval task `d8e07eb2` - [Single train pair update](https://github.com/arcprize/ARC-AGI-2/commit/14fba87526c727b80b3a9b85d5933fd7825b991f)
 ### 2025-04-14
 * Public Eval Tasks were updated with minor adjustments (off-by-one-pixel errors and slight ambiguities) to train and test pairs. No major task refactors. Updated tasks:
+* `38007db0` - [Single test pair update](https://github.com/arcprize/ARC-AGI-2/commit/385b761253cf7157ad503909f4d8224b8d85ca97#diff-41216bd1be9cb219575a44e2a21a7dcf18667c75dfa292d52ea7878a3148bcd1)
+* `36a08778` - [Single test pair update](https://github.com/arcprize/ARC-AGI-2/commit/385b761253cf7157ad503909f4d8224b8d85ca97)
 * `247ef758` - [Single test pair update](https://github.com/arcprize/ARC-AGI-2/commit/8b454b595552981fc9aa8e9540f3e68c92b0f03a)
 * `f560132c` - [Single test pair update](https://github.com/arcprize/ARC-AGI-2/commit/30c145f7c524c932d95d4a512abdd5318ef21bf9)
 * `f931b4a8` - [Train pair update](https://github.com/arcprize/ARC-AGI-2/commit/86a8149f53ce915c069cf586f061eb0af0204713)

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

@@ -152,36 +152,6 @@
 }
 ],
 "test": [
-{
-"input": [
-[3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
-[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
-[0, 0, 4, 3, 3, 0, 0, 0, 0, 0, 0, 0],
-[0, 0, 4, 0, 3, 0, 8, 7, 7, 7, 0, 0],
-[0, 0, 4, 0, 3, 0, 8, 0, 0, 7, 0, 0],
-[0, 0, 4, 0, 3, 0, 8, 0, 0, 7, 0, 0],
-[0, 0, 6, 5, 5, 5, 5, 5, 0, 7, 0, 0],
-[0, 0, 6, 0, 0, 0, 0, 5, 0, 7, 0, 0],
-[0, 0, 6, 0, 0, 0, 0, 5, 0, 7, 0, 0],
-[0, 3, 1, 1, 1, 0, 2, 2, 2, 2, 9, 0],
-[0, 3, 0, 0, 1, 0, 2, 0, 0, 0, 9, 0],
-[0, 3, 0, 0, 1, 0, 2, 0, 0, 0, 9, 0]
-],
-"output": [
-[3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
-[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
-[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
-[0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0],
-[0, 0, 4, 0, 0, 0, 0, 0, 0, 7, 0, 0],
-[0, 0, 4, 0, 0, 0, 8, 7, 7, 7, 0, 0],
-[0, 0, 4, 0, 0, 0, 8, 5, 0, 7, 0, 0],
-[0, 0, 4, 0, 0, 0, 8, 5, 0, 7, 0, 0],
-[0, 0, 6, 5, 5, 5, 5, 5, 0, 7, 0, 0],
-[0, 0, 6, 0, 1, 0, 2, 2, 2, 2, 9, 0],
-[0, 0, 6, 0, 1, 0, 2, 0, 0, 0, 9, 0],
-[0, 0, 1, 1, 1, 0, 2, 0, 0, 0, 9, 0]
-]
-},
 {
 "input": [
 [7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
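
The hunk above deletes the first entry of the task's `test` array, which matches the "removing ... first test index" commit messages. A minimal Python sketch of that kind of edit follows. It assumes a task is a JSON object whose `test` key holds a list of `input`/`output` pairs, as the hunk suggests. The helper name `drop_first_test_pair` and the tiny inline task are hypothetical and are not taken from the repository.

```python
import json


def drop_first_test_pair(task: dict) -> dict:
    """Return a shallow copy of the task with test[0] removed (hypothetical helper)."""
    updated = dict(task)
    # Slice a copy of the list so the original task is left untouched.
    updated["test"] = list(task.get("test", []))[1:]
    return updated


# Tiny stand-in task; real task grids are much larger.
sample_task = json.loads("""
{
  "train": [{"input": [[0]], "output": [[1]]}],
  "test": [
    {"input": [[3, 0]], "output": [[0, 3]]},
    {"input": [[7, 0]], "output": [[0, 7]]}
  ]
}
""")

trimmed = drop_first_test_pair(sample_task)
print(len(sample_task["test"]), "->", len(trimmed["test"]))
```

Returning a copy instead of mutating in place keeps the original object available for a before/after comparison.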

@@ -35,9 +35,9 @@
 ],
 "output": [
 [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
-[3, 3, 0, 0, 0, 3, 3, 2, 2, 3, 3, 3, 7, 3, 3, 3, 3, 3, 6, 6, 3, 3],
-[3, 3, 0, 0, 0, 3, 3, 2, 2, 2, 3, 3, 7, 7, 7, 3, 3, 3, 3, 6, 3, 3],
-[3, 3, 0, 0, 0, 3, 3, 3, 2, 3, 3, 3, 7, 3, 3, 3, 3, 3, 6, 6, 3, 3],
+[3, 3, 0, 0, 0, 3, 3, 1, 1, 1, 3, 3, 7, 3, 3, 3, 3, 3, 6, 6, 3, 3],
+[3, 3, 0, 0, 0, 3, 3, 3, 1, 3, 3, 3, 7, 7, 7, 3, 3, 3, 3, 6, 3, 3],
+[3, 3, 0, 0, 0, 3, 3, 1, 1, 1, 3, 3, 7, 3, 3, 3, 3, 3, 6, 6, 3, 3],
 [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
 [6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6],
 [8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8],
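
The row-level diff above makes it hard to see which cells changed. As a rough sketch, the three old and new rows from this hunk can be compared cell by cell in Python. The function name `changed_cells` is hypothetical, and the row indices it reports are relative to the three rows shown here, not to the full grid.

```python
def changed_cells(old_rows, new_rows):
    """List (row, col, old, new) for every cell that differs between two grids."""
    return [
        (r, c, old, new)
        for r, (old_row, new_row) in enumerate(zip(old_rows, new_rows))
        for c, (old, new) in enumerate(zip(old_row, new_row))
        if old != new
    ]


# The three changed rows, copied from the hunk above (old side, then new side).
old_rows = [
    [3, 3, 0, 0, 0, 3, 3, 2, 2, 3, 3, 3, 7, 3, 3, 3, 3, 3, 6, 6, 3, 3],
    [3, 3, 0, 0, 0, 3, 3, 2, 2, 2, 3, 3, 7, 7, 7, 3, 3, 3, 3, 6, 3, 3],
    [3, 3, 0, 0, 0, 3, 3, 3, 2, 3, 3, 3, 7, 3, 3, 3, 3, 3, 6, 6, 3, 3],
]
new_rows = [
    [3, 3, 0, 0, 0, 3, 3, 1, 1, 1, 3, 3, 7, 3, 3, 3, 3, 3, 6, 6, 3, 3],
    [3, 3, 0, 0, 0, 3, 3, 3, 1, 3, 3, 3, 7, 7, 7, 3, 3, 3, 3, 6, 3, 3],
    [3, 3, 0, 0, 0, 3, 3, 1, 1, 1, 3, 3, 7, 3, 3, 3, 3, 3, 6, 6, 3, 3],
]

for row, col, old, new in changed_cells(old_rows, new_rows):
    print(f"row {row}, col {col}: {old} -> {new}")
```

Every difference falls in columns 7 to 9, so the edit is confined to the single 3x3 shape in that part of the output grid.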

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

@@ -4,14 +4,14 @@ This repository contains the ARC-AGI-2 task data (ARC-AGI-1 can be found [here](
 *"ARC can be seen as a general artificial intelligence benchmark, as a program synthesis benchmark, or as a psychometric intelligence test. It is targeted at both humans and artificially intelligent systems that aim at emulating a human-like form of general fluid intelligence."*
-A foundational description of the dataset, its goals, and its underlying logic, can be found in: [On the Measure of Intelligence](https://arxiv.org/abs/1911.01547) and the [ARC-AGI-2 Presentation](https://docs.google.com/presentation/d/1hQrGh5YI6MK3PalQYSQs4CQERrYBQZue8PBLjjHIMgI/edit?usp=sharing)
+A foundational description of the dataset, its goals, and its underlying logic, can be found in: [On the Measure of Intelligence](https://arxiv.org/abs/1911.01547), the [ARC-AGI-2 Presentation](https://docs.google.com/presentation/d/1hQrGh5YI6MK3PalQYSQs4CQERrYBQZue8PBLjjHIMgI/edit?usp=sharing) and [ARC-AGI-2 Technical Report](http://arcprize.org/blog/arc-agi-2-technical-report)
 ## Dataset composition
-ARC-AGI-2 contains 1,000 training tasks and 120 public evaluation tasks.
+ARC-AGI-2 contains 1,000 public training tasks and 120 public evaluation tasks.
 The training tasks are intended to demonstrate the task format and the Core Knowledge priors used by ARC-AGI. They can be used for training AI models.
-The public evaluation tasks are intended for testing AI models that have never seen these tasks before. Average human performance on these tasks in our test sample was 60%.
+The public evaluation tasks are intended for testing AI models that have never seen these tasks before. Average human performance on these tasks in our test sample was 66%.
 ARC-AGI-2 also features two private test sets not included in the repo:
@@ -48,7 +48,7 @@ When looking at a task, a test-taker has access to inputs & outputs of the demon
 ## Usage of the testing interface
-You can view tasks on [ARCPrize.org/play](https://arcprize.org/play) or clone the [ARC-AGI testing interface](https://github.com/fchollet/ARC-AGI/tree/master/apps) located at `apps/testing_interface.html`. Open it in a web browser (Chrome recommended). It will prompt you to select a task JSON file.
+You can view tasks on [ARCPrize.org/play](https://arcprize.org/play) or clone the [ARC-AGI-1 testing interface](https://github.com/fchollet/ARC-AGI/tree/master/apps). Open it in a web browser (Chrome recommended). It will prompt you to select a task JSON file.
 After loading a task, you will enter the test space, which looks like this: