This is the Gwern Danbooru 2021 dataset, packaged as an ADD-ON to the 2018 one: an additional 1,260,067 Danbooru images
posted 01.01.2019-31.12.2021, rating:safe, resized to 512x512 px, together with some of the meta-information used
for image recognition training. Everything is in zipped format, acceptable to all torrent clients.
Meta-information is included in “initial” JSON format for posts and in “advanced” JSON format for all entities (read Gwern's description for details).
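For anyone who wants to subsample the posts metadata themselves, here is a minimal sketch in Python. It assumes (this is not confirmed by the torrent itself) that the posts metadata is stored as one JSON object per line, and that each object carries a "rating" field with "s" for safe and a "created_at" field starting with a YYYY-MM-DD date. The function name iter_safe_posts and the sample records are made up for illustration; check the real field names in Gwern's description before relying on it.

```python
import io
import json


def iter_safe_posts(lines, start="2019-01-01", end="2021-12-31"):
    """Yield posts rated safe whose creation date falls within [start, end].

    Assumption: one JSON object per line, with "rating" and "created_at"
    fields; "created_at" is assumed to begin with a YYYY-MM-DD date.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        post = json.loads(line)
        # ISO-style dates compare correctly as plain strings.
        day = str(post.get("created_at", ""))[:10]
        if post.get("rating") == "s" and start <= day <= end:
            yield post


# Made-up sample records standing in for a real metadata file.
sample = io.StringIO(
    '{"id": "1", "rating": "s", "created_at": "2019-05-01 10:00:00 UTC"}\n'
    '{"id": "2", "rating": "q", "created_at": "2020-01-01 10:00:00 UTC"}\n'
    '{"id": "3", "rating": "s", "created_at": "2018-12-31 23:59:59 UTC"}\n'
)
selected_ids = [post["id"] for post in iter_safe_posts(sample)]
print(selected_ids)
```

With a real file, pass an open file handle instead of the sample, so the metadata is streamed line by line and never loaded into memory at once.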
NOTE: there is also the BOORU CHAR dataset, with 1280px samples from several imageboards. Compared with this set:
"+" much better initial image selection, bigger image size -->> can be viewed with pleasure
"+" convenient folders, verbose file naming, tags to EXIF -->> flexible subsampling using file system only
"+" much more computed metadata -->> a lot of analysis or subsampling without recomputation
"-" incomplete, similarities lost, lower image count <<-- strict initial filter, lossy preprocessing
"-" less consistent, incomplete tags and imageboard metadata <<-- diverse sources, diverse retrieval methods
"?" not completely SFW by design
BOORU CHAR is my main project (release 2021, release 2015 and release 2022 at the moment), but I’ll seed the Gwern sets too.
Comments - 1
SomaHeir
Thanks!!!