Site generator for Apple Notes

“when you don't create things, you become defined by your tastes rather than ability. your tastes only narrow & exclude people. so create.” -why the lucky stiff

I built a site generator for Apple Notes. It is a different way to publish and manage blogs. I will show you a quick tour: https://www.youtube.com/watch?v=cACYpfAA_hI

Let me say this out loud: this website is just a folder in Apple Notes.

It is intentionally simple. Whenever I create a note, the website syncs it instantly. Isn’t that cool? If you have ever shared a note from Apple Notes, you can see that this is a much easier way to do so.

I am running a private alpha as in I am the only user right now; however, if you are interested, I created a form where you can leave a comment.

Backstory

Every story has a beginning. This one began in November 2014. I live in St. Petersburg, Russia, and this alone paints a picture of a dark and gloomy day; the sleet is pouring down and shutting everything from sight. Today is November 24th, and I am heading to the computer store to pick up the only computer I will be using ever since. This wasn’t a pre-Christmas gift — it was an “I can afford it” purchase. MacBook Pro with a gorgeous retina display (I don’t even mind that it is glossy), a MagSafe charger, and the unibody aluminum body.

Today, it is covered with scratches and other mistakes I made in my youth. However, none of my prior computers survived eight years of working non-stop. Sure, I may say on a particularly gloomy day that today’s applications are not designed for elderly CPUs and my laptop especially, but I wholeheartedly love this machine. I think there is something utterly beautiful about long-lasting products.

This brings back memories. My first computer was a mere 4MHz ZX Spectrum clone proudly made in the USSR. A kickstart to run and break digital things. It is older than dirt now and still works, but it serves as nothing more than a bookend. While nostalgic, I decided to look at which applications I have used daily for over a decade. It turned out to be a very short list.

I think I opened my first Emacs session in 2004. I was working on my bachelor thesis, having a blast exploring, designing and implementing NLP algorithms to automatically sort, rank and categorize Russian news. Not sure whether I was more excited about computer science or at last practically using Lisp.

In 2009 I opened a Dropbox account. Albeit, I am on a free plan. I don’t have a smartphone to take photos, and my backups & in-flight projects are on a special diet, yet I use Dropbox daily.

1Password. 2010. I don’t know if there was another easy-to-use password manager back in the day. A stroke of genius is to let one synchronize secrets to a Dropbox vault. A seamless integration that kept me sane and safe for 12 years.

Finally, Apple Notes. I bought the iPod touch in 2012 as a note-taking device. Today, I have a collection of over a thousand stories I have been telling myself and others all these years. Personal notes, love letters, pet projects, angry remarks, crazy ideas, draft emails, memorable quotes. Some notes are brighter than others, some are snarky and provocative, some are poignant, so tears are shed. This is my time-tested way of keeping an online journal.

A common theme that shines as a lighthouse across these notes is I breathe and live by making things regardless of how dark and cold it could be outside. One day I was joking that the moment I stop building something new, please check my pulse.

Sadly, most of those stories are collecting electronic dust and dreaming of sheep. But is it a sense, a hint of anxiety — what is to become of me? Am I making things because I love the art of building? Am I doing it for the challenge? It sounds selfish to create and not give something back. The effect of giving is dizzying. I know it.

You may be wondering: “I thought this was an article about a site generator for Apple Notes, and while I can appreciate your story, what is it? What am I supposed to do with it?” Please, I’ll be straight with you. I am not building a fancy site generator for commercial success, and I am not doing this for fame either. This is a passion project that I started for fun. A project that began with a silly and curious “what if?” question, which evolved into “why not?”. I hope you can share the same sentiment with me as you read it.

How to draw an owl

Here’s the plan. I will write a script that programmatically opens Apple Notes, converts them to HTML pages, and uploads files to a server. Easy peasy. Certainly, there are personal preferences; for example, I would rather convert notes to HTML pages on the backend. This approach may or may not work for you, but that is my baseline.

At a high level, I need to solve three tasks:

  • Figure out how to access Apple Notes programmatically
  • Flesh out the conversion algorithm to HTML pages
  • Figure out a reasonable hosting solution

    Apple Notes under a microscope

    Part 1. Failed attempts

    So my journey begins. A trivial question, how does one programmatically access Apple Notes?

    There is no official API, and this is unlikely to change in this sweet era of privacy-first applications. Unfortunately, that also drastically narrows my options. Alright, this is not as trivial as I thought it would be.

    AppleScript could be an interesting idea to try. Go ahead, open Script Editor and run this script. It prints the most recent note in the “iCloud” account.

    tell application "Notes"
    	tell account "iCloud"
    		-- AppleScript has no bracket indexing; elements are 1-indexed
    		get body of note 1
    	end tell
    end tell

    This evening I ran a few experiments and found that the resulting HTML is inconvenient and sometimes limiting. All images are encoded as base64 strings. Some objects, like tags, are not present. I don’t know whether it is a limitation of my system or default behavior. I am running Big Sur. At this point, I am no longer confident that it is a good idea to rely on an external algorithm to generate HTML files. Who knows whether another surprise awaits me tomorrow? Certainly, I couldn’t talk to someone at Apple to clarify internals. I couldn’t open a repository to read source code or submit a patch.

    FWIW, Bear (a competing note-taking application) does provide an Automator script to help users migrate from Apple Notes, and they also mention similar limitations:

  • Task lists convert into bulleted lists
  • Rich media links will convert to plain text links
  • Non-photo attachments like PDFs or other files are not supported and will be excluded from export to HTML files. They will remain safely in Apple Notes

    What if there was something like a TX/RX pinout for serial communications that I could use to interact with Apple Notes? It does sound familiar, something I did before. At this point, I have three more ideas.

    What if I could intercept traffic to Apple servers and reverse-engineer the API? What if I built a proxy or unofficial Apple Notes SDK? I don’t think it is impossible, but I am also not particularly excited. I have seen enough rabbit holes in my life. It could be a fun project, but I have a different goal. I am building a static site generator.

    I could also reverse-engineer the iCloud web interface. It is a rich web application complementary to a native macOS client. The real question is, what API does it use? I briefly skimmed through the JavaScript files and the acknowledgements page. A few libraries caught my attention: ProtoBuf.js, ByteBuffer.js, Base64.js, and Pako.js (never heard of it, a zlib compression library). These are suspicious ones.

    This is a nice reminder of the old internet days when I could “View source” of a webpage and learn something new. These days are soon to be gone: minified JavaScript files, WASM-compiled applications, etc.

    Part 2. Notes internals

    My last idea is simple, and I can test it in an hour. If it fails, I may decide to see how deep the rabbit hole actually goes or abandon the project completely. Pretty much every application needs to persist data on a hard drive. It may or may not be encrypted. It may or may not be using a proprietary format. It may or may not be doing something else nuts.

    It turned out that Apple Notes uses a SQLite database to store notes locally. Go ahead, open ~/Library/Group Containers/group.com.apple.notes/NoteStore.sqlite and run .tables. You should see something like this:

    sqlite> .tables
    ACHANGE                ZICLOCATION            Z_PRIMARYKEY
    ATRANSACTION           ZICNOTEDATA            
    ATRANSACTIONSTRING     ZICSERVERCHANGETOKEN   
    ZICCLOUDSTATE          Z_METADATA             
    ZICCLOUDSYNCINGOBJECT  Z_MODELCACHE

    In older macOS releases, the SQLite database is named differently. For example, “Yosemite 10.10” stores data in “NotesV4.storedata”, and “Mavericks 10.9” in “NotesV2.storedata”. Can’t tell what happened to the third version. Sounds like a fun fact for a Friday trivia night.

    The database schema is not permanent; it was substantially redesigned after iOS9. Everything is different in iOS8. I was lucky to have a very old backup to test. It would be interesting to hear a migration story. I found this article on Apple’s website that adds more color. iOS11 or iOS12 introduced another change: Account and Folder objects are stored in a different table. The iOS13 release introduced a breaking change to drawings. Anyhow, if I can make sense of the schema for the latest macOS, I can support other versions too. Given enough time.

    The total number of records in the ZICNOTEDATA table roughly matches my notes. Some plaintext columns in ZICCLOUDSYNCINGOBJECT also look interesting. My working theory is that the former table stores notes, and the latter describes some other kind of objects. I am making a mental note and carrying on.

    The only column that may contain data in ZICNOTEDATA is ZDATA, which is a blob. What format does Apple use to store notes? Here is an example:

    0000000 1f 8b 08 00 00 00 00 00 00 13 e3 60 10 9a c7 c4
    0000010 c1 20 c0 20 35 9d 49 48 d1 23 35 27 27 5f 21 3c
    0000020 bf 28 27 85 8b 2b 24 23 b3 58 01 88 12 15 4a 52
    0000030 8b 4b 14 f2 f2 4b 52 f5 a4 04 b8 58 40 aa 81 ea
    0000040 c1 b4 06 23 58 84 11 28 c2 26 05 a6 35 98 a4 84
    0000050 c0 22 6c 02 4c 10 11 05 46 0d 66 a8 2a 0e 01 69
    0000060 a8 2a 16 29 31 2e 0e a0 09 ff 81 80 1f 68 1a 9c
    0000070 ad 24 c3 25 c5 25 60 fe b8 b0 c0 fc 99 63 df 9a
    0000080 9a 7f 8b 8b d4 ad b3 84 98 38 94 81 98 51 8b 8f
    0000090 83 55 08 68 b3 04 63 c6 a1 d3 1b 66 b0 01 f9 4c
    00000a0 50 fe 61 28 1f 26 7f 04 cc e7 e2 60 14 62 80 b2
    00000b0 79 38 38 84 98 80 32 47 a1 3c 36 30 ef 18 94 c7
    00000c0 0a e6 1d 87 f2 18 c1 bc 13 20 1e 00 94 6f 3b 42
    00000d0 23 01 00 00

    The first result in Google for the “1f8b 0800” signature suggests that it is a gzipped string. Can I decode it? Sure I can:

    > require 'zlib'
    > f = File.binread('helloworld.bin')
    > Zlib::Inflate.new(Zlib::MAX_WBITS + 16).inflate(f)
     => "\b\x00\x12\x9E\x02\b\x00\x10\x00\x1A\x97\x02\x12!Hello World\n\nThis is a test note.\x1A\x10\n\x04\b\x00\x10\x00\x10\x00\x1A\x04\b\x00\x10\x00(\x01\x1A\x10\n\x04\b\x01\x10\x00\x10\x06\x1A\x04\b\x01\x10\x00(\x02\x1A\x12\n\x04\b\x01\x10\x06\x10\x02\x1A\x04\b\x01\x10\x00 \x01(\x03\x1A\x10\n\x04\b\x01\x10\b\x10\e\x1A\x04\b\x01\x10\x00(\x04\x1A\x16\n\b\b\x00\x10\xFF\xFF\xFF\xFF\x0F\x10\x00\x1A\b\b\x00\x10\xFF\xFF\xFF\xFF\x0F\"\x1C\n\x1A\n\x107\xE3qp7\xE6A\x8E\xAC|\xFE\xA3r';j\x12\x02\b#\x12\x02\b\x01*\x0E\b\x05\x12\x04\b\x00\x18\x01h\xC2\xCB\xB0\x98\x06*\x0E\b\x02\x12\x04\b\x00\x18\x01h\xC3\xCB\xB0\x98\x06*\x0E\b\x05\x12\x04\b\x00\x18\x01h\xC4\xCB\xB0\x98\x06*\n\b\x01\x12\x00h\xC4\xCB\xB0\x98\x06*\f\b\b\x12\x02\x18\x01h\xC5\xCB\xB0\x98\x06*\f\b\x06\x12\x02\x18\x01h\xC6\xCB\xB0\x98\x06*\f\b\x05\x12\x02\x18\x01h\xC7\xCB\xB0\x98\x06*\f\b\x01\x12\x02\x18\x01h\xC8\xCB\xB0\x98\x06"

    At the very least, I can see my test note in plain text. I will take it as my first victory.
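The signature check and the decompression step can be folded into a pair of small helpers. This is only a sketch: `gzip?` and `inflate_blob` are hypothetical names, and the payload below is a stand-in string compressed on the spot, not a real ZDATA blob.

```ruby
require 'zlib'

# The two magic bytes that open every gzip stream (1f 8b in the hex dump).
GZIP_MAGIC = "\x1f\x8b".b

# True when the blob starts with the gzip signature.
def gzip?(blob)
  blob.b.start_with?(GZIP_MAGIC)
end

# Inflate a gzip blob; returns nil for anything that is not gzip.
def inflate_blob(blob)
  return nil unless gzip?(blob)

  Zlib.gunzip(blob)
end

# Round trip with a stand-in payload instead of a real note.
sample = Zlib.gzip("Hello World\n\nThis is a test note.")
puts inflate_blob(sample)
```

`Zlib.gunzip` expects the gzip header, which is the same thing the `MAX_WBITS + 16` window size asks `Zlib::Inflate` to accept.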

    Part 3. Protobufs

    I remembered that the iCloud web application uses zlib & protobuf libraries and decided to give protobuf-inspector a try to analyze raw data. A protobuf is a cross-platform data format to serialize structured data. It is like JSON, except it is a binary format described in a definition file, which compiles to a target language. Once it is compiled, there is no easy way to retrieve original fields and meanings.

    I think I could have disassembled Notes code, but it feels a bit too much for this project. It shouldn’t be that hard to extract at least basic attributes.

    $ protobuf_inspector < proto.bin
    root:
        1 <varint> = 0
        2 <chunk> = message:
            1 <varint> = 0
            2 <varint> = 0
            3 <chunk> = message:
                2 <chunk> = "Hello World\n\nThis is a test note."
                3 <chunk> = message:
                    1 <chunk> = message(1 <varint> = 0, 2 <varint> = 0)
                    2 <varint> = 0
                    3 <chunk> = message(1 <varint> = 0, 2 <varint> = 0)
                    5 <varint> = 1
                3 <chunk> = message:
                    1 <chunk> = message:
                        1 <varint> = 0
                        2 <varint> = 4294967295
                    2 <varint> = 0
                    3 <chunk> = message:
                        1 <varint> = 0
                        2 <varint> = 4294967295
                5 <chunk> = message:
                    1 <varint> = 5
                    2 <chunk> = message(1 <varint> = 0, 3 <varint> = 1)
                    13 <varint> = 1661740482
                5 <chunk> = message:
                    1 <varint> = 2
                    2 <chunk> = message(1 <varint> = 0, 3 <varint> = 1)
                    13 <varint> = 1661740483
                5 <chunk> = message:
                    1 <varint> = 5
                    2 <chunk> = message(1 <varint> = 0, 3 <varint> = 1)
                    13 <varint> = 1661740484

    (this is not a complete message, just for illustration purposes)

    This is a very important bit. It confirms that the ZDATA column stores data as zlib-compressed protobufs. It is a relief that Notes doesn’t use a proprietary format, which would have taken me weeks to reverse-engineer. This data structure encodes rich content. If I figure out the meaning of these fields, I will be able to convert a note to an HTML file.

    A field in a protobuf is just a binary structure that represents something. Something that makes sense for Apple Notes, but not for me. However, something rings a bell — I have seen this before. I am looking at you, blocks that start with “5 <chunk> = message”. This section looks suspiciously similar to a length-delimited fields data structure. Right now, I am not sure why there are more chunks than in the original message. I created a simple note saying “Hello World\n\nThis is a test note”. It is like a note is broken down into multiple blocks. Undos, history? Anyway, it is a matter of time to figure this out, so I added it to the “look at this later” list and moved on.
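To make the inspector output less magical, here is a minimal sketch of how a schema-less reader can walk the top-level fields of a protobuf message. It follows the public protobuf wire format (a varint key holding the field number and wire type, then a varint or a length-prefixed chunk). The method names are hypothetical, and only the two wire types that appear in the output above are handled.

```ruby
# Read one base-128 varint from bytes starting at pos.
# Returns [value, next_pos].
def read_varint(bytes, pos)
  value = 0
  shift = 0
  loop do
    byte = bytes.getbyte(pos)
    raise ArgumentError, 'truncated varint' if byte.nil?

    pos += 1
    value |= (byte & 0x7f) << shift
    return [value, pos] if byte < 0x80

    shift += 7
  end
end

# Walk the top-level fields of a message without a schema.
# Handles wire type 0 (varint) and wire type 2 (length-delimited chunk).
def walk_fields(bytes)
  bytes = bytes.b
  pos = 0
  fields = []
  while pos < bytes.bytesize
    key, pos = read_varint(bytes, pos)
    number = key >> 3
    wire_type = key & 0x7
    case wire_type
    when 0
      value, pos = read_varint(bytes, pos)
    when 2
      length, pos = read_varint(bytes, pos)
      value = bytes.byteslice(pos, length)
      pos += length
    else
      raise ArgumentError, "unsupported wire type #{wire_type}"
    end
    fields << [number, wire_type, value]
  end
  fields
end

# Field 1 is a varint 0, field 2 is a 5-byte chunk.
p walk_fields("\x08\x00\x12\x05hello") # => [[1, 0, 0], [2, 2, "hello"]]
```

Feeding it the bytes `h\xC2\xCB\xB0\x98\x06` from the inflated dump yields field 13 with the value 1661740482, the same number the inspector prints. A chunk may itself be a nested message, so a fuller tool would call `walk_fields` recursively on chunks that parse cleanly.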

    It is time to talk about the ZICCLOUDSYNCINGOBJECT table. It stores embeddable attachments, something that is too large or complex to represent in the ZICNOTEDATA table. For example, this query returns 5 attachments like tables, images and a URL:

    sqlite> SELECT a.ZIDENTIFIER, a.ZTYPEUTI, b.ZIDENTIFIER, b.ZFILENAME, a.ZURLSTRING, a.ZTITLE
       ...> FROM ZICCLOUDSYNCINGOBJECT a
       ...> LEFT JOIN ZICCLOUDSYNCINGOBJECT b on a.ZMEDIA = b.Z_PK
       ...> WHERE a.ZCRYPTOTAG IS NULL AND a.ZTYPEUTI IS NOT NULL
       ...> LIMIT 5 OFFSET 45;
    C5AFE612-364D-4C13-8911-BB7762F3F245|com.apple.notes.table||||
    761CB35C-9473-4701-A316-68E40DE11B75|public.png|AAAD6540-977E-4DDC-866A-7437E667391C|image.png||New file
    8777A83F-98B5-4C58-8099-AFB0D7B89DCE|com.apple.notes.table||||
    1A8B8E06-42B5-475F-8715-523827576486|public.jpeg|F669B83A-0C47-408E-9645-015737F3B11F|Pasted Graphic.jpg||Pasted Graphic.jpg
    6A6BE95D-6CE5-4B02-BC6B-83BE9BFAAA70|public.url|||https://www.apple.com/macbook-air-m2/|MacBook Air with M2 chip
    

    The most important column is ZMERGEABLEDATA, which stores actual data. It is a blob (or a zlib-compressed protobuf, to be precise), and I am not surprised. I know how to parse it already.

    I started looking at these objects, and I realized that “tables” use very complex protobufs. Why is that? What did I miss? I am pretending to wear an Apple engineer’s hat to reason about that, but my brain doesn’t come up with an answer.

    Part 4. Complete protobufs

    Suddenly, I felt a sparkle. Wait. I think I already mentioned something about protobufs a few paragraphs above? Let’s rewind. Oh boy, it is clear! This is so enlightening! I can’t wait to tell you. It is like I was reading a cookbook in a language I don’t understand and trying to write down a recipe by looking at illustrations. At last, I found a translation to the language I can speak natively.

    The iCloud interface uses protobuf and zlib libraries, right? They must be using the same protocol to broadcast changes to all platforms. If I am lucky, they might have included something interesting in the compressed JavaScript files. Anyhow, let’s see if there is anything useful. Of course, there is.

    This command gets me complete protobufs.

    $ uglifyjs -b <=(curl 'https://www.icloud.com/applications/notes3/2221Project27/en-us/main.js') | grep '"proto2"'

    I am not showing the actual output of the command, but it prints protobufs in cleartext. I have to say that it was very kind of Apple not to delete comments. For example, this line “// Not used, for future compatibility” is helpful. Thank you! The only question I have is why do they keep protobufs in plaintext?! It is possible to compile definitions to JavaScript classes. Certainly, that would have made this project more difficult to pull together. It would have taken me much longer to reverse-engineer protobufs.

    I will explain why I felt so excited. A complete protobuf allows me to recreate internal data structures that Apple Notes uses to store data. One-to-one mapping. A data structure gives more than a meaning, it helps me to reason about algorithms and other implementation details.

    This is my second victory. I can decode messages now. Let’s give it a spin:

    > Topotext::NoteStoreProto.decode(data)
     =>
    <Topotext::NoteStoreProto: document: <Topotext::Document: version: 0, note: <Topotext::String: string: "Hello World
    
    This is a test note.", substring: [<Topotext::Substring: charID: <Topotext::CharID: replicaID: 0, clock: 0>, length: 0, timestamp: <Topotext::CharID: replicaID: 0, clock: 0>, child: [1]>, <Topotext::Substring: charID: <Topotext::CharID: replicaID: 1, clock: 0>, length: 6, timestamp: <Topotext::CharID: replicaID: 1, clock: 0>, child: [2]>, <Topotext::Substring: charID: <Topotext::CharID: replicaID: 0, clock: 4294967295>, length: 0, timestamp: <Topotext::CharID: replicaID: 0, clock: 4294967295>, child: []>], timestamp: <Topotext::VectorTimestamp: clock: [<Topotext::VectorTimestamp::Clock: replicaUUID: "7�qp7�A��|��r';j", replicaClock: [<Topotext::VectorTimestamp::Clock::ReplicaClock: clock: 35>, <Topotext::VectorTimestamp::Clock::ReplicaClock: clock: 1>]>]>, attributeRun: [<Topotext::AttributeRun: length: 5, paragraphStyle: <Topotext::ParagraphStyle: style: 0, writingDirection: :LeftToRight>, timestamp: 1661740482>, <Topotext::AttributeRun: length: 2, paragraphStyle: <Topotext::ParagraphStyle: style: 0, writingDirection: :LeftToRight>, timestamp: 1661740483>, <Topotext::AttributeRun: length: 5, paragraphStyle: <Topotext::ParagraphStyle: style: 0, writingDirection: :LeftToRight>, timestamp: 1661740484>, <Topotext::AttributeRun: length: 1, paragraphStyle: <Topotext::ParagraphStyle: >, timestamp: 1661740484>], attachment: []>>>

    (not a complete output, just for illustration purposes)
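The decoded message pairs one note string with a list of attribute runs, each carrying a length. A rough sketch of how such runs could be used to cut the text into styled pieces follows. It assumes that runs cover the text consecutively and that lengths count characters, neither of which is confirmed here. `Run` is a hypothetical stand-in for a decoded `Topotext::AttributeRun`, and the run lengths are made up for the example.

```ruby
# Hypothetical stand-in for a decoded attribute run.
Run = Struct.new(:length, :style)

# Slice the note text into [substring, style] pairs, one per run.
# Assumes runs are consecutive and lengths are measured in characters.
def split_by_runs(text, runs)
  offset = 0
  runs.map do |run|
    piece = text[offset, run.length].to_s
    offset += run.length
    [piece, run.style]
  end
end

text = "Hello World\n\nThis is a test note."
# Made-up runs: a title paragraph (style 0), then body text (no style).
runs = [Run.new(13, 0), Run.new(20, nil)]

split_by_runs(text, runs).each { |piece, style| p [style, piece] }
```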

    It is time to get my hands dirty and find out how to work with complex objects in ZICCLOUDSYNCINGOBJECT such as tables, drawings, images and others. In fact, let’s call those objects “attachments” from now on.

    Part 4. CRDT

    After a few nights, I understood what a “table” is. I found a hint: some type names include references to “com.apple.CRDT.CRTree”. A reasonable guess is that tables must maintain CRDT properties and preserve the semantics of adding, removing, and ordering columns and rows in the face of conflicts. What if a user deletes a column and another one makes a change to a cell at the same time? What if two users try to rearrange the same column? How should these operations be resolved? Great questions! If you know someone who ever tried to figure out how Apple’s CRDT works, I would love to chat! Fortunately, I don’t need to answer these questions right now; I already know enough about the data structures to come up with an algorithm and parse table objects.

    If you haven’t heard about CRDT, let me explain. CRDT is a data structure that could be replicated across multiple clients without a central authority. Here’s a simple illustration of how it works. Say, we have three players: A, B, and C. Player A starts with a number 1, B — 2, and C — 3. Each player wants to sum up these numbers. CRDT broadcasts numbers and each player will eventually come up with the same result, which is 6. Player A starts by sending number 1 to player B and C, and other players do the same. It does not matter in which order a player receives a number, the total will always be the same. It happens concurrently, independently, and even if players may have different states at any particular time, they are guaranteed to eventually converge. Congratulations! We just created a very simplistic CRDT algorithm with a single operation.
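The three-player example can be written down in a few lines. This is a toy sketch of a state-based CRDT, not Apple's algorithm: the `SumReplica` class and its methods are invented for illustration. Each player remembers which number came from which player, and merging is a plain union of that knowledge.

```ruby
# Each player keeps a map of player id => contributed number.
class SumReplica
  attr_reader :seen

  def initialize(id, number)
    @seen = { id => number }
  end

  # Merging is a union of what both sides know, so it is commutative,
  # associative and idempotent: order and duplicates do not matter.
  def merge(other)
    @seen.merge!(other.seen)
    self
  end

  def total
    @seen.values.sum
  end
end

a = SumReplica.new(:a, 1)
b = SumReplica.new(:b, 2)
c = SumReplica.new(:c, 3)

# Deliver state to every player in a different order.
a.merge(b).merge(c)
b.merge(c).merge(a)
c.merge(a).merge(b).merge(a) # a duplicate delivery is harmless

p [a.total, b.total, c.total] # => [6, 6, 6]
```

Keying by player id is what makes a repeated message safe. Adding raw numbers to a running total would double count on redelivery.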

    This is cool because this is a product story. If I read the iOS13 changelog right, it is the first iOS version that introduced collaboration features. CRDT was a pragmatic decision to implement them. This is why they rearranged columns and introduced breaking changes that required users to migrate data. How cool is that?

    Anyway, a table is an object that consists of an ordered set of row ids, an ordered set of column ids, and cell columns identified by column id and row id. There are a few other gotchas; however, I will explain them in the reference document.
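Once the CRDT bookkeeping is stripped away, that description maps onto a very small in-memory shape. The sketch below is an assumption about how a parsed table might be held and rendered. The `Table` struct, the id values, and the `[column_id, row_id]` cell key are all hypothetical.

```ruby
# Hypothetical parsed table: ordered row ids, ordered column ids,
# and cells keyed by [column_id, row_id].
Table = Struct.new(:row_ids, :column_ids, :cells)

HTML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

def escape_html(text)
  text.gsub(/[&<>"]/, HTML_ESCAPES)
end

# Render rows and columns in their stored order; missing cells stay empty.
def table_to_html(table)
  rows = table.row_ids.map do |row_id|
    tds = table.column_ids.map do |column_id|
      "<td>#{escape_html(table.cells.fetch([column_id, row_id], ''))}</td>"
    end
    "<tr>#{tds.join}</tr>"
  end
  "<table>#{rows.join}</table>"
end

table = Table.new(
  %w[r1 r2],
  %w[c1 c2],
  { %w[c1 r1] => 'Name', %w[c2 r1] => 'Size', %w[c1 r2] => 'a & b' }
)
puts table_to_html(table)
```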

    For rich text, they use something homegrown called “topotext”. I think Apple Notes maintains two algorithms to replicate data: one for topotext and the other one for attachments. Topotext is the older data structure, and the data structure for attachments feels more like a proper CRDT.

    Protobufs from the web interface don’t include drawings, but each object provides a fallback image. As long as drawings are read-only, the web interface doesn’t need to merge changes on-the-fly and can rely on a pre-processed image. At this point, I don’t think it is worth pursuing complete protobuf definitions as what I have already is sufficient to access everything.

    It is not worth explaining how objects like images or hashtags are structured. Trust me, it is very simple. I will prepare another writeup as a reference document later. Also, there are some obscure features like password-protected notes that I don’t use — I won’t be bothering to reverse-engineer them. If you use them, I would be delighted to learn more!

    Great, at this point I can export notes to in-memory objects. My third victory.

    There is something important I have to tell you. SQLite database stores lots of data, and I am not talking about personal notes. For example, if a note is shared with someone, SQLite keeps a record, including an email address and possibly a phone number. Also it is funny that Apple Notes recognizes uploaded images and stores keywords in a database. Keywords that Notes uses to search them. I didn’t know that.

    Part 5. Once internals are cracked

    Once I learned how objects are represented in the SQLite database, I had pretty much solved another task: generating HTML documents.

    After a bunch of tests, I ended up with these paragraph styles for topotext objects:

    PARAGRAPH_STYLE_TITLE = 0
    PARAGRAPH_STYLE_HEADING = 1
    PARAGRAPH_STYLE_SUBHEADING = 2
    PARAGRAPH_STYLE_MONOSPACED = 4
    PARAGRAPH_STYLE_BULLETED_LIST = 100
    PARAGRAPH_STYLE_DASHED_LIST = 101
    PARAGRAPH_STYLE_NUMBERED_LIST = 102
    PARAGRAPH_STYLE_TODO_LIST = 103
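A lookup table is one simple way to turn these numbers into markup. The tag choices below are a guess at a sensible mapping, not the mapping used by this site, and `STYLE_TO_TAG`, `tag_for`, and `wrap_paragraph` are hypothetical names. Text is assumed to be HTML-escaped already, and list items would still need an enclosing `<ul>` or `<ol>`.

```ruby
# Paragraph style number => HTML tag (assumed mapping).
STYLE_TO_TAG = {
  0 => 'h1',   # title
  1 => 'h2',   # heading
  2 => 'h3',   # subheading
  4 => 'pre',  # monospaced
  100 => 'li', # bulleted list
  101 => 'li', # dashed list
  102 => 'li', # numbered list
  103 => 'li'  # todo list
}.freeze

# Anything without a known style is treated as a plain paragraph.
def tag_for(style)
  STYLE_TO_TAG.fetch(style, 'p')
end

def wrap_paragraph(text, style)
  tag = tag_for(style)
  "<#{tag}>#{text}</#{tag}>"
end

puts wrap_paragraph('Hello World', 0)         # => <h1>Hello World</h1>
puts wrap_paragraph('This is a test note.', nil) # => <p>This is a test note.</p>
```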

    Font styles are identified by these attributes:

  • fontHints: 1 - bold, 2 - italic
  • underline: a boolean flag
  • strikethrough: a boolean flag
  • superscript: a boolean flag

    As a side note, I think those styles could have been implemented in a single column as a binary mask, but it is what it is. At least in my algorithm I can create a synthetic mask to simplify the logic, i.e. a topotext with style 15 will represent a bold, italic, underlined, and struck text. Nested lists are funky, but I will describe them in the next post.

    The final implementation that parses topotext objects is about 100 lines. Attachments are implemented in their own classes with a few getters and generate_html method. For a curious reader, here’s a list of attachments that it can export:

  • office/photo/audio/video files
  • drawings/sketches
  • thumbnails
  • galleries (scanned documents)
  • mentions and hashtags
  • urls
  • vcards
  • tables
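Going back to the font styles for a moment, the synthetic mask idea can be sketched as follows. The bit values for bold and italic come from fontHints. The bits for underline, strikethrough, and superscript are assumptions, chosen so that 15 means bold, italic, underlined, and struck. All names are hypothetical.

```ruby
BOLD = 1
ITALIC = 2
UNDERLINE = 4      # assumed bit
STRIKETHROUGH = 8  # assumed bit
SUPERSCRIPT = 16   # assumed bit

# Fold the separate attributes into one integer mask.
def style_mask(font_hints: 0, underline: false, strikethrough: false, superscript: false)
  mask = font_hints & (BOLD | ITALIC)
  mask |= UNDERLINE if underline
  mask |= STRIKETHROUGH if strikethrough
  mask |= SUPERSCRIPT if superscript
  mask
end

MASK_TAGS = {
  BOLD => 'b', ITALIC => 'i', UNDERLINE => 'u',
  STRIKETHROUGH => 's', SUPERSCRIPT => 'sup'
}.freeze

# Wrap text in one tag per bit that is set, innermost tag first.
def wrap_with_mask(text, mask)
  MASK_TAGS.reduce(text) do |html, (bit, tag)|
    (mask & bit).zero? ? html : "<#{tag}>#{html}</#{tag}>"
  end
end

mask = style_mask(font_hints: 3, underline: true, strikethrough: true)
puts mask                         # => 15
puts wrap_with_mask('text', mask) # => <s><u><i><b>text</b></i></u></s>
```

With a single integer, two adjacent runs can be compared with `==` to decide whether they can share one span of markup.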

    So far, I have been querying the SQLite database with manually crafted SQL queries. Now, it is time to make it more interesting. A couple of lifetimes ago, I needed to implement a data abstraction layer for a legacy database. Everything that you could expect and even worse like no standardization across table names, odd primary/foreign keys. It was a mess.

    I decided to take Sequel again (an ORM layer for Ruby applications) to see how far I can go. What a breath of fresh air. It is great to see in one place which columns and relationships I need to define to make everything work. Once models are ready, I can query Apple Notes from a Ruby script. Look at this beauty! This script searches a folder in the first account for an image of an animal that is bigger than 700px.

    > User::Account.server(database: "tmp/test.sqlite").first.
    > folders_dataset.offset(1).first.
    > notes.
    > map(&:attachments).flatten.
    > find { |el| el.is_a?(User::ObjectTypes::EmbeddedPublicJpeg) &&
    >   el.height > 700 &&
    >   el.summary[/animal/i] }.
    > media_filepath
     => "Accounts/<uuid>/Media/F669B83A-0C47-408E-9645-015737F3B11F/Pasted Graphic.jpg"

    I think that Sequel models are easy to follow. Certainly, they don’t work for iOS9 or iOS11 SQLite databases, but you can see that it won’t be hard to version them to support other Notes releases. I parse protobufs in Note#attachments and Note#generate_html methods.

    class Account < Sequel::Model(UserDatabase[:ZICCLOUDSYNCINGOBJECT])
      one_to_many :folders, key: :ZOWNER
      one_to_many :notes, key: :ZACCOUNT3
    
      dataset_module do
        def default_scope
          where(Sequel.~(ZNAME: nil))
        end
      end
    
      set_dataset(default_scope)
    end
    
    class Folder < Sequel::Model(UserDatabase[:ZICCLOUDSYNCINGOBJECT])
      many_to_one :account, key: :ZOWNER
      one_to_many :notes, key: :ZFOLDER
      one_to_many :folders, key: :ZPARENT
    
      dataset_module do
        def default_scope
          where(Sequel.~(ZTITLE2: nil))
        end
      end
    
      set_dataset(default_scope)
    end
    
    class Note < Sequel::Model(UserDatabase[:ZICNOTEDATA])
      many_to_one :account, key: :ZACCOUNT3
      many_to_one :folder, key: :ZFOLDER
      # one_to_many :attachments  # <- no SQL, I parse AttributeRuns from a protobuf
    
      dataset_module do
        def default_scope
          join(:ZICCLOUDSYNCINGOBJECT, Z_PK: :ZNOTE).where(Sequel.~(ZDATA: nil))
        end
      end
    
      set_dataset(default_scope)
    
      def generate_html
        # parse protobufs in ZICNOTEDATA.ZDATA - rich text
        # parse protobufs in ZICCLOUDSYNCINGOBJECT.ZMERGEABLEDATA - media files, attachments
      end
    end

    Notes as a website

    At this point I can convert notes to individual HTML documents. However, I need a website. A preliminary list of gentlemen’s features that I am looking for: smart navigation, clean URLs, RSS feeds, custom themes, and the generator must be blazing fast.

    I am not reinventing the wheel. There are enough static site generators already that check all these boxes. I decided to drop Hugo on top as a nice cherry. Of course, it is a standard Hugo, batteries included. Whatever theme, templates, or configuration I want to use. The world is my oyster. For what it is worth, I use Hugo for all my static websites already.

    Locally I run a tiny application that I called notes.agent. It listens to events in the SQLite database, and whenever a change happens, it sends a diff to the server.

    I don’t replicate the entire SQLite database. I sync only those changes that are made to published folders. I explicitly filter out sensitive data like CKShareParticipant records (shared notes could potentially include PII). I don’t want it, I don’t need it. Also, some notes could be password-protected, and I filter them out too. If someone wrote a password-protected note, I guess they don’t want the rest of the world to read it. The diffing algorithm didn’t make it into this post; I will write another post about it.
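The filtering rules in this paragraph boil down to three checks. The sketch below is a guess at their shape. The record hashes and their keys (`:type`, `:folder`, `:password_protected`) are hypothetical, and real rows would come from the SQLite tables with different column names.

```ruby
# Decide whether a changed record may leave the machine.
# Record keys are hypothetical stand-ins for real columns.
def publishable?(record, published_folders)
  return false if record[:type] == 'CKShareParticipant' # may carry PII
  return false if record[:password_protected]           # author wants it private

  published_folders.include?(record[:folder])
end

# Keep only the changes that belong on the website.
def filter_changes(records, published_folders)
  records.select { |record| publishable?(record, published_folders) }
end

changes = [
  { id: 1, type: 'Note', folder: 'Blog', password_protected: false },
  { id: 2, type: 'Note', folder: 'Private', password_protected: false },
  { id: 3, type: 'Note', folder: 'Blog', password_protected: true },
  { id: 4, type: 'CKShareParticipant', folder: 'Blog', password_protected: false }
]

p filter_changes(changes, ['Blog']).map { |record| record[:id] } # => [1]
```

The checks are written as an allow list: a record is dropped unless it sits in a published folder, so a new record type is not published by accident.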

    Sign in with Apple powers authentication. Whenever a client sends data to the server, the request is authenticated. I don’t need an email or personal information. The agent application requests access_code and jwt and sends them to the server. On the backend, I check that both are valid and that the jwt matches the identity. If Apple’s server says OK, this is a valid user.

    I thought it would be fun if each user gets an individual database per website. A multi-tenant system. This is an additional security layer that guarantees that my notes will never be published to another website. If I need to delete a website, I just delete a SQLite file. As a nice bonus, I can also version changes and create incremental backups per user.

    Hosting

    This chapter is tiny. I thought that, for the sake of completeness, it is worth at least writing down the current approach.

    All websites are static files that are uploaded to partitioned folders like var/a/t/attractive-grey-snail. Nginx as a reverse proxy serves websites based on SERVER_NAME environment variable. I don’t use CDN. I am not too worried about DMCA takedown requests — I am the only user. However, if things change, I will take action.
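Judging only by the example path, the partitioning looks like it uses the first two characters of the site name as nested directories. That is an inference from a single example, not a confirmed rule. A minimal sketch under that assumption, with a hypothetical `site_path` helper:

```ruby
# Build a partitioned directory for a site name, e.g. var/a/t/<name>.
# Assumes names are lowercase letters, digits and dashes.
def site_path(name, root: 'var')
  unless name.match?(/\A[a-z0-9][a-z0-9-]+\z/)
    raise ArgumentError, "unexpected site name: #{name.inspect}"
  end

  File.join(root, name[0], name[1], name)
end

puts site_path('attractive-grey-snail') # => var/a/t/attractive-grey-snail
```

Validating the name before building the path also keeps things like `../` out of the directory layout.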

    Relevant projects

    At the finishing line I decided to google whether someone else went through the same journey as I did. A correctness check by looking at math answers on the last page. Here is a list of notable projects that I found:

  • notesutils — I think this is the first library that could successfully export notes as HTML files. It uses a drop-in protobuf parser, which I think is cool. I particularly liked the HTML generation algorithm. It is concise and much cleaner than my implementation
  • mac_apt — this is a really juicy tool that could process Mac full disk images and extract data from Apple Notes, Safari, and lots of other applications. It doesn’t use protobufs, and so I doubt it can export rich text data; however, it also gave me a few hints that I missed. For instance, it implements the algorithm to decrypt password protected notes
  • apple_cloud_notes_parser — this tool uses custom protobuf definitions to extract rich text and attachments from Notes. There are also a few interesting gotchas about older macOS releases, which is very neat. It runs raw SQL queries and doesn’t use ORM.

    What is next?

    I created a gorgeous hack for Apple Notes. A hack that could help me not to use SFTP, a git repository, and a publishing platform to maintain a website. I will migrate my static sites to Notes to prove that the parsing algorithm is sound and doesn’t break on any edge cases.

    I know that certain things must be improved. This is a collection of my “oh, shit” moments that didn’t make it into this post:

  • This website says that it was published in August, but that doesn’t make any sense: I published it on September 12. The note holding my first draft was created in August, and the date comes from its metadata.
  • No syntax highlighting in code blocks. It may or may not be possible to customize Hugo to auto-detect styles, but I think that there are other situations where metadata per paragraph could be handy.
  • I need to write down a rich text parsing algorithm especially for nested and todo lists
  • A diffing algorithm needs extra attention. This is a custom replication solution which must be bullet-proof. It filters out sensitive data.

    But what is coming next after that? I had to create this reserved space where I could pour my crazy ideas.

    Wouldn’t it be nuts if Apple Notes could deliver RSS feeds? I publish a website with Notes and could also receive new posts from friends in Notes too (perhaps, a smart folder?)

    Wouldn’t it be crazy if Apple Notes could version changes and have unlimited undo?

    Wouldn’t it be fun to have an Apple Notes flavored markup language? A note could become a photo gallery, a survey, a documentation website, or a digital garden! Hear me out, it could be your very own Pinterest, Tumblr, or Flickr.

    Wouldn’t it be sick if I could add Stripe integration and sell my 3D-prints and hardware junk I collected over these years directly from Notes?

    Or not.

    Anyhow, if you would like to say “Hi”, here is my email: hi@notespub.com

    Last but not least, I work for Census, and I admire it. If you like helping data teams power their operations and their organizations, I think you should join us — we are hiring!

    Thanks to Boris Mizhen, Boris Jabes, and Brad Buda for reading drafts of this.

    P.S. What makes Apple Notes special?

    Apple Notes is a batteries-included note-taking application. Easy to get started, but expensive to get out. There is no bulk export feature. PDF export is simply inconvenient. Hear me out, Apple produced over a billion devices. Notes is preinstalled on every single one. It is the only preinstalled application with a hard lock-in. For me it was not worth the hassle to use other applications like Bear, SimpleNote or Evernote as my digital garden would be jeopardized instantly. Rich text notes will be lost, attachments will be lost. This article in Wikipedia sums it up nicely.

    What can I say? Not anymore.

    P.P.S. Is it open source?

    I have to admit that the source code is in an embarrassing state. Besides this (also embarrassing) excuse, this is a toy project. At this stage it is an experiment that I enjoy tinkering with for fun. It might or might not become more widely used, but I see more cons than pros in releasing it on GitHub right now. If you think I am wrong, I guess, let’s chat? It should not be that hard for a software developer to follow the steps in the article and get everything up and running in a day. I already explained how to approach the difficult parts. Something is not clear? I will prepare another writeup. That said, I might change my mind and open source it sooner than I think.