so

Not Evernote

Volume 1, Issue 7; 08 Mar 2017

It’s really useful to have a place to store all the electronic bits and bobs that cross your desk. Useful enough, in my case at least, to roll my own.

Are you not the future of all the memories stored within you? The future of the past?

Paul Valéry

Evernote was the first “note taking, organizing, and archiving” application that really worked for me. It got me into the habit of downloading, scanning, or otherwise saving all manner of documents. (No, really, I mean all manner of things; I have a folder called Ephemera with scans of ticket stubs in it.)

It was a great app. It’s probably still a great app. But over time, I drifted away from it. There’s no smoking gun. I was (am still, it occurs to me) a premium user; I didn’t begrudge them the price changes: ya gotta make payroll. They invested in features like “Work Chat” that I didn’t care about at all, and pushed them hard, but not (quite) to the point where I’d walk away because of them.

If I had to pick a single issue that was to blame for my departure, I’d say it was the lack of a Linux client. Yeah, I get it: small market share, users notoriously unwilling to pay for software, etc. That some suits got together and decided that investment in a Linux client could never be recouped is what it is. But not having one dramatically increased the friction I experienced trying to use the app. Oh, and the web client really sucks. Yeah, maybe more than anything else, that’s the problem.

Doesn’t matter. I want to interrupt my gripe session with a really important shout-out to the Evernote team: they published an API that let me access all of my data. I consider the ability to get my data out a non-negotiable feature of a platform where I’m going to store information.

One last observation. Evernote can read all the data stored in it. I get why that has to be the case: they’re doing all the heavy lifting of analysis and indexing that lets me search it. But it means anything I store in it is at risk. So I never uploaded a whole range of documents that I would ideally like to have in my note system: tax returns, medical documents, bank records, credit card statements, etc.

Epiphany · I was unhappily using Evernote for a long time, always vaguely irritated by the fact that whatever Evernote is doing, MarkLogic could almost certainly do it at least as well, if not better. Well. Except that Evernote does all that magic OCR stuff that lets you search for text in the mobile phone images, screen captures, and the like that you’ve uploaded.

The epiphany I had was that I don’t actually care about the absolute quality of any OCRed text. I don’t plan to print or display or use the OCRed text in any interesting way except search. So, if I take an image and OCR it with something like Tesseract and the resulting text is close enough so that a MarkLogic search (with all the stemming and other magic it has) does “well enough”, I’m golden. And, in fact, I am.
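For the curious, here is a minimal sketch of the OCR step, assuming the Tesseract command-line tool is installed and on the PATH. It is not the code behind the system described here; the function names and the file name are made up for illustration. It relies only on Tesseract's documented convention that an output base of `stdout` prints the recognized text instead of writing a file.

```python
import shutil
import subprocess


def tesseract_command(image_path, lang="eng"):
    """Build the argument list for a Tesseract run that prints text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Return the OCRed text of an image, or None if Tesseract is not installed."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output as text
        check=True,           # raise CalledProcessError if Tesseract fails
    )
    return result.stdout


# Show the command that would be run for a (hypothetical) scanned ticket stub.
print(tesseract_command("ticket-stub.png"))
```

The text that comes back only has to be good enough to index; nothing downstream ever displays it.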

Let me be quick to point out that Evernote does better. If you look at the searchable OCR data that Evernote is working with (you can get it through their API), you’ll find that they’ve augmented it significantly. They seem to take each OCRed token and compute a bunch of permutations on it that they also store in the file. This increases the chance that they’ll store the word you’re searching for even if the OCR was off by a little bit. Clever stuff that I’ve made no effort to reimplement.
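To make the idea concrete, here is one way token variants could be generated. This is a guess at the general technique, not Evernote's actual algorithm: the confusion table below is invented for illustration and covers only a few classic OCR mix-ups.

```python
from itertools import product

# Hypothetical table of characters that OCR engines commonly confuse.
# Each entry starts with the character itself, so the original token
# is always the first variant produced.
CONFUSIONS = {
    "0": "0o",
    "o": "o0",
    "1": "1li",
    "l": "l1i",
    "i": "i1l",
    "5": "5s",
    "s": "s5",
}


def variants(token, limit=64):
    """Return the lowercased token plus plausible alternative readings of it."""
    choices = [CONFUSIONS.get(ch, ch) for ch in token.lower()]
    out = []
    for combo in product(*choices):
        out.append("".join(combo))
        if len(out) >= limit:  # cap the combinatorial blow-up on long tokens
            break
    return out


print(variants("t0ken"))  # ['t0ken', 'token']
```

Indexing every variant alongside the raw token means a search for "token" still finds a document where the OCR read "t0ken".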

Yeah, and?

And I have a little system that lets you upload text, HTML, PDF, binary, or any of several flavors of Markdown. It stores these in an envelope document, with some metadata. If you upload some binary format, it runs OCR over it and stores the result in the envelope as well.
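As a rough illustration of what such an envelope might look like, here is a sketch using Python's standard library. The element names (`envelope`, `metadata`, `ocr`, and so on) and the choice of metadata fields are assumptions made for this example; the post doesn't spell out the real structure.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def make_envelope(filename, media_type, folder=None, tags=(), ocr_text=None):
    """Wrap metadata (and any OCR text) for one upload in an XML envelope."""
    envelope = ET.Element("envelope")

    meta = ET.SubElement(envelope, "metadata")
    ET.SubElement(meta, "filename").text = filename
    ET.SubElement(meta, "media-type").text = media_type
    ET.SubElement(meta, "uploaded").text = datetime.now(timezone.utc).isoformat()
    if folder is not None:
        ET.SubElement(meta, "folder").text = folder
    for tag in tags:
        ET.SubElement(meta, "tag").text = tag

    # Only binary uploads get an OCR section; its text exists purely for search.
    if ocr_text is not None:
        ET.SubElement(envelope, "ocr").text = ocr_text

    return envelope


env = make_envelope("ticket-stub.png", "image/png",
                    folder="Ephemera", tags=["concert"], ocr_text="ADMIT 0NE")
print(ET.tostring(env, encoding="unicode"))
```

Keeping the OCR text inside the same document as the metadata means a single search can match on either.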

A little XQuery and some JavaScript later and I have a workable replacement for Evernote. It does just a core set of features I care about: store and display (searchable) documents, put them in folders, associate tags, and associate GPS coordinates with them. I even repurposed my weblog SimpleMDE editing framework to allow me to edit text notes in a browser.

They’re all stored on a system that I control, so I don’t mind uploading everything to it. (You can argue that I’m a moron and Evernote is more secure, but I can counter-argue that Evernote is a way bigger target.)

When MarkLogic 9 rolls out, I’ll add encryption at rest and feel really pretty confident that it’s safe.
