Transcript
Hello, and welcome to the Linux Lemming.
I’m your host, Rasta Calavera, and here we are now at Episode 5.
I know it’s been a while since a release has come out, so we’re going to start today
by talking about where my episodes have been.
So my release schedule was recently disrupted due to illness.
Luckily, I’ve been healthy, but members of my family and the academic institutions that
they attend were impacted by the COVID pandemic.
Thankfully, nobody in my family contracted COVID, but we had to practice quarantine procedure
to keep ourselves and our communities safe.
So this took time away from the project, and something like this may occur again in the
future.
So it’s nice that I don’t have a committed release schedule, but again, as time allows,
these episodes will continue.
So moving on to our next topic, what is the status with documentation?
So very recently, I have been banging my head against the wall, trying to get Wallabag
to run on a Raspberry Pi 4.
If you’re not familiar with the Wallabag project, it is an internet archiving and bookmarking type
piece of software, similar to other read-it-later services like Pocket or Instapaper, and Feedly
to some extent.
Some of the people in the Wallabag community are very nurturing, and I’m very thankful
to have them around.
One big issue that I ran into was that there is no ARM image for the Wallabag project.
They don’t offer anything for the ARM platform by default.
So you have to look for community Docker images or even build your own Docker image.
I went down the route of trying to find something on Docker Hub, and I found a workable image.
One key thing to note is that the latest tag in Docker Hub is not actually the latest version
of the project.
I’ve always heard about this being an issue in the past and how you should always
use the version tag that you want to run, but I’ve never had a problem using the latest tag,
and I think that is mainly because I stick to LinuxServer.io images whenever possible.
But with this image, I had to use the 2.4.2 tag if I wanted the latest and greatest version
of Wallabag, which I certainly did, because once I did get the project up and running,
there was a feature that I desperately wanted. When I imported an OPML file, which
is like all of my RSS feeds, everything came in as unread. I have read those and want
to keep those, and there was no mass select option in the older version of Wallabag.
Actually that experimental feature just came out about a month ago or so.
So I desperately wanted that new image.
When I was having problems getting it up and running, I went out into the community,
and my biggest frustration is documented on Reddit in the self-hosted subreddit,
if anybody out there wants to see that whole crusade for aid. A user did provide me a working
compose file. I had submitted what I was currently using, and it had undergone some
review and some tweaks, and that really helped me troubleshoot.
And I’m just very appreciative, and it’s always nice when someone takes that time and
that extra care on an asynchronous platform like Reddit or in a forum, and it brings
me to the reflection of really what I was doing wrong.
So this is an instance where the problem existed between the keyboard and the chair.
Part of it was my own ignorance in not reading very closely the documentation and having
more confidence in my own abilities than I actually possessed.
So that brings us to what were some of those problems.
Wallabag does have documentation, but I didn’t like the approach of having to clone their
repo and use their own compose file, and since I was using the LSIO SWAG reverse proxy,
I like to follow their approach and stay as close to that project as I possibly can.
LSIO does provide a proxy config for Wallabag, so that was a great place to start.
One of my hangups is I don’t like having the full name of the service in front of my domain.
So for example, in the LSIO proxy config and in all of their configs, they always start
the whole URL slug with the name of the service.
So in this instance, they’re saying you should have wallabag.mydomain.com.
And I just, I think that complicates things a bit.
I think it opens up visibility on the wider web, which I don’t necessarily want.
But once I got Wallabag working on localhost, I wanted to be able to access it using
my domain.
And there are a few things that are very picky about this process, both with the LSIO way
of doing things and also with the Wallabag way of doing things.
So for Wallabag, there is an environment variable that is the base URL or the domain.
And that is extremely, extremely picky.
One thing that I encountered that was difficult during the localhost setup was you have to
have the port number at the end of your URL.
So if your IP scheme is something like 10.10.0.1 for your router, and your actual computer
is 10.10.0.4, then you would have to use that IP plus the port of Wallabag, which is port
80, to be able to reach that locally.
You could not just put in 10.10.0.4 and reach your instance, it won’t work.
And then when you’re running things in a container, you typically have the exposed port and the
internal port.
So in your compose file, in your port section, there are two numbers, one before the colon
and one after the colon.
So if you have nothing else running on port 80 externally, then your port scheme would
be 80:80.
But for me, I have a lot of different services, a lot that use 80 and 443.
So you have to be very conscious about how you set that up.
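As a rough illustration of the port scheme described above, here is a minimal Python sketch. It is not from the episode: the function names are hypothetical, and it only handles the simple two-number form of a compose port entry (host port before the colon, container port after).

```python
def parse_port_mapping(mapping: str) -> tuple[int, int]:
    """Split a compose-style "host:container" port string into two ints."""
    host_port, container_port = mapping.split(":")
    return int(host_port), int(container_port)


def local_url(host_ip: str, mapping: str) -> str:
    """Build the address to type into a browser on the local network.

    Only the host (external) port matters here; the container port
    stays hidden behind Docker.
    """
    host_port, _container_port = parse_port_mapping(mapping)
    return f"http://{host_ip}:{host_port}"


# Nothing else is using port 80 on the host.
print(local_url("10.10.0.4", "80:80"))
# Port 80 is already taken on the host, so publish on another port instead.
print(local_url("10.10.0.4", "8000:80"))
```

The second call shows why the external number is the one to keep track of when several services want port 80.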
So I spent a lot of time bringing containers up and down and just messing with that domain
name parameter.
I switched it to HTTPS, HTTP, my domain name.
I tried my external Docker port of 8000, and I would always hit a 502 error from my
SWAG instance.
So what I did was just waste hours searching online after messing with this domain thing.
And then I figured, you know what, I’m just going to leave that alone with my working
localhost setup, and I just need to switch gears and mess with the proxy config.
And here’s a really important thing.
If you’re messing with your proxy configs and your container names and everything else
with the LSIO project, you need to understand the naming scheme.
And I’ve skimmed over that in the past.
And this is where it finally came to bite me.
If I had been a really careful reader, it would have saved me lots of time.
So if you look at the general template for the subdomain for a proxy config from LSIO,
and I have that copied and pasted in the blog that accompanies this episode, they’re very
conscious about telling you how to set it up.
And if you are not paying close attention, like I wasn’t because I figured I’d done this enough
times and it’s not a big deal, really go back and read that closely because they’re very
clear that the container names have to match.
So in the actual template, they have that laid out very clearly where it says container
name, container name, container name.
And when I looked at my compose file, I had my service defined as Wallabag.
I had the image and then I had the environment.
I did not have a container name variable in there at all.
So that was a huge punch in the gut like, oh my goodness, how did I screw that up?
And if you look at another example compose file for any of the LSIO images, you’ll see
the service defined name.
And in my blog, I use the example of Audacity.
So it’d be audacity, then the image and then directly below the image is the container
name and they put in audacity.
So if you are changing it to anything else, like say I wanted to call the Wallabag service
keep, because I wanted to keep all of my articles, then the container name has to be keep.
It can’t stay as the default Wallabag, and you have to make sure that those things match.
So if I’m going to rename the service from Wallabag to keep, my new compose scheme is
going to start with a service called keep, the image for Wallabag, and then a container
name called keep.
And all of that has to match the proxy config, which means that you need to save your config
file as keep.subdomain.conf.
All of those things have to be aligned with each other.
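The matching rule described above can be sketched as a small Python check. This is an illustration only, not part of the LSIO tooling: the function name is hypothetical, the compose service is represented as a plain dictionary, and the image name is a placeholder rather than the community image used in the episode.

```python
def check_alignment(service_name: str, service: dict, conf_filename: str) -> list[str]:
    """Return a list of mismatches between the compose service and the proxy config."""
    problems = []

    container_name = service.get("container_name")
    if container_name is None:
        problems.append(f"service '{service_name}' has no container_name")
    elif container_name != service_name:
        problems.append(
            f"container_name '{container_name}' does not match service '{service_name}'"
        )

    # The proxy config file is expected to be named after the same service.
    expected_conf = f"{service_name}.subdomain.conf"
    if conf_filename != expected_conf:
        problems.append(f"proxy config should be saved as '{expected_conf}'")

    return problems


# Placeholder image name, for illustration only.
broken = {"image": "example/wallabag:2.4.2"}
fixed = {"image": "example/wallabag:2.4.2", "container_name": "keep"}

print(check_alignment("keep", broken, "wallabag.subdomain.conf"))
print(check_alignment("keep", fixed, "keep.subdomain.conf"))
```

The first call reports the missing container name and the mismatched file name; the second returns an empty list because everything lines up on keep.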
And I’m sure experienced users are like, well, duh, of course that all makes sense.
And it makes sense to me now, but I’m sure there are other people who will make the same
mistake that I did as they try to customize things to meet their needs.
So once you get all of that aligned, you have to go back to the Wallabag compose file.
And in the Symfony environment variables, you have to change that domain name to an HTTPS URL, because
it’s going to be used by the SWAG service, so it’s going to be a secure connection.
And you have to put in your whole thing.
So for me, it’d be like keep.mydomain.com.
And that was how I finally reached the level of success that I needed.
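To make the two cases concrete, here is a minimal Python sketch of the value that domain variable would hold for the localhost setup versus the setup behind the reverse proxy. The variable name SYMFONY__ENV__DOMAIN_NAME is an assumption based on how the Wallabag Docker image is commonly configured; the episode does not name it, so check it against the image you are running. The function name is hypothetical.

```python
from typing import Optional

# Assumed variable name; the episode only calls it the base URL or domain variable.
DOMAIN_VAR = "SYMFONY__ENV__DOMAIN_NAME"


def domain_value(host: str, port: Optional[int] = None, secure: bool = False) -> str:
    """Build the base URL string, with the port spelled out when one is given."""
    scheme = "https" if secure else "http"
    if port is None:
        return f"{scheme}://{host}"
    return f"{scheme}://{host}:{port}"


# Localhost setup: plain HTTP, IP address, and the port written out.
print(f"{DOMAIN_VAR}={domain_value('10.10.0.4', port=80)}")
# Behind the reverse proxy: HTTPS and the full subdomain.
print(f"{DOMAIN_VAR}={domain_value('keep.mydomain.com', secure=True)}")
```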
So in terms of documentation, like I mentioned earlier, Wallabag does have documentation,
but I think it could greatly be improved, but I don’t think I’m the one to do so.
I feel like my approach is still cobbled together, especially because it’s using a community
image rather than one provided by the upstream project.
I don’t feel like it would be appropriate for me to provide documentation using another
community member’s image.
I don’t feel like that’s my place.
If Wallabag were to change and provide their own multi-arch image that covers the ARM platform,
then maybe I would consider providing documentation for how to get that up and running.
But again, they kind of have their own way of documenting how they recommend setting up reverse
proxies using NGINX and things like that.
So because my use case is different using a different project’s reverse proxy solution,
again, I don’t know that it would be my place to provide that.
But who knows, things may change, and if they do, then I’ll be there with some documentation
ready and waiting.
I do think, though, rather than contributing to the upstream project, it would be appropriate
for me to provide this information on something like Reddit or something like Discourse, where
it’s all community-based, people all have their own use cases, and somebody may benefit
from this struggle.
So I will kind of push this out to those spaces rather than directly to the upstream project.
So as a little bonus for this episode, I have been using Feedly as an RSS catcher for ten
or more years.
It’s been forever.
I started using it early in college and just never switched away to anything else.
And they have this great feature called Boards, where you can save articles to a very specific
board.
It’s essentially like tagging something, and then it’s easy to find.
You can read it later.
It never goes away.
But when you export your RSS feeds from Feedly, it doesn’t keep any of the tags, and it doesn’t
keep them organized by board, which is a huge bummer.
And while Wallabag does have an import data section for a lot of different services,
they don’t have anything directly for Feedly.
And I will say, Feedly does not make it easy to export your file.
If you go and dig into the settings of Feedly, you would expect to find an export button
front and center, and it’s not there.
It’s kind of hidden off to the side above your feeds directly, and you have to like
click this little, I don’t remember if it’s like an arrow or the three-button hamburger
type thing.
But it definitely could be easier to find.
So once I got that out of Feedly, it didn’t do what I wanted it to, because it just brought
in all of my subscriptions, but none of my organization from the board.
And that’s really what I cared about.
I can resubscribe to RSS feeds on my own.
That’s not a big deal, but I want my saved data that I have 10-plus years of time invested
in.
So I was really upset by this, and one thing that I noticed in Feedly is when you’re looking
at a board, all of your saved articles are there, and you can scroll down and scroll
down and scroll down until your entire history is present on the webpage.
And I was thinking, well, there’s got to be a way to scrape this webpage and get things
like the title of the article and the URL that’s associated with it.
There has to be a way to do that.
So I did some digging online, and I found an article from a data science provider about
this exact thing, how they scrape websites to grab the URLs and the titles of these things,
and then they put them in a nice, neat table for you so you can easily copy and paste them
into a spreadsheet.
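The code from that article is not reproduced here. As a stand-in, this is a minimal Python sketch of the same idea using only the standard library: it assumes the fully scrolled board page has been saved as HTML, and it collects the text and address of every external link. The class and function names are hypothetical, and a real Feedly page would likely need extra filtering to skip navigation links.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (title, url) pairs from anchor tags with absolute http(s) links."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently being read, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith("http"):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the link text.
            title = " ".join("".join(self._text).split())
            if title:
                self.links.append((title, self._href))
            self._href = None


def extract_links(html: str) -> list:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


sample = """
<div><a href="https://example.com/a">First   article</a></div>
<div><a href="/internal">skip me</a></div>
<div><a href="https://example.com/b"><span>Second</span> article</a></div>
"""

for title, url in extract_links(sample):
    print(title, url)
```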
So now that I can put things in a spreadsheet, I can make a CSV file, which is pretty universal.
And in this new RSS catcher and in Wallabag, I can put that in.
Now Wallabag, the only CSV import that they support is through a service called Instapaper.
So I had to create an Instapaper account and then export my empty data from Instapaper just
so I could see how they format their CSV.
And once I understood how they formatted it, I could use the code that I found to scrape
my Feedly history and then put all of that into an Instapaper format and then import
it into Wallabag and boom.
I had my 10 years of saved data now hosted on my own land, which thank goodness for that.
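The reformatting step can be sketched in Python roughly like this. The column names and the "Unread" folder value are assumptions about what an Instapaper export looks like, not something stated in the episode, so compare them against your own empty export first, the same way it was done here. The function name is hypothetical.

```python
import csv
import io

# Assumed Instapaper export header; verify against a real export before importing.
INSTAPAPER_HEADER = ["URL", "Title", "Selection", "Folder", "Timestamp"]


def to_instapaper_csv(articles, folder="Unread"):
    """Turn (title, url) pairs into CSV text laid out like an Instapaper export."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(INSTAPAPER_HEADER)
    for title, url in articles:
        # Selection and Timestamp are left blank in this sketch.
        writer.writerow([url, title, "", folder, ""])
    return buffer.getvalue()


scraped = [
    ("First article", "https://example.com/a"),
    ("Second, with a comma", "https://example.com/b"),
]

print(to_instapaper_csv(scraped))
```

Using the csv module rather than joining strings by hand means titles that contain commas or quotes are escaped properly.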
I enjoy the graphical interface of Feedly.
I think they do a lot of good things.
But in terms of owning your data, it’s not really there.
And that’s kind of a big disappointment for me.
So I think now is the time for me to remove myself from that ecosystem after all this
time and really double down on owning my own data.
And I do like Wallabag because not only can you tag the different articles, but you can
put in annotations, you can export them in different formats like PDF or EPUB, a whole
bunch of different things.
So the flexibility offered by Wallabag is just phenomenal.
And I will say that they do have a hosted option.
So if you don’t want to go through the struggle or the effort of hosting your own instance
of Wallabag, they do have a paid option and it’s pretty reasonable considering it’s an
open source project and everything that they offer.
When you compare the price between Wallabag and Feedly, Feedly is kind of tiered and Wallabag
is a fixed price.
And you can certainly get Feedly for cheaper, but you still don’t have that ownership of
your data even on the paid tier.
So I think paying more to own your information is worth it in every respect.
And I was getting very close to just saying forget it and paying for that instance of
Wallabag, but my perseverance and stubbornness led me to finally getting my own self-hosted
solution.
So that brings us to the end of episode 5.