Software Engineering Blog | Clear Thinking & Practical Insights — Igna

Step-by-Step Guide to Configuring a Vite + React + TypeScript Component Library

Ignacio Miranda Figueroa — Sun, 10 Sep 2023 16:45:40 GMT

I want to thank the community for all the love and support shown to my component library template using Vite and React. I appreciate the feedback and recommendations you've shared, and I also thank future developers who use the library!

The purpose of this post is to explain how to set up the component library template. You can find all the library's features in the repo's README file, but I'll highlight the most important ones here.

Vite: Run and build the project blazingly fast!
TailwindCSS 4: Utility classes to define your styling
Storybook 9: Components Preview
Release Please: CHANGELOG.md and GitHub tags generation
Version release configuration for both the GitHub package registry and NPM registry.

Without further ado, let's dive into it 🚀

Repo setup

You can directly create a new repository by clicking the Use this template > Create a new repository button:

Then you can clone the newly generated repo and install the dependencies using pnpm install. If you don't have pnpm installed, you can run corepack enable to activate it (only works from Node 18+). You could also use another package manager such as npm or yarn, but I prefer pnpm for being faster and more efficient (:

Now you're able to run all the scripts this repository comes with. For example, running pnpm dev will start the Storybook dev server with some example components. You can find the complete list of scripts in the README file or by simply taking a look at the package.json file.

Changelog update and version release

This repo uses release-please, a tool created by Google that, quote, "automates CHANGELOG generation, the creation of GitHub releases, and version bumps for your projects."

You can find the GitHub workflow that takes care of this in the .github/workflows/release-please.yml file (more specifically, the first step of the workflow, which uses google-github-actions/release-please-action@v3). For it to work, we first need to go to the repo's Settings tab and click on the Code and automation > Actions > General section, then scroll down to Workflow permissions and check the Allow GitHub Actions to create and approve pull requests checkbox.

This will allow release-please to create the Pull Request against the main branch that will bump up the version, update the changelog file and finally release the GitHub tag after being merged (the PR is created by a bot, so a repo administrator has to manually approve and merge it).

Publishing the package

The repo is configured to use the NPM registry, which I will explain first. You can skip to the next section if you want to know how to do it using the GitHub package registry.

Using the NPM package registry

We simply need to get an NPM access token for our GitHub workflow to be able to publish the package to the registry and add it as a repository secret.

Log into npm and go to the Access Token tab to create a new token. I'll use a classic one for demo purposes.

You can use either the Publish or Automation type as we need the token to be able to publish new versions of our package.

Copy the value of your token and now let's open the repo's Settings tab and go to Security > Secrets and variables > Actions. Add a new repository secret called NPM_TOKEN and paste the value of your token.

And we're done! With this, the release-please.yml workflow will be able to use this token (in line 55) to publish the package to the npm registry 🎉

Using the GitHub package registry

The configuration is pretty straightforward as we don't even need to get a new access token.

The repo has a workflow example in the .github/examples/github-release-please.yml file. It's pretty much the same as the original workflow, just including the following changes:

Adding the packages: write permission in line 8. This will be enough for the autogenerated GITHUB_TOKEN to have permission to publish the package to the GitHub registry during the Publish step.
The registry URL is now https://npm.pkg.github.com (line 40)
We now use the existing secrets.GITHUB_TOKEN as the access token in line 57 instead of having to create another one.

Simply replacing the existing release-please.yml file with the content of the example file is enough for the workflow part.

💡

You can also create a personal access token with more granular permissions instead of having them in the permissions key inside the yaml file, then create a new repo secret and use it in line 56. What was explained above is just the simplest way to do it.

Last but not least, go to the package.json and make sure the "name" key uses the organization scope where the package will be published. For instance:

If I want to link the published package to the same vite-component-library-template repo, I'll have to change the name to "@ignacionmiranda/vite-component-library-template".
If your company is called Octocat, then the name would be "@octocat/your-library".

Finally, don't forget to update the repository.url value with the URL of your actual repo.

The above is the simplest explanation I came up with to set this up. If you want to have deeper insight, I encourage you to check the official GitHub docs about publishing Node.js packages.

Installing the library as a dependency

Using an NPM package

If your package is public, great! Then you can simply go to your frontend application, run pnpm i followed by the name of your package, and start using it.

If your package is private, you'll need to log in using the npm cli or a .npmrc file passing your token along with the npm registry and be invited to the npm organization that publishes the package. Here you can find some official docs about private NPM packages.

Using a GitHub package

We need some additional steps if we use this approach, but nothing crazy (:

Inside your frontend app, you'll need to create a .npmrc file in the root of the project with the following content:

# The first section is your user name or organization name in a
# kebab-case format.

# If my username is IgnacioNMiranda, then the first line should be:
# @ignacionmiranda:registry=https://npm.pkg.github.com
@<your-user-or-org>:registry=https://npm.pkg.github.com
//npm.pkg.github.com/:_authToken=${GITHUB_TOKEN}

and then add the GITHUB_TOKEN variable to your .env file:

export GITHUB_TOKEN=

The token has to be a personal access token, created with at least the read:packages permission in order to download packages from the GitHub Package Registry.

After running source .env or using your preferred tool to load env vars from a .env file into the current console instance, you should be able to install your GitHub package. For example, if the package is called @ignacionmiranda/vite-component-library the command we have to run is: pnpm i @ignacionmiranda/vite-component-library.

Using the library

Here are some examples of how to use the styles of the library and a React component in a Next.js application.

/* _app.tsx for pages router, or layout.tsx for App router  */
// Replace <your-library> with the name of your published package
import '<your-library>/styles.css'
// More imports and your App component ...

/* page.tsx */
import { AtButton } from '<your-library>'
// More imports and your Page component...

Extra: Testing the library in a frontend app locally

There are some times when we want to test the components we're building without having to publish canary, alpha, beta, or whatever versions to the registry. In order to do it we can follow these steps:

Run pnpm build:lib to build the component library and get the output in the dist folder.
Run pnpm pack to create a .tgz file. This has the same content as the dist folder and will allow us to install the library in our frontend app locally.
🚨 Right now, the pack command deletes the 'dependencies' and 'devDependencies' keys from the package.json because of the prepack command. The pack command normally runs during the publish step in the GitHub workflow, so it's not intended to be run locally, where it leaves those keys deleted in your working copy. Make sure you revert this change before pushing any new commit to your repo.
Go to your frontend app and add your library as a dependency in the package.json. Instead of setting the version, add the path to the .tgz file. For instance: "vite-component-library-template": "../../vite-component-library-template/vite-component-library-template-2.0.4.tgz" .
Install the deps in your frontend app. Now you should be able to see the local changes you did in the library and use them in your local development for the frontend app.

Wrapping Up

Now we're capable of setting up a repository that contains a component library, follows semantic versioning, gets published to a package registry and can be used in a frontend application. I hope this is useful and provides an easy explanation of how to set up this kind of project (: It can get really hard to make everything work together, and things can become a real mess with all the config code and files. Feel free to create an issue in the repo if you have difficulties with something or if you just want to make recommendations to continue improving it! You can also ping me on LinkedIn if you need help with anything else (:

Happy coding!

⚛️⚡ Vite + React + Typescript Component Library Template

Ignacio Miranda Figueroa — Fri, 24 Feb 2023 16:01:42 GMT

A few weeks ago I created a template library using technologies such as Vite, React, TypeScript, Vitest, and Storybook. It also automatically manages version releases using GitHub Actions. I just want to share it here with the community:

GitHub Repository: https://github.com/IgnacioNMiranda/vite-component-library-template

Storybook Preview: https://vite-component-library-template.vercel.app/

I hope it can be useful for everyone that wants to start personal library projects or maybe to be used in projects for your company 😉 It also would be nice if you support it by giving it a star or mentioning it in the repo you create 😄

Happy coding! :)

Passing params from an Apache Airflow DAG to triggered DAGs using TriggerDagRunOperator

Ignacio Miranda Figueroa — Sat, 07 Jan 2023 20:54:59 GMT

So I was in this situation, struggling for like 5 hours yesterday (yes, the last 5 Friday work hours, the best ones to get stuck with some code) trying to pass parameters using the TriggerDagRunOperator, and wanting to die but in the end achieving it.

Maybe I was just not experienced enough and I got stuck on something really easy to fix, but today I'll show how to do it so you don't have to struggle as I did 🙂 Let's get into it.

Use Case

If you want to go straight to the solution you can skip this section.

I had 2 data sources, an ERP and one content environment (from now on I'll call it 'env') from a CMS (if you don't know what a CMS is, I explain a little bit about it in this post). I had 2 DAGs that run at the same time (with the same schedule_interval) and synced data from the ERP to the CMS. Each DAG syncs a specific type of data to the same env.

Until now, both DAGs were run individually, updating the CMS environment async. The sync process between the 2 data sources is not free of failures, so a new need came up: first create a backup of the env and then sync the data to a new env that is a copy of the old one. If anything goes wrong, we can just switch the environment and delete the broken one.

With this, the 2 DAGs cannot run async anymore; they have to sync the data to the same environment. The proposed solution was to create a new DAG (which I'll call Wrapper from now on) that first runs this create-backup-env task and then triggers the 2 DAGs using the TriggerDagRunOperator. Also, these DAGs can no longer be executed manually or on a schedule; only the Wrapper DAG can trigger them. The create-backup-env task always has to run first so that the 2 DAGs always push data to the same env and never push to old envs that will not be used anymore.

Furthermore, the 2 DAGs can receive quite a few config parameters to execute or skip certain tasks using the Trigger DAG w/config feature that Airflow provides, so these parameters also have to be available in the Wrapper DAG.

The Solution

FYI - I simplified the solution a lot but always kept the main components untouched.

To use the TriggerDagRunOperator, we need to define something like this:

# Wrapper DAG
from airflow.decorators import task, dag
from airflow.operators.trigger_dagrun import TriggerDagRunOperator
from airflow.operators.python import get_current_context
from airflow.utils.state import State
from datetime import datetime

@dag(start_date=datetime(2023, 1, 7), schedule_interval='@daily', catchup=False)
def wrapper_dag():
    @task.python
    def create_backup_env():
        print('Creating backup env...')

    trigger_sync_dag_1_task = TriggerDagRunOperator(
        task_id='trigger_sync_dag_1',
        trigger_dag_id='sync_dag',
        wait_for_completion=True,
        poke_interval=60,
        failed_states=[State.FAILED],
    )

    trigger_sync_dag_2_task = TriggerDagRunOperator(...)

    @task.python
    def other_task():
        context = get_current_context()
        params = context['params']  # Access to context params
        print(params['message'])

    create_backup_env() >> [trigger_sync_dag_1_task, trigger_sync_dag_2_task] >> other_task()


wrapper_dag()

# Sync DAG (let's assume we have 2 like this that are pretty similar)
from airflow.decorators import task, dag
from airflow.operators.python import get_current_context
from datetime import datetime
import logging

@dag(start_date=datetime(2023, 1, 7), schedule_interval='@daily', catchup=False)
def sync_dag():
    @task.python
    def sync():
        logging.info('Syncing data...')
        # Access to context params in order to perform certain tasks
        context = get_current_context()
        params = context['params']
        logging.debug(f'params: {params}')
        if 'run-task-a' in params and params['run-task-a']:
            logging.info('Running task A...')
        elif 'run-task-b' in params and params['run-task-b']:
            logging.info('Running task B...')
    sync()

sync_dag()

To access the params object passed to a DAG using the Trigger DAG w/config Airflow feature, we can use the params key inside the context that we retrieve using the get_current_context function. This returns the active DAG run context. We also can use the Jinja template interpolation feature that Airflow provides out of the box. That is using a string like {{ params }} in certain operator-templated fields or properties. (For a deeper insight check the official documentation).

The TriggerDagRunOperator supports a field called conf that can receive a python dictionary that will be used as the triggered DAG config. It also supports templating, which means we can do the following:

trigger_dag_task = TriggerDagRunOperator(
    task_id='trigger_dag',
    trigger_dag_id='triggered_dag',
    conf='{{ params }}',
    # conf='{{ conf }}' also this to pass the DAG conf object
    wait_for_completion=True,
    poke_interval=60,
    failed_states=[State.FAILED],
)

As I mentioned, the conf parameter expects a python dictionary. If we don't pass any config object to the Wrapper DAG it will still work, because it interpolates the params object (which is None) without raising any error. However, if we pass some parameters (for instance, {"run-task-a": true}), it will result in the following error in the TriggerDagRunOperator task instance:

So we have to rewrite our conf param:

trigger_dag_task = TriggerDagRunOperator(
    task_id='trigger_dag',
    trigger_dag_id='triggered_dag',
    # You can use whichever key you want. I used 'configuration'.
    conf={'configuration': '{{ params }}'},
    wait_for_completion=True,
    poke_interval=60,
    failed_states=[State.FAILED],
)

Doing this, we have the following context['params'] object available in our triggered DAGs: {'configuration': "{'run-task-a': True}"} .

We have 2 problems here. As you can imagine, the 2 Sync DAGs were built using context['params'] instead of context['params']['configuration']. Furthermore, we're receiving a string with the python dictionary instead of the dictionary.

To handle this, we'll need to modify our sync DAGs a little bit. We can create a get_context_params util function:

# dags/utils/common.py
from ast import literal_eval
from airflow.operators.python import get_current_context


def get_context_params():
    context = get_current_context()
    params = context['params']
    if 'configuration' in params:
        params = {
            **params,
            **literal_eval(params['configuration'])
        }
        del params['configuration']
    return params

Here we're checking if the params object has a configuration property. If so, we parse its value into a python dictionary using the literal_eval function from the ast module and spread it into the first level of the params object. This function safely evaluates a string containing a Python literal, for instance, a Python dictionary. You can click here to visit the official docs and have a deeper insight into it.

Ultimately, our Sync DAG has to be rewritten as follows:

# Sync DAG (let's assume we have 2 like this that are pretty similar)
from airflow.decorators import task, dag
from datetime import datetime
import logging

from dags.utils.common import get_context_params


@dag(start_date=datetime(2023, 1, 7), schedule_interval='@daily', catchup=False)
def sync_dag():
    @task.python
    def sync():
        logging.info('Syncing data...')
        # Access to context params in order to perform certain tasks
        params = get_context_params()
        logging.debug(params)
        if 'run-task-a' in params and params['run-task-a']:
            logging.info('Running task A...')
        elif 'run-task-b' in params and params['run-task-b']:
            logging.info('Running task B...')
    sync()

sync_dag()

Now if we run the Wrapper DAG passing the following config object:

{"run-task-a": true}

We'll get the following result in the sync task logs:

With this, we're able to pass params from a parent DAG to a triggered DAG without the need of changing too much logic to use the context params (:

If you think I overcomplicated the solution (it's probably the case) I encourage you to leave a comment ^^ then all of us can continue learning (:

You can check the source code here. It includes some extra stuff like using the BranchPythonOperator to skip the syncs depending on more config parameters.

Thanks for reading!

Robots.txt and sitemap pages using Next.js and a Headless CMS

Ignacio Miranda Figueroa — Wed, 04 Jan 2023 03:29:00 GMT

Search Engine Optimization (SEO) is one of those frontend things that can always get tricky. You can have really good HTML practices, the fastest load times, meta tags or social media images. All of that is going to help a lot to increase the positioning of your site. However, there are always 2 special pages that every site that wants to be well-indexed and crawled by search crawlers must have: the robots.txt and sitemap.xml pages.

In this post, we'll go through the details of what these pages are and how to build them in a Next.js project fetching data from a Headless Content Management System (CMS). But first of all, what is a CMS?

CMS Definition

"A CMS, short for content management system, is a software application that allows users to build and manage a website without having to code it from scratch or know how to code at all. [...] With a CMS, you can create, manage, modify, and publish content in a user-friendly interface." (source)

Okay, we know what a CMS is, but what's with a Headless CMS?

You can think of headless as "detached or decoupled from the website that serves the content, mainly consumed via API". To summarize, a Headless CMS is a Content Management System that is decoupled from the main application and serves its content via API. If you want to go deeper into the definition you can visit the official explanation of one of the current biggest headless CMS.

There are many CMS out there: Storyblok, Drupal, WordPress, Contentful, Strapi, Sanity, among others. Today I'll use Contentful because it's the Headless CMS I have used the most (: but the example should apply quite the same for any.

DISCLAIMER: This is not a post about Contentful or the basics of any Headless CMS. If you're not familiar with these I encourage you to take a look into the most popular available options that best suit your needs.

Let's start talking about the main topics of this post.

Robots.txt

"Robots.txt is a text file webmasters create to instruct web robots or crawlers (typically search engine robots) how to crawl pages on their website." (source)

We can achieve the above by telling which robot can crawl our site and which pages they can crawl. The basics for these are the following properties:

User-agent: which robots the rules that follow apply to.
Disallow rules: pages that cannot be crawled.
Allow rules: pages that can be crawled (originally a Google extension, now supported by most major crawlers).
Crawl-delay: how many seconds a crawler should wait before loading and crawling page content (not honored by every crawler; Googlebot ignores it).
Sitemap: where the sitemap page is located.

The Next.js implementation for this page is pretty straightforward and it does not need any CMS, but I didn't want to create a post just to paste this code fragment so here we are, putting everything together c:

// pages/robots.txt.tsx
import { Component } from 'react'
import { GetServerSidePropsContext } from 'next'

// Env vars are always strings, so the flag is compared against 'true' below
const isAllowed = process.env.NEXT_PUBLIC_ALLOW_CRAWLING // 'true' or 'false'
const siteUrl = process.env.ORIGIN_URL // https://my.site.com

// Served when crawling is allowed: everything except the error pages
const allow = `User-agent: *
Disallow: /500
Disallow: /404
Disallow: /403
Allow: /
Sitemap: ${siteUrl}/sitemap.xml
`

// Served when crawling is not allowed: block the whole site
const disallow = `User-agent: *
Disallow: /
`

export default class RobotTxt extends Component {
  static async getInitialProps({ res }: GetServerSidePropsContext): Promise<void> {
    const robotFile = isAllowed === 'true' ? allow : disallow
    res.writeHead(200, {
      'Content-Type': 'text/plain',
    })
    res.end(robotFile)
  }
}

If you're not using TypeScript you can just remove the types from the code. What we're doing is applying a config based on whether the site supports crawling or not, which is defined using an environment (from now on "env") variable. The code has been written thinking of having multiple envs for the site (where you don't want your development or staging envs to be crawled). If your site only has one you can ignore these and just use the "allow" configuration. The same principle applies to the ORIGIN_URL variable.

The disallow variable tells all search engine robots not to crawl any page of the site.

On the other hand, the allow variable lets every user agent (that is, every robot) crawl the whole site except the 403, 404 and 500 pages. We don't care about robots crawling those because they don't have relevant content (unless your error pages are flashy, funny and have interesting information).
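If you'd like to unit-test the two configurations without spinning up Next.js, one option is to move the string building into a pure function. This is just a sketch: buildRobotsTxt and its options are names I made up for illustration, they are not part of the page above.

```typescript
// Hypothetical helper (not part of the original page): builds the robots.txt
// body from plain values so it can be tested without a request/response pair.
type RobotsOptions = {
  allowCrawling: boolean
  siteUrl: string
  disallowedPaths?: string[]
}

const buildRobotsTxt = ({ allowCrawling, siteUrl, disallowedPaths = [] }: RobotsOptions): string => {
  if (!allowCrawling) {
    // Block every robot from the whole site (e.g. development or staging envs)
    return ['User-agent: *', 'Disallow: /'].join('\n') + '\n'
  }
  const lines = [
    'User-agent: *',
    ...disallowedPaths.map((path) => `Disallow: ${path}`),
    'Allow: /',
    `Sitemap: ${siteUrl}/sitemap.xml`,
  ]
  return lines.join('\n') + '\n'
}

const productionRobots = buildRobotsTxt({
  allowCrawling: true,
  siteUrl: 'https://my.site.com',
  disallowedPaths: ['/500', '/404', '/403'],
})
const stagingRobots = buildRobotsTxt({ allowCrawling: false, siteUrl: 'https://staging.my.site.com' })
```

The page component would then only need to read the env vars, call the helper and write the result to the response.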

Sitemap.xml

Now the real challenging section (kind of) (:

First of all, what is a sitemap? Based on my official patented, personal and not-stolen description:

"Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL [...] so that search engines can more intelligently crawl the site." (source).

So basically it defines the pages our site has and some additional metadata. The format for this page is the following:


  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
      <loc>https://my.site.com</loc>
      <changefreq>daily</changefreq>
      <lastmod>2023-01-03</lastmod>
      <priority>0.8</priority>
    </url>
    ...
  </urlset>

For each page, we have to define a <url> block with some metadata (there are more properties but these are the most common):

loc: page's URL.
changefreq: how frequently the page is likely to change, as a hint for the crawler.
lastmod: page last modified date.
priority: the priority this page has relative to the other pages on the site (from 0.0 to 1.0).
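To make those four properties a bit more concrete, here is a minimal sketch of how a single entry could be typed and rendered. SitemapEntry, escapeXml and renderUrlEntry are hypothetical names for illustration only. The escaping step is there because the sitemap protocol expects URLs to be entity-escaped, which matters as soon as a URL contains an ampersand.

```typescript
// Hypothetical types and helpers, not part of the original code.
type ChangeFreq = 'always' | 'hourly' | 'daily' | 'weekly' | 'monthly' | 'yearly' | 'never'

type SitemapEntry = {
  loc: string
  changefreq?: ChangeFreq
  lastmod?: string // W3C datetime, e.g. 2023-01-03
  priority?: number // from 0.0 to 1.0
}

// Escape the five characters that are special in XML
const escapeXml = (value: string): string =>
  value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;')

// Render one <url> block, leaving out the optional tags that were not provided
const renderUrlEntry = (entry: SitemapEntry): string => {
  const tags = [`<loc>${escapeXml(entry.loc)}</loc>`]
  if (entry.changefreq) tags.push(`<changefreq>${entry.changefreq}</changefreq>`)
  if (entry.lastmod) tags.push(`<lastmod>${escapeXml(entry.lastmod)}</lastmod>`)
  if (entry.priority !== undefined) tags.push(`<priority>${entry.priority.toFixed(1)}</priority>`)
  return `<url>${tags.join('')}</url>`
}

const exampleEntry = renderUrlEntry({
  loc: 'https://my.site.com/search?q=a&page=2',
  changefreq: 'daily',
  lastmod: '2023-01-03',
  priority: 0.8,
})
```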

Now the question is, how can we generate this kind of page using data from our favorite Headless CMS?

When using this kind of CMS, pages normally live as entries with fields filled with content (titles, page slugs, banners, sections, headers, footers, etc). Then we fetch these pages and build the UI using some frontend library or framework like Next.js.

First things first, we need to define the Sitemap page component in our application:

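// pages/sitemap.xml.tsx
import { Component } from 'react'
import { GetServerSidePropsContext } from 'next'
// getPages and createSitemap are the two functions explained below

export default class Sitemap extends Component {
  static async getInitialProps({ res }: GetServerSidePropsContext): Promise<void> {
    const pages = await getPages()
    res.writeHead(200, { 'Content-Type': 'text/xml' })
    res.write(createSitemap(pages))
    res.end()
  }
}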

Note that we're using 2 functions here:

An async function called getPages that fetches some page data. This will help us to retrieve pages data from our CMS (in this case, Contentful) in an array.
A function called createSitemap. It receives the pages data as a parameter.

Let's dive into the second one first (also the easiest one). First of all, I'm gonna define a type for the Contentful pages (you can skip this if you're using vanilla JS):

export type ContentfulPage = {
  title: string
  slug: string
  header: ContentfulOrHeader
  blocks?: ContentfulBlock[]
  footer: ContentfulOrFooter
  updatedAt?: string
}

This is basically how our page is built in Contentful. It has a title, a slug, a header component, some block components that make up the page itself (like banners, sections, cards, etc), a footer and the updatedAt date for the page.

Now the function itself:

const createSitemap = (pages: ContentfulPage[]) => {
  return `
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    ${generateLinks(pages)}
  </urlset>`
}

You can see that we're using another function called generateLinks that receives our pages data. Let's take a look into it:

const generateLinks = (pages: ContentfulPage[]) => {
  const pageItems = pages.map((page) => {
    const slugPath = page.slug === '/' ? '' : `/${page.slug}`
    const url = `${process.env.ORIGIN_URL}${slugPath}`
    return `
        <url>
          <loc>${url}</loc>
          <changefreq>daily</changefreq>
          <lastmod>${page.updatedAt}</lastmod>
          <priority>0.8</priority>
        </url>
      `
  })
  return pageItems.join('')
}

Here we're using the pages data to build a string containing the required format for each item in our sitemap page. Ultimately, we return all the items joined in a single string, which is what finally gets inserted in our <urlset> tag.
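Two small details are worth keeping in mind when building each item: the origin or the slug may come with extra slashes, and updatedAt is optional in our page type, so interpolating it directly could print "undefined". The following is only a sketch of how both could be handled. buildPageUrl and toLastmod are hypothetical helpers, not part of the code above.

```typescript
// Hypothetical helper: joins the site origin and a page slug into an absolute URL
const buildPageUrl = (origin: string, slug: string): string => {
  const base = origin.replace(/\/+$/, '') // drop trailing slashes from the origin
  const path = slug.replace(/^\/+|\/+$/g, '') // drop leading and trailing slashes from the slug
  return path === '' ? base : `${base}/${path}`
}

// Hypothetical helper: keeps only the date part (YYYY-MM-DD) of an ISO timestamp,
// or returns undefined so the caller can skip the lastmod tag entirely
const toLastmod = (updatedAt?: string): string | undefined =>
  updatedAt ? updatedAt.slice(0, 10) : undefined

const homeUrl = buildPageUrl('https://my.site.com/', '/')
const postUrl = buildPageUrl('https://my.site.com', '/blog/my-post/')
const lastmodDate = toLastmod('2023-01-04T00:50:34.525Z')
```

A full ISO timestamp is also a valid lastmod value, so trimming it to the date is a matter of taste. The important part is not emitting the tag when there is no date.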

Now that we have gone through the createSitemap function, let's start with getPages. For the sake of simplicity, I'm using an already defined Contentful client and importing it from the services/contentful file. Click here if you want to go deeper with the Contentful JS SDK implementation and how to initialize a client to consume the data from the CMS.

import { client } from 'services/contentful'

export const getPages = async (): Promise<ContentfulPage[]> => {
  const collection = await client.getEntries({
    'content_type': 'page',
  })
  const pages = collection?.items && collection.items?.length ? collection.items : null

  if (pages) return pages.map((page) => ({
    title: page.fields.title,
    slug: page.fields.slug,
    header: page.fields.header,
    blocks: page.fields.blocks,
    footer: page.fields.footer,
    updatedAt: page.sys.updatedAt,
  }))
  return []
}

What we're basically doing is using the client to fetch entries that have the 'page' type using the getEntries client method. If there are any, we store the page items on the pages variable. Then we map each page to the ContentfulPage type we defined previously to use them in the createSitemap function.

Contentful gives us the data in the following format:

{
  "fields": {
     "title": "string",
     "slug": "string",
     "header": { ... },
     ...
   },
  "sys": {
    "updatedAt": "2023-01-04T00:50:34.525Z",
    ...
  }
}

Here, fields contains the entry fields themselves, like the title or slug, and the sys object contains metadata such as the updatedAt date.
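Since the sitemap only needs the slug and the updatedAt date, the mapping step can also be sketched on its own, using a reduced version of the entry shape shown above. RawPageEntry, SitemapPage and toSitemapPages are hypothetical names, and skipping entries without a slug is my own assumption about what you'd want (for instance, for a half-filled page entry).

```typescript
// Minimal, hypothetical shape of a Contentful entry, reduced to what the sitemap needs
type RawPageEntry = {
  fields: { title?: string; slug?: string }
  sys: { updatedAt: string }
}

type SitemapPage = { slug: string; updatedAt: string }

const toSitemapPages = (entries: RawPageEntry[]): SitemapPage[] =>
  entries.reduce<SitemapPage[]>((pages, entry) => {
    const slug = entry.fields.slug
    // Skip entries without a slug: they cannot be turned into a URL
    if (slug) pages.push({ slug, updatedAt: entry.sys.updatedAt })
    return pages
  }, [])

const samplePages = toSitemapPages([
  { fields: { title: 'Home', slug: '/' }, sys: { updatedAt: '2023-01-04T00:50:34.525Z' } },
  { fields: { title: 'Draft without slug' }, sys: { updatedAt: '2023-01-02T10:00:00.000Z' } },
  { fields: { title: 'About', slug: 'about' }, sys: { updatedAt: '2023-01-03T08:30:00.000Z' } },
])
```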

And that's it! With this, we have fetched pages data from our CMS and created a sitemap.xml page for our headless site. The approach is kind of the same for other Headless CMS. The basic concept is that pages live as components in these CMS and we have to consume the data via API to build the page with our favorite language and technology. In this case, using JS and Next.js.

This was my first post so hope all of you like it (: any feedback will be well received.

I'd also like to know your thoughts: do you think this is the way these pages can be built? Have you used headless CMS before? What would you have done differently? (:

Last but not least, thanks for reading!