Today, let's look at how to use errGroup (the golang.org/x/sync/errgroup package), which I reach for often, and at the scenarios it fits. Why is it useful? A goroutine cannot return a value to its caller, so if you want to pass out the result of its execution you usually have to use a channel. errGroup is a good fit when you want to know whether any of the goroutines you started hit an error and stopped, and you need the error value itself.
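For comparison, here is a minimal sketch of the manual approach mentioned above, using only the standard library. The helper names runAll and demoRunAll are made up for this illustration; this is just one way to pass errors out of goroutines through a channel.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// runAll is a hypothetical helper: it runs every task in its own goroutine
// and returns the first non-nil error it reads from the channel,
// or nil if every task succeeds.
func runAll(tasks ...func() error) error {
	// Buffered so that no goroutine blocks while sending its result.
	errCh := make(chan error, len(tasks))
	var wg sync.WaitGroup
	for _, task := range tasks {
		wg.Add(1)
		go func(t func() error) {
			defer wg.Done()
			errCh <- t()
		}(task)
	}
	wg.Wait()
	close(errCh)
	for err := range errCh {
		if err != nil {
			return err
		}
	}
	return nil
}

// demoRunAll runs one task that succeeds and one that fails.
func demoRunAll() error {
	return runAll(
		func() error { return nil },
		func() error { return errors.New("task failed") },
	)
}

func main() {
	fmt.Println(demoRunAll()) // prints "task failed"
}
```

This works, but the channel, the WaitGroup, and the draining loop all have to be written by hand every time, which is the bookkeeping errGroup takes over.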

errGroup usage

The package needs to be downloaded and installed first.

go get -u golang.org/x/sync

Here is an example of using it:

package main

import (
    "fmt"
    "log"
    "net/http"

    "golang.org/x/sync/errgroup"
)

func main() {
    eg := errgroup.Group{}
    eg.Go(func() error {
        return getPage("https://blog.kennycoder.io")
    })
    eg.Go(func() error {
        return getPage("https://google.com")
    })
    if err := eg.Wait(); err != nil {
        log.Fatalf("get error: %v", err)
    }
}

func getPage(url string) error {
    resp, err := http.Get(url)
    if err != nil {
        return err
    }
    // Always close the body, otherwise the underlying connection leaks.
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return fmt.Errorf("fail to get page: %s, wrong statusCode: %d", url, resp.StatusCode)
    }
    log.Printf("success get page %s", url)
    return nil
}

First, create a Group struct. It has no fields that need to be set, so an empty (zero-value) struct is enough.
Group's two core methods are Go and Wait, which makes it feel quite similar to sync.WaitGroup. (Newer versions of the package also add SetLimit and TryGo, which are not covered here.)
Go accepts a function that takes no arguments and returns an error. This is the function you want to run in a goroutine. In the example above, getPage is wrapped in such a function. The implementation of getPage is very simple: it calls http.Get(url) and returns an error if the request fails or the status code is not 200. Otherwise, it writes a log line to indicate success.
Finally, Wait blocks, just like the Wait method of WaitGroup: it returns only after all the goroutines you started have finished. The difference is that Wait returns an error, namely the first non-nil error returned by any of your goroutines.

In the example above, Wait() returns nil because both pages can be reached. If you change one of the URLs to a host that does not exist, you will see the effect:

2021/10/03 11:52:23 success get page https://google.com
2021/10/03 11:52:23 get error: Get "https://kenny.example.com": dial tcp: lookup kenny.example.com: no such host
exit status 1

If both of your goroutines try to access a non-existent URL:

2021/10/03 11:53:52 get error: Get "https://kenny.example.com": dial tcp: lookup kenny.example.com: no such host
exit status 1

You will find that only one error is printed, because errGroup stores the error from only one goroutine: whichever one returns an error first. Errors returned by the other goroutines afterwards are discarded.

You may object that you want to see every error, and that getting the error from just one goroutine is not much help. In the example above that is a fair complaint, because you cannot tell which URLs were unreachable. So personally, I think errGroup fits best when the tasks you run are the same or of the same nature, for example multiple goroutines that query the same service for different pieces of information. When one of them fails, the most likely cause is a network or service problem, and the other goroutines calling the same service will probably fail too. errGroup is also a good fit when the work is all-or-nothing: if one goroutine fails, the results of the others are useless even if they completed successfully.
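If you really do need every error, one option is to collect them yourself instead of using errGroup. The following is a minimal sketch, assuming Go 1.20 or newer for errors.Join. The helper collectAll and the error messages are hypothetical and are not part of the errgroup package.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// collectAll is a hypothetical helper: it runs every task in its own
// goroutine and returns all non-nil errors joined into a single error.
func collectAll(tasks ...func() error) error {
	// One slot per task, so goroutines never write to the same element
	// and no mutex is needed.
	errs := make([]error, len(tasks))
	var wg sync.WaitGroup
	for i, task := range tasks {
		wg.Add(1)
		go func(i int, t func() error) {
			defer wg.Done()
			errs[i] = t()
		}(i, task)
	}
	wg.Wait()
	// errors.Join drops nil entries and returns nil if all of them are nil.
	return errors.Join(errs...)
}

// demoCollectAll runs two failing tasks and one successful task.
func demoCollectAll() error {
	return collectAll(
		func() error { return errors.New("site A unreachable") },
		func() error { return nil },
		func() error { return errors.New("site C unreachable") },
	)
}

func main() {
	// Prints both messages, one per line, in task order.
	fmt.Println(demoCollectAll())
}
```

The trade-off is that nothing here stops the remaining tasks early, so this suits cases where each result matters on its own.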

However, a bare Group struct on its own is not always enough.

When a zero-value Group is running and one of your goroutines returns an error, the other goroutines are not canceled. They keep working until they finish by themselves, so they cannot be stopped promptly, and Wait keeps blocking until the slowest one returns.

errGroup takes this situation into account and uses a context to signal cancellation to the other goroutines.

An example:

// Add "context" and "time" to the import block shown earlier.
func main() {
  eg, ctx := errgroup.WithContext(context.Background())
  eg.Go(func() error {
    for i := 0; i < 10; i++ {
      select {
      case <-ctx.Done():
        log.Printf("goroutine should cancel")
        return nil
      default:
        if err := getPage("https://blog.kennycoder.io"); err != nil {
          return err
        }
        time.Sleep(1 * time.Second)
      }
    }
    return nil
  })
  eg.Go(func() error {
    for i := 0; i < 10; i++ {
      select {
      case <-ctx.Done():
        log.Printf("goroutine should cancel")
        return nil
      default:
        if err := getPage("https://google.com"); err != nil {
          return err
        }
        time.Sleep(1 * time.Second)
      }
    }
    return nil
  })
  if err := eg.Wait(); err != nil {
    log.Fatalf("get error: %v", err)
  }
}

errGroup provides WithContext, which takes your parent context and returns a Group pointer together with a new context. That context is a cancel context derived from the parent.
Once you have it, you can use it inside the functions passed to Go: a select on <-ctx.Done() tells the goroutine whether it has been canceled and should stop. In this scenario each goroutine accesses its URL up to ten times. If a request fails, it returns the error; if ctx.Done() fires, it returns nil and ends the goroutine. To see the effect, I deliberately break one of the URLs:

2021/10/03 12:11:40 success get page https://google.com
2021/10/03 12:11:41 goroutine should cancel
2021/10/03 12:11:41 get error: Get "https:kenny.example.com": http: no Host in request URL
exit status 1

As you can see, the second goroutine accessed its URL successfully once. Before it could make its second request, it received ctx.Done(), so it printed the log line and exited. Finally, Wait returned the error from the first goroutine, which was printed.

This way, when one goroutine hits an error, the other goroutines are notified to stop their work and can exit cleanly. Note that cancellation is cooperative: a goroutine stops only if it actually checks ctx.Done().
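One variation on the loop above is to wait inside the select itself, so that cancellation interrupts the pause between requests, and to return ctx.Err() so the caller can tell why the loop stopped. The sketch below uses only the standard library. The helper names poll and demoPoll are hypothetical, and the request is replaced by a plain function so the example runs without network access.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// poll is a hypothetical helper: it calls fn up to n times, waiting
// interval between calls, and stops early if ctx is canceled.
func poll(ctx context.Context, n int, interval time.Duration, fn func() error) error {
	for i := 0; i < n; i++ {
		if err := fn(); err != nil {
			return err
		}
		select {
		case <-ctx.Done():
			// Report the cancellation reason instead of returning nil.
			return ctx.Err()
		case <-time.After(interval):
			// Interval elapsed, continue with the next call.
		}
	}
	return nil
}

// demoPoll cancels the context during the second call and reports how
// many calls ran and what poll returned.
func demoPoll() (calls int, err error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	err = poll(ctx, 10, time.Millisecond, func() error {
		calls++
		if calls == 2 {
			cancel() // stands in for another goroutine failing
		}
		return nil
	})
	return calls, err
}

func main() {
	calls, err := demoPoll()
	fmt.Println(calls, err) // prints "2 context canceled"
}
```

If a function like this were passed to Go, Wait would still report the original failure, because the group keeps only the first non-nil error it receives.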

How does errGroup work internally

Let's take a look at its structure.

type Group struct {
  cancel func()

  wg sync.WaitGroup

  errOnce sync.Once
  err     error
}

This is the Group struct as it looked when this article was written (newer releases of the package have since extended it). The cancel func() is what WithContext, mentioned earlier, relies on. The WaitGroup shows that errGroup is itself built on top of WaitGroup, the Once is there so that only one error is ever accepted, and err is the error value that Wait returns at the end.

Below is the Go method:

func (g *Group) Go(f func() error) {
  g.wg.Add(1)

  go func() {
    defer g.wg.Done()

    if err := f(); err != nil {
      g.errOnce.Do(func() {
        g.err = err
        if g.cancel != nil {
          g.cancel()
        }
      })
    }
  }()
}

Go first calls wg.Add(1) and then starts a goroutine that runs the function you passed in and checks its return value. If there is an error, errOnce.Do is used to store it. A sync.Once runs the function given to Do at most once, no matter how many times Do is called, even when several goroutines return errors. So g.err = err is executed only once, and the value never changes after it has been set.

Inside the same Do call, it also checks whether cancel is nil. A non-nil cancel means the Group was created with WithContext. In that case g.cancel() is called, which closes the channel returned by ctx.Done() so that the other goroutines can see it and stop.

And because the Group is built on WaitGroup, g.wg.Done() has to be called when each goroutine finishes its work so that the counter is decremented.
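The "first error wins" behaviour of sync.Once described above can be checked in isolation. This is a minimal sketch using only the standard library; the helper firstOnly is hypothetical and feeds the errors in sequentially so that the result is deterministic.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// firstOnly is a hypothetical helper that mimics how the group records
// errors: only the first non-nil error is kept, later ones are ignored.
func firstOnly(errs ...error) error {
	var (
		once  sync.Once
		saved error
	)
	for _, err := range errs {
		if err == nil {
			continue
		}
		// Do runs the function only on its first call; every later
		// call returns without doing anything.
		once.Do(func() { saved = err })
	}
	return saved
}

// demoFirstOnly passes one nil and two real errors.
func demoFirstOnly() error {
	return firstOnly(nil, errors.New("first"), errors.New("second"))
}

func main() {
	fmt.Println(demoFirstOnly()) // prints "first"
}
```

In the real Group the calls to Do come from different goroutines, so which error counts as "first" depends on scheduling.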

Below is the Wait function:

func (g *Group) Wait() error {
  g.wg.Wait()
  if g.cancel != nil {
    g.cancel()
  }
  return g.err
}

Wait calls g.wg.Wait(), which is what makes it block. After all the goroutines have finished, it checks whether cancel is nil and, if not, calls it. By this point there are no goroutines left to stop; the call is there to release the resources held by the cancel context. This context exists only for this Group, so it should be cleaned up once all tasks are done. One consequence is that the ctx returned by WithContext is always canceled once Wait returns, so it should not be reused afterwards.

Finally, here is WithContext:

func WithContext(ctx context.Context) (*Group, context.Context) {
  ctx, cancel := context.WithCancel(ctx)
  return &Group{cancel: cancel}, ctx
}

As you can see, WithContext does create a cancel context. The ctx is returned to the caller, and the cancel func is stored in the Group.
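To make the effect of the stored cancel func concrete, here is a small standard-library sketch of what happens to a cancel context before and after cancel is called. The helper names isDone and demoCancel are hypothetical.

```go
package main

import (
	"context"
	"fmt"
)

// isDone reports, without blocking, whether ctx has been canceled.
func isDone(ctx context.Context) bool {
	select {
	case <-ctx.Done():
		return true
	default:
		return false
	}
}

// demoCancel returns the state of a cancel context before and after
// cancel is called, plus the text of ctx.Err() afterwards.
func demoCancel() (before, after bool, reason string) {
	ctx, cancel := context.WithCancel(context.Background())
	before = isDone(ctx)
	cancel()
	// Calling cancel again does nothing, which is why it is safe for
	// both Go and Wait to call it.
	cancel()
	after = isDone(ctx)
	return before, after, ctx.Err().Error()
}

func main() {
	before, after, reason := demoCancel()
	fmt.Println(before, after, reason) // prints "false true context canceled"
}
```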

In summary, the design of errGroup is very simple. It cleverly combines WaitGroup, Once, and Context to wait for all goroutines and report the first error, and it uses the context to let the caller decide how its goroutines should exit.
