gitea/modules/test/utils.go
Karthik Bhandary e82352f156 feat(web): Add Jupyter Notebook (.ipynb) Rendering Support (#37433)
### Summary

Closes #37308

Adds native rendering support for Jupyter notebook files (`.ipynb`) in
Gitea using backend rendering, allowing users to view formatted
notebooks with code cells, markdown, outputs, and visualizations
directly in the repository browser.

### Motivation

Jupyter notebooks are widely used in data science, machine learning, and
scientific computing. Currently, Gitea displays `.ipynb` files as raw
JSON, making them difficult to read. This feature enables users to view
notebooks in a formatted, readable way similar to GitHub and GitLab.

### Implementation Approach

**Evolution:** Initially implemented frontend rendering using the `marked`
and `Shiki` libraries. After review feedback, migrated to backend
rendering for better performance, security, and consistency with Gitea's
architecture.

#### Backend Rendering Advantages

- Server-side HTML generation eliminates client-side parsing overhead
- Integrates with Gitea's existing markup sanitizer for security
- Uses Chroma for syntax highlighting (consistent with code files)
- Uses Goldmark for markdown rendering (consistent with `.md` files)
- No additional frontend dependencies required
- Better performance for large notebooks

### Features

#### Supported Cell Types

- **Markdown cells:** Rendered with Goldmark (tables, lists, links, code
blocks, etc.)
- **Code cells:** Syntax-highlighted with Chroma, execution counts,
language detection from notebook metadata
- **Output cells:** Multiple output types in a single cell
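
The language detection mentioned for code cells can be sketched as follows. This is a minimal illustration, not the renderer's actual code: it assumes the language is read from the notebook's `metadata.language_info.name` field, then from `metadata.kernelspec.language`. The `detectLanguage` helper, the precedence order, and the `"text"` fallback are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// notebookHeader holds only the metadata fields this sketch inspects.
type notebookHeader struct {
	Metadata struct {
		KernelSpec struct {
			Language string `json:"language"`
		} `json:"kernelspec"`
		LanguageInfo struct {
			Name string `json:"name"`
		} `json:"language_info"`
	} `json:"metadata"`
}

// fallbackLanguage is a hypothetical default used when nothing is declared.
const fallbackLanguage = "text"

// detectLanguage returns a lowercase language name for syntax highlighting.
// Assumed precedence: language_info.name, then kernelspec.language.
func detectLanguage(raw []byte) string {
	var nb notebookHeader
	if err := json.Unmarshal(raw, &nb); err != nil {
		return fallbackLanguage
	}
	if name := nb.Metadata.LanguageInfo.Name; name != "" {
		return strings.ToLower(name)
	}
	if lang := nb.Metadata.KernelSpec.Language; lang != "" {
		return strings.ToLower(lang)
	}
	return fallbackLanguage
}

func main() {
	raw := []byte(`{"nbformat": 4, "metadata": {"language_info": {"name": "Python"}}}`)
	fmt.Println(detectLanguage(raw))
}
```

The returned name would then be handed to the highlighter's lexer lookup. How the real renderer maps names to Chroma lexers is not described here.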

#### Supported Output Types

- Text/plain outputs
- Images (PNG, JPEG, SVG) with base64 data URIs
- HTML outputs (tables, DataFrames, formatted text)
- LaTeX/math equations (rendered as code blocks)
- Error outputs with traceback (styled in red)
- Stream outputs (`stdout`/`stderr`)
- ⚠️ Interactive widgets (Plotly, ipywidgets) show informative messages
- ⚠️ JavaScript outputs show security warning (disabled for safety)
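
As an illustration of the base64 image handling listed above, the sketch below turns a notebook image payload into a data URI. It is a rough example under stated assumptions rather than the renderer's implementation: `imageDataURI` is a hypothetical helper, it covers only PNG and JPEG (SVG payloads are stored as XML text in notebooks, not as base64), and it assumes newlines inside the payload should be stripped.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// imageDataURI builds a data URI from a base64 payload as found in a
// notebook output's "data" map. It rejects MIME types outside a small
// allowlist and payloads that do not decode as base64.
func imageDataURI(mimeType, payload string) (string, error) {
	switch mimeType {
	case "image/png", "image/jpeg":
		// allowed
	default:
		return "", fmt.Errorf("unsupported image type %q", mimeType)
	}
	// Notebook payloads often carry trailing or embedded newlines.
	clean := strings.NewReplacer("\n", "", "\r", "").Replace(payload)
	if _, err := base64.StdEncoding.DecodeString(clean); err != nil {
		return "", fmt.Errorf("invalid base64 payload: %w", err)
	}
	return "data:" + mimeType + ";base64," + clean, nil
}

func main() {
	uri, err := imageDataURI("image/png", "aGk=\n")
	fmt.Println(uri, err)
}
```

Validating the payload before embedding it keeps arbitrary strings out of the `src` attribute. The sanitizer remains the final safeguard either way.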

#### Edge Cases Handled

- Empty notebooks or notebooks with no outputs
- Corrupted JSON with graceful error display
- Mixed output types in a single cell
- Large base64-encoded images
- Execution count of `null` or `0`
- `nbformat` version compatibility (only renders `nbformat 4+`, shows a
message for older versions)
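
Two of the cases above, corrupted JSON and the `nbformat` version gate, could be handled along these lines. This is a sketch only: `checkNotebook` and its messages are hypothetical and use the standard library's `encoding/json`, whereas the actual renderer may parse and report errors differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// notebook mirrors only the top-level fields this sketch needs.
type notebook struct {
	NBFormat int               `json:"nbformat"`
	Cells    []json.RawMessage `json:"cells"`
}

// checkNotebook parses raw .ipynb bytes and returns a user-facing message
// when the notebook cannot be rendered. An empty string means the notebook
// passed both checks. A missing "nbformat" field decodes as 0 and is
// therefore treated as unsupported.
func checkNotebook(raw []byte) string {
	var nb notebook
	if err := json.Unmarshal(raw, &nb); err != nil {
		return "invalid notebook JSON: " + err.Error()
	}
	if nb.NBFormat < 4 {
		return fmt.Sprintf("unsupported nbformat %d (only nbformat 4+ is rendered)", nb.NBFormat)
	}
	return ""
}

func main() {
	fmt.Println(checkNotebook([]byte(`{"nbformat": 3, "worksheets": []}`)))
	fmt.Println(checkNotebook([]byte(`{"nbformat": 4, "cells": []}`)) == "")
	fmt.Println(checkNotebook([]byte(`{not json`)))
}
```

Returning a message instead of an error lets the caller render it in place of the notebook body, which fits the "graceful error display" goal.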

### Changes

#### Backend (Go)

- `modules/markup/jupyter/jupyter.go` (**NEW**)

  - Jupyter notebook renderer implementation
  - Parses `.ipynb` JSON structure and generates HTML
  - Integrates Chroma for code syntax highlighting
  - Integrates Goldmark for markdown cell rendering
  - Dynamic language detection from notebook metadata
  - Handles all standard Jupyter output types
  - Comprehensive error handling with user-friendly messages

- `modules/markup/renderer.go` (**MODIFIED**)

  - Registered Jupyter renderer in markup system

- `main.go` (**MODIFIED**)

  - Import Jupyter renderer package for initialization

#### Styling (CSS)

- `web_src/css/markup/jupyter.css` (**NEW**)

  - Comprehensive styling for notebook cells, code, outputs
  - Uses Gitea's CSS variables for consistent theming
  - Responsive layout with proper spacing
  - Table styling for DataFrame outputs
  - Removed parent container padding for consistency with other renderers

#### Sanitizer Rules

- `modules/markup/jupyter/jupyter.go` → `SanitizerRules()`

  - Configured HTML sanitization rules for safe rendering:
    - Cell structure (markdown, code, input/output wrappers)
    - Code highlighting (Chroma classes)
    - Images (base64 data URIs only)
    - Tables (DataFrames)
    - Markdown elements (headers, lists, links, etc.)

### Security Considerations

- Server-side rendering: No client-side JavaScript execution
- HTML sanitization: Strict allowlist for HTML elements and attributes
- Image security: Only base64 data URIs allowed (no external URLs)
- JavaScript disabled: `application/javascript` outputs show warning
- XSS protection: Gitea's markup sanitizer handles all HTML output
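
To make the "JavaScript disabled" and escaping points concrete, here is a minimal sketch of dispatching on an output's MIME type. The `renderOutput` function, the CSS class name, and the warning text are hypothetical. In the real renderer the generated HTML additionally passes through Gitea's sanitizer, which this example does not model.

```go
package main

import (
	"fmt"
	"html"
)

// renderOutput returns an HTML fragment for a single notebook output.
// JavaScript content is never echoed back, and plain text is HTML-escaped
// so that markup inside it is displayed rather than interpreted.
func renderOutput(mimeType, content string) string {
	switch mimeType {
	case "application/javascript":
		return `<div class="output-warning">JavaScript output is disabled for security reasons.</div>`
	case "text/plain":
		return "<pre>" + html.EscapeString(content) + "</pre>"
	default:
		return `<div class="output-warning">Unsupported output type: ` +
			html.EscapeString(mimeType) + `</div>`
	}
}

func main() {
	fmt.Println(renderOutput("text/plain", "<b>x</b>"))
	fmt.Println(renderOutput("application/javascript", "alert(1)"))
}
```

Escaping at generation time and sanitizing afterwards are complementary: the first keeps text outputs literal, the second enforces the allowlist on everything else.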

### Testing

Manual testing performed with various notebooks:

- Markdown rendering (headers, lists, tables, links, code blocks)
- Code cells with execution counts and syntax highlighting
- Multiple output types (text, images, HTML, LaTeX, errors, streams)
- Error handling for edge cases
- Theme compatibility (light/dark mode)

### Screenshots

<img width="1080" height="553" alt="image"
src="https://github.com/user-attachments/assets/aef9afa7-ed96-434d-98b0-b160565fc967"
/>
<img width="1092" height="552" alt="image"
src="https://github.com/user-attachments/assets/6e61e792-4737-41c1-851e-5c375c1f932a"
/>
<img width="1104" height="622" alt="image"
src="https://github.com/user-attachments/assets/4ac630c1-3a75-4e1c-9bba-c0a27484d001"
/>
<img width="1104" height="529" alt="image"
src="https://github.com/user-attachments/assets/33750c47-70de-4ab2-893d-e5d09fa8d9c4"
/>
<img width="1111" height="343" alt="image"
src="https://github.com/user-attachments/assets/52107d9f-0e06-420b-9ab4-1603dcd676b1"
/>
<img width="1091" height="650" alt="image"
src="https://github.com/user-attachments/assets/0addae21-efa4-44bb-a56e-0418e3d4d227"
/>
<img width="1077" height="298" alt="image"
src="https://github.com/user-attachments/assets/a3a8c5be-638c-45ff-82f3-816264254ead"
/>

### Dependencies

No new dependencies required:

- Chroma (existing) - Syntax highlighting
- Goldmark (existing) - Markdown rendering
- Standard library - JSON parsing

### Key Design Decisions

- Backend rendering for performance and security
- Reuses existing Gitea infrastructure (Chroma, Goldmark, sanitizer)
- Consistent styling with other markup renderers
- Graceful degradation for unsupported features

---

**Development Note:** This PR was developed with assistance from Amazon
Q Developer and Claude AI for implementation, debugging, and testing.

---------

Signed-off-by: Karthik Bhandary <34509856+karthikbhandary2@users.noreply.github.com>
Co-authored-by: karthik.bhandary <karthik.bhandary@kfintech.com>
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
Co-authored-by: bircni <bircni@icloud.com>
2026-06-14 15:52:37 +02:00


// Copyright 2017 The Gitea Authors. All rights reserved.
// SPDX-License-Identifier: MIT

package test

import (
	"archive/tar"
	"archive/zip"
	"bytes"
	"compress/gzip"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"regexp"
	"slices"
	"strconv"
	"strings"
	"sync"

	"gitea.dev/modules/json"
	"gitea.dev/modules/util"

	"golang.org/x/net/html"
)

// RedirectURL returns the redirect URL of a http response.
// It also works for JSONRedirect: `{"redirect": "..."}`
// FIXME: it should separate the logic of checking from header and JSON body
func RedirectURL(resp http.ResponseWriter) string {
	loc := resp.Header().Get("Location")
	if loc != "" {
		return loc
	}
	if r, ok := resp.(*httptest.ResponseRecorder); ok {
		m := map[string]any{}
		err := json.Unmarshal(r.Body.Bytes(), &m)
		if err == nil {
			if loc, ok := m["redirect"].(string); ok {
				return loc
			}
		}
	}
	return ""
}

func ParseJSONError(buf []byte) (ret struct {
	ErrorMessage string `json:"errorMessage"`
	RenderFormat string `json:"renderFormat"`
},
) {
	_ = json.Unmarshal(buf, &ret)
	return ret
}

func ParseJSONRedirect(buf []byte) (ret struct {
	Redirect *string `json:"redirect"`
},
) {
	_ = json.Unmarshal(buf, &ret)
	return ret
}

func IsNormalPageCompleted(s string) bool {
	return strings.Contains(s, `<footer class="page-footer"`) && strings.Contains(s, `</html>`)
}
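
// MockVariableValue sets *p to v[0] when a value is given and returns a
// function that restores the original value. With no value, *p is left
// unchanged and only the restore function is returned, so the caller can
// modify *p freely and reset it later.
func MockVariableValue[T any](p *T, v ...T) (reset func()) {
	old := *p
	if len(v) > 0 {
		*p = v[0]
	}
	return func() { *p = old }
}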

func ReadAllTarGzContent(r io.Reader) (map[string]string, error) {
	gzr, err := gzip.NewReader(r)
	if err != nil {
		return nil, err
	}

	content := make(map[string]string)

	tr := tar.NewReader(gzr)
	for {
		hd, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}

		buf, err := io.ReadAll(tr)
		if err != nil {
			return nil, err
		}

		content[hd.Name] = string(buf)
	}
	return content, nil
}

func WriteTarArchive(files map[string]string) *bytes.Buffer {
	return WriteTarCompression(func(w io.Writer) io.WriteCloser { return util.NopCloser{Writer: w} }, files)
}

func WriteZipArchive(files map[string]string) *bytes.Buffer {
	buf := &bytes.Buffer{}
	zw := zip.NewWriter(buf)
	for name, content := range files {
		w, _ := zw.Create(name)
		_, _ = w.Write([]byte(content))
	}
	_ = zw.Close()
	return buf
}

func WriteTarCompression[F func(io.Writer) io.WriteCloser | func(io.Writer) (io.WriteCloser, error)](compression F, files map[string]string) *bytes.Buffer {
	buf := &bytes.Buffer{}
	var cw io.WriteCloser
	switch compressFunc := any(compression).(type) {
	case func(io.Writer) io.WriteCloser:
		cw = compressFunc(buf)
	case func(io.Writer) (io.WriteCloser, error):
		cw, _ = compressFunc(buf)
	}
	tw := tar.NewWriter(cw)

	for name, content := range files {
		hdr := &tar.Header{
			Name: name,
			Mode: 0o600,
			Size: int64(len(content)),
		}
		_ = tw.WriteHeader(hdr)
		_, _ = tw.Write([]byte(content))
	}
	_ = tw.Close()
	_ = cw.Close()
	return buf
}

func CompressGzip(content string) *bytes.Buffer {
	buf := &bytes.Buffer{}
	cw := gzip.NewWriter(buf)
	_, _ = cw.Write([]byte(content))
	_ = cw.Close()
	return buf
}

var AllowSkipExternalService = sync.OnceValue(func() bool {
	isLocalTesting := os.Getenv("CI") == ""
	ciSkipExternal, _ := strconv.ParseBool(os.Getenv("GITEA_TEST_CI_SKIP_EXTERNAL"))
	return isLocalTesting || ciSkipExternal
})

type TestingT interface {
	Helper()
	Skipf(format string, args ...any)
	Errorf(format string, args ...any)
	Fatalf(format string, args ...any)
}

func ExternalServiceHTTP(t TestingT, envVarName, def string) string {
	t.Helper()
	val := util.IfZero(os.Getenv(envVarName), def)
	if val == "" {
		if AllowSkipExternalService() {
			t.Skipf("skipping test because %s is not set", envVarName)
		} else {
			t.Fatalf("%s is not set, but skipping is not allowed in CI", envVarName)
		}
	}
	// minio's endpoint is "host:port" pattern
	testURL := util.Iif(strings.Contains(val, "://"), val, "http://"+val)
	resp, err := http.Get(testURL)
	if err != nil {
		if AllowSkipExternalService() {
			t.Skipf("skipping test because %s is not ready", val)
		} else {
			t.Fatalf("%s is not ready, but skipping is not allowed in CI", val)
		}
	} else {
		_ = resp.Body.Close()
	}
	return val
}

var normalizeHTMLSpacesRegexp = sync.OnceValue(func() (ret struct {
	afterRt, beforeLt *regexp.Regexp
},
) {
	ret.afterRt = regexp.MustCompile(`>\s*`)
	ret.beforeLt = regexp.MustCompile(`\s*<`)
	return ret
})

func NormalizeHTMLSpaces(s string) string {
	vars := normalizeHTMLSpacesRegexp()
	s = vars.afterRt.ReplaceAllString(s, ">\n")
	s = vars.beforeLt.ReplaceAllString(s, "\n<")
	return strings.TrimSpace(s)
}

// NormalizeHTMLAttributes parses s as HTML, sorts the attributes of every
// node (by namespace, then key, then value) and renders the tree again, so
// that attribute order does not affect string comparisons in tests.
func NormalizeHTMLAttributes(t TestingT, s string) string {
	nodes, err := html.Parse(strings.NewReader(s))
	if err != nil {
		t.Errorf("failed to parse expected HTML: %v", err)
		return ""
	}

	var normalize func(n *html.Node)
	normalize = func(n *html.Node) {
		slices.SortFunc(n.Attr, func(a, b html.Attribute) int {
			if cmp := strings.Compare(a.Namespace, b.Namespace); cmp != 0 {
				return cmp
			}
			if cmp := strings.Compare(a.Key, b.Key); cmp != 0 {
				return cmp
			}
			return strings.Compare(a.Val, b.Val)
		})
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			normalize(c)
		}
	}
	// Walk the whole tree; without this call the attributes stay unsorted.
	normalize(nodes)

	var sb strings.Builder
	if err = html.Render(&sb, nodes); err != nil {
		t.Errorf("failed to render HTML: %v", err)
	}
	return sb.String()
}