Files
gitea/modules/htmlutil/html.go
Karthik Bhandary e82352f156 feat(web): Add Jupyter Notebook (.ipynb) Rendering Support (#37433)
### Summary

Closes #37308

Adds native rendering support for Jupyter notebook files (`.ipynb`) in
Gitea using backend rendering, allowing users to view formatted
notebooks with code cells, markdown, outputs, and visualizations
directly in the repository browser.

### Motivation

Jupyter notebooks are widely used in data science, machine learning, and
scientific computing. Currently, Gitea displays `.ipynb` files as raw
JSON, making them difficult to read. This feature enables users to view
notebooks in a formatted, readable way similar to GitHub and GitLab.

### Implementation Approach

**Evolution:** Initially implemented frontend rendering using `marked`
and `Shiki` libraries. After review feedback, migrated to backend
rendering for better performance, security, and consistency with Gitea
architecture.

#### Backend Rendering Advantages

- Server-side HTML generation eliminates client-side parsing overhead
- Integrates with Gitea existing markup sanitizer for security
- Uses Chroma for syntax highlighting (consistent with code files)
- Uses Goldmark for markdown rendering (consistent with `.md` files)
- No additional frontend dependencies required
- Better performance for large notebooks

### Features

#### Supported Cell Types

- **Markdown cells:** Rendered with Goldmark (tables, lists, links, code
blocks, etc.)
- **Code cells:** Syntax-highlighted with Chroma, execution counts,
language detection from notebook metadata
- **Output cells:** Multiple output types in a single cell

#### Supported Output Types

-  Text/plain outputs
-  Images (PNG, JPEG, SVG) with base64 data URIs
-  HTML outputs (tables, DataFrames, formatted text)
-  LaTeX/math equations (rendered as code blocks)
-  Error outputs with traceback (styled in red)
-  Stream outputs (`stdout`/`stderr`)
- ⚠️ Interactive widgets (Plotly, ipywidgets) show informative messages
- ⚠️ JavaScript outputs show security warning (disabled for safety)

#### Edge Cases Handled

- Empty notebooks or notebooks with no outputs
- Corrupted JSON with graceful error display
- Mixed output types in single cell
- Large base64-encoded images
- Execution count of `null` or `0`
- `nbformat` version compatibility (only renders `nbformat 4+`, shows
message for older versions)

### Changes

#### Backend (Go)

- `modules/markup/jupyter/jupyter.go` (**NEW**)

  - Jupyter notebook renderer implementation
  - Parses `.ipynb` JSON structure and generates HTML
  - Integrates Chroma for code syntax highlighting
  - Integrates Goldmark for markdown cell rendering
  - Dynamic language detection from notebook metadata
  - Handles all standard Jupyter output types
  - Comprehensive error handling with user-friendly messages

- `modules/markup/renderer.go` (**MODIFIED**)

  - Registered Jupyter renderer in markup system

- `main.go` (**MODIFIED**)

  - Import Jupyter renderer package for initialization

#### Styling (CSS)

- `web_src/css/markup/jupyter.css` (**NEW**)

  - Comprehensive styling for notebook cells, code, outputs
  - Uses Gitea CSS variables for consistent theming
  - Responsive layout with proper spacing
  - Table styling for DataFrame outputs
- Removed parent container padding for consistency with other renderers

#### Sanitizer Rules

- `modules/markup/jupyter/jupyter.go` → `SanitizerRules()`

  - Configured HTML sanitization rules for safe rendering:
    - Cell structure (markdown, code, input/output wrappers)
    - Code highlighting (Chroma classes)
    - Images (base64 data URIs only)
    - Tables (DataFrames)
    - Markdown elements (headers, lists, links, etc.)

### Security Considerations

- Server-side rendering: No client-side JavaScript execution
- HTML sanitization: Strict allowlist for HTML elements and attributes
- Image security: Only base64 data URIs allowed (no external URLs)
- JavaScript disabled: `application/javascript` outputs show warning
- XSS protection: Gitea markup sanitizer handles all HTML output

### Testing

Manual testing performed with various notebooks:

- Markdown rendering (headers, lists, tables, links, code blocks)
- Code cells with execution counts and syntax highlighting
- Multiple output types (text, images, HTML, LaTeX, errors, streams)
- Error handling for edge cases
- Theme compatibility (light/dark mode)

### Screenshots

<img width="1080" height="553" alt="image"
src="https://github.com/user-attachments/assets/aef9afa7-ed96-434d-98b0-b160565fc967"
/>
<img width="1092" height="552" alt="image"
src="https://github.com/user-attachments/assets/6e61e792-4737-41c1-851e-5c375c1f932a"
/>
<img width="1104" height="622" alt="image"
src="https://github.com/user-attachments/assets/4ac630c1-3a75-4e1c-9bba-c0a27484d001"
/>
<img width="1104" height="529" alt="image"
src="https://github.com/user-attachments/assets/33750c47-70de-4ab2-893d-e5d09fa8d9c4"
/>
<img width="1111" height="343" alt="image"
src="https://github.com/user-attachments/assets/52107d9f-0e06-420b-9ab4-1603dcd676b1"
/>
<img width="1091" height="650" alt="image"
src="https://github.com/user-attachments/assets/0addae21-efa4-44bb-a56e-0418e3d4d227"
/>
<img width="1077" height="298" alt="image"
src="https://github.com/user-attachments/assets/a3a8c5be-638c-45ff-82f3-816264254ead"
/>

### Dependencies

No new dependencies required:

- Chroma (existing) - Syntax highlighting
- Goldmark (existing) - Markdown rendering
- Standard library - JSON parsing

### Key Design Decisions

- Backend rendering for performance and security
- Reuses existing Gitea infrastructure (Chroma, Goldmark, sanitizer)
- Consistent styling with other markup renderers
- Graceful degradation for unsupported features

---

**Development Note:** This PR was developed with assistance from Amazon
Q Developer and Claude AI for implementation, debugging, and testing.

---------

Signed-off-by: Karthik Bhandary <34509856+karthikbhandary2@users.noreply.github.com>
Co-authored-by: karthik.bhandary <karthik.bhandary@kfintech.com>
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
Co-authored-by: bircni <bircni@icloud.com>
2026-06-14 15:52:37 +02:00

164 lines
3.9 KiB
Go

// Copyright 2022 The Gitea Authors. All rights reserved.
// SPDX-License-Identifier: MIT
package htmlutil
import (
"errors"
"fmt"
"html/template"
"io"
"slices"
"strings"
)
// ParseSizeAndClass get size and class from string with default values
// If present, "others" expects the new size first and then the classes to use
func ParseSizeAndClass(defaultSize int, defaultClass string, others ...any) (int, string) {
size := defaultSize
if len(others) >= 1 {
if v, ok := others[0].(int); ok && v != 0 {
size = v
}
}
class := defaultClass
if len(others) >= 2 {
if v, ok := others[1].(string); ok && v != "" {
if class != "" {
class += " "
}
class += v
}
}
return size, class
}
func htmlFormatArgs(s template.HTML, rawArgs []any) []any {
if !strings.Contains(string(s), "%") || len(rawArgs) == 0 {
panic("HTMLFormat requires one or more arguments")
}
args := slices.Clone(rawArgs)
for i, v := range args {
switch v := v.(type) {
case nil, bool, int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, float32, float64, template.HTML:
// for most basic types (including template.HTML which is safe), just do nothing and use it
case string:
args[i] = template.HTMLEscapeString(v)
case template.URL:
args[i] = template.HTMLEscapeString(string(v))
case fmt.Stringer:
args[i] = template.HTMLEscapeString(v.String())
default:
args[i] = template.HTMLEscapeString(fmt.Sprint(v))
}
}
return args
}
func HTMLFormat(s template.HTML, rawArgs ...any) template.HTML {
return template.HTML(fmt.Sprintf(string(s), htmlFormatArgs(s, rawArgs)...))
}
func HTMLPrintf(w io.Writer, s template.HTML, rawArgs ...any) (int, error) {
return fmt.Fprintf(w, string(s), htmlFormatArgs(s, rawArgs)...)
}
func HTMLPrint(w io.Writer, s template.HTML) (int, error) {
return io.WriteString(w, string(s))
}
func HTMLPrintTag(w io.Writer, tag template.HTML, attrs map[string]string) (written int, err error) {
n, err := io.WriteString(w, "<"+string(tag))
written += n
if err != nil {
return written, err
}
for k, v := range attrs {
n, err = fmt.Fprintf(w, ` %s="%s"`, template.HTMLEscapeString(k), template.HTMLEscapeString(v))
written += n
if err != nil {
return written, err
}
}
n, err = io.WriteString(w, ">")
written += n
return written, err
}
func EscapeString(s string) template.HTML {
return template.HTML(template.HTMLEscapeString(s))
}
type HTMLWriter interface {
OriginWriter() io.Writer
WriteString(s string) HTMLWriter
WriteHTML(s template.HTML) HTMLWriter
WriteFormat(fmt template.HTML, args ...any) HTMLWriter
Err() error
}
type htmlWriter struct {
w io.Writer
errs []error
}
func (h *htmlWriter) OriginWriter() io.Writer {
return h.w
}
func (h *htmlWriter) WriteString(s string) HTMLWriter {
if _, err := io.WriteString(h.w, template.HTMLEscapeString(s)); err != nil {
h.errs = append(h.errs, err)
}
return h
}
func (h *htmlWriter) WriteHTML(s template.HTML) HTMLWriter {
if _, err := io.WriteString(h.w, string(s)); err != nil {
h.errs = append(h.errs, err)
}
return h
}
func (h *htmlWriter) WriteFormat(fmt template.HTML, args ...any) HTMLWriter {
if _, err := HTMLPrintf(h.w, fmt, args...); err != nil {
h.errs = append(h.errs, err)
}
return h
}
func (h *htmlWriter) Err() error {
return errors.Join(h.errs...)
}
func NewHTMLWriter(w io.Writer) HTMLWriter {
return &htmlWriter{w: w}
}
type HTMLBuilder struct {
sb strings.Builder
}
func (b *HTMLBuilder) WriteString(s string) *HTMLBuilder {
b.sb.WriteString(template.HTMLEscapeString(s))
return b
}
func (b *HTMLBuilder) WriteHTML(s template.HTML) *HTMLBuilder {
b.sb.WriteString(string(s))
return b
}
func (b *HTMLBuilder) WriteFormat(fmt template.HTML, args ...any) *HTMLBuilder {
_, _ = HTMLPrintf(&b.sb, fmt, args...)
return b
}
func (b *HTMLBuilder) HTMLString() template.HTML {
return template.HTML(b.sb.String())
}
func (b *HTMLBuilder) String() string {
return b.sb.String()
}