Storage & Data Management

Managing disks, data transfers, and storage on SynpixCloud

Overview

SynpixCloud instances come with configurable storage options. Understanding how storage works helps you manage your data effectively and optimize costs.

Storage Types

System Disk

  • Purpose - Operating system, software, and dependencies
  • Default size - Varies by image (typically 50-100 GB)
  • Persistence - Tied to instance lifecycle

Data Disk

  • Purpose - Datasets, models, and user data
  • Configurable - Choose size at instance creation
  • Retention - Can be preserved when instance stops
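
To see how these disks appear inside the instance, standard Linux tools work on a typical image (device names such as /dev/vda vary by instance):

# List block devices and their mount points
lsblk

# Show usage per mounted filesystem
df -h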

Disk Retention

When you stop or terminate an instance, you have options for your data:

Option         Data Preserved   Storage Cost          Best For
No Retention   No               No ongoing cost       Temporary workloads
Retain Disk    Yes              Ongoing storage fee   Long-term projects

Without disk retention enabled, all data is lost when the instance is terminated. Always back up important data!

Enabling Disk Retention

You can enable disk retention in either of two ways:

  • During instance creation, enable Disk Retention
  • For an existing instance, go to Settings > Storage and toggle disk retention before stopping it

Expanding Disk Space

If you need more storage space:

Online Expansion

Some instances support live disk expansion:

  1. Go to instance Settings > Storage
  2. Click Expand Disk
  3. Select new size
  4. Confirm and pay the difference
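
After the expansion completes, you can confirm the instance sees the larger device before touching the filesystem; /dev/vda below is a typical device name, yours may differ:

# The disk should show the new size; its partition may still show the old one
lsblk /dev/vda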

Extending Filesystem

After expanding the disk, extend the filesystem:

# Check current filesystem usage
df -h

# Grow the partition to fill the disk, if needed
# (the device name /dev/vda and partition number 1 are typical; verify with lsblk)
sudo growpart /dev/vda 1

# Extend an ext4 filesystem
sudo resize2fs /dev/vda1

# Or, for an XFS filesystem
sudo xfs_growfs /
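
If you are unsure which filesystem the disk uses, check before choosing between resize2fs and xfs_growfs:

# Show the filesystem type of the root mount (ext4, xfs, etc.)
df -T /

# Or list filesystem types for all block devices
lsblk -f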

Data Transfer

Uploading Files

Using SCP

# Upload single file
scp -P <port> local_file.zip root@<host>:/root/

# Upload directory
scp -P <port> -r local_folder root@<host>:/root/

Using rsync

# Sync with progress
rsync -avz --progress local_folder/ root@<host>:/root/data/ -e "ssh -p <port>"

# Resume interrupted transfer
rsync -avz --partial --progress large_file.zip root@<host>:/root/ -e "ssh -p <port>"

Using SFTP

sftp -P <port> root@<host>
sftp> put local_file.zip
sftp> get remote_file.zip
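
To avoid retyping the port and host on every transfer, you can define an alias in ~/.ssh/config; the alias name synpix below is just a placeholder:

# ~/.ssh/config
Host synpix
    HostName <host>
    Port <port>
    User root

# scp, sftp, and rsync then work without -P/-p flags
scp local_file.zip synpix:/root/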

Downloading Files

# Download from instance to local
scp -P <port> root@<host>:/root/results.zip ./

# Download directory
scp -P <port> -r root@<host>:/root/output ./
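
For directories containing many small files, streaming a tar archive over SSH is often faster than per-file scp; a minimal sketch:

# Pack on the instance, unpack locally, in one stream
ssh -p <port> root@<host> "tar -czf - -C /root output" | tar -xzf -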

Large Dataset Transfer

For very large datasets, consider:

  1. Cloud storage sync - Use rclone to sync with S3, GCS, etc. (example below)
  2. wget/curl - Download directly from URLs (see the resume example after the rclone block)
  3. HuggingFace Hub - Use huggingface-cli for model downloads (covered in the next section)

# Install rclone
curl https://rclone.org/install.sh | sudo bash

# Configure cloud storage
rclone config

# Sync from cloud
rclone sync remote:bucket/data /root/data
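
For direct URL downloads (option 2 above), both wget and curl can resume interrupted transfers; the URL below is a placeholder:

# -c resumes a partial download
wget -c https://example.com/dataset.tar.gz

# curl equivalent: follow redirects and resume automatically
curl -L -C - -O https://example.com/dataset.tar.gz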

HuggingFace Model Downloads

Download models efficiently from HuggingFace:

# Install the CLI
pip install huggingface_hub

# Login (optional, for private models)
huggingface-cli login

# Download a model
huggingface-cli download meta-llama/Llama-2-7b --local-dir ./models
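
If you only need part of a repository, the CLI accepts glob patterns; the pattern below is an example:

# Download only the safetensors weights
huggingface-cli download meta-llama/Llama-2-7b --include "*.safetensors" --local-dir ./models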

Mirror Acceleration (China)

For faster downloads in China, use a mirror:

# Set HuggingFace mirror
export HF_ENDPOINT=https://hf-mirror.com

# Then download normally
huggingface-cli download model-name
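
To make the mirror persist across shell sessions, append it to your shell profile (assuming bash):

echo 'export HF_ENDPOINT=https://hf-mirror.com' >> ~/.bashrc
source ~/.bashrc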

Storage Best Practices

1. Organize Your Data

/root/
├── code/           # Your source code
├── data/           # Datasets
├── models/         # Trained models
├── checkpoints/    # Training checkpoints
└── output/         # Results and logs
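
You can create this layout in one command using bash brace expansion:

mkdir -p /root/{code,data,models,checkpoints,output}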

2. Regular Backups

# Back up to the local machine
rsync -avz root@<host>:/root/important/ ./backup/ -e "ssh -p <port>"

# Back up to cloud storage
rclone sync /root/important remote:bucket/backup
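
To run the cloud backup on a schedule, a cron entry is one lightweight option, assuming cron is available on your image (the daily 03:00 time is just an example):

# Open the crontab editor
crontab -e

# Add a line like this to sync every day at 03:00
0 3 * * * rclone sync /root/important remote:bucket/backup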

3. Clean Temporary Files

# Clear pip cache
pip cache purge

# Clear conda cache
conda clean --all

# Clear apt cache
sudo apt clean

# Find large files
du -sh /* | sort -hr | head -20
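
Framework caches under ~/.cache are another common space consumer; inspect before deleting anything you may still need:

# See which caches are largest (HuggingFace, pip, torch, etc.)
du -sh ~/.cache/* 2>/dev/null | sort -hr

# Example: remove the HuggingFace hub cache if its models can be re-downloaded
rm -rf ~/.cache/huggingface/hub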

4. Use Compression

# Compress before transfer
tar -czvf data.tar.gz data/

# Extract on remote
tar -xzvf data.tar.gz
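
On multi-core instances, pigz (parallel gzip) can speed up compression considerably; it usually needs installing first:

# Install pigz
sudo apt install -y pigz

# Compress using all available cores
tar --use-compress-program=pigz -cvf data.tar.gz data/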

Storage Monitoring

Check Disk Usage

# Overall disk usage
df -h

# Directory sizes
du -sh /root/*

# Find largest files
find /root -type f -exec du -h {} + | sort -hr | head -20

Set Up Alerts

Monitor disk usage in your training scripts:

import shutil

def check_disk_space(path="/", threshold_gb=10):
    """Warn when free space at `path` drops below `threshold_gb`."""
    total, used, free = shutil.disk_usage(path)  # all values in bytes
    free_gb = free // (2**30)  # bytes -> GiB
    if free_gb < threshold_gb:
        print(f"Warning: Only {free_gb}GB free space remaining!")
    return free_gb

Troubleshooting

Disk Full

  1. Check what's using space: du -sh /* | sort -hr
  2. Clear caches (pip, conda, apt)
  3. Remove unused files/checkpoints
  4. Consider expanding disk

Cannot Write to Disk

  1. Check permissions: ls -la /path/to/dir
  2. Check disk space: df -h
  3. Check if filesystem is mounted read-only: mount | grep " / "

Slow Transfer Speeds

  1. Use compression: rsync -avz
  2. Use parallel transfers for many small files
  3. Check network bandwidth on both ends

Support

Storage questions? Contact us at support@synpixcloud.com