Lustre is a distributed, high-performance cluster filesystem built for I/O performance and scalability in large and complex computing environments. It is well suited to data-intensive applications that require high I/O performance.

Lustre components

MDS – Metadata Server: The MDS makes the metadata stored in the metadata targets available to Lustre clients.

MDT – Metadata Target: The MDT stores metadata, such as filenames, directories, permissions, and file layout, on storage attached to the metadata server.

OSS – Object Storage Server: The OSS provides file I/O service and network request handling for the OSTs. The MDT, OSTs, and Lustre clients can run concurrently on a single node. However, a typical configuration is an MDT on a dedicated node, two or more OSTs on each OSS node, and a client on each of a large number of compute nodes.

OST – Object Storage Target: The OST stores data as data objects on one or more OSSs. A single Lustre file system can have multiple OSTs, each serving a subset of file data.

Client: The systems that mount the Lustre filesystem.
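
To make "each serving a subset of file data" a bit more concrete, here is a minimal sketch of how a byte offset in a file maps to one of its stripes, assuming plain round-robin (RAID 0 style) striping. The function name `stripe_index`, the 1 MiB stripe size, and the stripe count of 4 are all assumptions chosen for illustration; none of them come from the setup below.

```shell
#!/bin/bash
# Hypothetical helper, not part of Lustre: report which stripe (0-based)
# of a file's layout holds a given byte offset, assuming plain
# round-robin striping across stripe_count objects.
stripe_index() {
    local offset=$1 stripe_size=$2 stripe_count=$3
    echo $(( (offset / stripe_size) % stripe_count ))
}

# Assumed layout: 1 MiB stripes over 4 OST objects.
for offset in 0 1048576 4194304 5242880; do
    echo "offset $offset -> stripe $(stripe_index "$offset" 1048576 4)"
done
```

Each stripe lives in an object on a different OST, which is why adding OSTs can raise aggregate bandwidth for large files.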

Steps to create a Lustre filesystem

Configure Lustre Management Server (lustre-mgs.unixfoo.biz – Server 1)

  1. Add the disk to volume manager

    [root@lustre-mgs mnt]# pvcreate /dev/sdb
    Physical volume "/dev/sdb" successfully created

    [root@lustre-mgs mnt]# pvs
      PV         VG   Fmt  Attr PSize   Pfree
      /dev/sdb        lvm2 --   136.73G 136.73G

  2. Create lustre volume group

    [root@lustre-mgs mnt]# vgcreate lustre /dev/sdb
      Volume group "lustre" successfully created
    [root@lustre-mgs mnt]# vgs
      VG     #PV #LV #SN Attr   VSize   Vfree
      lustre   1   0   0 wz--n- 136.73G 136.73G

  3. Create logical volume "MGS" (the management server)

    [root@lustre-mgs ~]# lvcreate -L 25G -n MGS lustre

  4. Create Lustre Management filesystem.

    [root@lustre-mgs ~]# mkfs.lustre --mgs /dev/lustre/MGS

       Permanent disk data:
    Target:     MGS
    Index:      unassigned
    Lustre FS:  lustre
    Mount type: ldiskfs
    Flags:      0x74
                  (MGS needs_index first_time update )
    Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
    Parameters:

    checking for existing Lustre data: not found
    device size = 10240MB
    2 6 18
    formatting backing filesystem ldiskfs on /dev/lustre/MGS
            target name  MGS
            4k blocks     0
            options        -J size=400 -q -O dir_index,uninit_groups -F

    mkfs_cmd = mkfs.ext2 -j -b 4096 -L MGS  -J size=400 -q -O dir_index,uninit_groups -F /dev/lustre/MGS
    Writing CONFIGS/mountdata
    [root@lustre-mgs ~]#

  5. Activate the Lustre management filesystem with the mount command (create the /lustre/MGS mount point first if it does not exist).

    [root@lustre-mgs ~]# mount -t lustre /dev/lustre/MGS /lustre/MGS/
    [root@lustre-mgs ~]# df
    Filesystem           1K-blocks      Used Available Use% Mounted on
    /dev/sda2             54558908   5276572  46466144  11% /
    /dev/sda1               497829     29006    443121   7% /boot
    tmpfs                  8216000         0   8216000   0% /dev/shm
    /dev/lustre/MGS       10321208    433052   9363868   5% /lustre/MGS
    [root@lustre-mgs ~]#

Configure Lustre Metadata Server (In this guide, both the management and metadata servers run on the same host)

  1. Create logical volume "MDT_unixfoo_cloud"

    [root@lustre-mgs ~]# lvcreate -L 25G -n MDT_unixfoo_cloud lustre

  2. Create the Lustre metadata filesystem for the filesystem “unixfoo_cloud”.

    [root@lustre-mgs ~]# mkfs.lustre --fsname=unixfoo_cloud --mdt  --reformat --mgsnode=lustre-mgs@tcp0 /dev/lustre/MDT_unixfoo_cloud
       Permanent disk data:
    Target:     unixfoo_cloud-MDTffff
    Index:      unassigned
    Lustre FS:  unixfoo_cloud
    Mount type: ldiskfs
    Flags:      0x71
                  (MDT needs_index first_time update )

    Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
    Parameters: mgsnode=10.217.0.237@tcp mdt.group_upcall=/usr/sbin/l_getgroups
    device size = 20480MB
    2 6 18

    formatting backing filesystem ldiskfs on /dev/lustre/MDT_unixfoo_cloud
            target name  unixfoo_cloud-MDTffff
            4k blocks     0
            options        -J size=400 -i 4096 -I 512 -q -O dir_index,uninit_groups -F
    mkfs_cmd = mkfs.ext2 -j -b 4096 -L unixfoo_cloud-MDTffff  -J size=400 -i 4096 -I 512 -q -O dir_index,uninit_groups -F /dev/lustre/MDT_unixfoo_cloud
    Writing CONFIGS/mountdata
    [root@lustre-mgs ~]#

  3. Activate the metadata filesystem using the mount command.

    [root@lustre-mgs ~]# mkdir /lustre/MDT_unixfoo_cloud
    [root@lustre-mgs ~]# mount -t lustre  /dev/lustre/MDT_unixfoo_cloud /lustre/MDT_unixfoo_cloud
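
The MGS and MDT steps above can be collected into one script. The following is a dry-run sketch only: `DRY_RUN`, `run`, and `setup_mgs_mdt` are illustrative names that do not appear in the original steps, and with `DRY_RUN=1` (the default here) the commands are printed rather than executed. The commands themselves are the ones used above, plus a `mkdir -p` for the MGS mount point.

```shell
#!/bin/bash
# Dry-run sketch of the MGS and MDT steps. DRY_RUN, run() and
# setup_mgs_mdt() are illustrative names; with DRY_RUN=1 (the default)
# every command is only printed, prefixed with "+ ".
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_mgs_mdt() {
    local disk=$1 fsname=$2 mgsnode=$3

    # LVM: physical volume and volume group
    run pvcreate "$disk"
    run vgcreate lustre "$disk"

    # Management server target
    run lvcreate -L 25G -n MGS lustre
    run mkfs.lustre --mgs /dev/lustre/MGS
    run mkdir -p /lustre/MGS
    run mount -t lustre /dev/lustre/MGS /lustre/MGS

    # Metadata target for the named filesystem
    run lvcreate -L 25G -n "MDT_$fsname" lustre
    run mkfs.lustre --fsname="$fsname" --mdt --reformat --mgsnode="$mgsnode" "/dev/lustre/MDT_$fsname"
    run mkdir -p "/lustre/MDT_$fsname"
    run mount -t lustre "/dev/lustre/MDT_$fsname" "/lustre/MDT_$fsname"
}

setup_mgs_mdt /dev/sdb unixfoo_cloud lustre-mgs@tcp0
```

Review the printed commands first; running with `DRY_RUN=0` as root would execute them, and `pvcreate` and `mkfs.lustre --reformat` destroy existing data on the device.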

Configure OSTs (servers: oss1, oss2, ...)

  1. Add /dev/md0 to volume manager

    [root@oss1 ~]# pvcreate /dev/md0

  2. Create volume group "lustre"

    [root@oss1 ~]# vgcreate lustre /dev/md0

  3. Create logical volume (OST) for unixfoo_cloud

    [root@oss1 ~]# lvcreate -n OST_unixfoo_cloud_1 -L 100G lustre

  4. Create OST lustre filesystem

    [root@oss1 ~]# mkfs.lustre --fsname=unixfoo_cloud --ost --mgsnode=lustre-mgs@tcp0 /dev/lustre/OST_unixfoo_cloud_1
    mkfs_cmd = mkfs.ext2 -j -b 4096 -L unixfoo_cloud-OSTffff  -J size=400 -i 16384 -I 256 -q -O dir_index,uninit_groups -F /dev/lustre/OST_unixfoo_cloud_1
    Writing CONFIGS/mountdata
    [root@oss1 ~]#

  5. Activate the OST using the mount command

    [root@oss1 ~]# mkdir -p /lustre/unixfoo_cloud_oss1
    [root@oss1 ~]# mount -t lustre /dev/lustre/OST_unixfoo_cloud_1 /lustre/unixfoo_cloud_oss1
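
Since the same steps are repeated on every OSS, the mkfs.lustre command line can be generated per OST. The sketch below only prints the commands. The helper name `ost_mkfs_cmd` and the one-logical-volume-per-OSS naming are assumptions. It also passes `--index`, which the steps above do not use, so that each target gets a fixed name (OST0000, OST0001, ...) instead of the unassigned `OSTffff` seen in the output above.

```shell
#!/bin/bash
# Hypothetical helper: print the mkfs.lustre command for one OST,
# giving it an explicit index instead of leaving the index unassigned.
ost_mkfs_cmd() {
    local fsname=$1 mgsnode=$2 index=$3 device=$4
    echo "mkfs.lustre --fsname=$fsname --ost --index=$index --mgsnode=$mgsnode $device"
}

# Assumed layout: one OST logical volume per OSS, with volumes numbered
# from 1 as in the post and Lustre indexes numbered from 0.
index=0
for oss in oss1 oss2; do
    lv="/dev/lustre/OST_unixfoo_cloud_$((index + 1))"
    echo "[$oss] $(ost_mkfs_cmd unixfoo_cloud lustre-mgs@tcp0 "$index" "$lv")"
    index=$((index + 1))
done
```

Each printed command is meant to be run on the OSS named in brackets, after the logical volume has been created there.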

Mount on the client:

  1. Mount the lustre filesystem unixfoo_cloud

    [root@lustreclient1 ~]# mount -t lustre lustre-mgs@tcp0:/unixfoo_cloud /mnt
    [root@lustreclient1 ~]# df -h
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/sda2              52G  5.1G   44G  11% /
    /dev/sda1             487M   29M  433M   7% /boot
    tmpfs                 7.9G     0  7.9G   0% /dev/shm
    lustre-mgs@tcp0:/unixfoo_cloud
                           99G  461M   93G   1% /mnt
    [root@lustreclient1 ~]#

  2. Done.
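
For scripts that depend on the mount, a quick check can replace reading the df output by eye. This is a sketch assuming a Linux client where /proc/mounts is available; the function name `is_lustre_mounted` is hypothetical, and the optional second argument exists only so the check can be tried against a sample file.

```shell
#!/bin/bash
# Hypothetical helper: succeed if mount_point is listed with filesystem
# type "lustre" in a mounts table (default: /proc/mounts on Linux).
# Fields in that table are: device, mount point, type, options, ...
is_lustre_mounted() {
    local mount_point=$1 mounts_file=${2:-/proc/mounts}
    awk -v mp="$mount_point" \
        '$2 == mp && $3 == "lustre" { found = 1 } END { exit !found }' \
        "$mounts_file"
}

if is_lustre_mounted /mnt; then
    echo "/mnt is a Lustre mount"
else
    echo "/mnt is not a Lustre mount"
fi
```

On a mounted client, `lfs df -h /mnt` additionally breaks the usage down per MDT and OST.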

Deduplication is the elimination of redundant data in storage. In the deduplication process, duplicate data is deleted, leaving only one copy of the data to be stored. The index of all data is still retained, however, in case that data is ever required. Deduplication reduces the required storage capacity because only the unique data is stored. NetApp supports deduplication in which only unique blocks in the flexible volume are stored, at the cost of a small amount of additional metadata created by the dedup process. NetApp deduplication allows duplicate 4KB blocks anywhere in the flexible volume to be removed so that a single copy is stored.

The core enabling technology of deduplication is fingerprints: digital signatures computed for every 4KB data block in the flexible volume. When deduplication runs for the first time on a flexible volume with existing data, it scans the blocks in the volume and creates a fingerprint database, which contains a sorted list of the fingerprints of all used blocks. Once the fingerprint database is created, the fingerprints are checked for duplicates. When a duplicate fingerprint is found, a byte-by-byte comparison of the blocks is done first to make sure that they are indeed identical. If they are, the block’s pointer is updated to the already existing data block, the duplicate data block is released, and the inode is updated.
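
The fingerprint-then-compare idea can be tried on an ordinary file. The sketch below is not how the filer implements it: SHA-256 is an assumed stand-in for the fingerprint, a temporary directory stands in for the fingerprint database, and `count_unique_blocks` is a hypothetical name. It assumes a Linux system with `sha256sum`, `split`, and `cmp` available.

```shell
#!/bin/bash
# Sketch of fingerprint-based block deduplication on an ordinary file.
# SHA-256 is an assumed stand-in for the fingerprint; the filer's real
# fingerprint function and on-disk structures are not modelled here.
count_unique_blocks() {
    local file=$1 block_size=${2:-4096}
    local workdir block fp unique=0
    workdir=$(mktemp -d) || return 1
    mkdir "$workdir/blocks" "$workdir/store"

    # Cut the file into fixed-size blocks (-a 8 allows many pieces).
    split -a 8 -b "$block_size" "$file" "$workdir/blocks/b."

    for block in "$workdir"/blocks/b.*; do
        [ -e "$block" ] || continue          # empty input file
        fp=$(sha256sum "$block" | cut -d' ' -f1)
        if [ -e "$workdir/store/$fp" ] && cmp -s "$block" "$workdir/store/$fp"; then
            # Fingerprints match and the bytes are identical:
            # a true duplicate, so nothing new is stored.
            continue
        fi
        # New fingerprint (or a fingerprint collision): keep the block.
        cp "$block" "$workdir/store/$fp.$unique"
        [ -e "$workdir/store/$fp" ] || cp "$block" "$workdir/store/$fp"
        unique=$((unique + 1))
    done

    rm -rf "$workdir"
    echo "$unique"
}

# Sample data: three 4KB blocks, the first and third identical.
sample=$(mktemp)
{ head -c 4096 /dev/zero; head -c 4096 /dev/zero | tr '\0' 'A'; head -c 4096 /dev/zero; } > "$sample"
echo "unique 4KB blocks: $(count_unique_blocks "$sample")"
rm -f "$sample"
```

The `cmp` call plays the role of the byte-by-byte comparison: a matching fingerprint alone is treated as a hint, not as proof that two blocks are identical.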

NetApp deduplication commands:

  1. Enable dedup (asis) license.

    fractal-design> license add <license_code>

  2. If you have a new flex volume which was just created, follow this step to enable ASIS deduplication

    fractal-design> sis on /vol/demovol
    Deduplication for "/vol/demovol" is enabled.
    Already existing data could be processed by running "sis start -s /vol/demovol"

  3. If you already have a flex volume with data in it, follow this step.

    fractal-design> sis start -s /vol/demovol

  4. Checking the status of deduplication.

    fractal-design> vol status demovol
    Volume          State   Status          Options
    demovol         online  raid_dp, flex   nosnap=on
                            sis
    Containing aggregate: 'aggr0'
    fractal-design>

    fractal-design> sis status /vol/demovol
    Path            State   Status      Progress
    /vol/demovol    Enabled Idle        Idle for 00:02:12
    fractal-design>

  5. Check the storage space saved due to deduplication

    fractal-design> df -s /vol/demovol
    Filesystem      used    saved   %saved
    /vol/demovol/   9316052 0       0%
    fractal-design>

  6. To run deduplication on this volume again later, just do a “sis start /vol/demovol”.
  7. Deduplication can be scheduled using the “sis config” command.
  8. Done.

