Open Source Photogrammetry: Ditching 123D Catch

This is part one in a series on open source photogrammetry. In part two, I’ll flesh out more VFX-centric application of this workflow. Before I start, big thanks to:

  • Dan Short: for showing me his awesome 123d models that sparked this whole idea
  • Hannah Davis: debugging + emotional support + snacks

So a few weeks ago, Dan Short showed me 123D Catch. It was awesome. You feed in some pictures from your iPhone and they get uploaded to the cloud where they’re turned back into a textured 3D model you can explore on your phone or download on your computer. 123D Catch (henceforth also referred to as “Catch”) is part of Autodesk’s consumer line of 3D technologies, which includes products for 3D modeling on the iPad and for producing 3D prints.

Until Dan showed me some models he generated from exhibits at the AMNH I didn’t really get the point of Catch…so what, you have a model of your water bottle…but what Dan showed me was that it worked incredibly well on environments too: The Hall of African Mammals or even the penguin diorama from the infamous whale room! This is all done using a process called Photogrammetry or Structure from Motion (SFM), where the computer finds features that are common to several images and constructs a 3D form from those overlapping features.

photogrammetry explanation

I remembered seeing something like this years ago: a product demo called Photosynth from Microsoft, which did this sort of reconstruction from thousands of tourist photos of the Notre Dame Cathedral. It had always seemed like ultimate vaporware, but now it was here, on an iPhone! I also remembered watching a product demo for a piece of software called ImageModeler from Autodesk a few years back that was more manual, but still pretty amazing. Was this futuristic tech (FUTURETECH?!1!!) finally in our hands?

But…..I wasn’t thrilled by 123D Catch’s black box nature, and wondered whether there were any open source alternatives that could be used that wouldn’t have as many limitations. Surely, a lot of power was being sacrificed by its fully automatic workflow? Here are some other ways that 123D Catch is quite limited:

  • Photo limits: the iPhone app seems to allow a maximum of 40 images. (The non-mobile version for Windows is limited to 70.) There’s no technical reason why there should be any limit. Plus, capturing larger 3D scenes like detailed environments will require more than 70 pictures pretty quickly.
  • Down-rezed photos: pretty sure in order to speed the photo’s “ascent” into the cloud, the pictures are scaled down limiting their detail and use as a high-rez texture during the projection phase.
  • Limited texture map size: in my few tests, the texture maps returned from 123D Catch’s automatic processes are returned at a given size…you have no control over how big a map you want.
  • Total blackbox: no controls to guide the 3D reconstruction or manipulate the results. This process is pretty intricate and having no controls seems a little scary.
  • Lightweight output by design: Autodesk’s 123D line (Catch included) isn’t meant to be a professional solution: this is meant to 3D-ize trinkets and give you something to 3D print with, not create high-rez models.

If I wasn’t going to use 123D Catch, I had to find some alternative. Since I’m not into pirating software, and I wanted to see how far I could get with $0.00, I decided to investigate what was available and Open Source. Lots has been written on the pros and cons of Open Source software, but I’ve learned over time that often the FOSS (Free and Open Source Software) tools are on par with or exceed their commercial brethren. Since the emphasis is never on turning a profit, FOSS software is rarely intentionally crippled à la 123D Catch’s photo size and quantity limits.

I started to do research on FOSS alternatives, but getting them up and running took a bit of time. Over time, I discovered Bundler (which was actually created from open sourced components of the Photosynth project!), and RunSFM. Each of them mentioned on their homepages to check out another product, VisualSFM, which was more up to date and represented the state of the art in FOSS photogrammetry technology. (For a list of SFM software, free and commercial, check out this list. Special shout out to the Python Photogrammetry Toolkit (PPT) which looks promising…check this out here )

After finding some FOSS solutions and after TONS of research, math, and picture taking (and seeing other people’s results like these!), I’d like to present a totally FOSS pipeline for converting images into textured 3D models. This document represents hours of work and covers all the steps of creating textured 3D geometry from unregistered images.

Here are the steps:

Part 1: VisualSFM

VisualSFM lets us load up a folder of images, find unique features in each image, solve a set of these images into a 3D model, and then refine that model into a dense point cloud. The two outputs of this step are:

  • a .out file (bundler format) which stores the calculated (solved) cameras’ positions and a sparse point cloud representing the points in the scene that the software used to determine the camera positions.
  • a .ply file which stores a denser cloud of points, each with position, color and normal (a vector perpendicular to the surface the point is on) data.
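
To make the first output a little less abstract, here is a minimal sketch of reading the cameras and sparse points out of a Bundler-style .out file. It assumes the plain-text "Bundle file v0.3" layout (a counts line, five lines per camera, three lines per point); the `Camera` class, the `parse_bundler` name and the sample data are made up for illustration and are not part of VisualSFM.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Camera:
    focal: float                  # focal length in pixels
    k1: float                     # radial distortion coefficients
    k2: float
    rotation: List[List[float]]   # 3x3 rotation matrix, row by row
    translation: List[float]      # 3-vector


def parse_bundler(text: str) -> Tuple[List[Camera], List[Tuple[float, float, float]]]:
    """Parse cameras and sparse point positions from Bundler v0.3 text."""
    # Drop blank lines and the "# Bundle file v0.3" comment header.
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    num_cameras, num_points = (int(v) for v in lines[0].split())
    pos = 1

    cameras = []
    for _ in range(num_cameras):
        focal, k1, k2 = (float(v) for v in lines[pos].split())
        rotation = [[float(v) for v in lines[pos + r].split()] for r in (1, 2, 3)]
        translation = [float(v) for v in lines[pos + 4].split()]
        cameras.append(Camera(focal, k1, k2, rotation, translation))
        pos += 5  # one intrinsics line, three rotation rows, one translation line

    points = []
    for _ in range(num_points):
        x, y, z = (float(v) for v in lines[pos].split())
        points.append((x, y, z))
        pos += 3  # position line, then colour line and view list (skipped here)
    return cameras, points


# Hypothetical one-camera, one-point file, just to exercise the parser.
SAMPLE = """# Bundle file v0.3
1 1
1000.0 0.0 0.0
1 0 0
0 1 0
0 0 1
0 0 0
0.5 0.25 2.0
255 128 0
1 0 42 10.0 -5.0
"""

cameras, points = parse_bundler(SAMPLE)
print(len(cameras), "camera(s),", len(points), "sparse point(s)")
```

Each sparse point also carries a colour and a list of the cameras that saw it; the sketch skips those two lines to stay short.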

If you want to see a video of this process, check this out.

Part 2: Meshlab

Meshlab allows us to do basically anything to a 3D mesh or a point cloud. We’ll take the two outputs from the above steps and produce a textured, clean, high-res mesh. Meshlab also automatically calculates UV maps (the basis for 3D texturing) and builds a texture for us by estimating which projector is most accurate on each spot of the model, which is insanely cool. The outputs of this step are:

  • a .obj file of a mesh (with UVs) that can be easily interchanged between various 3D software packages
  • a .png file of arbitrary size representing the texture of the mesh

Part 3: Applications

Now that you have a textured 3d model of something, what can you do with it? I offer a ton of ideas.


Before I detail the process, I want to establish a few things first.

  • THIS IS SUPER EASY: I can’t tell you how much easier this is than it looks…this guide just tries to cover as many edge cases as possible.   Anyone can do this (if you can get VisualSFM running).
  • I’m not an expert in this field at all: I’ve been basically immersing myself in it for the last two weeks and this post represents a massive brain dump of all the possibilities and angles I’ve discovered and thought about. There are probably things that are flat out wrong or poorly explained. Please e-mail me or comment if I’m wrong.
  • Hardware Requirements: Since VisualSFM and Meshlab are both available for Mac OS, Linux and Windows, this pipeline should be viable anywhere. I would strongly recommend a modern GPU as well as a 64-bit system with loads of ram. VisualSFM seems mostly CPU/GPU bound and meshing seems to be mostly memory bound.
  • Licensing Restrictions: Both 123D Catch and VisualSFM have pretty severe licensing restrictions. I’m not a lawyer, but I’m pretty sure that Autodesk owns whatever you upload to them via Catch and that you are prohibited from using VisualSFM for commercial uses. Use this for personal projects only, and if you love the process, find some less restrictive software!
  • Expectation Management: this process is hardly an exact science, and your mesh at the end might look a little funky. Don’t expect a perfect mesh, especially for a complex object or scene, especially your first time through the process. As you learn to take better pictures and learn the ins-and-outs of the flow, you’ll be able to produce better models. Also, you’d be surprised how much better your mesh will look when your textures have been applied. If you’re prepared to do some 3d modeling at the end of this process to compensate for meshing issues, you can produce amazing results.
  • 123D isn’t that bad: Despite my negative comments about 123D Catch before, I learned in my research that Autodesk has licensed some incredibly expensive high-end software called Smart3DCapture from a French company called Acute3D. Despite Autodesk’s insistence on down-rezing your images / not returning 3D camera data / limiting the number of images, the software that’s doing the math behind the scenes is actually incredibly powerful. I’m speculating here based on limited experience, but it seems that Catch (Smart3DCapture) does a better job matching images and meshing them than my solution does some of the time, but that you have way more control over the input / output using my pipeline. Taking pictures with a structured and regimented approach will allow this pipeline to overcome the comparative disadvantages of this free technology and produce awesome results. And if you’re bummed out at the quality, go buy Smart3DCapture.
  • vs. LIDAR: LIDAR (LIght Detection And Ranging) scanners are special machines that scan an environment in a sphere, capturing both color as well as 3d position data. If you remember the “House of Cards” music video by Radiohead, that was all captured (and released for home use!) using LIDAR. It’s also how the Google Car “sees“. They’re insanely expensive but exist as the gold standard for environment scanning. Photogrammetric solutions are starting to get pretty close to what LIDAR is capable of, but if you’ve got the bucks, you’ll probably get better results from LIDAR scans. Check out this awesome tutorial on reconstructing sets using LIDAR and Mari by Scott Metzger. I haven’t had any experience with LIDAR scans, and I imagine they include their own proprietary software, but if you can get a .ply file and camera positions, then you can plug that into step 2 of my workflow and keep on trucking!
  • I’m a compositor: I’m a freelance compositor for a living and we often have to partially reconstruct environments using footage from a moving camera. Even though I’m trying to stay domain agnostic and separate most of the compositor stuff into a second post, some stuff does creep in from time to time. In particular, any mentions of Nuke pertain to The Foundry’s Nuke, the software I use for work. It’s incredibly expensive so I don’t expect anyone to pick it up from reading this tutorial or its follow-up, but I’m happy to talk your ear off about it if it seems interesting.

Pre-requisite: Compiling VisualSFM:

A core part of this pipeline is VisualSFM, the photogrammetry software. No two ways about it though…getting VisualSFM to run is no easy task. The program wraps several other open source photogrammetry tools / pre-requisites that each have their own pre-requisites that need to be compiled. I had such a hard time getting it to run on Mac OSX that I initially got it running on Linux inside a virtual machine. I eventually figured out how to get it running on my native OS in order to take advantage of some GPU accelerated elements of the workflow.

  • For Windows: seems to be not that bad, just follow the instructions on the VisualSFM site.
  • For OSX: use this set of helper scripts
  • For Linux: I followed the guide from the VisualSFM website, as well as this guide here. Specifically, the comment towards the bottom that gives the apt-get command needed to get all the base requirements was very helpful, and is repeated here:

    sudo apt-get install libc6-dev-i386 libhdf5-serial-dev libgtest-dev libgsl0-dev python-dev cmake imagemagick libmagick++-dev gfortran minpack-dev liblapack-dev libatlas-dev libatlas-base-dev libboost-dev libc6-dev-i386 libgsl0-dev

If you’re having trouble compiling the software, drop me a line and I’ll try to help you! I should say, that, even without the rest of the steps in my workflow, playing around with VisualSFM is great fun. Worth the process of installation!

Flowchart View

If you get lost, refer to this handy flowchart, created using dot:
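
If you’d like to draw a similar chart yourself, here is a rough sketch that writes the pipeline out as DOT text. The node names and labels are my own summary of the steps in this post, not the contents of the original flowchart.

```python
# Each entry is (node id, label). The ids are arbitrary; the labels follow
# the steps described in this post.
STEPS = [
    ("photos", "Photographs"),
    ("sift", "VisualSFM: SIFT and match"),
    ("sparse", "VisualSFM: sparse reconstruction"),
    ("dense", "VisualSFM: dense reconstruction (cmvs / pmvs2)"),
    ("cloud", "Meshlab: import and clean dense cloud"),
    ("mesh", "Meshlab: Poisson meshing"),
    ("texture", "Meshlab: UVs and texture"),
]


def to_dot(steps):
    """Return a DOT digraph that chains the steps from top to bottom."""
    lines = ["digraph pipeline {", "  rankdir=TB;", "  node [shape=box];"]
    for name, label in steps:
        lines.append(f'  {name} [label="{label}"];')
    # Connect every step to the one that follows it.
    for (a, _), (b, _) in zip(steps, steps[1:]):
        lines.append(f"  {a} -> {b};")
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(STEPS)
print(dot_text)
```

Save the printed text as, say, pipeline.dot and render it with `dot -Tpng pipeline.dot -o pipeline.png`.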


Pipeline Part 1: VisualSFM

For this tutorial, I’ll be using a table still life I arranged:


Photographing for photogrammetry requires a different approach to regular photography. In order to get the best possible reconstruction, check here and here (read the section “Basic DOs and DONTs”). I’d like to emphasize a few other points:

  • Parallax: MOVE AROUND! Don’t stand still and turn your body: this won’t give the solver any sort of depth information. If you move in a 360 degree circle around an object, taking snaps every 10-20 degrees, as well as at several heights, you’re bound to leave no face un-photographed and also provide the solver as much information as possible. Remember: there is no actual limit to the number of pictures VisualSFM will integrate into its solve, the only factor is time.
  • Overlap: you can’t just take random snaps of the scene and expect it to work out (unless you don’t mind bringing it into the software and seeing where you missed spots and then taking more photos from those angles). In the end, you do have to be fairly methodical. Making sure to walk around in a circle and at various heights is a lot easier than modeling / texturing / rendering a photoreal environment, right? And you get used to it.
  • Photograph all angles: If you’re distributing a model at the end of this pipeline, expect people to spin and zoom all around / through it. That means if there’s a patch you don’t have any data for, people will see it and frown. Try to get everything. Roofs can be really tricky. Whatever you don’t photograph you’ll have to manually reconstruct and the whole point of this pipeline is to avoid that.
  • Textures: The material the objects in your scene are made of is incredibly important. Since the process relies on determining features common to several photos, it needs points or features to grab onto. Objects that are reflective (mirrors), very shiny (a metal teapot), or transparent (plastic, glass) will cause problems. Beware false points as well: specular highlights on shiny objects might seem like valid points, but as you move around the room, those points will “move” across the surface of the object and will throw off the solver. Try to make sure your scene has a lot of matte surfaces. There is a reason all photogrammetry demos are done of stone or marble statues. If your scene does have objects with “bad” materials, don’t worry…as long as it can find points to grab onto, you should be able to produce a solve. It’s just unlikely that the mesh that gets produced over those objects will be useful, and you’ll have to spend some time reconstructing manually.
  • Scale: reconstructing entire environments is totally possible, but be prepared to take a lot more photographs. Smaller objects require fewer images and the process will take less time. Any cleanup steps will also take dramatically less time.
  • Minimize subject motion: The software expects that everything in the scene is stationary when it analyzes the scene. If objects move around, the assumption that features are in the same place between photographs will be false, and VisualSFM will build a distorted model. Note that if you have a setup like this guy, you don’t have to worry about motion because all the cameras fire at once!
  • Controlled Environment: Choose a place where no one is likely to interrupt or disrupt your setup. If a chair moves during your photography, it’s going to confuse VisualSFM.
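
To get a feel for how many pictures the “circle around the object at several heights” advice implies, here is a back-of-the-envelope sketch. The function name and the choice of three heights are just illustrative assumptions.

```python
import math


def shots_for_orbit(step_degrees: float, heights: int) -> int:
    """Photos needed for full 360 degree rings at the given angular step."""
    per_ring = math.ceil(360 / step_degrees)
    return per_ring * heights


# A snap every 20 degrees versus every 10 degrees, at three heights.
coarse = shots_for_orbit(20, 3)
fine = shots_for_orbit(10, 3)
print(coarse, fine)
```

Even the coarse plan works out to 54 pictures, which is already past the 40 image cap mentioned earlier for the Catch iPhone app.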

Step A: Import a set of pictures

I tried to follow most of those maxims when I arranged the above still life (except intentionally in a few places). Pictures were taken with a Panasonic DMC-ZS7 Lumix camera. VisualSFM will generally produce better results the higher-rez your images are, but past 3200px the results get worse (the reasons are in the readme, if you’re curious). Before you import the images, make sure they are no larger than 3200px tall or wide. If you’re on a Mac, this is trivial with Automator!

Screen Shot 2013-07-12 at 3.15.10 AM

DON’T do what this picture depicts and scale the original images. Always make copies (Automator tells you to do this by default).

Sidebar: Resolution

There is some evidence that smaller images (<2k) actually improve feature detection and matching success. The camera I used for this project shot 4000×3000 pixel images. At first, I resized to 3200×2400 to obey VisualSFM’s maximum. But just for kicks, I also resized to 1280×960. The smaller images solved properly, and the larger batch didn’t. I can’t really explain this and it’s driving me crazy. My hunch is that since the process of matching features between images is done on the GPU and mine isn’t that great (Apple uses basically laptop GPUs in all their desktops except the Mac Pro) it wasn’t able to find all the matches in the larger pictures. I’m going to e-mail the developer of the software and try to figure out what’s up. In the meantime, I’d recommend scaling your images down to maybe around 2000px to start, plus or minus, especially if you have a good GPU. Go smaller if you’re having trouble fitting all the images into one model. Remember to keep the originals so you can try different smaller resolutions. Sorry this is so strange.
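
If you’d rather work out the target sizes yourself than trust a resize dialog, the arithmetic is simple. This is a small sketch (the `fit_within` name is made up) that scales the longest side down to a limit while keeping the aspect ratio; it only computes dimensions and doesn’t touch any image files.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int) -> Tuple[int, int]:
    """Scale (width, height) so the longest side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough, never upscale
    return (round(width * max_side / longest),
            round(height * max_side / longest))


# The two sizes tried above for a 4000x3000 source image.
print(fit_within(4000, 3000, 3200))
print(fit_within(4000, 3000, 1280))
```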

If you do have to resize your images, make sure to retain any EXIF metadata logged by the camera, as VisualSFM can use it to pick more accurate focal length and sensor size settings. Automator retains this info no problem. (VisualSFM installs a FOSS program called jhead for reading EXIF data from the command line. You can run it on images at any step in this workflow to double check the data is still there.)
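
If you don’t have jhead handy, a quick-and-dirty check is to walk the JPEG’s header segments and look for an APP1 segment that starts with the “Exif” signature. The sketch below is a simplified reader written for this post (it gives up on anything unusual such as padding bytes between segments) and it only tells you whether an EXIF block is present, not what is in it.

```python
import struct


def has_exif(jpeg_bytes: bytes) -> bool:
    """Return True if the JPEG data contains an APP1 segment holding EXIF."""
    if jpeg_bytes[:2] != b"\xff\xd8":          # every JPEG starts with SOI
        return False
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:            # not sitting on a marker
            return False
        marker = jpeg_bytes[pos + 1]
        if marker in (0xDA, 0xD9):             # image data or end of file
            return False
        # The two length bytes count themselves plus the payload.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        payload = jpeg_bytes[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length
    return False


# Two tiny hand-built byte strings standing in for real photos.
with_exif = (b"\xff\xd8\xff\xe1" + struct.pack(">H", 12)
             + b"Exif\x00\x00abcd" + b"\xff\xd9")
without_exif = b"\xff\xd8\xff\xe0" + struct.pack(">H", 4) + b"JF" + b"\xff\xd9"
print(has_exif(with_exif), has_exif(without_exif))
```

For a real file you would pass in something like `open(path, "rb").read()`.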

Now, click the second folder icon to import your sequence:


In the ensuing file browser dialog, hold shift down to select multiple images. After a brief delay, you should see a 2D grid of the imported images, ready to go! You can remove bad (i.e. blurry, out of focus, or duplicate) pictures from here, if you like, though reviewing them in a file browser is probably easier.

Screen Shot 2013-07-26 at 4.58.20 AM

Make sure you enable the task viewer if it hasn’t popped up yet, it’s in Tools –> Show TaskViewer and gives feedback into what VisualSFM is doing.

Step B: SIFT and match

Hitting the icon that looks like 4 arrows in an X shape on the toolbar will start the next step, SIFT and match.


This step encapsulates two steps. The first process, “SIFTing” is where unique features of each image are detected and logged. VisualSFM will accelerate this process on machines that have almost any modern GPU, which is cool.

Screen Shot 2013-07-12 at 3.16.51 AM

Next, during the matching phase, these features are matched between images. You can see in the log all the comparisons being made between each image and every other. The number of comparisons that have to be made grows quadratically with the number of images (n images means n(n-1)/2 pairs), so this can potentially take quite a while.

Screen Shot 2013-07-12 at 3.17.06 AM

Another way of doing the comparisons is a neighbor comparison to only the n images on either side of each image. While this might work well for a linear camera move, I worry that it’ll miss overlap between early images and later ones that could tie the solve together, so I choose to use the longer / slower comparison.
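
To put some numbers on the two matching strategies, here is a small sketch that counts image pairs. It assumes the neighbor mode compares each image with the k images that follow it in sequence (which is the same set of pairs as “k on either side”, counted once); the function names are mine, not VisualSFM’s.

```python
def full_pairs(n: int) -> int:
    """Every image against every other image: n choose 2."""
    return n * (n - 1) // 2


def neighbor_pairs(n: int, k: int) -> int:
    """Each image against the next k images in the sequence."""
    return sum(min(k, n - 1 - i) for i in range(n))


for n in (50, 100, 200):
    print(n, full_pairs(n), neighbor_pairs(n, 5))
```

Doubling the number of photos roughly quadruples the full comparison count, while the neighbor count only doubles. That is the time you trade for the chance of tying early and late photos together.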

Note that if either process is interrupted, either by being paused or crashing out, it’ll resume where it left off the next time you try to SIFT and match those images.

Step C: Sparse 3D reconstruction

After the overlapping features have been detected, you can begin a sparse 3D reconstruction. Hit the button with the two arrows on it (>>) to start this process.


The GUI for VisualSFM is really awesome during this process as it’s totally interactive. You can pan, rotate and zoom through the 3D space as it tries to solve the feature overlaps into a 3D form. You can also see the position and rotation of the cameras that recorded the original images as they’re slotted into the solve. At any point, you can ctrl+mousewheel up or down in order to scale the photos that are reconstructing the scene to get an idea of which photos are being placed where. This process shouldn’t take too long.

I didn’t get any good screenshots of it solving the scene, but here are some when it’s all done.

Screen Shot 2013-07-26 at 4.59.55 AM

Screen Shot 2013-07-26 at 5.00.09 AM

Screen Shot 2013-07-26 at 5.00.59 AM

These last two pictures are the same, except I adjusted the size of the projectors up a bit so you can see how it’s fitting each camera / image into the scene. Unlike Catch, this process is interactive in that you can influence the outcome of the solve by removing bad cameras: ones that have been added to the solve in the wrong place or facing the wrong way. Here are the instructions, straight from the creator of VisualSFM:

  • Method 1
        0. turn off 3+
        1. Click F2, select the bad 3D points
        2. Click the 'hand' icon twice, it will delete the camera that sees the largest number of selected points
        3. Go to step 1 if there are still bad 3D points
        4. Click >>+ to grow the modified 3D model
  • Method 2 if you know which cameras are bad
        1. Click F1, draw a rectangle to select all the bad cameras (at most 250 cameras at a time)
        2. Click the "hand" icon to remove all the bad cameras
        3. Click >>+ to grow the modified 3D model.

During this phase, if VisualSFM can’t fit all the images into one solve, it may create several solves, side by side. This will be echoed by the log window which will report the creation of multiple models. If most of the images are in one solve, you might choose to proceed after removing all the cameras in the alternate solves, but this could be a sign that you need to take more images to bridge the solves.

Step D: Dense reconstruction

Finally, a dense reconstruction is initiated which takes the cameras solved by the previous step and the original images, and computes a much denser version of the point cloud which we’ll use in the next step to create a mesh. Simply hit the “cmvs” button, and save the cmvs folder structure somewhere sensible.


VisualSFM will export its internal sparse reconstruction to that folder in the format needed by cmvs / pmvs2 and that software is then run in order to populate that folder structure with the dense point cloud we desperately want! Remember where you save this folder, as you’ll need its contents later.

Screen Shot 2013-07-12 at 4.05.30 AM

This will take a while. Possibly around 20 minutes. Once the dense reconstruction is complete, hit tab to see a visualization within VisualSFM of your new cloud! Much better!

Screen Shot 2013-07-26 at 5.05.02 AM

Screen Shot 2013-07-26 at 5.05.10 AM

Screen Shot 2013-07-26 at 5.05.27 AM

Isn’t this amazing! And all with free software! So cool!

Incidentally, you can also perform a dense reconstruction with another piece of software called cmp-mvs, but it’s Windows only so I haven’t experimented with it. VisualSFM will happily export its data into the cmp-mvs format if you adjust the dropdown in the save dialog that pops up when you hit the “cmvs” button.


Once the dense reconstruction step is finished, simply quit the program: the data we need has already been written to disk.

Unless your solve runs into trouble, it’s a mostly automated process. There do seem to be some advanced features in this program I’m still exploring for handling more complex solves or for situations where VisualSFM doesn’t do everything correctly. One thing to remember is that the solve is always live: you can continue to refine it as much as you want. This iterative workflow is incredibly handy.

If some cameras fail to align properly, simply go back and shoot more pictures in the region where it failed. The program also has a mode where you can view the feature matches between images to debug strange results. As I already mentioned, you can manipulate a sparse reconstruction in progress and remove bad points / cameras that have been added in the wrong position / orientation in order to improve the rest of the solve. Lastly, since the results of the SIFT and matching process are stored on disk (you’ll notice one .sift and one .mat file for each image you fed into the program) you’ll never have to re-sift or re-match images that have already been analyzed. Adding new images to improve (or extend!) a solve only takes as much time as analyzing those new files does!
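
If you keep adding photos to a folder, it can be handy to see which ones haven’t been analyzed yet. This sketch assumes the .sift files sit next to the images and share their base names, which is what I’ve seen; the function works on a plain list of file names so you can feed it `os.listdir()` output.

```python
from pathlib import PurePath

IMAGE_SUFFIXES = {".jpg", ".jpeg"}


def images_missing_sift(filenames):
    """Return the image names that have no matching .sift file in the list."""
    names = [PurePath(n) for n in filenames]
    sifted = {p.stem for p in names if p.suffix.lower() == ".sift"}
    return sorted(
        p.name for p in names
        if p.suffix.lower() in IMAGE_SUFFIXES and p.stem not in sifted
    )


# Hypothetical folder listing: the second photo hasn't been SIFTed yet.
listing = ["IMG_001.jpg", "IMG_001.sift", "IMG_001.mat", "IMG_002.JPG"]
print(images_missing_sift(listing))
```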

This whole process probably seems far more complicated than it really is: This whole step basically amounts to loading images in, clicking a few buttons, and a lot of waiting. You can do it!

Pipeline Part 2: Meshlab

Meshlab is an amazing piece of free software that supports a dizzying array of operations on mesh and point-cloud data. It’s ludicrously unstable, but totally free and seems to implement some cutting edge research. It’s needed in our workflow to mesh the point clouds generated by the SFM software, and to automate the creation of UVs and a texture.

Meshlab doesn’t really work the same way as a lot of other GUI apps. It might honestly be worth watching a video or two from MrPMeshLabTutorials, a meshlab pro with tons of great tutorials, just to learn how it works. In particular, it doesn’t really have undo (instead, changes are only made to a mesh if you “export” it…so if you want to go back, you can “reload” the current mesh you’re editing) so be careful and save your meshlab project AND export your mesh (“export as…” in order to save a new file) frequently.

Step A: Open the bundle.rd.out file.

When you create a dense reconstruction, VisualSFM exports its internal representation of the sparse reconstruction into the cmvs / pmvs2 folder structure in a file called bundle.rd.out. Recent versions of Meshlab support opening these files directly, so that’s the first step. Go to File –> Open Project, and navigate to your bundle.rd.out file within the nvm directory created by VisualSFM.

Screen Shot 2013-07-26 at 5.37.59 AM

It should be inside the 00 folder of the directory you named when you saved the dense reconstruction. Importantly, as soon as you select the bundle.rd.out file, Meshlab will ask you to select the camera file. That’s a file in the same directory called “list.txt”, so select that next.

Give Meshlab a second, then it should display your (sparse) point cloud. The other thing this has done is import the cameras from your solve into the scene, as well as the photos they’re associated with (they’re called raster layers in Meshlab). This is hugely important for the texturing process that’ll come later.

To double check this has worked properly, click on the layers icon on the toolbar to open up the layers palette. You should see your geometry layers on the top right, and the raster layers on the bottom right of the screen. Go to Render –> Show Camera. This will display the camera frustums from all of your projectors in the viewer. The default scale of the camera visualization is huge compared to the size of our mesh, so click on the disclosure triangle next to “Show Camera” that has appeared on our layers sidebar:

Screen Shot 2013-07-23 at 1.41.57 AM

…and change the scale factor to something like 0.001. Keep decreasing it until you can see both your mesh and the camera positions around your mesh. Pretty cool, right?

Screen Shot 2013-07-26 at 5.39.30 AM

When you’re convinced the camera positions have imported properly, you should disable the camera visualization by unchecking Render –> Show Camera.

Step B: Replace sparse cloud with the dense one

The sparse cloud is what VisualSFM uses as a precursor to the dense cloud step. We needed to import the bundle.rd.out file to get the camera positions and rasters, but we should replace the sparse cloud with the dense one. In the layers palette, right click on the mesh layer, and select “delete current mesh” in order to remove it entirely.

Screen Shot 2013-07-12 at 5.58.37 PM

Now, select File –> Import Mesh and load in your dense mesh. It should be located in your NVM_FOLDER/00/models/option-0000.ply

Screen Shot 2013-07-12 at 5.59.08 PM

Go ahead and import all .ply (but not .pset or .patch) files in the “models” folder.
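
Since these files are easy to lose track of, here is a tiny sketch that spells out where each one should live relative to the folder you saved from the “cmvs” button. It only builds paths from the layout described in this post; the folder name `table.nvm.cmvs` is a made-up example.

```python
from pathlib import PurePosixPath


def dense_output_paths(nvm_folder: str, cluster: int = 0) -> dict:
    """Build the expected paths inside a VisualSFM dense reconstruction folder."""
    root = PurePosixPath(nvm_folder) / "00"
    return {
        "bundle": root / "bundle.rd.out",    # sparse solve and cameras
        "image_list": root / "list.txt",     # images that belong to the cameras
        "dense_cloud": root / "models" / f"option-{cluster:04d}.ply",
    }


paths = dense_output_paths("table.nvm.cmvs")
for label, path in paths.items():
    print(label, "->", path)
```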

It’s possible that VisualSFM (really pmvs2) generated multiple .ply files (called option-0000.ply, option-0001.ply etc). I think this happens because the dense mesher only works with 50 cameras at a time, so if your solve contains 101 cameras, you’ll end up with 3 .ply files. If you have multiple .ply files, you’ll need to flatten them into one mesh before you can proceed. Simply right click on any of the mesh layers, and select “Flatten Visible Layers”. Make sure to check “Keep unreferenced vertices”:

Screen Shot 2013-07-23 at 1.18.55 AM

Now you should have one uber dense cloud AND the camera positions / raster layers!
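
One way to sanity check the flatten is to compare point counts: the merged cloud should hold as many points as the individual .ply files combined. A PLY file declares its vertex count in a plain-text header, even when the body is binary, so a short reader is enough. This is a minimal sketch with a made-up sample header, not a full PLY parser.

```python
def ply_vertex_count(data: bytes) -> int:
    """Read the vertex count from the header of PLY file contents."""
    header, found, _ = data.partition(b"end_header")
    if not header.startswith(b"ply") or not found:
        raise ValueError("not a PLY file")
    for line in header.decode("ascii", errors="replace").splitlines():
        parts = line.split()
        if parts[:2] == ["element", "vertex"]:
            return int(parts[2])
    raise ValueError("no vertex element in header")


# Hypothetical header for a cloud with position, normal and colour per point.
SAMPLE_PLY = b"""ply
format ascii 1.0
element vertex 2
property float x
property float y
property float z
property float nx
property float ny
property float nz
property uchar diffuse_red
property uchar diffuse_green
property uchar diffuse_blue
end_header
0 0 0 0 0 1 255 255 255
1 0 0 0 0 1 128 128 128
"""

print(ply_vertex_count(SAMPLE_PLY))
```

For a real file, pass in `open(path, "rb").read()` and add up the counts across your option-*.ply files.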

Screen Shot 2013-07-26 at 5.45.26 AM

Step C: Clean up dense cloud

You’ll have more opportunities to clean this up later on, but any bad points you can nip in the bud now are worth it. Spend some time here.  If you use the box selection tool:

Screen Shot 2013-07-23 at 1.22.14 AM

you can select regions of points that are bad and delete them with the delete points button:

Screen Shot 2013-07-23 at 1.24.19 AM

If you’re coming from a LIDAR scan, check out the Poisson Disk Sampling filter which can simplify a point cloud and make it less dense to make the following steps faster. Even if you’re coming from photogrammetry, you can experiment with this filter to automatically remove strange points from your mesh and keep the most essential forms. Check out this tutorial for more info.

I decided to isolate just the table for this reconstruction, so you can see that I removed a ton of points from the cloud using those tools. I also removed some strange points in and around the table, including these odd floating blobs. As you tumble around your scene, these bad points will be obvious. I probably should have removed the chairs since I don’t really have enough good points to mesh them…you’ll see the problems they cause in a few steps.

Screen Shot 2013-07-26 at 5.41.00 AM

Step D: Sanity check Cameras

Now that you’ve got a dense cloud in a Meshlab scene with your raster layers, let’s double check the camera positions. Remember when we looked at the cameras in our scene a few steps back? If you select a “raster layer” from the layers sidebar and hit the “Show current raster mode” button:

Screen Shot 2013-07-23 at 1.21.53 AM

your 3d camera will snap to the position where VisualSFM thought that image was taken from in 3d space! You can scroll your mousewheel to alter the transparency of your image and double check the alignment! In my experience it’s either been pretty much dead on or totally insanely wrong. Any ones that are totally insanely wrong should be re-checked in VisualSFM…if it’s wrong in meshlab, it means that it didn’t contribute to VisualSFM’s solve! When you’ve checked your cameras, disable the “Show current raster mode” by hitting the icon again.

Screen Shot 2013-07-26 at 5.50.20 AM

It’s hard to see, but here I am about 50% between the point cloud and a photograph.  Alignment looks really good!

Step E: Meshing

Now comes the time to convert your point cloud into an actual polygonal mesh!

I spent a ton of time in this step while working on this writeup, and I have a ton to say about meshing. This is by far the most delicate part of the process. While the process of turning a point cloud into a mesh may seem obvious to us (that point is part of the toaster, but that point is part of the ground, duh), for the computer this is a hard problem. There exist several algorithms for doing this meshing step, but the best in breed seems to be the Poisson Surface Reconstruction algorithm by Kazhdan, Bolitho, and Hoppe. Perhaps someday there will be better meshing algorithms available. My guess is, as long as the algorithm isn’t proprietary, it’ll make its way into Meshlab shortly after its announcement. There’s already another process from the same authors called Poisson Streaming Surface Reconstruction that can do the same sort of reconstructions in sections, using dramatically less memory. The Poisson Reconstruction method is open source, so you can check out the code here if you’re curious and a super genius.

Anyway, select Filter –> Points –> Surface Reconstruction: Poisson.

Screen Shot 2013-07-23 at 1.17.03 AM

You have to spend some time and experiment here. If the results are crappy, try again with slightly different settings. YMMV. This will take some time. Each time you hit “apply” in the Poisson Surface Reconstruction filter, meshlab will create a new layer with your mesh in it. Instead of deleting the ones that are bad, keep them around and maybe even rename them with the settings you chose. Protip: keep increasing the Octree Depth till you’re getting meshes that have more or less the same level of detail. That’s when you know you’ve gotten all the detail possible out of your cloud.

Sidebar: Meshing

I’m still piecing together the meaning of all the Poisson settings. This is a great guide.

The Octree Depth seems to have the most control over the detail in the mesh. The larger that number gets, the slower things seem to go / more memory you need, so be careful. The Solver Divide seems to help use less memory when the Octree depth increases. The “Samples per Node” setting seems to smooth the mesh out if it has a lot of errors in it. Leave it at 1-5 to start with and increase up to 20 to try to smooth your mesh out a bit if there are problems. (If you use Nuke, these settings are the same as in the PoissonMesh node, so if you learn a bit about these settings here, your experiences will transfer.)
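To get a feel for why bumping the Octree Depth gets expensive so fast, here’s a tiny Python sketch. It assumes the usual octree convention that a depth of d splits the bounding box into 2^d cells along each axis. The function names are mine, and the cell counts are a worst-case upper bound (a real octree only refines near your points), not a measurement of what Meshlab actually allocates.

```python
def octree_grid_resolution(depth: int) -> int:
    """Cells along each axis at the finest octree level: 2 ** depth."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return 2 ** depth


def max_leaf_cells(depth: int) -> int:
    """Upper bound on finest-level cells if the octree were completely full."""
    return octree_grid_resolution(depth) ** 3


# Each extra level doubles the resolution per axis and multiplies
# the worst-case cell count by eight.
for depth in (6, 8, 10, 12):
    res = octree_grid_resolution(depth)
    print(f"depth {depth:2d}: {res} cells per axis, "
          f"up to {max_leaf_cells(depth):,} cells")
```

The takeaway: going from depth 10 to depth 11 is not “10% more work”, it’s potentially eight times as many cells, which matches how quickly things bog down in practice.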

The default shaders in Meshlab can make it difficult to see details in the mesh as you’re reviewing it. Check out some of the other shaders in the Render –> Shaders menu. Or take a gander at this tutorial to calculate per-vertex Ambient Occlusion to really get a great look at the model.

Even if your cloud looks mostly correct, the biggest problem seems to be that this filter prefers point clouds that are fairly uniform, while VisualSFM’s dense point cloud generator can’t really figure out where to place points along a surface with no detail, like a flat wall or a reflective surface. This means that spotty areas in your point cloud will likely correlate to failures in the Mesher. Don’t fret. It just means you’re going to have to spend some time in reconstruction correcting these regions. On the other hand, if there are errors in regions that should have a lot of points but mysteriously don’t, perhaps go back to VisualSFM and try to feed in some more images of that region in order to pump up the point count in that area, and therefore, the eventual mesh detail.

Another thing to watch out for with Poisson Meshing is its signature blobby look. The Poisson Mesher always tries to return “watertight” meshes, which means that often it will return a mesh-ball with your object / environment inside. In order to excavate your mesh, you’ll need to spend some time deleting faces. You can try “Filters –> Selection –> Select faces with edges longer than” to get you started. Use the box selection tool to remove everything else. Often, smaller blobs will form around your scene and they can usually be deleted pretty easily with the box selection.
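If you’re wondering why the “edges longer than” selection is a good first pass: the blobby shell that Poisson wraps around your scene tends to be made of big, stretched triangles, while the real surface is made of small ones. Here’s a rough Python sketch of that idea. It is not Meshlab’s code; the function names and the toy mesh are made up for illustration, and the right threshold depends entirely on the scale of your scene.

```python
import math


def longest_edge(vertices, face):
    """Length of the longest of the three edges of a triangle."""
    a, b, c = (vertices[i] for i in face)
    return max(math.dist(a, b), math.dist(b, c), math.dist(c, a))


def split_faces_by_edge_length(vertices, faces, threshold):
    """Return (keep, selected): faces at or below the threshold, and faces above it."""
    keep, selected = [], []
    for face in faces:
        if longest_edge(vertices, face) > threshold:
            selected.append(face)
        else:
            keep.append(face)
    return keep, selected


# Toy mesh: one small triangle and one long, stretched "blob" triangle.
vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 0, 0)]
faces = [(0, 1, 2), (1, 3, 2)]

keep, selected = split_faces_by_edge_length(vertices, faces, threshold=2.0)
print("keep:", keep)
print("candidates for deletion:", selected)
```

In Meshlab you’d eyeball the selection before hitting delete, and the same caution applies here: a long edge is a hint, not proof, that a face is junk.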

Tl;Dr: try 10,7,1,1. Increase the Octree Depth as much as it’s creating more detail, slowly increasing the Solver Divide to reduce memory requirements. Experiment with Samples per Node to see if the smoother look helps or hurts. Clean up the wacky artifacts of the Poisson meshing process.

Screen Shot 2013-07-26 at 5.58.35 AM

Screen Shot 2013-07-26 at 6.04.36 AM

Screen Shot 2013-07-26 at 5.58.25 AM

These are fine for this demo, but they could use some more love. You’d be shocked, though, at how terrible geo can actually look fine when a totally registered, high rez texture is applied. I included the glass Lagavulin 18 bottle to give an example of something that would probably not work at all…and I was right. This will probably have to be manually rebuilt in some modeling software later on. Similarly, the chairs themselves look pretty okay, but the holes between the back-support crossbeams and between the chairs themselves are Poisson meshed into oblivion. That’d require some better cleanup if I wasn’t in a hurry to finish this treatise!

Save (both the project, and the mesh).

Step F: Fix manifold edges

Sometimes, after meshing, there are non-manifold edges which need to be removed before we can proceed. Technically, non-manifold geometry is “any edge shared by more than two faces.” Non-technically, non-manifold geometry is bad geo that certain algorithms can’t deal with. In this case, the texturing step coming next requires our geometry to be manifold, so let’s fix this now. Hit Filters –> Selection –> Select Non-Manifold edges, hit Apply, then delete them with the delete points button.

Screen Shot 2013-07-23 at 1.09.47 AM

If you don’t have any non-manifold edges, no need to worry. Ideally there should be some sort of stitching algorithm that allows you to repair damage from this step, but I haven’t found it yet, so you may get a few holes in your mesh.
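The “any edge shared by more than two faces” definition is simple enough to check by hand. Here’s a minimal Python sketch that does exactly that on a list of triangles given as vertex-index tuples. The function name and the toy faces are mine, and this only covers the edge case from the definition above (Meshlab also has a separate notion of non-manifold vertices, which this ignores).

```python
from collections import Counter


def non_manifold_edges(faces):
    """Return the edges that are shared by more than two triangles."""
    counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            # Store each edge with its indices sorted so (0, 1) and (1, 0) match.
            counts[(min(u, v), max(u, v))] += 1
    return sorted(edge for edge, n in counts.items() if n > 2)


# Three triangles fanning out from the same edge (0, 1): a classic "fin".
fin = [(0, 1, 2), (0, 1, 3), (0, 1, 4)]
print(non_manifold_edges(fin))

# Two triangles sharing an edge is perfectly normal.
quad = [(0, 1, 2), (1, 0, 3)]
print(non_manifold_edges(quad))
```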

Step G: Parameterization

This step automatically builds a UV map by figuring out which projector cameras have the best view of each face of the model. Filter –> Texture –> Parameterization from registered rasters, and match the following settings:

Screen Shot 2013-07-23 at 1.11.13 AM

Save (both the project, and the mesh).
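To make the “which camera has the best view of each face” idea from Step G a bit more concrete, here’s a toy Python sketch. Big caveat: this is my own simplification, not Meshlab’s actual algorithm. It scores a camera purely by how directly it faces a triangle, and it ignores occlusion, distance, and whether the face even lands inside the image. All names here are hypothetical.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v) if length else (0.0, 0.0, 0.0)


def best_camera_for_face(vertices, face, camera_positions):
    """Index of the camera that sees the face most head-on, or None if none does."""
    a, b, c = (vertices[i] for i in face)
    normal = normalize(cross(sub(b, a), sub(c, a)))
    center = tuple(sum(coords) / 3 for coords in zip(a, b, c))
    best_index, best_score = None, 0.0
    for i, cam in enumerate(camera_positions):
        # Cosine of the angle between the face normal and the direction to the camera.
        score = dot(normal, normalize(sub(cam, center)))
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# A triangle lying flat on the ground, facing up (+Z).
vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
cameras = [(5, 0, 1), (0.3, 0.3, 5), (0, 0, -5)]  # grazing, overhead, underneath
print(best_camera_for_face(vertices, (0, 1, 2), cameras))
```

The overhead camera wins, the grazing one loses, and the one underneath never qualifies because it’s looking at the back of the face. That’s also a decent mental model for why faces that no photo sees head-on end up with smeary textures.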

Step H: Project textures!

Using the parameterization from the last step, finally project the texture from the projectors and make a texture map. Filter –> Texture –> Project active rasters color to current mesh, filling the texture, and match the following settings.

Screen Shot 2013-07-23 at 1.13.15 AM

The texture filename you choose will be the name of the texture image that gets written out. The pixel size can be any power of 2 greater than, I’d say, 512. 512 / 1024 / 2048 / 4096 / 8192…the sky (really your computer) is the limit. This is one of the best parts of this workflow: arbitrary resolution textures! Catch doesn’t give you any options about the size you’d like returned. I’d recommend aiming really high here…you can always downsample the image later and as long as it stays square, you should be fine.
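Before you go straight for the biggest number, it helps to know roughly what you’re asking for. This little Python sketch checks that a size is a power of 2 and estimates the uncompressed size of a square texture. It assumes 8-bit RGB (3 bytes per pixel), which is my assumption rather than anything Meshlab reports; the file on disk will usually be smaller once it’s compressed.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ... using the classic bit trick."""
    return n > 0 and (n & (n - 1)) == 0


def texture_megabytes(size: int, channels: int = 3, bytes_per_channel: int = 1) -> float:
    """Uncompressed size in MiB of a square size x size texture."""
    if not is_power_of_two(size):
        raise ValueError(f"{size} is not a power of two")
    return size * size * channels * bytes_per_channel / (1024 ** 2)


for size in (512, 1024, 2048, 4096, 8192):
    print(f"{size:5d} x {size:<5d} ~ {texture_megabytes(size):7.2f} MiB uncompressed")
```

Each doubling of the resolution quadruples the pixel count, so the jump from 4096 to 8192 is the one most likely to make your machine sweat.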

Shortcut: Parameterization + texturing from registered rasters

You may have noticed that there’s a menu item in Filter –> Texture –> Parameterization + texturing from registered rasters. You can use this filter instead of steps G and H, but this process is slightly different. Instead of averaging several images together in order to produce a sensible texture for each face, this filter chooses to source the texture for each face from one image. You’ll end up with cleaner textures, but it’s really designed to work with only a few raster layers and may get overwhelmed with tons of layers.

You’ll notice that the options for this filter combine pieces of the other two, and so any prior wisdom applies here. The only new one is Color Correction, where the filter tries to adjust differences in exposure levels between rasters, since each face only gets textured by one photograph. I’d give it a shot with the following settings:

Screen Shot 2013-07-23 at 2.00.20 AM

Step I: Finalize and Export!

Voila!  How’s it lookin?  If you’ve got a browser that supports webgl, here it is in your browser via a threejs web viewer!

Please load the 3d object!  You’ve come this far!  It’s so cool!  You can rotate, zoom, and pan if you’ve got a 3 button mouse.   I down sampled the texture and simplified the geometry a bit to make it load faster, so a) this doesn’t look quite as good as the stills and b) it still takes about a minute or so to open.  There’s a progress bar in the upper right and for some reason, you have to move around before the texture will load, so wait till the mesh appears (a large black form) and then move the mouse.  If the texture doesn’t load, try reloading the page (this will take much less time).

Here are those stills if you want instant gratification, though.

Screen Shot 2013-07-26 at 5.31.21 AM

Screen Shot 2013-07-26 at 5.31.38 AM

For comparison, here is a similar photoset taken of the same table (but with different stuff) from Catch.  Both seem to be equally accurate but seem to have different types of artifacts.  I wonder too what Catch is using to mesh their point clouds.  Before you save out your final mesh, explore it a bit now that it has a texture and everything. Consider:

  • Scale: if your mesh is the wrong size, check out this tutorial for ideas about how to scale the mesh to match real world measurements.
  • QECD: If your mesh is good but incredibly heavy (ie: huge number of polys) consider saving out a few meshes, each with different amounts of the Quadric Edge Collapse Decimation Filter applied. This amazing filter lets you simplify the mesh by percentage and retain a similar shape. Here’s a great tutorial on that filter. If you output your mesh at full, half, and quarter size, you should be able to stay interactive further down the pipeline but swap in your full quality geo at rendertime. Same for your texture!
  • Advanced Meshlab Filters: Check out this more complex remeshing tutorial from Kyle McDonald for some more ideas of what’s possible within Meshlab.
  • Texture resolution: If your textures look a little low-rez and this is the last step in the pipeline for you, consider going back to step H and doubling the texture resolution.  The images above seem a bit fuzzy, so I’d definitely do that.  Or stay tuned to the next part of the tutorial as I discuss texture projection which will enable us to selectively improve the resolution of certain parts of the model.
  • Feel the power of AO: Now might be a good time to check out that tutorial about calculating per vertex Ambient Occlusion so you can see your model’s nooks and crannies in high contrast.
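On the Scale point above: the math behind matching real-world measurements is just one ratio. Measure something in the real scene (say, the width of the table), find the same two points in the mesh, and divide. Here’s a minimal Python sketch of that; the function names and the numbers are made up, and it assumes a uniform scale with no rotation or translation involved.

```python
import math


def scale_factor(point_a, point_b, real_distance):
    """Factor that makes the mesh distance between two points match reality."""
    measured = math.dist(point_a, point_b)
    if measured == 0:
        raise ValueError("the two reference points are identical")
    return real_distance / measured


def scale_vertices(vertices, factor):
    """Uniformly scale every vertex about the origin."""
    return [tuple(coord * factor for coord in v) for v in vertices]


# Hypothetical example: two table corners are 5 units apart in the mesh,
# but the real table is 1.5 meters wide.
corner_a, corner_b = (0.0, 0.0, 0.0), (0.0, 3.0, 4.0)
factor = scale_factor(corner_a, corner_b, real_distance=1.5)
scaled = scale_vertices([corner_a, corner_b], factor)
print(factor, math.dist(*scaled))
```

The longer the reference measurement, the less a sloppy click on either end point will throw off the result, so pick the biggest thing you can measure reliably.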

Once you’re (mostly) satisfied, go File –> Save mesh as… a .obj file! Now you should have an obj with a texture map that comes along with it at a resolution you picked in the previous step! If you want to see the textured mesh in meshlab, save mesh as textured ply (and turn on normals in export dialog), then run reload (alt+r) and you should be textured!
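If you want a quick sanity check that your exported .obj actually carries texture information, you can peek inside it, since OBJ is a plain text format. In that format, `v` lines are vertices, `vt` lines are texture (UV) coordinates, `f` lines are faces, and `mtllib` points at the material file that references your texture image. Here’s a small Python sketch that counts those; the sample data and the function name are made up for illustration, and this is nowhere near a full OBJ parser.

```python
def summarize_obj(lines):
    """Count the OBJ records that matter for a textured mesh."""
    summary = {"vertices": 0, "uvs": 0, "faces": 0, "material_libs": []}
    for line in lines:
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comments
        if parts[0] == "v":
            summary["vertices"] += 1
        elif parts[0] == "vt":
            summary["uvs"] += 1
        elif parts[0] == "f":
            summary["faces"] += 1
        elif parts[0] == "mtllib":
            summary["material_libs"].append(" ".join(parts[1:]))
    return summary


# A made-up one-triangle OBJ, just to show the shape of the data.
sample = """\
# tiny example
mtllib table.mtl
v 0 0 0
v 1 0 0
v 0 1 0
vt 0 0
vt 1 0
vt 0 1
usemtl material_0
f 1/1 2/2 3/3
"""

info = summarize_obj(sample.splitlines())
print(info)
print("looks textured:", info["uvs"] > 0 and bool(info["material_libs"]))
```

To run it on a real export, pass in `open("your_mesh.obj")` instead of the sample lines. If the UV count comes back as zero, the texturing steps probably didn’t stick and it’s worth revisiting steps G and H.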


These steps represent the shortest path point-cloud to textured mesh setup that I’ve found. I’ll admit that very few meshes seemed to come out of the Poisson Reconstruction process perfectly, and most require some additional work which is beyond the scope of this tutorial. Consider other software that is great for cleaning up topology, such as Blender (free) or Zbrush / Maya (Proprietary, expensive). If you’re dissatisfied with your mesh, check out:

  • Tweak mesh: If your mesh is ballpark correct, some tweaking in a modeling app like blender or maya should be fine.
  • Use Cubes and Planes: For non-organic forms, say, architectural ones, consider just using the point cloud / mesh as reference and add some simple planes / cubes into the scene in the appropriate place. Subdivided cubes work wonders for simple architectural forms.
  • Consider using a projective modeling tool: If you can import your cloud and cameras into some 3d software, consider extruding simple shapes into position and validating against the image taken from that perspective. This should be possible in commercial DCC packages, as well as blender. Nuke has a special tool for this called the ModelBuilder node. Details on this to follow in my Nuke followup!

Pipeline Part 3: Applications

So now you have a textured .obj from photos! Cool! What next? Here are some domain agnostic applications:

  • 3D Printing: Take your .obj file, convert to .stl, and print yourself a physical version of the thing you’ve been photographing! Sooooo meta. But seriously, print out your house as a keychain or a giant coffee bean statue.
  • Easy 3D Asset Capture: Don’t just reconstruct a tabletop! Visualize that new kitchen you’ve always wanted! Or scan yourself to eventually get the attention of Obi-wan Kenobi if you are in danger. Also, check out these awesome reconstructions from a film set in Poland! Does that screenshot of the reconstruction step look familiar?
  • Create assets for a game engine: Similarly, don’t build levels or props, just photograph them (a lot)!
  • Terrain Generation: If you have a picture-taking quadcopter / drone of your own, use it to build a 3d terrain map of wherever you are. Check this out for more info.
  • WebGL / Three.js: Embed your textured object in a website so anyone with a modern browser can view and explore your object / environment! If you need help, start with this tutorial, or view the source of the webgl link above, it’s heavily commented.
  • 3D Diffing: Another FOSS tool, CloudCompare, lets you compare, combine, and explore point clouds. In this tutorial video, you can see how to visualize differences between two point clouds. In the video, the guy builds a point cloud of his backyard before and after digging a hole. Once the “before” and “after” clouds are aligned, CloudCompare can show you the differences between the two clouds, in this case, the hole that has been dug. Imagine using my workflow to build a point cloud of your home or yard at some interval to visualize subtle changes due to aging, temperature, or environmental effects!
  • Weather: Hannah and I were driving through a lightning storm and we were trying to figure out “where” the lightning was in space. Since I’ve been geeking out on photogrammetry for a few weeks, I immediately imagined a ring of cameras around a certain large area. If they could be synchronized and set to trigger using, say, a brightness sensor, I wonder if you could get a 3d model of a storm system / a lightning bolt…
  • Join a Global Tagging System: In the Photosynth TED Talk video linked earlier (here it is again), the lead developer, Blaise Aguera y Arcas, talks about a global tagging system using the data. Imagine a 3D model is built from thousands of photos taken of Notre Dame. The model is then “tagged” in 3D with information about, let’s say, the saint statues on the facade. In the future, new photos of the cathedral could be compared to the model and your photos could be tagged with which saints you’ve taken pictures of! That is super insanely brilliant: using photogrammetry to fit pictures we take into a virtual and information-augmented model of the world. Mind Blown.
  • Make a movie with this look / technology: Check out this short from New Zealand: Sifted (see what they did there?)
  • Make a mesh of the earth: This comes to us from the creator of VisualSFM! Double check what NASA is telling you! NASA also used photogrammetry to build models of the moon from astronaut photos…could be a fun project!
  • CLI-only edition: VisualSFM and meshlab both have command line modes, which are possibly more cryptic to use and not at all interactive, but automatable and also less resource intensive. If you automate the build and meshing using some clever scripting, congratulations: you’ve rebuilt a local version of 123D Catch! Check out VisualSFM’s docs and meshlab’s “meshlabserver” mode (or this tutorial) for more info.
  • Photogrammetry in the Cloud: Once you’ve gone CLI-only, start to turn your gaze towards the cloud. VisualSFM is CUDA enabled even on remote machines, so if you built, let’s say, an Amazon EC2 AMI with the software on it, and deployed it to a freakin’ supercomputer, you’d be able to crunch information a lot faster than your home machine could do it. Slap a web interface on it, and this is basically what Catch is, except that you’d still have a lot of the advantages of this workflow. The final result of the VisualSFM workflow can be exported once all the math is done, and so you could transfer the results of the dense reconstruction “live” back to your machine to explore and improve.
  • More ideas: Read this blog for even more ideas!
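For the 3D Diffing idea above, the core trick is easy to sketch: for every point in the “after” cloud, find the closest point in the “before” cloud, and flag anything that has moved further than some tolerance. Here’s a brute-force Python version of that. To be clear, this is my own illustration and not how CloudCompare is implemented (a real tool needs a spatial index to cope with millions of points), and it assumes the two clouds are already aligned. The function names and toy data are hypothetical.

```python
import math


def nearest_distance(point, cloud):
    """Distance from a point to its closest neighbor in another cloud."""
    return min(math.dist(point, q) for q in cloud)


def changed_points(before, after, tolerance):
    """Points in 'after' that have no counterpart in 'before' within the tolerance."""
    return [p for p in after if nearest_distance(p, before) > tolerance]


# Toy backyard: three points on flat ground, then someone digs a hole in the middle.
before = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
after = [(0.0, 0.0, 0.0), (1.0, 0.0, -0.5), (2.0, 0.0, 0.0)]

print(changed_points(before, after, tolerance=0.1))
```

The tolerance is doing a lot of work here: set it below the noise level of your reconstruction and everything lights up as “changed”, so it’s worth diffing two scans of an unchanged scene first to see how noisy your clouds really are.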

Applications of this workflow for VFX work (Nuke, Mari, 3d Lighting) will be discussed in my next blog post!!!

One last thing and then I’ll shut up!

I’ll leave you with this: I did a reconstruction of the full kitchen at the house Hannah and I are working from, but my computer literally doesn’t have enough memory / CPU to mesh it in a reasonable amount of time. I might have to actually set up this cloud system in order to actually see this thing meshed. Check out these clouds though….SO COOL!

Screen Shot 2013-07-10 at 3.53.19 AM

Screen Shot 2013-07-10 at 3.53.11 AM

Screen Shot 2013-07-10 at 4.11.45 AM

