hg-fast-export.sh - mercurial to git converter using git-fast-import
=========================================================================

Legal
-----

Most hg-* scripts are licensed under the [MIT license] and were written
by Rocco Rutte <pdmef@gmx.net> with hints and help from the git list and
\#mercurial on freenode. hg-reset.py is licensed under GPLv2 since it
copies some code from the mercurial sources.

The current maintainer is Frej Drejhammar <frej.drejhammar@gmail.com>.

[MIT license]: http://www.opensource.org/licenses/mit-license.php

Support
-------

If you have problems with hg-fast-export or have found a bug, please
create an issue at the [github issue tracker]. Before creating a new
issue, check that your problem has not already been addressed in a
closed issue. Do not contact the maintainer directly unless
you want to report a security bug. That way the next person having the
same problem can benefit from the time spent solving the problem the
first time.

[github issue tracker]: https://github.com/frej/fast-export/issues

System Requirements
-------------------

This project depends on Python 2.7 or 3.5+, and the Mercurial >= 4.6
package (>= 5.2, if Python 3.5+). If Python is not installed, install
it before proceeding. The Mercurial package can be installed with `pip
install mercurial`.

On Windows the bash that comes with "Git for Windows" is known to work
well.

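The version constraints above can be checked before starting a conversion. The following is a minimal sketch and not part of hg-fast-export: the helper names `python_supported` and `minimum_mercurial` are hypothetical, and the thresholds are simply the ones stated in this section.

```python
import sys


def python_supported(version):
    """Return True for Python 2.7 or Python 3.5 and newer."""
    major, minor = version[0], version[1]
    if major == 2:
        return minor == 7
    return major == 3 and minor >= 5


def minimum_mercurial(version):
    """Return the minimum Mercurial (major, minor) for this Python version."""
    return (4, 6) if version[0] == 2 else (5, 2)


if __name__ == "__main__":
    # Report whether the running interpreter meets the stated requirements.
    current = sys.version_info
    print("Python supported:", python_supported(current))
    print("Mercurial needed: >= %d.%d" % minimum_mercurial(current))
```
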
Usage
-----

Using hg-fast-export is quite simple for a mercurial repository
`<local-repo>`:

```
git init repo-git # or whatever
cd repo-git
hg-fast-export.sh -r <local-repo>
git checkout
```

Please note that hg-fast-export does not automatically check out the
newly imported repository. You probably want to follow up the import
with a `git checkout` command.

Incremental imports to track hg repos are supported, too.

hg-reset is just as simple to use within a git repository that was
hg-fast-export'ed from mercurial:

```
hg-reset.sh -R <revision>
```

This will give hints on which branches need adjustment for starting over
again.

When a mercurial repository does not use utf-8 for encoding author
strings and commit messages, the `-e <encoding>` command line option
can be used to force fast-export to convert incoming meta data from
`<encoding>` to utf-8. This encoding option is also applied to file names.

In some locales Mercurial uses different encodings for commit messages
and file names. In that case, you can use the `--fe <encoding>` command
line option, which overrides the `-e` option for file names.

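As an illustration of what this conversion amounts to, here is a sketch in plain Python. It is not code taken from hg-fast-export: the function name `to_utf8` and the sample author string are made up for this example, and `latin-1` stands in for whatever encoding the repository actually uses.

```python
def to_utf8(raw, source_encoding):
    """Decode bytes from source_encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# A hypothetical author string stored as Latin-1 in the Mercurial repository.
latin1_author = b"J\xf6rg <joerg@example.com>"
print(to_utf8(latin1_author, "latin-1"))
```
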
As mercurial appears to be much less picky about the syntax of the
author information than git, an author mapping file can be given to
hg-fast-export to fix up malformed author strings. The file is
specified using the `-A` option. The file should contain lines of the
form `"<key>"="<value>"`. Inside the key and value strings, all escape
sequences understood by the python `unicode_escape` encoding are
supported; strings are otherwise assumed to be UTF8-encoded.
(Versions of fast-export prior to v171002 had a different syntax; the
old syntax can be enabled by the flag `--mappings-are-raw`.)

The example authors.map below will translate `User
<garbage<tab><user@example.com>` to `User <user@example.com>`.

```
-- Start of authors.map --
"User <garbage\t<user@example.com>"="User <user@example.com>"
-- End of authors.map --
```

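For repositories with many malformed authors it can be convenient to generate the mapping file. The sketch below assumes Python 3 and uses hypothetical helper names (`mapping_line`, `write_mapping`); it relies only on the `"<key>"="<value>"` syntax and the `unicode_escape` sequences described above, and it does not attempt to handle double quotes inside a key or value. Because every non-ASCII character is written as an escape sequence, the resulting file is plain ASCII. Writing the file is then a matter of calling `write_mapping("authors.map", {...})` and passing the result with `-A authors.map`.

```python
def mapping_line(key, value):
    """Format one "<key>"="<value>" line using unicode_escape sequences."""
    def escape(text):
        # unicode_escape turns tabs, backslashes and non-ASCII characters
        # into escape sequences, so the result is plain ASCII.
        return text.encode("unicode_escape").decode("ascii")
    return '"%s"="%s"' % (escape(key), escape(value))


def write_mapping(path, mapping):
    """Write a dict of {malformed: corrected} entries as a mapping file."""
    with open(path, "w") as handle:
        for key, value in sorted(mapping.items()):
            handle.write(mapping_line(key, value) + "\n")
```
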
If you have many Mercurial repositories, Chris J Billington's
[hg-export-tool] allows you to batch convert them.

Tag and Branch Naming
---------------------

As Git and Mercurial differ in what is a valid branch and tag
name, the `-B` and `-T` options allow a mapping file to be specified to
rename branches and tags (respectively). The syntax of the mapping
file is the same as for the author mapping.

When the `-B` and `-T` flags are used, you will probably want to use the
`-n` flag to disable the built-in (broken in many cases) sanitizing of
branch/tag names. In the future `-n` will become the default, but in
order to not break existing incremental conversions, the old behavior
remains the default for now.

By default, the `default` mercurial branch is renamed to the `master`
branch on git. If your mercurial repo contains both `default` and
`master` branches, you'll need to override this behavior. Use
`-M <newName>` to specify what name to give the `default` branch.

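When a conversion is scripted, the options described so far can be assembled programmatically. This is a sketch under stated assumptions: `build_command` is a hypothetical helper, `hg-fast-export.sh` is assumed to be on the `PATH`, and only flags mentioned in this document (`-r`, `-A`, `-B`, `-T`, `-M`, `-n`) are used. The returned list can be handed to `subprocess.run(..., check=True)` from inside the freshly initialized git repository.

```python
def build_command(repo, authors=None, branches=None, tags=None,
                  default_branch=None, no_sanitize=False):
    """Assemble an hg-fast-export.sh argument list from the options above."""
    command = ["hg-fast-export.sh", "-r", repo]
    if authors:
        command += ["-A", authors]       # author mapping file
    if branches:
        command += ["-B", branches]      # branch mapping file
    if tags:
        command += ["-T", tags]          # tag mapping file
    if default_branch:
        command += ["-M", default_branch]  # git name for hg's `default`
    if no_sanitize:
        command.append("-n")             # disable built-in name sanitizing
    return command
```
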
Content filtering
-----------------

hg-fast-export supports filtering the content of exported files.
The filter is supplied to the `--filter-contents` option. hg-fast-export
runs the filter for each exported file, pipes its content to the filter's
standard input, and uses the filter's standard output in place
of the file's original content. The prototypical use of this feature
is to convert line endings in text files from CRLF to git's preferred LF:

```
-- Start of crlf-filter.sh --
#!/bin/sh
# $1 = pathname of exported file relative to the root of the repo
# $2 = Mercurial's hash of the file
# $3 = "1" if Mercurial reports the file as binary, otherwise "0"

# Use "=" rather than "==": the latter is not portable under /bin/sh.
if [ "$3" = "1" ]; then cat; else dos2unix -q; fi
# The -q option keeps dos2unix from returning an error code when
# handling text files that are not ASCII based (like UTF-16 encoded
# text files)
-- End of crlf-filter.sh --
```

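The same kind of filter can be written in Python when `dos2unix` is not available. The script below is a sketch, not a drop-in replacement: it assumes Python 3 (for `sys.stdin.buffer`), relies only on the three arguments and the stdin/stdout contract described above, and unlike `dos2unix` it blindly rewrites every CRLF pair in files that Mercurial does not report as binary. The `convert` function name is made up for this example.

```python
#!/usr/bin/env python3
import sys


def convert(data, is_binary):
    """Return data unchanged if binary, otherwise with CRLF turned into LF."""
    if is_binary:
        return data
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__" and len(sys.argv) > 3:
    # Arguments: path of the file, Mercurial's hash, binary flag ("1"/"0").
    content = sys.stdin.buffer.read()
    sys.stdout.buffer.write(convert(content, sys.argv[3] == "1"))
```
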
Plugins
-------

hg-fast-export supports plugins to manipulate the file data and commit
metadata. The plugins are enabled with the `--plugin` option. The value
of said option is a plugin name (by folder in the plugins directory),
and, optionally, an equals sign followed by an initialization string.

There is a readme accompanying each of the bundled plugins, with a
description of the usage. To create a new plugin, one must simply
add a new folder under the `plugins` directory, with the name of the
new plugin. Inside, there must be an `__init__.py` file, which contains
at a minimum:

```
def build_filter(args):
    return Filter(args)

class Filter:
    def __init__(self, args):
        pass
        #Or don't pass, if you want to do some init code here
```

Beyond the boilerplate initialization, you can see the two different
defined filter methods in the [dos2unix](./plugins/dos2unix) and
[branch_name_in_commit](./plugins/branch_name_in_commit) plugins.

```
commit_data = {'branch': branch, 'parents': parents, 'author': author, 'desc': desc, 'revision': revision, 'hg_hash': hg_hash, 'committer': committer, 'extra': extra}

def commit_message_filter(self,commit_data):
```
The `commit_message_filter` method is called for each commit, after parsing
from hg, but before outputting to git. The dictionary `commit_data` contains the
above attributes about the commit, and can be modified by any filter. The
values in the dictionary after filters have been run are used to create the git
commit.

```
file_data = {'filename':filename,'file_ctx':file_ctx,'d':d}

def file_data_filter(self,file_data):
```
The `file_data_filter` method is called for each file within each commit.
The dictionary `file_data` contains the above attributes about the file, and
can be modified by any filter. `file_ctx` is the filecontext from the
mercurial python library. After all filters have been run, the values
are used to add the file to the git commit.

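To make the plugin interface more concrete, here is a sketch of a complete `__init__.py` that appends the Mercurial hash to each commit message. It is an illustration rather than one of the bundled plugins, and it rests on several assumptions: the plugin folder name (say, `plugins/hg_hash_in_commit`) is hypothetical, the values in `commit_data` are assumed to be bytestrings (the helper coerces them if they are not), and the initialization string is assumed to be empty or `None` when no `=<string>` is given. With that folder name it would be enabled with `--plugin hg_hash_in_commit=orig`.

```python
def build_filter(args):
    return Filter(args)


class Filter:
    def __init__(self, args):
        # `args` is the initialization string given after the equals sign;
        # this sketch uses it as a label and falls back to "hg".
        self.label = self._to_bytes(args or "hg")

    @staticmethod
    def _to_bytes(value):
        # Repository data is assumed to be bytestrings; coerce just in case.
        if isinstance(value, bytes):
            return value
        return str(value).encode("utf-8")

    def commit_message_filter(self, commit_data):
        # Modify the dictionary in place: append the original Mercurial
        # hash to the commit message as a trailing line.
        hg_hash = self._to_bytes(commit_data["hg_hash"])
        desc = self._to_bytes(commit_data["desc"])
        commit_data["desc"] = desc + b"\n\n" + self.label + b": " + hg_hash
```
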
Submodules
----------

See README-SUBMODULES.md for how to convert subrepositories into git
submodules.

Notes/Limitations
-----------------

hg-fast-export supports multiple branches but only named branches with
exactly one head each. Otherwise, commits to the tip of these heads
within the branch will get flattened into merge commits. There are a
few options to deal with this:

1. Chris J Billington's [hg-export-tool] can help you to handle branches with
   duplicate heads.
2. Use the [head2branch plugin](./plugins/head2branch) to create a new named
   branch from an unnamed head.
3. You can ignore unnamed heads with the `--ignore-unnamed-heads` option, which
   is appropriate in situations such as the extra heads being close commits
   (abandoned, unmerged changes).

hg-fast-export will ignore any files or directories tracked by mercurial
called `.git`, and will print a warning if it encounters one. Git cannot
track such files or directories. This is not to be confused with submodules,
which are described in README-SUBMODULES.md.

As each git-fast-import run creates a new pack file, it may be
required to repack the repository quite often for incremental imports
(especially when importing a small number of changesets per
incremental import).

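One way to decide when a repack is due is to count the pack files after each incremental import. The following sketch is an assumption-laden illustration, not part of hg-fast-export: the helper names are hypothetical, the threshold of 20 packs is arbitrary, and it assumes a non-bare repository whose git directory is `<repo>/.git`. It uses the standard `git repack -a -d` command to consolidate packs.

```python
import subprocess
from pathlib import Path


def count_pack_files(git_dir):
    """Count the *.pack files under <git_dir>/objects/pack."""
    pack_dir = Path(git_dir) / "objects" / "pack"
    if not pack_dir.is_dir():
        return 0
    return len(list(pack_dir.glob("*.pack")))


def repack_if_needed(repo_dir, max_packs=20):
    """Run `git repack -a -d` when more than max_packs pack files exist."""
    if count_pack_files(Path(repo_dir) / ".git") <= max_packs:
        return False
    subprocess.run(["git", "repack", "-a", "-d"], cwd=repo_dir, check=True)
    return True
```
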
Because of the way the hg API and remote access protocol are designed,
it is not possible to use hg-fast-export on remote repositories
(http/ssh). First clone the repository, then convert it.

Design
------

hg-fast-export was designed in a way that doesn't require a 2-pass
mechanism or any prior repository analysis: it just feeds what it
finds into git-fast-import. This also implies that it heavily relies
on strictly linear ordering of changesets from hg, i.e. its
append-only storage model, so that changesets hg-fast-export already
saw never get modified.

Submitting Patches
------------------

Please create a pull request at
[GitHub](https://github.com/frej/fast-export/pulls) to submit patches.

When submitting a patch make sure the commits in your pull request:

* Have good commit messages

  Please read Chris Beams' blog post [How to Write a Git Commit
  Message](https://chris.beams.io/posts/git-commit/) on how to write a
  good commit message. Although the article recommends at most 50
  characters for the subject, up to 72 characters are frequently
  accepted for fast-export.

* Adhere to good [commit
  hygiene](http://www.ericbmerritt.com/2011/09/21/commit-hygiene-and-git.html)

  When developing a pull request for hg-fast-export, base your work on
  the current `master` branch and rebase your work if it no longer can
  be merged into the current `master` without conflicts. Never merge
  `master` into your development branch; rebase if your work needs
  updates from `master`.

  When a pull request is modified due to review feedback, please
  incorporate the changes into the proper commit. A good reference on
  how to modify history is in the [Pro Git book, Section
  7.6](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History).

Please do not submit a pull request if you are not willing to spend
the time required to address review comments or revise the patch until
it follows the guidelines above. A _take it or leave it_ approach to
contributing wastes both your and the maintainer's time.

Frequent Problems
=================

* git fast-import crashes with: `error: cannot lock ref 'refs/heads/...`

  Branch names in git behave as file names (they are just files and
  sub-directories under `refs/heads/`), and a path cannot name both a
  file and a directory, i.e. the branches `a` and `a/b` can never
  exist at the same time in a git repo.

  Use a mapping file to rename the troublesome branch names.

* `Branch [<branch-name>] modified outside hg-fast-export` but I have
  not touched the repo!

  If you are running fast-export on a case-preserving but
  case-insensitive file system (Windows and OSX), this will make git
  treat `A` and `a` as the same branch. The solution is to use a
  mapping file to rename branches which only differ in case.

* My mapping file does not seem to work when I rename the branch `git
  fast-import` crashes on!

  fast-export (imperfectly) mangles branch names it thinks won't be
  valid. The mechanism cannot be removed as it would break already
  existing incremental imports that expect it. When fast-export
  mangles a name, it prints out a warning of the form `Warning:
  sanitized branch [<unmangled>] to [<mangled>]`. If `git fast-import`
  crashes on `<mangled>`, you need to put `<unmangled>` into the
  mapping file.

* fast-export mangles valid git branch names which I have remapped!

  Use the `-n` flag to hg-fast-export.sh.

* `git status` reports that all files are scheduled for deletion after
  the initial conversion.

  By design fast-export does not touch your working directory, so to
  git it looks like you have deleted all files, when in fact they have
  never been checked out. Just do a checkout of the branch you want.

* `Error: repository has at least one unnamed head: hg r<N>`

  By design, hg-fast-export cannot deal with extra heads on a branch.
  There are a few options depending on whether the extra heads are
  in-use/open or normally closed. See the
  [Notes/Limitations](#noteslimitations) section for more details.

[hg-export-tool]: https://github.com/chrisjbillington/hg-export-tool

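The first two problems listed under Frequent Problems (a branch name that is also a directory prefix of another, and names that differ only in case) can be detected before running a conversion. The function below is a sketch with a hypothetical name, `find_branch_conflicts`; it takes a plain list of branch names as strings, which you would have to obtain from your Mercurial repository yourself. Every name it reports is a candidate for an entry in the branch mapping file.

```python
def find_branch_conflicts(names):
    """Return (path_conflicts, case_conflicts) for a list of branch names."""
    name_set = set(names)

    # A branch such as "a/b" needs "a" to be a directory under refs/heads/,
    # so no proper prefix of a branch name may itself be a branch.
    path_conflicts = set()
    for name in name_set:
        parts = name.split("/")
        for i in range(1, len(parts)):
            prefix = "/".join(parts[:i])
            if prefix in name_set:
                path_conflicts.add((prefix, name))

    # On case-insensitive file systems, names differing only in case collide.
    by_lower = {}
    for name in name_set:
        by_lower.setdefault(name.lower(), []).append(name)
    case_conflicts = [sorted(group) for group in by_lower.values()
                      if len(group) > 1]

    return sorted(path_conflicts), sorted(case_conflicts)
```
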