39 Commits

Author SHA1 Message Date
Frej Drejhammar
b0d5e56c8d Merge branch 'PR/247' 2020-10-29 19:01:04 +01:00
Frej Drejhammar
787e8559b9 Fix typo in README 2020-10-29 19:00:30 +01:00
Henrik Tunedal
ab500a24a7 Add plugin for dropping commits from output 2020-10-29 12:04:27 +01:00
Frej Drejhammar
ead75895b0 Enable code analysis
Merge github generated workflow into master
2020-10-10 16:26:53 +02:00
Frej Drejhammar
bf5f14ddab Create codeql-analysis.yml 2020-10-10 13:15:54 +00:00
Frej Drejhammar
7057ce2c2b Allow plugins to modify the committer
Since they were introduced, plugins have been able to modify the author
of a commit, but not the committer. This patch adds the necessary
support for allowing them to modify the committer as well.
2020-09-30 17:47:33 +02:00
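As a rough illustration of what this enables, a plugin could now overwrite the committer in its `commit_message_filter`. The sketch below is hypothetical: the `build_filter`/`Filter` layout is assumed to match the project's plugin convention, and the committer string is a made-up value. Repository data is passed as bytestrings, so the replacement is encoded before it is stored.

```python
# Hypothetical plugin sketch; names and values are illustrative only.

def build_filter(args):
    # Assumed entry point: the loader passes the plugin's argument string.
    return Filter(args)


class Filter:
    def __init__(self, args):
        # Use the plugin argument as the new committer, with a made-up default.
        committer = args or "Converter <converter@example.com>"
        self.committer = committer.encode("utf8")

    def commit_message_filter(self, commit_data):
        # commit_data is modified in place; 'committer' is the key added
        # by this patch alongside the existing 'author'.
        commit_data['committer'] = self.committer


# Usage with a hand-built dictionary standing in for real commit data.
commit = {'author': b'A <a@example.com>', 'committer': b'old <old@example.com>'}
build_filter("New Committer <new@example.com>").commit_message_filter(commit)
print(commit['committer'])
```

The author entry is left untouched, so a plugin can change either field independently.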
Frej Drejhammar
2b6f735b8c Update section about submitting patches in README
Try to cover the most common reasons for requesting changes in PRs.
2020-09-09 14:08:00 +02:00
Frej Drejhammar
71acb42a09 Merge branch 'PR/236-v2' into master
Implement a plugin converting unnamed heads to branches
2020-07-31 17:08:04 +02:00
Ondrej Stanek
a7955bc49b Update head2branch plugin to accept hg commit hash
The revision number isn't a unique identifier of commits across
repository clones and forks, while the hg hash is guaranteed to be stable.
2020-07-31 10:50:57 +02:00
Ondrej Stanek
9c6dea9fd4 Pass original hg commit hash to plugins 2020-07-31 10:50:51 +02:00
Ethan Furman
21827a53f7 Add head2branch plugin
Support converting unnamed heads to named branches during mercurial
conversions.

Co-Authored-By: ostan89@gmail.com
2020-07-31 10:49:08 +02:00
Ethan Furman
5c1cbf82b0 Add revision to commit_data for commit plugins
Co-Authored-By: ostan89@gmail.com
2020-07-31 10:48:33 +02:00
Ondrej Stanek
50631c4b34 Add option --ignore-unnamed-heads
This option allows the user to ignore only unnamed heads (compared to --force
which ignores all non-fatal issues). The intended use is for a future plugin
converting unnamed heads to named branches.
2020-07-31 10:30:53 +02:00
Ethan Furman
2a9dd53d14 Show all unnamed heads at once
Co-Authored-By: ostan89@gmail.com
2020-07-31 10:27:07 +02:00
Frej Drejhammar
597093eaf1 Merge branch 'fix-233'
Closes #233
2020-07-10 16:52:17 +02:00
Frej Drejhammar
3910044a97 Avoid crash during rev-parse when the default encoding is ascii
In some locales the default encoding is ascii in which case
subprocess.check_output() will fail if it is given a non-ascii ref as
one of the arguments. By forcing the ref to be utf8 we will avoid a
crash while still behaving correctly when the default encoding is
utf8.

The credits for this fix go to Nikita Bazhinov for discovering the fix
and Chris J Billington for explaining it.

Co-Authored-By: Nikita Bazhinov <nbazhinov@syntellect.ru>
Co-Authored-By: Chris J Billington <chrisjbillington@gmail.com>
2020-07-10 16:41:38 +02:00
Frej Drejhammar
44c50d0fae Merge branch 'PR/226' 2020-05-07 20:10:24 +02:00
chrisjbillington
d29d30363b Fix backward incompatible change for hg < 5.1
The port to Python 3 in b961f146 changed `repo.branchmap().iteritems()`
to use `.items()` instead. However, the object returned by mercurial
isn't a dictionary and its `.items()` method was only introduced (as an
alias for `iteritems`) in hg 5.1. `iteritems()` still exists, so let's
keep using it for now to retain compatibility with hg < 5.1.
2020-05-06 11:59:49 -04:00
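The commit itself simply keeps calling `iteritems()`. A different way to get the same compatibility, shown purely as an illustration, is a small helper that prefers `iteritems()` and falls back to `items()`. `OldBranchMap` is a stand-in class invented for the example, not Mercurial's type.

```python
def branchmap_items(branchmap):
    """Return (branch, heads) pairs from objects with either method."""
    if hasattr(branchmap, "iteritems"):
        return branchmap.iteritems()
    return branchmap.items()


class OldBranchMap:
    """Invented stand-in for an object that only offers iteritems()."""
    def __init__(self, data):
        self._data = data

    def iteritems(self):
        return iter(self._data.items())


old_style = OldBranchMap({b'default': [b'abc123']})
new_style = {b'default': [b'abc123']}  # a plain dict only has items() on Python 3
print(list(branchmap_items(old_style)) == list(branchmap_items(new_style)))
```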
Frej Drejhammar
f102d2a69f Merge branch 'PR/223'
Closes #223
2020-05-06 16:31:13 +02:00
Ondrej Stanek
cf0e5837b6 Allow converting a repository with git and hg subrepos
In the verification phase, fast-export falsely expects that both hg
and git subrepositories should have the appropriate line in the
subrepo-map file. In fact, only hg subrepos need a line in
subrepo-map that references a converted subrepo, while git
subrepositories do not.
2020-05-06 16:30:05 +02:00
Frej Drejhammar
61d22307af Merge branch 'PR/217'
Closes: #215
2020-03-26 20:17:20 +01:00
chrisjbillington
3b3f86b71e Allow utf8 in mappings
We were previously processing entries in mapping files (when
`--mappings-are-raw` is not given) with
`.decode('unicode_escape').encode('utf8')` to replace backslash escape
sequences in bytestrings with the utf-8 encoded characters they
represent. However, it turns out that `.decode('unicode_escape')`
assumes latin-1 encoding if it encounters non-ascii bytes:
https://bugs.python.org/issue21331. So this gave incorrect results if
non-ascii utf8 data was present in the mapping.

To fix this, we now add an extra layer of
`.decode('utf8').encode('unicode-escape')` in order to convert any
non-ascii characters into their backslash escape sequences. Then the
subsequent `.decode('unicode_escape')` only encounters ascii characters
and gives correct results.
2020-03-25 12:33:42 -04:00
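The failure and a possible fix can be reproduced in a few lines. This is a sketch rather than the project's code: the helper names are invented, and it converts non-ASCII characters with `.encode('ascii', 'backslashreplace')` instead of the `.encode('unicode-escape')` named in the message, because the latter would also double the backslashes of escape sequences already present in the mapping.

```python
def naive_unescape(raw):
    # Non-ASCII bytes are read as latin-1 here, which mangles UTF-8 input.
    return raw.decode('unicode_escape').encode('utf8')


def utf8_safe_unescape(raw):
    # Turn non-ASCII characters into backslash escapes first, so that the
    # unicode_escape step only ever sees ASCII bytes.
    ascii_only = raw.decode('utf8').encode('ascii', 'backslashreplace')
    return ascii_only.decode('unicode_escape').encode('utf8')


# A mapping entry containing both UTF-8 data and a literal "\t" escape.
entry = u'caf\u00e9'.encode('utf8') + b'\\t<user@example.com>'
print(naive_unescape(entry))
print(utf8_safe_unescape(entry))
```

Both functions expand the `\t` escape into a tab, but only the second keeps the accented character intact.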
Frej Drejhammar
e51844cd65 Merge branch 'PR/214'
Closes: #213
2020-03-25 16:09:01 +01:00
Toni Sissala
90eeef2ff4 Fix TypeError when using -M command line argument
hg-fast-export.sanitize_name expects branch name to be a bytes
object. Command line parser gives out str objects. Convert
possible str object to bytes in hg2git.set_default_branch().
2020-03-25 11:19:25 +02:00
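The conversion described here amounts to encoding the argument only when it is not already bytes. A minimal sketch, with an invented helper name and UTF-8 assumed as the encoding:

```python
def to_bytes(value, encoding='utf8'):
    """Return value unchanged if it is bytes, otherwise encode it."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


print(to_bytes('main'))
print(to_bytes(b'main'))
```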
Frej Drejhammar
7f4d9c3ad4 Merge branch 'PR/211' 2020-03-10 17:51:47 +01:00
Pi Delport
b37420f404 Fix link markup for hg-export-tool 2020-03-09 16:41:26 +02:00
Frej Drejhammar
f2aa47fdf7 Merge branch 'PR/210'
Closes #210.
2020-03-08 19:43:23 +01:00
chrisjbillington
6361b44c33 Fix bug in ignoring .git files/folders on Windows
Mercurial internally stores (most) filepaths using forward slashes, and
returns them as such from its Python API, even on Windows.

So the splitting up of filepaths with `os.path.sep` was incorrect,
resulting in `.git` files (those within a subdirectory, anyway)
not being ignored on Windows as intended. Splitting on `b'/'` regardless
of OS fixes this.
2020-03-08 19:40:50 +01:00
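The difference is easy to reproduce on any platform by using `ntpath.sep`, the separator that `os.path.sep` resolves to on Windows. The function names below are invented for the illustration.

```python
import ntpath

WINDOWS_SEP = ntpath.sep.encode('ascii')  # the backslash os.path.sep is on Windows


def has_git_component_old(filename):
    # Splitting on the Windows separator leaves 'sub/.git/config' in one piece.
    return b'.git' in filename.split(WINDOWS_SEP)


def has_git_component(filename):
    # Mercurial hands back forward slashes, so split on b'/' everywhere.
    return b'.git' in filename.split(b'/')


path = b'sub/.git/config'
print(has_git_component_old(path), has_git_component(path))
```

A top-level `.git` entry contains no separator at all, which is why only paths inside a subdirectory were affected.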
Frej Drejhammar
afeb58ae95 Merge branch 'PR/209' 2020-03-06 17:30:52 +01:00
chrisjbillington
48508ee299 Fix failure to print error message in verify_heads
On Python 3, `b'%s' % None` fails with a TypeError. In verify_heads,
an error message prints the sha1 of a git commit, but that sha1
can be None.

This commit instead prints `b'<None>'` if sha1 is None.
2020-03-06 11:02:38 -05:00
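A minimal sketch of the guard, assuming Python 3.5 or later (where `%` formatting of bytes is available). The function name and message text are invented; only the `b'<None>'` placeholder comes from the commit.

```python
def describe_sha1(sha1):
    """Format a possibly missing sha1 for a bytes error message."""
    shown = b'<None>' if sha1 is None else sha1
    return b'sha1 is %s\n' % shown


try:
    b'%s' % None
except TypeError as exc:
    print('unguarded formatting fails:', exc)

print(describe_sha1(None))
print(describe_sha1(b'b0d5e56c8d'))
```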
Frej Drejhammar
56da62847a Merge branch 'PR/208'
Closes #207.
2020-03-01 14:34:38 +01:00
Max Fuqua
750fe6d3e1 Resolve type error resulting from passing an int to b'%s' in python3 2020-02-29 14:55:15 -05:00
Frej Drejhammar
e4d6d433ec Merge branch 'PR/206' 2020-02-29 14:48:46 +01:00
Steven Peters
058c791b75 Check python's mercurial version for compatibility
When checking that python has the mercurial package in hg-fast-export.sh,
use the same import statement that is used in hg-fast-export.py.

hg-fast-export.py imports revsymbol from mercurial.scmutil,
which was introduced in mercurial 4.6, but Ubuntu 18.04 only has
mercurial 4.5.3 using python2, so an incompatible python version may be
chosen without this change.
2020-02-28 15:41:24 -08:00
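The check lives in the shell script, but the underlying idea (accept an interpreter only if the exact import used by the program works) can be sketched in Python. The helper name is invented, and the demonstration uses a standard-library module so that it runs without Mercurial installed.

```python
import importlib


def can_import(module_name, attribute):
    """Roughly: would `from module_name import attribute` succeed?"""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attribute)


# For the check in the commit this would be
# can_import('mercurial.scmutil', 'revsymbol').
print(can_import('os.path', 'join'))
print(can_import('os.path', 'no_such_name'))
```

Testing for the specific name rather than for the package alone is what rules out a Mercurial that is installed but too old.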
Frej Drejhammar
13010f7a25 Merge branch 'PR/204'
Closes #203.
2020-02-21 16:34:03 +01:00
chrisjbillington
4071f720b0 Fix issue #203: Resolve stderr encoding issues
In Python 3, `sys.stderr.write()` requires unicode strings, and all
output on standard streams is UTF8 encoded. Therefore in the port to
Python 3, we `.decode()`d all strings that are used in `%` formatting of
strings to be printed to stderr.

However, in Python 2, `sys.stderr` accepts either bytestrings or unicode
strings, and:

- `%s` formatting of a bytestring with a unicode string, i.e. `"%s" %
  u"foo"`, results in a unicode string.
- Writing a unicode string to stderr/stdout uses that stream's encoding
- When the output of the process is being piped somewhere other than a
  terminal (as it is when called with pipes and shell redirection from
  hg-fast-export.sh), that encoding is None, which implies ASCII.
- This raises UnicodeEncodeError if the unicode strings passed to
  `stderr.write()` have non-ascii characters.

We cannot fix this problem simply by encoding UTF8 again before writing
to stderr on Python 2. This is because the *decoding* of filenames with
the UTF8 codec may fail - filenames may not even be valid UTF8 despite
this being the declared filesystem encoding.

We could `fsdecode()` filenames on Python 3, which would use the
`surrogateescape` error handler, but stderr does not use this error
handler for output, meaning we would just have to encode again (with the
same error handler) anyway. And Python 2 lacks the `surrogateescape`
error handler in any case - we would need to reimplement it just to do a
round-trip decode and encode for no reason.

This commit leaves filenames and other repository data as bytestrings,
and simply writes them to `sys.stderr.buffer` on Python 3 or
`sys.stderr` on Python 2 as-is, after `%` formatting with bytestring
literals. This avoids encoding issues of filenames altogether.

Other writing to stderr that does not involve repository data has been
left with "native" strings, i.e.
`sys.stderr.write("a string literal %s" % a_command_line_arg)`. These
will still fail on Python 3 if the user passes a non-UTF filename as a
command line argument or similar. This is acceptable IMHO - although
`hg-fast-export` may encounter invalid UTF8 in mercurial repositories,
it is not too much to impose that the user name their branch mapping
files etc with valid UTF8!
2020-02-19 12:18:00 -05:00
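The approach chosen in this commit can be sketched as follows. The helper names are invented, and an in-memory stream stands in for stderr so that the result can be inspected.

```python
import io


def binary_stream(stream):
    # Text streams on Python 3 expose the underlying byte stream as .buffer;
    # Python 2 streams and raw byte streams are used as they are.
    return getattr(stream, 'buffer', stream)


def warn_skip(filename, out):
    # filename stays a bytestring; no decode/encode round trip takes place.
    out.write(b'Skip %s\n' % filename)


raw = io.BytesIO()
fake_stderr = io.TextIOWrapper(raw, encoding='ascii')
bad_name = b'caf\xe9.txt'  # latin-1 bytes, not valid UTF-8

warn_skip(bad_name, binary_stream(fake_stderr))
print(raw.getvalue())
```

Because the name is never decoded, a filename that is not valid UTF-8 reaches the stream byte for byte instead of raising an error.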
Frej Drejhammar
160aa3c9ef Add a reference to hg-export-tool in the documentation
Add pointers to hg-export-tool as a way to batch convert multiple
Mercurial repos, and deal with duplicate heads.
2020-02-14 17:16:18 +01:00
Frej Drejhammar
883474184d Merge branch 'PR/201'
Closes #201
2020-02-14 17:01:35 +01:00
chrisjbillington
b961f146df Support Python 3
Port hg-fast-export to Python 2/3 polyglot code.

Since mercurial accepts and returns bytestrings for all repository data,
the approach I've taken here is to use bytestrings throughout the
hg-fast-export code. All strings pertaining to repository data are
bytestrings. This means the code is using the same string datatype for
this data on Python 3 as it did (and still does) on Python 2.

Repository data coming from subprocess calls to git, or read from files,
is also left as the bytestrings either returned from
subprocess.check_output or as read from the file in 'rb' mode.

Regexes and string literals that are used with repository data have
all had a b'' prefix added.

When repository data is used in error/warning messages, it is decoded
with the UTF8 codec for printing.

With this patch, hg-fast-export.py writes binary output to
sys.stdout.buffer on Python 3 - on Python 2 this doesn't exist and it
still uses sys.stdout.

The only strings that are left as "native" strings and not coerced to
bytestrings are filepaths passed in on the command line, and dictionary
keys for internal data structures used by hg-fast-export.py, that do
not originate in repository data.

Mapping files are read in 'rb' mode, and thus bytestrings are read from
them. When an encoding is given, their contents are decoded with that
encoding, but then immediately encoded again with UTF8 and they are
returned as the resulting bytestrings.

Other necessary changes were:

 - indexing bytestrings with a single index returns an integer on Python 3.
   These indexing operations have been replaced with a one-element
   slice: x[0] -> x[0:1] or x[-1] -> x[-1:] so as to return a bytestring.

 - raw_hash.encode('hex_codec') replaced with binascii.hexlify(raw_hash)

 - str(integer) -> b'%d' % integer

 - 'string_escape' codec replaced with 'unicode_escape' (which was
    backported to python 2.7). Strings decoded with this codec were then
    immediately re-encoded with UTF8.

 - Calls to map() intended to execute their contents immediately were
   unwrapped or converted to list comprehensions, since map() is an
   iterator and does not execute until iterated over.

hg-fast-export.sh has been modified to not require Python 2. Instead, if
PYTHON has not been defined, it checks python2, python, then python3,
and uses the first one that exists and can import the mercurial module.
2020-02-13 14:35:19 -05:00
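The individual changes listed in this commit message translate into small, checkable idioms. The values are made up; the behaviour shown is that of Python 3.

```python
from binascii import hexlify

name = b'.hidden'

# Indexing bytes yields an int on Python 3; a one-element slice yields bytes.
first_as_int = name[0]
first_as_bytes = name[0:1]
last_as_bytes = name[-1:]

# hexlify replaces raw_hash.encode('hex_codec').
hex_hash = hexlify(b'\x01\xab')

# b'%d' replaces str(integer) when a bytestring is needed.
mark = b':%d' % 42

# map() is lazy on Python 3, so side effects need a loop or comprehension.
seen = []
lazy = map(seen.append, [1, 2, 3])   # nothing has run yet
seen_before = list(seen)
for item in [1, 2, 3]:
    seen.append(item)

print(first_as_int, first_as_bytes, last_as_bytes, hex_hash, mark, seen_before, seen)
```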
17 changed files with 731 additions and 211 deletions

.github/workflows/codeql-analysis.yml (new file)

@@ -0,0 +1,71 @@
+# For most projects, this workflow file will not need changing; you simply need
+# to commit it to your repository.
+#
+# You may wish to alter this file to override the set of languages analyzed,
+# or to provide custom queries or build logic.
+name: "CodeQL"
+on:
+  push:
+    branches: [master]
+  pull_request:
+    # The branches below must be a subset of the branches above
+    branches: [master]
+  schedule:
+    - cron: '0 15 * * 4'
+jobs:
+  analyze:
+    name: Analyze
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        # Override automatic language detection by changing the below list
+        # Supported options are ['csharp', 'cpp', 'go', 'java', 'javascript', 'python']
+        language: ['python']
+        # Learn more...
+        # https://docs.github.com/en/github/finding-security-vulnerabilities-and-errors-in-your-code/configuring-code-scanning#overriding-automatic-language-detection
+    steps:
+    - name: Checkout repository
+      uses: actions/checkout@v2
+      with:
+        # We must fetch at least the immediate parents so that if this is
+        # a pull request then we can checkout the head.
+        fetch-depth: 2
+    # If this run was triggered by a pull request event, then checkout
+    # the head of the pull request instead of the merge commit.
+    - run: git checkout HEAD^2
+      if: ${{ github.event_name == 'pull_request' }}
+    # Initializes the CodeQL tools for scanning.
+    - name: Initialize CodeQL
+      uses: github/codeql-action/init@v1
+      with:
+        languages: ${{ matrix.language }}
+        # If you wish to specify custom queries, you can do so here or in a config file.
+        # By default, queries listed here will override any specified in a config file.
+        # Prefix the list here with "+" to use these queries and those in the config file.
+        # queries: ./path/to/local/query, your-org/your-repo/queries@main
+    # Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
+    # If this step fails, then you should remove it and run the build manually (see below)
+    - name: Autobuild
+      uses: github/codeql-action/autobuild@v1
+    # Command-line programs to run using the OS shell.
+    # 📚 https://git.io/JvXDl
+    # ✏️ If the Autobuild fails above, remove it and uncomment the following three lines
+    #    and modify them (or add more) to build your code if your project
+    #    uses a compiled language
+    #- run: |
+    #   make bootstrap
+    #   make release
+    - name: Perform CodeQL Analysis
+      uses: github/codeql-action/analyze@v1

View File

@@ -29,9 +29,10 @@ first time.
 System Requirements
 -------------------
-This project depends on Python 2.7 and the Mercurial >= 4.6
-package. If Python is not installed, install it before proceeding. The
-Mercurial package can be installed with `pip install mercurial`.
+This project depends on Python 2.7 or 3.5+, and the Mercurial >= 4.6
+package (>= 5.2, if Python 3.5+). If Python is not installed, install
+it before proceeding. The Mercurial package can be installed with `pip
+install mercurial`.
 On windows the bash that comes with "Git for Windows" is known to work
 well.
@@ -79,10 +80,10 @@ author information than git, an author mapping file can be given to
 hg-fast-export to fix up malformed author strings. The file is
 specified using the -A option. The file should contain lines of the
 form `"<key>"="<value>"`. Inside the key and value strings, all escape
-sequences understood by the python `string_escape` encoding are
-supported. (Versions of fast-export prior to v171002 had a different
-syntax, the old syntax can be enabled by the flag
-`--mappings-are-raw`.)
+sequences understood by the python `unicode_escape` encoding are
+supported; strings are otherwise assumed to be UTF8-encoded.
+(Versions of fast-export prior to v171002 had a different syntax, the
+old syntax can be enabled by the flag `--mappings-are-raw`.)
 The example authors.map below will translate `User
 <garbage<tab><user@example.com>` to `User <user@example.com>`.
@@ -93,6 +94,9 @@ The example authors.map below will translate `User
 -- End of authors.map --
 ```
+If you have many Mercurial repositories, Chris J Billington's
+[hg-export-tool] allows you to batch convert them.
 Tag and Branch Naming
 ---------------------
@@ -163,7 +167,7 @@ defined filter methods in the [dos2unix](./plugins/dos2unix) and
 [branch_name_in_commit](./plugins/branch_name_in_commit) plugins.
 ```
-commit_data = {'branch': branch, 'parents': parents, 'author': author, 'desc': desc}
+commit_data = {'branch': branch, 'parents': parents, 'author': author, 'desc': desc, 'revision': revision, 'hg_hash': hg_hash, 'committer': 'committer'}
 def commit_message_filter(self,commit_data):
 ```
@@ -194,7 +198,11 @@ Notes/Limitations
 hg-fast-export supports multiple branches but only named branches with
 exactly one head each. Otherwise commits to the tip of these heads
-within the branch will get flattened into merge commits.
+within the branch will get flattened into merge commits. Chris J
+Billington's [hg-export-tool] can help you to handle branches with
+duplicate heads.
+Alternatively, you can use the [head2branch plugin](./plugins/head2branch)
+to create a new named branch from an unnamed head.
 hg-fast-export will ignore any files or directories tracked by mercurial
 called `.git`, and will print a warning if it encounters one. Git cannot
@@ -223,15 +231,33 @@ saw never get modified.
 Submitting Patches
 ------------------
-Please use the [issue-tracker](https://github.com/frej/fast-export) at
-github to report bugs and submit patches.
+Please create a pull request at
+[Github](https://github.com/frej/fast-export/pulls) to submit patches.
+When submitting a patch make sure the commits in your pull request:
+* Have good commit messages
+Please read Chris Beams' blog post [How to Write a Git Commit
+Message](https://chris.beams.io/posts/git-commit/) on how to write a
+good commit message. Although the article recommends at most 50
+characters for the subject, up to 72 characters are frequently
+accepted for fast-export.
+* Adhere to good [commit
+hygiene](http://www.ericbmerritt.com/2011/09/21/commit-hygiene-and-git.html)
+When developing a pull request for hg-fast-export, base your work on
+the current `master` branch and rebase your work if it no longer can
+be merged into the current `master` without conflicts. Never merge
+`master` into your development branch, rebase if your work needs
+updates from `master`.
+When a pull request is modified due to review feedback, please
+incorporate the changes into the proper commit. A good reference on
+how to modify history is in the [Pro Git book, Section
+7.6](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History).
-Please read
-[https://chris.beams.io/posts/git-commit/](https://chris.beams.io/posts/git-commit/)
-on how to write a good commit message before submitting a pull request
-for review. Although the article recommends at most 50 characters for
-the subject, up to 72 characters are frequently accepted for
-fast-export.
 Frequent Problems
 =================
@@ -274,3 +300,5 @@ Frequent Problems
 By design fast export does not touch your working directory, so to
 git it looks like you have deleted all files, when in fact they have
 never been checked out. Just do a checkout of the branch you want.
+[hg-export-tool]: https://github.com/chrisjbillington/hg-export-tool

View File

@@ -11,9 +11,13 @@ from optparse import OptionParser
 import re
 import sys
 import os
+from binascii import hexlify
 import pluginloader
+PY2 = sys.version_info.major == 2
+if PY2:
+  str = unicode
-if sys.platform == "win32":
+if PY2 and sys.platform == "win32":
   # On Windows, sys.stdout is initially opened in text mode, which means that
   # when a LF (\n) character is written to sys.stdout, it will be converted
   # into CRLF (\r\n). That makes git blow up, so use this platform-specific
@@ -22,7 +26,7 @@ if sys.platform == "win32":
   msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
 # silly regex to catch Signed-off-by lines in log message
-sob_re=re.compile('^Signed-[Oo]ff-[Bb]y: (.+)$')
+sob_re=re.compile(b'^Signed-[Oo]ff-[Bb]y: (.+)$')
 # insert 'checkpoint' command after this many commits or none at all if 0
 cfg_checkpoint_count=0
 # write some progress message every this many file contents written
@@ -35,30 +39,34 @@ submodule_mappings=None
 # author/branch/tag names.
 auto_sanitize = None
+stdout_buffer = sys.stdout if PY2 else sys.stdout.buffer
+stderr_buffer = sys.stderr if PY2 else sys.stderr.buffer
 def gitmode(flags):
-  return 'l' in flags and '120000' or 'x' in flags and '100755' or '100644'
+  return b'l' in flags and b'120000' or b'x' in flags and b'100755' or b'100644'
-def wr_no_nl(msg=''):
+def wr_no_nl(msg=b''):
+  assert isinstance(msg, bytes)
   if msg:
-    sys.stdout.write(msg)
+    stdout_buffer.write(msg)
-def wr(msg=''):
+def wr(msg=b''):
   wr_no_nl(msg)
-  sys.stdout.write('\n')
+  stdout_buffer.write(b'\n')
   #map(lambda x: sys.stderr.write('\t[%s]\n' % x),msg.split('\n'))
 def checkpoint(count):
   count=count+1
   if cfg_checkpoint_count>0 and count%cfg_checkpoint_count==0:
-    sys.stderr.write("Checkpoint after %d commits\n" % count)
-    wr('checkpoint')
+    stderr_buffer.write(b"Checkpoint after %d commits\n" % count)
+    wr(b'checkpoint')
     wr()
   return count
 def revnum_to_revref(rev, old_marks):
   """Convert an hg revnum to a git-fast-import rev reference (an SHA1
   or a mark)"""
-  return old_marks.get(rev) or ':%d' % (rev+1)
+  return old_marks.get(rev) or b':%d' % (rev+1)
 def file_mismatch(f1,f2):
   """See if two revisions of a file are not equal."""
@@ -87,7 +95,7 @@ def get_filechanges(repo,revision,parents,mleft):
   l,c,r=[],[],[]
   for p in parents:
     if p<0: continue
-    mright=revsymbol(repo,str(p)).manifest()
+    mright=revsymbol(repo,b"%d" %p).manifest()
     l,c,r=split_dict(mleft,mright,l,c,r)
   l.sort()
   c.sort()
@@ -110,7 +118,7 @@ def get_author(logmessage,committer,authors):
   "Signed-off-by: foo" and thus matching our detection regex. Prevent
   that."""
-  loglines=logmessage.split('\n')
+  loglines=logmessage.split(b'\n')
   i=len(loglines)
   # from tail walk to top skipping empty lines
   while i>=0:
@@ -138,23 +146,23 @@ def remove_gitmodules(ctx):
   # be to only remove the submodules of the first parent.
   for parent_ctx in ctx.parents():
     for submodule in parent_ctx.substate.keys():
-      wr('D %s' % submodule)
-  wr('D .gitmodules')
+      wr(b'D %s' % submodule)
+  wr(b'D .gitmodules')
 def refresh_git_submodule(name,subrepo_info):
-  wr('M 160000 %s %s' % (subrepo_info[1],name))
-  sys.stderr.write("Adding/updating submodule %s, revision %s\n"
-                   % (name,subrepo_info[1]))
-  return '[submodule "%s"]\n\tpath = %s\n\turl = %s\n' % (name,name,
-    subrepo_info[0])
+  wr(b'M 160000 %s %s' % (subrepo_info[1],name))
+  stderr_buffer.write(
+    b"Adding/updating submodule %s, revision %s\n" % (name, subrepo_info[1])
+  )
+  return b'[submodule "%s"]\n\tpath = %s\n\turl = %s\n' % (name, name, subrepo_info[0])
 def refresh_hg_submodule(name,subrepo_info):
-  gitRepoLocation=submodule_mappings[name] + "/.git"
+  gitRepoLocation=submodule_mappings[name] + b"/.git"
   # Populate the cache to map mercurial revision to git revision
   if not name in subrepo_cache:
-    subrepo_cache[name]=(load_cache(gitRepoLocation+"/hg2git-mapping"),
-                         load_cache(gitRepoLocation+"/hg2git-marks",
+    subrepo_cache[name]=(load_cache(gitRepoLocation+b"/hg2git-mapping"),
+                         load_cache(gitRepoLocation+b"/hg2git-marks",
                                     lambda s: int(s)-1))
   (mapping_cache,marks_cache)=subrepo_cache[name]
@@ -162,30 +170,34 @@ def refresh_hg_submodule(name,subrepo_info):
   if subrepo_hash in mapping_cache:
     revnum=mapping_cache[subrepo_hash]
     gitSha=marks_cache[int(revnum)]
-    wr('M 160000 %s %s' % (gitSha,name))
-    sys.stderr.write("Adding/updating submodule %s, revision %s->%s\n"
-                     % (name,subrepo_hash,gitSha))
-    return '[submodule "%s"]\n\tpath = %s\n\turl = %s\n' % (name,name,
+    wr(b'M 160000 %s %s' % (gitSha,name))
+    stderr_buffer.write(
+      b"Adding/updating submodule %s, revision %s->%s\n"
+      % (name, subrepo_hash, gitSha)
+    )
+    return b'[submodule "%s"]\n\tpath = %s\n\turl = %s\n' % (name,name,
       submodule_mappings[name])
   else:
-    sys.stderr.write("Warning: Could not find hg revision %s for %s in git %s\n" %
-                     (subrepo_hash,name,gitRepoLocation))
-    return ''
+    stderr_buffer.write(
+      b"Warning: Could not find hg revision %s for %s in git %s\n"
+      % (subrepo_hash, name, gitRepoLocation,)
+    )
+    return b''
 def refresh_gitmodules(ctx):
   """Updates list of ctx submodules according to .hgsubstate file"""
   remove_gitmodules(ctx)
-  gitmodules=""
+  gitmodules=b""
   # Create the .gitmodules file and all submodules
   for name,subrepo_info in ctx.substate.items():
-    if subrepo_info[2]=='git':
+    if subrepo_info[2]==b'git':
       gitmodules+=refresh_git_submodule(name,subrepo_info)
     elif submodule_mappings and name in submodule_mappings:
       gitmodules+=refresh_hg_submodule(name,subrepo_info)
   if len(gitmodules):
-    wr('M 100644 inline .gitmodules')
-    wr('data %d' % (len(gitmodules)+1))
+    wr(b'M 100644 inline .gitmodules')
+    wr(b'data %d' % (len(gitmodules)+1))
     wr(gitmodules)
 def export_file_contents(ctx,manifest,files,hgtags,encoding='',plugins={}):
@@ -193,19 +205,21 @@ def export_file_contents(ctx,manifest,files,hgtags,encoding='',plugins={}):
   max=len(files)
   is_submodules_refreshed=False
   for file in files:
-    if not is_submodules_refreshed and (file=='.hgsub' or file=='.hgsubstate'):
+    if not is_submodules_refreshed and (file==b'.hgsub' or file==b'.hgsubstate'):
       is_submodules_refreshed=True
       refresh_gitmodules(ctx)
     # Skip .hgtags files. They only get us in trouble.
-    if not hgtags and file == ".hgtags":
-      sys.stderr.write('Skip %s\n' % (file))
+    if not hgtags and file == b".hgtags":
+      stderr_buffer.write(b'Skip %s\n' % file)
       continue
     if encoding:
       filename=file.decode(encoding).encode('utf8')
     else:
       filename=file
-    if '.git' in filename.split(os.path.sep):
-      sys.stderr.write('Ignoring file %s which cannot be tracked by git\n' % filename)
+    if b'.git' in filename.split(b'/'): # Even on Windows, the path separator is / here.
+      stderr_buffer.write(
+        b'Ignoring file %s which cannot be tracked by git\n' % filename
+      )
       continue
     file_ctx=ctx.filectx(file)
     d=file_ctx.data()
@@ -218,15 +232,15 @@ def export_file_contents(ctx,manifest,files,hgtags,encoding='',plugins={}):
       filename=file_data['filename']
       file_ctx=file_data['file_ctx']
-    wr('M %s inline %s' % (gitmode(manifest.flags(file)),
+    wr(b'M %s inline %s' % (gitmode(manifest.flags(file)),
                               strip_leading_slash(filename)))
-    wr('data %d' % len(d)) # had some trouble with size()
+    wr(b'data %d' % len(d)) # had some trouble with size()
     wr(d)
     count+=1
     if count%cfg_export_boundary==0:
-      sys.stderr.write('Exported %d/%d files\n' % (count,max))
+      stderr_buffer.write(b'Exported %d/%d files\n' % (count,max))
   if max>cfg_export_boundary:
-    sys.stderr.write('Exported %d/%d files\n' % (count,max))
+    stderr_buffer.write(b'Exported %d/%d files\n' % (count,max))
 def sanitize_name(name,what="branch", mapping={}):
   """Sanitize input roughly according to git-check-ref-format(1)"""
@@ -246,25 +260,27 @@ def sanitize_name(name,what="branch", mapping={}):
   def dot(name):
     if not name: return name
-    if name[0] == '.': return '_'+name[1:]
+    if name[0:1] == b'.': return b'_'+name[1:]
     return name
   if not auto_sanitize:
     return mapping.get(name,name)
   n=mapping.get(name,name)
-  p=re.compile('([[ ~^:?\\\\*]|\.\.)')
-  n=p.sub('_', n)
-  if n[-1] in ('/', '.'): n=n[:-1]+'_'
-  n='/'.join(map(dot,n.split('/')))
-  p=re.compile('_+')
-  n=p.sub('_', n)
+  p=re.compile(b'([[ ~^:?\\\\*]|\.\.)')
+  n=p.sub(b'_', n)
+  if n[-1:] in (b'/', b'.'): n=n[:-1]+b'_'
+  n=b'/'.join([dot(s) for s in n.split(b'/')])
+  p=re.compile(b'_+')
+  n=p.sub(b'_', n)
   if n!=name:
-    sys.stderr.write('Warning: sanitized %s [%s] to [%s]\n' % (what,name,n))
+    stderr_buffer.write(
+      b'Warning: sanitized %s [%s] to [%s]\n' % (what.encode(), name, n)
+    )
   return n
 def strip_leading_slash(filename):
-  if filename[0] == '/':
+  if filename[0:1] == b'/':
     return filename[1:]
   return filename
@@ -272,7 +288,7 @@ def export_commit(ui,repo,revision,old_marks,max,count,authors,
branchesmap,sob,brmap,hgtags,encoding='',fn_encoding='', branchesmap,sob,brmap,hgtags,encoding='',fn_encoding='',
plugins={}): plugins={}):
def get_branchname(name): def get_branchname(name):
if brmap.has_key(name): if name in brmap:
return brmap[name] return brmap[name]
n=sanitize_name(name, "branch", branchesmap) n=sanitize_name(name, "branch", branchesmap)
brmap[name]=n brmap[name]=n
@@ -286,29 +302,34 @@ def export_commit(ui,repo,revision,old_marks,max,count,authors,
parents = [p for p in repo.changelog.parentrevs(revision) if p >= 0] parents = [p for p in repo.changelog.parentrevs(revision) if p >= 0]
author = get_author(desc,user,authors) author = get_author(desc,user,authors)
hg_hash=revsymbol(repo,b"%d" % revision).hex()
if plugins and plugins['commit_message_filters']: if plugins and plugins['commit_message_filters']:
commit_data = {'branch': branch, 'parents': parents, 'author': author, 'desc': desc} commit_data = {'branch': branch, 'parents': parents,
'author': author, 'desc': desc,
'revision': revision, 'hg_hash': hg_hash,
'committer': user}
for filter in plugins['commit_message_filters']: for filter in plugins['commit_message_filters']:
filter(commit_data) filter(commit_data)
branch = commit_data['branch'] branch = commit_data['branch']
parents = commit_data['parents'] parents = commit_data['parents']
author = commit_data['author'] author = commit_data['author']
user = commit_data['committer']
desc = commit_data['desc'] desc = commit_data['desc']
if len(parents)==0 and revision != 0: if len(parents)==0 and revision != 0:
wr('reset refs/heads/%s' % branch) wr(b'reset refs/heads/%s' % branch)
wr('commit refs/heads/%s' % branch) wr(b'commit refs/heads/%s' % branch)
wr('mark :%d' % (revision+1)) wr(b'mark :%d' % (revision+1))
if sob: if sob:
wr('author %s %d %s' % (author,time,timezone)) wr(b'author %s %d %s' % (author,time,timezone))
wr('committer %s %d %s' % (user,time,timezone)) wr(b'committer %s %d %s' % (user,time,timezone))
wr('data %d' % (len(desc)+1)) # wtf? wr(b'data %d' % (len(desc)+1)) # wtf?
wr(desc) wr(desc)
wr() wr()
ctx=revsymbol(repo,str(revision)) ctx=revsymbol(repo, b"%d" % revision)
man=ctx.manifest() man=ctx.manifest()
added,changed,removed,type=[],[],[],'' added,changed,removed,type=[],[],[],''
@@ -318,7 +339,7 @@ def export_commit(ui,repo,revision,old_marks,max,count,authors,
added.sort() added.sort()
type='full' type='full'
else: else:
wr('from %s' % revnum_to_revref(parents[0], old_marks)) wr(b'from %s' % revnum_to_revref(parents[0], old_marks))
if len(parents) == 1: if len(parents) == 1:
# later non-merge revision: feed in changed manifest # later non-merge revision: feed in changed manifest
# if we have exactly one parent, just take the changes from the # if we have exactly one parent, just take the changes from the
@@ -327,23 +348,25 @@ def export_commit(ui,repo,revision,old_marks,max,count,authors,
added,changed,removed=f.added,f.modified,f.removed added,changed,removed=f.added,f.modified,f.removed
type='simple delta' type='simple delta'
else: # a merge with two parents else: # a merge with two parents
wr('merge %s' % revnum_to_revref(parents[1], old_marks)) wr(b'merge %s' % revnum_to_revref(parents[1], old_marks))
# later merge revision: feed in changed manifest # later merge revision: feed in changed manifest
# for many files comparing checksums is expensive so only do it for # for many files comparing checksums is expensive so only do it for
# merges where we really need it due to hg's revlog logic # merges where we really need it due to hg's revlog logic
added,changed,removed=get_filechanges(repo,revision,parents,man) added,changed,removed=get_filechanges(repo,revision,parents,man)
type='thorough delta' type='thorough delta'
sys.stderr.write('%s: Exporting %s revision %d/%d with %d/%d/%d added/changed/removed files\n' % stderr_buffer.write(
(branch,type,revision+1,max,len(added),len(changed),len(removed))) b'%s: Exporting %s revision %d/%d with %d/%d/%d added/changed/removed files\n'
% (branch, type.encode(), revision + 1, max, len(added), len(changed), len(removed))
)
for filename in removed: for filename in removed:
if fn_encoding: if fn_encoding:
filename=filename.decode(fn_encoding).encode('utf8') filename=filename.decode(fn_encoding).encode('utf8')
filename=strip_leading_slash(filename) filename=strip_leading_slash(filename)
if filename=='.hgsub': if filename==b'.hgsub':
remove_gitmodules(ctx) remove_gitmodules(ctx)
wr('D %s' % filename) wr(b'D %s' % filename)
export_file_contents(ctx,man,added,hgtags,fn_encoding,plugins) export_file_contents(ctx,man,added,hgtags,fn_encoding,plugins)
export_file_contents(ctx,man,changed,hgtags,fn_encoding,plugins) export_file_contents(ctx,man,changed,hgtags,fn_encoding,plugins)
@@ -358,52 +381,49 @@ def export_note(ui,repo,revision,count,authors,encoding,is_first):
parents = [p for p in repo.changelog.parentrevs(revision) if p >= 0] parents = [p for p in repo.changelog.parentrevs(revision) if p >= 0]
wr('commit refs/notes/hg') wr(b'commit refs/notes/hg')
wr('committer %s %d %s' % (user,time,timezone)) wr(b'committer %s %d %s' % (user,time,timezone))
wr('data 0') wr(b'data 0')
if is_first: if is_first:
wr('from refs/notes/hg^0') wr(b'from refs/notes/hg^0')
wr('N inline :%d' % (revision+1)) wr(b'N inline :%d' % (revision+1))
hg_hash=revsymbol(repo,str(revision)).hex() hg_hash=revsymbol(repo,b"%d" % revision).hex()
wr('data %d' % (len(hg_hash))) wr(b'data %d' % (len(hg_hash)))
wr_no_nl(hg_hash) wr_no_nl(hg_hash)
wr() wr()
return checkpoint(count) return checkpoint(count)
wr('data %d' % (len(desc)+1)) # wtf?
wr(desc)
wr()
def export_tags(ui,repo,old_marks,mapping_cache,count,authors,tagsmap): def export_tags(ui,repo,old_marks,mapping_cache,count,authors,tagsmap):
l=repo.tagslist() l=repo.tagslist()
for tag,node in l: for tag,node in l:
# Remap the branch name # Remap the branch name
tag=sanitize_name(tag,"tag",tagsmap) tag=sanitize_name(tag,"tag",tagsmap)
# ignore latest revision # ignore latest revision
if tag=='tip': continue if tag==b'tip': continue
# ignore tags to nodes that are missing (ie, 'in the future') # ignore tags to nodes that are missing (ie, 'in the future')
if node.encode('hex_codec') not in mapping_cache: if hexlify(node) not in mapping_cache:
sys.stderr.write('Tag %s refers to unseen node %s\n' % (tag, node.encode('hex_codec'))) stderr_buffer.write(b'Tag %s refers to unseen node %s\n' % (tag, hexlify(node)))
continue continue
rev=int(mapping_cache[node.encode('hex_codec')]) rev=int(mapping_cache[hexlify(node)])
ref=revnum_to_revref(rev, old_marks) ref=revnum_to_revref(rev, old_marks)
if ref==None: if ref==None:
sys.stderr.write('Failed to find reference for creating tag' stderr_buffer.write(
' %s at r%d\n' % (tag,rev)) b'Failed to find reference for creating tag %s at r%d\n' % (tag, rev)
)
continue continue
sys.stderr.write('Exporting tag [%s] at [hg r%d] [git %s]\n' % (tag,rev,ref)) stderr_buffer.write(b'Exporting tag [%s] at [hg r%d] [git %s]\n' % (tag, rev, ref))
wr('reset refs/tags/%s' % tag) wr(b'reset refs/tags/%s' % tag)
wr('from %s' % ref) wr(b'from %s' % ref)
wr() wr()
count=checkpoint(count) count=checkpoint(count)
return count return count
def load_mapping(name, filename, mapping_is_raw): def load_mapping(name, filename, mapping_is_raw):
raw_regexp=re.compile('^([^=]+)[ ]*=[ ]*(.+)$') raw_regexp=re.compile(b'^([^=]+)[ ]*=[ ]*(.+)$')
string_regexp='"(((\\.)|(\\")|[^"])*)"' string_regexp=b'"(((\\.)|(\\")|[^"])*)"'
quoted_regexp=re.compile('^'+string_regexp+'[ ]*=[ ]*'+string_regexp+'$') quoted_regexp=re.compile(b'^'+string_regexp+b'[ ]*=[ ]*'+string_regexp+b'$')
def parse_raw_line(line): def parse_raw_line(line):
m=raw_regexp.match(line) m=raw_regexp.match(line)
@@ -411,26 +431,34 @@ def load_mapping(name, filename, mapping_is_raw):
return None return None
return (m.group(1).strip(), m.group(2).strip()) return (m.group(1).strip(), m.group(2).strip())
def process_unicode_escape_sequences(s):
# Replace unicode escape sequences in the otherwise UTF8-encoded bytestring s with
# the UTF8-encoded characters they represent. We first do
# .decode('utf8').encode('ascii', 'backslashreplace') to convert any non-ascii characters
# into their escape sequences so that the subsequent .decode('unicode-escape') succeeds.
# Encoding with 'unicode-escape' instead would also double every existing backslash,
# so the escape sequences would round-trip unchanged:
return s.decode('utf8').encode('ascii', 'backslashreplace').decode('unicode-escape').encode('utf8')
def parse_quoted_line(line): def parse_quoted_line(line):
m=quoted_regexp.match(line) m=quoted_regexp.match(line)
if m==None: if m==None:
return None return
return (m.group(1).decode('string_escape'),
m.group(5).decode('string_escape')) return (process_unicode_escape_sequences(m.group(1)),
process_unicode_escape_sequences(m.group(5)))
cache={} cache={}
if not os.path.exists(filename): if not os.path.exists(filename):
sys.stderr.write('Could not open mapping file [%s]\n' % (filename)) sys.stderr.write('Could not open mapping file [%s]\n' % (filename))
return cache return cache
f=open(filename,'r') f=open(filename,'rb')
l=0 l=0
a=0 a=0
for line in f.readlines(): for line in f.readlines():
l+=1 l+=1
line=line.strip() line=line.strip()
if l==1 and line[0]=='#' and line=='# quoted-escaped-strings': if l==1 and line[0:1]==b'#' and line==b'# quoted-escaped-strings':
continue continue
elif line=='' or line[0]=='#': elif line==b'' or line[0:1]==b'#':
continue continue
m=parse_raw_line(line) if mapping_is_raw else parse_quoted_line(line) m=parse_raw_line(line) if mapping_is_raw else parse_quoted_line(line)
if m==None: if m==None:
@@ -452,7 +480,7 @@ def branchtip(repo, heads):
break break
return tip return tip
def verify_heads(ui,repo,cache,force,branchesmap): def verify_heads(ui,repo,cache,force,ignore_unnamed_heads,branchesmap):
branches={} branches={}
for bn, heads in repo.branchmap().iteritems(): for bn, heads in repo.branchmap().iteritems():
branches[bn] = branchtip(repo, heads) branches[bn] = branchtip(repo, heads)
@@ -466,25 +494,31 @@ def verify_heads(ui,repo,cache,force,branchesmap):
sha1=get_git_sha1(sanitized_name) sha1=get_git_sha1(sanitized_name)
c=cache.get(sanitized_name) c=cache.get(sanitized_name)
if sha1!=c: if sha1!=c:
sys.stderr.write('Error: Branch [%s] modified outside hg-fast-export:' stderr_buffer.write(
'\n%s (repo) != %s (cache)\n' % (b,sha1,c)) b'Error: Branch [%s] modified outside hg-fast-export:'
b'\n%s (repo) != %s (cache)\n' % (b, b'<None>' if sha1 is None else sha1, c)
)
if not force: return False if not force: return False
# verify that branch has exactly one head # verify that branch has exactly one head
t={} t={}
for h in repo.filtered('visible').heads(): unnamed_heads=False
for h in repo.filtered(b'visible').heads():
(_,_,_,_,_,_,branch,_)=get_changeset(ui,repo,h) (_,_,_,_,_,_,branch,_)=get_changeset(ui,repo,h)
if t.get(branch,False): if t.get(branch,False):
sys.stderr.write('Error: repository has at least one unnamed head: hg r%s\n' % stderr_buffer.write(
repo.changelog.rev(h)) b'Error: repository has an unnamed head: hg r%d\n'
if not force: return False % repo.changelog.rev(h)
)
unnamed_heads=True
if not force and not ignore_unnamed_heads: return False
t[branch]=True t[branch]=True
if unnamed_heads and not force and not ignore_unnamed_heads: return False
return True return True
def hg2git(repourl,m,marksfile,mappingfile,headsfile,tipfile, def hg2git(repourl,m,marksfile,mappingfile,headsfile,tipfile,
authors={},branchesmap={},tagsmap={}, authors={},branchesmap={},tagsmap={},
sob=False,force=False,hgtags=False,notes=False,encoding='',fn_encoding='', sob=False,force=False,ignore_unnamed_heads=False,hgtags=False,notes=False,encoding='',fn_encoding='',
plugins={}): plugins={}):
def check_cache(filename, contents): def check_cache(filename, contents):
if len(contents) == 0: if len(contents) == 0:
@@ -505,7 +539,7 @@ def hg2git(repourl,m,marksfile,mappingfile,headsfile,tipfile,
ui,repo=setup_repo(repourl) ui,repo=setup_repo(repourl)
if not verify_heads(ui,repo,heads_cache,force,branchesmap): if not verify_heads(ui,repo,heads_cache,force,ignore_unnamed_heads,branchesmap):
return 1 return 1
try: try:
@@ -519,20 +553,20 @@ def hg2git(repourl,m,marksfile,mappingfile,headsfile,tipfile,
max=tip max=tip
for rev in range(0,max): for rev in range(0,max):
(revnode,_,_,_,_,_,_,_)=get_changeset(ui,repo,rev,authors) (revnode,_,_,_,_,_,_,_)=get_changeset(ui,repo,rev,authors)
if repo[revnode].hidden(): if repo[revnode].hidden():
continue continue
mapping_cache[revnode.encode('hex_codec')] = str(rev) mapping_cache[hexlify(revnode)] = b"%d" % rev
if submodule_mappings: if submodule_mappings:
# Make sure that all submodules are registered in the submodule-mappings file # Make sure that all mercurial submodules are registered in the submodule-mappings file
for rev in range(0,max): for rev in range(0,max):
ctx=revsymbol(repo,str(rev)) ctx=revsymbol(repo,b"%d" % rev)
if ctx.hidden(): if ctx.hidden():
continue continue
if ctx.substate: if ctx.substate:
for key in ctx.substate: for key in ctx.substate:
if key not in submodule_mappings: if ctx.substate[key][2]=='hg' and key not in submodule_mappings:
sys.stderr.write("Error: %s not found in submodule-mappings\n" % (key)) sys.stderr.write("Error: %s not found in submodule-mappings\n" % (key))
return 1 return 1
@@ -591,7 +625,9 @@ if __name__=='__main__':
parser.add_option("-T","--tags",dest="tagsfile", parser.add_option("-T","--tags",dest="tagsfile",
help="Read tags map from TAGSFILE") help="Read tags map from TAGSFILE")
parser.add_option("-f","--force",action="store_true",dest="force", parser.add_option("-f","--force",action="store_true",dest="force",
default=False,help="Ignore validation errors by force") default=False,help="Ignore validation errors by force, implies --ignore-unnamed-heads")
parser.add_option("--ignore-unnamed-heads",action="store_true",dest="ignore_unnamed_heads",
default=False,help="Ignore unnamed head errors")
parser.add_option("-M","--default-branch",dest="default_branch", parser.add_option("-M","--default-branch",dest="default_branch",
help="Set the default branch") help="Set the default branch")
parser.add_option("-o","--origin",dest="origin_name", parser.add_option("-o","--origin",dest="origin_name",
@@ -687,6 +723,8 @@ if __name__=='__main__':
sys.exit(hg2git(options.repourl,m,options.marksfile,options.mappingfile, sys.exit(hg2git(options.repourl,m,options.marksfile,options.mappingfile,
options.headsfile, options.statusfile, options.headsfile, options.statusfile,
authors=a,branchesmap=b,tagsmap=t, authors=a,branchesmap=b,tagsmap=t,
sob=options.sob,force=options.force,hgtags=options.hgtags, sob=options.sob,force=options.force,
ignore_unnamed_heads=options.ignore_unnamed_heads,
hgtags=options.hgtags,
notes=options.notes,encoding=encoding,fn_encoding=fn_encoding, notes=options.notes,encoding=encoding,fn_encoding=fn_encoding,
plugins=plugins_dict)) plugins=plugins_dict))
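
The quoted mapping format relies on `process_unicode_escape_sequences` to turn escapes such as `\"` or `\u00e9` inside a UTF-8 bytestring into the characters they stand for. The following is a minimal standalone sketch of that idea, not the code from the diff: the name `unescape_utf8_bytes` is hypothetical, and it assumes `ascii` with `backslashreplace` as the intermediate encoding so that backslashes already present are left untouched.

```python
def unescape_utf8_bytes(s):
    """Expand backslash escapes in a UTF-8 bytestring (hypothetical helper)."""
    text = s.decode('utf8')
    # Turn non-ASCII characters into \xNN / \uNNNN escapes. Existing
    # backslashes are not doubled, so escapes already present survive.
    ascii_only = text.encode('ascii', 'backslashreplace')
    # Interpret every escape sequence, then go back to UTF-8 bytes.
    return ascii_only.decode('unicode-escape').encode('utf8')


if __name__ == '__main__':
    print(unescape_utf8_bytes(b'caf\\u00e9'))
    print(unescape_utf8_bytes(b'say \\"hi\\"'))
```

Input that already contains raw non-ASCII UTF-8 passes through unchanged, which is what a mapping file with mixed escaped and literal characters would need.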


@@ -28,29 +28,24 @@ SFX_STATE="state"
GFI_OPTS="" GFI_OPTS=""
if [ -z "${PYTHON}" ]; then if [ -z "${PYTHON}" ]; then
# $PYTHON is not set, so we try to find a working python 2.7 to # $PYTHON is not set, so we try to find a working python with mercurial:
# use. PEP 394 tells us to use 'python2', otherwise try plain for python_cmd in python2 python python3; do
# 'python'. if command -v $python_cmd > /dev/null; then
if command -v python2 > /dev/null; then $python_cmd -c 'from mercurial.scmutil import revsymbol' 2> /dev/null
PYTHON="python2" if [ $? -eq 0 ]; then
elif command -v python > /dev/null; then PYTHON=$python_cmd
PYTHON="python" break
else fi
echo "Could not find any python interpreter, please use the 'PYTHON'" \ fi
"environment variable to specify the interpreter to use." done
exit 1
fi
fi fi
if [ -z "${PYTHON}" ]; then
# Check that the python specified by the user or autodetected above is echo "Could not find a python interpreter with the mercurial module >= 4.6 available. " \
# >= 2.7 and < 3. "Please use the 'PYTHON' environment variable to specify the interpreter to use."
if ! ${PYTHON} -c 'import sys; v=sys.version_info; exit(0 if v.major == 2 and v.minor >= 7 else 1)' > /dev/null 2>&1 ; then
echo "${PYTHON} is not a working python 2.7 interpreter, please use the" \
"'PYTHON' environment variable to specify the interpreter to use."
exit 1 exit 1
fi fi
USAGE="[--quiet] [-r <repo>] [--force] [-m <max>] [-s] [--hgtags] [-A <file>] [-B <file>] [-T <file>] [-M <name>] [-o <name>] [--hg-hash] [-e <encoding>]" USAGE="[--quiet] [-r <repo>] [--force] [--ignore-unnamed-heads] [-m <max>] [-s] [--hgtags] [-A <file>] [-B <file>] [-T <file>] [-M <name>] [-o <name>] [--hg-hash] [-e <encoding>]"
LONG_USAGE="Import hg repository <repo> up to either tip or <max> LONG_USAGE="Import hg repository <repo> up to either tip or <max>
If <repo> is omitted, use last hg repository as obtained from state file, If <repo> is omitted, use last hg repository as obtained from state file,
GIT_DIR/$PFX-$SFX_STATE by default. GIT_DIR/$PFX-$SFX_STATE by default.


@@ -7,6 +7,7 @@ from mercurial import node
from hg2git import setup_repo,load_cache,get_changeset,get_git_sha1 from hg2git import setup_repo,load_cache,get_changeset,get_git_sha1
from optparse import OptionParser from optparse import OptionParser
import sys import sys
from binascii import hexlify
def heads(ui,repo,start=None,stop=None,max=None): def heads(ui,repo,start=None,stop=None,max=None):
# this is copied from mercurial/revlog.py and differs only in # this is copied from mercurial/revlog.py and differs only in
@@ -24,7 +25,7 @@ def heads(ui,repo,start=None,stop=None,max=None):
heads = {startrev: 1} heads = {startrev: 1}
parentrevs = repo.changelog.parentrevs parentrevs = repo.changelog.parentrevs
for r in xrange(startrev + 1, max): for r in range(startrev + 1, max):
for p in parentrevs(r): for p in parentrevs(r):
if p in reachable: if p in reachable:
if r not in stoprevs: if r not in stoprevs:
@@ -33,7 +34,7 @@ def heads(ui,repo,start=None,stop=None,max=None):
if p in heads and p not in stoprevs: if p in heads and p not in stoprevs:
del heads[p] del heads[p]
return [(repo.changelog.node(r),str(r)) for r in heads] return [(repo.changelog.node(r), b"%d" % r) for r in heads]
def get_branches(ui,repo,heads_cache,marks_cache,mapping_cache,max): def get_branches(ui,repo,heads_cache,marks_cache,mapping_cache,max):
h=heads(ui,repo,max=max) h=heads(ui,repo,max=max)
@@ -44,11 +45,11 @@ def get_branches(ui,repo,heads_cache,marks_cache,mapping_cache,max):
_,_,user,(_,_),_,desc,branch,_=get_changeset(ui,repo,rev) _,_,user,(_,_),_,desc,branch,_=get_changeset(ui,repo,rev)
del stale[branch] del stale[branch]
git_sha1=get_git_sha1(branch) git_sha1=get_git_sha1(branch)
cache_sha1=marks_cache.get(str(int(rev)+1)) cache_sha1=marks_cache.get(b"%d" % (int(rev)+1))
if git_sha1!=None and git_sha1==cache_sha1: if git_sha1!=None and git_sha1==cache_sha1:
unchanged.append([branch,cache_sha1,rev,desc.split('\n')[0],user]) unchanged.append([branch,cache_sha1,rev,desc.split(b'\n')[0],user])
else: else:
changed.append([branch,cache_sha1,rev,desc.split('\n')[0],user]) changed.append([branch,cache_sha1,rev,desc.split(b'\n')[0],user])
changed.sort() changed.sort()
unchanged.sort() unchanged.sort()
return stale,changed,unchanged return stale,changed,unchanged
@@ -57,20 +58,20 @@ def get_tags(ui,repo,marks_cache,mapping_cache,max):
l=repo.tagslist() l=repo.tagslist()
good,bad=[],[] good,bad=[],[]
for tag,node in l: for tag,node in l:
if tag=='tip': continue if tag==b'tip': continue
rev=int(mapping_cache[node.encode('hex_codec')]) rev=int(mapping_cache[hexlify(node)])
cache_sha1=marks_cache.get(str(int(rev)+1)) cache_sha1=marks_cache.get(b"%d" % (int(rev)+1))
_,_,user,(_,_),_,desc,branch,_=get_changeset(ui,repo,rev) _,_,user,(_,_),_,desc,branch,_=get_changeset(ui,repo,rev)
if int(rev)>int(max): if int(rev)>int(max):
bad.append([tag,branch,cache_sha1,rev,desc.split('\n')[0],user]) bad.append([tag,branch,cache_sha1,rev,desc.split(b'\n')[0],user])
else: else:
good.append([tag,branch,cache_sha1,rev,desc.split('\n')[0],user]) good.append([tag,branch,cache_sha1,rev,desc.split(b'\n')[0],user])
good.sort() good.sort()
bad.sort() bad.sort()
return good,bad return good,bad
def mangle_mark(mark): def mangle_mark(mark):
return str(int(mark)-1) return b"%d" % (int(mark)-1)
if __name__=='__main__': if __name__=='__main__':
def bail(parser,opt): def bail(parser,opt):
@@ -107,7 +108,7 @@ if __name__=='__main__':
state_cache=load_cache(options.statusfile) state_cache=load_cache(options.statusfile)
mapping_cache = load_cache(options.mappingfile) mapping_cache = load_cache(options.mappingfile)
l=int(state_cache.get('tip',options.revision)) l=int(state_cache.get(b'tip',options.revision))
if options.revision+1>l: if options.revision+1>l:
sys.stderr.write('Revision is beyond last revision imported: %d>%d\n' % (options.revision,l)) sys.stderr.write('Revision is beyond last revision imported: %d>%d\n' % (options.revision,l))
sys.exit(1) sys.exit(1)
@@ -117,19 +118,39 @@ if __name__=='__main__':
stale,changed,unchanged=get_branches(ui,repo,heads_cache,marks_cache,mapping_cache,options.revision+1) stale,changed,unchanged=get_branches(ui,repo,heads_cache,marks_cache,mapping_cache,options.revision+1)
good,bad=get_tags(ui,repo,marks_cache,mapping_cache,options.revision+1) good,bad=get_tags(ui,repo,marks_cache,mapping_cache,options.revision+1)
print "Possibly stale branches:" print("Possibly stale branches:")
map(lambda b: sys.stdout.write('\t%s\n' % b),stale.keys()) for b in stale:
sys.stdout.write('\t%s\n' % b.decode('utf8'))
print "Possibly stale tags:" print("Possibly stale tags:")
map(lambda b: sys.stdout.write('\t%s on %s (r%s)\n' % (b[0],b[1],b[3])),bad) for b in bad:
sys.stdout.write(
'\t%s on %s (r%s)\n'
% (b[0].decode('utf8'), b[1].decode('utf8'), b[3].decode('utf8'))
)
print "Unchanged branches:" print("Unchanged branches:")
map(lambda b: sys.stdout.write('\t%s (r%s)\n' % (b[0],b[2])),unchanged) for b in unchanged:
sys.stdout.write('\t%s (r%s)\n' % (b[0].decode('utf8'),b[2].decode('utf8')))
print "Unchanged tags:" print("Unchanged tags:")
map(lambda b: sys.stdout.write('\t%s on %s (r%s)\n' % (b[0],b[1],b[3])),good) for b in good:
sys.stdout.write(
'\t%s on %s (r%s)\n'
% (b[0].decode('utf8'), b[1].decode('utf8'), b[3].decode('utf8'))
)
print "Reset branches in '%s' to:" % options.headsfile print("Reset branches in '%s' to:" % options.headsfile)
map(lambda b: sys.stdout.write('\t:%s %s\n\t\t(r%s: %s: %s)\n' % (b[0],b[1],b[2],b[4],b[3])),changed) for b in changed:
sys.stdout.write(
'\t:%s %s\n\t\t(r%s: %s: %s)\n'
% (
b[0].decode('utf8'),
b[1].decode('utf8'),
b[2].decode('utf8'),
b[4].decode('utf8'),
b[3].decode('utf8'),
)
)
print "Reset ':tip' in '%s' to '%d'" % (options.statusfile,options.revision) print("Reset ':tip' in '%s' to '%d'" % (options.statusfile,options.revision))
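
Several hunks above replace `str(n)` with `b"%d" % n` so that cache keys and values stay bytestrings. A small sketch of why that matters, using a standalone copy of `mangle_mark`; the sample cache contents are made up for illustration, and bytes `%`-formatting on Python 3 requires version 3.5 or later (PEP 461).

```python
def mangle_mark(mark):
    # int() accepts ASCII digits in a bytestring; the result is formatted
    # straight back into bytes, so no text encoding step is involved.
    return b"%d" % (int(mark) - 1)


# Hypothetical marks cache, as it would look when read from a file
# opened in 'rb' mode: both keys and values are bytes.
marks_cache = {b"6": b"0123abcd"}

rev = 5
print(marks_cache.get(b"%d" % (rev + 1)))  # bytes key: lookup succeeds
print(marks_cache.get(str(rev + 1)))       # str key: no match on Python 3
```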


@@ -11,7 +11,24 @@ SFX_MAPPING="mapping"
SFX_HEADS="heads" SFX_HEADS="heads"
SFX_STATE="state" SFX_STATE="state"
QUIET="" QUIET=""
PYTHON=${PYTHON:-python}
if [ -z "${PYTHON}" ]; then
# $PYTHON is not set, so we try to find a working python with mercurial:
for python_cmd in python2 python python3; do
if command -v $python_cmd > /dev/null; then
$python_cmd -c 'import mercurial' 2> /dev/null
if [ $? -eq 0 ]; then
PYTHON=$python_cmd
break
fi
fi
done
fi
if [ -z "${PYTHON}" ]; then
echo "Could not find a python interpreter with the mercurial module available. " \
"Please use the 'PYTHON' environment variable to specify the interpreter to use."
exit 1
fi
USAGE="[-r <repo>] -R <rev>" USAGE="[-r <repo>] -R <rev>"
LONG_USAGE="Print SHA1s of latest changes per branch up to <rev> useful LONG_USAGE="Print SHA1s of latest changes per branch up to <rev> useful


@@ -12,18 +12,25 @@ import os
import sys import sys
import subprocess import subprocess
PY2 = sys.version_info.major < 3
if PY2:
str = unicode
fsencode = lambda s: s.encode(sys.getfilesystemencoding())
else:
from os import fsencode
# default git branch name # default git branch name
cfg_master='master' cfg_master=b'master'
# default origin name # default origin name
origin_name='' origin_name=b''
# silly regex to see if user field has email address # silly regex to see if user field has email address
user_re=re.compile('([^<]+) (<[^>]*>)$') user_re=re.compile(b'([^<]+) (<[^>]*>)$')
# silly regex to clean out user names # silly regex to clean out user names
user_clean_re=re.compile('^["]([^"]+)["]$') user_clean_re=re.compile(b'^["]([^"]+)["]$')
def set_default_branch(name): def set_default_branch(name):
global cfg_master global cfg_master
cfg_master = name cfg_master = name.encode('utf8') if not isinstance(name, bytes) else name
def set_origin_name(name): def set_origin_name(name):
global origin_name global origin_name
@@ -34,26 +41,26 @@ def setup_repo(url):
myui=ui.ui(interactive=False) myui=ui.ui(interactive=False)
except TypeError: except TypeError:
myui=ui.ui() myui=ui.ui()
myui.setconfig('ui', 'interactive', 'off') myui.setconfig(b'ui', b'interactive', b'off')
# Avoids a warning when the repository has obsolete markers # Avoids a warning when the repository has obsolete markers
myui.setconfig('experimental', 'evolution.createmarkers', True) myui.setconfig(b'experimental', b'evolution.createmarkers', True)
return myui,hg.repository(myui,url).unfiltered() return myui,hg.repository(myui, fsencode(url)).unfiltered()
def fixup_user(user,authors): def fixup_user(user,authors):
user=user.strip("\"") user=user.strip(b"\"")
if authors!=None: if authors!=None:
# if we have an authors table, try to get mapping # if we have an authors table, try to get mapping
# by defaulting to the current value of 'user' # by defaulting to the current value of 'user'
user=authors.get(user,user) user=authors.get(user,user)
name,mail,m='','',user_re.match(user) name,mail,m=b'',b'',user_re.match(user)
if m==None: if m==None:
# if we don't have 'Name <mail>' syntax, extract name # if we don't have 'Name <mail>' syntax, extract name
# and mail from hg helpers. this seems to work pretty well. # and mail from hg helpers. this seems to work pretty well.
# if email doesn't contain @, replace it with devnull@localhost # if email doesn't contain @, replace it with devnull@localhost
name=templatefilters.person(user) name=templatefilters.person(user)
mail='<%s>' % templatefilters.email(user) mail=b'<%s>' % templatefilters.email(user)
if '@' not in mail: if b'@' not in mail:
mail = '<devnull@localhost>' mail = b'<devnull@localhost>'
else: else:
# if we have 'Name <mail>' syntax, everything is fine :) # if we have 'Name <mail>' syntax, everything is fine :)
name,mail=m.group(1),m.group(2) name,mail=m.group(1),m.group(2)
@@ -62,15 +69,15 @@ def fixup_user(user,authors):
m2=user_clean_re.match(name) m2=user_clean_re.match(name)
if m2!=None: if m2!=None:
name=m2.group(1) name=m2.group(1)
return '%s %s' % (name,mail) return b'%s %s' % (name,mail)
def get_branch(name): def get_branch(name):
# 'HEAD' is the result of a bug in mutt's cvs->hg conversion, # 'HEAD' is the result of a bug in mutt's cvs->hg conversion,
# other CVS imports may need it, too # other CVS imports may need it, too
if name=='HEAD' or name=='default' or name=='': if name==b'HEAD' or name==b'default' or name==b'':
name=cfg_master name=cfg_master
if origin_name: if origin_name:
return origin_name + '/' + name return origin_name + b'/' + name
return name return name
def get_changeset(ui,repo,revision,authors={},encoding=''): def get_changeset(ui,repo,revision,authors={},encoding=''):
@@ -79,16 +86,16 @@ def get_changeset(ui,repo,revision,authors={},encoding=''):
# how it fails # how it fails
try: try:
node=repo.lookup(revision) node=repo.lookup(revision)
except hgerror.ProgrammingError: except (TypeError, hgerror.ProgrammingError):
node=binnode(revsymbol(repo,str(revision))) # We were given a numeric rev node=binnode(revsymbol(repo, b"%d" % revision)) # We were given a numeric rev
except hgerror.RepoLookupError: except hgerror.RepoLookupError:
node=revision # We got a raw hash node=revision # We got a raw hash
(manifest,user,(time,timezone),files,desc,extra)=repo.changelog.read(node) (manifest,user,(time,timezone),files,desc,extra)=repo.changelog.read(node)
if encoding: if encoding:
user=user.decode(encoding).encode('utf8') user=user.decode(encoding).encode('utf8')
desc=desc.decode(encoding).encode('utf8') desc=desc.decode(encoding).encode('utf8')
tz="%+03d%02d" % (-timezone / 3600, ((-timezone % 3600) / 60)) tz=b"%+03d%02d" % (-timezone // 3600, ((-timezone % 3600) // 60))
branch=get_branch(extra.get('branch','master')) branch=get_branch(extra.get(b'branch', b'master'))
return (node,manifest,fixup_user(user,authors),(time,tz),files,desc,branch,extra) return (node,manifest,fixup_user(user,authors),(time,tz),files,desc,branch,extra)
def mangle_key(key): def mangle_key(key):
@@ -98,29 +105,35 @@ def load_cache(filename,get_key=mangle_key):
cache={} cache={}
if not os.path.exists(filename): if not os.path.exists(filename):
return cache return cache
f=open(filename,'r') f=open(filename,'rb')
l=0 l=0
for line in f.readlines(): for line in f.readlines():
l+=1 l+=1
fields=line.split(' ') fields=line.split(b' ')
if fields==None or not len(fields)==2 or fields[0][0]!=':': if fields==None or not len(fields)==2 or fields[0][0:1]!=b':':
sys.stderr.write('Invalid file format in [%s], line %d\n' % (filename,l)) sys.stderr.write('Invalid file format in [%s], line %d\n' % (filename,l))
continue continue
# put key:value in cache, key without ^: # put key:value in cache, key without ^:
cache[get_key(fields[0][1:])]=fields[1].split('\n')[0] cache[get_key(fields[0][1:])]=fields[1].split(b'\n')[0]
f.close() f.close()
return cache return cache
def save_cache(filename,cache): def save_cache(filename,cache):
f=open(filename,'w+') f=open(filename,'wb')
map(lambda x: f.write(':%s %s\n' % (str(x),str(cache.get(x)))),cache.keys()) for key, value in cache.items():
if not isinstance(key, bytes):
key = str(key).encode('utf8')
if not isinstance(value, bytes):
value = str(value).encode('utf8')
f.write(b':%s %s\n' % (key, value))
f.close() f.close()
def get_git_sha1(name,type='heads'): def get_git_sha1(name,type='heads'):
try: try:
# use git-rev-parse to support packed refs # use git-rev-parse to support packed refs
ref="refs/%s/%s" % (type,name) ref="refs/%s/%s" % (type,name.decode('utf8'))
l=subprocess.check_output(["git", "rev-parse", "--verify", "--quiet", ref]) l=subprocess.check_output(["git", "rev-parse", "--verify",
"--quiet", ref.encode('utf8')])
if l == None or len(l) == 0: if l == None or len(l) == 0:
return None return None
return l[0:40] return l[0:40]
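
The `load_cache` changes above compare `fields[0][0:1]` rather than `fields[0][0]` because indexing a bytestring yields an integer on Python 3, while slicing yields bytes on both Python 2 and 3. A minimal sketch of parsing one cache line this way; `parse_cache_line` is a hypothetical helper, not a function in `hg2git.py`.

```python
def parse_cache_line(line):
    """Parse b':key value\\n' into (key, value); return None if malformed."""
    fields = line.split(b' ')
    # Slice instead of index: on Python 3, b':1'[0] is the integer 58,
    # whereas b':1'[0:1] is the bytestring b':'.
    if len(fields) != 2 or fields[0][0:1] != b':':
        return None
    return fields[0][1:], fields[1].split(b'\n')[0]


if __name__ == '__main__':
    print(parse_cache_line(b':12 0123abcd\n'))
    print(parse_cache_line(b'garbage\n'))
```

Slicing also behaves sensibly on empty input, where indexing would raise `IndexError`.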


@@ -15,9 +15,11 @@ class Filter:
raise ValueError("Unknown args: " + ','.join(args)) raise ValueError("Unknown args: " + ','.join(args))
def commit_message_filter(self, commit_data): def commit_message_filter(self, commit_data):
if not (self.skip_master and commit_data['branch'] == 'master'): if not (self.skip_master and commit_data['branch'] == b'master'):
if self.start: if self.start:
sep = ': ' if self.sameline else '\n' sep = b': ' if self.sameline else b'\n'
commit_data['desc'] = commit_data['branch'] + sep + commit_data['desc'] commit_data['desc'] = commit_data['branch'] + sep + commit_data['desc']
if self.end: if self.end:
commit_data['desc'] = commit_data['desc'] + '\n' + commit_data['branch'] commit_data['desc'] = (
commit_data['desc'] + b'\n' + commit_data['branch']
)


@@ -8,4 +8,4 @@ class Filter():
def file_data_filter(self,file_data): def file_data_filter(self,file_data):
file_ctx = file_data['file_ctx'] file_ctx = file_data['file_ctx']
if not file_ctx.isbinary(): if not file_ctx.isbinary():
file_data['data'] = file_data['data'].replace('\r\n', '\n') file_data['data'] = file_data['data'].replace(b'\r\n', b'\n')

plugins/drop/README.md

@@ -0,0 +1,12 @@
## Drop commits from output

To use the plugin, add the command line flag `--plugin drop=<spec>`.
The flag can be given multiple times to drop more than one commit.

The `<spec>` value can be either

 - a comma-separated list of hg hashes in the full form (40
   hexadecimal characters) to drop the corresponding changesets, or

 - a regular expression pattern to drop all changesets with matching
   descriptions.
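As a rough illustration of how a `<spec>` value is told apart, the sketch below reuses the regular expression from `build_filter` in `plugins/drop/__init__.py`. The helper `classify_spec` and its return labels are hypothetical and exist only for this example.

```python
import re

# Same pattern that build_filter uses to detect a list of full hg hashes.
HASH_LIST = re.compile(r'([A-Fa-f0-9]{40}(,|$))+$')


def classify_spec(spec):
    """Return 'hashes' or 'pattern' for a drop=<spec> value (hypothetical helper)."""
    return 'hashes' if HASH_LIST.match(spec) else 'pattern'


full_hash = 'a' * 40  # placeholder value, not a real changeset
print(classify_spec(full_hash))
print(classify_spec(full_hash + ',' + 'b' * 40))
print(classify_spec('a' * 12))
print(classify_spec('^WIP'))
```

Under this check an abbreviated hash does not qualify as a hash list, so it would be treated as a description pattern instead. The description filter applies the pattern with `re.match`, which anchors at the start of the description, so a pattern meant to match text later in the message needs a leading `.*`.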

61
plugins/drop/__init__.py Normal file
@@ -0,0 +1,61 @@
from __future__ import print_function

import sys, re


def build_filter(args):
    if re.match(r'([A-Fa-f0-9]{40}(,|$))+$', args):
        return RevisionIdFilter(args.split(','))
    else:
        return DescriptionFilter(args)


def log(fmt, *args):
    print(fmt % args, file=sys.stderr)
    sys.stderr.flush()


class FilterBase(object):
    def __init__(self):
        self.remapped_parents = {}

    def commit_message_filter(self, commit_data):
        rev = commit_data['revision']

        mapping = self.remapped_parents
        parent_revs = [rp for p in commit_data['parents']
                       for rp in mapping.get(p, [p])]

        commit_data['parents'] = parent_revs

        if self.should_drop_commit(commit_data):
            log('Dropping revision %i.', rev)

            self.remapped_parents[rev] = parent_revs

            # Head commits cannot be dropped because they have no
            # children, so detach them to a separate branch.
            commit_data['branch'] = b'dropped-hg-head'
            commit_data['parents'] = []

    def should_drop_commit(self, commit_data):
        return False


class RevisionIdFilter(FilterBase):
    def __init__(self, revision_hash_list):
        super(RevisionIdFilter, self).__init__()
        self.unwanted_hg_hashes = {h.encode('ascii', 'strict')
                                   for h in revision_hash_list}

    def should_drop_commit(self, commit_data):
        return commit_data['hg_hash'] in self.unwanted_hg_hashes


class DescriptionFilter(FilterBase):
    def __init__(self, pattern):
        super(DescriptionFilter, self).__init__()
        self.pattern = re.compile(pattern.encode('ascii', 'strict'))

    def should_drop_commit(self, commit_data):
        return self.pattern.match(commit_data['desc'])
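The parent remapping in `FilterBase.commit_message_filter` can be illustrated in isolation. The sketch below is a simplified model, not the plugin itself: it assumes commits arrive in an order where parents precede children, represents each commit as a `(rev, parents)` tuple, and omits the `dropped-hg-head` detachment step. The function name `remap_parents` is hypothetical.

```python
def remap_parents(commits, should_drop):
    """Rewrite parent lists so kept commits skip over dropped ancestors.

    commits: list of (rev, parents) tuples, parents listed before children.
    should_drop: callable taking a rev and returning True to drop it.
    Returns a dict mapping each kept rev to its rewritten parent list.
    """
    remapped = {}  # dropped rev -> the parents that replace it
    kept = {}
    for rev, parents in commits:
        # Substitute every dropped parent with its own (already remapped) parents.
        new_parents = [rp for p in parents for rp in remapped.get(p, [p])]
        if should_drop(rev):
            remapped[rev] = new_parents
        else:
            kept[rev] = new_parents
    return kept


# Linear history 0 <- 1 <- 2 <- 3 <- 4, dropping the three middle commits.
linear = [(0, []), (1, [0]), (2, [1]), (3, [2]), (4, [3])]
print(remap_parents(linear, lambda rev: rev in {1, 2, 3}))  # {0: [], 4: [0]}

# Dropping a merge (rev 3) hands both of its parents to its child.
merged = [(0, []), (1, [0]), (2, [0]), (3, [1, 2]), (4, [3])]
print(remap_parents(merged, lambda rev: rev == 3))
```

Because a dropped commit records its already rewritten parents, a run of consecutive dropped commits collapses transitively, which is the case exercised by the sequential-drop tests.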

@@ -0,0 +1,13 @@
## Convert Head to Branch

`fast-export` can only handle one head per branch. This plugin makes it possible
to create a new branch from a head by specifying the new branch name and
the first divergent commit for that head.

Note: the hg hash must be in the full form, 40 hexadecimal characters.

Note: you must run `fast-export` with the `--ignore-unnamed-heads` option;
otherwise, the conversion will fail.

To use the plugin, add the command line flag `--plugin head2branch=name,<hg_hash>`.
The flag can be given multiple times to name more than one head.
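The way a branch name spreads from the first divergent commit to its descendants can be sketched as follows. This is a simplified model under stated assumptions: commits are `(rev, hg_hash, parents)` tuples processed in revision order, the hashes are short placeholders rather than real 40-character values, and `assign_branch` is a hypothetical name.

```python
def assign_branch(commits, start_hash):
    """Return the set of revs that would be moved to the new branch.

    A commit is moved if it is the starting commit or if any of its
    parents has already been moved.
    """
    on_branch = set()
    for rev, hg_hash, parents in commits:
        if hg_hash == start_hash or any(p in on_branch for p in parents):
            on_branch.add(rev)
    return on_branch


# Two heads growing from rev 0: 0 <- 1 <- 4 and 0 <- 2 <- 3.
history = [
    (0, 'h0', []),
    (1, 'h1', [0]),
    (2, 'h2', [0]),   # first divergent commit of the second head
    (3, 'h3', [2]),
    (4, 'h4', [1]),
]
print(sorted(assign_branch(history, 'h2')))  # [2, 3]
```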

@@ -0,0 +1,24 @@
import sys

def build_filter(args):
    return Filter(args)

class Filter:

    def __init__(self, args):
        args = args.split(',')
        self.branch_name = args[0].encode('ascii', 'replace')
        self.starting_commit_hash = args[1].encode('ascii', 'strict')
        self.branch_parents = set()

    def commit_message_filter(self, commit_data):
        hg_hash = commit_data['hg_hash']
        rev = commit_data['revision']
        rev_parents = commit_data['parents']
        if (hg_hash == self.starting_commit_hash
            or any(rp in self.branch_parents for rp in rev_parents)
        ):
            self.branch_parents.add(rev)
            commit_data['branch'] = self.branch_name
            sys.stderr.write('\nchanging r%s to branch %r\n' % (rev, self.branch_name))
            sys.stderr.flush()

@@ -7,9 +7,11 @@ def build_filter(args):
 class Filter:
     def __init__(self, args):
+        if not isinstance(args, bytes):
+            args = args.encode('utf8')
         self.prefix = args

     def commit_message_filter(self, commit_data):
-        for match in re.findall('#[1-9][0-9]+', commit_data['desc']):
+        for match in re.findall(b'#[1-9][0-9]+', commit_data['desc']):
             commit_data['desc'] = commit_data['desc'].replace(
-                match, '#%s%s' % (self.prefix, match[1:]))
+                match, b'#%s%s' % (self.prefix, match[1:]))
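To make the effect of the bytes-based rewrite in the hunk above concrete, here is a small standalone sketch of the same loop. The wrapper `prefix_issue_refs` and the sample prefix `HG-` are assumptions for illustration; only the pattern and the replacement expression come from the diff.

```python
import re


def prefix_issue_refs(desc: bytes, prefix: bytes) -> bytes:
    """Insert prefix after '#' in issue references found in a commit message."""
    for match in re.findall(b'#[1-9][0-9]+', desc):
        # bytes %-formatting is available on Python 3.5 and later.
        desc = desc.replace(match, b'#%s%s' % (prefix, match[1:]))
    return desc


print(prefix_issue_refs(b'Fix #12 and #345', b'HG-'))  # b'Fix #HG-12 and #HG-345'
print(prefix_issue_refs(b'See #7', b'HG-'))            # unchanged
```

The pattern requires at least two digits after `#`, so a single-digit reference such as `#7` is left untouched.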

@@ -4,13 +4,13 @@ def build_filter(args):
 class Filter:
     def __init__(self, args):
         if args == '':
-            message = '<empty commit message>'
+            message = b'<empty commit message>'
         else:
-            message = args
+            message = args.encode('utf8')
         self.message = message

     def commit_message_filter(self,commit_data):
         # Only write the commit message if the recorded commit
         # message is null.
-        if commit_data['desc'] == '\x00':
+        if commit_data['desc'] == b'\x00':
             commit_data['desc'] = self.message

0
tests/__init__.py Normal file
223
tests/test_drop_plugin.py Normal file
@@ -0,0 +1,223 @@
import sys, os, subprocess

from tempfile import TemporaryDirectory
from unittest import TestCase
from pathlib import Path


class CommitDropTest(TestCase):
    def test_drop_single_commit_by_hash(self):
        hash1 = self.create_commit('commit 1')
        self.create_commit('commit 2')
        self.drop(hash1)
        self.assertEqual(['commit 2'], self.git.log())

    def test_drop_commits_by_desc(self):
        self.create_commit('commit 1 is good')
        self.create_commit('commit 2 is bad')
        self.create_commit('commit 3 is good')
        self.create_commit('commit 4 is bad')
        self.drop('.*bad')
        expected = ['commit 1 is good', 'commit 3 is good']
        self.assertEqual(expected, self.git.log())

    def test_drop_sequential_commits_in_single_plugin_instance(self):
        self.create_commit('commit 1')
        hash2 = self.create_commit('commit 2')
        hash3 = self.create_commit('commit 3')
        hash4 = self.create_commit('commit 4')
        self.create_commit('commit 5')
        self.drop(','.join((hash2, hash3, hash4)))
        expected = ['commit 1', 'commit 5']
        self.assertEqual(expected, self.git.log())

    def test_drop_sequential_commits_in_multiple_plugin_instances(self):
        self.create_commit('commit 1')
        hash2 = self.create_commit('commit 2')
        hash3 = self.create_commit('commit 3')
        hash4 = self.create_commit('commit 4')
        self.create_commit('commit 5')
        self.drop(hash2, hash3, hash4)
        expected = ['commit 1', 'commit 5']
        self.assertEqual(expected, self.git.log())

    def test_drop_nonsequential_commits(self):
        self.create_commit('commit 1')
        hash2 = self.create_commit('commit 2')
        self.create_commit('commit 3')
        hash4 = self.create_commit('commit 4')
        self.drop(','.join((hash2, hash4)))
        expected = ['commit 1', 'commit 3']
        self.assertEqual(expected, self.git.log())

    def test_drop_head(self):
        self.create_commit('first')
        self.create_commit('middle')
        hash_last = self.create_commit('last')
        self.drop(hash_last)
        self.assertEqual(['first', 'middle'], self.git.log())

    def test_drop_merge_commit(self):
        initial_hash = self.create_commit('initial')
        self.create_commit('branch A')
        self.hg.checkout(initial_hash)
        self.create_commit('branch B')
        self.hg.merge()
        merge_hash = self.create_commit('merge to drop')
        self.create_commit('last')
        self.drop(merge_hash)
        expected_commits = ['initial', 'branch A', 'branch B', 'last']
        self.assertEqual(expected_commits, self.git.log())
        self.assertEqual(['branch B', 'branch A'], self.git_parents('last'))

    def test_drop_different_commits_in_multiple_plugin_instances(self):
        self.create_commit('good commit')
        bad_hash = self.create_commit('bad commit')
        self.create_commit('awful commit')
        self.create_commit('another good commit')
        self.drop('^awful.*', bad_hash)
        expected = ['good commit', 'another good commit']
        self.assertEqual(expected, self.git.log())

    def test_drop_same_commit_in_multiple_plugin_instances(self):
        self.create_commit('good commit')
        bad_hash = self.create_commit('bad commit')
        self.create_commit('another good commit')
        self.drop('^bad.*', bad_hash)
        expected = ['good commit', 'another good commit']
        self.assertEqual(expected, self.git.log())

    def setUp(self):
        self.tempdir = TemporaryDirectory()
        self.hg = HgDriver(Path(self.tempdir.name) / 'hgrepo')
        self.hg.init()
        self.git = GitDriver(Path(self.tempdir.name) / 'gitrepo')
        self.git.init()
        self.export = ExportDriver(self.hg.repodir, self.git.repodir)

    def tearDown(self):
        self.tempdir.cleanup()

    def create_commit(self, message):
        self.write_file_data('Data for %r.' % message)
        return self.hg.commit(message)

    def write_file_data(self, data, filename='test_file.txt'):
        path = self.hg.repodir / filename
        with path.open('w') as f:
            print(data, file=f)

    def drop(self, *spec):
        self.export.run_with_drop(*spec)

    def git_parents(self, message):
        matches = self.git.grep_log(message)
        if len(matches) != 1:
            raise Exception('No unique commit with message %r.' % message)
        subject, parents = self.git.details(matches[0])
        return [self.git.details(p)[0] for p in parents]

class ExportDriver:
    def __init__(self, sourcedir, targetdir, *, quiet=True):
        self.sourcedir = Path(sourcedir)
        self.targetdir = Path(targetdir)
        self.quiet = quiet
        self.python_executable = str(
            Path.cwd() / os.environ.get('PYTHON', sys.executable))
        self.script = Path(__file__).parent / '../hg-fast-export.sh'

    def run_with_drop(self, *plugin_args):
        cmd = [self.script, '-r', str(self.sourcedir)]
        for arg in plugin_args:
            cmd.extend(['--plugin', 'drop=' + arg])
        output = subprocess.DEVNULL if self.quiet else None
        subprocess.run(cmd, check=True, cwd=str(self.targetdir),
                       env={'PYTHON': self.python_executable},
                       stdout=output, stderr=output)

class HgDriver:
    def __init__(self, repodir):
        self.repodir = Path(repodir)

    def init(self):
        self.repodir.mkdir()
        self.run_command('init')

    def commit(self, message):
        self.run_command('commit', '-A', '-m', message)
        return self.run_command('id', '--id', '--debug').strip()

    def log(self):
        output = self.run_command('log', '-T', '{desc}\n')
        commits = output.strip().splitlines()
        commits.reverse()
        return commits

    def checkout(self, rev):
        self.run_command('checkout', '-r', rev)

    def merge(self):
        self.run_command('merge', '--tool', ':local')

    def run_command(self, *args):
        p = subprocess.run(('hg', '-yq') + args,
                           cwd=str(self.repodir),
                           check=True,
                           text=True,
                           capture_output=True)
        return p.stdout

class GitDriver:
    def __init__(self, repodir):
        self.repodir = Path(repodir)

    def init(self):
        self.repodir.mkdir()
        self.run_command('init')

    def log(self):
        output = self.run_command('log', '--format=%s', '--reverse')
        return output.strip().splitlines()

    def grep_log(self, pattern):
        output = self.run_command('log', '--format=%H',
                                  '-F', '--grep', pattern)
        return output.strip().splitlines()

    def details(self, commit_hash):
        fmt = '%s%n%P'
        output = self.run_command('show', '-s', '--format=' + fmt,
                                  commit_hash)
        subject, parents = output.splitlines()
        return subject, parents.split()

    def run_command(self, *args):
        p = subprocess.run(('git', '--no-pager') + args,
                           cwd=str(self.repodir),
                           check=True,
                           text=True,
                           capture_output=True)
        return p.stdout